This podcast is a ghost - my venture into generative AI

All my friends and colleagues know by now that I have a bit of a tech crush on AI, especially when it comes to its potential in humanitarian communications. Lately, I’ve been toying around with the idea of branching out from the written word. Podcasts, my friends, have been on my mind. They've become such a buzzing hive of knowledge these days, and I was itching to see how AI could fit into my work.

Here’s the thing: one of the key challenges in our field is the ability to communicate complex narratives effectively. Sure, we have reports, surveys, and even personal accounts that help shed light on the pressing issues. But let's face it, how many of us are truly captivated by a 100-page PDF, no matter how important the content might be? Hearing stories from the main characters themselves gives a personal touch that keeps audiences engaged.

Enter podcasts. They're like the cool older cousin of traditional reports - engaging, accessible, and easy to consume. The main problem is, they’re usually expensive to produce. You’d need someone to write the script, equipment to record and somebody to operate it, and a host to narrate the stories. But with AI, you’ve got easy (and nearly free) access to all these services; suddenly podcasts are much more attainable. I could have my own corner on Spotify too, and a trio of AI tools promised to provide this opportunity without burning a hole in my pocket.

Preparation

Kick-starting this adventure is our trusty and beloved ChatGPT, a language model that can easily whip up compelling narratives, mimicking the skills of an experienced writer in a fraction of the time. But as impressive as its capabilities are, ChatGPT isn’t infallible. To a discerning eye, the AI's unique lexicon and sentence structure can be relatively easy to detect. It’s a minute detail but one that needs to be taken into account when aiming for authenticity.

In the world of generative AI, effective prompts are a skill you need to master - and one I consider myself quite good at.

Next up is Midjourney, our visual wizard that’s been getting better and better throughout the past year. Providing the faces of the people to accompany our AI-generated scripts, Midjourney generates unique, compelling images that can truly bring a story to life. But, much like a temperamental artist, it has its quirks. Attention to detail is essential, as faces and hands can occasionally venture into uncanny valley territory if not crafted with care. It's a challenging feat, but for an innovative project like this, Midjourney's potential is undeniable.

The real game-changer, however, is Play.ht. This AI voice generation tool is a genuine revelation. Not only is it incredibly user-friendly, but its output is uncannily natural. Far from the monotonous drone of archetypical robot voices, the narrations crafted by Play.ht possess an impressive degree of emotional resonance. They exhibit a nuance and fluidity that, to an unsuspecting ear, could easily pass for human. I could choose among over 30 male and female voices with different ages, speaking styles and speeds. I could even create a custom AI voice based on my own, but I decided against it - nobody wants to hear my weird, unplaceably European accent.

Production

Producing a podcast episode from scratch was an education, to say the least. For a single, seven-minute episode, I put in around an hour of work. Now, that might seem like a lot, but let's break down what that time included: adding the script, selecting voices, and reviewing the output - all child's play, really. The real work came in the finer details, like refining the narration, exporting the episode, and choosing just the right sounds and music to create an immersive auditory experience.

I found the voices generated by Play.ht to be stunningly natural, almost unsettlingly so - the speakers even took breaths in! It was fascinating to explore the intonations and inflections of these AI-generated narrators. Still, the occasional misplaced excitement when discussing serious topics like impoverished children or derelict refugee camps called for a few re-generations of the audio.

It took 8 takes to get that intro sounding right! But with Play.ht, it was only a button press (or 8) away.

The main limitation, I found, was in the diversity of the accents available for the AI voices. Currently, we are limited to the good old British, American, and Canadian accents. It's not ideal for depicting the diverse characters that often make up the heart and soul of humanitarian stories. But I'm not losing sleep over it - it's only a matter of time before these platforms broaden their linguistic horizons. And let's face it, it does add a sense of charm to the podcast that betrays its AI origins. Plus, I suppose it's not all that different from the necessity of dubbing or translating in traditional podcast production.

Along the way, there were some interesting hiccups to navigate. For instance, our robot narrator had some difficulty pronouncing UNESCO, but a little bit of creative spelling (cue "YouNESCO") did the trick.
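If you'd rather keep the readable script and the phonetically respelled one apart, the same trick can be scripted. What follows is only a minimal sketch in Python, not something I used for the episodes: the `RESPELLINGS` table and the `respell` helper are hypothetical names, and the only entry taken from my experience is UNESCO. The output is plain text that you would paste into the voice tool as usual.

```python
import re

# Hypothetical respelling table. Only the UNESCO entry comes from the post;
# add further entries as you discover words the narrator stumbles over.
RESPELLINGS = {
    "UNESCO": "YouNESCO",
}


def respell(script: str, table: dict[str, str] = RESPELLINGS) -> str:
    """Return the script with whole-word matches swapped for their respellings."""
    if not table:
        return script
    # \b keeps the match to whole words, so "UNESCOs" would be left untouched.
    pattern = re.compile(r"\b(" + "|".join(re.escape(word) for word in table) + r")\b")
    return pattern.sub(lambda match: table[match.group(1)], script)


if __name__ == "__main__":
    print(respell("UNESCO's report was released today."))
```

The point of the design is that the original script stays clean for show notes and transcripts, while the respelled copy is only ever heard, never read.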

To add a final touch of flair to the episodes, I used VEED.io, a free and remarkably easy-to-use tool, to weave in music and ambient sounds. It's amazing how these audio layers can transport the listener right into the heart of the narrative.

By the time I got to producing the third episode, I had gathered quite the bag of tricks. I honed my skills in finessing pauses, tweaking pronunciations, and even orchestrating ad breaks. One of my favourite moves? Downloading audio clips of Bengali or Syrian speakers and using our lovely AI with a British accent to “translate” what they were saying. The learning curve was steep but rewarding, and every adjustment made the podcast that much more believable.

Promotion

So, once I had a shiny, new podcast ready to share with the world, I focused my efforts on getting the word out. It was time to dive into promotion.

For once, I decided to roll up my sleeves and design the promotional materials myself. I jumped onto Canva, and with the help of Midjourney-generated images, I whipped up some aesthetically pleasing visuals. I kept the design simple, opting for a template that could easily be customized for each episode.

The Twitter header image was just a simple title, an icon, and some very real-looking AI-generated humans.

Next, I turned to my trusted AI sidekick, ChatGPT, to draft promotional posts for Twitter. I know it would probably have been better to use at least one other platform, but Instagram literally asked me to take a selfie with some random numbers in order to create a new account. That felt like too much to do for a fake podcast, and also - honestly - beneath me. So, I decided to focus on just one outlet.

Now, here's where I veered off the straight and narrow. The ethical implications of this are murky, but let's be real: in the vast ocean of social media, a little credibility can go a long way. When I check out a new podcast, a decent follower count catches my eye. It's a sort of social proof, a tacit endorsement from fellow users. So, I invested $16 in buying a starting point of 250 followers. Call it a push-start for my social media presence. I then started replying to all kinds of humanitarian accounts - UN agencies, IFRC, EU institutions, Amnesty International, etc. - in order to get the podcast’s name out there.

I focused on accounts I would have liked to be followed by - with ChatGPT-generated replies and quote tweets, I could approach this tactic a lot more effectively and send out more messages every day.

Conclusion

So here we are, three episodes deep into the phantom podcast experiment, and I’ve got to tell you - while the process has been automated to a great degree, I don’t think it’s quite time to hang up my boots and leave it all to AI just yet. Sure, the AI did the heavy lifting, but it was up to me to make the key decisions to ensure the final product was polished and engaging.

There's something genuinely educational about diving head-first into unfamiliar waters. I've learned more about podcast creation, production, and promotion over the last few weeks than I ever thought I would. It was a crash course in crafting narratives for audio, understanding the mechanics of voiceover nuances, designing appealing promotional materials, and even a sneak peek into the psychology of social media following. Not to mention, I’m now fully capable of saying UNESCO in five different AI languages (a skill I didn’t know I needed, but here we are).

What really struck me, however, was how AI - an entity of zero emotion or personal judgement - could help amplify human stories in a way that was both efficient and effective. There's so much potential in using generative AI for humanitarian communications. It can help organizations create a variety of content quickly and on a budget, reaching broader audiences and thereby increasing their impact. It’s like having an army of very capable interns who don’t need sleep or coffee breaks.

Looking forward, I expect the next few years to bring an increase in the use of AI tools in the humanitarian communications sector. We're just scratching the surface of its potential, and there's an exciting journey ahead. And as for my little ghostly podcast? Well, seven more episodes are on the cards to complete the series. From there, who knows? Maybe I’ll venture into AI-generated vlogs next. Stay tuned, because one thing is for sure - the road to telling human stories is changing, and we're not putting the brakes on anytime soon.
