Have you ever wondered how your favorite podcast might sound in Mandarin or Spanish? Spotify is rolling out a new AI-powered feature that does just that. The tool, called Voice Translation, goes live today (September 25). It lets you listen to select podcast episodes in other languages in the speaker's own voice, or at least a convincing imitation of it.
The technology, which Spotify built with the help of OpenAI's Whisper automatic speech recognition (ASR) system, uses a speech-to-text generative AI model to convert audio recordings into text and a voice replication model to mimic the original speaker's voice.
The first hosts to take part in the new feature are Steven Bartlett, Bill Simmons, Monica Padman, Lex Fridman, and Dax Shepard. However, not every episode of their podcasts will be available in other languages right away. For now, look out for the Spanish-language versions of “Interview with Yuval Noah Harari” on the Lex Fridman Podcast, “Kristen Bell, by God’s Grace, Returns” on Armchair Expert, and “Interview with Dr. Mindy Pelz” on The Diary of a CEO with Steven Bartlett.
According to Spotify, more episodes will become available over the coming days and weeks, with French and German translations following soon after. You can find translated episodes in the Now Playing View on the mobile or desktop app, and additional voice-translated episodes are planned for a dedicated Voice Translations hub.
Following the buzz surrounding OpenAI’s ChatGPT, the top music streaming platforms were quick to join the generative AI gold rush. Machine learning was already being used to recommend new songs based on patterns and trends in your listening behavior (think of your Discover Weekly playlist), but a few new applications for the technology have since appeared.
One of them is Spotify’s AI DJ, which introduces new song suggestions in an AI-generated voice. There are also numerous music generators, including ones from Meta and Google, as well as Universal Music’s partnership with Endel to use AI to create background audio such as forest noises and running water. The scariest idea by far, however, was using generative AI to create podcasts from scratch.
The Joe Rogan AI Experience and the Hacker News Recap are two of the generative AI podcasts that have emerged as a result. The biggest criticism of them, apart from concerns about copyright and privacy, was the absence of dynamic discussion, which is the foundation of the best podcasts.
That is probably why they didn’t catch on, but using generative AI to translate podcasts is exactly the kind of use case I can support. Provided the pace and vibrancy of the dialogue survive translation, machine learning can make fascinating shows accessible to a much wider audience. Now I have to track down and download all the podcasts in other languages that I have been missing.