
New Voices from Text-to-Speech Tool for DIY Voiceovers


People love the characters of the animated world. We fell in love with the cute characters of Disney movies, our childhood favorite Sesame Street, and TV cartoon series. The Lion King fascinated kids and adults alike with its talking animals, but few of us paid attention to the process of creating these characters and making them talk.


Creating voiceovers for animated characters takes a lot of effort and talent. Producers need to audition voice actors who will give life to the characters. This process alone takes a long time because the producers need to find the perfect voice that will do justice to each character. The most time-consuming part is the recording, in which the voice actors deliver their lines to match the characters' actions perfectly. Producers have to set aside a big budget for this aspect of production alone.

Technology to Mimic the Human Voice

Creating voiceovers without voice actors is a new frontier in technology. There have been countless attempts at creating voices from text, but the early ones produced robotic, boring, and monotonous voices. Later versions of the technology produced natural-sounding voiceovers from any text input. These were no longer robotic, but they were still boring and monotonous. To get closer to a professional voiceover, developers added pauses and accents, along with grammar-based guidelines that relied on special punctuation in the text. Users still complained that the synthesized voices had poor articulation, placing the stress on the wrong syllables and words.

A team at IBM Research-Haifa worked to improve Watson Text to Speech so that it could create customizable voices. The vision is to create new, distinct, and expressive voices through an automated voice creation process that is fast and flexible. Watson itself could already speak nine languages as of 2016.

The team’s vision has been realized in a cooperative endeavor between the IBM Research Education team and Sesame Street. The team took part in an IBM-Sesame Street pilot held from April to May 2017 at the Gwinnett County Public Schools in Georgia. The pilot used IBM’s Watson Education technology and content from Sesame Workshop in the classroom for the first time, with an app for learning new vocabulary. Synthesized speech gave the new Sesame characters voices that kids would love, the way they love the familiar characters of Ernie, Elmo, and Big Bird.

The IBM Virtual Voice Creator is a tool built on three standard American English voices from the text-to-speech (TTS) service on Watson Developer Cloud. The tool can transform these standard voices into new ones by changing their parameters.

The IBM Virtual Voice Creator works like the mixing console used by sound engineers, but this one is for voice manipulation. Adjusting the sliders controls and changes each vocal aspect, including the pitch, speed, breathiness, and timbre. There is no limit on the number of possible combinations. Playing around with the controls creates new voices that express different emotions, from happy to sad and everything in between.

This bold idea and new technology will have many applications, from cartoon heroes to game characters. Developers need only choose a standard voice and manipulate it with the sliders to shape a new voice persona. The application then reads text aloud to produce the audio, without voice actors and without time-consuming recording sessions in the studio.
