Christopher McHale

Unpacking the Impact of Speech-to-Speech AI Technology on the VoiceOver Community


[Image: two sound waves with the text "old woman to young girl"]

Back in the old days, when sampling arrived on the music scene, we figured it was the end of violin players. It wasn't.


But it changed the music-making game. It feels the same here. Speech-to-speech. What does it mean?


The speech-to-speech process refers to the conversion of spoken language into digital signals that can then be processed, understood, and converted back into spoken language. This involves several key steps, each leveraging advanced technologies and algorithms. Here’s a simplified description of the process:


1. Speech Capture: The process begins with the capture of spoken words through a microphone. This device converts the sound waves of speech into analog electrical signals.


2. Analog-to-Digital Conversion (ADC): These analog signals are then converted into digital data through an analog-to-digital converter. This digital data represents the speech in a form that computers can process.


3. Pre-processing: The digital speech data often undergoes pre-processing to improve its quality. This can include noise reduction to eliminate background sounds, normalization to adjust volume levels, and filtering to remove frequencies that are not relevant to human speech.


4. Speech Recognition: The pre-processed digital data is then fed into a speech recognition system. This system uses algorithms and machine learning models to analyze the speech and convert it into text. It involves understanding the phonetics and linguistics of the spoken language to accurately transcribe the words.


5. Natural Language Understanding (NLU): If the speech-to-speech system is designed to understand and respond to queries or commands (as in virtual assistants), the text is then processed by natural language understanding algorithms. NLU interprets the meaning of the text, recognizing the intent behind the spoken words.


6. Response Generation: Based on the understanding of the query or command, the system then generates a response. This response could be a direct answer, an action (like playing a song or setting a reminder), or a synthesized speech response.


7. Text-to-Speech (TTS) Synthesis: If the response is in the form of speech, a text-to-speech synthesis system converts the textual response back into speech. This involves generating artificial speech that mimics human speech patterns, including intonation and rhythm, to produce a natural-sounding voice output.


8. Digital-to-Analog Conversion (DAC): The digital speech output is then converted back into analog signals using a digital-to-analog converter.


9. Sound Generation: Finally, the analog signals are emitted through a speaker, producing the sound waves that can be heard as speech by the human ear.
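To make steps 2, 3, and 8 a little more concrete, here is a minimal Python sketch. It is an illustration under assumptions, not how any particular product does it: the 16 kHz sample rate, the 16-bit depth, the 0.9 target peak, and every function name are choices made for this example, and a generated sine tone stands in for the microphone.

```python
import math

SAMPLE_RATE = 16_000  # samples per second (assumed; a common rate for speech)
MAX_INT16 = 32767     # largest value a signed 16-bit sample can hold


def capture_tone(num_samples=160, freq_hz=220.0, amplitude=0.25):
    """Stand-in for step 1: a quiet sine tone as float samples in [-1, 1]."""
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
        for i in range(num_samples)
    ]


def adc(samples):
    """Step 2: quantize floats in [-1, 1] to signed 16-bit integers."""
    return [
        max(-MAX_INT16, min(MAX_INT16, round(s * MAX_INT16)))
        for s in samples
    ]


def normalize(pcm, target_peak=0.9):
    """Part of step 3: scale so the loudest sample sits at target_peak of full scale."""
    peak = max((abs(s) for s in pcm), default=0)
    if peak == 0:
        return list(pcm)  # silence: nothing to scale
    gain = target_peak * MAX_INT16 / peak
    return [round(s * gain) for s in pcm]


def dac(pcm):
    """Step 8: map 16-bit integers back to floats in [-1, 1] for playback."""
    return [s / MAX_INT16 for s in pcm]


analog_in = capture_tone()
digital = normalize(adc(analog_in))
analog_out = dac(digital)
```

Noise reduction and filtering, the other parts of step 3, are left out here because they need real signal-processing tools to do properly.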


Throughout this process, sophisticated technologies and models, such as deep learning and neural networks, are employed to handle the complexities of human language and speech. These technologies enable the accurate recognition of spoken words and the generation of coherent and contextually appropriate responses.
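The middle of the chain, steps 4 through 7, can be pictured as four stages handing their output to the next. The sketch below shows only that hand-off. The class name, the stage names, and the toy stand-in functions are all hypothetical; a real system would put trained speech recognition, language understanding, and synthesis models where these placeholders sit.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SpeechToSpeechPipeline:
    """Chains the four middle stages; each stage is a plain function."""
    recognize: Callable[[List[int]], str]    # step 4: audio samples -> text
    understand: Callable[[str], str]         # step 5: text -> intent label
    respond: Callable[[str], str]            # step 6: intent label -> reply text
    synthesize: Callable[[str], List[int]]   # step 7: reply text -> audio samples

    def run(self, audio: List[int]) -> List[int]:
        text = self.recognize(audio)
        intent = self.understand(text)
        reply = self.respond(intent)
        return self.synthesize(reply)


# Toy stand-ins (assumptions for illustration only, not real models).
def toy_recognize(audio: List[int]) -> str:
    return "play a song"  # pretends every recording says the same thing


def toy_understand(text: str) -> str:
    return "play_music" if "play" in text.lower() else "unknown"


def toy_respond(intent: str) -> str:
    replies = {"play_music": "Playing your song."}
    return replies.get(intent, "Sorry, I did not catch that.")


def toy_synthesize(text: str) -> List[int]:
    return [ord(ch) for ch in text]  # placeholder "audio": one number per character


pipeline = SpeechToSpeechPipeline(
    toy_recognize, toy_understand, toy_respond, toy_synthesize
)
output_audio = pipeline.run([0, 1, 2])
```

Keeping each stage as a swappable function mirrors the step-by-step description: any one stage can be replaced without touching the others.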


How does AI affect the VoiceOver community?


I read a script into a recording, with the pace and inflection humanized the way my writer's ear hears it, and then I can change my voice into any voice I choose.


Does something like this affect the talented VoiceOver community? Talent wins every time, so at the top levels of this industry, no.


I see the shakeout of this being very much like what happened in the music world. The best violin players never lost a job.


However, let's be honest. This technology is amazing, and it's still very much in development. We work in a competitive marketplace, and it's not ethics driving the situation; it's the bottom line, and who gets there quickest with the best result. Marketplace pressures are driving the bus, and no amount of wishful thinking will change things.


It's never been easy to make a living as a VoiceOver artist, and do not believe anybody who says it is. The best of the best will always have work, so make sure you measure yourself against the best. Talent at the highest level is a gift. Hard work makes the difference, but hard work on top of talent is the winning formula in an AI-infused world.


