In Stanley Kubrick's epic 1968 science fiction film, 2001: A Space Odyssey, one of the most memorable characters isn't a person, but a renegade computer named "Hal" that disobeys orders.
For me, Hal's human-like voice was one of the eeriest and most unforgettable parts of the movie. Of course, the voice behind Hal sounded real because it wasn't produced by a computer, but instead belonged to an actor--a human being.
But a new development from researchers at Google's DeepMind unit, which is working to develop super-intelligent computers, promises to bring what once belonged to the realm of science fiction (and Hollywood) closer to reality. Last week, they announced a breakthrough in producing text-to-speech (TTS), or speech synthesis, using artificial intelligence.
In blind tests using samples in North American English and Mandarin Chinese, DeepMind's WaveNet algorithm narrowed the gap between the best systems in use today (some of which were developed by Google as well) and human speech by more than 50 percent. In their blog post, the team at DeepMind shares a few short audio samples of text to speech that were produced using WaveNet versus other methods.
Voice recognition systems have made great strides in recent years, with Apple, Amazon, and Google offering devices and applications that make use of this technology. Mark Bennett, the international director of Google Play, which sells Android apps, told an Android developer conference in London last week that 20 percent of mobile searches using Google are made by voice, not written text.
Though scientists have trained computers to understand the human voice, their ability to train computers to speak like a human has lagged. Computer-generated speech still sounds choppy and robotic, unlike the smooth-talking Hal in Kubrick's film.
While it may not match the human voice quite yet, DeepMind's WaveNet does represent a major leap forward in the quality of artificially generated speech. Here are a few highlights of how it works:
It uses AI to predict speech patterns.
Rather than piece together pre-recorded audio samples like traditional TTS systems, WaveNet uses neural networks, which are loosely modeled on how the brain works. It's the same kind of technology DeepMind used to develop AlphaGo, which beat one of the world's top-ranked players in the strategy game Go. WaveNet combines what it has learned from recorded speech with a predictive modeling algorithm to form speech waves.
It uses raw audio samples--and lots of data.
Most TTS systems piece together pre-recorded audio samples to produce speech based on a given text. WaveNet, by contrast, learns from the raw waveforms of audio signals sampled at 16,000 samples per second, and then constructs new waveforms, one sample at a time, to generate speech. The drawback to this approach, however, is that it requires enormous computational power, meaning we're unlikely to see broad-based applications of this technology soon.
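To get a feel for why building a waveform at 16,000 samples per second is so demanding, here is a minimal Python sketch. It is not DeepMind's code: the `toy_next_sample` function is a hypothetical stand-in for the trained neural network (it simply continues a 440 Hz tone), and the function names are my own. What it does show is the shape of the loop, where every new sample is predicted from the samples that came before it.

```python
import math

SAMPLE_RATE = 16_000  # samples per second, the rate cited above
TONE_HZ = 440         # assumption: an arbitrary test tone for the toy predictor
OMEGA = 2 * math.pi * TONE_HZ / SAMPLE_RATE


def predictions_needed(seconds, sample_rate=SAMPLE_RATE):
    """Number of one-sample-at-a-time predictions for a clip of this length."""
    return int(round(seconds * sample_rate))


def toy_next_sample(history):
    """Hypothetical stand-in for the trained network.

    Continues a sine wave from the two previous samples using the identity
    sin((n+1)w) = 2*cos(w)*sin(n*w) - sin((n-1)w). A real model would look
    at a far longer history and would be learned from recorded speech.
    """
    return 2 * math.cos(OMEGA) * history[-1] - history[-2]


def generate(seconds):
    """Build a waveform sample by sample, feeding each output back in."""
    waveform = [0.0, math.sin(OMEGA)]  # two seed samples to start the loop
    target = predictions_needed(seconds)
    while len(waveform) < target:
        waveform.append(toy_next_sample(waveform))
    return waveform


clip = generate(0.5)
print(len(clip))                 # 8000 samples for half a second
print(predictions_needed(60.0))  # 960000 predictions for one minute
```

Even in this toy version, one minute of audio means close to a million passes through the loop. Swap the two-line predictor for a large neural network and the computational cost mentioned above becomes easy to picture.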
It can even make music.
Using the same algorithm, DeepMind's researchers also trained WaveNet to create improvised piano pieces that sound like a human is at the keyboard.
Once researchers at DeepMind have resolved the technological barriers to bringing WaveNet into the mainstream, what kinds of applications are we likely to see? And how will this impact the millions of people whose jobs depend on their voices?
Voice artists--like the professionals who record animated movies, TV and radio commercials, audiobooks, and podcast intros today--may have good reason to feel concerned. You can also imagine the potential impact this could have on the hundreds of thousands of people who handle customer service inquiries.
And the rest of us?
Let's see where this technology takes us...
What do you think about this new development? What kinds of applications will this enable? How will this impact jobs? Share your thoughts in the comments below.
Thanks for reading! Check out more of my posts about writing, education, technology, and more here on LinkedIn (and be sure to reach out and connect). And listen to in-depth conversations with great writers on my podcast, Write With Impact. I'm also on Twitter @glennleibowitz.