Understanding Hawking

The Truth About... Speech Synthesizers

Charles Arthur
Thursday 25 June 1998 18:02 EDT

STEPHEN HAWKING, it is reported, is considering replacing his "android" voice synthesizer with one made by British Telecom which offers an English accent. It is a sign of how quickly computers are moving that such a change seems overdue. But synthesizing speech entirely through a computer (rather than, as railway station announcement systems do, generating sentences by stringing together pre-recorded individual words) is not a new phenomenon.

The first electronic attempts were made at the laboratories of Bell, the telephone company. In 1936, a Bell Labs scientist, H W Dudley, invented the world's first electronic speech synthesizer. It required an operator with a keyboard and foot pedals to supply "prosody" - the pitch, timing and intensity of speech. The device, known as the "Voder" (short for "Voice Operation Demonstrator"), grew out of Dudley's work on the "vocoder", or voice coder, and it proved a hit at the New York and San Francisco World's Fairs of 1939.

The problem was the human interaction required. Ideally, one would simply give the machine (nowadays, a computer) a stream of text, which it would render as speech.

Generating sounds is not a problem for computers. Synthesizers have changed the face of popular music. By powering a speaker with a stream of electronic pulses of varying amplitude, they can mimic all sorts of instruments. Generating a human voice is the same task - but language adds complexities of pronunciation and, for the computer, comprehension of what it is reading.
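To make the idea of "a stream of electronic pulses of varying amplitude" concrete, here is a minimal Python sketch that computes a pure tone as a list of digital samples and stores it as a 16-bit WAV file with the standard `wave` module. The 8,000-samples-per-second rate, the 220Hz pitch and the function names `tone` and `write_wav` are illustrative assumptions, not details of any real synthesizer; a voice needs a far richer waveform than a single sine wave.

```python
import math
import struct
import wave

SAMPLE_RATE = 8000  # samples per second (an illustrative choice)


def tone(freq_hz, seconds, amplitude=0.5):
    """Return float samples in [-amplitude, amplitude] for a sine wave."""
    n = int(SAMPLE_RATE * seconds)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n)]


def write_wav(path_or_file, samples):
    """Write float samples in [-1, 1] as 16-bit mono PCM.

    Accepts a file name or an open binary file object, as wave.open does.
    """
    with wave.open(path_or_file, "wb") as w:
        w.setnchannels(1)       # mono
        w.setsampwidth(2)       # 2 bytes per sample = 16-bit
        w.setframerate(SAMPLE_RATE)
        # Scale each float to a signed 16-bit integer, little-endian.
        frames = b"".join(struct.pack("<h", int(s * 32767)) for s in samples)
        w.writeframes(frames)
```

For example, `write_wav("tone.wav", tone(220.0, 0.5))` should produce half a second of a low hum. A digital speech synthesizer likewise ends up producing a stream of samples like this; the hard part is deciding what the samples should be.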

Computers typically generate speech using combinations of "phonemes", the individual sounds within words. The word "phoneme" consists of two syllables but five phonemes: "ph", "o", "n", "ee" and "m". English has about 43 phonemes in all, the exact count depending on the accent. Phonemes are easy to digitise, but making recognisable speech from them turns out to be harder. The "transition", where one phoneme (say, "ph") elides into the next (say, "o"), is difficult to generate with a computer. It is actually simpler to digitise the phonemes together with their transitions, then split the recordings halfway through each phoneme, so that each piece runs from the middle of one sound to the middle of the next. This produces about 400 transition-phoneme pieces which, like Lego bricks, can be spliced together for seamless speech. Add the phonemes that start words, and you can produce any word from that library.
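The splicing scheme described above is what is usually called diphone concatenation. The Python sketch below, a simplification under stated assumptions, turns a word's phoneme sequence into the names of the pieces to fetch from a recorded library and joins their samples. The phoneme labels, the `_` marker for the pause around a word and the `library` dictionary are all hypothetical. The word-final piece (last phoneme into silence) is an assumption added here so the final sound is not cut off halfway; the article mentions only the word-initial pieces.

```python
SILENCE = "_"  # hypothetical label for the pause before and after a word


def diphone_units(phonemes):
    """Split a word's phoneme sequence into diphone unit names.

    Each unit runs from the middle of one sound to the middle of the next,
    so a word of n phonemes needs n + 1 units once the pause before the
    word (word-initial unit) and after it (assumed word-final unit) are
    included.
    """
    padded = [SILENCE] + list(phonemes) + [SILENCE]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def splice(units, library):
    """Concatenate the recorded samples for each unit, in order.

    `library` is a hypothetical dict mapping unit names to sample lists;
    a KeyError means that unit was never recorded.
    """
    out = []
    for name in units:
        out.extend(library[name])
    return out
```

For "cat", written here with the made-up labels `k`, `a` and `t`, `diphone_units(["k", "a", "t"])` returns `["_-k", "k-a", "a-t", "t-_"]`: four pieces for three phonemes.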

An accent is produced by variations in the phonemes and transitions, including their pitch and speed: the difference between the American "tomayto" and the English "tomahto" is one example.
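The crudest way to change the pitch and speed of a recorded piece is to play it back faster or slower, as with a tape recorder. The sketch below does that with linear interpolation; the function name `resample` and the approach are illustrative assumptions, not how any particular synthesizer works. It changes pitch and duration together, whereas practical synthesizers adjust them independently, for example with pitch-synchronous overlap-add (PSOLA).

```python
def resample(samples, factor):
    """Play `samples` back `factor` times faster using linear interpolation.

    factor > 1 raises the pitch and shortens the sound; factor < 1 lowers
    the pitch and lengthens it. Pitch and speed cannot be varied separately
    this way.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_out = int(len(samples) / factor)
    out = []
    for i in range(n_out):
        pos = i * factor          # position in the original recording
        j = int(pos)
        frac = pos - j
        # Blend the two neighbouring samples; repeat the last one at the end.
        nxt = samples[j + 1] if j + 1 < len(samples) else samples[j]
        out.append(samples[j] * (1 - frac) + nxt * frac)
    return out
```

Halving the factor doubles the number of samples, so the piece lasts twice as long and sounds an octave lower when played at the same rate.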

All that is the easy part, though. Turning text into speech also requires analysis of the sentence being spoken, or meaning can be lost: "I'm so pleased to see you" could be read in several ways, depending on whether the speaker stresses "so", "pleased", "see" or "you". Incorporating inflection, pauses and emphasis into computer-generated speech remains the big problem, and one that scientists are still struggling to overcome.
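One way to sidestep the analysis problem, at least when a person prepares the text, is to mark the stressed word explicitly. The W3C's Speech Synthesis Markup Language (SSML), a standard published after this article, provides an `<emphasis>` element for this. The hypothetical helper below wraps one word of a sentence in that element. It produces a pared-down fragment: a complete SSML document also needs `version`, `xmlns` and `xml:lang` attributes on `<speak>`, and how strongly the stress is rendered is left to the speech engine.

```python
from xml.sax.saxutils import escape


def emphasise(sentence, stressed_index, level="strong"):
    """Return a simplified SSML fragment with one word wrapped in <emphasis>.

    Words are split on whitespace and counted from zero.
    """
    words = sentence.split()
    if not 0 <= stressed_index < len(words):
        raise IndexError("stressed_index is out of range")
    parts = []
    for i, word in enumerate(words):
        word = escape(word)  # escape &, < and > so the markup stays well-formed
        if i == stressed_index:
            word = f'<emphasis level="{level}">{word}</emphasis>'
        parts.append(word)
    return "<speak>" + " ".join(parts) + "</speak>"
```

Calling `emphasise("I'm so pleased to see you", 1)` stresses "so", while an index of 5 stresses "you", giving two different readings of the same sentence.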
