Just Speak Naturally

What's That You Said?
If you've dialed directory assistance lately, chances are you have connected to the future of communications. An automatic system on the other end would have asked you to pronounce the name of a person or business, and within seconds it would have responded with the number you were seeking.

This exchange of information sounds simple, but it's actually the complex product of more than 30 years of research in statistics, physics, linguistics, and computer science. What's so complicated about asking for a phone number, and what does it mean for your future?

The idea of talking to machines isn't new. Characters in science fiction stories have conversed with robots and computers for a long time. You may have shared a few words yourself with a computer, car, or cellular phone, especially when it was not working as you expected.

Word Power

If you dictated 130 words per minute instead of typing 50 wpm, how much time would you save on a 650-word paper?

What percentage of time would you be saving?
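One way to check your answers to the Word Power questions is to work the arithmetic out in a few lines of Python. This is only a sketch: the variable names are made up for illustration, and it assumes both speeds stay constant for the whole paper.

```python
# Worked arithmetic for the Word Power questions.
# Assumption: typing and dictating speeds are constant throughout.
WORDS = 650
TYPING_WPM = 50
DICTATING_WPM = 130

typing_minutes = WORDS / TYPING_WPM          # time needed to type the paper
dictating_minutes = WORDS / DICTATING_WPM    # time needed to dictate it
minutes_saved = typing_minutes - dictating_minutes
percent_saved = 100 * minutes_saved / typing_minutes

print(f"Typing: {typing_minutes:.0f} min, dictating: {dictating_minutes:.0f} min")
print(f"Saved: {minutes_saved:.0f} min ({percent_saved:.1f}% of the typing time)")
```

The percentage here is measured against the typing time, since that is the time you would otherwise have spent.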

But nowadays, these machines may understand what you are saying and can respond, all thanks to recent breakthroughs in the field of speech recognition. As a result, you can do the following and more:

  • Ask your car for directions
  • Dial your mobile telephone without touching it
  • Dictate a term paper instead of typing it on a computer keyboard



You Make the Call
If there were speech recognition systems in the following locations, list the different tasks you could accomplish simply by saying what you wanted.

Location

Tasks You Could Accomplish by Speaking

Inside your car

Inside your home

To your computer


Recognizing Speech
Speech recognition systems first break down spoken language into phonemes, or the individual sounds of words, such as the ones below:

/w/ as in "we" "quite" "once"   /ch/ as in "much" "nature" "match"
/ou/ as in "no" "boat" "low"   /au/ as in "haul" "bought" "draw"

Before the system can recognize what you are saying, it converts the individual sounds into digitized sound waves, which it matches to a built-in dictionary. With almost 40 phonemes in the English language, there are millions of possibilities of how these sounds could be combined.
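The dictionary matching described above can be pictured with a toy example. The sketch below is not how any real product stores its dictionary. The phoneme symbols, the entries, and the function names `lookup` and `count_sequences` are all assumptions made for illustration. It also shows why the number of possible sound combinations climbs into the millions so quickly.

```python
# Toy pronunciation dictionary: each key is a sequence of phoneme symbols.
# The symbols and entries are illustrative, not taken from a real system.
PRONUNCIATIONS = {
    ("w", "ee"): ["we"],
    ("n", "ou"): ["no", "know"],   # two words can share the same sounds
    ("b", "ou", "t"): ["boat"],
    ("l", "ou"): ["low"],
}


def lookup(phonemes):
    """Return every dictionary word that matches the phoneme sequence."""
    return PRONUNCIATIONS.get(tuple(phonemes), [])


def count_sequences(num_phonemes, length):
    """Count the possible phoneme strings of a given length."""
    return num_phonemes ** length


print(lookup(["n", "ou"]))       # ['no', 'know']
print(lookup(["z", "z"]))        # [] because nothing in the dictionary matches
print(count_sequences(40, 4))    # 2560000
```

Even with only four sounds in a row, 40 phonemes allow more than two and a half million combinations, and a lookup such as "no" versus "know" can still return more than one word.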

The title of this story looks different in the form of a sound wave. The wave takes its shape from the spoken words.

The speech recognition system figures out the correct choice through a series of algorithms, or mathematical models, that help narrow down the possibilities to the ones that make the most sense. These algorithms also take grammar into account. For instance, if you say "I am going to the beach," the system will know that the subject "I" takes the verb "am" rather than "are" and that the preposition "to" will likely be used if you are "going" somewhere.
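To see how grammar can narrow down the choices, here is a minimal sketch that scores candidate sentences by how often each pair of neighboring words appears together. It is a simplified stand-in for the statistical models the article mentions, not the method used by any particular system. The word-pair counts are invented, and `score` and `best_candidate` are hypothetical names.

```python
# Hypothetical counts of how often one word follows another.
BIGRAM_COUNTS = {
    ("i", "am"): 50, ("i", "are"): 0,
    ("am", "going"): 30, ("are", "going"): 30,
    ("going", "to"): 40, ("going", "two"): 0, ("going", "too"): 1,
    ("to", "the"): 60, ("two", "the"): 1, ("too", "the"): 1,
    ("the", "beach"): 10,
}


def score(sentence):
    """Multiply the counts of every neighboring word pair in the sentence."""
    words = sentence.lower().split()
    total = 1.0
    for prev, word in zip(words, words[1:]):
        # Add 1 so an unseen pair lowers the score without zeroing it out.
        total *= BIGRAM_COUNTS.get((prev, word), 0) + 1
    return total


def best_candidate(candidates):
    """Pick the candidate sentence with the highest score."""
    return max(candidates, key=score)


# Three sentences that sound alike but are spelled differently.
candidates = [
    "I are going two the beach",
    "I am going too the beach",
    "I am going to the beach",
]
print(best_candidate(candidates))   # I am going to the beach
```

Because "I am" and "going to" are far more common pairs than "I are" or "going two," the grammatical sentence wins even though all three candidates sound nearly identical.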

A Slow Start
More than 20 years ago, the first video games allowed users to hit a dot back and forth across an otherwise empty screen. Tell that to Super Mario today.

The same comparison holds for the development of the speech recognition field. Only three years ago, anyone dictating to a computer had to speak s-l-o-w-l-y and in short phrases punctuated by long pauses. The results that appeared on screen were often more comical than accurate. Were you saying "I scream" or "ice cream," for example?

Other speech recognition programs, such as the ones a telephone user might encounter, were famous for their limited options: "Press or say '1'" or "Say 'yes' if you wish to continue."

Quantum Leaps
The latest advances in speech recognition can be summed up in two phrases: natural language—which allows you to speak at normal speed without pausing frequently—and generalized language—which recognizes the words of many different speakers.

Stefan Bohrer has spent the past years on the cutting edge of speech recognition. He works in Cambridge, Massachusetts, for Philips Speech Processing, one of the world leaders in speech recognition technology. Bohrer explains that advanced algorithms and powerful computers that quickly process millions of instructions have made speech recognition more immediate and accurate.

"If you have an algorithm that takes one minute to recognize a sentence that is five seconds long, it's not that useful. You need to recognize words in real time," he says. And he notes that new algorithms are able to better recognize words, grammar, and the beginnings and endings of sentences.
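Bohrer's example can be turned into a quick calculation. Speech engineers often compare processing time with the length of the audio, a ratio commonly called the real-time factor. The sketch below uses that idea; the function names are hypothetical, and the threshold of 1.0 is a simplifying assumption about what "real time" means.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Ratio of the time spent recognizing to the length of the speech."""
    return processing_seconds / audio_seconds


def keeps_up(processing_seconds, audio_seconds):
    """Assumption: a factor of 1.0 or less counts as real time."""
    return real_time_factor(processing_seconds, audio_seconds) <= 1.0


# One minute of processing for a five-second sentence.
print(real_time_factor(60, 5))   # 12.0
print(keeps_up(60, 5))           # False
# A faster algorithm: four seconds for the same sentence.
print(keeps_up(4, 5))            # True
```

A factor of 12 means the listener would wait twelve times longer than it took to speak the sentence.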

He adds that speech recognition systems can now handle thousands of words from speakers of different accents, genders, and ages. When these people say the same word, the way it sounds will differ. The system uses millions of additional instructions to recognize these differences.

"In the past, you had simple applications that understood just a few words. Now you can say, 'I'm interested in the weather tomorrow,' " Bohrer explains. "To make a speech recognition system speaker-independent for thousands of words, the system must learn huge amounts of data. There are thousands of variations repeated for each phoneme, so the system will work whether you have a New Orleans or a Boston accent."

One speech recognition company has developed a program that recognizes the voice and speech patterns of computer users ages 11 and up. Way cool!

Brave New World
It might take a computer to calculate the changes that will come from improved speech recognition. Dictating to a computer instead of using the keyboard can improve the lives of people with special needs, including users who are blind or do not have the use of their hands.

Try It Yourself

If a major airline hired you to design a speech recognition system that passengers could reach by telephone:

  • What kinds of words would you train the system to recognize?
  • What are some grammatical phrases you might expect to hear from callers?
  • In what ways would you expect callers to use numbers?

Software that automatically translates one spoken language into another will help bridge different cultures. Other programs understand the complexities of medical and legal terminology.

Bohrer says that over the next five years, we'll start giving commands to handheld personal organizers or cellular phones, especially since these devices will connect to the Internet.

But he also warns about the present limits to this emerging technology. So far, he says, machines have needed to know the context of what the speaker is talking about. If you are calling an automated weather or stock quote service, the software on the other end is prepared to talk weather and stocks and not much else.

"Whenever you have automatic systems, you can always fool them," Bohrer observes. "Don't expect them to behave like human beings or to use real intelligence. A system set up for travel information expects you to say 'I would like to travel on Monday from Zurich to Vienna.' If you say 'Please order me a pizza,' there will be funny things coming out."
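The limit Bohrer describes is easy to reproduce with a toy example. The sketch below assumes a system that only knows one sentence shape for travel requests. The pattern, the field names, and the function `parse_travel_request` are invented for illustration and do not come from any real product. It works on typed text rather than audio to keep things short.

```python
import re

# The only context this toy system understands: a day, an origin, a destination.
DAYS = "Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday"
TRAVEL_PATTERN = re.compile(
    rf"travel on (?P<day>{DAYS}) "
    r"from (?P<origin>[A-Z][a-z]+) to (?P<destination>[A-Z][a-z]+)"
)


def parse_travel_request(utterance):
    """Return the travel details, or None if the request is off-topic."""
    match = TRAVEL_PATTERN.search(utterance)
    if match is None:
        return None
    return match.groupdict()


print(parse_travel_request("I would like to travel on Monday from Zurich to Vienna"))
# {'day': 'Monday', 'origin': 'Zurich', 'destination': 'Vienna'}
print(parse_travel_request("Please order me a pizza"))
# None
```

The travel request fits the expected context and is understood, while the pizza order falls outside it and gets no useful answer.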

Learn More
Look at these Riverdeep Physics Explorer activities to learn more about the behavior of waves:

Related Resources
Get more information on the frontiers of speech recognition from three pioneers in the field:
