It's How You Say It That Counts

Designing good speech recognition software is no easy task. Nuance Communications should know. The company has succeeded in building the speech recognition behind a product that amazes most users: Apple’s Siri. But this product, like the human speech it’s programmed to recognize, is imperfect. Most iPhone owners can share stories about Siri’s often humorous misunderstandings of their voices. In spite of Siri’s imperfections, Nuance Communications has helped create a product that demonstrates the remarkable potential of speech recognition technology and the progress this technology has made in a relatively short time.

The next step in the evolution of this technology will involve greater recognition of words and phrases spoken in tonal languages. In a tonal language such as Mandarin Chinese, the speech recognition software has far more sounds to tell apart, because the same syllable can have several different meanings depending on the speaker’s pitch. The words for mother, scold, and horse, for example, all sound like “ma,” but each is spoken with a different tone. Developing speech recognition software that can understand the sentence “Mother scolds the horse” in Mandarin is a formidable challenge.
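The “ma” example can be made concrete with a minimal sketch. It assumes a toy lexicon keyed by syllable and tone number, using the common pinyin numbering (1 = high level, 3 = falling-rising, 4 = falling). The names LEXICON, gloss, and strip_tones are hypothetical and chosen only for illustration; this is not a description of how Nuance’s or Apple’s software works.

```python
# Hypothetical toy lexicon: (syllable, tone number) -> English gloss.
# Tone numbers follow the common pinyin convention:
# 1 = high level, 3 = falling-rising, 4 = falling.
LEXICON = {
    ("ma", 1): "mother",
    ("ma", 3): "horse",
    ("ma", 4): "scold",
}


def gloss(syllables):
    """Look up each (syllable, tone) pair; unknown pairs map to '?'."""
    return [LEXICON.get(pair, "?") for pair in syllables]


def strip_tones(syllables):
    """Discard the tone, as a recognizer that ignores pitch would."""
    return [syllable for syllable, _tone in syllables]


# "Mother scolds the horse", one (syllable, tone) pair per word.
sentence = [("ma", 1), ("ma", 4), ("ma", 3)]

print(gloss(sentence))        # ['mother', 'scold', 'horse']
print(strip_tones(sentence))  # ['ma', 'ma', 'ma']
```

With the tone kept, each syllable resolves to a distinct word. With the tone stripped, all three collapse to the same “ma,” which is why a recognizer for a tonal language has to track pitch as well as the consonants and vowels.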

By some estimates, most of the world’s languages are tonal. English, by comparison, is far less challenging with respect to the number of sounds that need to be identified and categorized: we have very few words that mean something different when the speaker’s tone changes. True, sarcasm and irony can warp the meaning of words, but, as iPhone users can attest, it may be a while before speech recognition technology becomes sensitive enough to recognize a joke. In the meantime, as the global proliferation of digital technology continues, companies like Nuance Communications have their work cut out for them.