Speech Recognition

Speech recognition is one of the main elements of natural language processing, or computer speech technology. Speech recognition is the equivalent of taking dictation: converting speech into comprehensible data. This is a skill that humans perform seemingly without effort, but one that requires formidable processing and algorithmic resources from computers.
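
As a rough illustration of what "converting speech into comprehensible data" can look like in practice, the sketch below uses the third-party Python package SpeechRecognition (an assumption made for illustration, not part of this article's sources) to turn a hypothetical recording, greeting.wav, into plain text:

 import speech_recognition as sr   # third-party package "SpeechRecognition", assumed installed

 recognizer = sr.Recognizer()

 # Read a short recording; "greeting.wav" is a hypothetical example file.
 with sr.AudioFile("greeting.wav") as source:
     audio = recognizer.record(source)   # capture the whole file as audio data

 # Hand the audio to a recognition engine and get ordinary text back.
 try:
     print("Transcript:", recognizer.recognize_google(audio))
 except sr.UnknownValueError:
     print("The audio could not be understood.")

All of the difficulty is hidden inside the recognition call; the next section surveys why that single step is so hard.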


History of Speech Recognition

Writing systems are ancient, going back as far as the Sumerians of 6,000 years ago. The phonograph, which allowed the analog recording and playback of speech, dates to 1877. Speech recognition had to await the development of the computer, however, because of the multifarious problems that recognizing speech presents.

First, speech is not simply spoken text--just as Miles Davis playing So What can hardly be captured by a note-for-note rendition in sheet music. What humans understand as discrete words with clear boundaries is actually delivered as a continuous stream of sounds: Iwenttothestoreyesterday, rather than I went to the store yesterday. Words can also blend, with Whaddayawa? representing What do you want?
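
To make the missing word boundaries concrete, the following sketch (plain Python with a tiny made-up vocabulary, purely illustrative) lists every way a boundary-free string of letters can be split into dictionary words:

 # Tiny, made-up vocabulary; a real recognizer works with tens of thousands of words.
 VOCAB = {"i", "went", "to", "the", "store", "yesterday", "tot", "he"}

 def segmentations(text, start=0):
     """Return every way text[start:] can be split into vocabulary words."""
     if start == len(text):
         return [[]]          # the empty remainder has exactly one segmentation: no words
     results = []
     for end in range(start + 1, len(text) + 1):
         word = text[start:end]
         if word in VOCAB:    # keep this word, then segment whatever is left
             for rest in segmentations(text, end):
                 results.append([word] + rest)
     return results

 for reading in segmentations("iwenttothestore"):
     print(" ".join(reading))

Even this toy vocabulary produces two readings, i went to the store and i went tot he store; a real recognizer has to weigh acoustic evidence and language statistics to prefer the sensible one.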

Second, there is no one-to-one correlation between sounds and letters. In English there are only five or six vowel letters--a, e, i, o, u, and sometimes y--yet more than twenty different vowel sounds, and the exact count can vary. The reverse problem also occurs, where more than one letter can represent a given sound: the letter c can have the same sound as the letter k or as the letter s.
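
A toy lookup table (illustrative only; the sound labels are informal, not a standard phonetic alphabet) shows how the mapping fans out in both directions:

 # Informal illustration: one letter can stand for several sounds...
 letter_to_sounds = {
     "c": ["k", "s"],           # cat vs. cent
     "a": ["a", "ay", "uh"],    # cat, cake, about
     "g": ["g", "j"],           # go vs. gem
 }

 # ...and one sound can be written with several different letters or letter groups.
 sound_to_letters = {
     "k": ["c", "k", "ck", "ch"],   # cat, kit, back, chorus
     "s": ["s", "c", "ss"],         # sit, cent, hiss
 }

 for letter, sounds in letter_to_sounds.items():
     print(letter, "can sound like", " or ".join(sounds))

Because neither direction of the mapping is a simple function, a recognizer cannot just translate sounds into letters one at a time.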

In addition, people who speak the same language do not make the same sounds. There are different dialects--the word 'water' could be pronounced watter, wadder, woader, wattah, and so on. Each person also has a distinctive pitch: men typically have the lowest pitch, while women and children have a higher one (though there is wide variation and overlap within each group). Pronunciation is further colored by adjacent sounds, by the speed at which the speaker is talking, and even by the speaker's health. Consider how pronunciation changes when a person has a cold.

Lastly, consider that not all sounds are meaningful speech. Regular speech is filled with interjections that carry no meaning: Oh, like, you know, well. There are also sounds that are part of speaking but are not considered words: er, um, uh. Coughing, sneezing, laughing, sobbing, even hiccupping can be part of what is spoken. And the environment adds its own noises; speech recognition is difficult even for humans in noisy places.

Despite the manifold difficulties, speech recognition has been attempted for almost as long as there have been digital computers. As early as 1952, researchers at Bell Labs had developed an Automatic Digit Recognizer, or "Audrey". Audrey attained an accuracy of 97 to 99 percent if the speaker was male, and if the speaker paused 350 milliseconds between words, and if the speaker limited his vocabulary to the digits from one to nine, plus "oh", and if the machine could be adjusted to the speaker's speech profile. Results dipped as low as 60 percent if the recognizer was not adjusted.[1]

Speech Recognition Today

Technology

Business

Major Speech Technology Companies

NICE Systems (NASDAQ: NICE and Tel Aviv: Nice), headquartered in Israel and founded in 1986, specializes in digital recording and archiving technologies. In 2007 the company had revenue of $523 million. For more information visit http://www.nice.com.

Verint Systems Inc. (OTC: VRNT), headquartered in Melville, New York, and founded in 1994, describes itself as “A leading provider of actionable intelligence solutions for workforce optimization, IP video, communications interception, and public safety.”[2] For more information visit http://verint.com.

Nuance (NASDAQ: NUAN), headquartered in Burlington, Massachusetts, develops speech and image technologies for business and customer service uses. For more information visit http://www.nuance.com/.

Vlingo, headquartered in Cambridge, Massachusetts, develops speech recognition technology that interfaces with wireless and mobile devices. Vlingo has recently teamed up with Yahoo!, providing the speech recognition technology for Yahoo!’s mobile search service, oneSearch. For more information visit http://vlingo.com.

Patent Infringement Lawsuits

Speech Solutions

The Future of Speech Recognition

Emerging Technologies

Future Trends

Notes

  1. K.H. Davis, R. Biddulph, and S. Balashek, "Automatic recognition of spoken digits," Journal of the Acoustical Society of America 24 (1952): 637-642.
  2. see "About Verint"