Automatic speech recognition (ASR) is one of the greatest technical challenges of modern times. It has been a focus of interest for researchers all around the world for more than half a century.
Like all speech technologies, ASR is a multidisciplinary problem whose solution requires extensive knowledge of many areas of science and engineering, such as acoustics, phonetics and linguistics, mathematics, communications, signal processing and programming. A further difficulty is that the problem is extremely language-dependent.
The aim of automatic speech recognition is to analyse recorded speech and to convert utterances (words or sentences) into the corresponding text. In this way the system "recognizes" what a speaker has said.
ASR systems can be divided into those that recognize only isolated words and those that can also recognize connected words. They can further be classified by the size of the vocabulary (the number of words that can be recognized at a time), by whether they recognize only a fixed set of words defined during training or any set of words defined at runtime (phoneme-based recognition), and by whether they depend on a particular speaker.
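The classification axes described above can be pictured as a small data structure. The following is only an illustrative sketch: the names `SpeechMode`, `VocabularyType` and `AsrProfile`, as well as the vocabulary size of 500 words, are hypothetical and do not come from any particular ASR product.

```python
from dataclasses import dataclass
from enum import Enum


class SpeechMode(Enum):
    ISOLATED_WORDS = "isolated words"
    CONNECTED_WORDS = "connected words"


class VocabularyType(Enum):
    FIXED = "fixed at training time"
    RUNTIME = "defined at runtime (phoneme-based)"


@dataclass(frozen=True)
class AsrProfile:
    """One ASR system described along the four classification axes."""
    speech_mode: SpeechMode
    vocabulary_size: int            # words that can be recognized at a time
    vocabulary_type: VocabularyType
    speaker_independent: bool

    def describe(self) -> str:
        speaker = ("speaker-independent" if self.speaker_independent
                   else "speaker-dependent")
        return (f"{self.speech_mode.value}, "
                f"{self.vocabulary_size}-word vocabulary "
                f"{self.vocabulary_type.value}, {speaker}")


# Hypothetical profile of a recognizer for a telephone service.
ivr_profile = AsrProfile(
    speech_mode=SpeechMode.CONNECTED_WORDS,
    vocabulary_size=500,            # assumed value, for illustration only
    vocabulary_type=VocabularyType.RUNTIME,
    speaker_independent=True,
)
print(ivr_profile.describe())
```

Keeping the four properties as separate fields reflects that they are independent of one another: a system can, for instance, recognize connected words and still be tied to a particular speaker.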
ASR systems have numerous applications, and the choice of application depends on the properties of the system. The most popular ASR systems are speaker-independent. Such systems are used within interactive voice response systems that provide various services to callers (access to information, initiation and control of transactions, etc.), with all the flexibility offered by speech recognition. The caller does not have to navigate through a complex menu structure using the phone keypad, but can say what he or she wants at once, which reduces call duration. In this way system efficiency (the number of users serviced) can be increased significantly.
Find out more about the AlfaNum software for speech recognition.