- Feature extraction technology
- Growing application base, particularly for speech-dependent systems
- Reasonable accuracy for well-trained systems, generally assisted by feedback on unrecognized utterances
- Flexible hands-free operation
- Ability to vary system configuration to conveniently and effectively accommodate different data gathering needs
- Operator-specific usage (speaker-dependent systems) allows a degree of secure data gathering in certain applications
Voice recognition technology converts human speech into electrical signals and transforms these signals into coding patterns with assigned meanings. Voice terminals shine as automated input devices in applications where an operator's hands and eyes are occupied, enabling source data capture in real time.
Workers typically wear a microphone/speaker headset connected to a recognition unit. The microphone converts spoken words into analog electrical signals, which the unit converts to digital patterns and then decodes or "recognizes" by template matching or feature analysis. The data output may be entered into a program or it may activate a range of computer-based equipment such as scales, programmable logic controllers, or printers.
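The template-matching step can be illustrated with a minimal sketch. It assumes each utterance has already been reduced to a short sequence of numeric feature values, and it uses dynamic time warping (one common way to compare sequences spoken at different speeds, not necessarily what any given product uses). The template values, the `reject_above` threshold, and the function names are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(utterance, templates, reject_above=5.0):
    """Return the closest template's word, or None if nothing is close enough."""
    best_word, best_dist = None, float("inf")
    for word, template in templates.items():
        dist = dtw_distance(utterance, template)
        if dist < best_dist:
            best_word, best_dist = word, dist
    return best_word if best_dist <= reject_above else None


# Hypothetical stored templates, one per trained vocabulary word.
TEMPLATES = {
    "pass": [1.0, 3.0, 4.0, 2.0],
    "fail": [4.0, 1.0, 1.0, 3.0],
}
```

With these templates, `recognize([1.0, 3.0, 3.0, 4.0, 2.0], TEMPLATES)` returns `"pass"` even though the utterance is one frame longer than the stored template. Returning `None` for a poor match is what lets a system report an unrecognized utterance instead of guessing.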
In "dialog" voice recognition systems, the unit recognizes human speech and then synthesizes a spoken response (or plays back a digitized response) to verify input and/or prompt the operator through a series of tasks.
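The verify-and-prompt cycle of a dialog system can be sketched as a small loop. This is only an illustration: `listen` and `speak` are hypothetical callables standing in for the recognizer and the speech synthesizer, and the prompt wording and `max_attempts` limit are assumptions.

```python
def confirm_entry(heard, listen, speak, max_attempts=3):
    """Echo a recognized value back and wait for a spoken 'yes' or 'no'.

    listen: hypothetical callable returning the next recognized word.
    speak:  hypothetical callable that voices a prompt to the operator.
    """
    for _ in range(max_attempts):
        speak(f"You said {heard}. Correct?")
        answer = listen()
        if answer == "yes":
            return True
        if answer == "no":
            return False
        # Anything else is treated as an unrecognized reply.
        speak("Please say yes or no.")
    return False  # give up and treat the entry as unverified
```

Passing the two callables in, rather than hard-wiring a device, makes the loop easy to exercise with scripted answers before it is attached to real hardware.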
Most voice systems are speaker-dependent, trained to recognize an individual voice that has previously read a vocabulary into the system. Speaker-trained systems recognize accents, dialects, and work-specific vocabulary, and offer the highest accuracy rates (under ideal conditions, error rates are about 1 percent). Speaker-independent systems understand words prerecorded by an average pool of speakers; the system "remembers" these words and attempts to match its limited vocabulary against words spoken by any new user.
Discrete speech processing is the most commonly used speech recognition technology. The operator speaks only one word at a time, or pauses briefly after each word of a phrase. Chances for false recognition are minimized, verification is easier, and accuracy rates are consequently high. Discrete speech is preferable when a large vocabulary is required or when there is considerable background noise in the environment.
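One reason discrete speech is easier to recognize is that the pauses mark word boundaries. A minimal sketch of that idea follows, assuming the audio has already been reduced to one energy value per frame. The `threshold` and `min_pause_frames` defaults are illustrative assumptions, not values from any particular system.

```python
def split_on_pauses(energies, threshold=0.1, min_pause_frames=3):
    """Split per-frame energies into words wherever a long enough pause occurs.

    Returns a list of (start, end) frame indices, with end exclusive.
    """
    words = []
    start = None      # first frame of the word in progress, if any
    silent_run = 0    # consecutive quiet frames seen inside that word
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_pause_frames:
                # The word ended where the quiet run began.
                words.append((start, i - silent_run + 1))
                start, silent_run = None, 0
    if start is not None:
        # Input ended mid-word; drop any trailing quiet frames.
        words.append((start, len(energies) - silent_run))
    return words
```

For example, `split_on_pauses([0, 0, 0.5, 0.6, 0, 0, 0, 0.7, 0.8, 0.9, 0, 0])` yields `[(2, 4), (7, 10)]`: two words separated by a three-frame pause. A dip shorter than `min_pause_frames` does not split a word.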
Continuous speech processing is more natural and less tedious because it allows users to speak at a normal rate. However, continuous speech systems are more susceptible to false recognition and are less tolerant of background noise than discrete processing systems. Continuous speech was once significantly more expensive, but prices have dropped dramatically in recent years. Ongoing advances in speech recognition software, as well as leaps in hardware development, have made speech recognition an up-and-coming AIDC technology in a range of industries.
The spoken word is the way most people communicate.
Voice Data Collection (also called Voice Data Entry) requires no special printed or encoded symbols, no exotic-looking equipment, nothing much more intimidating than a telephone headset. It is also the only technology that is generally trained to the way a human works rather than requiring the human to learn the machine's way of doing things.
And because speaking doesn't require the use of hands, voice entry is ideal for jobs that require the worker's hands to be free. Inspection and baggage handling are two common applications.
Types of Systems
There are two ways voice data entry systems can be differentiated: speaker-dependent or speaker-independent, and discrete or continuous recognition.
Typically, speaker-independent systems are pre-programmed to recognize a limited vocabulary, such as the digits 0 through 9. Because human speech is so varied, it is not economically feasible, at this time, to create a speaker-independent system with a very large vocabulary.
Speaker-dependent systems rely on the operator to train the system in the words it is to recognize. This training makes the system less sensitive to external noises and other voices. It also allows a non-English-speaking employee to recite the list of words in his or her native language and have those words recognized as their English equivalents.
Discrete recognition systems require the speaker to pause between words and to break numbers down into individual digits. Continuous recognition systems, on the other hand, don't require such precision in speech. People tend to run words together, as in "serial number." Continuous recognition systems can be trained to recognize such run-together phrases, as well as a number spoken as "nineteen seventy-seven oh forty-three."
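Once a continuous recognizer has produced the words of a spoken number, they still have to be turned into digits. The sketch below shows one way this might be done. It assumes that a tens word followed by a units word forms a single two-digit group (so "twenty three" is read as 23, not 203), which is an illustrative convention, not a rule from any particular product.

```python
UNITS = {"oh": 0, "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
         "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9}
TEENS = {"ten": 10, "eleven": 11, "twelve": 12, "thirteen": 13,
         "fourteen": 14, "fifteen": 15, "sixteen": 16, "seventeen": 17,
         "eighteen": 18, "nineteen": 19}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}


def spoken_to_digits(phrase):
    """Convert recognized number words into a digit string."""
    tokens = phrase.lower().replace("-", " ").split()
    digits = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token in TENS:
            value = TENS[token]
            following = tokens[i + 1] if i + 1 < len(tokens) else None
            # Assumption: "seventy seven" is one group, 77.
            if following in UNITS and UNITS[following] != 0:
                value += UNITS[following]
                i += 1
            digits.append(str(value))
        elif token in TEENS:
            digits.append(str(TEENS[token]))
        elif token in UNITS:
            digits.append(str(UNITS[token]))
        else:
            raise ValueError(f"not a number word: {token!r}")
        i += 1
    return "".join(digits)
```

With this convention, `spoken_to_digits("nineteen seventy-seven oh forty-three")` returns `"1977043"`, while a discrete system would need the operator to say the seven digits one at a time.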
To provide greater system flexibility without placing undue burdens on acceptable vocabulary, some systems use grammars to allow different meanings for the same word or to help differentiate between similar-sounding words.
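A grammar of this kind can be pictured as a table of which words are legal at each point in a task. In the sketch below, the state names, word lists, and scores are all hypothetical, and the recognizer is assumed to return a list of candidate words with match scores (higher is better).

```python
# Hypothetical grammar: words the system will accept in each dialog state.
GRAMMAR = {
    "expect_command": {"count", "move", "done"},
    "expect_quantity": {"one", "two", "three", "four"},
    "expect_direction": {"to", "from"},
}


def pick_word(candidates, state, grammar=GRAMMAR):
    """Return the best-scoring candidate the grammar allows in this state.

    candidates: list of (word, score) pairs from the recognizer.
    Returns None when no candidate is legal in the given state.
    """
    allowed = grammar[state]
    legal = [(score, word) for word, score in candidates if word in allowed]
    if not legal:
        return None
    return max(legal)[1]
```

Given the candidates `[("to", 0.52), ("two", 0.48)]`, `pick_word` returns `"two"` in the `"expect_quantity"` state and `"to"` in the `"expect_direction"` state. The acoustic scores barely separate the two words, and the grammar settles the choice.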
Portable systems, sometimes including bar code or other ADC technologies, take Voice Data Collection onto the factory floor, into storage yards, or to other locations where a hardwired system just can't go.
Many systems offer Radio Frequency Data Communications interfaces to provide real-time entry and interactive prompts from the host.
As processing speed and memory capacity inevitably increase, voice recognition's ability to transfer the spoken word into electronically transmittable data will likewise grow in breadth and accuracy, enabling the technology to better capture the easiest and most intuitive form of input there is, the human voice.
Mobility is key to many industrial voice applications, and the combination of voice recognition and RFDC is enabling rapid growth of systems that maximize productivity where wired, optical-based systems simply cannot operate effectively. With decreasing technology costs and increasing accuracy rates, voice recognition is poised for widespread data capture use, from the desktop to the factory floor.