When we use voice recognition software, it captures our speech as a series of short audio snapshots. It then searches a large word database for the closest match to each one. Because dialects, pronunciation and accents vary so widely, the software often picks the wrong match. To improve, it learns and adapts to the speaker's accent and tone of voice, building a user-specific voice profile that it keeps updating. A common analogy is a sculptor chiseling away at a stone until it slowly starts to resemble the human form. That is how most voice recognition software works. With these algorithms searching huge databases in a split second, it is little wonder that voice recognition software requires a lot of CPU power.
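The closest-match search described above can be illustrated with a small sketch. It assumes the audio has already been turned into a sequence of phoneme symbols and compares that sequence against a tiny, made-up word database using edit distance. The `LEXICON` contents, the phoneme symbols, and the function names are hypothetical and chosen for illustration only. Real recognizers score matches with statistical acoustic models, not a plain edit distance.

```python
# Hypothetical toy word database: word -> phoneme sequence.
# The symbols are illustrative, not taken from any real product.
LEXICON = {
    "cat": ["K", "AE", "T"],
    "cut": ["K", "AH", "T"],
    "cap": ["K", "AE", "P"],
    "bat": ["B", "AE", "T"],
}


def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))  # distances from an empty prefix of a
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # drop a phoneme from a
                            curr[j - 1] + 1,      # insert a phoneme into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def closest_word(heard, lexicon=LEXICON):
    """Return the lexicon word whose phonemes are nearest to what was heard."""
    return min(lexicon, key=lambda word: edit_distance(heard, lexicon[word]))


# A slightly mispronounced "cut" (extra trailing sound) still maps to "cut".
print(closest_word(["K", "AH", "T", "S"]))
```

A speaker with a strong accent would produce phoneme sequences that sit further from the stored entries, which is one way to picture why wrong matches happen and why a per-user profile helps.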
The voice recognition process in simple steps:
- Spoken words enter a microphone.
- Audio is processed by the computer's soundcard.
- The software then distinguishes lower-frequency vowels from higher-frequency consonants and compares the results with phonemes (the smallest building blocks of speech). It then compares the results to groups of phonemes, and then to actual words, to determine the most likely match.
- Contextual information is processed at the same time to predict more accurately which words are likely to come next.
- Selected words are then arranged in the most probable sentence combinations.
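The last two steps, using context and arranging words into the most probable sentence, can be sketched as follows. This is a minimal illustration under stated assumptions: each spoken slot comes with a few candidate words and a made-up acoustic score, and context is modeled by a small table of word-pair (bigram) probabilities. All numbers, the `CANDIDATES` and `BIGRAMS` tables, and the function names are hypothetical. The sketch tries every combination, which only works for tiny inputs; practical systems use more efficient search methods.

```python
from itertools import product

# Hypothetical acoustic scores: for each spoken slot, the candidate words
# and how well each one matched the audio.
CANDIDATES = [
    {"there": 0.5, "their": 0.5},
    {"is": 0.9, "his": 0.1},
    {"no": 0.5, "know": 0.5},
    {"time": 1.0},
]

# Hypothetical context model: probability of a word given the previous word.
# "<s>" marks the start of the sentence.
BIGRAMS = {
    ("<s>", "there"): 0.6,
    ("<s>", "their"): 0.4,
    ("there", "is"): 0.5,
    ("their", "is"): 0.01,
    ("is", "no"): 0.3,
    ("is", "know"): 0.001,
    ("no", "time"): 0.2,
    ("know", "time"): 0.01,
}
UNSEEN = 1e-4  # small probability assumed for word pairs not in the table


def sentence_score(words, candidates, bigrams=BIGRAMS):
    """Multiply acoustic and context scores along one candidate sentence."""
    score = 1.0
    previous = "<s>"
    for slot, word in zip(candidates, words):
        score *= slot[word] * bigrams.get((previous, word), UNSEEN)
        previous = word
    return score


def best_sentence(candidates, bigrams=BIGRAMS):
    """Try every word combination and keep the highest-scoring one."""
    best = max(product(*candidates),
               key=lambda words: sentence_score(words, candidates, bigrams))
    return list(best)


print(" ".join(best_sentence(CANDIDATES)))
```

Here "there" and "their", or "no" and "know", sound the same, so the acoustic scores alone cannot separate them. The context table is what tips the choice toward the sentence that reads naturally.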