Aculab Prosody Speech Recognition

Developers seeking to incorporate automatic speech recognition (ASR) and bring interactive dialogue applications to market have an ideal solution in Aculab’s phoneme-based speech recogniser. It recognises naturally spoken words or phrases from a number of alternatives specified in a recognition grammar. Aculab’s host-based ASR software offers developers excellent value through a cost-free licence.

The use of phonetic models allows customers to define and extend the recogniser’s vocabulary simply by adding entries from a pronunciation dictionary.

Scalable systems can be readily achieved with one or more ASR servers receiving recogniser channel feeds from one or more client systems. Only the client systems require Prosody media processing resource cards. In addition, ASR may be combined with other Prosody card algorithms, giving integrators greater flexibility and choice. This reflects Aculab’s commitment to offering value and to continually extending the features available to applications using Prosody. These features are incorporated within Aculab’s consistent API and, together with the wide-ranging country type approvals and protocol support available from Aculab, help to reduce time to market.

Enhanced recognition
Aculab’s ASR provides high recognition speed and accuracy by integrating a combination of enhanced whole-word, monophone and triphone model sets within the recognition process. This allows different parts of an utterance to be modelled in different ways while still producing a single recognition result: the most probable sequence of spoken words. Recognition accuracy is 97% or greater for connected word strings. Alternative ‘next most probable’ results are also offered, each with a confidence value, to allow the application to handle ambiguities.
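As an illustration of how an application might use alternative results and their confidence values, the following is a minimal Python sketch. The `Hypothesis` structure, the 0.0 to 1.0 confidence scale and the threshold values are assumptions made for this example; they are not taken from Aculab’s API.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """One recognition alternative (hypothetical structure, not Aculab's API)."""
    words: str
    confidence: float  # assumed to lie in the range 0.0 to 1.0


def choose_action(hypotheses, accept=0.80, margin=0.15):
    """Decide how a dialogue might react to a list of alternatives.

    Returns ("accept", words), ("confirm", words) or ("reprompt", None).
    The thresholds are illustrative and would need tuning per application.
    """
    if not hypotheses:
        return ("reprompt", None)
    ranked = sorted(hypotheses, key=lambda h: h.confidence, reverse=True)
    best = ranked[0]
    runner_up = ranked[1].confidence if len(ranked) > 1 else 0.0
    # Accept outright only when the best result is strong and clearly ahead.
    if best.confidence >= accept and best.confidence - runner_up >= margin:
        return ("accept", best.words)
    # A plausible but ambiguous result is read back to the caller.
    if best.confidence >= accept / 2:
        return ("confirm", best.words)
    return ("reprompt", None)
```

With this approach a close second alternative leads to a confirmation prompt instead of a silent misrecognition.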

Speaker independence
Extensive databases of speech for all supported languages have been analysed in order to accommodate variations due to age, gender and accent. Proprietary signal analysis algorithms make the recogniser resilient to differences in handset and line characteristics.

Say it your own way
Where less common dialects cause the pronunciation of particular words to differ from the norm, developers can extend the recogniser’s vocabulary by creating additional dictionary entries. An easy-to-use, Windows-based lexicon management tool, ASRLexMan, is provided to enable developers to edit or create entries. Multiple entries can be used to describe a range of pronunciations, providing accurate recognition over a range of accents.
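The idea of several dictionary entries for one word can be pictured with a small Python sketch. It does not reflect the ASRLexMan file format or Aculab’s phoneme notation; the `lexicon` structure, the `add_pronunciation` helper and the phoneme symbols are illustrative assumptions.

```python
from collections import defaultdict

# word -> list of pronunciations, each stored as a tuple of phoneme symbols
lexicon = defaultdict(list)


def add_pronunciation(word, phonemes):
    """Add one pronunciation for a word, ignoring exact duplicates."""
    entry = tuple(phonemes.split())
    key = word.lower()
    if entry not in lexicon[key]:
        lexicon[key].append(entry)


# Illustrative symbols only; the real notation may differ.
add_pronunciation("tomato", "T AH M EY T OW")  # one accent
add_pronunciation("tomato", "T AH M AA T OW")  # a second accent
add_pronunciation("Tomato", "T AH M AA T OW")  # duplicate, not stored again
```

Each word then carries every pronunciation the recogniser should accept for it.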

Runtime grammar processing
The grammar defining the words to be recognised and their order is specified using Aculab Speech Grammar Format (ASGF) notation, a subset of Java Speech Grammar Format (JSGF). The grammar can be specified in advance, but where it will change as a result of the preceding dialogue or other external factors, Aculab’s ASR also allows grammars to be defined dynamically at run time. Grammars, or syntax networks, can be readily created and edited using the supplied Windows-based network management tool, ASRNetMan.
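A grammar built at run time might look something like the following Python sketch, which assembles grammar text in standard JSGF syntax from a list of phrases. Whether ASGF accepts exactly this form, including the `#JSGF V1.0;` header, is an assumption, and `build_grammar` is a hypothetical helper, not part of Aculab’s API.

```python
def build_grammar(name, rule, phrases):
    """Return JSGF-style grammar text offering each phrase as an alternative."""
    # Normalise whitespace and drop empty phrases.
    cleaned = [" ".join(p.split()) for p in phrases if p.strip()]
    if not cleaned:
        raise ValueError("a grammar needs at least one phrase")
    alternatives = " | ".join(cleaned)
    return (
        "#JSGF V1.0;\n"
        f"grammar {name};\n"
        f"public <{rule}> = {alternatives};\n"
    )


# Example: the choices depend on an earlier step in the dialogue.
accounts = ["current account", "savings account", "credit card"]
grammar_text = build_grammar("banking", "account", accounts)
```

Here `grammar_text` holds a three-line grammar whose public rule lists the three account phrases as alternatives.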

Scalable architecture
With high-density Prosody cards providing up to 64 channel feeds per DSP, recognition channel capacity is limited only by the processing power of the ASR server. Adding servers expands the channel capacity. Flexibility is built into the architecture, allowing single or multiple server arrangements to fully exploit the capacity of the Prosody DSPs. For more details, see ‘Performance’.

Language
A single product build caters for a number of languages, including British and American English, French, German, Italian and North American Spanish. The language can be selected for each recognition, although languages cannot be mixed within a single recognition.

Supported operating systems are Windows XP/2000, Linux and Sun SPARC Solaris.

Hardware choices
The host-based ASR application can take its audio feed, in any format, from any speech firmware build on Aculab Prosody cards. This allows ASR to be run in parallel with the range of DSP-based speech processing algorithms, including record, playback, echo cancellation and DTMF detection, providing a versatile mixture of speech technologies on the same platform.

Echo cancellation can be provided by the Prosody firmware and is essential for achieving a natural interaction with the user. This is applied to enable barge-in, making the system extremely responsive for experienced callers who are used to anticipating the outgoing prompts.

Text-to-speech and speaker verification too
Developers can enhance their IVR and contact centre applications by combining ASR with text-to-speech (TTS) or even speaker verification and identification (SVI). Aculab’s host-based TTS converts electronically formatted text into intelligible, human-sounding synthetic speech output. Aculab’s SVI software provides a cost-effective, convenient and secure way to access personal information over the telephone. Both are available under a cost-free licence with Prosody, which provides the audio replay channels.

Performance
A single 1.9GHz CPU acting as client, server and controller can typically support 120 concurrent channels of real user speech recognition.
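For rough sizing, the figures quoted here can be turned into a short calculation. The Python sketch below assumes that capacity scales linearly with the number of CPUs and that every DSP delivers its full 64 feeds; neither assumption is stated in the text, so treat the results as first estimates only. `estimate_resources` is a hypothetical helper.

```python
import math

CHANNELS_PER_CPU = 120  # figure quoted above for a single 1.9GHz CPU
FEEDS_PER_DSP = 64      # upper limit quoted under 'Scalable architecture'


def estimate_resources(channels):
    """Return (server CPUs, Prosody DSPs) for a target channel count."""
    if channels < 0:
        raise ValueError("channel count cannot be negative")
    # Round up: a partly used CPU or DSP still has to be provisioned.
    cpus = math.ceil(channels / CHANNELS_PER_CPU)
    dsps = math.ceil(channels / FEEDS_PER_DSP)
    return cpus, dsps
```

For example, `estimate_resources(480)` gives 4 CPUs and 8 DSPs under these assumptions.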

Tel: 603-524-2214

Got a Question? Need more info?

E-Mail info@mcct.com