to incorporate ASR and bring interactive
dialogue applications to market, have an ideal solution in Aculab’s
phoneme-based speech recogniser. It recognises naturally spoken words
or phrases from a number of alternatives specified in a recognition grammar.
Aculab’s host-based ASR software offers developers excellent value
through a cost-free licence.
The use of phonetic models allows customers to define and extend the
recogniser’s vocabulary simply by adding entries from a pronunciation
dictionary.
Scalable systems can readily be built with one or more ASR servers
receiving recogniser channel feeds from one or more client systems. Only
the client systems require Prosody media processing resource cards. In
addition, ASR may be combined with other Prosody card algorithms to bring
increased flexibility and choice to integrators. This confirms Aculab’s
commitment to offering unmatched value and to continually extending the
features available to applications using Prosody. These features are
incorporated within Aculab’s consistent API and, together with the
wide-ranging country type approval and protocol support available from
Aculab, will reduce time to market.
Aculab’s ASR provides high recognition speed and accuracy, seamlessly
integrating enhanced whole-word and monophone model sets, and triphone
models, within the recognition process. This allows different parts of an
utterance to be modelled in different ways, yet still produces a single
recognition result: the most probable sequence of spoken words. Recognition
accuracy is 97% or greater for connected word strings. Alternative ‘next
most probable’ results are also returned, together with a confidence
value, to allow the application to handle ambiguities.
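The handling of ‘next most probable’ results can be sketched as application-side logic. This is a minimal sketch, assuming the recogniser hands back a list of hypotheses ordered most probable first, each carrying a confidence between 0 and 1; the `Hypothesis` type, the `choose_action` function and the threshold values are hypothetical and are not part of Aculab’s API.

```python
from dataclasses import dataclass


# Hypothetical result type: field names are illustrative, not Aculab's API.
@dataclass
class Hypothesis:
    words: str
    confidence: float  # assumed to lie in [0, 1]; higher means more confident


def choose_action(nbest, accept=0.80, reject=0.40, margin=0.15):
    """Decide what to do with an N-best list ordered most probable first.

    Returns ("accept", [words]), ("confirm", [alternatives]) or ("reprompt", []).
    The thresholds are illustrative and would be tuned per application.
    """
    # Nothing usable recognised: ask the caller again.
    if not nbest or nbest[0].confidence < reject:
        return ("reprompt", [])
    best = nbest[0]
    runner_up = nbest[1].confidence if len(nbest) > 1 else 0.0
    # Confident and clearly ahead of the next most probable result.
    if best.confidence >= accept and best.confidence - runner_up >= margin:
        return ("accept", [best.words])
    # Ambiguous: offer the top alternatives back to the caller to confirm.
    return ("confirm", [h.words for h in nbest[:2]])
```

For example, `[Hypothesis("Boston", 0.78), Hypothesis("Austin", 0.74)]` falls below the acceptance threshold, so the dialogue would ask the caller to choose between the two cities rather than guess.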
Extensive databases of speech for all supported languages have been analysed
to accommodate variations due to age, gender and accent. Proprietary
signal analysis algorithms make the recogniser resilient to differences
in handset and line characteristics.
Where less common dialects cause the pronunciation of particular words to
differ from the norm, developers can extend the recogniser’s
vocabulary by creating additional dictionary entries. An easy-to-use,
Windows-based lexicon management tool, ASRLexMan, is provided to let
developers edit or create entries. Multiple entries can be used
to describe a range of pronunciations, providing accurate recognition
across a range of accents.
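The idea of several dictionary entries per word can be pictured as a simple data structure. This is a minimal sketch only: ASRLexMan’s file format and Aculab’s phoneme set are not described here, so the ARPAbet-style symbols and the `add_pronunciation` and `pronunciations` helpers are hypothetical.

```python
from collections import defaultdict

# Hypothetical in-memory lexicon: each word maps to one or more pronunciations,
# each stored as a tuple of phoneme symbols (ARPAbet-style, illustrative only).
lexicon = defaultdict(list)


def add_pronunciation(word, phonemes):
    """Add a pronunciation variant for a word, ignoring exact duplicates."""
    key = word.lower()
    variant = tuple(phonemes.split())
    if variant not in lexicon[key]:
        lexicon[key].append(variant)


def pronunciations(word):
    """Return every known pronunciation of a word as space-separated strings."""
    return [" ".join(p) for p in lexicon.get(word.lower(), [])]


# American and British variants of the same word, plus a duplicate that is ignored.
add_pronunciation("tomato", "T AH M EY T OW")
add_pronunciation("tomato", "T AH M AA T OW")
add_pronunciation("Tomato", "T AH M AA T OW")
```

With both variants stored, a recogniser built from this lexicon could match either accent against the single word ‘tomato’.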
The grammar defining the words to be recognised, and their order, is
specified using Aculab Speech Grammar Format (ASGF) notation,
a subset of the Java Speech Grammar Format (JSGF). The grammar can be
specified in advance; where it must change as a result of preceding
dialogue or other external factors, Aculab’s ASR also allows grammars
to be defined dynamically at run time. Grammars, or syntax networks,
can be readily created and edited using ASRNetMan, the supplied
Windows-based network management tool.
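A run-time grammar might be generated from data gathered earlier in the dialogue, such as a caller’s contact list. This is a minimal sketch in standard JSGF notation using only a header, a public rule, grouping and alternatives; it assumes, without confirmation, that ASGF accepts these constructs in this form, and the `build_jsgf` helper is hypothetical. How the resulting text is handed to the recogniser is not covered here.

```python
import re


def build_jsgf(grammar_name, rule_name, prefix, alternatives):
    """Return a JSGF grammar whose public rule is `prefix` followed by one alternative.

    Hypothetical helper: alternatives are lower-cased, stripped of anything other
    than letters, digits and spaces, and de-duplicated in their original order.
    """
    cleaned = []
    for alt in alternatives:
        tokens = re.sub(r"[^A-Za-z0-9 ]+", " ", alt).lower().split()
        if tokens:
            cleaned.append(" ".join(tokens))
    if not cleaned:
        raise ValueError("at least one alternative is required")
    body = " | ".join(dict.fromkeys(cleaned))  # drop duplicates, keep order
    return (
        "#JSGF V1.0;\n"
        f"grammar {grammar_name};\n"
        f"public <{rule_name}> = {prefix} ( {body} );\n"
    )


# Built at run time, e.g. after the caller's contact list has been looked up.
grammar_text = build_jsgf("contacts", "contact", "call", ["Alice", "Bob Smith", "alice"])
```

Here `grammar_text` accepts ‘call alice’ or ‘call bob smith’; regenerating it whenever the contact list changes is what makes the grammar dynamic.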
With high-density Prosody cards providing up to 64 channel feeds
per DSP, recognition channel capacity is limited only by the
processing power of the ASR server, and adding servers expands
that capacity. Flexibility is built into the architecture, allowing
single- or multiple-server arrangements to fully exploit the
capacity of the Prosody DSPs. For more details, see ‘Performance’.
A single product build caters for a number of languages,
including British and American English, French, German, Italian
and North American Spanish. The language can be chosen on a
per-recognition basis, although languages cannot be mixed
within a single recognition.
Supported operating systems are Windows XP/2000, Linux
and Sun SPARC Solaris.
The host-based ASR application can take its audio feed from any
speech firmware build on Aculab Prosody cards, in any format.
This allows ASR to run in parallel with the range of DSP-based
speech processing algorithms, including record, playback, echo
cancellation and DTMF detection, providing a versatile mix
of speech technologies on the same platform.
Echo cancellation can be provided by the Prosody firmware and
is essential for a natural interaction with the user. It enables
barge-in, letting experienced callers who anticipate the outgoing
prompts speak over them, which makes the system highly responsive.
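Why echo cancellation matters for barge-in can be seen in a generic speech-onset detector: without it, the echo of the outgoing prompt would itself look like caller speech and stop the prompt. This is a minimal sketch of an energy-threshold detector run on already echo-cancelled input; it is not Prosody’s barge-in mechanism, and the `barge_in_frame` function, frame energies and thresholds are hypothetical.

```python
def barge_in_frame(frame_energies_db, threshold_db=-35.0, min_frames=3):
    """Return the index of the frame at which barge-in is declared, or None.

    Assumes each value is the energy (in dB) of one frame of echo-cancelled
    input. Barge-in is declared once `min_frames` consecutive frames exceed
    `threshold_db`; requiring several frames stops a single click or a
    residual echo spike from cutting the prompt short.
    """
    run = 0
    for i, energy in enumerate(frame_energies_db):
        run = run + 1 if energy > threshold_db else 0
        if run >= min_frames:
            return i  # the application would stop prompt playback here
    return None


# A lone spike at frame 2 is ignored; sustained speech from frame 4 triggers at frame 6.
energies = [-60, -58, -30, -61, -28, -27, -25, -26]
```

Here `barge_in_frame(energies)` returns 6, at which point the application would halt the prompt and pass the caller’s speech to the recogniser.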
Developers can enhance their IVR and contact centre applications
by combining ASR with text-to-speech (TTS), or even with speaker
verification and identification (SVI). Aculab’s host-based TTS converts
electronically formatted text into intelligible, human-sounding
synthetic speech. Aculab’s SVI software provides a cost-effective,
convenient and secure way to access personal information over the
telephone. Both are available under a cost-free licence for use with
Prosody, which provides the audio replay channels.
A single 1.9 GHz CPU acting as client, server and controller can
typically support 120 concurrent recognition channels of real user
speech.
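The figures above support a rough sizing calculation. This is a minimal sketch that treats 120 channels as the capacity of each ASR server and 64 feeds as the capacity of each Prosody DSP; the 120 figure was quoted for one CPU acting as client, server and controller, so applying it to dedicated servers is an assumption, and real capacity will also depend on grammar size and CPU. The `plan_capacity` and `assign_channels` helpers are hypothetical.

```python
import math


def plan_capacity(channels, channels_per_server=120, feeds_per_dsp=64):
    """Estimate how many ASR servers and Prosody DSPs a channel count needs.

    Defaults come from the quoted figures; treating 120 as a per-server
    capacity is an assumption, not a published rating.
    """
    if channels < 0:
        raise ValueError("channels must be non-negative")
    servers = math.ceil(channels / channels_per_server)
    dsps = math.ceil(channels / feeds_per_dsp)
    return servers, dsps


def assign_channels(channels, servers):
    """Spread channel feeds round-robin so server loads differ by at most one."""
    loads = {s: [] for s in range(servers)}
    for ch in range(channels):
        loads[ch % servers].append(ch)
    return loads


servers, dsps = plan_capacity(300)           # 300 / 120 and 300 / 64, rounded up
loads = assign_channels(300, servers)
```

For 300 channels this gives 3 servers and 5 DSPs, with 100 channel feeds per server. A real deployment would leave headroom below the estimated per-server figure.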