is designed for use in telephony
applications, setting a standard for accuracy and channel count. Our
host-based TTS is a fully scalable, fast and accurate concatenative
speech synthesis system. It has a small memory footprint and, through
its client/server architecture, offers an exceptionally high channel count.
Aculab TTS works in conjunction with Prosody product options, which
are used for audio replay and the simultaneous use of other media processing
resources.
Developers can therefore enhance the performance of their
telephony applications by combining TTS with automatic speech recognition
(ASR) and separate media processing algorithms through the generic Aculab
API. When used with Prosody media processing resources, Aculab TTS is
available under a cost-free licence from software downloads.
Aculab TTS includes an email pre-processor and a lexicon customisation
tool. A single product build provides multiple language support.
See the features summary for details of the languages available and the
software downloads for details of the operating systems supported.
With the recent improvements in speech and language technology, innovative
and sophisticated solutions for a range of markets can be produced
using Aculab TTS and Prosody. For example, electronically formatted
text messages can be retrieved and heard by telephoning an integrated
messaging service. This type of service is ideal for people on the
move who need immediate access to their email messages whilst out of
the office. When using TTS in a contact centre environment, information
can be read out to callers, allowing an organisation to offer improved
customer service without increasing staff numbers.
TTS has been specifically designed for use in telephony applications
that require spoken output and where the information to be read out is
either frequently updated or too extensive to record.
It is unique in that its development has taken into account all of the
additional constraints that the telephone networks impose, for example,
narrow bandwidth, noisy listening conditions and high channel requirements.
As a result, Aculab TTS produces extremely consistent and intelligible
telephone speech, even for prolonged passages of synthesised text.
Aculab TTS is used with Prosody cards in PCI or cPCI card formats that
provide audio via an E1 or T1 network line interface. It can also be
used with Prosody S, as the TTS software is compatible with any media
processing firmware that supports 64kbit/s replay. Replay (or playback)
may be combined with other media processing algorithms, such as echo
cancellation and recording, to enable barge-in for applications using
ASR.
This means that developers benefit from the flexible combination of
speech technologies and Prosody to enable the construction of resilient,
high performance, scalable systems.
In TTS applications where text is to be read out over telephone lines, there
are practical considerations that dictate the channel count. These include
digital network access and media processing
resources as well as host capabilities and loading. For example, on a 1GHz
Pentium III PC host that also acts as the client machine, using a Prosody
card with 2 DSPs and a PM4 module for digital network access,
Aculab TTS can speak texts simultaneously to 120 callers. This means that
developers can confidently deploy extremely high density speech technology
solutions for competitive advantage.
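As a rough illustration of the figures above, the sketch below works out the audio load implied by a given channel count. It assumes 8-bit samples at an 8kHz sampling rate (which yields a 64kbit/s stream per channel) and the usual 30 bearer channels per E1 trunk or 24 per T1 trunk. The function and constant names are illustrative only and are not part of any Aculab API.

```python
import math

SAMPLE_RATE_HZ = 8_000   # telephone-bandwidth sampling rate
BITS_PER_SAMPLE = 8      # assumption: 8-bit companded samples (A-law or mu-law)

# Assumption: typical bearer channels per trunk; a T1 may carry 23
# if one timeslot is reserved for signalling.
CHANNELS_PER_TRUNK = {"E1": 30, "T1": 24}


def channel_bitrate() -> int:
    """Bit rate of one replay channel, in bit/s."""
    return SAMPLE_RATE_HZ * BITS_PER_SAMPLE


def aggregate_bitrate(channels: int) -> int:
    """Total audio bit rate for the given number of simultaneous callers."""
    return channels * channel_bitrate()


def trunks_needed(channels: int, trunk: str = "E1") -> int:
    """Smallest number of trunks that can carry the given channel count."""
    return math.ceil(channels / CHANNELS_PER_TRUNK[trunk])


if __name__ == "__main__":
    callers = 120
    print(f"per channel: {channel_bitrate()} bit/s")
    print(f"{callers} callers: {aggregate_bitrate(callers) / 1e6:.2f} Mbit/s")
    print(f"E1 trunks needed: {trunks_needed(callers, 'E1')}")
    print(f"T1 trunks needed: {trunks_needed(callers, 'T1')}")
```

Under these assumptions, 120 simultaneous callers fit exactly on four E1 trunks, while the same count would need five T1 trunks.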
The following Prosody variants are available:
Prosody PCI with 1, 2, 3 or 4 Prosody DSPs
Prosody cPCI with 2 or 4 Prosody DSPs
Prosody S host media processing software
-
Languages may be changed dynamically. Available voices are British
English (male and female); American English, French, German,
Brazilian Portuguese and Italian (female); and Spanish
(male).
-
Variants in voice styles provide a number of pre-configured
versions. This allows developers to choose from up to ten
stylistic variants. These include a range of formal, semi-formal
and casual options for both male and female voices.*
-
Cost-free licence enables download of the software, when
and where you need it, with no per-channel costs or recurring
licence fees.
-
Input format is free-form text (ASCII).
-
Email pre-processor (optional module) deals with special text such as email addresses,
message headers and URLs.
-
Text normalisation identifies and appropriately renders
special text fields like dates, time, currencies, bank
account numbers, phone numbers, acronyms, quotation marks,
parentheses, apostrophes and punctuation marks.
-
Intonation, stress and duration are applied using sophisticated
language models to provide natural and appropriate prosody.
This results in highly comprehensible synthesised speech.
-
Lexicon manager is a stand-alone utility that allows
the developer to tailor the pronunciation of words in the
lexicon and add new words.
-
Sampling rate for synthesis is 8kHz, matched to telephone
bandwidth.
-
Read-out modes give the user flexibility: line-by-line or
sentence-by-sentence.
-
Volume can be changed from high-level (8dB) to low-level
(-24dB) dynamically, with a default of 0dB.
-
Speech rate can be changed from fast rate (7) to slow
rate (1) dynamically.
-
Pitch level can be changed from high-level (7) to low-level
(1) dynamically.
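The adjustable ranges and read-out modes listed above can be sketched as a small settings object. This is a minimal illustration, not the Aculab API: `TtsSettings`, `split_text` and the mid-range defaults for rate and pitch are hypothetical, and only the volume default of 0dB and the numeric ranges come from the feature list. The sentence splitter is a naive stand-in for whatever the product actually does.

```python
import re
from dataclasses import dataclass


def _check(name: str, value: float, low: float, high: float) -> None:
    """Raise ValueError if value lies outside the inclusive range low..high."""
    if not low <= value <= high:
        raise ValueError(f"{name}={value} is outside {low}..{high}")


@dataclass(frozen=True)
class TtsSettings:
    """Hypothetical container for the dynamically adjustable parameters."""
    volume_db: int = 0  # documented range -24dB to 8dB, default 0dB
    rate: int = 4       # documented range 1 (slow) to 7 (fast); default assumed
    pitch: int = 4      # documented range 1 (low) to 7 (high); default assumed

    def __post_init__(self) -> None:
        _check("volume_db", self.volume_db, -24, 8)
        _check("rate", self.rate, 1, 7)
        _check("pitch", self.pitch, 1, 7)


def split_text(text: str, mode: str = "sentence") -> list[str]:
    """Split text into read-out units, line-by-line or sentence-by-sentence."""
    if mode == "line":
        units = text.splitlines()
    elif mode == "sentence":
        # Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace.
        units = re.split(r"(?<=[.!?])\s+", text)
    else:
        raise ValueError(f"unknown read-out mode: {mode!r}")
    return [u.strip() for u in units if u.strip()]


if __name__ == "__main__":
    settings = TtsSettings(volume_db=-6, rate=5)
    print(settings)
    for unit in split_text("Your balance is low. Please call us.\nThank you."):
        print(unit)
```

Validating at construction time means an out-of-range value is rejected before any text is queued for synthesis, which is a design choice of this sketch rather than documented behaviour.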