Aculab Text to Speech

Aculab’s TTS software is designed for use in telephony applications, setting a standard for accuracy and channel count. Our host-based TTS is a scalable, fast and accurate concatenative speech synthesis system with a small memory footprint. Its client/server architecture supports an exceptionally high channel count.

Aculab TTS works in conjunction with Prosody product options, which are used for audio replay and the simultaneous use of other media processing resources.

Developers can therefore enhance their telephony applications by combining TTS with automatic speech recognition (ASR) and other media processing algorithms through the generic Aculab API. When used with Prosody media processing resources, Aculab TTS is available under a cost-free licence via software downloads.

Aculab TTS includes an email pre-processor and a lexicon customisation tool. A single product build provides multi-language support. See the features summary for the languages available and the software downloads for the operating systems supported.

Speech technology
With the recent improvements in speech and language technology, innovative and sophisticated solutions for a range of markets can be produced using Aculab TTS and Prosody. For example, electronically formatted text messages can be retrieved and heard by telephoning an integrated messaging service. This type of service is ideal for people on the move who need immediate access to their email messages whilst out of the office. When using TTS in a contact centre environment, information can be read out to callers, allowing an organisation to offer improved customer service without increasing staff numbers.

TTS has been specifically designed for use in telephony applications that require spoken output and where the information to be read out is either frequently updated or too extensive to record.

It is unique in that its development has taken into account all of the additional constraints that the telephone networks impose, for example, narrow bandwidth, noisy listening conditions and high channel requirements. As a result, Aculab TTS produces extremely consistent and intelligible telephone speech, even for prolonged passages of synthesised text.

TTS architecture
Aculab TTS is used with Prosody cards in PCI or cPCI formats, which provide audio via an E1 or T1 network line interface. It can also be used with Prosody S, as the TTS software is compatible with any media processing firmware that supports 64 kbit/s replay. Replay (or playback) may be combined with other media processing algorithms, such as echo cancellation and recording, to enable barge-in for applications using ASR.
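The 64 kbit/s replay figure is consistent with the 8 kHz synthesis rate listed under Features. The sketch below works through the arithmetic, assuming 8-bit companded samples in the style of G.711 (this page does not name the codec). The function names are illustrative and are not part of any Aculab API.

```python
SAMPLE_RATE_HZ = 8_000   # synthesis rate stated for Aculab TTS
BITS_PER_SAMPLE = 8      # assumption: G.711-style 8-bit companded samples


def channel_bit_rate(sample_rate_hz: int = SAMPLE_RATE_HZ,
                     bits_per_sample: int = BITS_PER_SAMPLE) -> int:
    """Return the bit rate of one audio channel in bit/s."""
    return sample_rate_hz * bits_per_sample


def prompt_size_bytes(duration_s: float) -> int:
    """Return the bytes of audio needed to replay a prompt of this length."""
    return int(duration_s * channel_bit_rate() / 8)


if __name__ == "__main__":
    print(channel_bit_rate())        # 64000 bit/s, i.e. 64 kbit/s
    print(prompt_size_bytes(10.0))   # 80000 bytes for a 10 second prompt
```

Under these assumptions, one channel of synthesised speech occupies exactly one 64 kbit/s telephony timeslot.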

This means that developers benefit from the flexible combination of speech technologies and Prosody to enable the construction of resilient, high performance, scalable systems.

Channel count
In TTS applications where text is read out over telephone lines, practical considerations dictate the channel count. These include digital network access and media processing resources, as well as host capabilities and loading. For example, with a 1 GHz Pentium III PC host that also acts as the client machine, and a Prosody card with 2 DSPs and a PM4 module for digital network access, Aculab TTS can speak texts simultaneously to 120 callers. This means that developers can confidently deploy high density speech technology solutions for competitive advantage.
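As a rough check on the 120-caller figure, the sketch below counts bearer channels on the network side. It assumes that the PM4 module provides four trunk ports (an assumption, as this page does not state the port count) and uses the usual 30 bearer channels per E1 and 24 per T1. Host loading and DSP resources, which the text also names as limits, are not modelled.

```python
# Typical bearer channels per trunk; a T1 using ISDN signalling carries 23.
BEARER_CHANNELS = {"E1": 30, "T1": 24}
CHANNEL_KBIT_S = 64  # one telephony timeslot


def trunk_capacity(trunk_type: str, ports: int) -> int:
    """Return the number of simultaneous calls the trunks can carry."""
    if trunk_type not in BEARER_CHANNELS:
        raise ValueError(f"unknown trunk type: {trunk_type!r}")
    if ports < 0:
        raise ValueError("ports must not be negative")
    return BEARER_CHANNELS[trunk_type] * ports


def aggregate_audio_kbit_s(channels: int) -> int:
    """Return the total replay bandwidth for the given channel count."""
    return channels * CHANNEL_KBIT_S


if __name__ == "__main__":
    calls = trunk_capacity("E1", ports=4)   # hypothetical four-port module
    print(calls)                            # 120
    print(aggregate_audio_kbit_s(calls))    # 7680
```

With four E1 ports the network access alone allows 120 calls, which matches the quoted figure. The same module on T1 trunks would allow 96.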

Prosody hardware options
The following Prosody variants are available:

	Prosody PCI with 1, 2, 3 or 4 Prosody DSPs
	Prosody cPCI with 2 or 4 Prosody DSPs
	Prosody S host media processing software

Features
	Languages: may be changed dynamically. British English (male and female voices); American English, French, German, Brazilian Portuguese and Italian (female voices); Spanish (male voice).
	Voice styles: a number of pre-configured versions allow developers to choose from up to ten stylistic variants. These include a range of formal, semi-formal and casual options for both male and female voices.
	Licence: the cost-free licence enables download of the software, when and where you need it, with no per-channel costs or recurring licence fees.
	Input format: free-form text (ASCII).
	Email pre-processor (optional module): deals with special text such as email addresses, message headers and URLs.
	Text normalisation: identifies and appropriately renders special text fields like dates, time, currencies, bank account numbers, phone numbers, acronyms, quotation marks, parentheses, apostrophes and punctuation marks.
	Intonation, stress and duration: applied using sophisticated language models to provide natural and appropriate prosody. This results in highly comprehensible synthesised speech.
	Lexicon manager: a stand-alone utility that allows the developer to tailor the pronunciation of words in the lexicon and add new words.
	Sampling rate: TTS synthesis is at 8 kHz, adapted to telephone bandwidth.
	Read-out modes: line-by-line or sentence-by-sentence, giving the user flexibility.
	Volume: can be changed dynamically from high level (8 dB) to low level (-24 dB), with a default of 0 dB.
	Speech rate: can be changed dynamically from fast rate (7) to slow rate (1).
	Pitch level: can be changed dynamically from high level (7) to low level (1).
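The volume, speech rate and pitch ranges above can be captured in a small settings object that rejects out-of-range values before they reach the synthesiser. This is a minimal sketch: `VoiceSettings` and its field names are hypothetical and are not the Aculab API. The rate and pitch defaults of 4 and their integer steps are assumptions, as the page states a default only for volume.

```python
from dataclasses import dataclass

VOLUME_DB_RANGE = (-24.0, 8.0)  # low level to high level, from the feature list
RATE_RANGE = (1, 7)             # slow to fast
PITCH_RANGE = (1, 7)            # low to high


def _check(name, value, bounds):
    """Raise ValueError if value lies outside the inclusive bounds."""
    low, high = bounds
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}, got {value}")


@dataclass(frozen=True)
class VoiceSettings:
    volume_db: float = 0.0  # default stated in the feature list
    rate: int = 4           # assumption: midpoint of the range
    pitch: int = 4          # assumption: midpoint of the range

    def __post_init__(self):
        _check("volume_db", self.volume_db, VOLUME_DB_RANGE)
        _check("rate", self.rate, RATE_RANGE)
        _check("pitch", self.pitch, PITCH_RANGE)

    def linear_gain(self) -> float:
        """Convert the volume in dB to an amplitude scale factor."""
        return 10 ** (self.volume_db / 20)


if __name__ == "__main__":
    quiet = VoiceSettings(volume_db=-20.0, rate=2)
    print(quiet.linear_gain())  # amplitude factor for -20 dB
```

Validating at construction time keeps range errors close to the code that sets them, which is useful when these values are changed dynamically during a call.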

Tel: 603-524-2214

Got a Question? Need more info?

E-Mail info@mcct.com