May/June 2003

Speech Wars—Round Two

By Dr. James A. Larson

The Voice Browser Working Group has finished the technical work on the three major languages in the W3C Speech Interface Framework—VoiceXML 2.0, the Speech Recognition Grammar Specification, and the Speech Synthesis Markup Language. These languages will soon become “check-off items” in a list of features provided by the leading speech platforms. Most speech platform vendors will offer these languages as a standard part of their platforms.

To compete, speech platform vendors need new ways to differentiate their platforms from those of their competitors. Major differentiators will include new software development and management facilities that accelerate the speech application development process and enable sophisticated monitoring of deployed speech applications. Vendors will no longer boast “our platform supports VoiceXML and yours doesn’t,” as they did in the speech wars of yesteryear. Vendors will now do battle with a range of tools that make life easier for speech application developers. These tools will fall into three categories: development environments, information centers and control centers.

Development Environment

A development environment contains integrated tools that accelerate the speech development process. Useful tools include:

Code editors—Text editors that help the developer create VoiceXML 2.0, Speech Recognition Grammar Specification and Speech Synthesis Markup Language code that syntactically conforms to the associated XML Schema. Any syntax error is immediately flagged so the developer can resolve it.

Graphical dialog designer—Enables developers to draw dialog states and transitions, then automatically generates VoiceXML 2.0 code from the drawing.

Prompt manager—Collects all of the text prompt messages into a single file that enables voice talent to easily record the equivalent verbal prompts.

Pronunciation specification tool—Enables developers to specify the pronunciation of words by selecting and sequencing sounds for each phoneme in a word.

Grammar specification tool—Converts developer-created flowcharts, spreadsheets or tree structures into grammar rules.

Rehearsal tool—Enables the designer to walk through a VoiceXML application without using speech recognition and speech synthesis—by reading textual prompts on a screen and typing the responses via a keyboard. The developer debugs the dialog logic without dealing with speech recognition errors and misunderstood synthesized speech.

Debug tools—Display the contents of internal buffers and actions performed by the VoiceXML interpreter.
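
To make the graphical dialog designer's code-generation step concrete, here is a minimal sketch in Python. It assumes the drawing has already been reduced to two hypothetical dictionaries: `states`, which maps a state name to its prompt text, and `transitions`, which maps a state name to the next state. The function name `generate_vxml` is invented for illustration. The sketch handles only unconditional transitions; a real designer would also emit fields and grammars for transitions that depend on what the caller says.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"  # VoiceXML 2.0 namespace


def generate_vxml(states, transitions):
    """Turn drawn dialog states and transitions into a VoiceXML 2.0 document.

    states      -- hypothetical input: dict of state id -> prompt text
    transitions -- hypothetical input: dict of state id -> next state id;
                   a state with no entry ends the dialog
    """
    root = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    for state_id, prompt_text in states.items():
        # Each drawn state becomes one <form> that plays its prompt.
        form = ET.SubElement(root, "form", {"id": state_id})
        block = ET.SubElement(form, "block")
        prompt = ET.SubElement(block, "prompt")
        prompt.text = prompt_text
        next_state = transitions.get(state_id)
        if next_state is not None:
            # Each drawn arrow becomes a <goto> to the target form.
            ET.SubElement(block, "goto", {"next": "#" + next_state})
        else:
            ET.SubElement(block, "exit")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(generate_vxml(
        {"welcome": "Welcome to the demo.", "goodbye": "Goodbye."},
        {"welcome": "goodbye"},
    ))
```

Keeping the drawing as plain data separates the designer's graphical front end from the generator, so the same state-and-transition description could also feed a rehearsal tool.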

Information Center

The information center provides tools that present developers with information about how speech applications are used, including:

Log file—Captures timestamps and other information for each tag that developers embed into a VoiceXML application.

Log file report generator—Summarizes information calculated from the log file, including durations (task duration, response latency, mean system turn duration) and counts (turns to task completion, number of help, nomatch and no response events, number of reprompts).

User evaluation results—Summarizes users' answers to usability questions, including likes, dislikes, preferences and other subjective responses.
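
As an illustration of what a log file report generator computes, the following Python sketch derives one duration and several counts from a small log. The log format (an ISO timestamp followed by an event name) and the event names `task_start`, `task_end`, `user_turn` and `reprompt` are assumptions made for this example, not the format of any particular platform. `nomatch`, `noinput` and `help` follow the VoiceXML event names, with `noinput` standing in for a no response event.

```python
from collections import Counter
from datetime import datetime

# Hypothetical log: one "timestamp event" record per line.
SAMPLE_LOG = """\
2003-05-01T10:00:00 task_start
2003-05-01T10:00:02 prompt
2003-05-01T10:00:06 nomatch
2003-05-01T10:00:07 reprompt
2003-05-01T10:00:11 user_turn
2003-05-01T10:00:12 prompt
2003-05-01T10:00:15 noinput
2003-05-01T10:00:16 reprompt
2003-05-01T10:00:20 user_turn
2003-05-01T10:00:21 task_end
"""


def summarize(log_text):
    """Return the task duration and event counts for a single logged task."""
    records = []
    for line in log_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        stamp, event = line.split(maxsplit=1)
        records.append((datetime.fromisoformat(stamp), event.strip()))

    counts = Counter(event for _, event in records)
    start = next(t for t, e in records if e == "task_start")
    end = next(t for t, e in records if e == "task_end")
    return {
        "task_duration_s": (end - start).total_seconds(),
        "turns_to_completion": counts["user_turn"],
        "nomatch": counts["nomatch"],
        "noinput": counts["noinput"],
        "help": counts["help"],
        "reprompts": counts["reprompt"],
    }


if __name__ == "__main__":
    for name, value in summarize(SAMPLE_LOG).items():
        print(f"{name}: {value}")
```

A production report generator would aggregate these figures over many calls and add the latency measures listed above; the sketch covers one task to keep the arithmetic visible.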

Control Center

The control center contains performance monitoring tools such as:

Application activity tool—Summarizes speech recognition response times and application process activity.

Platform activity tool—Summarizes page fault rates and fetch times, communication delays, and congestion at platform resources. System administrators can dynamically reconfigure the distributed system to better support application performance and offload processing to backup hosts during peak processing loads.

Who will win the new speech wars to provide better development environments, information centers and control centers? Vendors who pay attention to speech application developers and provide a usable collection of tools that satisfy their needs will win by selling more platforms than their competitors. This is a war where application developers win by developing and deploying new applications faster. Let round two of the speech wars begin.

Dr. Jim A. Larson is an adjunct professor at Portland State University and Oregon Health Sciences University. He can be reached at jim@larson-tech.com and his Web site is http://www.larson-tech.com