Technology Trends

What’s new with VoiceXML 2.0?

By Dr. James A. Larson

While the W3C Voice Browser Working Group is focused on enhancing the VoiceXML 2.0 language, the VoiceXML Forum is working to strengthen the community of VoiceXML developers. Here are brief summaries of ongoing activities: some are available today, and others are planned for the future.

Today: VoiceXML Developer Certification Test

The VoiceXML Forum has announced the availability of the VoiceXML Application Developer Certification Test. This test certifies that developers have the knowledge and skills needed to design, develop, deploy and maintain speech applications using the W3C Speech Interface Framework languages: VoiceXML 2.0, Speech Synthesis Markup Language 1.0, Speech Recognition Grammar Specification 1.0, Semantic Interpretation Language 1.0, ECMAScript (ECMA-262/327) and Call Control XML Language 1.0. The test sets the standard for the knowledge and training required to develop speech applications. The VoiceXML Application Developer Certification Test benefits both employers and their employees:

Employer benefits: Employers will recognize job seekers and employees who have obtained VoiceXML certification as highly competitive VoiceXML programmers with demonstrated competency in the W3C Speech Interface Framework. Employees who have the expertise demonstrated by the certification test will be more productive and will better understand vendor products and solutions.
Employee benefits: Successful completion of the test will become a benchmark for hiring, promotion and salary increases. Completing the test adds to an employee’s credibility and earns respect from both management and peers. Most important, applying the knowledge gained while preparing for the test will improve the employee’s work performance and contribute to overall job satisfaction.

Today: The VoiceXML Education Exchange

VoiceXML resources are available to instructors of college and university courses involved in VoiceXML training and research. These resources include:

Commercial-grade software: software platforms, development environments, tools, examples of VoiceXML applications, grammars, audio files and subdialogs, available via academic licensing from commercial VoiceXML vendors
Online services: online development environments, documentation and hosting services, provided at low or no cost
Contributed freeware: software platforms, development environments, tools, examples of VoiceXML applications, grammars, audio files and subdialogs, downloadable without a licensing fee from universities and individuals
Teaching resources: course syllabi, lecture notes, sample test questions, example projects and training materials contributed by instructors and others, downloadable without a licensing fee

Future: VoiceXML 2.1

To determine which features to include in VoiceXML 2.1, the W3C Voice Browser Working Group identified eight new features that individual VoiceXML 2.0 vendors had already implemented, documented and deployed. VoiceXML 2.1 is backward compatible with VoiceXML 2.0, so all VoiceXML 2.0 applications should continue to run under VoiceXML 2.1. VoiceXML 2.1 features include the following:

Flexibility: references to both scripts and grammars can be computed dynamically. A designer no longer has to reference one specific grammar or script. Instead, the developer supplies an expression that generates the URI of the appropriate script or grammar at run time. This enables developers to reference alternative scripts and grammars for different situations. For example, depending on the current date, the application can use different scripts and grammars for different holiday seasons.
Increased functionality: useful new functionality is added to VoiceXML 2.0 (see Table 1, page 9).
New type of transfer: the <transfer> element now supports a type attribute, which may take one of three values:

· blind: connects the calling party to another telephone line. The VoiceXML 2.1 application does not remain in the connection and does not monitor the outcome. (This corresponds to the VoiceXML 2.0 <transfer> attribute bridge="false".)

· bridge: connects the calling party to another telephone line. The VoiceXML 2.1 application remains in the connection and listens for words included in the <grammar> element. (This corresponds to the VoiceXML 2.0 <transfer> attribute bridge="true".)

· consultation: similar to a blind transfer, except that the platform monitors the progress of the transfer. If the connection cannot be established, the session remains active and control returns to the application.
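
As a rough illustration of the dynamic references and the new transfer type described above, here is a minimal VoiceXML 2.1 sketch. It relies on the srcexpr attribute of <script> and <grammar> and on the type attribute of <transfer>. Everything else is assumed for the sake of the example: the "December means holiday" rule, the scripts/ and grammars/ paths and file names, the prompts, and the placeholder telephone number.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">

  <!-- Hypothetical rule: use holiday resources in December, default ones otherwise.
       Date.getMonth() is zero-based, so 11 is December. -->
  <var name="season" expr="(new Date().getMonth() == 11) ? 'holiday' : 'default'"/>

  <!-- srcexpr is evaluated at run time to yield the script URI.
       The scripts/ path and file names are assumptions. -->
  <script srcexpr="'scripts/' + season + '.js'"/>

  <form id="main">
    <field name="choice">
      <!-- The grammar URI is computed the same way (assumed grammars/ path). -->
      <grammar type="application/srgs+xml"
               srcexpr="'grammars/' + season + '.grxml'"/>
      <prompt>How can I help you?</prompt>
    </field>

    <!-- Consultation transfer: if the call cannot be connected, the session
         stays active and the outcome is stored in the "agent" variable.
         The destination number is a placeholder. -->
    <transfer name="agent" type="consultation" dest="tel:+15555550100">
      <prompt>Please hold while I connect you.</prompt>
      <filled>
        <if cond="agent == 'busy'">
          <prompt>That line is busy. Please try again later.</prompt>
        <elseif cond="agent == 'noanswer'"/>
          <prompt>Nobody answered. Please try again later.</prompt>
        <else/>
          <prompt>The call could not be connected.</prompt>
        </if>
      </filled>
    </transfer>
  </form>
</vxml>
```

Changing type to "blind" or "bridge" in this sketch would select the other two behaviors listed above. With a blind transfer the <filled> block would not see the outcome, since the application does not monitor the call.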

Future: V3

The W3C Voice Browser Working Group is working on a new speech processing architecture code-named V3. V3 will support the following features:

Modularization: a set of modules with common external interfaces. This enables dialog designers to mix and match voice with other modes of input, including keyboard and pen. For example, a V3 module for speech recognition might be embedded into XHTML, enabling a graphical Web page to accept speech input from the user.
Extensibility: extends the power of dialog management. VoiceXML 2.0 already supports both system-directed and mixed-initiative dialogs. V3 will allow for plan-based or rules-based dialog definitions by enabling the dialog author to define new dialog strategies.
Low-level control of media resources: includes speech recognition, speech synthesis and audio replay. Using these resources, application developers will be able to specify their own control structures. This enables a procedural style of dialog specification similar to that used by SALT developers, alongside the declarative programming style of VoiceXML 2.0 enabled by the Forms Interpretation Algorithm.

The Voice Browser Working Group expects to publish the first working draft of the V3 architecture by the end of this year.

Dr. James A. Larson is Manager of Advanced Human Input/Output at Intel Corporation and author of the book VoiceXML: Introduction to Developing Speech Applications. He can be reached at jim@larson-tech.com and his Web site is http://www.larson-tech.com/.