November/December 2002

Developing Verbal, Visual, and Multimodal User Interfaces for the Same Application

By Dr. James A. Larson

PC users access the World Wide Web using a graphical user interface (GUI) that is commonly specified with HTML. Telephone and cell phone users access the Web using a verbal user interface (VUI), often specified with VoiceXML. As cell phones and PDAs merge into single devices, there will soon be a need for a third type of user interface: a multimodal user interface (MMUI) that lets users speak and listen as well as read and click. MMUIs combine graphical and verbal user interfaces so that users can access the Web more naturally than with a GUI or a VUI alone. Many application developers want their web-based applications to be accessible from PCs, telephones and multimodal devices through a GUI, a VUI and an MMUI, respectively.

Even though GUIs, VUIs and MMUIs all present the same information to users, the presentations are fundamentally different. GUIs present information physically on a large two-dimensional screen. VUIs present information temporally as a voice stream. MMUIs present information both physically and temporally.

User Interface Creation Tasks

The developer should always keep application code separate from the user interfaces, so that the GUI, VUI and MMUI can all use the same application functions. For each of the three user interfaces, the developer must:

1. Select interaction objects appropriate for the device. Interaction objects for devices with screens include menus, data boxes and scroll bars. Interaction objects for telephones include verbal menus and verbal form fields. A typical interaction object for a multimodal device might be a form field that speaks to the user when highlighted and accepts both spoken and typed input.

2. Customize each of the interaction objects. GUI interaction objects must be sized, positioned within the screen, and given colors, shapes and textures. VUI interaction objects must be given a voice, volume and prosody. MMUI interaction objects must be given physical and temporal attributes so that the user may easily manipulate them.

3. Position the interaction objects in time and space. Verbal interaction objects must be presented sequentially to the user. Visual interaction objects must be laid out onto a two-dimensional screen for presentation to the user. MMUI interaction objects must be laid out both physically and temporally.
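The positioning task can be illustrated with a small sketch. It assumes a hypothetical device-independent `InteractionObject` holding only a name and a label, so selection and customization shrink to prompt wording and the sketch concentrates on layout: a GUI assigns each object a cell on a 2D grid, a VUI orders the prompts in time, and an MMUI keeps both. All names are illustrative and not taken from any toolkit.

```python
from dataclasses import dataclass

@dataclass
class InteractionObject:
    """Hypothetical device-independent description of one form field."""
    name: str
    label: str

def layout_gui(objects, columns=2):
    """Physical layout: assign each object a (row, column) cell on a 2D grid."""
    return {obj.name: divmod(i, columns) for i, obj in enumerate(objects)}

def layout_vui(objects):
    """Temporal layout: the prompts in the order they will be spoken."""
    return [f"Please say your {obj.label}." for obj in objects]

def layout_mmui(objects, columns=2):
    """Physical and temporal layout: each object keeps a speaking step and a grid cell."""
    cells = layout_gui(objects, columns)
    prompts = layout_vui(objects)
    return [(step, cells[obj.name], prompt)
            for step, (obj, prompt) in enumerate(zip(objects, prompts))]

fields = [InteractionObject("city", "departure city"),
          InteractionObject("date", "travel date"),
          InteractionObject("seat", "seat preference")]
print(layout_gui(fields))     # {'city': (0, 0), 'date': (0, 1), 'seat': (1, 0)}
print(layout_vui(fields)[0])  # Please say your departure city.
```

The same three objects yield a grid for the screen, a sequence for the voice channel, and a combination of the two for the multimodal device.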

These fundamental differences in how interaction objects are presented pose challenges for developers who must build a GUI, a VUI and an MMUI for the same application. The application functions and databases are shared among all three types of devices, but the user interfaces for the three types of devices are quite different.
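A minimal sketch of this separation, assuming a hypothetical `flight_status` application function backed by an in-memory table: both front ends call the same function and differ only in presentation, one producing HTML for the screen and one producing a VoiceXML-style `<prompt>` fragment for the voice channel.

```python
from html import escape

# Hypothetical application data; in practice this would come from a shared database.
_SCHEDULE = {"UA100": "on time", "UA200": "delayed 30 minutes"}

def flight_status(flight_number: str) -> dict:
    """Application function shared by every user interface."""
    return {"flight": flight_number, "status": _SCHEDULE.get(flight_number, "unknown")}

def render_gui(flight_number: str) -> str:
    """GUI presentation: HTML laid out on the screen."""
    result = flight_status(flight_number)
    return f"<p>Flight <b>{escape(result['flight'])}</b>: {escape(result['status'])}</p>"

def render_vui(flight_number: str) -> str:
    """VUI presentation: a spoken sentence wrapped in a VoiceXML-style <prompt>."""
    result = flight_status(flight_number)
    # Spell the flight number out so it is read character by character.
    spelled = " ".join(result["flight"])
    return f"<prompt>Flight {escape(spelled)} is {escape(result['status'])}.</prompt>"

print(render_gui("UA100"))  # <p>Flight <b>UA100</b>: on time</p>
print(render_vui("UA200"))  # <prompt>Flight U A 2 0 0 is delayed 30 minutes.</prompt>
```

Because neither renderer contains business logic, a third (multimodal) renderer could be added without touching `flight_status`.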

Approaches for Creating a User Interface

There are at least five approaches for creating GUI, VUI and MMUI user interfaces for the same web-based application.

1. Write from scratch. An application developer designs and implements interaction objects, customizes the interaction objects, and lays them out physically and/or temporally as appropriate for each device.

2. Code generation tool. A user interface designer selects, customizes and positions existing interaction objects in space and time using a code generation tool.

3. Transcoder. A transcoder accepts a user interface expressed in one language and creates a functionally equivalent user interface expressed in a second language targeted to a different device. A transcoder from a VUI to a GUI must choose the correct GUI interaction objects, position them on a 2D surface, and customize them with appropriate physical characteristics. A transcoder from a GUI to a VUI must choose the correct VUI interaction objects, customize them with appropriate audio characteristics, and sequence them in time for presentation to the user. A transcoder that produces MMUI interaction objects must customize them with the appropriate physical and temporal characteristics.

4. Dynamic generation. A specially written program dynamically generates the user interfaces at the time the application is executed. Many of the Web development tools used to develop dynamic GUIs using HTML can also be used to develop dynamic verbal user interfaces using VoiceXML and dynamic multimodal user interfaces using HTML plus SALT.

5. Author once. The application developer creates a single coded module that produces appropriate user interfaces when executed on each of the three device types.
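As an illustration of the transcoder approach, the following sketch converts a very simple HTML form into a VoiceXML form. It assumes the HTML marks up each text field with a `<label for=...>` element and an `<input type="text">`; the class and function names are hypothetical. The transcoder reuses each label as a prompt and sequences the fields in their on-page order. A real GUI-to-VUI transcoder would also have to generate a speech grammar for each field and choose audio characteristics, both of which this sketch omits.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape

class FormFieldCollector(HTMLParser):
    """Collects label text and text inputs from a simple HTML form."""

    def __init__(self):
        super().__init__()
        self.labels = {}        # input id -> label text
        self.inputs = []        # (name, id) pairs in on-page order
        self._label_for = None  # id referenced by the <label> currently open

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "label":
            self._label_for = attrs.get("for")
        elif tag == "input" and attrs.get("type", "text") == "text" and attrs.get("name"):
            self.inputs.append((attrs["name"], attrs.get("id")))

    def handle_endtag(self, tag):
        if tag == "label":
            self._label_for = None

    def handle_data(self, data):
        if self._label_for:
            self.labels[self._label_for] = self.labels.get(self._label_for, "") + data

def html_form_to_vxml(html: str) -> str:
    """Transcode a simple HTML form (GUI) into a VoiceXML 2.0 form (VUI)."""
    collector = FormFieldCollector()
    collector.feed(html)
    collector.close()
    attr = {'"': "&quot;"}  # also escape double quotes inside attribute values
    lines = ['<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">', "  <form>"]
    # Temporal layout: fields are spoken one after another in on-page order.
    for name, field_id in collector.inputs:
        label = collector.labels.get(field_id, name).strip().rstrip(":").lower()
        lines += [f'    <field name="{escape(name, attr)}">',
                  f"      <prompt>Please say the {escape(label)}.</prompt>",
                  "    </field>"]
    lines += ["  </form>", "</vxml>"]
    return "\n".join(lines)

if __name__ == "__main__":
    page = ('<form><label for="c">Departure city:</label>'
            '<input type="text" name="city" id="c"></form>')
    print(html_form_to_vxml(page))
```

Choosing prompts from labels and order from page position is the simplest possible policy; it shows why transcoding is attractive (no second hand-written interface) and also where it falls short (the spoken dialog inherits the visual design rather than being designed for the ear).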

When to Use Each Approach

There is no single best approach for all applications. However, some approaches seem to lend themselves to certain applications:

·New, innovative applications. These applications may require special user interfaces that must be written from scratch.

·Traditional GUI applications. If the GUI is targeted to the large screen available on PCs, then add SALT tags to the HTML only if adding voice enhances the PC application. If these applications are to be accessed using the much smaller screens on the new handheld multimodal devices, then the GUI will likely need to be rewritten to fit the smaller screen and to take advantage of an integrated VUI.

·Simple form-fill-in applications. For simple form-oriented user interfaces, the GUI, VUI and MMUI can all be embedded in the same code using development environments such as the Microsoft .NET Speech SDK (currently in beta testing).
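The author-once idea for simple form-fill-in applications can be sketched as a single field specification rendered for each device type. This is not the .NET Speech SDK API; `Field`, `author_once` and the renderer functions are hypothetical. Only the GUI (HTML) and VUI (VoiceXML) renderers are shown; a multimodal renderer would add speech tags such as SALT to the HTML output. The VoiceXML fields also lack the grammars a deployable application would require.

```python
from dataclasses import dataclass
from html import escape

@dataclass(frozen=True)
class Field:
    """One form field, authored once for every device type (hypothetical model)."""
    name: str
    prompt: str  # shown as a label in the GUI, spoken in the VUI

def render_html(fields, action):
    """GUI: a visual form with one labeled text box per field."""
    rows = [f'  <label>{escape(fld.prompt)} <input type="text" name="{escape(fld.name)}"></label><br>'
            for fld in fields]
    return "\n".join([f'<form action="{escape(action)}" method="post">', *rows,
                      '  <input type="submit" value="Submit">', "</form>"])

def render_vxml(fields, action):
    """VUI: a VoiceXML form that asks for each field in turn, then submits them all."""
    body = []
    for fld in fields:
        body += [f'    <field name="{escape(fld.name)}">',
                 f"      <prompt>{escape(fld.prompt)}</prompt>",
                 "    </field>"]
    names = " ".join(fld.name for fld in fields)
    body += ["    <block>",
             f'      <submit next="{escape(action)}" namelist="{escape(names)}"/>',
             "    </block>"]
    return "\n".join(['<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">', "  <form>",
                      *body, "  </form>", "</vxml>"])

RENDERERS = {"gui": render_html, "vui": render_vxml}

def author_once(fields, action, device):
    """Produce the user interface appropriate for the requesting device type."""
    return RENDERERS[device](fields, action)

booking = [Field("city", "Departure city?"), Field("date", "Travel date?")]
print(author_once(booking, "/book", "gui"))
print(author_once(booking, "/book", "vui"))
```

Both outputs submit the same field names to the same server-side action, so the application functions behind `/book` need not know which device the user is holding.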

There are many approaches for developing a GUI, VUI and MMUI using the same application functions. In every approach, interaction objects must be selected, customized and laid out physically on a 2D surface, temporally, or both. Different approaches attempt to automate one or more of these steps. The choice of approach depends on the quality of the resulting user interface, the availability of tools, and the developer's time, effort and skill.

Jim Larson is Manager of Advanced Human Input/Output at Intel Corporation, and author of the book, VoiceXML—Introduction to Developing Speech Applications. His web site is www.larson-tech.com and he can be reached at jim@larson-tech.com.