James A. Larson
Larson Technical Services
Developing a unimodal user interface, such as a graphical user interface (GUI) or a voice user interface (VUI), is a difficult task. Developing a multimodal user interface can be even more complex. This note presents a strategy for developing a GUI and a VUI for the same application that, although they use different modalities, are similar in terminology, structure, and behavior. These user interfaces can be combined into a multimodal user interface in which users select the mode that is most appropriate for the task at hand, within the current environment, and subject to their preferences and background. The approach consists of four major steps:
1. Develop a user’s model of the user interface
2. Develop both a GUI and a VUI that conform to the user’s model
3. Fine-tune both the GUI and the VUI to make them easier to use
4. Integrate the GUI and VUI into a single multimodal user interface
To illustrate each step of our approach, we will use the example of a catalog sales application in which the user browses the catalog and selects items for purchase by placing them into a shopping cart.
The user’s model captures the essence of the user interface without getting bogged down in details such as screen layout, interaction objects, and other creative aspects of user interface design. Instead, it captures the major application objects that the user will manipulate, the attributes and relationships of these objects, and the operations that users are expected to perform on the objects. Most importantly, it captures the flow control: the possible sequences of operations that the user may perform.
Figure 1 illustrates a typical flow control for a shopping user interface consisting of a catalog, which the user may browse, and a shopping cart, into which the user places the goods and services the user wishes to purchase. Users may apply operations (represented by arcs) to the catalog, including searching for a specific catalog item, viewing the next item, viewing the previous item, and placing an item into the shopping cart. Users may review the items to be purchased by moving from the catalog to the shopping cart, and resume browsing by moving back from the shopping cart to the catalog. The user may check out at any time. (To keep the example simple, we do not include the checkout process.)
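The flow control of Figure 1 can be written down as a small state machine before any GUI or VUI exists. The following is a minimal Python sketch, assuming hypothetical state names (catalog, cart) and hypothetical operation names for the arcs; like the text, it omits checkout. Each state lists only the operations the user may apply there, so the model records what users can do without committing to buttons or prompts.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Flow control from Figure 1: for each state, the operations (arcs) the user may
# apply and the state each one leads to. All names here are hypothetical.
# Checkout is omitted, as in the text.
TRANSITIONS: Dict[str, Dict[str, str]] = {
    "catalog": {
        "search": "catalog",
        "next": "catalog",
        "previous": "catalog",
        "add_to_cart": "catalog",
        "view_cart": "cart",
    },
    "cart": {
        "back_to_catalog": "catalog",
    },
}


@dataclass
class ShoppingSession:
    items: List[str]                  # catalog items in browsing order (assumed non-empty)
    state: str = "catalog"            # current node of the flow-control graph
    position: int = 0                 # index of the catalog item being viewed
    cart: List[str] = field(default_factory=list)

    def allowed(self) -> List[str]:
        """Operations available from the current state, sorted by name."""
        return sorted(TRANSITIONS[self.state])

    def apply(self, operation: str, argument: Optional[str] = None) -> str:
        """Apply one operation (arc) and return the resulting state."""
        if operation not in TRANSITIONS[self.state]:
            raise ValueError(f"{operation!r} is not available in state {self.state!r}")
        if operation == "next":
            self.position = min(self.position + 1, len(self.items) - 1)
        elif operation == "previous":
            self.position = max(self.position - 1, 0)
        elif operation == "search" and argument in self.items:
            self.position = self.items.index(argument)
        elif operation == "add_to_cart":
            self.cart.append(self.items[self.position])
        self.state = TRANSITIONS[self.state][operation]
        return self.state
```

Because the model rejects any operation that has no arc from the current state (for example, asking for the next item while viewing the cart), both user interfaces built later can be checked against it.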
Figure 2 illustrates a prototype graphical user interface in which buttons represent the operations shown as arcs in Figure 1. By clicking the Next and Previous buttons in Figure 2, the user can view items in the catalog. By clicking the Shopping Cart button, the user opens the shopping cart window.
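One way to keep the GUI faithful to the user’s model is to generate its buttons from the flow control rather than designing them independently. The sketch below is a minimal illustration, assuming hypothetical state and operation names for the Figure 1 model and hypothetical button captions: each window shows one button per arc leaving its state, and a helper reports any operation that lacks a button.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical flow control for the user's model (Figure 1).
TRANSITIONS: Dict[str, Dict[str, str]] = {
    "catalog": {"search": "catalog", "next": "catalog", "previous": "catalog",
                "add_to_cart": "catalog", "view_cart": "cart"},
    "cart": {"back_to_catalog": "catalog"},
}

# Hypothetical button captions for the prototype GUI: one button per arc.
BUTTON_LABELS: Dict[str, str] = {
    "search": "Search",
    "next": "Next",
    "previous": "Previous",
    "add_to_cart": "Add to cart",
    "view_cart": "Shopping cart",
    "back_to_catalog": "Back to catalog",
}


def buttons_for(state: str) -> List[Tuple[str, str]]:
    """Return (caption, operation) pairs for the buttons shown in a given window."""
    return [(BUTTON_LABELS[op], op) for op in TRANSITIONS[state]]


def unlabeled_operations() -> Set[str]:
    """Operations in the model with no button; empty if the GUI conforms."""
    operations = {op for arcs in TRANSITIONS.values() for op in arcs}
    return operations - set(BUTTON_LABELS)
```

Deriving the buttons this way means a change to the flow control shows up in the GUI automatically, instead of silently drifting away from the model.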
Figure 3 illustrates a prototype voice user interface in which voice menus represent the operations shown as arcs in Figure 1. By speaking menu options, the user can browse the items in the catalog. By speaking the option “shopping cart,” the user moves to the shopping cart portion of the dialog shown in Figure 3.
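The VUI can be derived from the same model. In a deployed system the menus would typically be written in VoiceXML with SRGS grammars; the sketch below instead uses plain Python dictionaries, assumes the speech recognizer has already returned text, and uses hypothetical phrases and synonyms. The prompt for each state lists the same operations that the GUI shows as buttons, and an utterance is accepted only if it names an operation available in the current state.

```python
from typing import Dict, Optional

# Hypothetical flow control for the user's model (Figure 1).
TRANSITIONS: Dict[str, Dict[str, str]] = {
    "catalog": {"search": "catalog", "next": "catalog", "previous": "catalog",
                "add_to_cart": "catalog", "view_cart": "cart"},
    "cart": {"back_to_catalog": "catalog"},
}

# Hypothetical primary phrase for each operation, used to build menu prompts.
VOICE_MENU: Dict[str, str] = {
    "search": "search",
    "next": "next",
    "previous": "previous",
    "add_to_cart": "add to cart",
    "view_cart": "shopping cart",
    "back_to_catalog": "catalog",
}

# Grammar: every phrase accepted, i.e. the primary phrases plus hypothetical synonyms.
GRAMMAR: Dict[str, str] = {phrase: op for op, phrase in VOICE_MENU.items()}
GRAMMAR.update({"buy this": "add_to_cart", "continue shopping": "back_to_catalog"})


def prompt_for(state: str) -> str:
    """Build the menu prompt listing the options available in a state."""
    options = [VOICE_MENU[op] for op in TRANSITIONS[state]]
    return "You can say: " + ", ".join(options) + "."


def interpret(utterance: str, state: str) -> Optional[str]:
    """Map recognized text to an operation, or None if it is not valid here."""
    op = GRAMMAR.get(utterance.strip().lower())
    return op if op in TRANSITIONS[state] else None
```

For example, prompt_for("cart") produces "You can say: catalog.", and saying “next” while in the cart is rejected because the model has no such arc there.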
Revise the layout, colors, fonts, and graphics of the GUI so that users feel comfortable and can use the graphical user interface easily. Likewise, revise the prompt wording, grammars, and sequence of fields in the VUI so that users feel comfortable and can use the voice user interface easily.
Because the graphical and voice user interfaces share the same user’s model, the same business code, and the same terms and operations, it is possible to integrate them into the multimodal user interface of Figure 4, in which the user can both read and type, and listen and speak. The user both reads and hears the same information, and may click or type as well as speak to enter information. After a few minutes of using the interface, the user will naturally choose between clicking/typing and speaking for each operation. Different users may choose differently at different times, depending on environmental concerns (noise pollution, low light, a small keyboard, etc.). If for any reason one mode becomes difficult to use (e.g., low light for reading the screen, or high background noise that the user must speak over), the user can switch to the mode best suited to the situation.
For example, it is probably faster for the user to read the contents of the shopping cart than to listen to them spoken aloud. However, it may be easier for the user to speak the names of desired items than to browse visually through a large catalog. For efficiency, it may be desirable to emphasize one user interface over the other for specific tasks. However, keep both interfaces available for every task in case the user needs a particular interface because of environmental constraints, or simply because the user prefers it.
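The integration described here can be sketched as a single controller that accepts input from either mode and presents the result in both. This is a minimal Python sketch, assuming a hypothetical Shop class standing in for the shared business code and a hypothetical voice grammar; a GUI click is assumed to deliver the operation name directly, and the same message is returned for the screen and for speech output.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical flow control (search omitted for brevity).
TRANSITIONS: Dict[str, Dict[str, str]] = {
    "catalog": {"next": "catalog", "previous": "catalog",
                "add_to_cart": "catalog", "view_cart": "cart"},
    "cart": {"back_to_catalog": "catalog"},
}

# Hypothetical voice grammar: spoken phrase -> operation.
GRAMMAR: Dict[str, str] = {"next": "next", "previous": "previous",
                           "add to cart": "add_to_cart",
                           "shopping cart": "view_cart", "catalog": "back_to_catalog"}


@dataclass
class Shop:
    """Shared business code: one instance serves both modalities."""
    items: List[str]
    state: str = "catalog"
    position: int = 0
    cart: List[str] = field(default_factory=list)

    def apply(self, op: str) -> None:
        if op == "next":
            self.position = min(self.position + 1, len(self.items) - 1)
        elif op == "previous":
            self.position = max(self.position - 1, 0)
        elif op == "add_to_cart":
            self.cart.append(self.items[self.position])
        self.state = TRANSITIONS[self.state][op]

    def describe(self) -> str:
        """The same information is presented on screen and spoken aloud."""
        if self.state == "catalog":
            return f"Item: {self.items[self.position]}"
        return "Cart: " + (", ".join(self.cart) or "empty")


def handle(shop: Shop, mode: str, user_input: str) -> Dict[str, str]:
    """Accept input from either mode and render the result in both."""
    # A GUI click delivers the operation name; speech goes through the grammar.
    op = user_input if mode == "gui" else GRAMMAR.get(user_input.strip().lower())
    if op is None or op not in TRANSITIONS[shop.state]:
        message = "That option is not available here."
    else:
        shop.apply(op)
        message = shop.describe()
    return {"screen": message, "speech": message}
```

Because both modes call the same apply method, a user can click Next, then say “add to cart,” and the shared state stays consistent; there is no per-mode copy of the catalog or cart to synchronize.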
At this point, the flow control for the GUI and VUI may begin to deviate. For example, the GUI may present a sequence of items that the user can purchase, while the VUI may simply ask the user to name the item the user wants to purchase. When this occurs, the two user interfaces diverge: developers must do more work to synchronize them, and users have more difficulty applying what they know about one user interface when they attempt to use the other. For simple user interfaces, it is possible to manage this divergence. For complex user interfaces, it could become problematic.
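Divergence of this kind can be made visible by comparing the two flow controls arc by arc. A minimal sketch, assuming each modality’s flow control is written as a hypothetical state-to-operations table, with the VUI replacing next/previous browsing by a hypothetical name_item operation as in the example of naming the item to purchase:

```python
from typing import Dict, Set, Tuple

Flow = Dict[str, Dict[str, str]]
Arc = Tuple[str, str, str]  # (from_state, operation, to_state)

# Hypothetical per-modality flow controls after fine-tuning.
GUI_FLOW: Flow = {
    "catalog": {"next": "catalog", "previous": "catalog",
                "add_to_cart": "catalog", "view_cart": "cart"},
    "cart": {"back_to_catalog": "catalog"},
}
VUI_FLOW: Flow = {
    # The VUI asks the user to name an item instead of stepping through the catalog.
    "catalog": {"name_item": "catalog", "add_to_cart": "catalog", "view_cart": "cart"},
    "cart": {"back_to_catalog": "catalog"},
}


def arcs(flow: Flow) -> Set[Arc]:
    """Flatten a flow-control table into a set of arcs."""
    return {(src, op, dst) for src, ops in flow.items() for op, dst in ops.items()}


def divergence(first: Flow, second: Flow) -> Tuple[Set[Arc], Set[Arc]]:
    """Arcs present only in the first flow, and arcs present only in the second."""
    return arcs(first) - arcs(second), arcs(second) - arcs(first)
```

An empty result means the GUI and VUI still share one flow control. Each arc that appears in only one set marks a place where developers need synchronization logic and where users cannot carry over what they learned in the other mode.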
There are several advantages to keeping the GUI and VUI as similar as possible. Similar GUI and VUI user interfaces enable users to:
Choose between these unimodal user interfaces depending on the device available. If the user is in an office with a desktop computer, then he should use the graphical user interface. However, if the user is in a car with only a cell phone, then he can use the voice user interface. The user should be able to switch between user interfaces even while using the application, for example when leaving the office and walking to a meeting in a different part of the building.
Select the user interface with which they are most familiar. If the device supports both a GUI and a VUI, then the user can select whichever user interface he prefers. Many users will select the graphical user interface because they feel comfortable with a mouse and keyboard. Other users will select the voice user interface because it leaves their hands free for other tasks.
Select the user interface most appropriate to the current environment. If the user is in a business meeting, the user will select the graphical user interface to avoid disturbing the meeting with noise. If the environment is noisy and speech recognition is likely to fail, the user may also select the graphical user interface. If the lighting is bad or the user’s eyes are busy, the user may select the voice user interface.
Switch between the user interfaces at will. The user performs exactly the same operations in both the graphical and voice user interfaces. Both user interfaces use the same commands and behave consistently, even though their modalities are very different. Users can apply their knowledge of how one unimodal user interface works when they use the other.
This flexibility is possible because both user interfaces access the same database and the same underlying application-specific code. Only the user interfaces are different.
The World Wide Web Consortium (W3C) Multimodal Interaction Working Group has developed an architecture to support multimodal user interfaces. A principal component of this architecture is the Interaction Manager, which is responsible for coordinating modality components such as HTML for graphical user interfaces and VoiceXML 3.0 for voice user interfaces. A candidate language for specifying the Interaction Manager is State Chart XML (SCXML), being developed by the W3C Voice Browser Working Group. Using the W3C multimodal architecture, developers could use SCXML to specify the flow control, XHTML to specify the GUI, and VoiceXML to specify the VUI. The Multimodal Interaction Working Group is developing (1) guidelines for designing multimodal user interfaces, and (2) guidelines and best practices for synchronizing the GUI and VUI within the W3C multimodal architecture.
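The coordinating role of the Interaction Manager can be illustrated with a small sketch. It is a simplified Python stand-in, not the W3C architecture’s actual life-cycle event protocol and not SCXML; the class name, method names, and callback signature are hypothetical. The manager holds the flow control (the part SCXML would specify), validates each operation reported by a modality component, and then notifies every registered component, including the one that reported it, so that the GUI and VUI stay synchronized.

```python
from typing import Callable, Dict

Flow = Dict[str, Dict[str, str]]
# Hypothetical callback: receives (new_state, source_component_name).
UpdateHandler = Callable[[str, str], None]


class InteractionManager:
    """Hypothetical, heavily simplified stand-in for an Interaction Manager."""

    def __init__(self, transitions: Flow, initial: str) -> None:
        self.transitions = transitions  # flow control, the role SCXML would play
        self.state = initial
        self.components: Dict[str, UpdateHandler] = {}

    def register(self, name: str, on_update: UpdateHandler) -> None:
        """Attach a modality component (e.g. a GUI or a VUI) by name."""
        self.components[name] = on_update

    def notify(self, source: str, operation: str) -> bool:
        """Handle an operation reported by one component and update all of them."""
        if operation not in self.transitions[self.state]:
            return False  # not valid in the current state; nothing changes
        self.state = self.transitions[self.state][operation]
        for on_update in self.components.values():
            on_update(self.state, source)  # every modality, including the source
        return True
```

Keeping the flow control in one place, rather than inside each modality component, is what prevents the GUI and VUI from drifting apart as they are fine-tuned.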