
Commonsense Guidelines for Developing Multimodal User Interfaces


James A. Larson
April 3, 2003


Four Major Principles

1. Satisfy Real-world Constraints

Task-oriented Guidelines
1. *** Guideline: For each task, use the easiest mode available on the device.
Physical Guidelines
2. **** Guideline: If the user’s hands are busy, then use speech.
3. **** Guideline: If the user’s eyes are busy, then use speech.
4. **** Guideline: If the user may be walking, use speech for input.
Environmental Guidelines
5. **** Guideline: If the user may be in a noisy environment, then use a pen or keys.
6. **** Guideline: If the user’s manual dexterity may be impaired, then use speech.
2. Communicate Clearly, Concisely, and Consistently with Users
Consistency Guidelines
7. *** Guideline: Phrase all prompts consistently.
8. *** Guideline: Switch presentation modes only when the information is not easily presented in the current mode.
Organizational Guidelines
9. * Guideline: Use audio to indicate the verbal structure.
10. * Guideline: Use visual cues to indicate the visual structure.
11. ** Guideline: Use pauses to divide information into natural “chunks.”
12. * Guideline: Use animation and sound to show transitions.
3. Help Users Recover Quickly and Efficiently from Errors
Conversational Guidelines
13. * Guideline: Users tend to use the same mode that was used to prompt them.
14. *** Guideline: If privacy is not a concern, use speech as output to provide commentary or help.
Reliability Guidelines
15. **** Guideline: The user always should be able to easily determine how much longer the device will be operational.
16. **** Guideline: Support at least two input modes so one input mode can be used when the other cannot.
4. Make Users Comfortable
System Status
17. **** Guideline: Always present the current system status to the user.
Human-memory Constraints
18. *** Guideline: Use the screen to ease stress on the user’s short-term memory.
Social Guidelines
19. **** Guideline: If the user may desire privacy, use a pen or keys.
20. **** Guideline: If the device may be used during a business meeting, then use a pen or keys (with the keyboard sounds turned off).
Advertising Guidelines
21. * Guideline: Use animation and sound to attract the user’s attention.
22. * Guideline: Use graphics and/or audio to create an awareness of company brands and logos.
Ambience Guideline
23. ** Guideline: Use audio and graphics design to set the mood and convey emotion in games and entertainment applications.
Summary




When fonts were first introduced, many messages looked like ransom notes from kidnappers. When color was introduced, many reports looked like they barely survived an explosion in a paint factory. To avoid these annoying user interfaces, developers adopted guidelines and best practices for using fonts and colors.

With the introduction of multiple modes of input (voice, pen, and keys), inexperienced developers may design loud, confusing, and annoying user interfaces that result in low user performance and high user discontent. This document enumerates commonsense guidelines for developing high-performance, high-preference multimodal user interfaces. We have collected suggestions, techniques, and principles from many diverse disciplines to generate these guidelines.

Again, these are commonsense guidelines. You may think that no one would ever develop user interfaces that violate these guidelines, but developers have violated commonsense guidelines before and will likely do so again. Use these guidelines as a checklist when you design a multimodal interface. They should help you construct a multimodal user interface that improves user performance and satisfaction, so the intended users can use your application easily and effectively.

Four Major Principles

The guidelines are organized into four major principles of user interface design. The following four principles determine how quickly users are able to learn and how effectively they are able to perform desired tasks with the user interface:
  1. Satisfy real-world constraints
  2. Communicate clearly, concisely, and consistently with users
  3. Help users recover quickly and efficiently from errors
  4. Make users comfortable

Each guideline is rated with one to four stars to indicate its importance. Guidelines with four stars should always be followed. Guidelines with one star may be ignored if they conflict with other guidelines for developing multimodal user interfaces. Multimodal user interface developers should follow the above four principles and apply the following guidelines to avoid many of the potential usability problems caused by using modes incorrectly.

1. Satisfy Real-world Constraints

Real-world constraints limit what users may achieve with an application. These limitations may be due to the nature of the task the user intends to perform, other activities the user is performing, physical limitations of the user, and conditions of the environment in which the user will perform the task. The user interface should be designed to compensate for these limitations.

Task-oriented Guidelines

The nature of the task influences the mode (or modes) users select to perform the task. Tasks that are easy to perform in one mode may be difficult or impossible to perform in another. Task-oriented guidelines suggest which tasks lend themselves best to which modes of entry.

New mobile devices will enable users to enter data by speaking into a microphone, writing with a stylus, and pressing keys on a small keypad. These input modes can be used to perform the following four basic manipulation tasks:
  1. Select objects (e.g., menu options)
  2. Enter text
  3. Enter symbols (e.g., part of mathematical equations)
  4. Enter sketches or illustrations
There are other basic tasks, but the tasks mentioned above are performed most frequently in common applications using handheld computers.

Table 1 summarizes how users perform the four basic tasks using three popular input modes: voice, pen, and keys.

Table 1: Performing the four basic manipulation tasks using three popular input modes, ranked from easiest (1) to most difficult (3)

| Content Manipulation Task | Voice Mode | Pen Mode | Keys Mode |
| --- | --- | --- | --- |
| Select objects | (2) Speak the name of the object | (1) Point to or circle the object | (3) Press keys to position the cursor on the object and press the select key |
| Enter text | (2) Speak the words in the text | (3) Write the text | (1) Press keys to spell the words in the text |
| Enter symbols | (3) Say the name of the symbol and where it should be placed | (1) Draw the symbol where it should be placed | (2) Select the symbol from a menu and indicate where it should be placed |
| Enter sketches or illustrations | (2) Verbally describe the sketch or illustration | (1) Draw the sketch or illustration | (3) Create the sketch by pressing keys to move the cursor so it leaves a trail (similar to an Etch A Sketch™) |


Select objects. Object selection is easy with a pen—just point to or circle the desired object. When using voice, just say the name of the desired object, assuming the object has a name. With a keyboard, press keys to position the cursor on the desired object and press the select key.

Enter text. Each of the three modes can be used for text entry—the user speaks words into a microphone, handwrites the words using a pen, or presses keys on a keypad to spell the words. Most users can speak and write easily. However, some training and practice may be necessary to use a keyboard efficiently.

Enter symbols. Entering mathematical equations, special characters, and signatures is easy with a pen, awkward and time-consuming with a mouse, and most difficult with speech.

Enter sketches or illustrations. Drawing simple illustrations and maps is easy with a pen, awkward with a mouse, and nearly impossible with speech. When speaking, users must verbally describe the illustration or map.

Each input mode has its strengths and weaknesses. Voice is good for describing attributes. The pen is good for pointing and sketching. Keys are good for entering text, numbers, and symbols. A useful and efficient multimodal system uses the appropriate mode for each entry.
1. *** Guideline: For each task, use the easiest mode available on the device.
Guideline examples include:
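One way to read this guideline is as a lookup against Table 1. The following is a minimal sketch, not part of the original guidelines: the rankings are transcribed from Table 1, while the names `EASE_RANK` and `easiest_mode` and the short task labels are assumptions made for illustration.

```python
# Ease rankings transcribed from Table 1 (1 = easiest, 3 = most difficult).
# The dictionary name and task labels are illustrative assumptions.
EASE_RANK = {
    "select objects": {"voice": 2, "pen": 1, "keys": 3},
    "enter text": {"voice": 2, "pen": 3, "keys": 1},
    "enter symbols": {"voice": 3, "pen": 1, "keys": 2},
    "enter sketches": {"voice": 2, "pen": 1, "keys": 3},
}


def easiest_mode(task, available_modes):
    """Return the available mode with the lowest (easiest) Table 1 rank."""
    ranks = EASE_RANK[task]
    candidates = [mode for mode in available_modes if mode in ranks]
    if not candidates:
        raise ValueError("no supported input mode is available for this task")
    return min(candidates, key=lambda mode: ranks[mode])


# A device with a microphone and a stylus but no keypad:
print(easiest_mode("enter text", ["voice", "pen"]))  # voice
```

On a device that lacks the top-ranked mode, the lookup falls back to the next easiest one, which is the point of the phrase "available on the device."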

Physical Guidelines

Different physical devices exhibit different usability characteristics. The device’s size, shape, and weight affect how it may be used. Most important, the placement of a microphone and speaker, the size of the display and writing surface, and the size of keys in a keypad all affect the ease with which a user can enter information by speaking, writing or pressing keys. Table 2 summarizes the three modes of input with respect to physical usability issues.


Table 2: Physical usability issues for the three popular modes of information entry

| Device Usability Issues | Voice Mode | Pen Mode | Keys Mode |
| --- | --- | --- | --- |
| Required number of user hands | None (plus possibly one to hold the device) | One (plus possibly one to hold the device) | One or two |
| Required use of eyes | No | Yes | Yes |
| Portable | Yes, especially when walking | Yes, but difficult while walking | Yes, but difficult while walking |


Required number of user hands. A user’s hands may be required when operating machinery, assembling parts into a device, or creating an object of art. No hands are needed to speak and listen to a voice user interface. A pen requires one hand to hold the pen. By their nature, handheld devices also may require a hand to hold the device. A 12-key keypad requires one hand to enter data, while a QWERTY keypad requires two hands to enter data efficiently. Some users become skilled at holding a small QWERTY keyboard with both hands and using their thumbs to type.
2. **** Guideline: If the user’s hands are busy, then use speech.
Guideline examples include:

Required use of eyes. A user’s eyes should be focused primarily on the road while driving a vehicle, on a physical device to be constructed or repaired, or on subjects and their activities while observing an experiment. Usually, users must look at what they are writing with a pen or typing on a keypad. However, the user’s eyes may be free to observe his or her environment while speaking.
3. **** Guideline: If the user’s eyes are busy, then use speech.
Guideline examples include:

Portable. Speech and pen devices are very portable. Users may use them while sitting, standing, walking, and sometimes while running. Traditionally, keyboard devices are used only while sitting. Keypads requiring only one hand, like those frequently found on handheld devices and telephones, can be used while sitting or standing.
4. **** Guideline: If the user may be walking, use speech for input.
Guideline examples include:

Environmental Guidelines

People work in environments that may not be ideal for some modes of user interfaces. The environment might be noisy or quiet, hot or cold, light or dark, or moving or stationary with a variety of distractions and possible dangers. Multimodal user interfaces must be designed to work in the environments where they will be used. Table 3 summarizes the environmental usability issues with respect to three popular input modes.


Table 3: Environmental usability issues for the three popular modes of information entry

| Device Usability Issues | Voice Mode | Pen Mode | Keys Mode |
| --- | --- | --- | --- |
| Noisy environment | Works poorly in a noisy environment | Works well in a noisy environment | Works well in a noisy environment |
| Other environmental concerns | Works well independently of gloves | Does not work well when users must wear thick gloves | Does not work well when users must wear thick gloves |


Noisy environment. Because speech recognition systems pick up background sounds, they often make mistakes if the user speaks in a noisy environment.

5. **** Guideline: If the user may be in a noisy environment, then use a pen or keys.
Guideline examples include:

Other environmental concerns. Pen and keyboard devices are difficult to use if the user must wear thick gloves, such as in a cold environment or when protecting hands from rough objects.

6. **** Guideline: If the user’s manual dexterity may be impaired, then use speech.
A guideline example is:
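Guidelines 2 through 6 can also be read together as a small rule set over the user's situation. The sketch below is an illustration under stated assumptions, not a prescribed algorithm: the `Context` fields and the function `usable_input_modes` are hypothetical names, and each condition is treated as a hard constraint, which a real design might soften.

```python
from dataclasses import dataclass

ALL_MODES = ("voice", "pen", "keys")


@dataclass
class Context:
    """Hypothetical description of the user's situation."""
    hands_busy: bool = False
    eyes_busy: bool = False
    walking: bool = False
    noisy: bool = False
    dexterity_impaired: bool = False  # e.g., the user wears thick gloves


def usable_input_modes(ctx):
    """Apply guidelines 2-6 as filters and return the remaining modes."""
    modes = set(ALL_MODES)
    # Guidelines 2, 3, 4, and 6: these conditions call for speech.
    if ctx.hands_busy or ctx.eyes_busy or ctx.walking or ctx.dexterity_impaired:
        modes -= {"pen", "keys"}
    # Guideline 5: a noisy environment calls for pen or keys.
    if ctx.noisy:
        modes.discard("voice")
    # Preserve a stable order for display and testing.
    return [mode for mode in ALL_MODES if mode in modes]


print(usable_input_modes(Context(hands_busy=True)))  # ['voice']
print(usable_input_modes(Context(noisy=True)))       # ['pen', 'keys']
```

An empty result, for example for a user who is walking through a noisy factory, signals that the four-star guidelines conflict in that situation and that the designer has to resolve the conflict explicitly.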

2. Communicate Clearly, Concisely, and Consistently with Users

Efficient communication is required if teams of people are to achieve success in joint activities. Likewise, effective communication between the user and the device is necessary for achieving the user’s goals. The multimodal user interface is the conduit for all communication between the user and the device. Communication should be clear and concise, avoiding ambiguities and confusion. Communication styles should be consistent and systematic so users know what to expect and can leverage the patterns and rhythms in the dialog.

Consistency Guidelines

Consistency enables users to leverage conversational patterns to accelerate their interaction. For example, users can follow a consistent conversational rhythm without having to pause to adjust to heterogeneous dialog styles.

Consistent prompts. If prompts are worded inconsistently, then users must pause to decode each wording format. Users must spend additional time and mental effort to respond to differently structured questions. When prompts are consistently worded, users can concentrate on the answers to questions rather than trying to understand the questions.

7. *** Guideline: Phrase all prompts consistently.
Guideline examples include:
  1. Speak the name of the menu or form item. The menu name serves as a landmark. A landmark is a speech or non-speech cue that marks a specific location within the dialog structure. Providing a name, such as “main menu” or “thermostat,” lets callers jump to this menu by speaking its name or return to it when they get confused or lost. Also, repeating the menu name to the caller confirms that the caller has reached the correct menu. However, if the name is contained within the question and is not needed as a landmark, then skip speaking the name.
  2. Ask a question. Often, this can be achieved with two or three words. This should be enough to prompt experienced callers to respond without listening to the enumerated options. Novice callers will listen to the enumerated options before speaking their selection.
  3. Enumerate options. If there are a small number of valid responses, then list the options so novice callers can hear and select their desired option. However, if the user is likely to know the set of valid responses, then skip this step.
Experienced callers can barge in after they hear the question, while novice callers will respond after they hear the entire list of menu options.
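The three-part prompt structure above (landmark, question, options) can be sketched as a small helper that keeps every prompt in the same shape. This is only an illustration: `build_prompt` and `join_options` are hypothetical names, and the "Say ..." wording for the option list is an assumption rather than wording from the guideline.

```python
def join_options(options):
    """Join options as 'a', 'a or b', or 'a, b, or c'."""
    if len(options) <= 2:
        return " or ".join(options)
    return ", ".join(options[:-1]) + ", or " + options[-1]


def build_prompt(question, menu_name=None, options=None):
    """Assemble a prompt in a fixed order: landmark, question, options."""
    parts = []
    if menu_name:  # skip the landmark when it is not needed
        parts.append(f"{menu_name}.")
    parts.append(question)
    if options:  # skip the list when users already know the valid responses
        parts.append(f"Say {join_options(options)}.")
    return " ".join(parts)


print(build_prompt("Which setting?", "Thermostat",
                   ["temperature", "fan", "schedule"]))
# Thermostat. Which setting? Say temperature, fan, or schedule.
```

Because the question always precedes the option list, an experienced caller can answer as soon as the question ends, while a novice can wait for the full list.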

Switching modes. Switching modes can be jarring and sometimes surprising. For example, a user who has just answered three verbal questions will be surprised if a textual question suddenly pops up.

8. *** Guideline: Switch presentation modes only when the information is not easily presented in the current mode.
Guideline examples include:

Organizational Guidelines

Grade school teachers teach that organizing your thoughts before writing a composition makes it dramatically easier to understand. The same principle applies to user interfaces. Organizing information and transitioning between topics will improve the users’ comprehension of and performance with the multimodal interface. Information should be structured and organized in ways that are familiar to the user.

Verbal structure. Audio cues help users understand audio information. For example, use a click to introduce each item of a bulleted list, increase the volume to emphasize highlighted text, or use a whisper to speak parenthetical text.
9. * Guideline: Use audio to indicate the verbal structure.
Because there are no standard assignments of meanings for sounds, common sense and user testing should guide the dialog designer. Here are suggestions for items that lend themselves to non-speech sounds:

Visual structure. Users quickly scan visual information to select topics of current interest. Visual cues help users to locate desired information quickly. For example, use a bullet to indicate each item on a list. Change the font style to emphasize highlighted text. Use parentheses to indicate parenthetical text.

10. * Guideline: Use visual cues to indicate the visual structure.
Guideline examples include:

Chunks of information. Users comprehend audio information more easily if it is presented as blocks, or chunks, of information. For example, users may not recognize “six, one, seven, two, two, five, four, three, seven, six” as a telephone number, but they will recognize “six, one, seven (pause) two, two, five (pause) four, three, seven, six” as either an American or Canadian telephone number.
11. ** Guideline: Use pauses to divide information into natural “chunks.”
Guideline examples include:
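The telephone-number example can be reproduced with a few lines of code. This is a minimal sketch, assuming a 3-3-4 grouping for ten-digit North American numbers; `chunk_digits`, `spoken_form`, and the literal " (pause) " marker are illustrative names, and a real speech system would emit an actual pause instead of text.

```python
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def chunk_digits(digits, group_sizes=(3, 3, 4)):
    """Split a digit string into consecutive groups of the given sizes."""
    if sum(group_sizes) != len(digits):
        raise ValueError("group sizes must account for every digit")
    chunks, start = [], 0
    for size in group_sizes:
        chunks.append(digits[start:start + size])
        start += size
    return chunks


def spoken_form(digits, group_sizes=(3, 3, 4), pause=" (pause) "):
    """Render each chunk as digit words and separate chunks with a pause."""
    groups = chunk_digits(digits, group_sizes)
    return pause.join(
        ", ".join(DIGIT_WORDS[int(d)] for d in group) for group in groups
    )


print(spoken_form("6172254376"))
# six, one, seven (pause) two, two, five (pause) four, three, seven, six
```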
Transitions. A user may become disoriented if the information content suddenly changes. Writers are well aware of the need for transitions between topics. Similar transitions are needed for visual and verbal information.

12. * Guideline: Use animation and sound to show transitions.
A guideline example is:

3. Help Users Recover Quickly and Efficiently from Errors

The user interface must help users recover quickly and efficiently from errors. All users, especially novice users, will occasionally fail to respond to a prompt appropriately. The user interface must be designed to detect such errors and help users recover naturally. The multimodal interface also should help users learn how to use the user interface to achieve the desired results quickly and efficiently.

Conversational Guidelines

Principles of conversational discourse suggest that the guidelines for the nature, content, and format of information exchanged between two humans may be applied to information exchanged between a human and a computer.

Reflexive principle. The reflexive principle states that people tend to respond in the same manner that they are prompted. For example, if users are given long rambling prompts, they will likely reply with long rambling responses.

13. * Guideline: Users tend to use the same mode that was used to prompt them.
Guideline examples include:

Verbal help. Spoken help is more immediate than displayed help and does not obscure screen contents.

14. *** Guideline: If privacy is not a concern, use speech as output to provide commentary or help.
Guideline examples include:

Reliability Guidelines

Few situations are more frustrating to users than to have a device at hand but not be able to use it.

Power status. One especially frustrating situation is when the device suddenly goes dead because the batteries are low.

15. **** Guideline: The user always should be able to easily determine how much longer the device will be operational.
A guideline example is:
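As a rough sketch of this guideline, the remaining operating time can be estimated from the remaining battery charge and the current drain, then shown to the user. Everything here is an assumption for illustration: the function names, the milliamp-hour units, the 15-minute warning threshold, and the idea that the platform reports charge and drain at all.

```python
def remaining_minutes(charge_mah, drain_ma):
    """Estimate minutes of operation left at the current drain rate."""
    if drain_ma <= 0:
        raise ValueError("drain must be positive")
    return charge_mah * 60 / drain_ma


def power_status_message(charge_mah, drain_ma, warn_below_minutes=15):
    """Format the estimate for display and flag a low battery."""
    minutes = int(remaining_minutes(charge_mah, drain_ma))
    hours, mins = divmod(minutes, 60)
    message = f"About {hours} h {mins} min of battery remaining"
    if minutes < warn_below_minutes:
        message += " (low battery)"
    return message


print(power_status_message(600, 200))
# About 3 h 0 min of battery remaining
```

On a multimodal device the same message could be shown on the screen or spoken on request, depending on which output mode suits the user's situation.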
Backup mode. In Section 1, Table 1 summarized the various strengths and weaknesses of using voice, pen, and keys as input methods. Because user tasks, environmental situations, and user distractions change, users should be able to switch modes when it becomes inconvenient or impossible to use the primary mode of input.
16. **** Guideline: Support at least two input modes so one input mode can be used when the other cannot.
Guideline examples include:
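A minimal sketch of a backup-mode policy follows, assuming the application keeps an ordered list of preferred input modes and learns from elsewhere which modes are unusable at the moment. The names `validate_input_modes` and `active_input_mode` are hypothetical.

```python
def validate_input_modes(supported_modes):
    """Check the guideline's minimum: at least two distinct input modes."""
    if len(set(supported_modes)) < 2:
        raise ValueError("support at least two input modes")


def active_input_mode(preference_order, unusable_now):
    """Return the first preferred mode that is currently usable, else None."""
    validate_input_modes(preference_order)
    for mode in preference_order:
        if mode not in unusable_now:
            return mode
    return None


# Voice is the primary mode, but the room has become too noisy for it.
print(active_input_mode(["voice", "keys"], {"voice"}))  # keys
```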

4. Make Users Comfortable

Users often judge a computer application by its user interface. If users do not like the user interface, the application will not be used. If the user interface is not easy to learn and easy to use, the application cannot be used successfully.

System Status

Users need feedback to determine whether the computer is processing input data, is waiting for input, or is malfunctioning.

17. **** Guideline: Always present the current system status to the user.

Some suggestions for indicating whether the computer is idle, busy, or in an error state are shown in Table 4.


Table 4: Suggested indicators for the current system status

| Mode | Idle | Busy | Error |
| --- | --- | --- | --- |
| Text | “Ready for next input” | “Processing, please wait” | Explanation for the cause of the error and how to fix it |
| Icons | Green | Red | Blinking “danger” icon |
| Audio | Silence | Sounds of a clicking clock or a percolating coffee pot | Emergency vehicle siren |
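Table 4 maps naturally onto a lookup keyed by system status and output mode. The sketch below is one possible encoding, not a prescribed design: the `Status` enum, the `INDICATORS` dictionary, and the short indicator labels are assumptions, and `None` stands for silence.

```python
from enum import Enum


class Status(Enum):
    IDLE = "idle"
    BUSY = "busy"
    ERROR = "error"


# Indicators transcribed from Table 4; None means "no sound".
INDICATORS = {
    Status.IDLE: {"text": "Ready for next input", "icon": "green",
                  "audio": None},
    Status.BUSY: {"text": "Processing, please wait", "icon": "red",
                  "audio": "clicking clock"},
    Status.ERROR: {"text": None, "icon": "blinking danger icon",
                   "audio": "emergency vehicle siren"},
}


def status_indicator(status, mode, error_explanation=None):
    """Return the indicator for a status in the given output mode."""
    if status is Status.ERROR and mode == "text":
        # Table 4 asks for the cause of the error and how to fix it,
        # so the caller must supply that text.
        if not error_explanation:
            raise ValueError("an error needs an explanation and a fix")
        return error_explanation
    return INDICATORS[status][mode]


print(status_indicator(Status.BUSY, "text"))  # Processing, please wait
```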

Human-memory Constraints

Normally, human short-term memory holds only 7±2 items, so it is necessary to keep verbal lists short. When a screen is available, display a list of options instead of reading it aloud, so users do not have to remember the spoken information.

18. *** Guideline: Use the screen to ease stress on the user’s short-term memory.
Guideline examples include:
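One hedged way to apply this guideline in code is to display the whole list whenever a screen is available and, when only speech is available, to break the list into short pages. The function name `plan_presentation` and the default page size of five (the lower end of the 7±2 range) are assumptions for illustration.

```python
def plan_presentation(options, has_screen, max_spoken=5):
    """Decide how to present a list of options to the user."""
    if has_screen:
        # The screen holds the list, so the user does not have to.
        return {"mode": "display", "pages": [list(options)]}
    # Speech only: keep each spoken page within short-term memory limits.
    pages = [list(options[i:i + max_spoken])
             for i in range(0, len(options), max_spoken)]
    return {"mode": "speech", "pages": pages}


plan = plan_presentation(["a", "b", "c", "d", "e", "f", "g"], has_screen=False)
print(plan["pages"])  # [['a', 'b', 'c', 'd', 'e'], ['f', 'g']]
```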

Social Guidelines

Social customs among people suggest guidelines for user interfaces between users and devices.

Privacy. Speech presented with a speaker is not private. Others in close proximity can hear both the user and the computer. The keyboard/mouse and pen provide greater privacy.

19. **** Guideline: If the user may need privacy, use a pen or keys.
Guideline examples include:

Acceptance in meetings. Pen devices are accepted in meetings because they replace a pen and pad of paper for taking notes. Keyboards and keypads are becoming acceptable with the widespread use of laptops. However, key sounds should be turned off. Usually, devices that speak or are spoken to are not accepted in meetings without the use of earphones, and in some cases earphones may imply that the user is not interested in the current discussion.

20. **** Guideline: If the device may be used during a business meeting, then use a pen or keys (with the keyboard sounds turned off).

Advertising Guidelines

Techniques from the field of advertising can be applied to user interfaces to make them more appealing and interesting to the user.

Important messages. Users must notice important messages.  

21. * Guideline: Use animation and sound to attract the user’s attention.
A guideline example is:

Brand or name recognition. Company brands and logos are recognized quickly by users, who unconsciously recognize brands based upon previous experiences with products and services offered by the company. A well-recognized brand name encourages users to purchase associated goods and services. Sound can be an important part of a brand or logo. For example, the “bong” heard at the beginning of long distance telephone calls indicates the service is being offered by AT&T. The “Intel Inside” visual and audio logos indicate that Intel supplied the computer chip inside the computing device.

22. * Guideline: Use graphics and/or audio to create an awareness of company brands and logos.
Guideline examples include:

Ambience Guideline

Television and movie directors set the mood with set design, lighting, and background music. Screen layout, colors, and background music also create moods in multimodal user interfaces. However, in some cases, moods and emotion may not be appropriate in productivity applications.

23. ** Guideline: Use audio and graphics design to set the mood and convey emotion in games and entertainment applications.
Guideline examples include:

Summary

Use these guidelines as a checklist when you first construct a multimodal user interface. However, the final decision about the usefulness and friendliness of the user interface rests on an abundance of iterative usability testing. If users do not like or cannot use the user interface, it does not matter if the guidelines were followed. The user interface needs to be changed so users will like and be productive with it, even when some guidelines may not have been followed. The users’ needs should be the foremost concern for multimodal user interface designers and developers.