James A. Larson
February 24, 2002
When fonts were first introduced, many messages looked like ransom notes from kidnappers. When color was introduced, many reports looked like they had barely survived an explosion in a paint factory. To avoid these annoying user interfaces, developers have adopted widely accepted guidelines for using fonts and colors.
With the introduction of multiple modes of input (voice, pen, and keys), developers may create applications reminiscent of Times Square on New Year's Eve, the Daytona Raceway during a high-speed race, and Fourth of July fireworks, all combined into a loud, confusing, and annoying user interface that results in low user performance and high user discontent. This document attempts to enumerate a collection of commonsense guidelines for developing high-performance, high-preference multimodal user interfaces.
This set of guidelines presents a first attempt to enumerate commonsense heuristics for developing multimodal user interfaces. Each guideline is rated with one to four stars to indicate its importance. Guidelines with four stars should always be followed. Guidelines with one star may be ignored if they conflict with other guidelines. We encourage developers to apply these guidelines when developing multimodal user interfaces and so avoid many of the usability problems caused by using modes incorrectly.
Many of these guidelines are just common sense. The reader might think that no one would ever develop user interfaces that violate them. But developers have violated common sense before and will likely do so again. Use these guidelines as a checklist whenever you design a multimodal interface. They should help you construct a multimodal user interface that improves user performance and user preference scores, so your application can be used easily and effectively by its intended users.
We have collected suggestions, techniques, and principles from many diverse disciplines to generate the following guidelines for developing multimodal user interfaces.
Task-oriented guidelines suggest which tasks lend themselves best to data entry using various modes of entry.
New mobile devices will enable users to enter data by pressing the keys on a small keypad, writing with a stylus, and speaking into a microphone. These input modes can be used to perform four basic manipulation tasks: selecting objects, entering text, entering symbols, and entering sketches or illustrations.
There are other basic tasks, but the four above are performed most frequently in common applications using handheld computers.
Table 1 summarizes how users perform these tasks using three popular input modes: voice, pen, and keys.
Table 1: Performing the four basic tasks using the three popular input modes, ranked from easiest (1) to most difficult (3)
| Content manipulation task | Voice | Pen | Keys |
|---|---|---|---|
| Select objects | (2) Speak the object’s name | (1) Point to or circle the object | (3) Press keys to position the cursor on the object and press the select key |
| Enter text | (2) Speak words in the text | (3) Write the text | (1) Press keys to spell words in the text |
| Enter symbols | (3) Say the name of the symbol and where it should be placed | (1) Draw the symbol where it should be placed | (2) Select the symbol from a menu and indicate where it should be placed |
| Enter sketches or illustrations | (2) Verbally describe the sketch or illustration | (1) Draw the sketch or illustration | (3) Create the sketch by pressing keys to move a cursor that leaves a trail (similar to an Etch A Sketch) |
Select objects. Object selection is easy with a pen—just point to or circle the desired object. When using voice, just say the name of the desired object, as long as the object has a name. With a keyboard, press keys to position the cursor on the desired object and press the select key.
Enter text. Each of the three modes can be used for text entry—the user speaks words into a microphone, handwrites the words using a pen, or presses keys on a keypad to spell the words. Most users can speak and write easily. However, some training and practice may be necessary to use a keyboard efficiently.
Enter symbols. Entering mathematical equations, special characters, and signatures is easy with a pen, awkward and time-consuming with a mouse, and most difficult with speech.
Enter sketches or illustrations. Drawing simple illustrations and maps is easy with a pen, awkward with a mouse, and nearly impossible with speech. When speaking, users must verbally describe the illustration or map.
Each input mode has its strengths and weaknesses. Voice is good for describing attributes. The pen is good for pointing and sketching. Keys are good for entering text, numbers, and symbols. A useful and efficient multimodal system uses the appropriate mode for each entry.
*** Guideline: Use the easiest available mode for each task.
Guideline examples
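As an illustration of this guideline (not an example from the original author), the rankings in Table 1 can be stored as data and used to pick the easiest mode a particular device actually offers. In this minimal Python sketch, the dictionary layout and the function name `easiest_mode` are hypothetical; only the rank values come from Table 1.

```python
# Rankings copied from Table 1: 1 = easiest, 3 = most difficult.
TASK_MODE_RANK = {
    "select objects": {"voice": 2, "pen": 1, "keys": 3},
    "enter text": {"voice": 2, "pen": 3, "keys": 1},
    "enter symbols": {"voice": 3, "pen": 1, "keys": 2},
    "enter sketches or illustrations": {"voice": 2, "pen": 1, "keys": 3},
}


def easiest_mode(task: str, available_modes: set[str]) -> str:
    """Return the available mode that Table 1 ranks easiest for the task."""
    ranks = TASK_MODE_RANK[task]
    candidates = [mode for mode in available_modes if mode in ranks]
    if not candidates:
        raise ValueError(f"no available mode supports {task!r}")
    # A lower rank means an easier mode, so pick the minimum.
    return min(candidates, key=lambda mode: ranks[mode])


if __name__ == "__main__":
    all_modes = {"voice", "pen", "keys"}
    print(easiest_mode("select objects", all_modes))
    # A phone without a pen falls back to the next easiest mode.
    print(easiest_mode("select objects", {"voice", "keys"}))
```

The "available modes" argument matters: the easiest mode overall is useless if the device lacks it, so the choice is always made among the modes the hardware provides.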
Different physical devices exhibit different usability characteristics. A device’s size, shape, and weight affect how it may be used. Most importantly, the placement of the microphone and speaker, the size of the keys on a keypad, and the size of the display and writing surface all affect how easily a user can enter information by speaking, writing, or pressing keys.
Physical device characteristics may restrict the modes used to enter data.
Table 2: Device usability issues for the three popular modes for information entry
| Device usability issue | Voice | Pen | Keys |
|---|---|---|---|
| Required number of user hands | None (plus possibly one to hold the device) | One (plus possibly one to hold the device) | One or two |
| Required use of eyes | No | Yes | Yes |
| Portable | Yes, especially when walking | Yes, but difficult while walking | Yes, but difficult while walking |
| Noisy environment | Works poorly in a noisy environment | Works well in a noisy environment | Works well in a noisy environment |
| Poor lighting | Works well in poor lighting | May work poorly in poor lighting | May work poorly in poor lighting |
| Other environmental concerns | Works well whether or not the user wears gloves | Does not work well when users must wear thick gloves | Does not work well when users must wear thick gloves |
| Privacy | No with speakers; yes with earphones | Yes | Yes |
| Acceptable in meetings | No with speakers; yes with earphones | Yes | Becoming acceptable |
Required use of hands. No hands are needed to speak to and listen to a voice user interface. A pen requires one hand to hold the pen. A 12-key keypad requires one hand to enter data, and a QWERTY keyboard requires two hands to enter data efficiently. By their nature, handheld devices may require a hand to hold the device, although some users become skilled at holding a small QWERTY keyboard with both hands and typing with their thumbs. So in practice, voice may require one hand, and a pen or a 12-key keypad may require two hands.
**** Guideline: Use speech when the user’s hands are busy.
Guideline examples:
Required use of eyes. Usually, users must look at what they are writing with a pen or typing on a keypad. However, the user’s eyes are free to observe his or her environment while speaking.
**** Guideline: Use speech when the user’s eyes are busy.
Guideline examples:
Portable. Speech and pen devices are very portable. Users may use them while sitting, standing, walking, and sometimes while running. Traditionally, keyboard devices are used only while sitting. Keypads requiring only one hand, like those frequently found on handheld devices and telephones, can be used while sitting or standing.
**** Guideline: Use speech if the user is walking
Guideline examples:
*** Guideline: Use voice to provide commentary or help. Voice is more immediate and does not obscure screen contents.
Guideline examples:
Environmental guidelines suggest that some modes cannot be used effectively in certain environmental conditions, such as low lighting, high noise, or extreme cold.
Environmental concerns. Speech recognition systems often make mistakes if the user speaks in a noisy environment.
**** Guideline: Use a pen or keys in a noisy environment.
Guideline examples:
Pen and keyboard devices are difficult to use if the user must wear thick gloves, such as in a cold environment or to protect the hands from rough objects.
**** Guideline: Use voice if the user must wear heavy gloves.
Guideline example:
Social customs among people suggest guidelines for user interfaces between people and computers.
Privacy. Speech presented with a speaker is not private. Others in close proximity can hear both the user and the computer. The keyboard/mouse and pen provide greater privacy.
**** Guideline: Make pen or keys available for use if privacy is desired.
Guideline examples:
Acceptable in meetings. Pen devices are acceptable in meetings: they replace a pen and pad of paper for taking notes. Keyboards and keypads are becoming acceptable with the widespread use of laptops. Usually, devices that speak or are spoken to are not acceptable in meetings without earphones, and in some cases, wearing earphones may imply that the user is not interested in the current discussion.
**** Guideline: Use pen or keys (with the key clicking sound turned off) if the device will be used during a business meeting.
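The four-star device, environment, and social guidelines above can be combined into a simple mode-ranking rule. The following Python sketch is illustrative only: the `Context` fields, the idea of counting one penalty per guideline that argues against a mode, and the tie-breaking order are all assumptions rather than part of these guidelines. Following the table, earphones remove the privacy penalty for voice; following the meeting guideline, voice is still penalized in meetings.

```python
from dataclasses import dataclass

MODES = ("voice", "pen", "keys")


@dataclass
class Context:
    """Hypothetical snapshot of the user's situation."""
    hands_busy: bool = False
    eyes_busy: bool = False
    walking: bool = False
    noisy: bool = False
    thick_gloves: bool = False
    privacy_needed: bool = False
    in_meeting: bool = False
    earphones: bool = False


def mode_penalties(ctx: Context) -> dict[str, int]:
    """Count how many four-star guidelines argue against each mode."""
    penalties = {mode: 0 for mode in MODES}
    # Busy hands or eyes, walking, and thick gloves all favor speech.
    for condition in (ctx.hands_busy, ctx.eyes_busy, ctx.walking, ctx.thick_gloves):
        if condition:
            penalties["pen"] += 1
            penalties["keys"] += 1
    # Speech recognition makes mistakes in noisy environments.
    if ctx.noisy:
        penalties["voice"] += 1
    # Without earphones, speech is not private.
    if ctx.privacy_needed and not ctx.earphones:
        penalties["voice"] += 1
    # Pen or keys are preferred in business meetings, even with earphones.
    if ctx.in_meeting:
        penalties["voice"] += 1
    return penalties


def rank_modes(ctx: Context) -> list[str]:
    """Order modes from most to least suitable; ties keep the MODES order."""
    penalties = mode_penalties(ctx)
    return sorted(MODES, key=lambda mode: penalties[mode])


if __name__ == "__main__":
    print(rank_modes(Context(walking=True)))
    print(rank_modes(Context(noisy=True)))
    print(rank_modes(Context(noisy=True, walking=True)))
```

When guidelines conflict, for example a user walking through a noisy factory, the penalties tie and the sketch simply falls back to a fixed order. In a real design, user testing rather than a fixed order should settle such cases.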
Principles of conversational discourse, which govern the nature, content, and format of information exchanged between two humans, may also be applied to information exchanged between a human and a computer.
Reflexive principle. The reflexive principle states that people tend to respond in the same manner that they are prompted. For example, if users are prompted with long rambling prompts, they will likely respond with long rambling responses.
* Guideline: Users tend to use the same mode that was used to prompt them.
Guideline examples:
The backup mode principle. If an input mode fails, the user should be able to use a different mode as backup.
**** Guideline: Support at least two input modes so that one mode can be used when the other cannot.
Guideline examples:
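A minimal sketch of the backup-mode principle, written in Python as an illustration rather than a prescribed design: each input mode is modeled as a hypothetical recognizer function that returns the recognized input, or `None` when recognition fails, and the caller falls back to the next mode in order.

```python
from typing import Callable, Optional

# Hypothetical recognizer: returns recognized input, or None on failure.
Recognizer = Callable[[], Optional[str]]


def read_with_backup(recognizers: dict[str, Recognizer], order: list[str]) -> tuple[str, str]:
    """Try each mode in order; fall back to the next mode when one fails."""
    for mode in order:
        result = recognizers[mode]()
        if result is not None:
            return mode, result
    raise RuntimeError("every input mode failed; ask the user to try again")


if __name__ == "__main__":
    simulated = {
        "voice": lambda: None,  # simulate a misrecognition in a noisy room
        "pen": lambda: "42",    # the user writes the answer instead
    }
    print(read_with_backup(simulated, ["voice", "pen"]))
```

A real system would also tell the user which backup mode to switch to, rather than silently waiting on the next recognizer.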
Users need feedback to determine whether the computer is processing input data, is waiting for input, or is malfunctioning.
**** Guideline: Always indicate whether the computer is busy or idle.
Guideline example:
| Mode | Idle | Busy |
|---|---|---|
| Text | “Ready for next input” | “Processing, please wait” |
| Icons | Green | Red |
| Audio | Silence | Clicking clock or percolating coffee pot |
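The idle/busy indicators can be driven from one place so the interface never forgets to switch back. This Python sketch is one possible approach, assuming a hypothetical `StatusIndicator` class that stands in for real screen and audio output; the text, icon colors, and audio cue are taken from the table. A context manager with `try`/`finally` guarantees the indicator returns to idle even if processing raises an error.

```python
from contextlib import contextmanager
from typing import Iterator

# Indicators from the idle/busy table: (text, icon color, audio cue).
INDICATORS = {
    "idle": ("Ready for next input", "green", None),
    "busy": ("Processing, please wait", "red", "clicking clock"),
}


class StatusIndicator:
    """Hypothetical indicator that renders the current state in every mode."""

    def __init__(self) -> None:
        self.state = "idle"

    def show(self, state: str) -> None:
        self.state = state
        text, icon, audio = INDICATORS[state]
        # A real device would update the screen and start or stop the audio cue.
        cue = f"audio: {audio}" if audio else "audio: silence"
        print(f"[{icon}] {text} ({cue})")


@contextmanager
def busy(indicator: StatusIndicator) -> Iterator[None]:
    """Show 'busy' while the block runs, then always return to 'idle'."""
    indicator.show("busy")
    try:
        yield
    finally:
        indicator.show("idle")


if __name__ == "__main__":
    status = StatusIndicator()
    with busy(status):
        total = sum(range(1_000_000))  # stand-in for slow recognition work
```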
If prompts are worded inconsistently, then users must pause to decode each wording format. Users must spend additional time and mental effort to respond to questions that are worded differently from one another. When prompts are consistently worded, users are able to supply the answers quickly.
*** Guideline: Phrase all prompts consistently
Guideline example:
1. Speak the name of the menu or form item. The menu name serves as a landmark: a speech or non-speech cue that marks a specific location within the dialog structure. Providing a name such as “main menu” or “thermostat” lets callers jump to this menu by speaking its name, or return to it when they get confused or lost. Repeating the menu name also confirms that the caller has reached the correct menu. However, if the name is contained within the question and is not needed as a landmark, skip speaking the name.
2. Ask a question. Often, this can be achieved with two or three words. This should be enough to remind experienced callers to respond without listening to the enumerated options. Novice callers will listen to the enumerated options before speaking their selection.
3. Enumerate options. If there are a small number of valid responses, list the options so novice callers can hear and select the option they want. However, if the user is likely to know the set of valid responses, skip this step.
Experienced callers can barge in after they hear the question, while novice callers respond after they hear the entire list of options.
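One way to keep prompts consistently phrased is to generate every prompt from the same three steps. The Python sketch below is a hypothetical helper, not part of any particular voice platform: `build_prompt` speaks the landmark unless the question already contains it, asks the question, and enumerates options unless the user is assumed to know them. The “You can say …” wording is an illustrative choice.

```python
from typing import Optional, Sequence


def join_options(options: Sequence[str]) -> str:
    """Join options as 'a', 'a or b', or 'a, b, or c'."""
    if len(options) == 1:
        return options[0]
    if len(options) == 2:
        return f"{options[0]} or {options[1]}"
    return ", ".join(options[:-1]) + f", or {options[-1]}"


def build_prompt(
    question: str,
    menu_name: Optional[str] = None,
    options: Sequence[str] = (),
    user_knows_options: bool = False,
) -> str:
    """Assemble a prompt in landmark, question, options order."""
    parts = []
    # Step 1: speak the menu name as a landmark, unless the question already contains it.
    if menu_name and menu_name.lower() not in question.lower():
        parts.append(f"{menu_name}.")
    # Step 2: ask a short question.
    parts.append(question)
    # Step 3: enumerate options for novices, unless users already know them.
    if options and not user_knows_options:
        parts.append(f"You can say {join_options(options)}.")
    return " ".join(parts)


if __name__ == "__main__":
    print(build_prompt("Which setting?", "Thermostat", ["heat", "cool", "off"]))
```

Because the question comes before the option list, experienced callers can barge in as soon as they hear it.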
Switching modes can be jarring and sometimes surprising. For example, a user who has just answered three verbal questions will be surprised if a textual question suddenly pops up.
*** Guideline: Do not switch modes without good reason.
Guideline examples:
TV and movie directors set the mood with sets, lighting, and background music. Screen layout, colors, and background music can also create moods in multimodal user interfaces.
** Guideline: Use audio and graphic design to set the mood and convey emotion in games and entertainment applications. However, mood and emotion may not be appropriate in productivity applications.
Example guidelines:
Grade school teachers always teach that organizing your thoughts before writing a composition makes it dramatically easier to understand. The same principle applies to user interfaces: guidelines for organizing information and transitioning between topics will improve the voice user interface.
* Guideline: Use audio to indicate structure.
Example guidelines: There is no standard assignment of meanings to sounds, so common sense and user testing should guide the dialog designer. Here are some suggestions for items that lend themselves to non-speech sounds:
** Guideline: Use pauses to divide spoken information into natural “chunks.”
Guideline examples
* Guideline: Use animation and sound to show transitions.
Guideline example
Normally, human short-term memory holds only 7±2 items, so replace long verbal presentations with visual presentations.
*** Guideline: Use the screen to ease stress on the user’s short-term memory.
Guideline examples:
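As an illustration (not the author's example), the decision between speaking a list and showing it on the screen can be made from the list's length. In this Python sketch, `plan_presentation` is a hypothetical helper, and the limit of five items is an assumption taken from the low end of the 7±2 range. On a voice-only device, the list is split into chunks that would be spoken with pauses between them, in line with the pause guideline.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")

# Assumption: use the low end of the 7±2 range as a conservative limit.
SHORT_TERM_LIMIT = 5


def chunk(items: Sequence[T], size: int) -> list[list[T]]:
    """Split items into consecutive groups of at most `size` items."""
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def plan_presentation(items: Sequence[T], has_screen: bool, limit: int = SHORT_TERM_LIMIT) -> dict:
    """Decide whether to speak a list, show it on screen, or speak it in chunks."""
    if len(items) <= limit:
        # Short lists fit in short-term memory, so speaking them is fine.
        return {"screen": [], "spoken_chunks": [list(items)]}
    if has_screen:
        # Long lists go on the screen so the user need not remember them.
        return {"screen": list(items), "spoken_chunks": []}
    # Voice-only device: speak in chunks, pausing between them.
    return {"screen": [], "spoken_chunks": chunk(items, limit)}


if __name__ == "__main__":
    flights = [f"flight {n}" for n in range(1, 13)]
    print(plan_presentation(flights, has_screen=False)["spoken_chunks"])
```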
Techniques from the field of advertising can be applied to user interfaces.
* Guideline: Use animation to attract the user’s attention.
Guideline example:
Animate the delivery of important events and messages so users will notice them. This type of animation is often accompanied by sound, which also attracts the user’s attention.
Users quickly recognize company brands and logos, often unconsciously, based on previous experiences with the company’s products and services. A well-recognized brand name encourages users to purchase associated goods and services. Sound can be an important part of a brand or logo.
* Guideline: Use audio to create awareness of company brands and logos.
Guideline example: