November/December 2001

Assistive Technology

A Single Interactive Help Session Is Worth a Thousand Words of Tutorial

By James Larson

When I was a teen-ager learning to drive, I watched a training film about how to parallel park. But when I sat behind the steering wheel, I forgot many instructions from the film. Tutorials often are not effective, especially if they contain many details that the user is expected to remember.

So my dad, a very patient man, taught me to parallel park. He gave me step-by-step instructions as I maneuvered the car into the parking place. After that experience, I remembered the details of parallel parking much better because I performed the steps as I received instructions. I also learned that interactive help is much more effective than tutorials.

Let's apply this lesson to the design of voice dialogs. Consider the following VoiceXML menu containing a long tutorial that tells the user how to select from among options:

‹ menu ›

‹ prompt ›

Welcome to the Ajax Bank online system. First, you will need to say the name of the transaction you wish to perform.

To get money from your account, you will need to say, "Withdraw."

To place money into your account, you will need to say, "Deposit."

To move money between two of your accounts, you will need to say, "Transfer."

Speak the type of transaction you wish to perform now.

‹ /prompt ›

‹ choice next = "#withdraw" › withdraw ‹ /choice ›

‹ choice next = "#deposit" › deposit ‹ /choice ›

‹ choice next = "#transfer" › transfer ‹ /choice ›

‹ /menu ›

While a tutorial does need to cover all situations and contingencies, few users will remember the commands "withdraw," "deposit" and "transfer" amid all this verbiage. Instead, consider the following brief prompt:


‹prompt›

Ajax Bank. What transaction would you like to perform? Withdraw, deposit or transfer?

‹/prompt›


This prompt is brief, so experienced users do not get bored or angry listening to a long tutorial, especially one they have heard hundreds of times. The prompt contains three simple elements: (a) a landmark ("Ajax Bank") so users know which system they are speaking with, (b) a brief instruction telling users what to do next ("What transaction would you like to perform?"), and (c) a list of options reminding users what they can say ("Withdraw, deposit or transfer?"). Experienced users can save time by barging in after they hear the landmark, speaking the desired option while the computer is still talking.

After hearing the complete prompt, most users new to the Ajax Bank online system can respond correctly, especially if they are familiar with other prompts that use the same three elements: landmark, instruction and list of options. However, some novice users will not respond to the prompt within the prespecified time limit, will respond incorrectly or will ask for help. Developers using VoiceXML define how to handle these situations using a ‹catch› element:

‹catch event = "noinput nomatch help"›

‹prompt›

To get money from Ajax Bank, say "Withdraw."

To deposit money into Ajax Bank, say "Deposit."

To transfer money between accounts in Ajax Bank, say "Transfer."

‹/prompt›

‹/catch›

If the user fails to respond within the prespecified time limit ("noinput"), responds incorrectly by saying something other than the three commands ("nomatch"), or asks for help ("help"), the catch element prompts the user with additional information.

In the example above, the instructions are very specific to the situation, just as my dad gave me very specific instructions at each step of parallel parking.
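For readers who want to see how the pieces fit together, here is a minimal sketch of the brief menu and its help handler in a single document. It assumes a VoiceXML 2.0 interpreter; the menu id "main", the three placeholder forms and the closing reprompt are illustrative assumptions rather than part of the Ajax Bank example above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: assumes a VoiceXML 2.0 interpreter. -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">

  <!-- Assumption: VoiceXML 2.0 leaves the built-in "help" command
       switched off by default, so it is enabled here. -->
  <property name="universals" value="help"/>

  <menu id="main">
    <!-- Brief prompt: landmark, instruction, list of options. -->
    <prompt>
      Ajax Bank. What transaction would you like to perform?
      Withdraw, deposit or transfer?
    </prompt>

    <choice next="#withdraw">withdraw</choice>
    <choice next="#deposit">deposit</choice>
    <choice next="#transfer">transfer</choice>

    <!-- Just-in-time help: heard only after silence, an
         unrecognized reply or a request for help. -->
    <catch event="noinput nomatch help">
      <prompt>
        To get money from Ajax Bank, say "Withdraw."
        To deposit money into Ajax Bank, say "Deposit."
        To transfer money between accounts in Ajax Bank, say "Transfer."
      </prompt>
      <!-- Replay the brief prompt once the help has been given. -->
      <reprompt/>
    </catch>
  </menu>

  <!-- Hypothetical placeholder dialogs so each choice has a target. -->
  <form id="withdraw">
    <block><prompt>You chose withdraw.</prompt></block>
  </form>
  <form id="deposit">
    <block><prompt>You chose deposit.</prompt></block>
  </form>
  <form id="transfer">
    <block><prompt>You chose transfer.</prompt></block>
  </form>

</vxml>
```

An experienced caller who answers the first prompt correctly never hears the longer instructions. A novice hears them only after a silence, an unrecognized reply or a request for help, and then gets the brief prompt again.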

Verbal tutorials do not work well in conversational user interfaces. Instead, the "just-in-time" help enabled by VoiceXML's ‹catch› element lets experienced users avoid detailed instructions, yet presents those instructions to novice users when needed. This step-by-step approach lets novice users learn how the system works and become experts at their own pace.

James A. Larson, Ph.D., is chairman of the W3C Voice Browser Working Group. He is the author of Developing Speech Applications Using VoiceXML and teaches courses in user interfaces and speech applications at Portland State University and Oregon Health and Sciences University. Larson is coordinating the VoiceXML sessions at SpeechTEK 2001. He may be contacted at