January/February 2004

Help Users Speak

By Dr. James A. Larson

A speech application does not work if users do not speak when prompted with a question. Users might not speak because they do not expect to talk to a computer, because they did not understand the question or because they cannot think of a good answer. Dialog designers can use the following guidelines to encourage users to speak appropriate responses to prompts:

Inform users that they are expected to speak to the computer
After the user connects, announce the name of the application or service and tell the user to respond by speaking, with words such as “Ajax Products. Please respond to each question by speaking.”

Use whisper prompts
After asking a question, whisper possible answers. “What date? For example, January 13, 2004.” Whisper the second part of the prompt to the user by decreasing the volume or using a different voice.

Ask the question, then pause
Pausing after a direct question encourages the user to answer by speaking. For example, “What date? (pause) For example, January 13, 2004.” Some designers insert a short beep to signal when the user should speak, but I find these beeps annoying. A silent pause is a natural signal that the user should speak.
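The question, pause and whisper can be expressed in SSML (the W3C Speech Synthesis Markup Language), which VoiceXML platforms use for prompts. The Python sketch below only builds the markup string; the function name whisper_prompt is hypothetical, and the one-second pause and the use of volume="soft" as the "whisper" are illustrative assumptions, since how they sound depends on the text-to-speech engine.

```python
from xml.sax.saxutils import escape


def whisper_prompt(question: str, whisper: str, pause_ms: int = 1000) -> str:
    """Build an SSML prompt: the question, a silent pause, then a softer hint.

    Hypothetical helper. The default pause length and the choice of
    volume="soft" for the whisper are assumptions to tune per TTS engine.
    """
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xml:lang="en-US">'
        f"{escape(question)}"                 # the direct question
        f'<break time="{pause_ms}ms"/>'       # silent pause invites an answer
        f'<prosody volume="soft">{escape(whisper)}</prosody>'  # the whisper
        "</speak>"
    )


print(whisper_prompt("What date?", "For example, January 13, 2004."))
```

To whisper with a different voice rather than a lower volume, the soft prosody element could be replaced with SSML's voice element.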

Taper prompts
If the user fails to respond appropriately to a prompt, rephrase the prompt with additional instructions. For example, “Destination?”, then “Where do you want to travel to?”, then “You want to travel to which city?”

Each prompt is more detailed and phrased differently. Rephrasing a question gives the user another chance to understand it and respond appropriately.
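Tapered prompting amounts to walking through a list of increasingly explicit prompts until one gets an answer. The Python sketch below assumes a hypothetical listen callback standing in for the speech platform: it plays a prompt and returns the recognized answer, or None on silence or a misrecognition. The simulated caller is made up for illustration.

```python
from typing import Callable, Optional, Sequence

# Tapered prompts from the example above, least to most explicit.
DESTINATION_PROMPTS = [
    "Destination?",
    "Where do you want to travel to?",
    "You want to travel to which city?",
]


def ask_with_tapering(
    prompts: Sequence[str],
    listen: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Play each prompt in turn until listen() returns an answer.

    listen is a hypothetical play-and-recognize step returning None when
    the caller is silent or not understood. Returns None if all prompts fail.
    """
    for prompt in prompts:
        answer = listen(prompt)
        if answer is not None:
            return answer
    return None


# Simulated caller who answers only the third, most explicit prompt.
replies = iter([None, None, "Portland"])
heard = []


def fake_listen(prompt: str) -> Optional[str]:
    heard.append(prompt)
    return next(replies)


result = ask_with_tapering(DESTINATION_PROMPTS, fake_listen)
print(result)  # Portland
```

In VoiceXML itself, tapering is usually expressed with the count attribute on prompt elements, so the platform selects a more detailed prompt on each retry.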

Skip a question and come back to it later
Do not let users get angry or discouraged. If the user is having trouble with a question, skip it, collect as much of the remaining data as you can, and then revisit the question. By then, the user may be in a different frame of mind and able to answer it. If not, transfer the call to a human operator, who needs to solicit answers only to the unanswered questions.
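Skipping and revisiting can be sketched as two passes over the questions. This is a minimal Python sketch under stated assumptions, not a prescribed implementation: collect_answers and the listen callback are hypothetical names, each question gets one attempt per pass, and whatever is still unanswered is returned for the human operator.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def collect_answers(
    questions: Sequence[str],
    listen: Callable[[str], Optional[str]],
) -> Tuple[Dict[str, str], List[str]]:
    """Ask every question once, then revisit only the ones that failed.

    listen is a hypothetical play-and-recognize step returning None when the
    caller has trouble. Returns the answers collected and the questions still
    unanswered after the second pass (to hand to an operator).
    """
    answers: Dict[str, str] = {}
    skipped: List[str] = []
    # First pass: do not dwell on a question the caller struggles with.
    for q in questions:
        a = listen(q)
        if a is None:
            skipped.append(q)
        else:
            answers[q] = a
    # Second pass: the caller may now be able to answer.
    still_open: List[str] = []
    for q in skipped:
        a = listen(q)
        if a is None:
            still_open.append(q)
        else:
            answers[q] = a
    return answers, still_open


# Simulated caller: replies are consumed in order, one per time a question is asked.
script = {
    "Name?": ["Pat Smith"],
    "Account number?": [None, "12345"],
    "Date of purchase?": [None, None],
}


def fake_listen(q: str) -> Optional[str]:
    return script[q].pop(0)


answers, for_operator = collect_answers(list(script), fake_listen)
print(answers)       # {'Name?': 'Pat Smith', 'Account number?': '12345'}
print(for_operator)  # ['Date of purchase?']
```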

Recognize rather than generate
Some users find it easier to recognize and select the answer rather than “think up” the answer. This is one of the advantages of verbal menus. In general, menus should be short, with three or four options. If this is not possible, enable the user to “barge-in” when the user hears the desired option. For example, “Say the name of the state when you hear it: Alabama, California, Colorado,…” This is, in effect, a whisper prompt with a long whisper.

Spell and speak
It may not be possible to create a grammar that recognizes all of the possible answers to a question. In that case, let users spell the response. One technique is to have users press the keys on a touchtone phone. However, users find it difficult to hunt for letters on the keypad and may become confused when the word they spell contains a Q or special marks, such as the German umlaut, which do not appear on the telephone keypad. Another approach is to have users speak the English names of the letters. Unfortunately, many letters sound similar, such as B, C, D, E, G, P and T, or M and N. Users could speak letters of the military alphabet (alpha, bravo, charlie, …), but many users cannot remember its exact words. My favorite approach is to spell a word using the names of states or major U.S. cities, such as “J as in Jamestown, I as in Iowa, M as in Minneapolis.”
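Once the recognizer has returned a spelling such as “J as in Jamestown, I as in Iowa, M as in Minneapolis,” turning it into letters is straightforward. The Python sketch below assumes the recognizer hands back the items as comma-separated text (a real grammar would constrain the keywords to a known list of states and cities), and decode_spelling is a hypothetical name. It takes each letter from the keyword's initial rather than the spoken letter name, because the keywords are far harder to confuse than letter names such as B, D and P.

```python
import re
from typing import List

# One spelled item, e.g. "J as in Jamestown"; group 1 captures the keyword.
_ITEM = re.compile(r"^\s*[a-z]\s+as\s+in\s+([a-z][a-z .'-]*?)\s*$", re.IGNORECASE)


def decode_spelling(utterance: str) -> str:
    """Turn "J as in Jamestown, I as in Iowa, ..." into "JI...".

    Assumes comma-separated items. The keyword's first letter is used and
    the spoken letter name is ignored, since letter names are easily confused.
    """
    letters: List[str] = []
    for item in utterance.split(","):
        m = _ITEM.match(item)
        if m is None:
            raise ValueError(f"cannot parse spelled item: {item!r}")
        letters.append(m.group(1)[0].upper())
    return "".join(letters)


print(decode_spelling("J as in Jamestown, I as in Iowa, M as in Minneapolis"))  # JIM
```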

Large grammars can recognize several words in a single response, for example, the city and state in an address or the month, day and year in a date. However, if the user fails to provide all of the necessary words, then prompt for the individual words by using simpler grammars with more specific prompts.

System: What city and state?
User: (no response)
System: What city?
User: Portland
System: What state?
User: Oregon
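The fallback in this dialog can be sketched as slot filling: ask one broad question with a grammar covering every field, then prompt only for the fields still missing. The Python sketch below is loosely modeled on VoiceXML's form filling but is not its API; fill_slots and the listen callback (which plays a prompt, listens with a grammar for the named fields, and returns whichever fields it recognized) are hypothetical. It makes a single attempt per field, so the result may still be incomplete.

```python
from typing import Callable, Dict, Sequence

Slots = Dict[str, str]


def fill_slots(
    combined_prompt: str,
    field_prompts: Dict[str, str],
    listen: Callable[[str, Sequence[str]], Slots],
) -> Slots:
    """Ask one broad question, then prompt only for fields still missing.

    listen(prompt, fields) is a hypothetical play-and-recognize step that
    returns the subset of fields it recognized (empty dict on silence).
    """
    filled: Slots = dict(listen(combined_prompt, list(field_prompts)))
    for field, prompt in field_prompts.items():
        if field not in filled:
            # Fall back to a simpler single-field grammar and a specific prompt.
            filled.update(listen(prompt, [field]))
    return filled


# Simulated caller reproducing the dialog above.
replies = iter([{}, {"city": "Portland"}, {"state": "Oregon"}])


def fake_listen(prompt: str, fields: Sequence[str]) -> Slots:
    print("System:", prompt)
    return next(replies)


result = fill_slots(
    "What city and state?",
    {"city": "What city?", "state": "What state?"},
    fake_listen,
)
print(result)  # {'city': 'Portland', 'state': 'Oregon'}
```

If the caller answers the combined question fully, neither follow-up prompt is played.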

Sometimes, it is sufficient to simplify the prompt by asking an easier question and resolving any ambiguity when and if it occurs. For example, there is no need to ask for the state if the city is New York City. However, if the city is Portland, a second prompt is needed to determine whether the user wants Portland, Oregon, or Portland, Maine.
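Resolving ambiguity only when it occurs can be sketched as a lookup: if the city maps to one state, no follow-up is needed; if it maps to several, ask which one. In the Python sketch below, the two-entry city table is illustrative data only, and resolve_city and the ask callback are hypothetical names.

```python
from typing import Callable, Dict, List, Optional

# Illustrative data only; a real application would use a full city/state list.
STATES_BY_CITY: Dict[str, List[str]] = {
    "New York City": ["New York"],
    "Portland": ["Oregon", "Maine"],
}


def resolve_city(
    city: str,
    ask: Callable[[str, List[str]], Optional[str]],
) -> Optional[str]:
    """Return "City, State", asking a follow-up question only if needed.

    ask(prompt, choices) is a hypothetical play-and-recognize step returning
    the chosen state or None. Unknown cities return None.
    """
    states = STATES_BY_CITY.get(city, [])
    if len(states) == 1:
        # Unambiguous: skip the state question entirely.
        return f"{city}, {states[0]}"
    if not states:
        return None
    # Ambiguous: e.g. "Portland, Oregon, or Portland, Maine?"
    prompt = ", or ".join(f"{city}, {s}" for s in states) + "?"
    choice = ask(prompt, states)
    return f"{city}, {choice}" if choice in states else None


asked = []


def fake_ask(prompt: str, choices: List[str]) -> Optional[str]:
    asked.append(prompt)
    return "Maine"


print(resolve_city("New York City", fake_ask))  # New York City, New York
print(resolve_city("Portland", fake_ask))       # Portland, Maine
print(asked)  # ['Portland, Oregon, or Portland, Maine?']
```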

Test, test and test
Always test each dialog with several potential users and note which prompts are troublesome. Continue to revise and test, making sure that no new problems are introduced, until most users are able to answer all questions quickly and easily.
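Noting which prompts are troublesome can be as simple as tallying, for each prompt, how often test users failed to answer it. The Python sketch below assumes test sessions are logged as (prompt, answered_ok) pairs; troublesome_prompts is a hypothetical name, the 20 percent threshold is an arbitrary illustrative default, and the log is made-up sample data.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def troublesome_prompts(
    attempts: Iterable[Tuple[str, bool]],
    max_failure_rate: float = 0.2,
) -> List[Tuple[str, float]]:
    """Return (prompt, failure rate) for prompts users fail too often.

    attempts holds one (prompt, answered_ok) pair per time a test user heard
    a prompt. The default threshold is an illustrative assumption.
    """
    tries = Counter()
    failures = Counter()
    for prompt, ok in attempts:
        tries[prompt] += 1
        if not ok:
            failures[prompt] += 1
    flagged = [(p, failures[p] / tries[p]) for p in tries
               if failures[p] / tries[p] > max_failure_rate]
    # Worst prompts first, so they are revised first.
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Made-up test log for illustration.
log = [
    ("What date?", True), ("What date?", True),
    ("Destination?", False), ("Destination?", True),
    ("What city and state?", False),
]
print(troublesome_prompts(log))  # [('What city and state?', 1.0), ('Destination?', 0.5)]
```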

These guidelines are useful for all speech applications. If you are a designer or developer, refer to these guidelines when developing voice user interfaces. If you are a manager, verify that your speech applications follow these guidelines. Your applications will work better if you help users speak the appropriate answer to each prompt.

Dr. James A. Larson is Manager of Advanced Human Input/Output at Intel Corporation, and author of the book, VoiceXML — Introduction to Developing Speech Applications. He can be reached at jim@larson-tech.com and his Web site is www.larson-tech.com.