September/October 2002

Technology Trends: The What, Why and How of Usability Testing

By Dr. James A. Larson

As the cartoon illustrates, users become frustrated when speech applications don’t work. Testing minimizes this frustration by detecting and resolving many speech application problems before they reach users.

What do usability tests measure?

Developers use two types of metrics (measurements) in usability testing: performance and preference.

Why is usability testing important?
Performance and preference metrics are important during and after the development of speech applications for the following reasons:

How do developers conduct usability tests?

Developers insert logging commands throughout the application to capture the times and names of interesting events, such as when the application presents a prompt to the user and when the user responds to the prompt successfully, fails to respond, or responds inappropriately. The VoiceXML browser records these events in a log file. A report generator summarizes the log and calculates the indicators for each performance metric. The developer can then quickly determine which performance metrics are satisfied and which portions of the application need refinement. Most VoiceXML system development environments support this logging function and provide one or more report generators.
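To make the log-and-report workflow concrete, here is a minimal sketch of a report generator in Python. It is only an illustration: the log format, the event names (`prompt_played`, `response_ok`, `no_response`, `response_invalid`) and the prompt names are assumptions invented for this example, not the output of any particular VoiceXML browser.

```python
from collections import Counter

# Hypothetical log format (an assumption, not a VoiceXML standard):
#   <seconds-since-call-start> <event-name> <prompt-id>
SAMPLE_LOG = """\
0.0 prompt_played main_menu
2.4 response_ok main_menu
2.9 prompt_played account_number
9.1 no_response account_number
9.5 prompt_played account_number
13.0 response_invalid account_number
13.4 prompt_played account_number
17.2 response_ok account_number
"""

# Hypothetical names for the three outcomes a prompt can have.
OUTCOMES = ("response_ok", "no_response", "response_invalid")


def summarize(log_text):
    """Return attempts, success rate, and mean response time per prompt."""
    counts = {}       # prompt id -> Counter of outcome events
    durations = {}    # prompt id -> seconds from prompt to each outcome
    last_played = {}  # prompt id -> time the prompt was most recently played

    for line in log_text.splitlines():
        if not line.strip():
            continue
        time_text, event, prompt = line.split()
        time = float(time_text)
        if event == "prompt_played":
            last_played[prompt] = time
        elif event in OUTCOMES and prompt in last_played:
            counts.setdefault(prompt, Counter())[event] += 1
            elapsed = time - last_played.pop(prompt)
            durations.setdefault(prompt, []).append(elapsed)

    report = {}
    for prompt, outcome_counts in counts.items():
        attempts = sum(outcome_counts.values())
        report[prompt] = {
            "attempts": attempts,
            "success_rate": outcome_counts["response_ok"] / attempts,
            "mean_seconds": sum(durations[prompt]) / attempts,
        }
    return report


if __name__ == "__main__":
    for prompt, stats in summarize(SAMPLE_LOG).items():
        print(prompt, stats)
```

In this sketch a prompt with a low success rate or a long mean response time would be a candidate for rewording. A real report generator would read whatever format the development environment actually writes.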

Preference testing is accomplished by collecting preference scores from users after they test the system. Developers can collect this data by interviewing users immediately after they test the application and asking them to score the various preference criteria. This can be labor intensive. Alternatively, users can be asked to enter preference scores on a paper questionnaire, a Web page, or a verbal VoiceXML form. VocaLabs provides a service that conducts usability tests and collects preference data via visual Web pages.
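Once the scores are collected, summarizing them is straightforward. The following Python sketch averages each criterion across users. The criteria names and the 1-to-5 scale are assumptions made for illustration; the article does not prescribe either.

```python
from statistics import mean

# Hypothetical questionnaire results: one dict per test user, scoring each
# criterion from 1 (poor) to 5 (excellent). Criteria names are illustrative.
RESPONSES = [
    {"ease_of_use": 4, "voice_quality": 5, "would_use_again": 3},
    {"ease_of_use": 2, "voice_quality": 4, "would_use_again": 2},
    {"ease_of_use": 3, "voice_quality": 5, "would_use_again": 4},
]


def average_scores(responses):
    """Return the mean score for each criterion across all users."""
    criteria = sorted({name for response in responses for name in response})
    return {
        # Skip users who left a criterion blank rather than counting it as 0.
        name: mean(r[name] for r in responses if name in r)
        for name in criteria
    }


if __name__ == "__main__":
    for criterion, score in average_scores(RESPONSES).items():
        print(f"{criterion}: {score:.2f}")
```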

How much usability testing is enough? Jakob Nielsen suggests that elaborate usability tests are a waste of resources. The best results come from testing no more than 5 users and running as many small tests as you can afford. The first test identifies several problems. As Nielsen says, “The difference between zero and even a little bit of data is astounding.” The second and third tests reveal additional usability problems. As more and more users are tested, you learn less and less because you keep seeing the same problems again and again.
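The diminishing returns can be illustrated with the model Nielsen and Tom Landauer published, in which the share of usability problems found by n test users is 1 - (1 - L)^n, where L is the share a single user uncovers. The sketch below uses L = 0.31, the average Nielsen reports across the projects he studied. Treat that value as an assumption: it was not measured for speech applications and will vary from project to project.

```python
def share_of_problems_found(users, discovery_rate=0.31):
    """Share of usability problems found after testing `users` people.

    Implements 1 - (1 - L)^n. The default L = 0.31 is Nielsen's reported
    average and is an assumption here, not a measured value.
    """
    return 1 - (1 - discovery_rate) ** users


if __name__ == "__main__":
    previous = 0.0
    for n in range(1, 9):
        found = share_of_problems_found(n)
        # "new" is how much the n-th user adds beyond the first n - 1 users.
        print(f"{n} users: {found:.0%} found, {found - previous:.0%} new")
        previous = found
```

Under these assumptions each additional user contributes less than the one before, which is the arithmetic behind running several small tests rather than one large one.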

I don’t want to deal with frustrated users like General Knox in the cartoon. A little usability testing goes a long way toward keeping users happy.

Dr. Jim Larson works for Intel and chairs the W3C Voice Browser Working Group. His new book, VoiceXML: Introduction to Building Speech Applications, has just been published by Prentice Hall.