Human-Computer Interaction 2000

August, 2000


Designing the User Interface for Multimodal Speech and Pen-based Gesture Applications: State-of-the-Art Systems and Future Research Directions
Sharon Oviatt, Oregon Graduate Institute of Science and Technology, Phil Cohen, Oregon Graduate Institute, Lizhong Wu, HNC Software, John Vergo, IBM T. J. Watson Research, Lisbeth Duncan, The Boeing Company, Bernhard Suhm, BBN Technologies, Josh Bers, BBN Technologies, Thomas Holzman, NCR, Terry Winograd, Stanford University, James Landay, University of California at Berkeley, Jim Larson, Intel, & David Ferro, Unisys

ABSTRACT

The growing interest in multimodal interface design is inspired in large part by the goals of supporting more transparent, flexible, efficient, and powerfully expressive means of human-computer interaction than in the past. Multimodal interfaces are expected to support a wider range of diverse applications, to be usable by a broader spectrum of the average population, and to function more reliably under realistic and challenging usage conditions. In this paper, we summarize the emerging architectural approaches for interpreting speech and pen-based gestural input in a robust manner, including early and late fusion approaches and the new hybrid symbolic/statistical approach. We also describe four diverse state-of-the-art multimodal systems that process users' spoken and gestural input. These applications range from map-based and virtual reality systems for engaging in simulations and training, to field medic systems for mobile use in noisy environments, to standard text-editing applications that will reshape daily computing and have a significant commercial impact. To realize successful multimodal systems in the future, the key research challenges that remain to be addressed include the development of new multimodal interface concepts, more robust architectures and better error-handling techniques, and intelligently adaptive multimodal architectures. Before this new class of systems can proliferate, multimodal research infrastructure and software tools will also be needed to promote the development of both simulated and functioning systems.