Multimodal Interaction with W3C Standards: Towards Natural User Interfaces to Everything

Editor: Deborah A. Dahl, Conversational Technologies
published by Springer


Contact the editor

About this book

From tiny fitness trackers to huge industrial robots, we now interact with devices whose shapes, sizes, and capabilities would have been hard to imagine when the traditional graphical user interface (GUI) first became popular in the 1980s. It is increasingly apparent that the decades-old GUI is a poor fit for today's human-computer interaction, as we move farther and farther away from the classic desktop paradigm, with input limited to mouse and keyboard and a large screen as the only output modality. While the growth of touch interfaces has been especially dramatic, we are now also starting to see applications that make use of many other forms of interaction, including voice, handwriting, emotion recognition, natural language understanding, and object recognition.

As these forms of interaction (modalities) are combined into systems, the importance of having standard ways for them to communicate with each other and with application logic becomes clear. The sheer variety and complexity of multimodal technologies make it impractical for most implementers to handle the full range of possible modalities, current and future, with proprietary APIs.

To address this need, the World Wide Web Consortium (W3C) has developed a comprehensive set of standards for multimodal interaction that are well suited as the basis of interoperable multimodal applications. However, most of the information about these standards is currently available only in the formal standards documents, conference presentations, and a few academic journal papers. These sources can be hard to find and are not very accessible to most technologists. In addition, papers on applications that use the standards are similarly scattered among many different resources.

This book fills this gap with clearly presented overviews of the full suite of W3C multimodal standards. To illustrate the standards in use, it also includes case studies of a number of applications built on them. Finally, a future directions section discusses ideas for new standards as well as new applications.

To buy

Springer website


Topics covered

Overviews of the following standards:

  1. Multimodal Architecture and Interfaces – building applications from multiple modalities
  2. Discovery and Registration – finding and integrating components into dynamic systems
  3. EMMA: Extensible Multimodal Annotation – representing user inputs from speech recognition, natural language understanding, handwriting recognition, gesture, and camera input
  4. EmotionML: Emotion Markup Language – representing human emotions
  5. Creating an MMI Architecture-compliant modality component: best practices for Modality Component design
  6. Voice Standards: handling speech with VoiceXML, SSML, SRGS, SISR, and PLS
(see the W3C website for the standards documents for items 1-6)
  7. SCXML: State Chart XML – declarative handling of events with a state machine
  8. WebRTC: Web Real-Time Communications – handling media on the web
(see the W3C website for the standards documents for items 7-8)
(see the W3C website for the standards documents for item 9)


Applications using the W3C standards for multimodal interaction:

  1. Multimodal applications that make use of the standards listed above.
  2. Implementations of the standards, including but not limited to open source implementations.
  3. Evaluations of systems using the standards, including interoperability testing.

Future directions:

  1. The evolution of multimodal standards
  2. Where new standards are needed
  3. Integration with related standards

Publication Date: November 2016
