Conversational Technologies

Useful Speech, Multimodal, and Natural Language Links

Last updated: August 2, 2011

Suggestions for additional resources

The EMMA Standard

EMMA (Extensible MultiModal Annotation) is a standard published by the W3C for representing multimodal user inputs, including speech, text, and handwriting.

NL Workbench: simple tools for exploring statistical language models and tagged grammars, including an open source implementation of EMMA developed by Conversational Technologies.
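To give a feel for the format, here is a minimal Python sketch that wraps one speech recognition result in an EMMA 1.0 document and reads it back, using only the standard library's xml.etree.ElementTree. The namespace URI and the emma:medium, emma:mode, emma:confidence and emma:tokens annotations follow the EMMA 1.0 Recommendation as we understand it. The origin/destination elements, the id value and the helper names build_emma and read_emma are hypothetical, application-specific choices; they are not part of the standard or of NL Workbench.

```python
import xml.etree.ElementTree as ET

# EMMA 1.0 namespace; register it so output uses the familiar "emma:" prefix.
EMMA_NS = "http://www.w3.org/2003/04/emma"
ET.register_namespace("emma", EMMA_NS)


def q(name):
    """Qualify a local name with the EMMA namespace (Clark notation)."""
    return f"{{{EMMA_NS}}}{name}"


def build_emma(tokens, confidence, slots):
    """Hypothetical helper: wrap one spoken result in a minimal EMMA document."""
    root = ET.Element(q("emma"), {"version": "1.0"})
    interp = ET.SubElement(root, q("interpretation"), {
        "id": "int1",
        q("medium"): "acoustic",  # the input arrived as sound...
        q("mode"): "voice",       # ...produced by the user's voice
        q("confidence"): str(confidence),
        q("tokens"): tokens,      # the words the recognizer heard
    })
    # Application semantics live inside the interpretation, in the
    # application's own (here un-namespaced, hypothetical) elements.
    for name, value in slots.items():
        ET.SubElement(interp, name).text = value
    return root


def read_emma(root):
    """Hypothetical helper: pull the annotations and semantics back out."""
    interp = root.find(q("interpretation"))
    return {
        "tokens": interp.get(q("tokens")),
        "confidence": float(interp.get(q("confidence"))),
        "slots": {child.tag: child.text for child in interp},
    }


doc = build_emma("flights from boston to denver", 0.75,
                 {"origin": "Boston", "destination": "Denver"})
print(ET.tostring(doc, encoding="unicode"))
print(read_emma(doc))
```

The same container could carry a handwriting or typed-text result simply by changing the medium and mode annotations, which is the point of having one annotation format for all input modes.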

Speech Technology Resources

Open Directory for Speech Technology: a directory of websites related to speech technology.

Standards and Information about Standards

World Wide Web Consortium: http://www.w3.org

  • Voice Extensible Markup Language (VoiceXML 2.0): http://www.w3.org/TR/voicexml20/
  • Voice Extensible Markup Language (VoiceXML 2.1): http://www.w3.org/TR/voicexml21/
  • Voice Extensible Markup Language (VoiceXML 3.0): http://www.w3.org/TR/voicexml30/
  • W3C Speech Synthesis Markup Language (SSML): http://www.w3.org/TR/speech-synthesis/
  • W3C Speech Recognition Grammar Specification (SRGS): http://www.w3.org/TR/speech-grammar/
  • Multimodal Interaction Working Group home page: http://www.w3.org/2002/mmi/
  • W3C Multimodal Architecture: http://www.w3.org/TR/mmi-arch/

VoiceXML tutorials and training

  • Building VoiceXML Dialogs: http://www.developer.com/voice/article.php/3394911
  • Exploring the Distributed Web-Based Application Model and Advanced Features of VoiceXML: http://www.developer.com/voice/article.php/3405191
  • Natural vs. Direct Dialog and How VoiceXML Enables Both: http://www.developer.com/voice/article.php/3413361
  • Training is available from the organizations listed on the VoiceXML Forum training page.

Quick Guide to the XML SRGS Grammar Format (download, 48k)
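As a rough illustration of how an SRGS XML grammar constrains what a recognizer will accept, the Python sketch below parses a tiny grammar and checks whether an utterance matches its root rule. It handles only a small assumed subset of SRGS (literal words, item and one-of, with no rule references, repeats, weights or semantic tags). The grammar content and the function names expand and accepts are hypothetical, and a real recognizer does not enumerate phrases this way.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "{http://www.w3.org/2001/06/grammar}"

# Hypothetical example grammar: "i would like" followed by one drink.
GRAMMAR = """
<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         xml:lang="en-US" mode="voice" root="order">
  <rule id="order" scope="public">
    i would like
    <one-of>
      <item>coffee</item>
      <item>tea</item>
      <item>orange juice</item>
    </one-of>
  </rule>
</grammar>
""".strip()


def expand(elem):
    """Return the set of word tuples `elem` can produce (tiny SRGS subset)."""
    local = elem.tag.replace(SRGS_NS, "")
    if local == "one-of":
        # Alternatives: the union of what each <item> can produce.
        phrases = set()
        for item in elem:
            phrases |= expand(item)
        return phrases
    # <rule> or <item>: a sequence of literal words and child expansions.
    phrases = {tuple((elem.text or "").split())}
    for child in elem:
        options = expand(child)
        phrases = {p + c for p in phrases for c in options}
        tail = tuple((child.tail or "").split())  # words after the child
        phrases = {p + tail for p in phrases}
    return phrases


def accepts(grammar_xml, utterance):
    """True if the utterance matches the grammar's root rule exactly."""
    root = ET.fromstring(grammar_xml)
    root_id = root.get("root")
    for rule in root.iter(SRGS_NS + "rule"):
        if rule.get("id") == root_id:
            return tuple(utterance.lower().split()) in expand(rule)
    raise ValueError(f"root rule {root_id!r} not found")


print(accepts(GRAMMAR, "I would like orange juice"))
print(accepts(GRAMMAR, "I would like milk"))
```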

IETF SpeechSC: The SpeechSC Working Group is developing protocols, such as the Media Resource Control Protocol (MRCP), to support distributed media processing of audio streams: http://www.ietf.org/html.charters/speechsc-charter.html

Books

Abbott, K. R. (2001). Voice Enabling Web Applications: VoiceXML and Beyond, APress.

Andersson, E. A., S. Breitenbach, et al. (2001). Early Adopter VoiceXML. Birmingham, UK, Wrox Press.

Balentine, B. and D. Morgan (1999). How to build a speech recognition application. San Ramon, California, Enterprise Integration Group.

Balentine, B. (2007) It's better to be a good machine than a bad person. Annapolis, MD, ICMI Press.

Beasley, R., K. M. Farley, et al. (2002). Voice Application Development with VoiceXML, Sams.

Dahl, D., Ed. (2005). Practical Spoken Dialog Systems. Springer-Verlag.

Gardner-Bonneau, D. (1999). Human Factors and Voice Interactive Systems. Boston, Kluwer Academic Publishers.

Harris, R. A. (2005). Voice Interaction Design. Morgan Kaufmann.

Hocek, A. and D. Cuddihy (2002). Definitive VoiceXML. Prentice-Hall.

Kotelly, B. (2003). The Art and Business of Speech Recognition. Addison-Wesley.

Larson, J. A. (2002). VoiceXML: Introduction to developing speech applications. Upper Saddle River, New Jersey, Prentice Hall.

Meisel, W. (2006). VUI Visions: Expert views on effective voice user interface design. Trafford Publishing.

Miller, M. VoiceXML: 10 projects to voice-enable your web site, John Wiley and Sons.

Reeves, B. and C. Nass (1996). The Media Equation. Cambridge University Press.

Sharma, C. and J. Kunin (2002). VoiceXML. New York, John Wiley and Sons, Inc.

Shukla, C., A. Dass, et al. (2002). VoiceXML 2.0 Developer's Guide: Building Professional Voice-enabled Applications with JSP, ASP & ColdFusion. New York, McGraw-Hill Osborne Media.

Applications for People with Disabilities

  • Autism Language Therapies: Software for teaching language to autistic children, http://www.autism-language-therapies.com
  • Sentence Shaper from Psycholinguistic Technologies, Inc.: Software that allows people with aphasia to construct spoken messages.
  • Emacspeak: Emacspeak provides complete eyes-free access to daily computing tasks, http://emacspeak.sourceforge.net/

Industry Conferences

  • Voice Search: http://www.voicesearch.com/
  • SpeechTek: http://www.speechtek.com/

Industry Organizations

  • American Voice Input Output Society: http://www.avios.com/
  • Midwest Speech Technology Association: A professional association for members of the speech technology community, http://www.midwestspeech.org
  • VoiceXML Forum: http://www.voicexmlforum.org

Speech Recognition Technology

  • ViaVoice
  • Nuance Communications
  • Loquendo
  • Telisma
  • LumenVox
  • Novauris
  • Microsoft

Text to Speech Technology

  • AT&T Natural Voices
  • Nuance Communications
  • Loquendo
  • Cepstral
  • IBM

Speech Analytics

Speech analytics refers to software that analyzes speech to extract useful information from it. Examples include keyword spotting in broadcast speech and analysis of calls between customers and agents in a call center. Some companies that work in this area include:

  • Nexidia
  • BBN
  • CallMiner
  • SER
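Keyword spotting of the kind mentioned above can be illustrated, in a deliberately simplified form, as a search over a time-stamped recognizer transcript. The Python sketch below assumes a hypothetical word list with start times and confidence scores, and flags listed keywords whose confidence clears a threshold. It matches single words only, and the threshold of 0.5 is an arbitrary assumption; commercial systems often search phonetic or lattice representations of the audio rather than a single best transcript.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One recognized word (hypothetical recognizer output format)."""
    text: str
    start: float       # seconds from the start of the recording
    confidence: float  # recognizer score between 0 and 1


def spot_keywords(words, keywords, min_confidence=0.5):
    """Return (keyword, start_time) for each confident single-word hit."""
    wanted = {k.lower() for k in keywords}
    return [(w.text.lower(), w.start) for w in words
            if w.text.lower() in wanted and w.confidence >= min_confidence]


# Hypothetical call-center transcript; "account" is recognized with low
# confidence, so it is not reported even though it is a keyword.
transcript = [
    Word("i", 0.0, 0.9), Word("want", 0.2, 0.9), Word("to", 0.4, 0.8),
    Word("cancel", 0.5, 0.85), Word("my", 0.9, 0.9),
    Word("account", 1.0, 0.4), Word("refund", 1.6, 0.7),
]
print(spot_keywords(transcript, {"cancel", "refund", "account"}))
```

Keeping the start times lets an analyst jump straight to the relevant point in the recording, which is much of the practical value of spotting over simply counting words.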

Speaker Verification

A biometric technology for verifying that someone is who they claim to be based on characteristics of their voice.

  • Nuance Communications
  • Persay
  • Loquendo
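The final accept/reject step of speaker verification can be sketched as a comparison between a voiceprint stored at enrollment and one computed from the new sample. The Python example below assumes that both voiceprints are already fixed-length vectors (real systems derive them from audio with a trained model, which is not shown) and uses cosine similarity against a threshold. The vectors, the function names and the 0.8 threshold are hypothetical; in practice the threshold is tuned to balance false accepts against false rejects.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("voiceprints must be non-zero vectors")
    return dot / norm


def verify(enrolled, test, threshold=0.8):
    """Accept the identity claim if the test voiceprint is close enough."""
    return cosine_similarity(enrolled, test) >= threshold


# Hypothetical 3-dimensional voiceprints (real ones are much longer).
enrolled = [0.9, 0.1, 0.3]
same_speaker = [0.85, 0.15, 0.28]
impostor = [0.1, 0.9, 0.2]
print(verify(enrolled, same_speaker))  # claimed identity vs. genuine sample
print(verify(enrolled, impostor))      # claimed identity vs. impostor sample
```

Raising the threshold makes the system harder to fool but more likely to reject the genuine speaker, so the setting depends on what a wrong decision costs in the application.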

VoiceXML Information

  • See the list of platforms on the W3C Voice Browser Group Home Page: http://www.w3.org/Voice/
  • VoiceXML World: news and information about VoiceXML
  • VoiceXML Forum: an industry organization which promotes VoiceXML

University Research Centers

  • Carnegie Mellon University
  • Carnegie Mellon's comprehensive list of speech technology resources.
  • MIT Media Lab
  • MIT Spoken Language Systems Group
  • Rutgers University Speech Group
  • Oregon Graduate Institute: Center for Spoken Language Understanding
  • University of Colorado: CLEAR (Computational Language and Education Research)
  • University of Edinburgh Centre for Speech Technology Research