PCD 2/10/93 Lee

Towards Conversational Computers

Kai-Fu Lee, Speech and Language Technologies, Advanced Technology Group, Apple Computer, Inc.
kai-fu_lee@gateway.qm.apple.com

Seminar on People, Computers, and Design
Stanford University, February 10, 1993

One of the grand challenges of Computer Science is to build a "conversational computer": a computer that communicates with its user through a spoken interface. Apple presented its vision of a conversational computer a few years ago in a video called the "Knowledge Navigator."

While delivering the Knowledge Navigator remains elusive, researchers have made tremendous progress in speech and language technologies in the last decade. Speech recognition systems boast a 10-fold error rate reduction. Speech synthesis systems are approaching human-level intelligibility, in spite of their lack of naturalness. The capabilities and limitations of Natural Language Processing and Artificial Intelligence are much better understood. At Apple, we have recently integrated these technologies, and created a spoken language interface on the Macintosh called "Casper."

I will show both the Knowledge Navigator vision video and the Casper demonstration video. Then I will describe the challenges and breakthroughs needed to bridge the gap between Casper and the Knowledge Navigator. Finally, I will argue that spoken interfaces will come into pervasive use well before the Knowledge Navigator is perfected, and give examples of where speech technologies will be used in the next decade.

Kai-Fu Lee received his B.A. in computer science (summa cum laude) from Columbia University in 1983 and his Ph.D. in computer science from Carnegie Mellon University in 1988. Dr. Lee is known for his dissertation work on Sphinx, the most accurate speaker-independent, continuous speech recognition system in the world. He joined Apple Computer in 1990 as Principal Speech Scientist and currently manages Apple's Speech & Language Technologies Group. This group has developed PlainTalk, a recognizer that delivers Sphinx-level performance on personal computers; Casper, a Macintosh speech interface; and GalaTea, a general-purpose, high-quality text-to-speech system. In addition to these areas, Dr. Lee is also forming a group in natural language processing.
