Interactively Skimming Recorded Speech

Barry Arons, MIT Media Lab
barons@media.mit.edu

Seminar on People, Computers, and Design
Stanford University, April 7, 1995


Listening to a speech recording is much more difficult than visually scanning a document because of the transient and temporal nature of audio. Audio recordings capture the richness of speech, yet it is difficult to directly browse the stored information. This work investigates techniques for structuring, filtering, and presenting recorded speech, allowing a user to navigate and interactively find information in the audio domain. This research, realized in the SpeechSkimmer system, makes listening to recorded speech easier and more efficient.

This talk will briefly review Hyperspeech, a speech-only hypermedia system that explores issues of speech user interfaces, browsing, and the use of speech as data in an environment without a visual display. The system uses speech recognition input and synthetic speech feedback to aid in navigating through a database of digitally recorded speech. This system illustrates that managing and moving in time are crucial in speech interfaces. Hyperspeech uses manually segmented and structured speech recordings, a technique that is practical only in limited domains.
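
The Hyperspeech implementation itself is not detailed in this abstract. Purely as an illustration of the kind of structure involved, the sketch below models a database of manually segmented recordings as a graph of nodes whose links are followed by spoken commands. The node fields, the command vocabulary, and the recognize/play stubs are hypothetical stand-ins, not the actual system.

```python
# Minimal sketch of a speech-only hypermedia graph in the spirit of
# Hyperspeech. All names and the recognize/play stubs are illustrative
# assumptions, not the actual Hyperspeech implementation.
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class SpeechNode:
    node_id: str
    audio_file: str                        # a manually segmented recording
    links: Dict[str, str] = field(default_factory=dict)  # spoken command -> node_id


def navigate(nodes: Dict[str, SpeechNode],
             start_id: str,
             recognize: Callable[[], Optional[str]],
             play: Callable[[str], None]) -> None:
    """Play a node, then follow the link named by a recognized command.

    `recognize` stands in for a speech recognizer returning a command
    word (or None to stop); `play` stands in for audio output of the
    recorded segment or a synthetic speech prompt.
    """
    current = nodes.get(start_id)
    while current is not None:
        play(current.audio_file)
        command = recognize()
        if command is None or command not in current.links:
            break
        current = nodes.get(current.links[command])
```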

This talk will focus on SpeechSkimmer, a user interface for interactively skimming speech recordings. SpeechSkimmer uses simple speech processing techniques to overcome the limitations of Hyperspeech while allowing a user to hear recorded sounds quickly, and at several levels of detail. This work exploits properties of spontaneous speech to automatically select and present salient audio segments in a time-efficient manner. User interaction, through a manual input device, provides continuous real-time control of the speed and detail level of the audio presentation. SpeechSkimmer incorporates time-compressed speech, pause removal, automatic emphasis detection, and non-speech audio feedback to reduce the time needed to listen. This research presents a multi-level structural approach to auditory skimming, and user interface techniques for interacting with recorded speech.
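
To make one of these techniques concrete, the sketch below shows pause shortening based on short-time energy, one of the components the abstract lists alongside time compression and emphasis detection. The frame length, energy threshold, and maximum retained pause are illustrative assumptions, not SpeechSkimmer's actual parameters or detectors.

```python
# A minimal sketch of pause shortening. Assumes mono float samples in
# [-1, 1]; frame length, energy threshold, and maximum retained pause
# are illustrative values, not SpeechSkimmer's parameters.
import numpy as np


def shorten_pauses(samples: np.ndarray, rate: int,
                   frame_ms: float = 20.0,
                   energy_threshold: float = 1e-4,
                   max_pause_s: float = 0.2) -> np.ndarray:
    """Drop low-energy frames once a run of silence exceeds max_pause_s."""
    frame_len = int(rate * frame_ms / 1000)
    max_silent_frames = int(max_pause_s * 1000 / frame_ms)
    kept, silent_run = [], 0
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        if float(np.mean(frame ** 2)) < energy_threshold:
            silent_run += 1
            if silent_run > max_silent_frames:
                continue               # trim the remainder of a long pause
        else:
            silent_run = 0
        kept.append(frame)
    return np.concatenate(kept) if kept else samples[:0]
```

At coarser skimming levels, the same silence detection could plausibly be inverted, keeping only the speech that follows a long pause, though the segment selection and emphasis detection used by the system itself are described in the talk rather than here.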


Barry Arons developed "Phone Slave" and the "Conversational Desktop", explorations of highly interactive conversational answering machines and office environments, at the MIT Architecture Machine Group and Media Laboratory. He integrated speech and natural language processing technologies while a member of the technical staff at Hewlett-Packard Laboratories, and developed a workstation-based audio server and applications while a research scientist and project leader at Olivetti Research California.

