CS547 Human-Computer Interaction Seminar (Seminar on People, Computers, and Design)
Fridays 12:50-2:05 · Gates B01 · Open to the public
April 7, 1995
Listening to a speech recording is much more difficult than visually scanning a document because of the transient and temporal nature of audio. Audio recordings capture the richness of speech, yet it is difficult to browse the stored information directly. This work investigates techniques for structuring, filtering, and presenting recorded speech, allowing a user to navigate and interactively find information in the audio domain. The SpeechSkimmer system developed in this research makes listening to recorded speech easier and more efficient.

This talk will briefly review Hyperspeech, a speech-only hypermedia system that explores issues of speech user interfaces, browsing, and the use of speech as data in an environment without a visual display. The system uses speech recognition input and synthetic speech feedback to aid in navigating through a database of digitally recorded speech. It illustrates that managing and moving in time are crucial in speech interfaces. Hyperspeech uses manually segmented and structured speech recordings, a technique that is practical only in limited domains.

This talk will focus on SpeechSkimmer, a user interface for interactively skimming speech recordings. SpeechSkimmer uses simple speech processing techniques to overcome the limitations of Hyperspeech while allowing a user to hear recorded sounds quickly and at several levels of detail. This work exploits properties of spontaneous speech to automatically select and present salient audio segments in a time-efficient manner. User interaction, through a manual input device, provides continuous real-time control of the speed and detail level of the audio presentation. SpeechSkimmer incorporates time-compressed speech, pause removal, automatic emphasis detection, and non-speech audio feedback to reduce the time needed to listen. This research presents a multi-level structural approach to auditory skimming, and user interface techniques for interacting with recorded speech.
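As a rough illustration of two of the techniques named in the abstract, pause removal and time-compressed speech, the Python sketch below shortens long low-energy stretches and then keeps only alternating chunks of audio. It is not SpeechSkimmer's implementation, which the abstract does not describe in detail. The function names, frame length, energy threshold, and the simple chunk-sampling scheme are all assumptions chosen for illustration.

```python
import math


def remove_pauses(samples, frame_len=80, threshold=1e-4, max_pause_frames=2):
    """Shorten pauses: keep at most max_pause_frames of each low-energy run.

    All parameter values are illustrative assumptions, not SpeechSkimmer's.
    """
    out = []
    silent_run = 0
    n_full = len(samples) // frame_len
    for i in range(n_full):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        # Mean squared amplitude serves as a crude speech/pause detector.
        energy = sum(x * x for x in frame) / frame_len
        if energy < threshold:
            silent_run += 1
            if silent_run > max_pause_frames:
                continue  # drop the rest of a long pause
        else:
            silent_run = 0
        out.extend(frame)
    # Keep any trailing partial frame unchanged.
    out.extend(samples[n_full * frame_len:])
    return out


def compress_by_sampling(samples, chunk_len=80, keep=1, drop=1):
    """Time-compress by keeping `keep` chunks and discarding `drop` chunks.

    Kept chunks are simply abutted; a practical system would likely
    cross-fade at the joins to avoid audible clicks.
    """
    out = []
    period = (keep + drop) * chunk_len
    for start in range(0, len(samples), period):
        out.extend(samples[start:start + keep * chunk_len])
    return out


def demo_signal(rate=8000, frame_len=80):
    """Hypothetical test signal: tone, a long silence, then tone again."""
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate)
            for n in range(3 * frame_len)]
    silence = [0.0] * (10 * frame_len)
    return tone + silence + tone
```

Calling `compress_by_sampling(remove_pauses(demo_signal()))` applies both steps in sequence. Changing `keep` and `drop` gives a coarse analogue of the user-controlled speed described in the abstract, although a real interface would adjust this continuously rather than in fixed ratios.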