With the ubiquity of mobile devices like smartphones, two widely used text entry methods have emerged: miniature touchscreen keyboards and speech-based dictation. It is currently unknown how these two modern methods compare. We therefore evaluated the text entry performance of both methods in English and in Mandarin Chinese on a mobile smartphone. In the speech input case, our speech recognition system gave an initial transcription, and recognition errors could then be corrected using either speech again or the smartphone keyboard.

We found that with speech recognition, the English input rate was 3.0x faster, and the Mandarin Chinese input rate 2.8x faster, than a state-of-the-art miniature smartphone keyboard. Further, with speech, the English error rate was 20.4% lower, and the Mandarin error rate 63.4% lower, than with the keyboard. Our experiment was carried out using Baidu's Deep Speech 2, a deep learning-based speech recognition system, and the built-in QWERTY (English) or Pinyin (Mandarin) Apple iOS keyboards. These results show that a significant shift from typing to speech might be imminent and impactful. Further research to develop effective speech interfaces is warranted.

This study was conducted by researchers from Stanford University, University of Washington, and Baidu.


An arXiv paper describing our study can be viewed here.


A video about our study, Stanford experiment shows speech recognition writes text messages more quickly than thumbs, produced and owned by Stanford University, is available on YouTube.


Our app automatically logged all pertinent user behaviors during the experiment, recording a timestamp with each action. There are 32 log files, and each contains 50 trials for the speech input method and 50 trials for the keyboard input method, giving 3,200 data points in total. All CSV files can be downloaded from the dataset page.
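For readers who want to work with the logs, the timestamped events make it straightforward to recover per-trial durations. The sketch below is illustrative only: the column names (`trial`, `method`, `timestamp`, `event`) and event labels are hypothetical and may differ from the actual released CSV schema.

```python
import csv
import io

# Hypothetical log excerpt; the real CSVs may use different columns and labels.
sample_log = """trial,method,timestamp,event
1,speech,0.00,start
1,speech,7.42,end
2,keyboard,0.00,start
2,keyboard,21.15,end
"""

def trial_durations(csv_text):
    """Map (trial number, input method) -> trial duration in seconds,
    computed from paired start/end events."""
    starts, ends = {}, {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (int(row["trial"]), row["method"])
        t = float(row["timestamp"])
        if row["event"] == "start":
            starts[key] = t
        elif row["event"] == "end":
            ends[key] = t
    # Keep only trials that have both a start and an end event.
    return {k: ends[k] - starts[k] for k in starts if k in ends}

durations = trial_durations(sample_log)
```

Applied to the real dataset, the same pattern would be run once per log file and the 100 trials per file aggregated by input method.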


We present and discuss a series of empirical measures of text entry performance, including entry rates, error rates, speech-specific measures, and participants' subjective ratings. The box plot on the left shows words per minute, and the box plot on the right shows total error rates. More plots can be viewed by clicking the measure names in the navigation pane on the left: uncorrected error rates, corrected error rates, utilized bandwidth, NASA TLX, subjective ratings, and speech-specific results.
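The two headline measures can be computed with the standard conventions from the text entry literature: a "word" is five characters, the first character of a transcription carries no timing information, and the total error rate combines corrected and uncorrected errors (the C, INF, and IF keystroke classes of Soukoreff and MacKenzie). The sketch below uses these standard formulas; it is not taken from the study's own analysis scripts.

```python
def words_per_minute(transcribed, seconds):
    """Entry rate in WPM: (|T| - 1) characters over the trial time,
    scaled to minutes, with 5 characters per word."""
    return (len(transcribed) - 1) / seconds * 60.0 / 5.0

def total_error_rate(correct, incorrect_not_fixed, incorrect_fixed):
    """Total error rate (%) = (INF + IF) / (C + INF + IF) * 100,
    where C = correct, INF = uncorrected-error, and IF = corrected-error
    keystrokes."""
    return 100.0 * (incorrect_not_fixed + incorrect_fixed) / (
        correct + incorrect_not_fixed + incorrect_fixed
    )

# An 18-character phrase entered in 10 seconds, and a trial with
# 90 correct, 5 uncorrected-error, and 5 corrected-error keystrokes.
wpm = words_per_minute("hello world sample", 10.0)
ter = total_error_rate(90, 5, 5)
```

The uncorrected and corrected error rates shown in the other plots use the same denominator, splitting INF and IF into separate percentages.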