By Zhuxiaona Wei (Nina) and James A. Landay
Described in Z. Wei and J. A. Landay, “Evaluating Speech-Based Smart
Devices Using New Usability Heuristics,” in IEEE Pervasive Computing, vol. 17, no. 2,
pp. 84-96, 2018.
doi:10.1109/MPRV.2018.022511249
url:doi.ieeecomputersociety.org/10.1109/MPRV.2018.022511249
· Create an illusion by being consistent.
· Make sure to do this without being distracting.
· Use verbal, sound, or multimodal feedback.
· Communicate delays immediately and give feedback while “busy”.
· Use words, phrases and concepts familiar to end users, rather than system-oriented or technical jargon.
· Use a wake word to start a conversation, but don’t require it again in the same conversation.
· Gracefully end conversations when the user is done.
If the user doesn’t speak for 5-10 seconds, end the conversation and give feedback (e.g., a distinctive tone or lights) to indicate the conversation is done.
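A minimal Python sketch of the start/stop behavior above, assuming speech has already been transcribed to text and that timestamps are passed in as seconds. The wake word `hey device`, the 8-second timeout (one value inside the 5-10 second range), and the `ConversationSession` class are hypothetical; the `events` list stands in for the tone or light that marks the end of a conversation.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical wake word and timeout; the timeout is one value in the 5-10 s range.
WAKE_WORD = "hey device"
SILENCE_TIMEOUT_S = 8.0


@dataclass
class ConversationSession:
    wake_word: str = WAKE_WORD
    timeout_s: float = SILENCE_TIMEOUT_S
    active: bool = False
    last_heard: float = 0.0
    events: List[str] = field(default_factory=list)  # stands in for tones/lights

    def hear(self, utterance: str, now: float) -> Optional[str]:
        """Return the command to act on, or None if the utterance is ignored."""
        self.check_timeout(now)
        text = utterance.strip().lower()
        has_wake = text.startswith(self.wake_word)
        if not self.active and not has_wake:
            return None  # no conversation open: ignore speech without the wake word
        if has_wake:
            text = text[len(self.wake_word):].strip(" ,")
        # Once a conversation is open, later turns need no wake word.
        self.active = True
        self.last_heard = now
        return text

    def check_timeout(self, now: float) -> bool:
        """End an open conversation after a silence; True if it just ended."""
        if self.active and now - self.last_heard >= self.timeout_s:
            self.active = False
            self.events.append("end_tone")  # distinctive end-of-conversation cue
            return True
        return False
```

On a device, `now` could come from `time.monotonic()`, and `check_timeout` would also need to be polled while the user stays silent, since `hear` only runs when speech arrives.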
· Leverage user input when it can be used as a parameter to a command.
User: “Find a flight to San Francisco on June 25th”
Agent: “Searching for flights to San Francisco on June 25th…”
· Remember what the user has said in the current conversation.
User: “What is the weather like in San Jose today?”
Agent: “San Jose will be sunny with a high of 72 degrees today.”
User: “How about tomorrow?”
Agent: “It will be sunny in San Jose tomorrow.”
· Use context you already know about the user to fill in fields, but confirm them.
User: “Make a reservation for 2 tonight at Jack’s at 8pm”
Agent: “Ok, I made a reservation for John Smith for 2 people at Jack’s tonight at 8.”
· Use context to respond intelligently (e.g., location / environment, time constraints, number of users, identity / age of users).
User: “What will the weather be like today?”
Agent: “The weather in Palo Alto will be sunny today. The high will be 77, and the low will be 58.”
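One way to sketch remembering the conversation and filling fields from known context, assuming an upstream language-understanding step has already turned each utterance into an intent name (or `None` for a bare follow-up such as "How about tomorrow?") plus a dictionary of slot values. `DialogContext`, `get_weather`, and the slot names are hypothetical. The sketch only resolves slots; the agent still says the filled-in values in its reply (e.g., "The weather in Palo Alto...") so the user can correct them.

```python
from typing import Dict, Optional, Tuple

Slots = Dict[str, str]


class DialogContext:
    """Carries intent and slots across turns of one conversation."""

    def __init__(self, user_context: Slots) -> None:
        self.user_context = user_context  # e.g. home city known to the device
        self.last_intent: Optional[str] = None
        self.slots: Slots = {}

    def resolve(self, intent: Optional[str], slots: Slots) -> Tuple[str, Slots]:
        if intent is None:
            # Follow-up such as "How about tomorrow?": reuse the last request
            # and override only the slots the user just changed.
            if self.last_intent is None:
                raise ValueError("follow-up without an earlier request")
            intent = self.last_intent
            merged = {**self.slots, **slots}
        else:
            # New request: start from known user context, then apply what was said.
            merged = {**self.user_context, **slots}
        self.last_intent, self.slots = intent, merged
        return intent, merged
```

Resolving both cases in one place keeps the order of precedence explicit: what the user just said wins, then what was said earlier in the conversation, then background context.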
· Use discourse markers as part of confirmations and prompts to make conversation more natural (e.g., “next”, “and”, “so”, “actually”, “sure”, “ok”, “got it”).
· Leverage prosody, including rhythm, tone, pauses, emphasis, and discourse fillers (e.g., “uh”, “uhm”, “hmm”, “ah”, “like”).
· Don’t always prompt for everything all at once.
· Take turns and don’t let instructions get in the way.
· Give users a chance before jumping in.
· Users prefer agents that have a conversational style similar to their own, so match it.
· Match the user’s emotion, gender, and personality.
Ensure the emotion and personality of the voice are consistent with the emotion and personality implied in the words.
Be cautious about matching extreme emotions.
· Guide subtly, using natural affordances rather than explicit instructions.
User: “Check my email.”
Agent: “You have two new emails. The first email is …”
Agent: “Repeat, reply or go to the next one?”
· Guide the user toward the desired response, and cue the user about what type of response is expected.
User: “Trivia Game”
Agent: “Choose from the following: capitals of the world, sports stars, or inventors”
· Allow data to be naturally given in response to single or multiple prompts.
User: “Get a reservation at Ding Tai Fang for tonight.”
Agent: “For what time at Ding Tai Fang?”
User: “7PM.”
Agent: “How many people tonight at 7?”
User: “Three.”
Agent: “Ok, I made a reservation for John Smith for 3 people at Ding Tai Fang tonight at 7.”
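The reservation exchange above can be sketched as slot filling that asks only for what is still missing, so the same code handles one complete request ("for 2 tonight at Jack's at 8pm") or several short answers. The prompt wording follows the examples; the slot names, `PROMPTS`, and `next_turn` are hypothetical, and filling `name` from a user profile is one reading of "use context you already know about the user to fill in fields, but confirm them."

```python
from typing import Dict

# Hypothetical reservation schema: slots in the order the agent asks for them.
PROMPTS = {
    "restaurant": "Which restaurant?",
    "date": "For what day at {restaurant}?",
    "time": "For what time at {restaurant}?",
    "party_size": "How many people {date} at {time}?",
    "name": "What name should I put it under?",
}
CONFIRMATION = ("Ok, I made a reservation for {name} for {party_size} people "
                "at {restaurant} {date} at {time}.")
PROFILE_SLOTS = ("name",)  # slots that may come from what the device knows


def next_turn(said: Dict[str, str], profile: Dict[str, str]) -> str:
    """Ask for the first missing slot, or confirm once everything is known."""
    slots = dict(said)
    for key in PROFILE_SLOTS:
        if key not in slots and key in profile:
            slots[key] = profile[key]  # filled from context, echoed in the confirmation
    for key, prompt in PROMPTS.items():  # dicts keep insertion order (3.7+)
        if key not in slots:
            return prompt.format(**slots)
    # A real agent would call its booking service here before confirming.
    return CONFIRMATION.format(**slots)
```

Because each prompt only references slots that come earlier in the order, a prompt is never formatted with a value the user has not yet given.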
· Teach multiple possible ways of asking for a result.
· Use examples in a natural manner rather than teaching commands explicitly.
· Be clear but succinct.
· Keep lists of items short (3-5 max.), and let people ask if they want to hear more.
· Let experienced users have faster and shorter prompts.
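A sketch of keeping spoken lists short: results are read three at a time (the page size is restricted to the 3-5 range above) and the user decides whether to hear more. `ListReader`, `spoken_list`, and the movie titles are illustrative, not taken from any particular system.

```python
from typing import List, Sequence


def spoken_list(items: Sequence[str]) -> str:
    """Join items the way a person would say them: 'a, b, and c'."""
    if len(items) <= 2:
        return " and ".join(items)
    return ", ".join(items[:-1]) + ", and " + items[-1]


class ListReader:
    """Reads long results a few at a time and lets the user ask for more."""

    def __init__(self, items: Sequence[str], page_size: int = 3) -> None:
        if not 3 <= page_size <= 5:
            raise ValueError("keep spoken lists to 3-5 items")
        self.items: List[str] = list(items)
        self.page_size = page_size
        self.pos = 0

    def next_page(self) -> str:
        chunk = self.items[self.pos:self.pos + self.page_size]
        if not chunk:
            return "That's all of them."
        self.pos += len(chunk)
        more = self.pos < len(self.items)
        return spoken_list(chunk) + (". Want to hear more?" if more else ".")
```

Ending each page with a short question hands the turn back to the user instead of reading everything at once.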
· Confirm input implicitly through results or the next prompt.
User: “What time is the Lakers game on tonight?”
Agent: “The Lakers game is at 7:30PM tonight on NBC.”
· Confirm irreversible or critical actions explicitly, and allow undo even after confirmation. Such actions include:
Actions involving other people (e.g., sending a text message or email)
Actions that can be seen publicly (e.g., posting on public social media)
Actions involving financial transactions (e.g., transferring funds or buying something)
· High: Do it and tell me
· Moderate: Confirm input
· Low: Re-prompt (“Say that again?”)
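Assuming the High / Moderate / Low levels above refer to how confident the recognizer is in its interpretation, they can be combined with the explicit-confirmation rule for critical actions roughly as follows. The numeric thresholds and the action-type labels are hypothetical placeholders, and checking low confidence first (so a poorly heard critical action is re-prompted rather than confirmed) is a design choice of this sketch.

```python
from enum import Enum

# Hypothetical cut-offs on a recognizer confidence score in [0, 1].
HIGH_CONFIDENCE = 0.85
MODERATE_CONFIDENCE = 0.5

# Hypothetical labels for the critical action types listed above.
CRITICAL_ACTIONS = {"involves_other_people", "public", "financial"}


class Strategy(Enum):
    DO_AND_TELL = "do it and tell me"
    CONFIRM_INPUT = "confirm input"
    REPROMPT = "re-prompt"
    CONFIRM_EXPLICITLY = "confirm explicitly, then offer undo"


def choose_strategy(confidence: float, action_type: str) -> Strategy:
    if confidence < MODERATE_CONFIDENCE:
        return Strategy.REPROMPT  # low: "Say that again?"
    if action_type in CRITICAL_ACTIONS:
        return Strategy.CONFIRM_EXPLICITLY  # critical: always ask, even if sure
    if confidence >= HIGH_CONFIDENCE:
        return Strategy.DO_AND_TELL  # high: act, then report the result
    return Strategy.CONFIRM_INPUT  # moderate: check the input first
```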
· Lights
· Graphic displays
· Sounds
Escalate
detail in prompts when input is ambiguous or incorrect.
· If input results in multiple hypotheses, let
user select from list with “yes” / “no”.
· For error correction, use a different modality
or voice response style (e.g., select from a list).
· Vary (error) prompt wording on re-prompts.
· Don’t blame the user for errors (don’t say: “that was not a valid response”).
· Don’t show mock concern (don’t say: “I’m sorry. I did not understand the response I heard.”).
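A sketch of the error re-prompt heuristics: the first miss gets a short re-prompt, the second adds detail and examples, and from the third on the agent switches style to yes/no selection from a list. Wording is varied by never repeating the previous prompt, and none of the prompts blame the user or apologize. The prompt pools, the example city question, and `error_prompt` are hypothetical.

```python
import random
from typing import List, Optional, Sequence

# Hypothetical wording pools. None of them blame the user or apologize.
SHORT_REPROMPTS: List[str] = ["Say that again?", "Once more?", "Come again?"]
DETAILED_REPROMPTS: List[str] = [
    "{question} For example, {a} or {b}.",
    "{question} You can say something like {a} or {b}.",
]


def error_prompt(attempt: int, question: str, examples: Sequence[str],
                 previous: Optional[str] = None,
                 rng: Optional[random.Random] = None) -> str:
    """Pick a re-prompt that gets more detailed with each failed attempt."""
    if rng is None:
        rng = random.Random()
    if attempt <= 1:
        candidates = list(SHORT_REPROMPTS)
    elif attempt == 2:
        candidates = [t.format(question=question, a=examples[0], b=examples[1])
                      for t in DETAILED_REPROMPTS]
    else:
        # Change response style: walk through the options with yes/no.
        return f"Let's go through the options. Is it {examples[0]}?"
    # Vary the wording: avoid repeating the prompt used last time.
    fresh = [c for c in candidates if c != previous] or candidates
    return rng.choice(fresh)
```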
· Use a special escape word globally (e.g., “Stop”).
User: “What’s the weather in San Francisco?”
Agent: “Movies playing today in San Francisco include Titanic, The Godfather…”
User: “Stop.”
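The escape word works globally if it is checked before any other handling, whatever state the dialog is in. A minimal sketch, assuming the recognizer keeps listening while the agent is speaking (barge-in) and that `route` and `stop_output` are hypothetical hooks into the rest of the agent. Only an exact "stop" counts here, to avoid cutting off requests that merely contain the word.

```python
from typing import Callable, Optional

# Hypothetical escape vocabulary; the heuristic's example word is "Stop".
ESCAPE_WORDS = {"stop"}


def handle_turn(utterance: str,
                route: Callable[[str], str],
                stop_output: Callable[[], None]) -> Optional[str]:
    """Check for the escape word before any intent routing, in every state."""
    normalized = utterance.strip().lower().rstrip(".!")
    if normalized in ESCAPE_WORDS:
        stop_output()  # cut off speech or audio that is playing
        return None
    return route(utterance)  # otherwise hand off to normal dialog handling
```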
· Use non-speech methods when speech fails (e.g., pushing a physical button).