Speech-based Conversational Agent Heuristics

By Zhuxiaona Wei (Nina) and James A. Landay

Described in Z. Wei and J. A. Landay, “Evaluating Speech-Based Smart Devices Using New Usability Heuristics,” in IEEE Pervasive Computing, vol. 17, no. 2, pp. 84-96, 2018.
doi:10.1109/MPRV.2018.022511249
url: doi.ieeecomputersociety.org/10.1109/MPRV.2018.022511249

 

General

S1: Give agent persona through language, sounds and other styles

·  Create an illusion by being consistent.

·  Make sure to do this without being distracting.

S2: Make system status clear

·  Use verbal, sound, or multimodal feedback.

·  Communicate delays immediately and give feedback while “busy”.

S3: Speak user’s language

·  Use words, phrases and concepts familiar to end users, rather than system-oriented or technical jargon.

S4: Start and stop conversations

·  Use a wake word to start a conversation, but don’t require it again in the same conversation.

·  Gracefully end conversations when the user is done.

If the user doesn’t speak for 5-10 seconds, end the conversation and give feedback (e.g., a distinctive tone or lights) to indicate that the conversation is done.
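
One way to picture the start/stop rule for S4 is a small state tracker with a silence timer. This is only a sketch: the class name, the 8-second default (an assumed value inside the 5-10 second range), and the "play_end_tone" cue are illustrative, not part of the heuristics.

```python
import time

SILENCE_TIMEOUT_S = 8.0  # assumed value inside the 5-10 second range


class Conversation:
    """Tracks whether a conversation is still open, based on user silence."""

    def __init__(self, timeout_s=SILENCE_TIMEOUT_S, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock  # injectable so the timer can be tested
        self.active = False
        self.last_heard = None

    def on_wake_word(self):
        # The wake word opens the conversation.
        self.active = True
        self.last_heard = self.clock()

    def on_user_speech(self):
        # Inside an open conversation, no wake word is needed again.
        if self.active:
            self.last_heard = self.clock()

    def tick(self):
        """Call periodically; returns a feedback cue when the conversation ends."""
        if self.active and self.clock() - self.last_heard >= self.timeout_s:
            self.active = False
            return "play_end_tone"  # hypothetical cue name
        return None
```

Passing the clock in as a parameter keeps the timeout logic independent of real time, which makes the behavior easy to check without waiting.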

S5: Pay attention to what the user said and respect their context

·  Leverage user input when it can be used as a parameter to a command.

User: “Find a flight to San Francisco on June 25th”
Agent: “Searching for flights to San Francisco on June 25th…”

·  Remember what the user has said in the current conversation.

User: “What is the weather like in San Jose today?”
Agent: “San Jose will be sunny with a high of 72 degrees today.”
User: “How about tomorrow?”
Agent: “It will be sunny in San Jose tomorrow.”

·  Use context you already know about the user to fill in fields, but confirm them.

User: “Make a reservation for 2 tonight at Jack’s at 8pm”
Agent: “Ok, I made a reservation for John Smith for 2 people at Jack’s tonight at 8.”

·  Use context to respond intelligently (e.g., location / environment, time constraints, number of users, identity / age of users).

User: “What will the weather be like today?”
Agent: “The weather in Palo Alto will be sunny today. The high will be 77, and the low will be 58.”  
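
The S5 examples (a follow-up that omits the city, and a default location taken from what is known about the user) can be sketched as a slot-merging step. The class, the slot names, and the "user profile" dictionary here are assumptions made for illustration.

```python
class DialogContext:
    """Remembers slots from earlier turns so follow-ups can omit them."""

    def __init__(self, user_profile=None):
        # Known facts about the user (e.g., home city) act as defaults.
        self.defaults = dict(user_profile or {})
        self.slots = {}

    def resolve(self, intent, new_slots):
        """Merge this turn's slots over remembered ones, then over defaults."""
        merged = {**self.defaults, **self.slots, **new_slots}
        # Only what the user actually said is remembered for later turns.
        self.slots = {**self.slots, **new_slots}
        return intent, merged


ctx = DialogContext(user_profile={"city": "Palo Alto"})
first = ctx.resolve("weather", {"city": "San Jose", "day": "today"})
follow_up = ctx.resolve("weather", {"day": "tomorrow"})  # city carried over
```

The precedence order (current turn, then conversation, then user defaults) is a design choice of this sketch; it keeps the user's most recent words in control.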

Conversational Style

S6: Use spoken language characteristics

·  Use discourse markers as part of confirmations and prompts to make conversation more natural (e.g., “next”, “and”, “so”, “actually”, “sure”, “ok”, “got it”).

·  Leverage prosody (rhythm, tone, pauses, emphasis) and discourse fillers (e.g., “uh”, “uhm”, “hmm”, “ah”, “like”).

S7: Make conversation a back and forth

·  Don’t always prompt for everything all at once.

·  Take turns and don’t let instructions get in the way.

·  Give users a chance before jumping in.

S8: Adapt agent style to who the user is, how they speak, and how they are feeling

·  Users prefer agents whose conversational style is similar to their own, so match it.

·  Match the user’s emotion, gender, and personality.

Ensure the emotion and personality of the voice are consistent with the emotion and personality implied in the words.
Be cautious about matching extreme emotions.

Guiding, Teaching, and Offering Help

S9: Guide users through a conversation so they are not easily lost

·  Guide subtly using natural affordances rather than explicitly.

User: “Check my email.”
Agent: “You have two new emails. The first email is …”
Agent: “Repeat, reply or go to the next one?”

·  Guide the user towards the desired response and cue the user as to what type of response is desired.

User: “Trivia Game”
Agent: “Choose from the following: capitals of the world, sports stars, or inventors.”

·  Allow data to be naturally given in response to single or multiple prompts.

User: “Get a reservation at Ding Tai Fang for tonight.”
Agent: “For what time at Ding Tai Fang?”
User: “7PM.”
Agent: “How many people tonight at 7?”
User: “Three.”
Agent: “Ok, I made a reservation for John Smith for 3 people at Ding Tai Fang tonight at 7.”
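
The reservation dialog for S9 can be read as slot filling: the agent asks only for what is still missing and accepts one or several values per utterance. A minimal sketch follows, with hypothetical slot names and prompt templates.

```python
REQUIRED_SLOTS = ["restaurant", "time", "party_size"]  # hypothetical slot names

# Prompts echo what is already known, which also confirms it implicitly.
PROMPTS = {
    "restaurant": "Which restaurant?",
    "time": "For what time at {restaurant}?",
    "party_size": "How many people tonight at {time}?",
}


def update(filled, heard):
    """Accept any number of slots from a single utterance."""
    return {**filled, **heard}


def next_prompt(filled):
    """Return the prompt for the first missing slot, or None when complete."""
    for slot in REQUIRED_SLOTS:
        if slot not in filled:
            return PROMPTS[slot].format(**filled)
    return None
```

A user who says everything at once gets no follow-up questions, while a user who gives one value per turn is prompted step by step.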

S10: Use responses as a way to help users discover what is possible

·  Teach multiple possible ways of asking for a result.

·  Use examples in a natural manner rather than teaching commands explicitly.

Feedback and Prompts

S11: Keep feedback and prompts short

·  Clear but succinct.

·  Keep lists of items short (3-5 max.), and let people ask if they want to hear more.

·  Let experienced users have faster and shorter prompts.
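
The short-list rule for S11 could be implemented by reading a few items at a time and offering more. In this sketch the page size of 3 and the exact wording of the spoken text are assumptions.

```python
MAX_ITEMS = 3  # assumed page size at the low end of the 3-5 range


def speak_list(items, start=0, page_size=MAX_ITEMS):
    """Return (spoken_text, next_start); next_start is None when nothing is left."""
    page = items[start:start + page_size]
    if not page:
        return "That's everything.", None
    if len(page) <= 2:
        text = " or ".join(page)
    else:
        text = ", ".join(page[:-1]) + ", or " + page[-1]
    next_start = start + page_size
    if next_start < len(items):
        # Let people ask for more instead of reading the whole list.
        return text + ". Want to hear more?", next_start
    return text + ".", None
```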

S12: Confirm input intelligently

·  Confirm input implicitly through results or next prompt.

User: “What time is the Lakers game on tonight?”
Agent: “The Lakers game is at 7:30PM tonight on NBC.”

·  Confirm irreversible or critical actions explicitly and even allow undo after confirmation.

Actions involving other people (e.g., sending a text message or email)
Actions that can be seen publicly (e.g., posting on public social media)
Actions involving financial transactions (e.g., transferring funds or buying something)
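
The split between implicit and explicit confirmation in S12 can be sketched as a lookup on the three critical categories listed above. The intent names and the intent-to-category table are hypothetical, and treating unknown intents as critical is an assumption of this sketch.

```python
CRITICAL_CATEGORIES = {"involves_other_people", "publicly_visible", "financial"}

# Hypothetical intent-to-category table; a real agent would define its own.
INTENT_CATEGORIES = {
    "send_text": {"involves_other_people"},
    "post_public": {"publicly_visible"},
    "transfer_funds": {"financial"},
    "get_game_time": set(),
}


def confirmation_mode(intent):
    """Return 'explicit' for critical actions and 'implicit' otherwise."""
    categories = INTENT_CATEGORIES.get(intent)
    if categories is None:
        # Assumption: unknown intents are treated cautiously.
        return "explicit"
    return "explicit" if categories & CRITICAL_CATEGORIES else "implicit"
```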

S13: Use speech recognition system confidence to drive feedback style

·  High: Do it and tell me

·  Moderate: Confirm input

·  Low: Re-prompt (“Say that again?”)
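
The three confidence tiers in S13 map naturally onto a threshold function. The cutoff values here are assumptions; the heuristic does not specify numbers, and they would need tuning for a given recognizer.

```python
HIGH_CONFIDENCE = 0.85      # assumed threshold
MODERATE_CONFIDENCE = 0.50  # assumed threshold


def feedback_style(confidence):
    """Map a recognizer confidence in [0, 1] to a feedback style."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= HIGH_CONFIDENCE:
        return "do_and_tell"  # act, then report what was done
    if confidence >= MODERATE_CONFIDENCE:
        return "confirm"      # ask the user to confirm the input
    return "reprompt"         # e.g., "Say that again?"
```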

S14: Use multimodal feedback when available

·  Lights

·  Graphic displays

·  Sounds

Errors

S15: Avoid cascading correction errors

·  Escalate detail in prompts when input is ambiguous or incorrect.

·  If input results in multiple hypotheses, let user select from list with “yes” / “no”.

·  For error correction, use a different modality or voice response style (e.g., select from a list).
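
Escalating detail in prompts, as S15 suggests, might look like a ladder of prompts indexed by the number of failed attempts. The prompt texts are made up for illustration.

```python
# Each step gives more detail than the one before (example wording only).
ESCALATING_PROMPTS = [
    "What city?",
    "Which city do you want the weather for?",
    "Please say a city name, for example, 'San Francisco'.",
]


def reprompt(failed_attempts):
    """Pick a more detailed prompt after each failure, capped at the last one."""
    index = max(0, min(failed_attempts, len(ESCALATING_PROMPTS) - 1))
    return ESCALATING_PROMPTS[index]
```

After the last step, an agent following the third bullet would switch to a different modality or response style rather than repeat the same prompt.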

S16: Use normal language in communicating errors

·  Vary (error) prompt wording on re-prompts.

·  Don’t blame the user for errors (don’t say: “that was not a valid response”).

·  Don’t show mock concern (don’t say: “I’m sorry. I did not understand the response I heard.”).

S17: Allow users to exit from errors or a mistaken conversation

·  Use a special escape word globally (e.g., “Stop”).

User: “What’s the weather in San Francisco?”
Agent: “Movies playing today in San Francisco include Titanic, The Godfather…”
User: “Stop.”

·  Use non-speech methods when speech fails (e.g., push a physical button).
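
A global escape word, as in S17, can be checked before any state-specific handling so that it works everywhere. This is a rough sketch; the state names and action labels are hypothetical.

```python
ESCAPE_WORDS = {"stop"}


def handle_utterance(text, state):
    """Return (new_state, action); the escape word wins in every state."""
    normalized = text.strip().lower().rstrip(".!?")
    if normalized in ESCAPE_WORDS:
        return "idle", "stop_playback"
    return state, "continue"
```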