Error-Handling Strategy for Multi-Modal Conversational Agent

Recognition

This paper received a Best Paper Award at UIST 2020, one of the top venues in Human-Computer Interaction.

My Contribution

Collaborating with Computer Science Ph.D. candidate Toby Li, I contributed as the second author, working on concept ideation, literature review, interaction design, and user testing.

Abstract

A major problem in task-oriented conversational agents is the lack of support for repairing conversational breakdowns. Prior studies have shown that current repair strategies for these kinds of errors are often ineffective due to: (1) the lack of transparency about the state of the system's understanding of the user's utterance; and (2) the system's limited capability to understand the user's verbal attempts to repair natural language understanding errors. This paper introduces SOVITE, a new multi-modal (speech plus direct manipulation) interface that helps users discover, identify the causes of, and recover from conversational breakdowns, using the GUIs of existing mobile apps as grounding resources. SOVITE displays the system's understanding of user intents using GUI screenshots, allows users to refer to third-party apps and their GUI screens in conversations as inputs for intent disambiguation, and enables users to repair breakdowns through direct manipulation on these screenshots. The results from a remote user study in which 10 users used SOVITE across 7 scenarios suggest that SOVITE's approach is usable and effective.