Comparing Spoken Dialog Corpora Collected with Recruited Subjects versus Real Users
- Hua Ai
- Antoine Raux
- Dan Bohus
- Maxine Eskenazi
- Diane Litman
Proceedings of SIGdial 2007, Antwerp, Belgium
Empirical spoken dialog research often involves the collection and analysis of a dialog corpus. However, it is not well understood whether and how a corpus of dialogs collected using recruited subjects differs from a corpus of dialogs obtained from real users. In this paper, we use Let’s Go Lab, a platform for experimenting with a deployed spoken dialog bus information system, to address this question. Our first corpus is collected by recruiting subjects to call Let’s Go in a standard laboratory setting, while our second corpus consists of calls placed by real users to Let’s Go during its operating hours. We quantitatively characterize the two corpora using previously proposed measures from the spoken dialog literature, then discuss the statistically significant similarities and differences between the corpora with respect to these measures. For example, we find that recruited subjects talk more and speak faster, while real users ask for more help and interrupt the system more frequently. In contrast, we find no difference with respect to dialog structure.