Microsoft Speech Language Translation (MSLT) Corpus: The IWSLT 2016 release for English, French and German
- Christian Federmann
- Will Lewis
Proceedings of IWSLT 2016
We describe the Microsoft Speech Language Translation (MSLT) corpus, which was created to evaluate end-to-end conversational speech translation quality. The corpus was created from actual conversations over Skype, and we provide details on the recording setup and the different layers of associated text data. The corpus release includes Test and Dev sets with reference transcripts for speech recognition. Additionally, cleaned-up transcripts and reference translations are available for evaluation of machine translation quality. The IWSLT 2016 release described here includes the source audio, raw transcripts, cleaned-up transcripts, and translations to or from English for both French and German.
Publication Downloads
Microsoft Speech Language Translation (MSLT) Corpus
October 9, 2017
The Microsoft Speech Language Translation Corpus release contains conversational, bilingual speech test and tuning data for English, Chinese, and Japanese collected by Microsoft Research. The package includes audio data, transcripts, and translations and allows end-to-end testing of spoken language translation systems on real-world data.