Microsoft Research Lab – Montréal

Downloads

Tip of the Tongue Known Item Retrieval Dataset for Movie Identification

August 2021

The Tip of the Tongue (ToT) dataset is from the paper Tip of the Tongue Known-Item Retrieval: A Case Study in Movie Identification. It comprises 758 question/answer pairs scraped from the website iRememberThisMovie.com between 2013 and 2018. These…

GitHub

Python Reasoning Challenges

May 2021

A short Python Reasoning Challenge can replace an entire page of English describing a typical programming problem. The goal is to teach computers how to program. This OSS repository will contain a dataset of short Python challenges. Most of them…

GitHub

Conformer-Kernel Model with Query Term Independence (TREC Deep Learning Quick Start)

March 2021

This is a quick start guide for the document ranking task in the TREC Deep Learning (TREC-DL) benchmark. If you are new to TREC-DL, then this repository may make it more convenient for you to download all the required datasets…

GitHub

Sepsis Cohort from MIMIC III

December 2020

This repo provides code for generating the sepsis cohort from the MIMIC III dataset. Our main goal is to facilitate reproducibility of results in the literature.

GitHub

Generative Neural Visual Artist (GeNeVA) – Training and Evaluation Code

September 2019

Code to train and evaluate the GeNeVA-GAN model for the GeNeVA task proposed in our ICCV 2019 paper Tell, Draw, and Repeat: Generating and Modifying Images Based on Continual Linguistic Instruction.

GitHub

MetaLWOz: A Dataset of Multi-Domain Dialogues for the Fast Adaptation of Conversation Models

July 2019

We introduce the Meta-Learning Wizard of Oz (MetaLWOz) dialogue dataset for developing fast adaptation methods for conversation models. This data can be used to train task-oriented dialogue models, specifically to develop methods to quickly simulate user responses with a small…

Download

TextWorld

July 2019

TextWorld is a text-based framework that generates games for training artificial intelligence agents to play text adventure games. The goal is for this project to advance the state of the art of AI research and to…

GitHub

Bias in Bios

July 2019

Code on GitHub to reproduce the data in the Bias in Bios paper.

GitHub

AMDIM – Augmented Multiscale Deep InfoMax

June 2019

AMDIM (Augmented Multiscale Deep InfoMax) is an approach to self-supervised representation learning based on maximizing mutual information between features extracted from multiple views of a shared context.

GitHub