TALES: Text Adventure Learning Environment Suite
- Christopher Zhang Cui
- Xingdi Yuan
- Zhang Xiao
- Prithviraj Ammanabrolu
- Marc-Alexandre Côté
arXiv
Reasoning is an essential skill that enables Large Language Models (LLMs) to interact with the world. As tasks grow more complex, they demand increasingly sophisticated and diverse reasoning capabilities for sequential decision-making, requiring structured reasoning over the context history to determine the next best action. We introduce TALES, a diverse collection of synthetic and human-written text-adventure games designed to challenge and evaluate a wide range of reasoning capabilities. We present results for a range of open- and closed-weight LLMs, along with a qualitative analysis of the top-performing models. Despite an impressive showing on synthetic games, even the top LLM-driven agents fail to score 15% on games designed for human enjoyment. Code and visualizations of the experiments can be found at https://microsoft.github.io/tale-suite.
Publication Downloads
TALES
November 6, 2024
Text Adventure Learning Environment Suite (TALES) - a benchmark for evaluating language models on interactive text environments. This repository contains the files needed to benchmark language agents on a curated list of text-based games from the following frameworks: Jericho, TextWorld, TextWorld-Express, ScienceWorld, and ALFWorld.
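The interaction pattern described above, in which an agent conditions on the full context history of observations and past actions to choose its next command, can be sketched as follows. This is a minimal illustrative sketch, not the actual TALES API: `ToyTextGame` and the rule-based `pick_action` policy are hypothetical stand-ins for a real text-adventure environment and an LLM-driven agent.

```python
class ToyTextGame:
    """A tiny two-step stand-in for a text-adventure environment."""

    def reset(self):
        self.opened = False
        self.done = False
        return "You are in a room. There is a closed chest here."

    def step(self, action):
        # Returns (observation, reward, done), mirroring the usual
        # text-game interaction loop.
        if action == "open chest" and not self.opened:
            self.opened = True
            return "The chest creaks open, revealing a key.", 0, False
        if action == "take key" and self.opened:
            self.done = True
            return "You take the key. You win!", 1, True
        return "Nothing happens.", 0, self.done


def pick_action(history):
    """Placeholder policy. In an LLM-driven agent, the serialized
    history would be the prompt and the model's reply the action."""
    last_obs = history[-1]
    if "closed chest" in last_obs:
        return "open chest"
    if "key" in last_obs:
        return "take key"
    return "look"


def play(env, max_steps=10):
    # Keep the full context history: observations and chosen actions.
    history = [env.reset()]
    score = 0
    for _ in range(max_steps):
        action = pick_action(history)
        obs, reward, done = env.step(action)
        history += [f"> {action}", obs]
        score += reward
        if done:
            break
    return score, history


score, transcript = play(ToyTextGame())
```

In the actual benchmark, the environments come from the frameworks listed above and the policy is a prompted LLM; the loop structure, however, is the same: observe, reason over the accumulated transcript, act.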