OdysseyBench: Evaluating LLM Agents on Long-Horizon Complex Office Application Workflows
Weixuan Wang, Dongge Han, Daniel Madrigal, Jin Xu, Victor Ruehle, Saravan Rajmohan
ArXiv | August 2025
Dongge Han, Menglin Xia, Daniel Madrigal, Samuel Kessler, Ankur Mallick, Xuchao Zhang, Mirian Hipolito Garcia, Jin Xu, Victor Ruehle, Saravan Rajmohan
ICML TTODLer-FM Workshop 2025 | June 2025
Metod Jazbec, Menglin Xia, Ankur Mallick, Daniel Madrigal, Dongge Han, Samuel Kessler, Victor Ruehle
NeurIPS FITML workshop 2024 | December 2024