Serving Models, Fast and Slow: Optimizing Heterogeneous LLM Inferencing Workloads at Scale
Kunal Jain, A. Parayil, Ankur Mallick, Rujia Wang, Renee St. Amant, Chetan Bansal, Victor Ruehle, Saravan Rajmohan, Shashwat Jaiswal, Yogesh Simmhan, Anoop Kulkarni, Steve Kofsky
arXiv | February 2025