IOValve: Leakage-Free I/O Sandbox for Large-Scale Untrusted Data Processing
- Sangho Lee
- Jules Drean
- Yue Tan
- Marcus Peinado
32nd ACM Conference on Computer and Communications Security (CCS 2025)
Organized by ACM
The widespread adoption of Large Language Models (LLMs) is driving rapidly growing demand for large-scale computations such as training and fine-tuning models. In many areas, the confidentiality of the underlying data is of critical importance to its corporate or government owners. However, securing data in large-scale computations is challenging. First, such computations demand enormous hardware resources, which typically requires outsourcing (e.g., to the public cloud). Second, the large and rapidly evolving software stack used in LLM training, combined with a growing incidence of supply chain attacks and software vulnerabilities, makes it all but impossible for data owners to establish trust in the code that processes their highly sensitive data. Confidential computing and sandboxing are promising techniques for solving these problems. However, existing sandboxes do not address covert channels, which limits their ability to protect confidential data.
This paper proposes IOValve, a novel I/O sandbox for large-scale computations on confidential data. IOValve places sandbox enforcement on a programmable network device that is physically isolated from the processor hardware running the untrusted software stack. This construction allows IOValve to sidestep the multitude of side channels that arise from visible or hidden resource sharing. IOValve interposes on all network I/O of the sandbox and transmits only encrypted and regularized network traffic, which prevents information leakage over the network. Our evaluation shows that IOValve has marginal performance overhead and supports real-world applications such as LLM fine-tuning, batch inference, and molecular simulation.
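To illustrate what "regularized" traffic can mean in general, the following is a minimal sketch and not IOValve's actual design, which the abstract does not detail. It assumes one common approach: every time slot emits exactly one fixed-size cell, padded if the payload is short and replaced by a dummy cell if nothing is queued. The names `Regularizer`, `CELL_SIZE`, and `decode` are hypothetical, and encryption is deliberately left out.

```python
import struct
from collections import deque

CELL_SIZE = 64                  # hypothetical fixed cell size in bytes
HEADER = struct.Struct("!H")    # 2-byte count of real payload bytes in a cell
CAPACITY = CELL_SIZE - HEADER.size


class Regularizer:
    """Turns arbitrary writes into a stream of fixed-size cells (illustrative only)."""

    def __init__(self):
        self._pending = deque()  # queued payload chunks, each at most CAPACITY bytes

    def submit(self, data: bytes) -> None:
        # Split the message into chunks that each fit in one cell.
        for i in range(0, len(data), CAPACITY):
            self._pending.append(data[i:i + CAPACITY])

    def next_cell(self) -> bytes:
        # Meant to be called once per fixed time slot, whether or not data is queued.
        # An empty queue yields a dummy cell with a payload length of zero.
        chunk = self._pending.popleft() if self._pending else b""
        return HEADER.pack(len(chunk)) + chunk.ljust(CAPACITY, b"\x00")


def decode(cells) -> bytes:
    # Receiver side: strip padding and skip dummy cells (length 0).
    out = bytearray()
    for cell in cells:
        (n,) = HEADER.unpack_from(cell)
        out += cell[HEADER.size:HEADER.size + n]
    return bytes(out)
```

In this sketch the number and size of cells on the wire depend only on how many slots have elapsed, not on what the untrusted code wrote. For that to hide anything in practice, each cell would also have to be encrypted before leaving the device so that the length header and padding are not visible, and cells would have to be sent on a clock that the untrusted software cannot influence.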