Slim-SC: Thought Pruning for Efficient Scaling with Self-Consistency
- Colin Hong Fung Heng,
- Xu Guo,
- Anand Chaanan Singh,
- Esha Choukse,
- Dmitrii Ustiugov
EMNLP
Recently, Test-Time Scaling (TTS) has gained increasing attention for improving LLM reasoning performance at test time without retraining the model. A notable TTS technique is Self-Consistency (SC), which generates multiple reasoning chains in parallel and selects the final answer via majority voting. While effective, its order-of-magnitude computational overhead limits broad deployment. Prior attempts to accelerate SC rely mainly on model-based confidence scores or heuristics with limited empirical support. For the first time, we theoretically and empirically analyze the inefficiencies of SC and reveal actionable opportunities for improvement. Building on these insights, we propose Slim-SC, a (thinking) step-wise pruning strategy that removes redundant chains using inter-chain similarity at the thought level. Experiments on three mathematical reasoning datasets and two recent LLM architectures confirm that Slim-SC not only reduces resource waste but also matches or even improves the accuracy of SC, providing a simple yet effective alternative for efficient test-time scaling with Self-Consistency.
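The sketch below illustrates the general idea of SC with thought-level pruning, not the paper's exact method: here redundancy is detected with a crude bag-of-words cosine similarity between partial chains and a hypothetical threshold of 0.9, whereas Slim-SC's actual similarity measure and pruning schedule are defined in the paper.

```python
# Illustrative sketch: Self-Consistency (majority voting over chains) with a
# similarity-based pruning step applied to partial reasoning chains.
# The similarity function and threshold are assumptions for illustration only.
from collections import Counter
import math


def bow_vector(text: str) -> Counter:
    """Crude bag-of-words representation of a (partial) reasoning chain."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def prune_redundant_chains(partial_chains: list[str], threshold: float = 0.9) -> list[int]:
    """Return indices of chains to keep: a chain is dropped if its thoughts so
    far are near-duplicates of an already-kept chain."""
    kept: list[int] = []
    for i, chain in enumerate(partial_chains):
        vec = bow_vector(chain)
        if all(cosine_similarity(vec, bow_vector(partial_chains[j])) < threshold for j in kept):
            kept.append(i)
    return kept


def majority_vote(answers: list[str]) -> str:
    """Standard SC aggregation: the most frequent final answer wins."""
    return Counter(answers).most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical partial chains after a few thinking steps.
    chains = [
        "compute 12 * 7 = 84 then add 16 to get 100",
        "compute 12 * 7 = 84, then add 16, total 100",   # near-duplicate of chain 0
        "split 16 into 10 + 6, so 12 * 7 + 10 + 6 = 100",
    ]
    keep = prune_redundant_chains(chains)
    print("chains kept:", keep)   # the redundant chain is pruned before finishing
    print("answer:", majority_vote(["100" for _ in keep]))
```

The intuition this captures is the source of SC's savings opportunity: near-duplicate chains contribute no new information to the vote, so terminating them early reduces generated tokens without changing the majority answer.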