SocialCC: Interactive Evaluation for Cultural Competence in Language Agents
- Jincenzi Wu,
- Jianxun Lian,
- DingDong Wang,
- Helen Meng
ACL 2025
Large Language Models (LLMs) are increasingly deployed worldwide, yet their ability to navigate cultural nuances remains underexplored. Misinterpreting cultural content can lead to AI-generated responses that are offensive or inappropriate, limiting the usability of LLMs in global applications such as customer service, diplomatic communication, and online education. While prior research has evaluated the cultural knowledge of LLMs, existing benchmarks fail to assess dynamic cultural competence: the ability to apply cultural knowledge effectively in real-world interactions. To address this gap, we introduce SocialCC, a novel benchmark designed to evaluate cultural competence through multi-turn interactive intercultural scenarios. It comprises 3,060 human-written scenarios spanning 60 countries across six continents. Extensive experiments on eight prominent LLMs reveal a significant gap between the cultural knowledge stored in these models and their ability to apply it effectively in cross-cultural communication.