Building Better Language Models Through Global Understanding

  • Dr. Marzieh Fadaee, Cohere Labs

Modern language models have achieved remarkable capabilities in English, but human knowledge and experience span thousands of languages, each encoding unique perspectives and problem-solving approaches. From the algorithmic precision required to handle Arabic’s root-pattern morphology to the contextual reasoning needed for Japanese’s topic-prominent structure, each language presents distinct computational challenges that push the boundaries of natural language processing. This talk will address pressing challenges in multilingual AI development, including training data imbalances, the limits of cross-lingual transfer, safe and harmless generation, and the complexity of evaluation across languages. I’ll discuss practical solutions and emerging research directions that could help bridge the current performance gap between high-resource and low-resource languages. By building truly multilingual AI systems, we not only expand access to technology but also develop more sophisticated models capable of handling the full spectrum of human language complexity.