Why AI Still Doesn’t Understand How We Speak
As part of the AI for the Global Majority initiative, a research team from Ghana led by Professor Jerry John Kponyo, and including Ama Branoa Banful, Juliet Arthur, and Kenneth Dotse, is tackling a problem that remains largely invisible in global discussions on artificial intelligence: AI often fails to understand how people actually speak.
Voice assistants and speech recognition systems are now embedded in everyday life. Yet for millions of users, these technologies remain unreliable or even unusable.
The reason is simple: most systems are not designed for real-world linguistic diversity.
When speaking naturally becomes a barrier
In multilingual societies such as Ghana, people frequently switch between languages within the same sentence, a practice known as code-switching.
A speaker may move seamlessly between English and Twi, depending on context, expression, or habit. While this is entirely natural in everyday communication, it creates significant challenges for AI systems, which are typically trained on standardised, monolingual datasets.
As a result, performance drops sharply the moment speech deviates from these norms.
The issue is further compounded in tonal languages like Twi, where meaning depends not only on words, but on variations in tone. The same word can have entirely different meanings depending on how it is pronounced, something most current models struggle to interpret.
An invisible exclusion
These limitations are not just technical; they have real social consequences.
When AI systems fail to recognise how people speak, they effectively exclude entire populations from accessing digital tools and services. This creates a form of digital inequality, where certain voices are systematically underrepresented or misunderstood.
The gap becomes even more pronounced for individuals with impaired speech, including those affected by stroke, dysarthria, or cerebral palsy. Because current models are not trained on such speech patterns, they perform even worse, further reinforcing exclusion.
A problem rooted in lived experience
For Ama Branoa Banful, a member of the research team, this issue first became apparent through personal experience.
During her studies, she realised that widely used tools such as Google Assistant and Siri failed to understand her whenever she switched between English and Twi mid-sentence, despite her speaking clearly.
This everyday frustration revealed a deeper issue:
AI systems are not built to reflect how people actually communicate.
A broader question
The implications go far beyond speech recognition.
If AI systems cannot understand linguistic diversity, what does this mean for their role in society? Who gets to be recognised, and who remains invisible?
Addressing these questions requires more than technical fixes; it calls for a rethinking of how AI is designed, trained, and evaluated.
We will explore these questions in greater depth in Part 2 of this series, coming soon.
About AI for the Global Majority
AI for the Global Majority (AI4GM) is a joint initiative of the Geneva Graduate Institute, Microsoft, and the International Telecommunication Union (ITU) dedicated to supporting innovative, evidence-based, and context-sensitive research on how artificial intelligence can benefit the world’s majority populations.
Bringing together interdisciplinary teams from across regions and sectors, the initiative explores practical pathways for more inclusive, responsible, and impactful AI in areas such as governance, education, health, finance, and digital innovation.
Selected teams will present their work in Geneva as part of the AI for Good Global Summit, contributing to international discussions on the future of AI and global development.