Bridging the Language Gap: The Quest for AI Accessibility in Africa’s Diverse Linguistic Landscape

By Liam

Africa’s estimated 1,500 to 3,000 languages speak to the diverse ways people have organized their lives, cultures, and identities over millennia. For all this linguistic diversity, artificial intelligence (AI) development on the continent is still sorely lacking. The availability of data to train AI models is crucial for ensuring that all languages can benefit from technological advancements. A shared vision between technologists and researchers can start to democratize the landscape of AI accessibility throughout Africa.

Chinasa T. Okolo, the founder of Technēculturǎ, a consultancy that focuses on building inclusive AI solutions, would like to see inclusivity baked into AI model development. She cautions that if we don’t collect better, more inclusive data and account for all languages, we risk leaving entire communities in the dark without opportunities. “We’re going to continue to see people locked out of opportunity,” Okolo stated, highlighting the urgency of addressing these disparities.

The African Next Voices project, which Vukosi Marivate spearheaded in South Africa, has taken important steps toward that goal. The project has collected 9,000 hours of recordings from young people, adults, and elders in cities and towns across South Africa, Kenya, and Nigeria. The recordings cover 18 African languages and form a rich dataset for AI developers working on the continent.

Unfortunately, only 42 African languages are currently supported by the available language models. The situation is made more difficult by the fact that Africa is home to 23 different scripts and alphabets, of which only three (Latin, Arabic, and Ge’ez) are used in AI tools today. This lack of representation highlights the urgent need to preserve African languages by documenting them in dictionaries and grammatical studies.

Nyalleng Moorosi is a post-doctoral researcher with the Distributed AI Research Institute (DAIR). She advocates for making AI available in every African language, including those with only a single speaker. “All languages deserve representation or preservation,” Moorosi asserts. She adds that most African languages lack the documentation that effective AI training requires, even where contextual knowledge is abundant. “There’s a lot of contextual knowledge; there is little documentation,” she explained.

Moorosi takes the discussion a step further to illustrate the need for cultural awareness when developing AI. “We need to make sure that the people who build these models understand the consequences. They understand the cultures enough to understand the weight of these errors,” she stated. She acknowledges that small mistakes are acceptable in casual information searches, such as looking up what is happening in downtown Nairobi. For issues that affect people’s lives and livelihoods, however, precision matters.

The gap in available resources between English and African languages is pronounced. For example, developers can access more than 7 million articles in English, while African languages remain hugely underrepresented. This gap highlights the need for companies like Apple and Google to invest more in African language development, despite the challenges posed by fewer speakers compared to markets like Finland.

Okolo argues that the way models are developed needs to be reconsidered. “We have to reenvision the way that we undertake model development in the first place,” she said. This calls for new approaches that prioritize inclusivity and accessibility across the entire spectrum of African languages.

The dataset created by the African Next Voices (ANV) project gives AI developers across the continent access to more inclusive training data. This effort is an important advance toward a more equitable digital world where all languages can flourish. With recordings that span such a diverse linguistic landscape, developers will be better equipped to build AI applications that are not only more accurate but also culturally relevant.
