CLEAR Global and Sheng Language

Training Chatbots to Learn a New Language

The Company:

CLEAR Global, previously Translators Without Borders, is a nonprofit organization helping people get vital information and be heard, whatever language they speak.

The Project:

Continuing its pro bono partnership with Appen, CLEAR Global set out to develop mental health chatbots, such as a voice hotline, for regions with limited literacy. The first language to tackle was Sheng, a Swahili-English slang used primarily by young people in Nairobi and other urban areas of Kenya. As Sheng usage continues to grow, community resources must be able to adapt readily to new variations in its lexicon and give people the most accurate and credible information possible.

Because Sheng was a new language for both Appen and CLEAR Global, the Appen linguist team needed to develop a project model built around linguistic research, best practices, and methodology. The project would deliver a concise language-specific summary document, along with consultation services so that CLEAR Global could produce similar documents for future languages.

The Challenge:

When approaching work on a language that is complex or new to Appen, our teams conduct structured research whose output is a Language Specific Peculiarities (LSP) document. An LSP is a concise research document that outlines the phonological, grammatical, and orthographic aspects of a language in the context of its proposed application, in this case voice-enabled chatbots.

Sheng is growing rapidly and is now used in advertisements, public service announcements, and political campaigns. However, it varies heavily between neighborhoods and has high lexical turnover, which has prevented widespread formalization and documentation. The Sheng LSP therefore needed to highlight these patterns of change to help developers handle this variability in their models.

The Result:

Over two months, the project delivered five consultation sessions, one Sheng LSP document, and one LSP template with instructions for creating future LSP documents.

The consultation sessions and LSP template were designed so that CLEAR Global can conduct its own research, particularly on smaller, lesser-known languages. Using the materials we provided, the team will be able to write its own LSP documents to support the future development of automatic speech recognition (ASR) models in a range of African languages.

For Appen, the project brought clear benefits. Our knowledge of, and processes for, researching and writing LSP documents for a range of purposes have been centralized and consolidated, particularly for lower-resourced languages.

“As a native speaker of Sheng and Swahili, I was impressed by the level of detail and accuracy in the LSP document Appen delivered,” shared Paul Waramabo, Swahili Language Lead for CLEAR Global. “It’s a powerful tool that shows the endless possibilities for many underdeveloped languages and what can be done for those languages.”
