The Informal Economy Already Runs on Voice
The question is who is now listening.
The conductor doesn’t announce the stop. He leans out the door of the matatu, one hand gripping the roof rack, and shouts a single word into the Nairobi traffic. Sepa. The driver pulls away. The passengers already on board know what it means. The passengers left behind at the stop figure it out fast. The traffic policeman writing a ticket nearby is, by design, not supposed to.
Sepa is Sheng for “go.” But it is also a signal, a warning, a piece of operational code shared among the crew of a minibus, one vehicle in a network that moves nine million people around this city every day. The name Sheng is a contraction of Swahili-English slang, though “slang” undersells it. It is a living language, built from Swahili grammar, English vocabulary, Kikuyu borrowings, Dholuo inflections, and whatever got invented this week in Mathare or Kibera. Linguists at Appen and CLEAR Global spent two months producing a Language-Specific Peculiarities document for it, yet concluded that Sheng’s high lexical turnover made it nearly impossible to formalize. Words considered archaic in Nairobi Sheng one year are reconstructed and recombined in Nakuru Matatu Sheng the next.
Ask a standard Swahili natural language processing model to transcribe a matatu crew at work and you will get noise. Most of what they say will register as errors.
That gap is worth sitting with.
More than five thousand kilometers to the northeast, in a suburb of Jaipur, a kirana shopkeeper fields orders the same way he always has. Customers send voice notes on WhatsApp. “Bhai, ek kilo atta bhejdo aur haan, woh last wali Maggi nahi, dusri wali.” (Brother, send a kilo of flour and listen, not that last Maggi, the other one.) He listens. He packs. He sends his boy. No app involved. No keyboard. No English. The interaction happens in Hinglish, the code-switching hybrid of Hindi and English that somewhere between 350 and 500 million Indians speak daily. It is a language in which “order return karna hai” (need to make a return) sits alongside brand names like Maggi and Surf, inside syntax that is neither Hindi nor English but unmistakably both.
India has roughly 12 million kirana shops. Almost all of them take orders this way. The informal retail sector is the second-largest employer in the country after agriculture. And the voice systems that run it are, at the technical level, invisible to every major AI platform.
That is changing fast, and the change is worth watching carefully.
India’s government made a bet Kenya hasn’t. In March 2024, IIT Madras, the nonprofit AI4Bharat, and Sarvam AI released IndicVoices, a 12,000-hour speech dataset covering 22 Indian languages. The government’s Bhashini platform offers open APIs for all 22 scheduled languages. NITI Aayog’s AI roadmap targets 490 million informal workers with voice-first interfaces by 2035. The commercial sector followed: Gnani.ai handles 10 million voice interactions daily. Yellow.ai powers multilingual bots for financial services. The Indian voice commerce market was valued at $1.57 billion in 2024 and is projected to reach $7.47 billion by 2030, growing at 32% annually (Grand View Research). Companies like Tabbly and AnveVoice are training specifically on Hinglish, on the code-switching patterns that formal Hindi training data doesn’t catch.
Kenya has no Bhashini. Sheng has no official status, no government speech corpus, no IndicVoices equivalent. The research on Sheng-capable AI remains at the pilot stage, mostly humanitarian nonprofits doing sentiment analysis on contraception messaging. The gap between the two countries is not just resources. It is ambition. India is building public infrastructure for informal speech. Kenya is waiting.
But before concluding that India has simply solved a problem Kenya hasn’t yet faced, it’s worth asking what “solving” this problem actually means.
Sheng’s power is precisely its instability. Matatu crews coin new words to exclude outsiders, including passengers and traffic police. The lexicon turns over fast enough that an outsider who learns it this year will be behind by next year. This is a feature. Researchers who studied the language’s use in the matatu economy found that secrecy and identity marking are explicit functions of its vocabulary. Sepa works because the people who need to know what it means already know. Everyone else hears a word.
Hinglish is less deliberately secretive, but it carries a similar grain of resistance. It is how ordinary Indians communicate when they are not performing for institutions, when they are ordering flour, arguing with a distributor, negotiating a price in the market. It belongs to the register of daily life that formal Hindi and formal English were never quite built for. The kirana owner’s WhatsApp voice note is not a failure to use a proper interface. It is the proper interface, evolved over decades of need.
Now that interface is becoming training data.
India’s AI4Bharat and the companies building on its datasets are collecting exactly this kind of speech. Raw recordings from everyday people, full of background noise, code-switching, the ambient sound of a neighborhood. The Outlook Business account of one startup describes its founders recording their own voices, then reaching out to friends speaking Gujarati, Marathi, Bengali, Kannada, Tamil, receiving voice files through WhatsApp. Cleaned, annotated, augmented with synthetic samples. Fed into models.
The matatu workers who kept their language deliberately opaque, the kirana owners who took orders by voice note rather than by app, the farmers in Vidarbha who never adopted the organized retail AI because it assumed data they didn’t have — they are now the subjects of a training project they were never asked to join.
The parallel that keeps coming back is the cashless matatu experiment. In 2013, Kenya’s government mandated cashless payments across the entire matatu network. Google partnered with Equity Bank on BebaPay. Safaricom brought M-Pesa. Mastercard, KCB, and Family Bank all competed for a slice of the 2 billion KES in annual commuter transactions. Every major fintech brand in what was then called Silicon Savannah put resources in.
Eighteen months later, every single project had failed. The Google BebaPay service sent a termination email from its Ireland office. The others quietly folded. The matatu economy had simply continued doing what it always did, running on cash, trust, social networks, and the kind of demand intelligence that a conductor who has worked the same route for five years accumulates in his body.
The cashless experiment assumed the informal economy wanted to become legible to the financial system. The informal economy disagreed.
Voice AI is not a payment system. The comparison is imperfect. But the structure of the assumption is the same: that the informal economy’s friction with formal technology is a problem to be fixed rather than a signal to be read.
When India builds a 12,000-hour Indic speech dataset, it is building something genuinely useful for the 85% of Indians who don’t speak English fluently and have been locked out of digital interfaces built for the other 15%. The same logic holds in Kenya: a farmer in Kitale who can ask her phone about maize prices in Swahili, rather than navigating English menus, is better served. The access argument is real.
But the access comes bundled. A voice AI that understands Hinglish is also, necessarily, a voice AI that records Hinglish, that builds models from it, that enables platforms to hear and process speech that was previously unreadable to them. The informality that made kirana shops resilient against organized retail — the personal relationships, the WhatsApp voice notes, the cash transactions that left no data trail — starts generating a data trail.
The matatu crews developed sepa so that passengers couldn’t follow. A Sheng-capable AI would follow.
I am not arguing against the technology. The access case for vernacular voice AI is strong, maybe one of the strongest cases for AI in low-income economies right now. Being heard in your own language, by a system that actually understands you, is not a small thing. The alternative, for most of India’s 490 million informal workers and most of Nairobi’s working poor, is continued exclusion from services that exist but were never built for them.
What I am less sure about is whose interests are prioritized when the infrastructure is built. India’s bet is state-led and explicitly framed as an act of inclusion. Bhashini is a public good. AI4Bharat is a nonprofit. But the commercial layer on top (the fintech bots, the kirana AI platforms, the voice commerce systems targeting 540 million vernacular users) is not building this for the kirana owner’s benefit. It is building it because informal speech was the last frontier of customer data that hadn’t yet been captured.
Kenya doesn’t have India’s state infrastructure, but it also hasn’t yet decided whose interests a Sheng-capable AI would serve. That is a harder question than it looks. The matatu conductor shouting sepa is not waiting for an AI to understand him. He is waiting for the traffic to clear.
The question is whether the AI arrives before he gets a say in who built it, who owns the recordings, and what they do with the knowledge that the word means go.
Views expressed are my own.

