
Disclosure: I work at GiveDirectly. This is a linkpost summarizing findings from a pilot we ran in Rwanda. I used AI to assist in writing this post, and it’s likely that >30% is AI-generated text. 

View our blog and watch a video of recipients using AI here:

https://www.givedirectly.org/the-robots-work-at-night 

Last year, GiveDirectly tested whether unrestricted access to an AI chatbot could complement cash transfers for recipients living in extreme poverty. Alongside our usual ~$1,000 one-time transfers in rural Rwanda, we offered 832 recipients access to a ChatGPT-powered chatbot via WhatsApp - a platform most already used - with no restrictions on what they could ask.

What we expected

We anticipated questions about the GiveDirectly program, help planning how to spend transfers, and basic business advice. People did use it for all of those things.

What actually happened

The more revealing pattern was how quickly recipients moved beyond program-specific questions. Across 21,000 inbound messages between November 2025 and April 2026, people used the chatbot the way people use AI everywhere: for family conflicts, sick children, market prices, and questions they couldn't easily take to anyone else. A few examples, translated verbatim from Kinyarwanda:

  • "I have conflicts with the person I married."
  • "Why does my child cough at night?"
  • "What is the number of neighbors I should be cautious about?"
  • "Who are you that you answer me?"

This isn't surprising in isolation - it mirrors how AI is used globally. But in rural Rwanda, where a community health worker, business coach, or legal aid office may be hours away or nonexistent, the stakes of that access feel different.

The timing finding

Usage increased late at night - after farm work, after children were asleep, in the quiet hours when formal services are long closed. One recipient captured it simply in a focus group: "The robots work at night." This matters because most traditional support programs - training sessions, coaching, extension services - are delivered during the day, in groups, on fixed schedules. The chatbot met people where they actually were.
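For readers who want to check a timing pattern like this in their own chatbot logs, here is a minimal sketch of one way to tabulate it. It is not GiveDirectly's analysis: the function names, the 20:00 to 05:00 definition of "night", and the sample timestamps are all assumptions made for illustration. The only external fact used is that Rwanda is on Central Africa Time (UTC+2).

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Rwanda uses Central Africa Time (UTC+2, no daylight saving).
CAT = timezone(timedelta(hours=2))


def messages_by_local_hour(iso_timestamps):
    """Count inbound messages per local hour of day (0-23)."""
    counts = Counter()
    for ts in iso_timestamps:
        local = datetime.fromisoformat(ts).astimezone(CAT)
        counts[local.hour] += 1
    return counts


def night_share(counts, start_hour=20, end_hour=5):
    """Share of messages sent from start_hour up to (but not including)
    end_hour, wrapping past midnight. The window is an assumed definition."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    night = sum(n for hour, n in counts.items()
                if hour >= start_hour or hour < end_hour)
    return night / total


# Hypothetical timestamps for illustration only; not pilot data.
sample = [
    "2025-11-03T19:30:00+00:00",  # 21:30 local
    "2025-11-03T21:05:00+00:00",  # 23:05 local
    "2025-11-04T08:15:00+00:00",  # 10:15 local
    "2025-11-04T22:40:00+00:00",  # 00:40 local, next day
]
counts = messages_by_local_hour(sample)
print(night_share(counts))
```

Converting to local time before bucketing matters here: logs are often stored in UTC, and a two-hour offset is enough to shift an evening peak across the boundary of whatever window is used to define "night".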

Where it fell short

This is where we think the EA community's scrutiny is most valuable. Three gaps stood out:

  • Language. Kinyarwanda support across major LLMs is inconsistent and often poor. Language is the first barrier to meaningful access, and it remains largely unsolved for most African languages.
  • Voice. For lower-literacy users, voice notes are the natural interface. But voice functionality was slow, unreliable, and poorly adapted to this context.
  • Local knowledge. Models know Kigali far better than the villages where our recipients live. The more rural the setting, the less useful the AI's answers about local markets, services, and conditions.

Open questions

We're continuing to test - a similar pilot is now underway in Malawi - but we're genuinely uncertain about several things and would value the community's thinking:

  • How should we evaluate LLM utility in low-resource settings when standard benchmarks don't capture what matters - local language quality, contextual relevance, reliability on local queries?
  • What's the right bar for "good enough" language support before deploying these tools at scale?

We don't think the answers will come from one organization or one pilot. If you're building, funding, or researching AI in low-resource settings, we'd welcome the conversation.

Comments

What do you mean by "tested"? What outcomes are you measuring, or do you plan to run surveys?

Hi! We're taking a phased approach to our learning on digital innovation. These results are from the first phase of work - small pilots assessing uptake and usability through qualitative research (focus groups) and data from the chatbots. We are now running A/B tests of different prompts to see which ones respond most effectively to recipients. Once we know which prompts perform best, we will move on to A/B testing of cash versus cash plus chatbot to evaluate economic indicators that we believe from the micro-data of past studies are leading indicators of longer-term income.  

This is super interesting stuff thanks for posting!

The first thing that jumped out at me was that you are reading and analysing people's messages that come through the chatbot. I'm sure they consented (as much as it is possible to truly consent given the level of education and the cash incentive) and it's all anonymised, but it still seems weird.

I have so many ethical questions about this. None of them, I think, necessarily mean something like this isn't worth trying, but I think it's worth discussing. Here are just a couple off the top of my head:

  1. What do you do if they have a conversation about harming themselves or others? Do you react and do something about it, or do you leave it be? Would people then be aware that what they type could elicit some kind of external response?
  2. When they ask something like "What business has quick profits?", for which there is obviously no good answer, what does the bot do? I hope it doesn't try to give business advice. When I asked Claude Sonnet, the first answer it gave was:

"Poultry farming is one of the most cited options. It has high demand for eggs and chicken meat, with startup capital of around 1–2.5 million RWF and potential returns in 2–3 months if well managed."

In many rural contexts this might be a decent idea, but without proper disease treatment, housing, protection from theft etc. this advice could be a huge liability. 


Also, "Who are you that you answer me?" is pretty haunting. I concur.

I think there's potentially a huge amount to be gained by trying chatbots in these settings, but it's a bit of an ethical minefield and it's a new frontier for sure.

 

Hi NickLaing, you raise excellent points - this does raise ethical issues, and it's something we have thought a lot about and have put systems in place to address. We gather consent from recipients when they are initially enrolled, and we ensure that data is anonymised when it is analysed. We have built additional guardrails into the chatbots (we tested exactly the types of questions you raised to check how the chatbot responded, and then ensured that any questions that raised a red flag receive responses that link recipients to our call centre). Our safeguarding teams also monitor messages for anything problematic and address these with the appropriate level of engagement depending on the issues that surface. Our approach to safeguarding and guardrails is evolving as we learn - if you have additional suggestions of things we should be thinking about, we'd welcome them.

One thing which is unclear to me is, why aren’t these users counterfactually using free commercial offerings? Price is clearly not a barrier, is it just language? And why, then, wouldn’t a frontier lab be well-positioned to capture that market?

Good question. Digital literacy is fairly low and many people did not have smartphones until we ran this pilot. Language support is limited in free commercial offerings, particularly for voice notes, which are essential for engagement in this group.

It would be interesting to see a more detailed and systematic report on the activity and findings so far.

In some respects, it seems like a strange thing for GiveDirectly to be piloting. On the one hand, GiveDirectly has expertise in systematic studies of behavioural change in LDCs, and the chatbot possibly also performed programmatic functions in a cost-effective manner. On the other hand, it involves a charity known for its "let local people decide how to use money spent on their behalf, Western aid agencies doing it can be disempowering and often wrong" ethos asking "which parameters should we use to fine tune this [adaptation of a commercial] product we've designed to give them the most suitable answers before scaling up its deployment"... which seems like a very different ethos and approach.[1]

The conclusions highlighted from the research so far - both that if you give poor Rwandans access to ChatGPT they have a similar range of interaction to other humans[2] and that responses generated by an LLM with no meaningful local training dataset were often inadequate - seem unsurprising. I am sympathetic to arguments that people make better decisions with access to information, but I am also sympathetic to arguments that a ChatGPT derivative is not the most valuable information Rwandans could receive (and may have minimal or even negative value).

I'm not actually sure what the costs of acquiring relevant local data and training a chatbot to achieve greater fluency in spoken Kinyarwanda dialects and safeguarding against advice that is very bad in a local context are,[3] but they seem like a pretty relevant benchmark, since they might actually be considerable on a per-user basis, and the alternative for critical information like "what is the nearest health centre" might be something like signing people up to email lists, or a small number of human agents in Kigali costing surprisingly little.[4] I guess there's also the "who's paying?" question, especially when the current implementation appears to involve providing training data for one of the world's most valuable companies (and obscure languages may or may not add value to their model).

I feel one relevant benchmark for GiveDirectly specifically might be "what is the estimated cost per person reached to improve it: would locals rather have a better chatbot or the cash?". It's possible the insights they're getting are extremely valuable, particularly in the context of limited or no web access, but it's possible they're not...
 

  1. ^

    the relevant comparator might be the One Laptop Per Child project. Well intentioned, theory of change centred on the idea that people in LEDCs can be empowered by interacting with modern technology and better information too, but perhaps actual educational benefits didn't really stack up with the costs and the participants would have chosen to have something other than a computer

  2. ^

    I must admit, I am curious about the extent to which Rwandans engaged in "witty banter" or attempts to manipulate the chatbot into saying something silly...

  3. ^

    I don't know how bad the speaking and dataset is, and whether an adequate "solution" looks like a finetuning prompt with some info or developing a corpus of services data and synthetic idiosyncratic Kinyarwanda to fix the model, but the latter option could be very expensive compared with the people it would actually reach...

  4. ^

    I suspect you get many person years of Rwandan human call centre time for a month or two of a mid-level AI engineer's time...


Hi David T., you are right to ask the question on whether people would rather have had the cash instead of a better chatbot. This is why we are starting with small pilots and using these to inform our next steps. So far the feedback has been very positive - people say that they wish they had had the chatbot earlier. Next we want to know if adding a chatbot boosts the impact of cash - if so, then rolling this out at larger scale will involve minimal costs but the benefits could be high. We don't know of other non-profits that are offering unrestricted versions of chatbots at this point (many focus only on agricultural advice) - but we believe this offers the kind of information access that should be available to people and allows people to make decisions based on their own needs.  

And one quick query from the main article

"Shelton, Constance (CJ), and Latifah in Latifah’s farm where she now grows Irish potatoes and cabbage. She invested $300 in her farm, and has been able to increase her profits from $5/week to $30/week. She plans to expand into mushroom farming next."

Profits (not gross sales) of $30 a week from subsistence farming in rural Rwanda seem close to impossible. I imagine this is what she told you? Perhaps this was just during harvest season or something, in which case it would make more sense.

Perhaps not completely impossible though if she's unlocked a particular market!

Hi, yes this would be from self-reported data. We aren't currently collecting detailed M&E data on economic indicators during this phase of work. 

I suggest you look at the work being done on multi-user agents and collective memory. Not as a replacement for individual AI access, not because the poor should somehow have their identities absorbed by the collective, but because multi-user agents could address exactly the local knowledge gap you are describing. Through sustained group interactions and self-learning the agent should be able to accumulate local knowledge and increase its usefulness (local markets, services, and conditions).

I have not been able to find much active discussion or experimentation on multi-user agents and collective memory in the ICT4D context; the focus is more on enterprise applications. But that work is still relevant, and its development application may be exactly where GiveDirectly is well positioned to lead.

A basic implementation stack could be an openclaw- or hermes-style agent connected to a local WhatsApp or Telegram group; both have the functionality. WhatsApp and Telegram are already popular and widespread, so the technology can meet people where they already are.

Thanks Alex N.! We'll have a look into this. 
