In all of the discussion about analyzing the best possible ways to intervene in global development, there’s an underlying component we really haven’t been giving enough attention to: Access to data for impact evaluation.

We still spend enormous resources on surveys, audits, and one-off M&E, only to discover months or years later that a program failed. By then, trust and funding are gone.

A more scalable alternative already exists in principle: administrative data. J-PAL and others have shown that reusing operational datasets (like tax records, health records, or credit score data) can make impact evaluation faster, cheaper, and more accurate. But in most LMIC contexts, access is limited and fragmented.

This is a highly neglected opportunity. Lowering the cost and lag of evaluation has massive RoI. Ineffective programs and deployments can be identified more quickly and cheaply. Successful ones can be scaled with confidence. It also makes continuous outcome tracking feasible. Imagine comparing typhoid trends across program cohorts in real time, using anonymized clinic data, rather than waiting for intermittent surveys.

This could also make it easier to popularize higher levels of rigor in M&E, if there were a way to couple data access partnerships with an accessible frontend. 

There are real challenges (privacy, governance, logistics), but the upside is huge.

The focus, from the practitioner perspective, is supporting initiatives that will create these data sets, and establishing means to make operational use of them. I’m already exploring some angles to pursue this cause area and make lightweight comparison tests more accessible to the broader global health sector. Any thoughts on the value of this approach? If anyone in data science or software engineering has interest in collaborating, you’re also welcome to message me.

Comments (3)

If you ever find high-impact data that needs cleaning, or needs to be pulled from sources like PDFs and scanned images, let me know: I often develop tools related to this in my day job.

LI: https://www.linkedin.com/in/kolota

Agree the value is high. But practically, there are two big questions that come to mind, since I work and study in this area:

  • If aggregating existing datasets, what's your value-add over what J-PAL, World Bank, IDInsight, Our World in Data, and numerous unaffiliated academics are already doing? (See the Best of EconTwitter Substack for "public goods," which are sometimes publicly accessible datasets.)
  • If gaining access to new datasets, what are you offering to LMIC governments in exchange? Even making a single batch of data publicly accessible is a lift. So in practice, they need to see some value, analytical or logistical, to be willing to work with you.

What I'm suggesting is really about:

  • Creating new data sets
  • Making inaccessible data sets accessible
  • Making lightweight, live comparison testing more readily available to the sector as a whole 

On the new data sets front, I've been looking at last-mile health record digitization and interoperability. There are some promising cases of traction via smartphone compatibility, like UCS in Tanzania and MedTrack in Ghana (which I've been working with directly). Speaking for MedTrack, I can say we're already working toward the goal of creating a usable administrative data set.

On the front of currently inaccessible but existing data, I think there's more potential for looking to business than government. Mobile money platforms have already done some limited cooperation with research, to my knowledge. They'd likely be reluctant to form a large number of data sharing partnerships, but a single partner that translates the data into an anonymized aggregate set and then acts as the go-between could create a lot of potential for assessing economic impact. From an incentive perspective, they could make some kind of commission from access, and it makes for good CSR optics.
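To make the go-between role concrete, here's a minimal sketch of the kind of pipeline such a partner might run: salted one-way hashing of user IDs, aggregation to region-month cells, and small-cell suppression before release. All record fields, names, and thresholds here are hypothetical illustrations, not a description of any existing platform's data.

```python
import hashlib
from collections import defaultdict

# Hypothetical raw mobile-money records: (user_id, region, month, amount)
records = [
    ("user123", "North", "2024-01", 50.0),
    ("user123", "North", "2024-02", 75.0),
    ("user456", "South", "2024-01", 20.0),
]

SALT = "rotate-me-per-release"  # secret salt, rotated so hashes can't be linked across releases

def pseudonymize(user_id: str) -> str:
    """One-way salted hash so raw IDs never leave the data provider."""
    return hashlib.sha256((SALT + user_id).encode()).hexdigest()[:12]

# Aggregate to region-month cells; individual rows are never published.
cells = defaultdict(lambda: {"users": set(), "total": 0.0})
for user_id, region, month, amount in records:
    key = (region, month)
    cells[key]["users"].add(pseudonymize(user_id))
    cells[key]["total"] += amount

# Suppress cells with too few distinct users (a common disclosure-control rule)
MIN_CELL = 1  # in real releases this would typically be 5 or more
released = {
    key: {"n_users": len(v["users"]), "total": v["total"]}
    for key, v in cells.items()
    if len(v["users"]) >= MIN_CELL
}
```

Note that salted hashing plus aggregation is only a baseline; a real go-between would also need governance around re-identification risk.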

On the last point, I'd propose that there's an underserved need for simplified impact eval that rules out obviously ineffective interventions or deployments early and affordably. We spend so much time pursuing maximal rigor in analysis that we can lose sight of the operational value of quick and dirty numbers, especially if the practice can be low-friction enough to become normalized in the sector. What I'd really want to see is an interface for setting up live dashboards that just run a straightforward A/B test or something similar.
