
TL;DR: Working as Lead Outreach Manager at a mental health nonprofit in Japan, I realized we couldn't answer a basic question funders kept asking: "What's your actual impact?" We had activity metrics (workshops delivered, calls answered) but no outcome metrics. So I built a cost-effectiveness framework using national police data and our publicly available financial records to create a model-based estimate of potential impact. Under plausible assumptions (25% intervention effectiveness), the program costs ~¥720,000 ($4,830) per DALY averted - 7× below WHO's threshold for highly cost-effective interventions. This analysis helped transform how we communicate impact to grant organizations, though significant uncertainties remain.

Background: The Question That Started Everything

I've been working at a Tokyo-based mental health organization for almost a year now, managing outreach programs and grant coordination. The learning curve has been steep - delivering mental health workshops to schools and businesses across four Japanese regions simultaneously, managing distributed teams, engaging donors remotely. There were ups and downs, failed pilot programs, and moments of genuine impact I couldn't quite quantify.

Then came the rejection email.

We'd applied for a major grant to expand our youth suicide prevention programs. The proposal highlighted impressive activity metrics: workshops delivered, hotline calls answered, schools reached. We didn't get the funding. The feedback, diplomatically phrased, essentially asked: "But what's the actual impact?"

That question hit differently after I'd spent 2024 engaging with Effective Altruism Japan and reading my first EA handbook on reasoning transparency and cost-effectiveness. I realized: we were measuring outputs, not outcomes. We knew how many people attended our workshops. We had no idea if we were preventing suicides.

As the main person at the nonprofit working with data across the organization - managing project finances, issuing tax receipts, allocating grants - I was in the best position to figure this out. So I did.

The Problem: Activity Metrics ≠ Impact Metrics

Our nonprofit provides crisis intervention services (suicide hotline, counseling) and prevention programs (school workshops, awareness campaigns). Like many organizations, our reporting focused on what funders traditionally requested:

  • Number of workshops delivered: 9 in 2024
  • Hotline calls answered: ~9,000 annually
  • Schools engaged: 145 schools received awareness materials by post, including phone numbers for the crisis hotline

These numbers look impressive. But they tell you nothing about whether we're actually preventing deaths.

Many grant evaluators now want outcome-oriented metrics. Questions like:

  • How many lives are you saving?
  • What's your cost per disability-adjusted life year (DALY) averted?
  • How does this compare to other interventions?

We had no answers. Not because we didn't care about impact (we deeply did) but because measuring preventive interventions is genuinely hard. How do you count deaths that didn't happen?

Collecting the Data: What I Had and What I Needed

What I started with:

  1. Nonprofit's public annual report - it clearly states ¥11,113,476 allocated to youth outreach and prevention programs in 2024
  2. Japan National Police Agency open data - detailed suicide statistics by age, gender, cause, prefecture
  3. A data science background - enough Python, SQL, and statistics to tie it all together

What I needed to find:

1. National baseline: How big is the problem?

  • Youth population (ages 10-19): ~11,000,000 (Japan Statistics Bureau, 2024)
  • Youth suicides in 2024: 799 deaths (National Police Agency)
  • Youth suicide rate: 7.26 per 100,000

This was far from straightforward. Japan maintains excellent public health data... in PDFs... in Japanese. I did what anyone with a tech background would do here - I attempted to scrape the website data with a Python script. After that attempt failed against the NPA site's impressively aggressive security, I reverted to downloading all 24 files into my repository by hand. With that out of the way, the data extraction, analysis, labeling and translation followed. If you want to see the Python code, check my project page here: Project Page
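For reference, here's a minimal sketch of the baseline calculation. The CSV name and column names are placeholders for whatever your extraction step produces from the NPA PDFs; the full pipeline lives on the project page.

```python
import pandas as pd

YOUTH_POPULATION = 11_000_000  # ages 10-19, Japan Statistics Bureau (2024)

# "suicides_by_age_2024.csv" and its columns are placeholders for the tables
# extracted from the NPA PDF files.
df = pd.read_csv("suicides_by_age_2024.csv")
youth = df[df["age_group"].isin(["10-14", "15-19"])]   # keep the 10-19 bands
youth_deaths = youth["deaths"].sum()                    # 799 deaths in 2024

rate_per_100k = youth_deaths / YOUTH_POPULATION * 100_000
print(f"Youth suicide rate: {rate_per_100k:.2f} per 100,000")   # ~7.26
```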

2. Program reach: Who did we actually engage?

This was harder. Our outreach team is dispersed across four regions - Tokyo, Osaka, Okinawa and Fukuoka - with different documentation practices. I needed consistent, defensible numbers.

I categorized our reach into three channels:

A. School Awareness Packages (Indirect reach)

  • Sent mental health resource packets to 145 schools
  • Assumed 300 students per school (conservative estimate for Japanese high schools)
  • Assumed 25% engagement rate (materials opened, read, or discussed)
  • Total: 10,875 youths

Uncertainty: High. We don't track which students actually engaged with materials. 25% is based on email open rates from our digital campaigns on Mailchimp (~22-28%) and assumes physical materials have similar engagement. This could be wildly optimistic or pessimistic.

B. Suicide Prevention Workshops (Direct reach)

  • 9 workshops delivered with verified attendance
  • Total: 500 youths

Uncertainty: Low. We have attendance sheets.

C. Crisis Hotline Youth Callers (Direct reach)

  • 9,000 total calls in 2024
  • ~20% from youth callers (based on voice/disclosure, not rigorous)
  • ~70% unique callers (estimated from caller ID patterns, excluding repeat callers)
  • Total: 1,260 unique youth callers

Uncertainty: Medium. Age estimation is imperfect. Some callers don't disclose. Some use blocked numbers. 20% could be 15-25%.

Total youths reached: 12,635

Major assumption: These groups don't significantly overlap. In reality, some workshop attendees may also call the hotline or receive school materials. We're likely double-counting, but we have no way to de-duplicate across channels.
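Here is the same reach estimate as a snippet, so the assumptions are explicit and easy to swap out:

```python
# Reach estimate across the three channels, assuming zero overlap between them.
# Every rate here is one of the assumptions stated above, not a measured value.
SCHOOLS = 145
STUDENTS_PER_SCHOOL = 300        # conservative estimate for Japanese high schools
ENGAGEMENT_RATE = 0.25           # inferred from ~22-28% Mailchimp open rates

WORKSHOP_ATTENDEES = 500         # verified via attendance sheets

TOTAL_CALLS = 9_000
YOUTH_SHARE = 0.20               # rough estimate from voice/disclosure
UNIQUE_SHARE = 0.70              # estimated from caller ID patterns

school_reach = SCHOOLS * STUDENTS_PER_SCHOOL * ENGAGEMENT_RATE    # 10,875
hotline_reach = TOTAL_CALLS * YOUTH_SHARE * UNIQUE_SHARE          # 1,260
total_reach = school_reach + WORKSHOP_ATTENDEES + hotline_reach   # 12,635

print(f"Total youths reached: {total_reach:,.0f}")
```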

3. Expected deaths without intervention

Using the national baseline suicide rate:

  • Expected deaths = 12,635 × (7.26 / 100,000) = 0.917 deaths

In other words, without any intervention, we'd statistically expect slightly less than one youth in our reached population to die by suicide in 2024.
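The same step in code, under the (strong) assumption that the youths we reach carry the national average risk:

```python
# Expected deaths in the reached population at the national baseline rate.
# Assumes reached youths are average-risk, which they may well not be.
RATE_PER_100K = 7.26
TOTAL_REACH = 12_635

expected_deaths = TOTAL_REACH * RATE_PER_100K / 100_000
print(f"Expected deaths without intervention: {expected_deaths:.3f}")   # ~0.917
```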

4. Intervention effectiveness: The biggest uncertainty

This is where things get speculative.

Meta-analyses of suicide prevention programs show effectiveness rates ranging from 10% to 40%, with significant heterogeneity based on:

  • Intervention type (universal vs. indicated)
  • Population (general vs. high-risk)
  • Implementation quality
  • Follow-up duration

I used 25% effectiveness as a central estimate. This assumes that among youths who would otherwise die by suicide, our intervention prevents death in 1 out of 4 cases.

Why 25%?

  • Gatekeeper training programs (similar to our workshops): 10-30% reduction in suicidal ideation
  • Crisis hotlines: 10-20% reduction in immediate suicide risk (short-term)
  • Combined interventions: Potentially higher, but limited long-term data

Critical uncertainties:

  • We don't know if our workshop attendees are representative or self-selected (higher baseline risk?)
  • We don't know long-term outcomes (did the crisis call prevent death, or just delay it?)
  • We don't track individuals longitudinally (no way to verify actual outcomes)
  • Japanese youth may respond differently to interventions validated in Western contexts

Lives saved = 0.917 × 0.25 = 0.229 lives per year

Or roughly 1 life saved every 4 years.

This number feels uncomfortably precise for something this uncertain. It's not a measurement - it's a model-based estimate with wide confidence intervals.
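One way to keep that uncertainty visible is to report the whole 10-40% effectiveness range from the literature rather than a single point, as in this sketch:

```python
# Lives-saved estimate across the effectiveness range reported in the
# meta-analyses above, with 25% as the central scenario.
EXPECTED_DEATHS = 0.917   # from the baseline-rate calculation

for effectiveness in (0.10, 0.25, 0.40):
    lives_saved = EXPECTED_DEATHS * effectiveness
    print(f"{effectiveness:.0%} effective: {lives_saved:.3f} lives/year "
          f"(~1 life every {1 / lives_saved:.0f} years)")
```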

Calculating Cost-Effectiveness of the Program

DALYs averted

Using standard DALY methodology:

  • Average age at prevention: 17 years
  • Life expectancy in Japan: 84 years
  • Years of life lost per death: 67 years
  • Total DALYs averted = 0.229 × 67 = 15.34 DALYs

Cost per DALY

  • Program cost (2024): ¥11,113,476
  • Cost per DALY averted = ¥11,113,476 / 15.34 ≈ ¥724,500 ($4,830 USD)

WHO threshold comparison

WHO considers interventions "highly cost-effective" if cost per DALY is below 1× GDP per capita:

  • Japan GDP per capita: ~$34,000
  • Our cost per DALY: $4,830
  • Our cost per DALY is roughly 7× below the threshold
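Putting the last three steps together in one place (the ≈¥150/USD conversion is the rate implied by the figures above, not an official one):

```python
# Cost-effectiveness arithmetic from the sections above.
LIVES_SAVED = 0.229
AGE_AT_PREVENTION = 17
LIFE_EXPECTANCY = 84
PROGRAM_COST_JPY = 11_113_476
JPY_PER_USD = 150                      # approximate 2024 exchange rate
JAPAN_GDP_PER_CAPITA_USD = 34_000      # WHO "highly cost-effective" threshold (1x)

years_of_life_lost = LIFE_EXPECTANCY - AGE_AT_PREVENTION   # 67
dalys_averted = LIVES_SAVED * years_of_life_lost           # ~15.3

cost_per_daly_jpy = PROGRAM_COST_JPY / dalys_averted       # ~¥724,000
cost_per_daly_usd = cost_per_daly_jpy / JPY_PER_USD        # ~$4,830

threshold_ratio = JAPAN_GDP_PER_CAPITA_USD / cost_per_daly_usd
print(f"Cost per DALY: ¥{cost_per_daly_jpy:,.0f} (~${cost_per_daly_usd:,.0f})")
print(f"~{threshold_ratio:.0f}x below the WHO threshold")   # ~7x
```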

What This Analysis Actually Shows (And Doesn't Show)

What it shows:

  1. Our program is plausibly cost-effective compared to global health benchmarks
  2. Youth suicide prevention can compete with other cause areas for funding on pure cost-effectiveness grounds
  3. There's a clear case for scaling if effectiveness holds (doubling reach could save ~0.5 lives/year)
  4. Theory of change clarity - the analysis forced conversations about how we think our programs actually work
  5. Grant applications can be improved - we can now answer "what's the impact?" with quantified estimates and transparent assumptions

What it absolutely does NOT show:

  1. Causal impact - We have no control group, no randomization, no way to know counterfactuals
  2. Attribution - If a workshop attendee doesn't die by suicide, was it our workshop? Family support? Something else?
  3. Long-term effects - Does crisis intervention prevent suicide, or just delay it?
  4. Generalizability - Does 25% effectiveness hold across all our channels equally?

Lessons Learned: What I'd Do Differently

If I could restart, I'd:

  • Survey workshop participants at 3, 6, and 12 months - longitudinal follow-up would let us check whether the intervention's effects persist
  • Partner with researchers for rigorous evaluation
  • Be more uncertain, more publicly. This is genuinely hard - preventive interventions have murky counterfactuals, and getting outside advice on the modelling early is crucial

My Takeaways

This analysis is flawed. It relies on heroic assumptions, imperfect data, and speculative effectiveness estimates. But it's better than nothing. Perfect is the enemy of good. We don't need perfect impact measurement to make better decisions. We need transparent, improvable estimates that guide resource allocation.

If you're working at a nonprofit without cost-effectiveness analysis, start somewhere. Use whatever data you have. Make assumptions explicit. Update as you learn. The alternative - flying blind - is worse.

I'm sharing my full methodology, data sources, and Python notebooks here: [GitHub link]. If you spot errors or have suggestions, please reach out: [email].
