Angelina Li

Data Analyst @ Centre for Effective Altruism
1430 karma · Joined · Working (0-5 years) · Berkeley, CA, USA


Hiya! I work on data stuff at CEA. I used to be the content lead on the EA Global team at CEA, and before that I did economic consulting. Here's an old website I might update at some point.

Think I'm making a mistake? Want to give me feedback? Here's my admonymous. You can also give feedback for me directly to my manager, Oscar Howie.


Thanks for publishing this + your code, I found this approach interesting :) In general, I'm excited about people trying different approaches to impact measurement within field building.

I had some model qs (fair if you don't get around to these given that it's been a while since publication):

We define a QARY as:

  1. A year of research labor (40 hours * 50 weeks),
  2. Conducted by a research scientist (other researcher types will be inflated or deflated),
  3. Of average ability relative to the ML research community (other cohorts will be inflated or deflated),
  4. Working on a research avenue as relevant as adversarial robustness (alternative research avenues will be inflated or deflated),


I feel confused by the mechanics, especially of adjustments 2-4:

  • On 2: I think you're estimating these adjustments based on researcher type — what is this based on?

Here, scientists, professors, engineers, and PhD students are assigned ‘scientist-equivalence’ of 1, 10, 0.1, and 0.1 respectively.

  • On 3: I feel a bit lost on how you're estimating average ability differences. How did you come up with these numbers?

Given the number of pre-PhD participants each program enrolls, Atlas participants have a mean ability of ~1.1x, Student Group and Undergraduate Stipends ~1x, and MLSS ~0.9x. Student Group PhD students have mean ability ~1.5x.

  • On 4:
    • Am I right that this is the place where you adjust for improvements in research agendas (i.e. maybe some people shift from less -> more useful agendas as per CAIS's opinion, but CAIS still considers their former agenda as useful)?
      • Is that why Atlas gets such a big boost here, because you think it's more likely that people who go on to do useful AI work via Atlas wouldn't have done any useful AI work but for Atlas?
    • I feel confused about how exactly to parse what you're saying here re: which programs are leading to the biggest improvements in research agendas, and why.

The shaded area indicates research avenue relevance for the average participant with (solid line) and without (dashed line) the program. Note that, after finishing their PhD, some pre-PhD students shift away from high-relevance research avenues, represented as vertical drops in the plot.
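To make my confusion about adjustments 2-4 concrete, here's a minimal sketch of how I'm currently reading them: as multiplicative weights on a year of research labor. That multiplicative form is my assumption, not something I've confirmed from the post or your code. The function name `qarys` and its parameters are hypothetical; only the scientist-equivalence figures and the ~1.5x ability figure come from the passages quoted above.

```python
# Scientist-equivalence weights, as quoted from the post.
SCIENTIST_EQUIVALENCE = {
    "scientist": 1.0,
    "professor": 10.0,
    "engineer": 0.1,
    "phd_student": 0.1,
}


def qarys(years, researcher_type, ability, relevance):
    """Hypothetical reading: years x type weight x ability x relevance.

    `ability` is relative to the average ML researcher (1.0 = average).
    `relevance` is relative to adversarial robustness (1.0 = as relevant).
    The multiplicative form is an assumption, not the post's stated method.
    """
    return years * SCIENTIST_EQUIVALENCE[researcher_type] * ability * relevance


# One year from a Student Group PhD student (mean ability ~1.5x per the quote),
# working on an avenue assumed to be as relevant as adversarial robustness.
example = qarys(1.0, "phd_student", 1.5, 1.0)
print(round(example, 3))
```

If this is roughly right, then my questions above are really about where each of the three multipliers comes from.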

In general, I'd find it easier to work with this model if I better understood, for each of your core results, which critical inputs were based on CAIS's inside views vs. evidence gathered by the program (feedback surveys, etc.) vs. something else :)

I'd be interested to know whether CAIS has changed its field building portfolio based on these results / still relies on this approach!

Congratulations on launching this and reaching your one year mark!! Starting a new charity sounds like a tremendous amount of work, and I have so much respect for CE incubatees.

Based on these factors, we believe that Ansh’s program can reduce neonatal mortality by at least 50%[10].

I had a nitpicky impact evaluation question, sorry if I'm missing something.

Is this 50% number based on your actual observed reduction in neonatal mortality, given your baseline of ~[13% to 27%]? Or is it based on the studies linked in the prior paragraph? I was just a bit surprised to see you cite these papers instead of your own preliminary data :)

It naively seems to me that since you have ~5 months of operational data (given a Jan 2024 launch) + 900 enrollments, maybe you can estimate your actual, tentative observed effects? (Looks like the Cochrane review paper gives some data on the health outcomes of infants at the point of being discharged + at the 1-3 month post discharge mark, so maybe you have some observed effects already?)

So reasonable if you just haven't gotten around to this yet, or if there's another consideration I haven't thought of. Good luck with your work!!

Nice, thanks for keeping track of this and reporting on the data!! <3

No pressure to respond, but I'm curious how long it took you to find the relevant email addresses, send the messages, then reply to all the people etc.? I imagine for me, the main costs would probably be in the added overhead (time + psychological) of having to keep track of so many conversations.

This seems like an interesting data dive, thanks for posting! :)

As an FYI to readers, you can find some high level metrics on the Forum (e.g. hours of engagement, monthly active users) live on the CEA dashboard.

Unrelated — I really like this comment + this other comment of yours as good examples of: "I notice the disagreement you are having is about an empirical and easily testable question, let me spend 5 min to grab the nearest data to test this." (I really admire / value this virtue <3 )

Thanks, I found this a helpful nudge, and wouldn't have known about this otherwise :)

By the way, I ended up buying a copy of Sexual Citizens because of this comment. I found it super interesting (if sad :( ), thanks for the rec!

...I'm a bit embarrassed that it took me fully until the section "anyone who has ever appeared on Love Island or The Apprentice" to realize that this might be satire :P (having opened this a few days post 4/1 lol)
