If someone isn't already doing so, someone should estimate what % of (self-identified?) EAs donate according to our own principles. This would be useful (1) as a heuristic for the extent to which the movement/community/whatever is living up to its own standards, and (2), assuming the answer is 'decently', as evidence for PR/publicity/responding to marginal-faith tweets during bouts of criticism.
Looking at the Rethink survey from 2020, they have some info about which causes EAs are giving to but they seem to note that not many people respond on this? And it's not quite the same question. To do: check GWWC for whether they publish anything like this.
Edit to add: maybe an imperfect but simple and quick instrument for this could be something like "For what fraction of your giving did you attempt a cost-effectiveness assessment (CEA), read a CEA, or rely on someone else who said they did a CEA?". I don't think it actually has to be about whether the respondent got the "right" result per se; the point is the principles. Deferring to GiveWell seems like living up to the principles because of how they make their recommendations, etc.
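For what it's worth, here's a minimal sketch of how answers to a question like that could be summarised. Everything in it is an assumption for illustration: the `Response` record, its field names, and the sample numbers are made up and are not survey data. It reports both the plain average of people's self-reported fractions and a giving-weighted version, since the two can differ a lot if larger donors behave differently.

```python
from dataclasses import dataclass


@dataclass
class Response:
    # Hypothetical fields, not taken from any real survey instrument.
    total_given: float   # respondent's giving over the period, one currency
    cea_fraction: float  # their answer to the question, between 0 and 1


def summarize(responses):
    """Return (mean self-reported fraction, giving-weighted fraction)."""
    if not responses:
        raise ValueError("no responses")
    # Unweighted: every respondent counts equally.
    mean_fraction = sum(r.cea_fraction for r in responses) / len(responses)
    # Weighted: share of all donated money that was CEA-informed.
    total = sum(r.total_given for r in responses)
    if total > 0:
        weighted = sum(r.total_given * r.cea_fraction for r in responses) / total
    else:
        weighted = float("nan")
    return mean_fraction, weighted


# Made-up illustrative data only.
sample = [Response(1000, 1.0), Response(4000, 0.5), Response(5000, 0.0)]
mean_fraction, weighted = summarize(sample)
```

Which of the two numbers is the better "living up to the principles" heuristic is a judgment call: the unweighted one is about people, the weighted one is about money.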
Good point and good fact.
My sense, though, is that if you scratch most "expand the moral circle" statements you find a bit of implicit moral realism. I think generally there's an unspoken "...to be closer to its truly appropriate extent", and that there's an unspoken assumption that there'll be a sensible basis for that extent. Maybe some people are making the statement prima facie though. Could make for an interesting survey.
Love to see these reports!
I have two suggestions/requests for 'crosstabs' on this info (which is naturally organised by evaluator, because that's what the project is!):
Is anyone keeping tabs on where AI's actually being deployed in the wild? I feel like I mostly see big-picture stuff (so this could be a me problem), but there seems to be a proliferation of small actors doing weird stuff. Twitter / X seems to have a lot more AI content, and apparently YouTube comments do now as well (per a conversation I stumbled on while watching some YouTube recreationally - language & content warnings: https://youtu.be/p068t9uc2pk?si=orES1UIoq5qTV5TH&t=2240)
I think this is a really compelling addition to EA portfolio theory. Two half-formed thoughts:
Does portfolio theory apply better at the individual level than the community level? I think something like treating your own contributions (giving + career) as a portfolio makes a lot of sense, if you're explicitly trying to hedge personal epistemic risk. I think this is a slightly different angle on one of Jeff's points: is this "k-level 2" aggregate portfolio a 'better' aggregation of everyone's information than the "k-level 1" of whatever portfolio emerges from everyone individually optimising their own portfolios? You could probably look at this analytically... might put that on the to-do list.
At some point what matters is specific projects...? Like when I think about 'underfunded', I'm normally thinking there are good projects with high expected ROI that aren't being done, relative to some other cause area where the marginal project has a lower ROI. Maybe my point is something like - underfunding should be accounted for at a different stage of the donation process, rather than by looking at the overall % breakdown of the portfolio. Maybe we're more private equity than index fund.
I wonder if there might be particularly strong regional effects to this - maybe Goa had quite a large dog population, quite a lot of rabies, or quite dense dog/human populations (affecting rabies, bite, and transmission incidences).
I think there could be room for further research to identify whether there are better-looking (sub-country) regions - though, as Helene_K found, data would be difficult.
Hey Alexander - thanks for the write-up! I found it useful as a local, and it seems valuable to be sharing/coordinating on this globally.
One thing that occurred to me would be to zoom in on the sectors of the economy that are exposed to AI. I think that in Australia, it might be relatively more concentrated than elsewhere - specifically in education, which is one of our biggest exports (though it gets accounted for domestically I think).
That could mean:
Isn't mechinterp basically setting out to build tools for AI self-improvement?
One of the things people are most worried about is AIs recursively improving themselves. (Whether all people who claim this kind of thing as a red line will actually treat this as a red line is a separate question for another post.)
It seems to me like mechanistic interpretability is basically a really promising avenue for that. Trivial example: Claude decides that the most important thing is being the Golden Gate Bridge. Claude reads up on Anthropic's work, gets access to the relevant tools, and does brain surgery on itself to turn into Golden Gate Bridge Claude.
More meaningfully, it seems like any ability to understand in a fine-grained way what's going on in a big model could be co-opted by an AI to "learn" in some way. In general, I think the case that seems most likely soonest is:
Learn in-context (e.g. results of experiments, feedback from users, things like those we've recently observed in scheming papers...)
Translate this to appropriate adjustments to weights (identified using mechinterp research)
Execute those adjustments
Maybe I'm late to this party and everyone was already conceptualising mechinterp as a very dual-use technology, but I'm here now.
Honestly, maybe it leans more towards "offense" (i.e., catastrophic misalignment) than defense! It will almost inevitably require automation to be useful, so we're ceding it to machines out of the gate. I'd expect tomorrow's models to be better placed to make sense of and use of mechinterp techniques than humans are - partly just because of sheer compute, but also maybe (and now I'm into speculating on stuff I understand even less) the nature of their cognition is more suited to what's involved.