I’m working on a project to estimate the cost-effectiveness of AI safety (AIS) organizations, similar to what Animal Charity Evaluators does for animal charities. This involves gathering data on metrics such as:
* People impacted (e.g., scholars trained).
* Research output (papers, citations).
* Funding received and allocated.
While some organizations (e.g., MATS, AISC) share impact analyses, there’s no broad comparison across the field. AI safety orgs operate on diverse theories of change, which makes standardized evaluation tricky, but I think rough estimates could help with prioritization.
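To make "rough estimates" concrete, here is a minimal sketch of the kind of per-unit cost comparison the metrics above would allow. Everything in it is an assumption for illustration: the `OrgYear` fields, the `summarize` helper, and the numbers for "Hypothetical Org A" are made up and are not data from any real organization. It also ignores differences in theory of change, so it is a starting point at best.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OrgYear:
    """One year of (hypothetical) data for a single organization."""
    name: str
    funding_usd: float      # funding received and allocated that year
    scholars_trained: int   # people impacted
    papers: int             # research output
    citations: int          # citations to that output


def cost_per_unit(funding_usd: float, units: int) -> Optional[float]:
    """Return dollars spent per unit of output, or None if there is no output."""
    return funding_usd / units if units > 0 else None


def summarize(org: OrgYear) -> dict:
    """Compute naive cost-per-output figures for one organization-year."""
    return {
        "cost_per_scholar": cost_per_unit(org.funding_usd, org.scholars_trained),
        "cost_per_paper": cost_per_unit(org.funding_usd, org.papers),
        "cost_per_citation": cost_per_unit(org.funding_usd, org.citations),
    }


# Made-up numbers, for illustration only.
example = OrgYear("Hypothetical Org A", 1_000_000, 50, 20, 400)
summary = summarize(example)
print(summary)
```

Each figure divides the same total funding by a different output, so they should not be added together or treated as a single score. An organization focused on training will look expensive per paper, and vice versa.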
I’m looking for:
1. Previous work
2. Collaborators
3. Feedback on the idea
If you have ideas for useful metrics or feedback on the approach, let me know!
I'd love to see an 'Animal Welfare vs. AI Safety/Governance Debate Week' happening on the Forum. AI risk has grown massively in importance as a cause area in recent years, and has become a priority career choice for many in the community. At the same time, the Animal Welfare vs Global Health Debate Week demonstrated just how important and neglected the cause of animal welfare remains. I know several people (including myself) who are uncertain/torn about whether to pursue careers focused on reducing animal suffering or mitigating existential risks related to AI. It would help to have rich discussions comparing both causes' current priorities and bottlenecks, and a debate week would hopefully surface some crucial considerations.
We should expect the incentives and culture of AI-focused companies to make them uniquely terrible for producing safe AGI.
From a “safety from catastrophic risk” perspective, I suspect an “AI-focused company” (e.g. Anthropic, OpenAI, Mistral) is abstractly pretty close to the worst possible organizational structure for getting us towards AGI. I have two distinct but related reasons:
1. Incentives
2. Culture
From an incentives perspective, consider realistic alternative organizational structures to “AI-focused company” that nonetheless have enough firepower to host successful multibillion-dollar scientific/engineering projects:
1. As part of an intergovernmental effort (e.g. CERN’s Large Hadron Collider, the ISS)
2. As part of a governmental effort of a single country (e.g. Apollo Program, Manhattan Project, China’s Tiangong)
3. As part of a larger company (e.g. Google DeepMind, Meta AI)
In each of those cases, I claim that there are stronger (though still not ideal) organizational incentives to slow down, pause/stop, or roll back deployment if there is sufficient evidence or reason to believe that further development can result in major catastrophe. In contrast, an AI-focused company has every incentive to go ahead on AI when the case for pausing is uncertain, and minimal incentive to stop or even take things slowly.
From a culture perspective, I claim that without knowing any details of the specific companies, you should expect AI-focused companies to be more likely than plausible contenders to have the following cultural elements:
1. Ideological AGI Vision: AI-focused companies may have a large contingent of “true believers” who are ideologically motivated to make AGI at all costs and
2. No Pre-existing Safety Culture: AI-focused companies may have little or no “safety” culture in which people deeply understand, have experience in, and are motivated by a desire to avoid catastrophic outcomes.
The first one should be self-explanatory. Th
The recently released 2024 Republican platform says they'll repeal the recent White House Executive Order on AI, which many in this community considered a necessary first step toward making future AI progress safer and more secure. This seems bad.
From https://s3.documentcloud.org/documents/24795758/read-the-2024-republican-party-platform.pdf, see bottom of pg 9.
SB 1047 is a critical piece of legislation for AI safety, but there haven’t been great ways of getting up to speed, especially since the bill has been amended several times. Now that the bill is finalized, better resources exist for catching up. Here are a few:
* A four-page summary of the bill [written by bill proponents]
* A recent post from Zvi Mowshowitz explaining the latest version of the bill
* A summary of the latest round of amendments [written by bill proponents]
* Latest bill text
If you are working in AI safety or AI policy, I think understanding this bill is pretty important. Hopefully this helps.
Being mindful of the incentives created by pressure campaigns
I've spent the past few months trying to think about the whys and hows of large-scale public pressure campaigns (especially those targeting companies — of the sort that have been successful in animal advocacy).
A high-level view of these campaigns is that they use public awareness and corporate reputation as a lever to adjust corporate incentives. But making sure that you are adjusting the right incentives is more challenging than it seems. Ironically, I think this is closely connected to specification gaming: it's often easy to accidentally incentivize companies to do more to look better, rather than doing more to be better.
For example, an AI-focused campaign calling out RSPs recently began running ads that single out AI labs for speaking openly about existential risk (quoting leaders acknowledging that things could go catastrophically wrong). I can see why this is a "juicy" lever — most of the public would be pretty astonished/outraged to learn some of the beliefs that are held by AI researchers. But I'm not sure if pulling this lever is really incentivizing the right thing.
As far as I can tell, AI leaders speaking openly about existential risk is good. It won't solve anything in and of itself, but it's a start — it encourages legislators and the public to take the issue seriously. In general, I think it's worth praising this when it happens. I think the same is true of implementing safety policies like RSPs, whether or not such policies are sufficient in and of themselves.
If these things are used as ammunition to try to squeeze out stronger concessions, it might just incentivize the company to stop doing the good-but-inadequate thing (i.e. CEOs are less inclined to speak about the dangers of their product when it will be used as a soundbite in a campaign, and labs are probably less inclined to release good-but-inadequate safety policies when doing so creates more public backlash than they were
Anthropic has just launched "computer use": "developers can direct Claude to use computers the way people do".
https://www.anthropic.com/news/3-5-models-and-computer-use