Summary

Open Philanthropy is launching a request for proposals to improve AI capability evaluations. We're looking to fund work on more demanding GCR-relevant benchmarks, better evaluation science, and improving third-party model access and infrastructure. 

Click here to apply

More details:

Below, we explain what we’re looking for, and why we think this work matters. 

Why capability evaluations matter

The ability to accurately evaluate AI capabilities is becoming increasingly important, for three main reasons:

1. Evaluations are key inputs to AI governance

Many current governance proposals rely heavily on knowing what AI systems can and cannot do. “If-then commitments” are one prominent example — companies agree to take specific actions (like pausing training) if their systems display certain capabilities. But for these approaches to work, we need reliable ways to measure those capabilities.

2. AI capabilities underpin key disagreements about AI risk

Many fundamental disagreements about AI risk stem from different beliefs about what AI systems can or will soon be able to do. For example, skepticism about certain loss-of-control scenarios often comes down to disagreement about whether AI systems could become effective autonomous agents with long-term planning capabilities. Better evaluations could help resolve some of these disagreements, or at least help us identify the key cruxes.

3. We need better situational awareness of what frontier models can and cannot do

Though some genuinely challenging, risk-relevant evaluations do exist (e.g. Cybench for AI cyberoffense capabilities, RE-Bench for AI R&D capabilities), many crucial capabilities remain poorly measured, and benchmarks are saturating quickly. To respond appropriately, we need to understand what AI systems can and can’t do.

Three current problems with capability evaluations

Capability evaluations currently face three major challenges:

  1. Existing benchmarks for risk-relevant capabilities are inadequate. We need more demanding tests that can meaningfully evaluate frontier models' performance on tasks relevant to catastrophic risks, resist saturation even as capabilities advance, and rule in (not just rule out) serious risks.
  2. The science of capability evaluation remains underdeveloped. We don’t yet understand how many capabilities scale, the relationships between different capabilities, or how post-training enhancements will affect performance. This makes interpreting current evaluation results and predicting future results challenging.
  3. Third-party evaluators already face significant access constraints, and increasing security requirements will make access harder. Maintaining meaningful independent scrutiny will require advances in technical infrastructure, evaluation and audit protocols, and access frameworks. 

What we're looking to fund

To address these challenges, we're seeking proposals in three areas:

GCR-relevant capability evaluations for AI agents

We want to fund new evaluations that:

  1. Test agentic, risk-relevant capabilities, such as AI R&D, situational awareness, and adaptation to novel adversarial environments
  2. Are extremely challenging, ideally taking world-class experts multiple days

For more on why we think this is important, what we're looking for, and previous work we think is useful, see this section of our RFP.

Improving the science of capabilities development and evaluations

Current capability evaluations are more like snapshots than predictive tools: they tell us what models can do now, but not what they're likely to do next. We want to improve understanding of questions such as:

  1. How capabilities scale with different inputs
  2. Relationships between different capabilities
  3. Best practices for evaluation methodology

For open questions here we think are important, and past work we've found useful, see this.  

Improving third-party model access and evals infrastructure

Independent evaluations are crucial for reliably assessing AI capabilities. As the stakes get higher, we can't trust AI companies to verify their own claims. But as security requirements increase, getting meaningful external access will become harder.

We're looking for approaches to resolve the tension between security requirements and meaningful external oversight, including:

  1. Understanding necessary access requirements and how to secure them
  2. Improving evaluation infrastructure
  3. Developing verifiable auditing techniques

For open questions here we think are important, and past work we've found useful, see this.

How to engage

Even if you're not planning to apply for funding, this RFP contains many open research questions that we think are important for the field — we encourage you to read the full RFP if you're interested in capability evaluation. Consider applying if you have relevant expertise or ideas, and please share with others who might be interested. 

Anyone is eligible to apply. Applications will be open until 1st April. 

Click here to apply


Comments (3)

Flag that I didn't catch that this was an important announcement, and I think that's because it's posted by one user with initials. Hard to explicate exactly what's going on, but that made me think it was one anonymous user's reactions to an OP announcement rather than the real deal.

By contrast, the technical AIS RFP has three co-authors with full names, and I recognised them as people who work on that team. I'd guess posts with multiple full-name co-authors are more likely to be understood as important and therefore get more reach :) 

This seems to be of questionable effectiveness. Brief answers/challenges: 

Evaluations are key input to ineffective governance. The safety frameworks presented by the frontier labs are "safety-washing", more appropriately considered roadmaps towards an unsurvivable future.

Disagreement on AI capabilities underpins performative disagreements on AI risk. As far as I know, there's no recent, substantial published disagreement of this kind. I'd like sources for your claim, please.

We don't need more situational awareness of what current frontier models can and cannot do in order to respond appropriately. No decision-relevant conclusions can be drawn from evaluations in the style of Cybench and Re-Bench. 

Hi Søren, 

Thanks for commenting. Some quick responses:

> The safety frameworks presented by the frontier labs are "safety-washing", more appropriately considered roadmaps towards an unsurvivable future

I don’t see the labs as the main audience for evaluation results, and I don’t think voluntary safety frameworks should be how deployment and safeguard decisions are made in the long-term, so I don’t think the quality of lab safety frameworks is that relevant to this RFP.

> I'd like sources for your claim, please. 

Sure, see e.g. the sources linked to in our RFP for this claim: What Are the Real Questions in AI? and What the AI debate is really about.

I’m surprised you think the disagreements are “performative” – in my experience, many sceptics of GCRs from AI really do sincerely hold their beliefs.

> No decision-relevant conclusions can be drawn from evaluations in the style of Cybench and Re-Bench.

I think Cybench and RE-Bench are useful, if imperfect, proxies for frontier model capabilities at cyberoffense and ML engineering respectively, and those capabilities are central to threats from cyberattacks and AI R&D. My claim isn’t that running these evals will tell you exactly what to do: it’s that these evaluations are being used as inputs into RSPs and governance proposals more broadly, and provide some evidence on the likelihood of GCRs from AI, but will need to be harder and more robust to be relied upon.
