
Ought is an applied machine learning lab. In this post we summarize our work on Elicit and why we think it's important.

We'd love to get feedback on how to make Elicit more useful to the EA community, and on our plans more generally.

This post is based on two recent LessWrong posts.

In short

Our mission is to automate and scale open-ended reasoning. To that end, we’re building Elicit, the AI research assistant. 

Elicit's architecture is based on supervising reasoning processes, not outcomes. This is better for supporting open-ended reasoning in the short run and better for alignment in the long run.

Over the last year, we built Elicit to support broad reviews of empirical literature. The literature review workflow runs on general-purpose infrastructure for executing compositional language model processes. Going forward, we'll expand to deep literature reviews, then other research workflows, then general-purpose reasoning.

Our mission

Our mission is to automate and scale open-ended reasoning. If we can improve the world’s ability to reason, we’ll unlock positive impact across many domains including AI governance & alignment, psychological well-being, economic development, and climate change.

As AI advances, the raw cognitive capabilities of the world will increase. The goal of our work is to channel this growth toward good reasoning. We want AI to be more helpful for qualitative research, long-term forecasting, planning, and decision-making than for persuasion, keeping people engaged, and military robotics.

Good reasoning is as much about process as it is about outcomes. In fact, outcomes are unavailable if we’re reasoning about the long term. So we’re generally not training machine learning models end-to-end using outcome data, but building Elicit compositionally and based on human reasoning processes.

The case for process-based ML systems

We can think about machine learning systems on a spectrum from process-based to outcome-based:

  • Process-based systems are built on human-understandable task decompositions, with direct supervision of reasoning steps. More
  • Outcome-based systems are built on end-to-end optimization, with supervision of final results. More
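
To make the distinction concrete, here is a minimal sketch in Python. The judge functions are hypothetical stand-ins for human (or human-trained) evaluators, not Elicit's actual code; the point is only where the supervision signal attaches.

```python
# Hypothetical sketch of the two supervision styles; not Elicit's code.
from typing import Callable, List

def process_based_loss(steps: List[str],
                       judge_step: Callable[[str], float]) -> float:
    """Score every intermediate reasoning step (process supervision)."""
    return sum(1.0 - judge_step(step) for step in steps)

def outcome_based_loss(final_answer: str,
                       judge_outcome: Callable[[str], float]) -> float:
    """Score only the end result (outcome supervision). The internal
    reasoning is unconstrained, which is what invites measure gaming."""
    return 1.0 - judge_outcome(final_answer)

# Toy usage with constant judges, just to show where the signal attaches.
steps = ["Base rate of X is ~10%.", "Adjusting for factor Y: ~15%."]
print(process_based_loss(steps, judge_step=lambda s: 0.9))      # ~0.2
print(outcome_based_loss("~15%", judge_outcome=lambda a: 0.8))  # ~0.2
```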

We think that process-based systems are better:

  1. In the short term, process-based ML systems have better differential capabilities: They help us apply ML to tasks where we don’t have access to outcomes. These tasks include long-range forecasting, policy decisions, and theoretical research. More
  2. In the long term, process-based ML systems help avoid catastrophic outcomes from systems gaming outcome measures and are thus more aligned. More
  3. Both process- and outcome-based evaluation are attractors to varying degrees: Once an architecture is entrenched, it’s hard to move away from it. This lock-in applies much more to outcome-based systems. More
  4. Whether the most powerful ML systems will primarily be process-based or outcome-based is up in the air. More
  5. So it’s crucial to push toward process-based training now.

Relative to the potential benefits, we think that process-based systems have gotten surprisingly little explicit attention in the AI alignment community.

How we think about success

We're pursuing our mission by building Elicit, a process-based AI research assistant.

We succeed if:

  1. Elicit radically increases the amount of good reasoning in the world.
    1. For experts, Elicit pushes the frontier forward.
    2. For non-experts, Elicit makes good reasoning more affordable. People who don’t have the tools, expertise, time, or mental energy to make well-reasoned decisions on their own can do so with Elicit.
  2. Elicit is a scalable ML system based on human-understandable task decompositions, with supervision of process, not outcomes. This expands our collective understanding of safe AGI architectures.

Progress in 2021

We've made the following progress in 2021:

  1. We built Elicit to support researchers because high-quality research is a bottleneck to important progress and because researchers care about good reasoning processes. More
  2. We identified some building blocks of research (e.g. search, summarization, classification), operationalized them as language model tasks, and connected them in the Elicit literature review workflow. More
  3. On the infrastructure side, we built a streaming task execution engine for running compositions of language model tasks (see the sketch after this list). This engine supports the literature review workflow in production. More
  4. About 1,500 people use Elicit every month. More
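
As an illustration of what such an engine might look like, here is a minimal sketch using Python async generators. The run_lm stub and the search-then-summarize decomposition are assumptions made for the example, not the production implementation; what matters is that downstream tasks consume upstream results as they stream in.

```python
# A minimal sketch of streaming, compositional language model tasks.
# run_lm is a hypothetical stand-in for a real language model call.
import asyncio
from typing import AsyncIterator

async def run_lm(prompt: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for model latency
    return f"LM output for: {prompt!r}"

async def search(query: str) -> AsyncIterator[str]:
    # Yield results one at a time so downstream tasks can start early.
    for i in range(3):
        yield f"paper-{i} matching {query!r}"

async def summarize(papers: AsyncIterator[str]) -> AsyncIterator[str]:
    # Consume upstream results as they arrive rather than waiting for all.
    async for paper in papers:
        yield await run_lm(f"Summarize {paper}")

async def literature_review(query: str) -> None:
    async for summary in summarize(search(query)):
        print(summary)  # partial results can reach the user immediately

asyncio.run(literature_review("creatine and cognition"))
```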

Roadmap for 2022+

Our plans for 2022+:

  1. We expand literature review to digest the full text of papers, extract evidence, judge methodological robustness, and help researchers do deeper evaluations by decomposing questions like “What are the assumptions behind this experimental result?” More
  2. After literature review, we add other research workflows, e.g. evaluating project directions, decomposing research questions, and augmented reading. More
  3. To support these workflows, we refine the primitive tasks through verifier models and human feedback (one common pattern is sketched after this list), and expand our infrastructure for running complex task pipelines, quickly adding new tasks, and efficiently gathering human data. More
  4. Over time, Elicit becomes a general-purpose reasoning assistant, transforming any task involving evidence, arguments, plans and decisions. More
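
For readers wondering what refining tasks with verifier models could look like mechanically, one standard pattern is best-of-n reranking: sample several candidates for a task and keep the one a separately trained verifier scores highest. The sketch below is a generic illustration with toy stand-ins, not Elicit's implementation.

```python
# Generic best-of-n reranking with a verifier model; the generate and
# verify functions below are hypothetical toy stand-ins.
import random
from typing import Callable, List

def best_of_n(prompt: str,
              generate: Callable[[str], str],
              verify: Callable[[str, str], float],
              n: int = 4) -> str:
    """Sample n candidates and return the one the verifier scores highest."""
    candidates: List[str] = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: verify(prompt, c))

# Toy stand-ins so the sketch runs end to end; a real verifier would be
# a model trained on human judgments of task outputs.
toy_generate = lambda p: f"candidate summary #{random.randint(0, 99)}"
toy_verify = lambda p, c: random.random()

print(best_of_n("Summarize this abstract", toy_generate, toy_verify))
```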

We're hiring for basically all roles—ML engineer, front-end, full-stack, operations, product design, even recruiting. Join our team!

Comments



Cool, thanks for sharing, I'm a big fan of Elicit! Some spontaneous thoughts:

We want AI to be more helpful for qualitative research, long-term forecasting, planning, and decision-making than for persuasion, keeping people engaged, and military robotics.

Are you worried that your work will be used for more likely regrettable things like

  • improving the competence of actors who are less altruistic and less careful about unintended consequences (e.g. many companies, militaries and government institutions), and
  • speeding up AI capabilities research, and speeding it up more than AI safety research?

I suppose it will be difficult to have much control over insights you generate and it will be relatively easy to replicate your product if you make it publicly available?

Have you considered deemphasizing trying to offer a commercially successful product that will find broad application in the world, and focusing more strongly on designing systems that are safe and aligned with human values?

Regarding the competition between process-based vs. outcome-based machine learning

Today, process-based systems are ahead: Most systems in the world don’t use much machine learning, and to the extent that they use it, it’s for small, independently meaningful, fairly interpretable steps like predictive search, ranking, or recommendation as part of much larger systems. [from your referenced LessWrong post]

My first reaction was thinking that today's ML systems might not be the best comparison, and instead you might want to include all information processing systems, which include human brains. I guess human brains are mostly outcome-based systems with process-based features:

  • we're monitoring our own thinking and adjusting it if it fails to live up to standards we hold, and
  • we communicate our thought processes for feedback and to teach others

But most of it seems outcome-based and fairly inscrutable?

Are you worried that your work will be used for more likely regrettable things like

  • improving the competence of actors who are less altruistic and less careful about unintended consequences (e.g. many companies, militaries and government institutions), and

Less careful actors: Our goal is for Elicit to help people reason better. We want less careful people to use it and reason better than they would have without Elicit, recognizing more unintended consequences and finding actions that are more aligned with their values. The hope is that if we can make good reasoning cheap enough, people will use it. In a sense, we're all less careful actors right now.

Less altruistic actors: We favor more altruistic actors in deciding who to work with, give access to, and improve Elicit for. We also monitor use so that we can prevent misuse.

  • speeding up AI capabilities research, and speeding it up more than AI safety research?

I expect the overall impact on x-risk to be a reduction by (a) causing more and better x-risk reduction thinking to happen and (b) shifting ML efforts to a more alignable paradigm, even if (c) Elicit has a non-zero contribution to ML capabilities.

The implicit claim in the concern about speeding up capabilities is that Elicit has a large impact on capabilities because it is so useful. If that is true, we'd expect that it's also super useful for other domains, e.g. AI safety. The larger Elicit’s impact on (c), the larger the corresponding impacts on (a) and (b).

To shift the balance away from (c) we’ll focus on supporting safety-related research and researchers, especially conceptual research. We're not doing this very well today but are actively thinking about it and moving in that direction. Given that, it would be surprising if Elicit helped a lot with ML capabilities relative to tools and organizations that are explicitly pushing that agenda.

Have you considered deemphasizing trying to offer a commercially successful product that will find broad application in the world, and focussing more strongly on designing systems that are safe and aligned with human values?

We’re a non-profit, so we have no obligation to make a commercially successful product. We’ll only focus on it to the extent that it furthers aligned reasoning. That said, I think the best outcome is that we make a widely adopted product that makes it easier for everyone to think through the consequences of their actions and act in alignment with their values.

Thanks a lot for elaborating, makes sense to me.

I was fuzzy about what I wanted to communicate with the term "careful", thanks for spelling out your perspective here. I'm still a little uneasy about the idea that generally improving the ability to plan better will also make sufficiently many actors more careful about avoiding problems that are particularly risky for our future. It just seems so rare that important actors care enough about such risks, even for things that humanity is able to predict and plan for reasonably well, like pandemics.

We're also only reporting our current guess for how things will turn out. We're monitoring how Elicit is used and we'll study its impacts and the anticipated impacts of future features, and if it turns out that the costs outweigh the benefits we will adjust our plans.
