We're Ought. We're going to answer questions here on Tuesday August 9th at 10am Pacific. We may get to some questions earlier, and may continue answering a few more throughout the week.
About us:
- We're an applied AI lab, taking a product-driven approach to AI alignment.
- We're 10 people right now, roughly split between the Bay Area and the rest of the world (New York, Texas, Spain, UK).
- Our mission is to automate and scale open-ended reasoning. We're working on making AI as helpful for reasoning about long-term outcomes, policy, alignment research, AI deployment, etc. as it already is for tasks with clear feedback signals.
- We're building the AI research assistant Elicit. Elicit's architecture is based on supervising reasoning processes rather than outcomes, an implementation of factored cognition (see the sketch after this list). This is better for supporting open-ended reasoning in the short run and better for alignment in the long run.
- Over the last year, we built Elicit to support broad reviews of empirical literature. We're currently expanding to deep literature reviews, then other research workflows, then general-purpose reasoning.
- We're hiring for full-stack, devops, ML, product analyst, and operations manager roles.
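Since "supervising reasoning processes, not outcomes" is doing a lot of work in the list above, here's a minimal Python sketch of the distinction. The `Model` type, the `judge` functions, and the fixed step prompts are hypothetical placeholders for illustration, not Elicit's actual architecture:

```python
# Minimal, hypothetical sketch contrasting outcome- and process-based supervision.
# `Model`, `judge`, and the step prompts are illustrative placeholders, not Elicit's API.

from typing import Callable, List

Model = Callable[[str], str]

def outcome_supervised(model: Model, question: str,
                       judge: Callable[[str], float]) -> float:
    """End-to-end: only the final answer is evaluated; intermediate reasoning is opaque."""
    answer = model(question)
    return judge(answer)

def process_supervised(model: Model, question: str,
                       judge_step: Callable[[str, str], float]) -> List[float]:
    """Factored: each reasoning step is produced and evaluated separately,
    so feedback attaches to the process rather than just the outcome."""
    steps = [
        "List the subquestions this question depends on.",
        "Answer each subquestion from the available evidence.",
        "Combine the subanswers into a final answer.",
    ]
    scores = []
    context = question
    for instruction in steps:
        output = model(f"{context}\n\nTask: {instruction}")
        scores.append(judge_step(instruction, output))  # supervise the step itself
        context = f"{context}\n{output}"
    return scores
```

The point is only where the feedback signal attaches: in the first function it attaches to the final answer alone; in the second it attaches to each intermediate step, so the process itself is what gets evaluated.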
We're down to answer basically any question, including questions about our mission, theory of change, work so far, future plans, Elicit, relation to other orgs in the space, and what it's like to work at Ought.
I’d say what we’re afraid of is that we’ll have AI systems capable of sophisticated planning, but we won’t know how to channel those capabilities into aligned thinking on vague, complicated problems. Ought’s work is about avoiding this outcome.
At this point we could chat about why it’s plausible that we’ll have such capable but unaligned AI systems, or about how Ought’s work is aimed at reducing the risk of such systems. The former isn’t specific to Ought, so I’ll point to Ajeya’s post *Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover*.
I just want to highlight the key assumption Ajeya’s argument rests on: the system is end-to-end optimized on a feedback signal (generally from human evaluations), i.e. all of its compute is optimizing a signal that has no way to separate “fake it while in training” from “have the right intent”, and so it can lead to catastrophic outcomes when the system is deployed.
How does Ought’s work help avoid that outcome?
We’re breaking down complex reasoning into processes with parts that are not jointly end-to-end optimized. This makes it possible to use smaller models for individual parts, makes the computation more transparent, and makes it easier to verify that the parts are indeed implementing the function that we (or future models) think they’re implementing.
You can think of it as interpretability-by-construction: Instead of training a model end-to-end and then trying to see what circuits it learned and whether they’re implementing the right thing, take smaller models that you know are implementing the right thing and compose them (with AI help) into larger systems that are correct not primarily based on empirical performance but based on a priori reasoning.
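As a rough sketch of what that composition looks like in code (with a hypothetical `small_model` callable standing in for whatever narrow components are used; the decomposition below is invented for the example and isn't Ought's implementation):

```python
# Toy illustration of composition-by-construction: parts are written and checked
# individually, then wired together, rather than jointly end-to-end optimized.

from typing import Callable, List

def decompose(question: str, small_model: Callable[[str], str]) -> List[str]:
    """One small, separately checkable part: split a question into subquestions."""
    raw = small_model(f"List the subquestions needed to answer: {question}")
    return [line.strip("- ").strip() for line in raw.splitlines() if line.strip()]

def answer_subquestion(subq: str, small_model: Callable[[str], str]) -> str:
    """Another part: answer one narrow subquestion in isolation."""
    return small_model(f"Answer concisely: {subq}")

def compose(question: str, subanswers: List[str],
            small_model: Callable[[str], str]) -> str:
    """A third part: combine subanswers into a final answer."""
    joined = "\n".join(subanswers)
    return small_model(f"Given these findings:\n{joined}\n\nAnswer: {question}")

def factored_answer(question: str, small_model: Callable[[str], str]) -> str:
    subqs = decompose(question, small_model)
    subanswers = [answer_subquestion(q, small_model) for q in subqs]
    return compose(question, subanswers, small_model)
```

Because `decompose`, `answer_subquestion`, and `compose` are never trained jointly, each part can be inspected or validated on its own, which is the sense in which the whole system is argued correct by construction rather than by end-to-end empirical performance.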
This is complementary to traditional bottom-up interpretability work: The more decomposition can limit the amount of black-box compute and uninterpretable intermediate state, the less weight rests on circuits-style interpretability and ELK-style proposals.
We don’t think we’ll be able to fully avoid end-to-end training (it’s ML’s magic juice, after all), but we think that reducing it is helpful even on the margin. From our post on supervising process, which has a lot more detail on the points in this comment: “Inner alignment failures are most likely in cases where models don’t just know a few facts we don’t but can hide extensive knowledge from us, akin to developing new branches of science that we can’t follow. With limited compute and limited neural memory, the risk is lower.”