We're Ought. We're going to answer questions here on Tuesday August 9th at 10am Pacific. We may get to some questions earlier, and may continue answering a few more throughout the week.
About us:
- We're an applied AI lab, taking a product-driven approach to AI alignment.
- We're 10 people right now, roughly split between the Bay Area and the rest of the world (New York, Texas, Spain, UK).
- Our mission is to automate and scale open-ended reasoning. We're working on making AI as helpful for reasoning about long-term outcomes, policy, alignment research, AI deployment, etc. as it already is for tasks with clear feedback signals.
- We're building the AI research assistant Elicit. Elicit's architecture is based on supervising reasoning processes rather than outcomes, an implementation of factored cognition (see the sketch after this list). This is better for supporting open-ended reasoning in the short run and better for alignment in the long run.
- Over the last year, we built Elicit to support broad reviews of empirical literature. We're currently expanding to deep literature reviews, then other research workflows, then general-purpose reasoning.
- We're hiring for full-stack, devops, ML, product analyst, and operations manager roles.
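Since "supervising reasoning processes, not outcomes" is doing a lot of work in the list above, here's a minimal Python sketch of the distinction. The `Model` type, the `judge` functions, and the fixed step prompts are hypothetical placeholders for illustration, not Elicit's actual architecture:

```python
# Minimal, hypothetical sketch contrasting outcome- and process-based supervision.
# `Model`, `judge`, and the step prompts are illustrative placeholders, not Elicit's API.

from typing import Callable, List

Model = Callable[[str], str]

def outcome_supervised(model: Model, question: str,
                       judge: Callable[[str], float]) -> float:
    """End-to-end: only the final answer is evaluated; intermediate reasoning is opaque."""
    answer = model(question)
    return judge(answer)

def process_supervised(model: Model, question: str,
                       judge_step: Callable[[str, str], float]) -> List[float]:
    """Factored: each reasoning step is produced and evaluated separately,
    so feedback attaches to the process rather than just the outcome."""
    steps = [
        "List the subquestions this question depends on.",
        "Answer each subquestion from the available evidence.",
        "Combine the subanswers into a final answer.",
    ]
    scores = []
    context = question
    for instruction in steps:
        output = model(f"{context}\n\nTask: {instruction}")
        scores.append(judge_step(instruction, output))  # supervise the step itself
        context = f"{context}\n{output}"
    return scores
```

The point is only where the feedback signal attaches: in the first function it attaches to the final answer alone; in the second it attaches to each intermediate step, so the process itself is what gets evaluated.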
We're down to answer basically any question, including questions about our mission, theory of change, work so far, future plans, Elicit, relation to other orgs in the space, and what it's like to work at Ought.
I’d say what we’re afraid of is that we’ll have AI systems capable of sophisticated planning, but we won’t know how to channel those capabilities into aligned thinking on vague, complicated problems. Ought’s work is about avoiding this outcome.
At this point we could chat about why it’s plausible that we’ll have such capable but unaligned AI systems, or about how Ought’s work is aimed at reducing the risk of such systems. The former isn’t specific to Ought, so I’ll point to Ajeya’s post *Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover*.
I just want to highlight the key assumption Ajeya’s argument rests on: the system is end-to-end optimized on a feedback signal (generally from human evaluations), i.e. all of its compute is optimizing a signal that has no way to separate “fake it while in training” from “have the right intent”, and so it can lead to catastrophic outcomes when the system is deployed.
How does Ought’s work help avoid that outcome?
We’re breaking down complex reasoning into processes with parts that are not jointly end-to-end optimized. This makes it possible to use smaller models for individual parts, makes the computation more transparent, and makes it easier to verify that the parts are indeed implementing the function that we (or future models) think they’re implementing.
You can think of it as interpretability-by-construction: Instead of training a model end-to-end and then trying to see what circuits it learned and whether they’re implementing the right thing, take smaller models that you know are implementing the right thing and compose them (with AI help) into larger systems that are correct not primarily based on empirical performance but based on a priori reasoning.
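As a rough sketch of what that composition looks like in code (with a hypothetical `small_model` callable standing in for whatever narrow components are used; the decomposition below is invented for the example and isn't Ought's implementation):

```python
# Toy illustration of composition-by-construction: parts are written and checked
# individually, then wired together, rather than jointly end-to-end optimized.

from typing import Callable, List

def decompose(question: str, small_model: Callable[[str], str]) -> List[str]:
    """One small, separately checkable part: split a question into subquestions."""
    raw = small_model(f"List the subquestions needed to answer: {question}")
    return [line.strip("- ").strip() for line in raw.splitlines() if line.strip()]

def answer_subquestion(subq: str, small_model: Callable[[str], str]) -> str:
    """Another part: answer one narrow subquestion in isolation."""
    return small_model(f"Answer concisely: {subq}")

def compose(question: str, subanswers: List[str],
            small_model: Callable[[str], str]) -> str:
    """A third part: combine subanswers into a final answer."""
    joined = "\n".join(subanswers)
    return small_model(f"Given these findings:\n{joined}\n\nAnswer: {question}")

def factored_answer(question: str, small_model: Callable[[str], str]) -> str:
    subqs = decompose(question, small_model)
    subanswers = [answer_subquestion(q, small_model) for q in subqs]
    return compose(question, subanswers, small_model)
```

Because `decompose`, `answer_subquestion`, and `compose` are never trained jointly, each part can be inspected or validated on its own, which is the sense in which the whole system is argued correct by construction rather than by end-to-end empirical performance.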
This is complementary to traditional bottom-up interpretability work: The more decomposition can limit the amount of black-box compute and uninterpretable intermediate state, the less weight rests on circuits-style interpretability and ELK-style proposals.
We don’t think we’ll be able to fully avoid end-to-end training (it’s ML’s magic juice, after all), but we think that reducing it is helpful even on the margin. From our post on supervising process, which has a lot more detail on the points in this comment: “Inner alignment failures are most likely in cases where models don’t just know a few facts we don’t but can hide extensive knowledge from us, akin to developing new branches of science that we can’t follow. With limited compute and limited neural memory, the risk is lower.”