Critiques of prominent AI safety labs: Redwood Research

Omega

We go into more detail on this in a follow-up comment.

We cannot help but be reminded of Frank H. Westheimer's advice to his research students: “Why spend a day in the library when you can learn the same thing by working in the laboratory for a month?”

Thanks to Jacob Steinhardt for helping us clarify this point.

As a benchmark example, Sergey Levine’s lab at UC Berkeley published 5 papers of comparable quality to the Redwood papers in 2022 (and 30 papers in total, although the others were of substantially lower quality, and note that the papers aren’t as relevant to alignment). Sergey Levine’s lab has a substantially lower budget than Redwood’s. However, in defense of Redwood, Sergey’s lab does have a head count comparable to or larger than Redwood’s: it is currently listed as comprising 2 post-docs, 22 graduate students and 29 (part-time) undergraduate researchers.

For example, a speaker at the ML Winter Camp that took place in Berkeley in winter 2022-2023 stated that they believed that the only person with a good research agenda was Paul Christiano, and he sent all his research ideas to Redwood. They then went on to say that the best thing for the participants to aim for was working for Redwood (or, if they were smart enough, ARC - but they weren’t smart enough). This reminds us a lot of the rhetoric from individuals talking to EA groups, and at AIRCS and CFAR workshops around MIRI’s research around 2015-2017. MIRI had not produced much legible work (eventually announcing they were non-disclosed by default) and people would essentially base their recommendations on trusting the MIRI staff. Eventually MIRI said that they failed at their current research directions, and there was a general switch in focus to large language models.

Redwood Research commented that they view their causal scrubbing work as more significant. We view this work as substantially more novel and working on an important problem (evaluating mechanistic interpretability explanations), but we’re unsure as to the degree to which causal scrubbing will provide a tractable solution to this.

More in this comment; thank you to @FayLadybug for pointing this out.

8 of the 20 grad students and postdoc researchers at CHAI are mostly x-risk focused, plus a few ops staff and Stuart Russell.

We couldn’t find a public statement on the topic (this post briefly mentions it), but this is common knowledge amongst the TAIS community.

jsteinhardt

I'll briefly comment on a few parts of this post since my name was mentioned (lack of comment on other parts does not imply any particular position on them). Also, thanks to the authors for their time writing this (and future posts)! I think criticism is valuable, and having written criticism myself in the past, I know how time-consuming it can be.

I'm worried that your method for evaluating research output would make any ambitious research program look bad, especially early on. Specifically:

The failure of Redwood's adversarial training project is unfortunately wholly unsurprising given almost a decade of similarly failed attempts at defenses to adversarial robustness from hundreds or even thousands of ML researchers.

I think for any ambitious research project that fails, you could tell a similarly convincing story about how it's "obvious in hindsight" it would fail. A major point of research is to find ideas that other people don't think will work and then show that they do work! For many of my most successful research projects, people gave me advice not to work on them because they thought it would predictably fail, and if I had failed then they could have said something similar to what you wrote above.

I think Redwood's failures here are ones of execution and not of problem selection--I thought the problem they picked was pretty interesting but they could have much more quickly realized the particular approaches they were taking to it were unlikely to pan out. If they had done that, perhaps they would have switched to other approaches that ended up succeeding, or just pivoted to interpretability faster. In any case, I definitely wouldn't want to discourage them or future organizations from using a similar problem selection process.

(If you asked a random ML researcher if the problem seemed feasible, they would have said no. But I wouldn't have used that as a reason not to work on the project.)

CTO Buck Shlegeris has 3 years of software engineering experience and a limited ML research background.

My personal judgment is that Buck is a stronger researcher than most people with ML PhDs. He is weaker at empirical ML than this baseline, but very strong conceptually in ways that translate well to machine learning. I do think Buck will do best in a setting where he's either paired with a good empirical ML researcher or gains more experience there himself (he's already gotten a lot better in the past year). But overall I view Buck as on par with a research scientist at a top ML university.

Omega

My personal judgment is that Buck is a stronger researcher than most people with ML PhDs. He is weaker at empirical ML than this baseline, but very strong conceptually in ways that translate well to machine learning. I do think Buck will do best in a setting where he's either paired with a good empirical ML researcher or gains more experience there himself (he's already gotten a lot better in the past year). But overall I view Buck as on par with a research scientist at a top ML university.

Thank you for this comment; some of the contributors to this post have updated their views of Buck as a researcher as a result.

Omega

Thanks for this detailed comment, Jacob. We're in agreement with your first point, but on re-reading the post we can see why it seems like we think the problem selection was also wrong. We don't believe this. We will clarify the distinction between problem selection and execution in the main post soon.

Our main concern was that we think it is important, when working on a problem where a lot of prior research has been done, to come into it with a novel approach or insight. We think it's possible the team could have done this via a more thorough literature review or by engaging with domain experts. Where we may disagree is that our suggestion of doing more desk research beforehand might result in researchers dismissing ideas too easily, and thus experimenting and learning less.

We think this is definitely possible, but feel it can be less costly in some cases, and in particular could have been useful in the case of the adversarial training project. As we write later on in the passage you quoted above, we think the problem with the adversarial training project was that Redwood focused on an unusually challenging threat model (unrestricted adversarial examples). Although we think some aspects of the textual domain make the problem easier, the large number of textual adversarial attacks indicated that this was unlikely to be sufficient.

jsteinhardt

Thanks for this! I think we still disagree though. I'll elaborate on my position below, but don't feel obligated to update the post unless you want to.

* The adversarial training project had two ambitious goals, which were the unrestricted threat model and also a human-defined threat model (e.g. in contrast to synthetic L-infinity threat models that are usually considered).
* I think both of these were pretty interesting goals to aim for and at roughly the right point on the ambition-tractability scale (at least a priori). Most research projects are less ambitious and more tractable, but I think that's mostly a mistake.
* Redwood was mostly interested in the first goal and the second was included somewhat arbitrarily iirc. I think this was a mistake and it would have been better to start with the simplest case possible to examine the unrestricted threat model. (It's usually a mistake to try to do two ambitious things at once rather than nailing one, more so if one of the things is not even important to you.)
* After the original NeurIPS paper Redwood moved in this direction and tried a bunch of simpler settings with unrestricted threat models. I was an advisor on this work. After several months with less progress than we wanted, we stopped pursuing this direction. It would have been better to get to a point where we could make this call sooner (after 1-2 months). Some of the slowness was indeed due to unfamiliarity with the literature, e.g. being stuck on something for a few weeks that was isomorphic to a standard gradient hacking issue. My impression (not 100% certain) is Redwood updated quite a bit in the direction of caring about related literature as a result of this, and I'd guess they'd be a lot faster doing this a second time, although still with room to improve.

Note by academic standards the project was a "success" in the sense of getting into NeurIPS, although the reviewers seemed to most like the human-defined aspect of the threat model rather than the unrestricted aspect.

Omega

This section has now been updated

NunoSempere

the quantity and quality of output is underwhelming given the amount of money and staff time invested.
Of Redwood’s published research, we were impressed by Redwood's interpretability in the wild paper, but would consider it to be no more impressive than progress measures for grokking via mechanistic interpretability, executed primarily by two independent researchers, or latent knowledge in language models without supervision, performed by two PhD students.^[4] These examples are cherry-picked to be amongst the best of academia and independent research, but we believe this is a valid comparison because we also picked what we consider the best of Redwood's research and Redwood's funding is very high relative to other labs.

I'm missing a lot of context here, but my impression is that this argument doesn't go through, or at least is missing some steps:

1. We think that the best Redwood research is of similar quality to work by [Neel Nanda, Tom Lieberum and others, mentored by Jacob Steinhardt]
2. Work by those others doesn't cost $20M
3. Therefore the work by Redwood shouldn't cost $20M

Instead, the argument which would go through would be:

1. Open Philanthropy spent $20M on Redwood Research
2. That $20M produced [such and such research]
3. This is how you could have spent $20M to produce [better research]
4. Therefore, Open Philanthropy shouldn't have spent $20M on Redwood Research, but instead on [alternatives]
   1. (or spent $20M on [alternatives] and on Redwood Research, if the value of Redwood Research is still above the bar)

But you haven't shown step 3, the tradeoff against the counterfactual. It seems likely that the situation is such that producing good AI safety research depends on somewhat idiosyncratic non-monetary factors. Sometimes you will find a talented independent researcher or a PhD student that will produce quality research for relatively small amounts of money, sometimes you will spend $20M to get an outcome of a similar quality. I could see that being the case if the bottleneck isn't money, which seems plausible.

Also note that building an institution is potentially much more scalable than funding one-off independent researchers.

As I said, I'm missing lots of context (i.e., I haven't read Redwood's research, seems within the normal range of possibility that it wouldn't be worth $20M), but I thought I'd give my two cents.

Neel Nanda

Neel Nanda, Tom Lieberum and others, mentored by Jacob Steinhardt

I will clarify that, in my personal case, I did the grokking work as an independent research project. Jacob only became involved in the project after I had done the core research, and his mentorship was specifically about the process of distillation and writing up the results. (To be clear, his mentorship here was high value! But I think that the paper benefited less from his mentorship than is implied by the reference class of having him as the final author.)

jsteinhardt

I agree with this.

NunoSempere

Cheers

NunoSempere

Also, no reputational harm intended, sorry.

Nate Thomas

Re your point about "building an institution" and step 3: We think the majority of our expected value comes from futures in which we produce more research value per dollar than in the past.

(Also, just wanted to note again that $20M isn't the right number to use here, since around 1/3rd of that funding is for running Constellation, as mentioned in the post.)

Omega

Thanks for mentioning the $20M point Nate - I've edited the post to make this a little more clear and would suggest people use $14M as the number instead.

NunoSempere

Cheers

Omega

Meta note: We believe this response is the 80/20 in terms of quality vs time investment. We think it’s likely we could improve the comment with more work, but wanted to share our views earlier rather than later.

One thing we didn’t spell out very explicitly in this post was the distinction between 1) how effectively we believed Redwood spent their resources and 2) whether we think OP should have funded them (and at what amount). As this post is focused on Redwood, I’ll focus more on 1) and comment briefly on 2), but note that we plan to expand on this further in a follow-up post. We will add a paragraph which disambiguates between these two points more clearly.

Argument 1): We think Redwood could produce at least the same quality and quantity of research, with fewer resources (~$4-8 million over 2 years)

The key reasons we think 1) are:

- If they had more senior ML staff or advisors, they could have avoided some mistakes on their agenda that we see as avoidable. This wouldn’t necessarily come at a large monetary cost given their overall budget (around $200-300K for 1 FTE).
- We estimate as much as 25-30% of their spending went towards scaling up projects (e.g. REMIX) before they had a clear research agenda they were confident in. To be fair to Redwood, this premature scaling was more defensible prior to the FTX collapse, when the general belief was that there was a "funding overhang". Nate's comment also mentions that both Holden and Ajeya (at OP) raised concerns about scaling, and that he now sees scaling too quickly as an error on Redwood's part.

Argument 2): OP should have spent less on Redwood, and 2a) there were other comparable funding opportunities

The key reasons we think 2) are:

There are other TAIS labs (academic and not) that we believe could absorb and spend considerably more funding than they currently receive. Example non-profits include CAIS and FAR AI, and underfunded safety-interested academic groups include David Krueger's and Dylan Hadfield-Menell's groups. Opportunities are more limited if focusing specifically on interpretability, but there are still a number of promising options. For example, Neel Nanda mentioned three academics he considers to do good interpretability work: OP has funded one of them (David Bau) but as far as we know not the other two (of course, they may not have room for more funding, or OP may have investigated and decided not to fund them for other reasons).

A key reason OP may not think some of these labs are worth funding on the margin is that they are substantially more bullish on certain safety research agendas than others. We have some concerns about how the OP LT team decide which agendas to support but will explore this further in our Constellation post, so won’t comment in more depth at this point. As one of the main funders of TAIS work, in a field which is very speculative and new, we think OP should be more open to a broad range of research agendas than they are.
We think that small, young organizations without a track record beyond founder reputation should in general be given smaller grants and build up a track record before trying to scale. We think it’s plausible that several of the issues we pointed out could have been mitigated by this funding structure.

Neel Nanda

There are other TAIS labs (academic and not) that we believe could absorb and spend considerably more funding than they currently receive.

My understanding is that, had Redwood not existed, OpenPhil would not have significantly increased their funding to these other places, and broadly has more money than they know what to do with (especially in the previous EA funding environment!). I don't know whether those other places have applied for grants, or why they aren't as funded as they could be, but this doesn't seem that related to me. And more broadly there are a bunch of constraints on grant makers, like time to evaluate a grant, having enough context to competently evaluate it or external advisors with context who they trust, etc. E.g., I'm a bit hesitant about funding interpretability academics who I think will go full steam ahead on capabilities (I think it's often worth doing anyway, but it's not obvious to me, and the one time I recommended a grant here it did consume quite a lot of my time to evaluate the nuances).

And grant making is just really not an efficient market, and there are lots of good grants that don't happen for dumb reasons.

Concretely, it's plausible to me that taking the marginal $1 million given to Redwood and dividing it evenly among the other labs you mention would be good. But that doesn't feel like the right counterfactual here.

jsteinhardt

To push back on this point, presumably even if grantmaker time is the binding resource and not money, Redwood also took up grantmaker time from OP (indeed I'd guess that OP's grantmaker time on RR is much higher than for most other grants given the board member relationship). So I don't think this really negates Omega's argument--it is indeed relevant to ask how Redwood looks compared to grants that OP hasn't made.

Personally, I am pretty glad Redwood exists and think their research so far is promising. But I am also pretty disappointed that OP hasn't funded some academics that seem like slam dunks to me and think this reflects an anti-academia bias within OP (note they know I think this and disagree with me). Presumably this is more a discussion for the upcoming post on OP, though, and doesn't say whether OP was overvaluing RR or undervaluing other grants (mostly the latter imo, though it seems plausible that OP should have been more critical about the marginal $1M to RR especially if overhiring was one of their issues).

richard_ngo

I am also pretty disappointed that OP hasn't funded some academics that seem like slam dunks to me and think this reflects an anti-academia bias within OP (note they know I think this and disagree with me).

My prior is that people who Jacob thinks are slam-dunks should basically always be getting funding, so I'm pretty surprised by this anecdote. (In general I also expect that there are a lot of complex details in cases like these, so it doesn't seem implausible that it was the right call, but it seemed worth registering the surprise.)

Ajeya

I work at Open Philanthropy, and in the last few months I took on much of our technical AI safety grantmaking.

In November and December, Jacob sent me a list of academics he felt that someone at Open Phil should reach out to and solicit proposals from. I was interested in these opportunities, but at the time, I was full-time on processing grant proposals that came in through Open Philanthropy's form for grantees affected by the FTX crash and wasn't able to take them on.

This work tailed off in January, and since then I've focused on a few bigger grants, some writing projects, and thinking through how I should approach further grantmaking. I think I should have reached out to at least a few of the people Jacob suggested earlier (e.g. in February). I didn't make any explicit decision to reject someone that Jacob thought was a slam dunk because I disagreed with his assessment — rather, I was slower to reach out to talk to people he thought I should fund than I could have been.

I plan to talk to several of the leads Jacob sent my way in Q2, and (while I would plan to think through the case for these grants myself to the extent I can) I expect to end up agreeing a lot with Jacob's assessments.

With that said, Jacob and I do have more nebulous higher-level disagreements about things like how truth-tracking academic culture tends to be and how much academic research has contributed to AI alignment so far, and in some indirect way these disagreements probably contributed to me prioritizing these reach outs less highly than someone else might have.

Neel Nanda

This seems fair. I'm significantly pushing back on this as a criticism of Redwood, and on the focus on the "Redwood has been overfunded" narrative. I agree that they probably consumed a bunch of grant makers' time, and am sympathetic to the idea that OpenPhil is making a bunch of mistakes here.

I'm curious which academics you have in mind as slam dunks?

Omega

Thanks Nuno, I'm sharing this comment with the other contributors and will respond in depth soon. I think you're right that we could be more explicit on 3).

NunoSempere

Cheers

Nate Thomas

Thanks to the authors for taking the time to think about how to improve our organization and the field of AI takeover prevention as a whole. I share a lot of the concerns mentioned in this post, and I’ve been spending a lot of my attention trying to improve some of them (though I also have important disagreements with parts of the post).

Here’s some information that perhaps supports some of the points made in the post and adds texture, since it seems hard to properly critique a small organization without a lot of context and inside information. (This is adapted from my notes over the past few months.)

Most importantly, I am eager to increase our rate of research output – and critically to have that increase be sustainable because it’s done by a more stable and well-functioning team. I don’t think we should be satisfied with the current output rate, and I think this rate being too low is in substantial part due to not having had the right organizational shape or sufficiently solid management practices (which, in empathy with the past selves of the Redwood leadership team, is often a tricky thing for young organizations to figure out, and is perhaps especially tricky in this field).

I think the most important error that we’ve made so far is trying to scale up too quickly. I feel bad about the ways in which this has contributed to people who’ve worked here having an unexpectedly bad experience. I believe this was upstream of other organizational mistakes and that it put stress on our relative inexperience in management. While having fewer staff gives fewer people a chance to have roles working on our type of AI alignment research, I expect it will help increase the management quality per person. For example, I think there will be more and better opportunities for researchers at Redwood to grow, which is something I’ve been excited to focus on. I think scaling too quickly was somewhat downstream of not having an extremely clear articulation of what specific flavor of research output we are aiming to produce and, in turn, having a tested organization that we believe reliably produces those outputs.

I think this was an unforced error on our part – for example, Holden and Ajeya expressed concerns to me about this multiple times. My thinking at the time was something like “this sure seems like a pretty confusing field in a lot of ways, and (something something act-omission bias) I’m worried that if we chose an unrealistically high standard for clarity to gate on for organizational growth, then we might learn more slowly than we might otherwise, and fail to give people opportunities to contribute to the field.” I now think that I was wrong about this.

With that said, I’ll also briefly note some of the ways I disagree with the content and framing of this post:

We think our “causal scrubbing” work is our most significant output so far – substantially more important than, for example, our “Interpretability in the Wild” work.
At the beginning of our adversarial training project, we reviewed the literature (including the papers in the list that the above post links to) and discussed the project proposal with relevant experts. I think we made important mistakes in that project, but I don’t think that we failed to understand the state of the field.

I am moderately optimistic about Redwood’s current trajectory and our potential to contribute to making the future go well. I feel substantially better about the place that we’re in now relative to where we were, say, 6 months ago. We remain a relatively young organization making an unusual bet.

I really appreciate feedback, and if anyone reading this wants to send feedback to us about Redwood, you can email info at rdwrs.com or, if you prefer anonymity, visit www.admonymous.co/redwood.

[anonymous]

Ditto pseudonym. I recognize from another comment that there is an upcoming Constellation post from the original poster and that a more effortful response is forthcoming there, but given that this piece was received in advance, I am still kind of surprised the following were not responded to:

Lack of Senior ML Research Staff
Lack of Comm... w/ ML Community
Conflicts of interest with funders

I guess people are busy and this is not a priority. It seems like people are mostly thinking about Underwhelming Research Output (and Nate himself seems to say as much here).

pseudonym

Hi Nate, can you comment a bit more about this section?

We’ve heard multiple cases of people being fired after something negative happens in their life (personal, conflict at work, etc) that causes them to be temporarily less productive at work. While Redwood management have made some efforts to offer support to staff (e.g. offering unpaid leave on some occasions), we believe it may not have been done consistently, and are aware of cases where termination happened with little warning.

I feel like this would be among the more negative updates I would make about Redwood if true, but think it would be possible that there are differences in how a specific event is seen by different parties. Specifically, these seem to reflect weaker organizational or management practices that aren't to do with Redwood making an "unusual bet" (though relevant to it being a young organization).

Specifically:

Has Redwood ever terminated someone for losing productivity that they otherwise wouldn't have, due to a personal life event?
Does Redwood have a policy around leave that includes support for personal life events?
Does Redwood have a clear termination process including warnings before a termination where reasonable, and opportunities for an employee to course-correct with the support of the organization?

Oliver Balfour

Does Redwood have a clear termination process including warnings before a termination where reasonable

I think I'm an unusual case, but I found out a short term contract had been ended early through an automated email, and I received no response when contacting several Redwood staff to check if I had been terminated.

I think this is very uncharacteristic though: they're all good people and I'm net optimistic about Redwood's future. I think they can improve their communication around hiring/trialling/firing processes though.

Edit: I've chatted with Buck and it seems like this was a communication problem.

NickLaing

I know nothing about this organisation, and very little about this field, but this is an impressively humble and open response from a leader of an org in the face of a very critical article. No comment on content, but I appreciate the approach @Nate Thomas

ElizabethBarnes

Thanks for taking the time to write thoughtful criticism. Wanted to add a few quick notes (though note that I'm not really impartial, as I'm socially very close with Redwood).

- I personally found MLAB extremely valuable. It was very well-designed and well-taught and was the best teaching/learning experience I've had by a fairly wide margin
- Redwood's community building (MLAB, REMIX and people who applied to or worked at Redwood) has been a great pipeline for ARC Evals and our biggest single source for hiring (we currently have 3 employees and 2 work triallers who came via Redwood community building efforts).
- It was also very useful for ARC Evals to be able to use Constellation office space while we were getting started, rather than needing to figure this out by ourselves.
- As a female person I feel very comfortable in Constellation. I've never felt that I needed to defer or was viewed for my dating potential rather than my intellectual contributions. I do think I'm pretty happy to hold my ground and sometimes oblivious to things that bother other people, so that might not be very strong evidence that it isn't an issue for other people. However, I have been bothered in the past by places that try to make up the gender balance by hiring a lot of women for non-technical roles. In these places, people assume that the women who are there are non-technical. I think it would make the environment worse for me personally if there was pressure for Constellation to balance the gender ratios.
- I think there have been various ways in which Redwood culture and management style were not great. I think some of this was due to difficult tradeoffs or normal challenges of being a new organization, and some of it was unforced errors. I think they are mostly aware of the issues and taking steps to fix them, although I don't think I expect them to be excellent at management that soon. Some of my recommendations (which I've told them before and think they have mostly taken on board):
-- If Buck is continuing to manage people (and maybe also if not), he should get management coaching
-- Give employees lots of concrete positive feedback (at least once per week)
-- When letting people go, be very clear that hiring is noisy, people perform differently at different organizations; Redwood is a challenging and often low-management environment that, like a PhD program, is not a good fit for everyone; they shouldn't be too discouraged. (I think Redwood believes this but hasn't been as clear as they could be about communicating it)
-- Make sure expectations are clear for work trials
-- Make growth for their employees a serious priority, especially for their top performers - this should be something that is done deliberately with time set aside for it

Neel Nanda

I personally found MLAB extremely valuable. It was very well-designed and well-taught and was the best teaching/learning experience I've had by a fairly wide margin

Strong +1, I was really impressed with the quality of MLAB. I got a moderate amount out of doing it over the summer, and would have gotten much, much more if I had done it a year or two before. I think that kind of outreach is high value, though plausibly a distraction from the core mission.

Larks

About 10+ people (5 Constellation members) have mentioned that there is social pressure to defer or act a certain way when at Constellation.

At least as written, this is so broad as to be effectively meaningless. All organisations exert social pressure on members to act in a certain way (e.g. to wipe down the exercise machines after use). Similarly, basically all employers require some degree of deference to management; typical practice is that management solicit feedback from workers but in turn compliance with instructions is mandatory.

What you describe could be bad... or it could be totally typical. There's no real way for the reader to judge based on what you've written.

Omega

Hi Larks, thanks for the pushback here. We agree that this is hard to judge. Some of this was about the general atmosphere of the place, which is unfortunately a bit fuzzy.

People said they feel a pressure to conform or defer to these people as well, for example in lunchtime conversations. People have also said they can't act as free or as loose as they would like in Constellation. So it's maybe something like feeling that you have to behave in a certain way, or in line with what you perceive the funders and senior leadership want, in order to fit in.

Although this may be present in other offices, we think this pressure is more pronounced at Constellation than at other coworking spaces like the Open Phil offices or Lightcone, where we think there is more of an ability to say and do what you want.

We know this probably isn't as satisfying as it could be, but appreciate you taking the time to point this out and we will edit the post to acknowledge this.

Omega

Update: the post has been edited.

richard_ngo

One quick point: I feel pretty confused about the "Lack of Senior ML Research Staff" criticism. Senior ML research staff are one of the biggest bottlenecks in alignment, and so this feels particularly un-actionable as a criticism, especially given that you're leading with it. (That's particularly true when it comes to hiring for full-time roles, but I expect also relevant when it comes to recruiting good advisors.)

You concretely note that Redwood "terminated some of their more experienced ML research staff", but once you've hired somebody you get a huge amount of data on their performance on many different axes, which makes it hard to interpret this as a bias against experienced ML researchers.

DanielFilan

Seems like to the degree it's valid, it's actionable for people who might consider working with or funding Redwood.

MathiasKB🔸

Good critique, my main conclusion is that redwood seems reasonable overall and not far out of line from other ai safety orgs. Benchmarked against non-ai safety orgs, I would have my usual critique that redwood (and other longtermist orgs) seems unreasonably expensive for reasons I don't quite understand. Does salary really make that big a difference in attracting talent? If that is the case, what does that say about our community's values?

In any case, remember that every org has issues. Listing every issue an org has in a row can give the impression that things are worse than they really are. I would love for a similar critique to be made of the organization I co-founded once we grow to a similar size. More critique is good for the community.

We should be able to write scathing criticisms without getting mad at each other. We need to be able to read criticisms and not go completely ham and want to see the org and everyone associated guillotined.

Linch

Benchmarked against non-ai safety orgs, I would have my usual critique that redwood (and other longtermist orgs) seems unreasonably expensive for reasons I don't quite understand. Does salary really make that big a difference in attracting talent? If that is the case, what does that say about our community's values?

Can you say more about what your implicit benchmark actually is here? Taken literally, "non-ai safety orgs" possibly describes almost all human organizations.

aog

Startups would be another good reference class. VCs are incentivized to scale as fast as possible so they can cash out and reinvest their money, but they rarely give a new organization as much money as Redwood received.

Startups usually receive a seed round of ~$2M cash to cover the first year or two of business, followed by ~$10M for Series A to cover another year or two. Even Stripe, a VC wunderkind that’s raised billions privately while scaling to thousands of employees around the world, began with $2M for their first year, $38M for the next three years (2012-2014), and $70M for the next two years after that.

I’m not sure how long Redwood’s $21M is meant to cover, but if it’s less than a period of 4 years, then they’re spending more than the typical 5M/year for a Series A startup. There’s a good argument to be made that OP can be more risk tolerant than most VCs and take a big swing on scaling Redwood quickly. But beyond cost-effectiveness, another downside of fast funding is that scaling organizations effectively is very difficult, and it could be counterproductive to hire quickly before you have senior management in place with clear lines of tractable work.

Some numbers here (https://www.investopedia.com/articles/personal-finance/102015/series-b-c-funding-what-it-all-means-and-how-it-works.asp) and here (https://www.fundz.net/what-is-series-a-funding-series-b-funding-and-more). For Stripe funding numbers, google crunchbase Stripe Seed / Series A / Series B.
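
A minimal sketch of the burn-rate arithmetic above, assuming each amount is spent evenly over its stated period. The dollar figures are the ones quoted in this comment; the Redwood horizons are hypothetical, since it is not known how long the $21M is meant to cover.

```python
# Funding figures quoted in the comment (USD millions) and the number of
# years each amount was said to cover.
stripe_rounds = {
    "first year": (2, 1),
    "2012-2014": (38, 3),
    "following two years": (70, 2),
}


def annual_rate(total_musd, years):
    """Average spend per year, assuming the money is used evenly."""
    return total_musd / years


stripe_rates = {name: annual_rate(total, years)
                for name, (total, years) in stripe_rounds.items()}

# Redwood's horizon is an assumption: try several candidate durations.
REDWOOD_TOTAL_MUSD = 21
redwood_rates = {years: annual_rate(REDWOOD_TOTAL_MUSD, years)
                 for years in (2, 3, 4, 5)}

if __name__ == "__main__":
    for name, rate in stripe_rates.items():
        print(f"Stripe, {name}: ${rate:.2f}M/year")
    for years, rate in redwood_rates.items():
        print(f"Redwood over {years} years: ${rate:.2f}M/year")
```

Any horizon under four years puts the annualized figure above $5.25M, which is where the comparison with a roughly $5M/year Series A startup comes from.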

Linch

Thanks, this is helpful. One thing to flag is that I wouldn't find the 2012-2014 numbers very convincing; my impression is that VC funding increased a lot until 2022, and 2021 was a year where capital was particularly cheap, for reasons that in hindsight were not entirely dissimilar to why longtermist EA was (relatively) well-funded in the last two years.

aog

Yep that's a good point. Here's one source on it, funding amounts definitely increased throughout the 2010s. An alternative explanation could be that valuations have increased more than funding amounts. There's some data to support this, but you'd need a more careful comparison of startups within the same reference class to be sure.

Startup Funding Explained: Pre-seed, Seed, & Series A-D - Finmark

One chart shows how seed stage valuations were a rare bright spot in VC during a turbulent period | Fortune

Linch

Thanks, appreciate the concrete data!

NunoSempere

I appreciate this comment for giving concrete data that improves my model of the world. Thanks.

NickLaing

Maybe his own org and other global development orgs? I think it's almost always a mistake for a non-profit to get this much money this quickly, regardless of how much potential they have or the good reputation of their founders. It is difficult to gradually build an org and organically make the inevitable mistakes when you are given 10 million dollars in the first year.

I won't speak for @MathiasKB, but these are some of my benchmarks outside the AI realm - he can share what he means :).

The Center for Effective Aid Policy (Mathias and co) is a brand new org, so we don't have evidence of outputs or financials yet. They were given US$170,000 to start up. To many in the development world even 170k might still seem like a lot for an NGO to start with, but it's still a lot less than 10 million.
Last year our org OneDay Health, which has a decent chance of being effective, employed 43 staff, launched 8 health centers and treated 50,000 patients in the most remote rural parts of Uganda, and our total expenditure for the year was US$104,000.
If we are looking at a development org with a budget on a similar scale, Last Mile Health has a 10 year track record, grew steadily, has won countless awards (social innovation, TED prizes etc), has been a crucial part of the global movement for rolling out community health workers, improving health access for millions of people across 5+ countries, and employs hundreds of people both in the US and the developing world. They spent about 26 million dollars last year. Which is a lot of money, but in the ballpark of Redwood Research, and only after many years of high performance, proven recognition and growth.

Even as a global development guy, I think AI alignment research is important, but it is somewhat hard to understand why it's a good idea for a new, small org like this to get this much money from the getgo. Perhaps start with 1 million in the first year with the CEO and co-founder taking a low-ish salary while the org builds their reputation then ramp things up after that?

Mind you if we really do only have 5-20 years before potentially dangerous GAI, maybe we have to sacrifice sustainable growth and stewardship of money at the altar of having a chance to save the world?

MathiasKB🔸

ha good point! I specifically had non-ai EA orgs in mind, could have made that clearer!

Dan H

The failure of Redwood's adversarial training project is unfortunately wholly unsurprising given almost a decade of similarly failed attempts at defenses to adversarial examples from hundreds or even thousands of ML researchers. For example, the RobustBench benchmark shows the best known robust accuracy on ImageNet is still below 50% for attacks with a barely perceptible perturbation.

The better reference class is adversarially mined examples for text models. Meta and other researchers were working on similar projects before Redwood started doing that line of research. https://github.com/facebookresearch/anli is an example. (Reader: evaluate your model's consistency for what counts as alignment research--does this mean non-x-risk-pilled Meta researchers do some alignment research, if we believe the RR project constituted exciting alignment research too?)

Separately, I haven't seen empirical demonstrations that pursuing this line of research can have limited capabilities externalities or result in differential technological progress. Robustifying models against some kinds of automatic adversarial attacks (1,2) does seem to be separable from improving general capabilities though, and I think it'd be good to have more work on that.

We recommend this article by an MIT CS professor which is partly about how creating a sustainable work culture can actually increase productivity.

This researcher's work attitude is only part of a spectrum. Many researchers find great returns working 80+ hours a week. Some labs differentiate themselves by having usual hours, but many successful labs have their members work a lot, and that works out well. For example, Dawn Song's students work a ton, and some other Berkeley grad students in other labs are intimidated by her lab's hours, but that's OK because her graduate students find that environment suitable. It'd be nice if this post was more specific about how much of the work culture discontent is about hours vs other issues.

Paul_Christiano

The better reference class is adversarially mined examples for text models. Meta and other researchers were working on similar projects before Redwood started doing that line of research. https://github.com/facebookresearch/anli is an example

I agree that's a good reference class. I don't think Redwood's project had identical goals, and would strongly disagree with someone saying it's duplicative. But other work is certainly also relevant, and ex post I would agree that other work in the reference class is comparably helpful for alignment

Reader: evaluate your model's consistency for what counts as alignment research--does this mean non-x-risk-pilled Meta researchers do some alignment research, if we believe RR project constituted exciting alignment research too?

Of course! I'm a bit unusual amongst the EA crowd in how enthusiastic I am about "normal" robustness research, but I'm similarly unusual amongst the EA crowd in how enthusiastic I am about this proposed research direction for Redwood, and I suspect those things will typically go together.

Separately, I haven't seen empirical demonstrations that pursuing this line of research can have limited capabilities externalities or result in differential technological progress.

I'm still not convinced by this perspective. I would frame the situation as:

There's a task we really want future people to be good at---finding places where models behave in obviously-undesirable ways, and understanding the limitations of such evaluations and the consequences of training on adversarial inputs.
That task isn't obviously improving automatically with model capabilities, it seems like something that requires knowledge and individual+institutional expertise.
So maybe we should practice a lot to get better at that task, sharing what we learn and building a larger community of researchers and engineers with relevant experience.

Your objection sounds like: "That may be true but there's not a lot of evidence that this doesn't also make models more capable, which would be bad." And I don't find that very persuasive---I don't think there is such a strong default presumption that generic research accelerates capabilities enough to be a meaningful cost.

On the question of what generates differential technological progress, I think I'm comparably skeptical of all of the evidence on offer for claims of the form "doing research on X leads to differential progress on Y," and the best guide we have (both in alignment and in normal academic research!) is basically common-sense arguments along the lines of "investigating and practicing doing X tends to make you better at doing X."

Dan H

I don't think Redwood's project had identical goals, and would strongly disagree with someone saying it's duplicative.

I agree it is not duplicative. It's been a while, but if I recall correctly the main difference seemed to be that they chose a task which gave them an extra nine of reliability (started with an initially easier task) and pursued it more thoroughly.

think I'm comparably skeptical of all of the evidence on offer for claims of the form "doing research on X leads to differential progress on Y,"

I think if we find that improvement of X leads to improvement on Y, then that's some evidence, but it doesn't establish that it's differential. If we find that improvement on X also leads to progress on thing Z that is highly indicative of general capabilities, then that's evidence against. If we find that it mainly affects Y but not other things Z, then that's reasonable evidence it's differential. For example, so far, transparency hasn't affected general capabilities, so I read that as evidence of differential technological progress. As another example, I think trojan defense research differentially improves our understanding of trojans; I don't see it making models better at coding or gaining new general instrumental skills.

I think commonsense is too unreliable of a guide when thinking about deep learning; deep learning findings and phenomena are often unintelligible even in hindsight (I still don't understand why some of my research papers' methods work). That's why I'd prefer empirical evidence. Empirical research claiming to differentially improve safety should demonstrate a differential safety improvement empirically.

ElizabethBarnes

In my understanding, there was another important difference in Redwood's project from the standard adversarial robustness literature: they were looking to eliminate only 'competent' failures (ie cases where the model probably 'knows' what the correct classification is), and would have counted it a success if there were still failures if the failure was due to a lack of competence on the model's part (e.g. 'his mitochondria were liberated' -> implies harm but only if you know enough biology)

I think in practice in their exact project this didn't end up being a super clear conceptual line, but at the start it was plausible to me that only focusing on competent failures made the task feasible even if the general case is impossible.

Nate Thomas

Thanks for the comment Dan. I agree that the adversarially mined examples literature is the right reference class, of which the two that you mention (Meta’s Dynabench and ANLI) were the main examples (maybe the only examples? I forget) while we were working on this project.

I’ll note that Meta’s Dynabench sentiment model (the only model of theirs that I interacted with) seemed substantially less robust than Redwood’s classifier (e.g. I was able to defeat it manually in about 10 minutes of messing around, whereas I needed the tools we made to defeat the Redwood model).

Dan H

I think the adversarial mining thing was hot in 2019. IIRC, Hellaswag and others did it; I'd venture maybe 100 papers did it before RR, but I still think it was underexplored at the time and I'm happy RR investigated it.

Fai

Thank you for the post!

I have long suspected that organizations in other cause areas (I am mainly referring to EA ones, but not only) have been held to higher standards of evaluation when getting funding than AI safety organizations. I think I have slightly updated upward on the likelihood of this view being right after reading this post.

More information on the comparison I have in mind, using EA animal welfare organizations as an example, as I have some experience in this cause area. My suspicion is that, relative to AI safety grantees, animal welfare organizations receive much more scrutiny on their track records, experience of staff, work culture, etc.

Also, my observation is that in animal welfare organizations, efforts to pay staff more sustainable and competitive salaries (from what are quite low levels and huge relative pay-cuts) are not welcomed by all donors. (To be fair to the donors, some EA animal welfare organizations pay very low salaries because their management refuses to pay more.) I am therefore puzzled why this kind of pressure doesn't seem to exist as much in some other EA cause areas (and why it has to exist, to its current extent, in EA animal welfare). Granted, an underlying reason AI safety organizations pay high salaries is that the people who can work in them could earn high(er) salaries at for-profits, and they are already taking huge pay-cuts to work in non-profit AI safety organizations. But it does seem to me, judging from the salary levels stated in this post, that Redwood might be experiencing much less pressure to suppress salary levels, comparatively. Also notice that they earn significantly more than their peers who work in academia, which is something that isn't generally seen in EA animal welfare.

I think I am not the only one having this kind of suspicion. At least 5 people from EA animal welfare have expressed to me their concerns, even complaints, that non-longtermist organizations are being treated unfairly relative to longtermist organizations, especially AI safety ones. According to my observation, and I hope I am wrong, there seems to be some anti-longtermism/anti-AI-safety sentiment flowing around in the animal welfare cause. I think this might be causing some community building problems within EA and may be worth addressing. (Fwiw I endorse some form of longtermism and I see a connection between animal welfare and longtermism. I now work on AI's impact on animals)

Linch

I find salary pretty confusing. My current guess is that EAs are too willing to flatten salary across different counterfactuals and experience levels, rather than too unwilling. In particular, one intuitive heuristic in my head is something like "many people are willing to give up 20-50% of salary to do the right thing, but relatively few people are willing to give up >>70%."

Maybe this is wrong? I know there's empirical research that people with more money benefit less from percentage increases in their spending, so I can see why e.g. a 25% paycut for someone with a 50k salary could be similarly (or more!) costly compared to a 70% paycut for someone with a 300k salary. But it's not very intuitive to me, and I'm confused why this point is not more often brought up when discussing questions of salary fairness.
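
A toy illustration of that comparison, assuming isoelastic (CRRA) utility of income. The utility function and the values of `eta` are assumptions chosen for illustration, not something taken from the research alluded to above. Under log utility (`eta = 1`) the 70% cut at 300k is the costlier one; under more sharply diminishing returns (`eta = 3`) the 25% cut at 50k costs more.

```python
import math


def crra_utility(income, eta):
    """Isoelastic utility of income; eta = 1 is the log-utility case."""
    if eta == 1:
        return math.log(income)
    return income ** (1 - eta) / (1 - eta)


def paycut_cost(salary, cut, eta):
    """Utility lost when salary falls by the fraction `cut`."""
    return crra_utility(salary, eta) - crra_utility(salary * (1 - cut), eta)


def cost_ratio(eta):
    """Cost of a 25% cut at 50k divided by cost of a 70% cut at 300k.

    A ratio above 1 means the smaller cut on the lower salary hurts more.
    """
    low = paycut_cost(50_000, 0.25, eta)
    high = paycut_cost(300_000, 0.70, eta)
    return low / high


if __name__ == "__main__":
    for eta in (1, 2, 3):
        print(f"eta={eta}: ratio={cost_ratio(eta):.3f}")
```

So whether the two paycuts are "similarly costly" depends heavily on how quickly the marginal value of income is assumed to fall.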

billz

Hey this is Bill -- I help run Constellation. Thank you for sharing feedback.

It sounds like you’re planning to write a future post on Constellation that I imagine might have more specifics and that we will have an opportunity to engage with in advance, so maybe it makes sense to respond more then. At the risk of oversimplifying a complex topic, it’s important to Constellation’s mission that Constellation is a good place to work and talk with others, and we care a lot about the culture being welcoming and comfortable for members and visitors.

We really appreciate feedback. If anyone has feedback on Constellation that they’d feel comfortable sharing with me, you can email me at [email protected].

Omega

Hi Bill, yes your understanding is correct - we will be writing a post in the future about Constellation, and we will share a draft ahead of time with you / Redwood.

Neel Nanda

Minor note that an anonymous feedback form might help to elicit negative feedback here. I appreciate the openness to criticism! (I don't have significant negative feedback, I like Constellation a lot, this is just a general note)

billz

Agreed. We have a Constellation-internal anonymous form that isn’t set up well for external feedback, and I didn’t want to block on setting it up before replying.

Neel Nanda

Of Redwood’s published research, we were impressed by Redwood's interpretability in the wild paper, but would consider it to be no more impressive than progress measures for grokking via mechanistic interpretability, executed primarily by two independent researchers, or latent knowledge in language models without supervision, performed by two PhD students.[4] These examples are cherry-picked to be amongst the best of academia and independent research, but we believe this is a valid comparison because we also picked what we consider the best of Redwood's research and Redwood's funding is very high relative to other labs.

I'm flattered, but as Nuno notes I think this is a poor and somewhat unfair argument.

I think that causal scrubbing is probably on par with interpretability in the wild for good interp work, and has helped influence work at other interpretability labs (within TAIS)
Most academic interp work, in my biased opinion, is just not very good when it comes to genuinely scaling to large LLMs, or being relevant to my work (with exceptions - I think Christopher Potts's lab, David Bau's and Martin Wattenberg's do good work)
Afaict, interpretability in the wild was not an organisational priority, and predominantly worked on by 3 junior staff (with advice from Jacob Steinhardt).
I personally really like interp in the wild, and it's influenced my research much more than most other interp work
As Nuno notes, I can't see how else to spend $20M to get more good interp work (naively, I'm not claiming no such ways exist)
Redwood is mostly pursuing a different path to interp, I personally think this is less promising, but I like having a diverse range of agendas and wish more power to them

I also broadly think that publishing and engaging with the broader ML community is less obviously good for interpretability, as noted I just don't think most work is very relevant. I think it's a bet worth making (and am excited about interp in the wild and my grokking work getting into ICLR!), but definitely not obviously worth the effort, eg I think it's probably the right call that Anthropic doesn't try to publish their work. Putting pre-prints on arXiv seems pretty cheap, and I'm pro that, but I think seriously aiming for academic publications is a lot of work (more than 10-20% of a project IMO) and I feel pretty good about Redwood only trying for this when they have employees who are particularly excited about it.

To be clear, I totally think Redwood's output can increase substantially, that this is a worthy goal, and that some of the criticism here is directionally correct and could be high value. But I think this is a claim of "Redwood are already one of the top 5 interp labs in the world by my lights (but I have a pretty low bar...), and I'd love them to be even better"

I'm commenting on the parts of this post that I most disagreed with and feel most qualified to opine upon. I broadly think this was a good post, agree with much of the criticism (though disagree with much of it too). Thanks for writing it! I hope it helps Redwood become a better org.

Omega

(written in first person because one post author wrote it)

As Nuno notes, I can't see how else to spend $20M to get more good interp work (naively, I'm not claiming no such ways exist)

I think this is the area we disagree on the most. Examples of other ideas:

1. Generously fund the academics who you do think are doing good work (as far as I can tell, two of them -- Christopher Potts and Martin Wattenberg -- get no funding from OP, and David Bau gets an order of magnitude less). This is probably more on OP than Redwood, but Redwood could also explore funding academics and working on projects in collaboration with them.

2. Poach experienced researchers who are executing well on interpretability but working on what (by Redwood's lights) are less important problems, and redirect them to more important problems. Not everyone would want to be "redirected", but there's a decent fraction of people who would love to work on more ambitious problems but are currently not incentivized to do so, and a broader range of people are open to working on a wide range of problems so long as they are interesting. I would expect these individuals to cost a comparable amount to what Redwood currently pays (somewhat less if poaching from academia, somewhat more if poaching from industry) but be able to execute more quickly as well as spread valuable expertise around the organization.

3. Make one-year seed grants of around $100k to 20 early-career researchers (PhD students, independent researchers) to work on interpretability, nudging them towards a list of problems viewed as important by Redwood. Provide low-touch mentorship (e.g. a once-a-month call). Scale up the grants and/or hire people from the projects that did well after the one-year trial.

I wouldn't confidently claim that any of these approaches would necessarily best Redwood, but there's a large space of possibilities that could be explored and largely has not been. Notably, the ideas above differ from Redwood's high-level strategy to date by: (a) making bets on a broad portfolio of agendas; (b) starting small and evaluating projects before scaling; (c) bringing in external expertise and talent.

I also broadly think that publishing and engaging with the broader ML community is less obviously good for interpretability, as noted I just don't think most work is very relevant. I think it's a bet worth making (and am excited about interp in the wild and my grokking work getting into ICLR!), but definitely not obviously worth the effort, eg I think it's probably the right call that Anthropic doesn't try to publish their work. Putting pre-prints on arXiv seems pretty cheap, and I'm pro that, but I think seriously aiming for academic publications is a lot of work (more than 10-20% of a project IMO) and I feel pretty good about Redwood only trying for this when they have employees who are particularly excited about it.

I largely agree that the percentage of interpretability papers that are relevant to large-scale alignment is disappointingly low. However, the denominator is very large, so I still expect the majority of TAIS-relevant interpretability work to happen outside TAIS organizations. Given this, I'd argue there's considerable value in communicating to this subset of the ML research community. Perhaps a peer-reviewed publication is not the best way to do this: I'd be happy to see Redwood staff e.g. giving talks at a select subset of academic labs, but to the best of our knowledge this hasn't happened.

I agree that getting from the stage of "scrappy preprint / blog post that your close collaborators can understand" to "peer-reviewed publication" can be 10-20% of a project's time. However, in my experience the clarity of the write-up and rigor of the results often increase considerably in that 10-20%. There are some parts of the publication process that are complete wastes of time (reformatting from single to double column, running an experiment that you already know the results of but that reviewer 2 really wants to see), but in my experience these have been a minority of the work -- no more than 5% of the overall project time. I'm curious if you view this as being significantly more costly than I do, or the improvements to the project from peer-review as being less significant.

Neel Nanda

Sorry for the long + rambly comment! I appreciate the pushback, and found clarifying my thoughts on this useful

I broadly agree that all of the funding ideas you point to seem decent. My biggest crux is that the counterfactual of not funding Redwood is not that one of those gets funded, and that the real constraints here are around logistical effort, grantmaker time, etc. I wrote a comment downthread with further thoughts on these points.

And that it is not Redwood's job to solve this - they're pursuing a theory of change that does not depend on these, and it seems very unreasonable to suggest that they should pursue one of these other uses of money instead, even if you think that the use of money is a great idea.

Re 1, concretely, I've been trying to help one of those professors get more funding for his lab, and think this is a high impact use of money. But think that evaluating professors is hard, thinking through capabilities externalities is hard, figuring out a lab's room for more funding is hard, it's harder to burn a ton of money productively in academia, eg >$1mn (eg, it's pretty hard to just hire a bunch of engineers, and interp doesn't really need a ton of compute). There's also dumb network problems where the academics don't know how to get funding, it's not very legible how to apply to OpenPhil, not everyone is comfortable taking EA money, etc (I would like these problems to be solved, to be clear). I don't think it's a matter of just having more money.

Poach experienced researchers who are executing well on interpretability but working on what (by Redwood's lights) are less important problems, and redirect them to more important problems. Not everyone would want to be "redirected", but there's a decent fraction of people who would love to work on more ambitious problems but are currently not incentivized to do so

I don't know anyone like this. If you do, please intro me! (I met someone vaguely in this category and helped them to get an FTX grant at the start of November.... But they only tangentially fit your description). I'm pretty unconvinced there's many people like this out there who could be redirected to productively do what I consider good interp work - beyond just motivation and interest in doing independent-ish work, there's also significant considerations of research taste, having mentorship to do work I think is important, etc.

Make one-year seed grants of around $100k to 20 early-career researchers (PhD students, independent researchers) to work on interpretability, nudging them towards a list of problems viewed as important by Redwood. Provide low-touch mentorship (e.g. a once-a-month call). Scale up the grants and/or hire people from the projects that did well after the one-year trial.

Seems good, I'd be excited about this happening. I consider my MATS scholars to be vaguely in the spirit of this, and I've been very impressed with them. But, like, this is so not bottlenecked on money. It's a substantial program that would take effort to run, it's not clear to me that these people would do good work without mentorship (1/month might be sufficient), it's not clear that this adds much value beyond existing independent researcher grants, etc. But I do think it's a decent idea - if anyone is interested in making this happen, please reach out!

However, the denominator is very large, so I still expect the majority of TAIS-relevant interpretability work to happen outside TAIS organizations

There's some work I think is cool, but it tends to be concentrated in a small handful of actually good labs (eg I like ROME and Emergent World Representations a lot). There's a bunch of work I think isn't great, but sometimes has great gems in it. But honestly I think that well over a majority of impact-weighted TAIS work was done by the TAIS community (specifically, Chris Olah + collaborators' work is quite possibly a majority in my mind). I'd be interested in being pointed to work that you think is great that I'm missing - I personally find literature reviews to be pretty tedious, and think I underinvest in this kind of thing.

More broadly, my position is that engaging with academia is a theory of change, but one of many. It's a significant investment of time, some people are much better at it than others (eg, I personally just hate writing papers, and am much worse at it than just directly trying to do good research, or write blog posts/educational materials/good tooling), it's hard to direct in targeted ways, benefits a bunch from legible signalling and credentials, etc. I also think Redwood are more pessimistic on it than I am, and eg I am personally not convinced that trying to get grokking into ICLR was a good use of time and effort (though I hope it was!). I think Redwood are making a pretty reasonable bet here.

As a negative example here, I think Distill was a major investment of effort into influencing academia, including on doing better interp work, and it basically failed as far as I can tell (despite, to my eyes, Distill papers being notably higher quality and more interesting than conference papers)

I'm curious if you view this as being significantly more costly than I do, or the improvements to the project from peer-review as being less significant.

I want to distinguish two things - putting in the effort to make a write-up really good, and putting in the effort to eg get it accepted at ICLR/ICML/NeurIPS. I am pretty pro making write-ups really good (I personally am not very good at it and try to avoid it where possible, but this is a personal taste not a value judgement). Eg I really like Anthropic interp papers (though am biased) and think the effort put into presentation and clarity is pretty well spent. And I think that part of submitting to a top conference is making things tightly and clearly phrased, having good figures, making them well presented, having good evidence for your results.

IMO the biggest cost is shaping the results and narrative of your work to fit the kind of thing that reviewers look for, and think is good. I broadly think this just isn't that correlated with what good interp work looks like. I think this can be extremely expensive if you let it shape your research process, choice of projects, etc for "this would make a good publication". In cases like grokking, I did the research I wanted to do, and we then decided to go for a publication, which I think was basically fine, and much less costly. But it did involve significant reshaping and optimisation of the narrative (I am personally sad that progress measures got into the title lol).

Idk, these are complex questions, and there are people I respect who are way more or less pro academia + publishing than me. I am personally pretty biased against academia and publishing, and this affects my value judgements here.

YafahEdelman

I am sympathetic to several of the high level criticisms in this post but have a few relatively minor criticisms.

1) Redwood Funding

This post says "Redwood's funding is very high relative to other labs."
I think this is very false: OpenAI, Anthropic, and DeepMind have all received hundreds of millions of dollars, an order of magnitude above Redwood's funding.
This post says "Redwood’s funding is much higher than any other non-profit lab that [OpenPhil funds]."
This is false, OpenAI was a non-profit when it received 30 million dollars from OpenPhil (link to grant), 50% more than this post cites Redwood as receiving.
This post casts OP having seats on the Board of Redwood as a negative. I think that, in fact, having board seats on a place you fund is pretty normal, and considered responsible - the lack of this by VCs was a noted failure after the FTX collapse.

2) Field Experience

The post says:

Redwood's most experienced ML researcher spent 4 years working at OpenAI prior to joining Redwood. This is comparable experience to someone straight out of a PhD program, the minimum experience level of research scientists at most major AI labs.

This does not strike me as true - modern ML Research is an extremely new field, many research scientists in it did not start out with PhDs in ML.

3) Publishing is Relative to Productivity

I think it plausible that Redwood publishes a normal amount relative to their research productivity. This post seems to agree with that. I think them publishing more, absent them doing more research, would be bad, as it would lead to them publishing lower quality research.

My impression is also that Redwood's published papers have stood out for being unusually thorough and informative about their research among ML papers.

Omega

Regarding 3) Publishing is relative to productivity, we are not entirely sure what you mean, but can try to clarify our point a little more.

We think it's plausible that Redwood's total volume of publicly available output is appropriate relative to the quantity of high-quality research they have produced. We have heard from some Redwood staff that there are important insights that have not been made publicly available outside of Redwood, but to some extent this is true of all labs, and it's difficult for us to judge without further information whether these insights would be worth staff time to write up.

The main area we are confident in suggesting Redwood change is making their output more legible to the broader ML research community. Many of their research projects, including what Redwood considers their most notable project to date -- causal scrubbing -- are only available as Alignment Forum blog posts. We believe there is significant value in writing them up more rigorously and following a standard academic format, and releasing them as arXiv preprints. We would also suggest Redwood more frequently submit their results to peer-reviewed venues, as the feedback from peer review can be valuable for honing the communication of results, but acknowledge that it is possible to effectively disseminate findings without this: e.g. many of OpenAI and Anthropic's highest-profile results were never published in a peer-reviewed venue.

Releasing arXiv preprints would have two benefits. First, it would make the work significantly more likely to be noticed, read and cited by the broader ML community. This makes it more likely that others build upon the work and point out deficiencies in it. Second, the more structured nature of an academic paper forces a more detailed exposition, making it easier for readers to judge, reproduce and build upon. If, for example, we compare Neel's original grokking blog post to the grokking paper, it is clear the paper is significantly more detailed and rigorous. This level of rigor may not be worth the time for every project, but we would at least expect it for an organization's flagship projects.

Omega

Field Experience

Many research scientist roles at AI research labs (e.g. DeepMind and Google Brain^[1]) expect researchers to have PhDs in ML - this would be a minimum of 5 years doing relevant research.

Not all labs have a strict requirement on ML PhDs. Many people at OpenAI and Anthropic don’t have PhDs in ML either, but often have PhDs in related fields like Maths or Physics. There are a decent number of people at OpenAI without PhDs (Anthropic is relatively stricter on this than OpenAI). Labs like MIRI don't require this, but they are doing more conceptual research, and relatively little, if any, ML research (to the best of our knowledge; they are private by default).

^{^}
Note that while we think for-profit AI labs are not the right reference class for comparing funding, we do think that all AI labs (academic, non-profit or for-profit) are the correct reference class when considering credentials for research scientists.

Neel Nanda

Fwiw, my read is that a lot of "must have an ML PhD" requirements are gatekeeping nonsense. I think you learn useful skills doing a PhD in ML, and I think you learn some skills doing a non-ML PhD (but much less that's relevant, though physics PhDs are probably notably more relevant than maths). But also that eg academia can be pretty terrible for teaching you skills like ML engineering and software engineering, lots of work in academia is pretty irrelevant in the world of the bitter lesson, and lots of PhDs have terrible mentorship.

I care about people having skills, but think that a PhD is only an OK proxy for them, and would broadly respect the skills of someone who worked at one of the top AI labs for four years straight out of undergrad notably more than someone straight out of a PhD program

I particularly think that in interpretability, lots of standard ML experience isn't that helpful, and can actively teach bad research taste and focus on pretty unhelpful problems

(I do think that Redwood should prioritise "hiring people with ML experience" more, fwiw, though I hold this opinion much more strongly around their adversarial training work than their interp work)

nostalgebraist

I care about people having skills, but think that a PhD is only an OK proxy for them, and would broadly respect the skills of someone who worked at one of the top AI labs for four years straight out of undergrad notably more than someone straight out of a PhD program

I completely agree.

I've worked in ML engineering and research for over 5 years at two companies, I have a PhD (though not in ML), and I've interviewed many candidates for ML engineering roles.

If I'm reviewing a resume and I see someone has just graduated from a PhD program (and does not have other job experience), my first thoughts are

This person might have domain experience that would be valuable in the role, but that's not a given.
This person probably knows how to do lit searches, ingest information from an academic paper at a glance, etc., which are definitely valuable skills.
This person's coding experience might consist only of low-quality, write-once-read-never rush jobs written to produce data/figures for a paper and discarded once the paper is complete.
More generally, this person might or might not adapt well to a non-academic environment.

I've never interviewed a candidate with 4 years at OpenAI on their resume, but if I had, my very first thoughts would involve things like

OpenAI might be the most accomplished AI capabilities lab in the world.
I'm interviewing for an engineering role, and OpenAI is specifically famous for moving capabilities forward through feats of software engineering (by pushing the frontier of huge distributed training runs) as opposed to just having novel ideas.
Anthropic's success at training huge models, and doing extensive novel research on them, is an indication of what former OpenAI engineers can achieve outside of OpenAI in a short amount of time.
OpenAI is not a huge organization, so I can trust that most people there are contributing a lot, i.e. I can generalize from the above points to this person's level of ability.

I dunno, I might be overrating OpenAI here?

But I think the comment in the post at least requires some elaboration, beyond just saying "many places have a PhD requirement." That's an easy way to filter candidates, but it doesn't mean people in the field literally think that PhD work is fundamentally superior to (and non-fungible with) all other forms of job experience.

Neel Nanda

I agree re PhD skillsets (though think that some fraction of people gain a lot of high value skills during a PhD, esp re research taste and agenda setting).

I think you're way overrating OpenAI though - in particular, Anthropic's early employees/founders include more than half of the GPT-3 first authors!! I think the company has become much more oriented around massive distributed LLM training runs in the last few years though, so maybe your inference that people would gain those skills is more reasonable now.

Omega

Hi Fay, Thank you for engaging with the post. We appreciate you taking the time to check the claims we make.

1) Redwood Funding

Regarding OP’s investment in OpenAI - you are correct that OpenAI received a larger amount of money. We didn’t include this because, since the grant in 2017, OpenAI has transitioned to a capped for-profit. I (the author of this particular comment) was actually not aware that OpenAI had at one point been a research non-profit. I will be updating the original post to add this information in - we appreciate you flagging it.
In general, we disagree that the correct reference class for evaluating Redwood's funding is for-profit alignment labs like OpenAI, Anthropic or DeepMind because they have significantly more funding from (primarily non-EA) investors, and have different core objectives and goals. We think the correct reference class for Redwood is other TAIS labs (academic and research nonprofit) such as CHAI, CAIS, FAR AI and so on. I will add some clarification to the original post with more context.

(We will discuss the point on OP having board seats at Redwood in a separate comment)

Omega

I will be updating the original post to add this information in - we appreciate you flagging it.

Update: This has now been edited in the original post.

Superstimulus

Two of Redwood's leadership team have or have had relationships to an OP grant maker. A Redwood board member is married to a different OP grantmaker.

I'm surprised no one has commented about this yet; this seems incredibly problematic. Some things I'd like to know:

Who exactly are the people involved?
Did the timeline of these relationships line up with the timeline of the funding decisions?
Were the OP grant makers in the right position for their relationships to affect OP's decision to fund Redwood?

Omega

Hi Akash,

Thank you for sharing your thoughts & those concrete action items - I agree it would be nice to have a set of recommendations in an ideal world.

This post took at least 50 hours (collectively) to write, and was delayed in publishing by a few days due to busy schedules. I think if we had more time, I would have shared the final version with a small set of non-Redwood beta reviewers for comments, which would have caught things like this (and e.g. Nuno's comment).

We plan to do this for future posts (if you're reading this and would like to give comments on future posts, please DM us!).

We'll consider adding an intervention section to future reports, time permitting (we still think there is value in sharing our observations, as a lot of this information is not available to people without relevant networks).

(I may come back (again, time permitting) and respond to your point on Redwood having many problems to deal with at a later stage)

CharismaticIguana

We’ve heard multiple cases of people being fired after something negative happens in their life (personal, conflict at work, etc) that causes them to be temporarily less productive at work [...] where termination happened with little warning.

We do believe that providing support, enabling people to improve, and building a healthy culture is generally more productive over time, even though it may increase some costs in the short term and add some ongoing maintenance costs. We do not believe that Redwood is making a well-calculated tradeoff that is increasing its productivity, and believe that it’s instead making short-sighted decisions that contribute to burnout and a bad overall culture.

As someone who started working at Redwood Research expecting an awesome experience and unfortunately had an awful one instead, mainly because of the above, I wasn't aware multiple others had similar experiences as well - though it doesn't come as a surprise. Knowing my case was not an isolated one provides me with some solace, and will certainly contribute to healing the work-related trauma I've gained from the experience.

My condolences go to others who were unfairly treated there. I wish nothing but the best to RR leadership, and I hope they become better at interacting with employees and the broader community.

Lorenzo Buonanno🔸

Moderator comment

Here's a note the moderators are starting to add on all anonymous content like this:

This was posted by an anonymous account. It's pretty easy to create an anonymous account and post things like this without corroboration. This doesn't mean that they're always false, or that the things stated here are false, but I'd recommend that people use their best judgment and wait for serious evidence and/or corroboration before seriously updating on information shared here.

The moderation team doesn't want to remove all content like this - it is in fact important to air issues like this sometimes, but it's also important that we don't naively accept everything posted by anonymous users.

Edit: moved the notice to the top of the comment, to make it clearer we're not singling out this particular comment

JakubK

an academic researcher in the Bay, who would earn around $40,000-50,000 per year, and a comparable researcher in a for-profit lab, who earns $200,000-500,000.

Totally unrelated to the purpose of the post, but is this for real? $50,000 seems absurdly low, especially since the Bay Area has a high cost of living.

Neel Nanda

Academic salaries are crazy low (which is one of my many reasons for not wanting to do a PhD lol)

Omega

Hi Jakub, these are standard rates for EECS PhD students (PhD students in other disciplines get paid less). Here are a couple as an example:

Berkeley EECS PhD students are paid $45K per year at the PhD level. (from personal acquaintances in the Berkeley EECS program)
MIT EECS PhD students are paid ~$49.2K per year at the PhD level. (source)

JakubK

Ah ok, I thought "academic researcher" referred to professors/lecturers/postdocs, not PhD students.

Omega

(We missed this submission, apologies to the poster for not sharing this in a more timely fashion).

A male Constellation member (current or former Redwood staff) & MLAB / REMIX program participant writes:

One thing I think was missed: the spending culture seemed a little over the top. There were some servers that had been unused racking up $10k+ bills that weren't wound down with any urgency.

Elias Schmied

Thanks for this post - I've learned things about the AI safety community that I didn't realize before. I wonder if much of the value of external criticism isn't in changing the behavior of those being criticized, but rather in explicitly stating and making into common knowledge negative factors that by default are not talked about publicly as much. (Both for future projects to do things differently, and for people today to update about how to relate to the entities involved).

Dawn Drescher

Thanks for giving me some insight into an org I had previously only known by name!

Some clarifying questions:

CTO Buck Shlegeris has 3 years of software engineering experience

What do you count as software engineering experience? The linked LinkedIn profile looks like he has > 10 years of experience in the field.

Redwood leadership does not seem to be attempting to address this gap. Instead, they have terminated some of their more experienced ML research staff.

Can you confirm that Redwood really fired them as opposed to them quitting? (The first is unusual in my experience; the second very common.) You mention employees quitting in various places but because they’re anonymous, I can’t tell whether that refers to the same people. Thanks!

Omega

Hi Dawn!

What do you count as software engineering experience? The linked LinkedIn profile looks like he has > 10 years of experience in the field.

Our critique on lack of senior ML staff is focused specifically on lack of machine learning expertise (as opposed to general TAIS work). We are counting substantive software engineering experience such as his work at PayPal and TripleByte.

On the topic of general TAIS experience, I think Buck has at most 7 years experience as he joined MIRI in 2017. (It is our understanding that a decent portion of his time at MIRI was spent recruiting). That being said, years of experience is not the only measure of experience: Jacob Steinhardt comments above that he believes Buck is "a stronger researcher than most people with ML PhDs. He is weaker at empirical ML than this baseline, but very strong conceptually in ways that translate well to machine learning."

Can you confirm that Redwood really fired them as opposed to them quitting? (The first is unusual in my experience; the second very common.) You mention employees quitting in various places but because they’re anonymous, I can’t tell whether that refers to the same people. Thanks!

To our knowledge, their more experienced ML research staff were let go. We refer to different employees quitting at later stages. In an earlier draft we had named a few of them, but decided to remove the names due to anonymity concerns.

Dawn Drescher

Thanks for clarifying!

I’m still so confused about the second point, but you probably also don’t know the details of what happened there.

JakubK

We do not critique MIRI and OpenAI as there have been several conversations and critiques of these organizations (1,2,3).

None of the citations critique MIRI as far as I can tell. What critiques of MIRI did you have in mind?

Omega

We will edit this section to make it more clear, but the MIRI critique is the MIRI hyperlink - Paul Christiano's critique of Eliezer.

Omega

Update: this has now been edited in the original post.

carboniferous_umbraculum

My mind comes back to points like this very often:

We cannot help but be reminded of Frank H. Westheimer's advice to his research students: “Why spend a day in the library when you can learn the same thing by working in the laboratory for a month?"

It's an - or perhaps yet another - example of how sometimes the Bay Area/SE/entrepreneurial mindset is almost diametrically opposed to certain mindsets coming from academia and how this community is trying to balance them or get the best of both worlds (which isn't a stupid thing to try to do per se, it just seems like sometimes it's very tricky). In the spirit of the former you kinda want to move fast (and break things), but the latter wants you to remember the virtues of deliberately taking the time to demonstrate how thorough and methodical you are being (and partly so that you don't, say, squander your resources by running a foreseeably dud experiment)

kokotajlod

I feel like it was only a year or so ago that the standard critique of the AI safety community was that they were too abstract, too theoretical, that they lacked hands-on experience, lacked contact with empirical reality, etc...

Joseph Lemien

“Why spend a day in the library when you can learn the same thing by working in the laboratory for a month?"

I'm quite confused by this. Could you explain what your intended meaning is with this? It seems that the claim here is that spending a month to learn something is better than spending a day to learn something, which strikes me as very odd. Is there an implication here that working in a laboratory gives a person a better/deeper understanding, and therefore is better than the lighter/more superficial understanding from a library?

Omega

Hi Joseph, that quote is meant to be facetious. The scientist who originally said the quote was trying to encourage the opposite to his students - that researching before experimenting can save them time.

HenningB

Thanks for the effort you have invested to research and write up the constructive feedback. How much time did you roughly spend on this?

Seo Sanghyeon

The author commented elsewhere that it took at least 50 hours.

Omega

Yep that's right. This is probably an underestimate, but we would need to spend some time figuring it out. We've spent at least 10 hours replying to cc

rhollerith

If a permanent ban went into effect today on training ML models on anything larger than a single consumer-grade GPU card, e.g., Nvidia RTX 40 series, the work of MIRI researcher Scott Garrabrant would not be affected at all. How much of Redwood's research would stop?

This needs more specificity. Obviously for Garrabrant's work to have any effect, it will need to influence the design and deployment of an AI eventually; it's just that his research approach is probably decades away from when it can be profitably applied to an actual deployed AI. On the other hand, any AI researcher can remain productive if denied the use of a GPU cluster for a week: for example, he or she can use the week to tidy up his or her office and do related "housekeeping" tasks. I guess what I want to know is, if there is a ban on GPU clusters, how long -- weeks? months? years? -- can the median researcher at Redwood remain productive without abandoning most of his or her work up to now? And is there any researcher at Redwood doing work that is a lot more robust against such a ban than the median researcher at Redwood?

If you (the team that wrote this post) had the power to decide which organizations will get shut down (with immediate effect) would Redwood be one of the orgs you shut down? Assume that you had enough power that if you chose to, you could shut down all meaningful research on AI and that you could be as selective as you like about which organizations and parts (e.g., academic departments) of organizations to shut down.

Thanks in advance.

Michele Campolo

Hey, I just wanted to thank you for writing this!

I'm looking forward to reading future posts in the series; actually, I think it would be great to have series like this one for each major cause area.
