Redwood Research is a longtermist organization working on AI alignment based in Berkeley, California. We're going to do an AMA this week; we'll answer questions mostly on Wednesday and Thursday this week (6th and 7th of October). I expect to answer a bunch of questions myself; Nate Thomas and Bill Zito and perhaps other people will also be answering questions.
Here's an edited excerpt from this doc that describes our basic setup, plan, and goals.
Redwood Research is a longtermist research lab focusing on applied AI alignment. We’re led by Nate Thomas (CEO), Buck Shlegeris (CTO), and Bill Zito (COO/software engineer); our board is Nate, Paul Christiano and Holden Karnofsky. We currently have ten people on staff.
Our goal is to grow into a lab that does lots of alignment work that we think is particularly valuable and wouldn’t have happened elsewhere.
Our current approach to alignment research:
- We’re generally focused on prosaic alignment approaches.
- We expect to mostly produce value by doing applied alignment research. I think of applied alignment research as research that takes ideas for how to align systems, such as amplification or transparency, and then tries to figure out how to make them work out in practice. I expect that this kind of practical research will be a big part of making alignment succeed. See this post for a bit more about how I think about the distinction between theoretical and applied alignment work.
- We are interested in thinking about our research from an explicit perspective of wanting to align superhuman systems.
- When choosing between projects, we’ll be thinking about questions like “to what extent is this class of techniques fundamentally limited? Is this class of techniques likely to be a useful tool to have in our toolkit when we’re trying to align highly capable systems, or is it a dead end?”
- I expect us to be quite interested in doing research of the form “fix alignment problems in current models” because it seems generally healthy to engage with concrete problems, but we’ll want to carefully think through exactly which problems along these lines are worth working on and which techniques we want to improve by solving them.
We're hiring for research, engineering, and an office operations manager.
You can see our website here. Other things we've written that might be interesting:
- A description of our current project
- Some docs/posts that describe aspects of how I'm thinking about the alignment problem at the moment: The theory-practice gap. The alignment problem in different capability regimes.
We're up for answering questions about anything people are interested in.
The people we seek advice from on our research most often are Paul Christiano and Ajeya Cotra. Paul is a somewhat experienced ML researcher, who among other things led some of the applied alignment research projects that I am most excited about.
On our team, the people with the most relevant ML experience are probably Daniel Ziegler, who was involved with GPT-3 and also several OpenAI alignment research projects, and Peter Schmidt-Nielsen. Many of our other staff have research backgrounds (including publishing ML papers) that make me feel pretty optimistic about our ability to have good ML ideas and execute on the research.
I think it kind of depends on what kind of ML research you’re trying to do. I think our projects require pretty similar types of expertise to eg Learning to Summarize with Human Feedback, and I think we have pretty analogous expertise to the team that did that research (and we’re advised by Paul, who led it).
I think that there are particular types of research that would be hard for us to do, due to not having certain types of expertise.
I think that a lot of the research we are most interested in doing is not super bottlenecked on having the top ML researchers, in the same way that Learning to Summarize with Human Feedback doesn’t seem super bottlenecked on having the top ML researchers. I feel like the expertise we end up needing is some mixture of ML stuff like “how do we go about getting this transformer to do better on this classification task”, reasoning about the analogy to the AGI alignment problem, and lots of random stuff like making decisions about how to give feedback to our labellers.
I don’t feel very concerned about this; in my experience, ML researchers are usually pretty willing to consider research on its merits, and we have had good interactions with people from various AI labs about our research.