EDIT: I'm only going to answer a few more questions, due to time constraints. I might eventually come back and answer more. I still appreciate getting replies with people's thoughts on things I've written.
I'm going to do an AMA on Tuesday next week (November 19th). Below I've written a brief description of what I'm doing at the moment. Ask any questions you like; I'll respond to as many as I can on Tuesday.
Although I'm eager to discuss MIRI-related things in this AMA, my replies will represent my own views rather than MIRI's, and as a rule I won't be running my answers by anyone else at MIRI. Think of it as a relatively candid and informal Q&A session, rather than anything polished or definitive.
----
I'm a researcher at MIRI. At MIRI I divide my time roughly equally between technical work and recruitment/outreach work.
On the recruitment/outreach side, I do things like the following:
- For the AI Risk for Computer Scientists workshops (which are slightly badly named; we accept some technical people who aren't computer scientists), I handle the intake of participants, and also teach classes and lead discussions on AI risk at the workshops.
- I do most of the technical interviewing for engineering roles at MIRI.
- I manage the AI Safety Retraining Program, in which MIRI gives grants to people to study ML for three months with the goal of making it easier for them to transition into working on AI safety.
- I sometimes do weird things like going on a Slate Star Codex roadtrip, where I led a group of EAs on a five-day trip along the East Coast, attending Slate Star Codex meetups and visiting EA groups.
On the technical side, I mostly work on some of our nondisclosed-by-default technical research; this involves thinking about various kinds of math and implementing things related to the math. Because the work isn't public, there are many questions about it that I can't answer. But this is my problem, not yours; feel free to ask whatever questions you like and I'll take responsibility for choosing to answer or not.
----
Here are some things I've been thinking about recently:
- I think that the field of AI safety is growing in an awkward way. Lots of people are trying to work on it, and many of these people have pretty different pictures of what the problem is and how we should try to work on it. How should we handle this? How should you try to work in a field when at least half the "experts" are going to think that your research direction is misguided?
- The AIRCS workshops that I'm involved with contain a variety of material which attempts to help participants think about the world more effectively. I have thoughts about what's useful and not useful about rationality training.
- I have various crazy ideas about EA outreach. I think the SSC roadtrip was good; I think some EAs who work at EA orgs should consider doing "residencies" in cities without much full-time EA presence, where they mostly do their normal job but also talk to people.
Reading through some of your blog posts and other writing, I get the impression that you put a lot of weight on how smart people seem to you. You often describe people or ideas as "smart" or "dumb," and you seem interested in finding the smartest people to talk to or bring into EA.
I am feeling a bit confused by my reactions. I think I am a) excited by the idea of getting the "smart people" together so that they can help each other think through complicated topics and make more good things happen, but also b) a bit sad and left out, because I am probably not one of the smart people.
Curious about your thoughts on a few things related to this... I'll put my questions as separate comments below.
2) Somewhat relatedly, there seems to be a lot of angst within EA related to intelligence / power / funding / jobs / respect / social status / etc., and I am curious if you have any interesting thoughts about that.
I feel really sad about it. I think EA should probably have a communication strategy where we say relatively simple messages like "we think talented college graduates should do X and Y", but this causes collateral damage where people who don't succeed at doing X and Y feel bad about themselves. I don't know what to do about this, except to say that I have the utmost respect in my heart for people who really want to do the right thing and are trying their best.
I don't think I have very coherent or reasoned thoughts on how we should handle this, and I try to defer to people I trust whose judgement on these topics I think is better than mine.
If you feel comfortable sharing: who are the people whose judgment on this topic you think is better?
1) Do you have any advice for people who want to be involved in EA, but do not think that they are smart or committed enough to be engaging at your level? Do you think there are good roles for such people in this community / movement / whatever? If so, what are those roles?
I used to expect 80,000 Hours to tell me how to have an impactful career. Recently, I've started thinking it's basically my own personal responsibility to figure it out. I think this shift has made me much happier and much more likely to have an impactful career.
80,000 Hours targets the most professionally successful people in the world. That's probably the right idea for them - giving good career advice takes a lot of time and effort, and they can't help everyone, so they should focus on the people with the most career potential.
But, unfortunately for most EAs (myself included), the nine priority career paths recommended by 80,000 Hours are some of the most difficult and competitive careers in the world. If you’re among the 99% of people who are not Google programmer / top half of Oxford / Top 30 PhD-level talented, I’d guess you have slim-to-none odds of succeeding in any of them. The advice just isn't tailored for you.
So how can the vast majority of people have an impactful career? My best answer: A lot of independent thought and planning. Your own personal brainstorming and reading and asking around and exploring, not just following stoc...
Hi Aidan,
I’m Brenton from 80,000 Hours - thanks for writing this up! It seems really important that people don’t think of us as “tell[ing] them how to have an impactful career”. It sounds absolutely right to me that having a high impact career requires “a lot of independent thought and planning” - career advice can’t be universally applied.
I did have a few thoughts, which you could consider incorporating if you end up making a top level post. The most substantive two are:
Many of the priority paths are broader than you might be thinking:
Most people won’t be able to step into an especially high impact role directly out of undergrad, so unsurprisingly, many of the priority paths require people to build up career capital before they can get into high impact positions. We’d think of people who are building up career capital focused on (say) AI policy as being ‘on ...
I think this comment is really lovely, and a very timely message. I'd support it being turned into a top-level post so more people can see it, especially if you have anything more to add.
Seconded.
Thank you both very much, I will do that, and I almost definitely wouldn't have without your encouragement.
If anyone has more thoughts on the topic, please comment or reach out to me, I'd love to incorporate them into the top-level post.
I think similar areas were covered in these two posts as well: "80,000 Hours - how to read our advice" and "Thoughts on 80,000 Hours’ research that might help with job-search frustrations".
I agree this is a very helpful comment. I would add: these roles are not, in my view, *lesser* in any sense, for a range of reasons, and I would encourage people not to think of them in those terms.
- You might have a bigger impact on the margins being the only - or one of the first few - people thinking in EA terms in a philanthropic foundation than by adding to the pool of excellence at OpenPhil. This goes for any role that involves influencing how resources are allocated - which is a LOT, in charity, government, industry, academic foundations etc.
- You may not be in the presidential cabinet, or a spad to the UK prime minister, but those people are supported and enabled by people building up resources, capacity, and Overton window expansion elsewhere in government and the civil service. The 'senior person' on their own may not be able to gain purchase for key policy ideas and influence.
- A lot of xrisk research, from biosecurity to climate change, draws on and depends on a huge body of work on biology, public policy, climate science, renewable energy, insulation in homes, and much more. Often there are gaps in research on extreme scenarios due to lack of incentives for this kind
...
I also like the analogy, let's run with it. Suppose I'm reasoning from the point of view of the movement as a whole, and we're trying to put together a soccer team. Suppose also that there are two types of positions, midfield and striker. I'm not sure if this is true for strikers in what I would call soccer, but suppose the striker has a higher skillcap than midfield.[1] I'll define skillcap as the amount of skill with the position before the returns begin to diminish.
Where skill is some product of standard deviation of innate skill and hours practiced.
Back to the problem of putting together a soccer team: if you're starting with a bunch of players of unknown innate skill, you would get a higher expected value by telling 80% of your players to train to be strikers, and 20% to be midfielders. Because you have a smaller pool, your midfielders will have less innate talent for the position. You can afford to lose this, however, as the effect will be small compared to the gain in the increased performance of the strikers.
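A rough simulation sketch of that allocation argument; every number in it (the skill distribution, the position caps, the team slots) is made up purely to illustrate the mechanism, not taken from the comment:

```python
import random

STRIKER_CAP, MID_CAP = 9.0, 5.0   # strikers keep benefiting from extra skill for longer
TEAM_STRIKERS, TEAM_MIDS = 3, 7   # slots to fill on the final team

def value(skill, cap):
    # crude diminishing returns: skill beyond the position's cap adds nothing
    return min(skill, cap)

def expected_team_value(n_candidates=100, frac_train_strikers=0.8, trials=5000):
    total = 0.0
    for _ in range(trials):
        n_strikers = int(frac_train_strikers * n_candidates)
        # skill = random innate talent around a common mean (practice hours held fixed)
        strikers = sorted((random.gauss(5, 2) for _ in range(n_strikers)), reverse=True)
        mids = sorted((random.gauss(5, 2) for _ in range(n_candidates - n_strikers)), reverse=True)
        total += sum(value(s, STRIKER_CAP) for s in strikers[:TEAM_STRIKERS])
        total += sum(value(s, MID_CAP) for s in mids[:TEAM_MIDS])
    return total / trials

for frac in (0.2, 0.5, 0.8):
    print(f"train {frac:.0%} as strikers -> expected team value {expected_team_value(frac_train_strikers=frac):.2f}")
```

Because the midfield cap is low enough that its best candidates saturate it either way, routing most trainees toward the high-cap position improves the best strikers by more than it costs in midfield quality, which is the point of the comment.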
That's not to say that you should fill your entire team with wannabe strikers. When you select your team you'll undoubtedly leave out some very dedicated strikers in favor
...
I really enjoy the extent to which you've both taken the ball and run with it ;)
I think a lot of this is right and important, but I especially love:
We're all doing the best we can with the privileges we were blessed with.
"Do you have any advice for people who want to be involved in EA, but do not think that they are smart or committed enough to be engaging at your level?"--I just want to say that I wouldn't have phrased it quite like that.
One role that I've been excited about recently is making local groups be good. I think that having better local EA communities might be really helpful for outreach, and lots of different people can do great work with this.
4) You seem like you have had a natural strong critical thinking streak since you were quite young (e.g., you talk about thinking that various mainstream ideas were dumb). Any unique advice for how to develop this skill in people who do not have it naturally?
For the record, I think that I had mediocre judgement in the past and did not reliably believe true things, and I sometimes made really foolish decisions. I think my experience is mostly that I felt extremely alienated from society, which meant that I looked more critically on many common beliefs than most people do. This meant I was weird in lots of ways, many of which were bad and some of which were good. And in some cases this meant that I believed some weird things that feel like easy wins, eg by thinking that people were absurdly callous about causing animal suffering.
My judgement improved a lot from spending a lot of time in places with people with good judgement who I could learn from, eg Stanford EA, Triplebyte, the more general EA and rationalist community, and now MIRI.
I feel pretty unqualified to give advice on critical thinking, but here are some possible ideas, which probably aren't actually good:
- Try to learn simple models of the world and practice applying them to claims you hear, and then being confused when they don't match. Eg learn introductory microeconomics and then whenever you hear a claim about the world that intro micro has an opinion on, try t
...
3) I've seen several places where you criticize fellow EAs for their lack of engagement or critical thinking. For example, three years ago, you wrote:
Do you think this has improved at all? And what are the current things that you are annoyed most EAs do not seem to know or engage with?
I no longer feel annoyed about this. I'm not quite sure why. Part of it is probably that I'm a lot more sympathetic when EAs don't know things about AI safety than global poverty, because learning about AI safety seems much harder, and I think I hear relatively more discussion of AI safety now compared to three years ago.
One hypothesis is that 80,000 Hours has made various EA ideas more accessible and well-known within the community, via their podcast and maybe their articles.
What evidence would persuade you that further work on AI safety is unnecessary?
I’m going to instead answer the question “What evidence would persuade you that further work on AI safety is low value compared to other things?”
Note that a lot of my beliefs here disagree substantially with my coworkers.
I’m going to split the answer into two steps: what situations could we be in such that I thought we should deprioritize AI safety work, and for each of those, what could I learn that would persuade me we were in them.
Situations in which AI safety work looks much less valuable:
- We’ve already built superintelligence, in which case the problem is moot
- Seems like this would be pretty obvious if it happened
- We have clear plans for how to align AI that work even when it’s superintelligent, and we don’t think that we need to do more work in order to make these plans more competitive or easier for leading AGI projects to adopt.
- What would persuade me of this:
- I’m not sure what evidence would be required for me to be inside-view persuaded of this. I find it kind of hard to be inside-view persuaded, for the same reason that I find it hard to imagine being persuaded that an operating system is secure.
- But I can imagine what it
...
Thanks, that's really interesting! I was especially surprised by "If I thought there was a <30% chance of AGI within 50 years, I'd probably not be working on AI safety."
Yeah, I think that a lot of EAs working on AI safety feel similarly to me about this.
I expect the world to change pretty radically over the next 100 years, and I probably want to work on the radical change that's going to matter first. So compared to the average educated American I have shorter AI timelines but also shorter timelines to the world becoming radically different for other reasons.
I find these statements surprising, and would be keen to hear more about this from you. I suppose that the latter goes a long way towards explaining the former. Personally, there are few technologies that I think are likely to radically change the world within the next 100 years (assuming that your definition of radical is similar to mine). Maybe the only ones that would really qualify are bioengineering and nanotech. Even in those fields, though, I expect the pace of change to be fairly slow if AI isn't heavily involved.
(For reference, while I assign more than 30% credence to AGI within 50 years, it's not that much more).
Most of them are related to AI alignment problems, but it's possible that I should work specifically on them rather than other parts of AI alignment.
An s-risk could occur via a moral failure, which could happen even if we knew how to align our AIs.
In the 80k podcast episode with Hilary Greaves she talks about decision theory and says:
I understand from that that there is little engagement between MIRI and academia. What is more troubling for me is that the cases for the major decision theories seem to be looked upon with skepticism by academic experts.
Do you think that is really the case? How do you respond to that? I would personally feel much better if I knew that there were some academic decision
...
Yeah, this is an interesting question.
I’m not really sure what’s going on here. When I read critiques of MIRI-style decision theories (eg from Will or from Wolfgang Schwarz), I feel very unpersuaded by them. This leaves me in a situation where my inside views disagree with the views of the most obvious class of experts, which is always tricky.
- When I read those criticisms by Will MacAskill and Wolfgang Schwarz, I feel like I understand their criticisms and find them unpersuasive, as opposed to not understanding their criticisms. Also, I feel like they don’t understand some of the arguments and motivations for FDT. I feel a lot better disagreeing with experts when I think I understand their arguments and when I think I can see particular mistakes that they’re making. (It’s not obvious that this is the right epistemic strategy, for reasons well articulated by Gregory Lewis here.)
- Paul’s comments on this resolved some of my concerns here. He thinks that the disagreement is mostly about what questions decision theory should be answering. He thinks that the updateless decision theories are obviously more suitable to building AI than eg CDT or ED
...
FWIW, I could probably be described as a "skeptic" of updateless decision theories; I’m pretty sympathetic to CDT. But I also don’t think we should build AI systems that consistently take the actions recommended by CDT. I know at least a few other people who favor CDT, but again (although small sample size) I don’t think any of them advocate for designing AI systems that consistently act in accordance with CDT.
I think the main thing that’s going on here is that academic decision theorists are primarily interested in normative principles. They’re mostly asking the question: “What criterion determines whether or not a decision is ‘rational’?” For example, standard CDT claims that an action is rational only if it’s the action that can be expected to cause the largest increase in value.
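In symbols (a simplified, textbook-style gloss of the two most familiar criteria, not a quotation from any particular source):

$$a^{*}_{\mathrm{CDT}} = \arg\max_{a} \sum_{s} P(s)\, U(a, s), \qquad a^{*}_{\mathrm{EDT}} = \arg\max_{a} \sum_{s} P(s \mid a)\, U(a, s),$$

where the states $s$ in the CDT sum are taken to be causally independent of the act (more careful statements use causal counterfactuals or $P(s \mid \mathrm{do}(a))$), while EDT conditions on the act as evidence. FDT/UDT-style proposals instead evaluate the outputs of the agent's whole decision algorithm or policy, which is part of why the two communities can talk past each other.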
On the other hand, AI safety researchers seem to be mainly interested in a different question: “What sort of algorithm would it be rational for us to build into an AI system?” The first question doesn’t
...
The comments here have been very ecumenical, but I'd like to propose a different account of the philosophy/AI divide on decision theory:
1. "What makes a decision 'good' if the decision happens inside an AI?" and "What makes a decision 'good' if the decision happens inside a brain?" aren't orthogonal questions, or even all that different; they're two different ways of posing the same question.
MIRI's AI work is properly thought of as part of the "success-first decision theory" approach in academic decision theory, described by Greene (2018) (who also cites past proponents of this way of doing decision theory):
...
I actually agree with you about this. I have in mind a different distinction, although I might not be explaining it well.
Here’s another go:
Let’s suppose that some decisions are rational and others aren’t. We can then ask: What is it that makes a decision rational? What are the necessary and/or sufficient conditions? I think that this is the question that philosophers are typically trying to answer. The phrase “decision theory” in this context typically refers to a claim about necessary and/or sufficient conditions for a decision being rational. To use different jargon, in this context a “decision theory” refers to a proposed “criterion of rightness.”
When philosophers talk about “CDT,” for example, they are typically talking about a proposed criterion of rightness. Specifically, in this context, “CDT” is the claim that a decision is rational only if taking it would cause the largest expected increase in value. To avoid any ambig
...
I agree that these three distinctions are important:
- "Picking policies based on whether they satisfy a criterion X" vs. "Picking policies that happen to satisfy a criterion X". (E.g., trying to pick a utilitarian policy vs. unintentionally behaving utilitarianly while trying to do something else.)
- "Trying to follow a decision rule Y 'directly' or 'on the object level'" vs. "Trying to follow a decision rule Y by following some other decision rule Z that you think satisfies Y". (E.g., trying to naïvely follow utilitarianism without any assistance from sub-rules, heuristics, or self-modifications, vs. trying to follow utilitarianism by following other rules or mental habits you've come up with that you expected to make you better at selecting utilitarianism-endorsed actions.)
- "A decision rule that prescribes outputting some action or policy and doesn't care how you do it" vs. "A decision rule that prescribes following a particular set of cognitive steps that will then output some action or policy". (E.g., a rule that says 'maximize the aggregate welfare of moral patients' vs. a specif
...
By triggering the bomb, you're making things worse from your current perspective, but making things better from the perspective of earlier you. Doesn't that seem strange and deserving of an explanation? The explanation from a UDT perspective is that by updating upon observing the bomb, you actually changed your utility function. You used to care about both the possible worlds where you end up seeing a bomb in the box, and the worlds where you don't. After updating, you think you're either a simulation within Omega's prediction so your action has no effect on yourself or you're in the world with a real bomb, and you no longer care about the version of you in the world with a million dollars in the box, and this accounts for the conflict/inconsistency.
Given the human tendency to change our (UDT-)utility functions by updating, it's not clear what to do (or what is right), and I think this reduces UDT's intuitive appeal and makes it less of a slam-dunk over CDT/EDT. But it seems to me that it takes switching to the UDT perspective to even understand the nature of the problem. (Quite possibly this isn't adequately explained in MIRI's decision theory papers.)
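One schematic way to write down the tension described above (my gloss; the probabilities and utilities are left symbolic rather than taken from any particular version of the thought experiment): the updateless evaluation scores a whole policy $\pi$ across both branches, while the updated evaluation conditions on having seen the bomb:

$$\mathrm{EU}_{\text{updateless}}(\pi) = P(\text{bomb} \mid \pi)\, U_{\text{bomb}}(\pi) + \big(1 - P(\text{bomb} \mid \pi)\big)\, U_{\text{no bomb}}(\pi), \qquad \mathrm{EU}_{\text{updated}}(a) = U_{\text{bomb}}(a).$$

Because the predictor makes $P(\text{bomb} \mid \pi)$ depend on the policy itself, the policy maximizing the left-hand quantity can prescribe an action that scores badly on the right-hand one; dropping the no-bomb term after updating is the "change of utility function" described above.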
For more on this divide/points of disagreement, see Will MacAskill's essay on the Alignment Forum (with responses from MIRI researchers and others)
https://www.alignmentforum.org/posts/ySLYSsNeFL5CoAQzN/a-critique-of-functional-decision-theory
and previously, Wolfgang Schwarz's review of Functional Decision Theory
https://www.umsu.de/wo/2018/688
(with some Lesswrong discussion here: https://www.lesswrong.com/posts/BtN6My9bSvYrNw48h/open-thread-january-2019#WocbPJvTmZcA2sKR6)
I'd also be interested in Buck's perspectives on this topic.
See also Paul Christiano's take: https://www.lesswrong.com/posts/n6wajkE3Tpfn6sd5j/christiano-decision-theory-excerpt
Back in July, you held an in-person Q&A at REACH and said "There are a bunch of things about AI alignment which I think are pretty important but which aren’t written up online very well. One thing I hope to do at this Q&A is try saying these things to people and see whether people think they make sense." Could you say more about what these important things are, and what was discussed at the Q&A?
I don’t really remember what was discussed at the Q&A, but I can try to name important things about AI safety which I think aren’t as well known as they should be. Here are some:
----
I think the ideas described in the paper Risks from Learned Optimization are extremely important; they’re less underrated now that the paper has been released, but I still wish that more people who are interested in AI safety understood those ideas better. In particular, the distinction between inner and outer alignment makes my concerns about aligning powerful ML systems much crisper.
----
On a meta note: Different people who work on AI alignment have radically different pictures of what the development of AI will look like, what the alignment problem is, and what solutions might look like.
----
Compared to people who are relatively new to the field, skilled and experienced AI safety researchers seem to have a much more holistic and much more concrete mindset when they’re talking about plans to align AGI.
For example, here are some of my beliefs about AI alignment (none of which are original ideas of mine):
--
- I think it’s pretty plausible that meta-learning systems are ...
+1, this is the thing that surprised me most when I got into the field. I think helping increase common knowledge and agreement on the big picture of safety should be a major priority for people in the field (and it's something I'm putting a lot of effort into, so send me an email at richardcngo@gmail.com if you want to discuss this).
Also +1 on this.
Suppose you find out that Buck-in-2040 thinks that the work you're currently doing is a big mistake (which should have been clear to you, now). What are your best guesses about what his reasons are?
I think of myself as making a lot of gambles with my career choices. And I suspect that regardless of which way the propositions turn out, I'll have an inclination to think that I was an idiot for not realizing them sooner. For example, I often have both the following thoughts:
But even if it feels obvious in hindsight, it sure doesn't feel obvious now.
So I have big gambles that I'm making, which might turn out to be wrong, but which feel now like they will have been reasonable-in-hindsight gambles either way. The main two such gambles are thinking AI alignment might be really important in the next couple decades and working on MIRI's approaches to AI alignment instead of some other approach.
When I ask myself "what things have I not really considered as much ...
How much do you worry that MIRI's default non-disclosure policy is going to hinder MIRI's ability to do good research, because it won't be able to get as much external criticism?
I worry very little about losing the opportunity to get external criticism from people who wouldn't engage very deeply with our work if they did have access to it. I worry more about us doing worse research because it's harder for extremely engaged outsiders to contribute to our work.
A few years ago, Holden had a great post where he wrote:
...
In November 2018 you said "we want to hire as many people as engineers as possible; this would be dozens if we could, but it's hard to hire, so we'll more likely end up hiring more like ten over the next year". As far as I can tell, MIRI has hired 2 engineers (Edward Kmett and James Payor) since you wrote that comment. Can you comment on the discrepancy? Did hiring turn out to be much more difficult than expected? Are there not enough good engineers looking to be hired? Are there a bunch of engineers who aren't on the team page/haven't been announced yet?
(This is true of all my answers but feels particularly relevant for this one: I’m speaking just for myself, not for MIRI as a whole)
We’ve actually made around five engineer hires since then; we’ll announce some of them in a few weeks. So I was off by a factor of two.
Before you read my more detailed thoughts: please don’t read the below and then get put off from applying to MIRI. I think that many people who are in fact good MIRI fits might not realize they’re good fits. If you’re unsure whether it’s worth your time to apply to MIRI, you can email me at buck@intelligence.org and I’ll (eventually) reply telling you whether I think you might plausibly be a fit. Even if it doesn't go further than that, there is great honor in applying to jobs from which you get rejected, and I feel warmly towards almost everyone I reject.
With that said, here are some of my thoughts on the discrepancy between my prediction and how much we’ve hired:
- Since I started doing recruiting work for MIRI in late 2017, I’ve updated towards thinking that we need to be pickier with the technical caliber of engineering hires than I originally t
...
What's the biggest misconception people have about current technical AI alignment work? What's the biggest misconception people have about MIRI?
How should talented EA software engineers best put their skills to use?
The obvious answer is “by working on important things at orgs which need software engineers”. To name specific examples that are somewhat biased towards the orgs I know well:
I have two main thoughts on how talented software ...
As an appendix to the above, some of my best learning experiences as a programmer were the following (starting from when I started programming properly as a freshman in 2012). (Many of these aren’t that objectively hard (and would fit in well as projects in a CS undergrad course); they were much harder for me because I didn’t have the structure of a university course to tell me what design decisions were reasonable and when I was going down blind alleys. I think that this difficulty created some great learning experiences for me.)
- I translated the proof of equivalence between regular expressions and finite state machines from “Introduction to Automata Theory, Languages, and Computation” into Haskell.
- I wrote a program which would take a graph describing a circuit built from resistors and batteries and then solve for the currents and potential drops.
- I wrote a GUI for a certain subset of physics problems; this involved a lot of deconfusion-style thinking as well as learning how to write GUIs.
- I went to App Academy and learned to write full stack web applications.
- I wrote a compiler from C to assembly in Scala. It took a long time for me to figure out that I sh
...
(Notably, the other things you might work on if you weren't at MIRI seem largely to be non-software-related)
I hadn't actually noticed that.
One factor here is that a lot of AI safety research seems to need ML expertise, which is one of my least favorite types of CS/engineering.
Another is that compared to many EAs I think I have a comparative advantage at roles which require technical knowledge but not doing technical research day-to-day.
I'm emphasizing strategy 1 because I think that there are EA jobs for software engineers where the skill ceiling is extremely high, so if you're really good it's still worth it for you to try to become much better. For example, AI safety research needs really great engineers at AI safety research orgs.
In your experience, what are the main reasons good people choose not to do AI alignment research after getting close to the field (at any org)? And on the other side, what are the main things that actually make the difference for them positively deciding to do AI alignment research?
The most common reason that someone who I would be excited to work with at MIRI chooses not to work on AI alignment is that they decide to work on some other important thing instead, eg other x-risk or other EA stuff.
But here are some anonymized recent stories of talented people who decided to do non-EA work instead of taking opportunities to do important technical work related to x-risk (for context, I think all of these people are more technically competent than me):
- One was very comfortable in a cushy, highly paid job which they already had, and thought it would be too inconvenient to move to an EA job (which would have also been highly paid).
- One felt that AGI timelines are probably relatively long (eg they thought that the probability of AGI in the next 30 years felt pretty small to them), which made AI safety feel not very urgent. So they decided to take an opportunity which they thought would be really fun and exciting, rather than working at MIRI, which they thought would be less of a good fit for a particular skill set which they'd been developing for years; this person thinks that they might come back and work on x-risk after they've had another job for a few year
... (read more)In "Ways I've changed my mind about effective altruism over the past year" you write:
I am not sure if you still feel this way, but this makes me wonder what the current conversations are about with other people at EA orgs. Could you give some examples of important understandings or new ideas you have gained from such conversations in the last, say, 3 months?
I still feel this way, and I've been trying to think of ways to reduce this problem. I think the AIRCS workshops help a bit, that my SSC trip was helpful, and that EA residencies might be helpful.
A few helpful conversations that I've had recently with people who are strongly connected to the professional EA community, which I think would be harder to have without information gained from these strong connections:
How much do you agree with the two stories laid out in Paul Christiano's post What Failure Looks Like?
I think of hard takeoff as meaning that AI systems suddenly control much more resources. (Paul suggests the definition of "there is a one year doubling of the world economy before there's been a four year doubling".)
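For concreteness (my arithmetic, not Paul's): the two doubling times in that definition correspond to very different growth rates,

$$g_{\text{1-year doubling}} = 2^{1} - 1 = 100\%\ \text{per year}, \qquad g_{\text{4-year doubling}} = 2^{1/4} - 1 \approx 19\%\ \text{per year},$$

both far above the few percent per year (roughly a two-decade doubling time) at which the world economy has recently grown, which is why even the "slow" four-year doubling would already be a dramatic departure from recent history.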
Unless I'm very mistaken, the point Paul is making here is that if you have a world where AI systems in aggregate gradually become more powerful, there might come a turning point where the systems suddenly stop being controlled by humans. By analogy, imagine a country where the military wants to stage a coup against the president, and their power increases gradually day by day, until one day they decide they have enough power to stage the coup. The power wielded by the military increased continuously and gradually, but the amount of control of the situation wielded by the president at some point suddenly falls.
What is EY doing now? Is he coding, writing fiction or a new book, working on math foundations, or providing general leadership?
Not sure why only the initials are provided. For the sake of clarity to other readers, EY = Eliezer Yudkowsky.
Meta: A big thank you to Buck for doing this and putting so much effort into it! This was very interesting and will hopefully encourage more public dissemination of knowledge and opinions.
I thought this was great. Thanks, Buck
It was a good time; I appreciate all the thoughtful questions.
+1. So good to see stuff like this
+2 helpful and thoughtful answers; really appreciate the time put in.
My sense of the current general landscape of AI Safety is: various groups of people pursuing quite different research agendas, and not very many explicit and written-up arguments for why these groups think their agenda is a priority (a notable exception is Paul's argument for working on prosaic alignment). Does this sound right? If so, why has this dynamic emerged and should we be concerned about it? If not, then I'm curious about why I developed this picture.
I think the picture is somewhat correct, and that, perhaps surprisingly, we should not be too concerned about the dynamic.
My model for this is:
1) there are some hard and somewhat nebulous problems "in the world"
2) people try to formalize them using various intuitions/framings/kinds of math; also using some "very deep priors"
3) the resulting agendas look extremely different at the surface level, and create the impression you have
but actually
4) if you understand multiple agendas deeply enough, you get a sense
Overall, given our current state of knowledge, I think running these multiple efforts in parallel is a better approach, with a higher chance of success, than the idea that we should invest a lot in resolving disagreements/prioritizing and have everyone work on the "best agenda".
This seems to go against some core EA heuristic ("compare the options, take the best") but is actually more in line with rational allocation of resources in the face of uncertainty.
Thanks for the reply! Could you give examples of:
a) two agendas that seem to be "reflecting" the same underlying problem despite appearing very different superficially?
b) a "deep prior" that you think some agenda is (partially) based on, and how you would go about working out how deep it is?
Sure
a)
For example, CAIS and something like the "classical superintelligence in a box" picture disagree a lot on the surface level. However, if you look deeper, you will find many similar problems. A simple-to-explain example: the problem of manipulating the operator, which has (in my view) some "hard core" involving both math and philosophy, where you want the AI to somehow communicate with humans in a way which at the same time a) allows the human to learn from the AI if the AI knows something about the world, b) ensures the operator's values are not "overwritten" by the AI, and c) doesn't prohibit moral progress. In CAIS language this is connected to so-called manipulative services.
Or: one of the biggest hits of the past year is the mesa-optimisation paper. However, if you are familiar with prior work, you will notice that many of the proposed solutions for mesa-optimisers are the same as, or similar to, solutions previously proposed for so-called 'daemons' or 'misaligned subagents'. This is because the problems partially overlap (the mesa-optimisation framing is clearer and makes a stronger case for "this is what to expect by default"). ...
Not Buck, but one possibility is that people pursuing different high-level agendas have different intuitions about what's valuable, and those kind of disagreements are relatively difficult to resolve, and the best way to resolve them is to gather more "object-level" data.
Maybe people have already spent a fair amount of time having in-person discussions trying to resolve their disagreements, and haven't made progress, and this discourages them from writing up their thoughts because they think it won't be a good use of time. However, this line of reasoning might be mistaken; it seems plausible to me that people entering the field of AI safety are relatively impartial judges of which intuitions do and don't seem valid, that the question of where new people in the field should focus is an important one, and that having more public disagreement would improve human capital allocation.
I think your sense is correct. I think that plenty of people have short docs on why their approach is good; I think basically no-one has long docs engaging thoroughly with the criticisms of their paths (I don't think Paul's published arguments defending his perspective count as complete; Paul has arguments that I hear him make in person that I haven't seen written up.)
My guess is that it's developed because various groups decided that it was pretty unlikely that they were going to be able to convince other groups of their work, and so they decided to just go their own ways. This is exacerbated by the fact that several AI safety groups have beliefs which are based on arguments which they're reluctant to share with each other.
(I was having a conversation with an AI safety researcher at a different org recently, and they couldn't tell me about some things that they knew from their job, and I couldn't tell them about things from my job. We were reflecting on the situation, and then one of us proposed the metaphor that we're like two people who were sliding on ice next to each other and then pushed away and have now chosen our paths and can't...
FWIW, it's not clear to me that AI alignment folks with different agendas have put less effort into (or have made less progress on) understanding the motivations for other agendas than is typical in other somewhat-analogous fields. Like, MIRI leadership and Paul have put >25 (and maybe >100, over the years?) hours into arguing about merits of their differing agendas (in person, on the web, in GDocs comments), and my impression is that central participants to those conversations (e.g. Paul, Eliezer, Nate) can pass the others' ideological Turing tests reasonably well on a fair number of sub-questions and down 1-3 levels of "depth" (depending on the sub-question), and that might be more effort and better ITT performance than is typical for "research agenda motivation disagreements" in small niche fields that are comparable on some other dimensions.
This is the goal, but it's unclear that it's having much of an effect. I feel like I relatively often have conversations with AI safety researchers where I mention something I highlighted in the newsletter, and the other person hasn't heard of it, or has a very superficial / wrong understanding of it (one that I think would be corrected by reading just the summary in the newsletter).
This is very anecdotal; even if there are times when I talk to people and they do know the paper that I'm talking about because of the newsletter, I probably wouldn't notice / learn that fact.
(In contrast, junior researchers are often more informed than I would expect, at least about the landscape, even if not the underlying reasons / arguments.)
The 2017 MIRI fundraiser post says "We plan to say more in the future about the criteria for strategically adequate projects in 7a" and also "A number of the points above require further explanation and motivation, and we’ll be providing more details on our view of the strategic landscape in the near future". As far as I can tell, MIRI hasn't published any further explanation of this strategic plan. Is MIRI still planning to say more about its strategic plan in the near future, and if so, is there a concrete timeframe (e.g. "in a few months", "in a year", "in two years") for publishing such an explanation?
(Note: I asked this question a while ago on LessWrong.)
AI takeoff: continuous or discontinuous?
I don’t know. When I try to make fake mathematical models of how AI progress works, they mostly come out looking pretty continuous. And AI Impacts has successfully pushed my intuitions in a slow takeoff direction, by exhaustively cataloging all the technologies which didn't seem to have discontinuous jumps in efficiency. But on the other hand it sometimes feels like there’s something that has to “click” before you can have your systems being smart in some important way; this pushes me towards a discontinuous model. Overall I feel very confused.
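One toy illustration of why such simple models tend to come out continuous (this is my example, not one of Buck's actual models): let capability $C$ feed back into the rate of progress,

$$\frac{dC}{dt} = a\, C^{k}.$$

For $k \le 1$ this gives exponential-or-slower growth; for $k > 1$ it gives hyperbolic growth that diverges at a finite time, yet the trajectory is continuous at every point before that, with no single discontinuous jump. So feedback-driven "explosive" progress and a continuous curve are compatible, and a genuine discontinuity needs some extra ingredient, like the "click" mentioned above.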
When I talk to Paul Christiano about takeoff I feel persuaded by his arguments for slow takeoff, when I talk to many MIRI people I feel somewhat persuaded by their arguments for fast takeoff.
Do you have any thoughts on Qualia Research Institute?
I feel pretty skeptical of their work and their judgement.
I am very unpersuaded by their Symmetry Theory of Valence, which I think is summarized by “Given a mathematical object isomorphic to the qualia of a system, the mathematical property which corresponds to how pleasant it is to be that system is that object’s symmetry”.
I think of valence as the kind of thing which is probably encoded into human brains by a bunch of complicated interconnected mechanisms rather than by something which seems simple from the perspective of an fMRI-equipped observer, so I feel very skeptical of this. Even if it was true about human brains, I’d be extremely surprised if the only possible way to build a conscious goal-directed learning system involved some kind of symmetrical property in the brain state, so this would feel like a weird contingent fact about humans rather than something general about consciousness.
And I’m skeptical of their judgement for reasons like the following. Michael Johnson, the ED of QRI, wrote:
...
Buck, for an internal counterpoint you may want to discuss QRI's research with Vaniver. We had a good chat about what we're doing at the Boston SSC meetup, and Romeo attended a MIRI retreat earlier in the summer and had some good conversations with him there also.
To put a bit of a point on this, I find the "crank philosophy" frame a bit questionable if you're using only thin-slice outside view and not following what we're doing. Probably, one could use similar heuristics to pattern-match MIRI as "crank philosophy" also (probably, many people have already done exactly this to MIRI, unfortunately).
FWIW I agree with Buck's criticisms of the Symmetry Theory of Valence (both content and meta) and also think that some other ideas QRI are interested in are interesting. Our conversation on the road trip was (I think) my introduction to Connectome Specific Harmonic Waves (CSHW), for example, and that seemed promising to think about.
I vaguely recall us managing to operationalize a disagreement, let me see if I can reconstruct it:
...
Most things that look crankish are crankish.
I think that MIRI looks kind of crankish from the outside, and this should indeed make people initially more skeptical of us. I think that we have a few other external markers of legitimacy now, such as the fact that MIRI people were thinking and writing about AI safety from the early 2000s and many smart people have now been persuaded that this is indeed an issue to be concerned with. (It's not totally obvious to me that these markers of legitimacy mean that anyone should take us seriously on the question "what AI safety research is promising".) When I first ran across MIRI, I was kind of skeptical because of the signs of crankery; I updated towards them substantially because I found their arguments and ideas compelling, and people whose judgement I respected also found them compelling.
I think that the signs of crankery in QRI are somewhat worse than 2008 MIRI's signs of crankery.
I also think that I'm somewhat qualified to assess QRI's work (as someone who's spent ~100 paid hours thinking about philosophy of mind in the last few years), and when I look at it, I think it looks pretty crankish and wrong.
QRI is tackling a very difficult problem, as is MIRI. It took many, many years for MIRI to gather external markers of legitimacy. My inside view is that QRI is on the path of gaining said markers; for people paying attention to what we're doing, I think there's enough of a vector right now to judge us positively. I think these markers will be obvious from the 'outside view' within a short number of years.
But even without these markers, I'd poke at your position from a couple angles:
I. Object-level criticism is best
First, I don't see evidence you've engaged with our work beyond very simple pattern-matching. You note that "I also think that I'm somewhat qualified to assess QRI's work (as someone who's spent ~100 paid hours thinking about philosophy of mind in the last few years), and when I look at it, I think it looks pretty crankish and wrong." But *what* looks wrong? Obviously doing something new will pattern-match to crankish, regardless of whether it is crankish, so in terms of your rationale-as-stated, I don't put too much stock in your pattern detection (and perhaps you shouldn't either). If we want to avoid...
For fuller context, here is my reply to Buck's skepticism about the 80% number during our back-and-forth on Facebook. As a specific comment, the number is loosely held, more of a conversation-starter than anything else. As a general comment, I'm skeptical of publicly passing judgment on my judgment based on one offhand (and unanswered; it was not engaged with) comment on Facebook. Happy to discuss details in a context where we'll actually talk to each other. :)
--------------my reply from the Facebook thread a few weeks back--------------
I think the probability question is an interesting one; one frame is asking what the leading alternative to STV is.
At its core, STV assumes that if we have a mathematical representation of an experience, the symmetry of this object will correspond to how pleasant the experience is. The latest addition to this (what we're calling 'CDNS') assumes that consonance under Selen Atasoy's harmonic analysis of brain activity (connectome-specific harmonic waves, CSHW) is a good proxy for this in humans. This makes relatively clear predictions across all human states and could fairly easily be extended to non-human animals, in...
Mike, while I appreciate the empirical predictions of the symmetry theory of valence, I have a deeper problem with QRI philosophy, and it makes me skeptical even if the predictions come to bear.
In physics, there are two distinctions we can make about our theories:
The classic Many Worlds vs. Copenhagen is a dispute of the second kind, at least until someone can create an experiment which distinguishes the two. Another example of the second type of dispute is special relativity vs. Lorentz ether theory.
Typically, philosophers of science, and most people who follow LessWrong philosophy, will say that the way to resolve disputes of the second kind is to find out which interpretation is simplest. That's one reason why most people follow Einstein's special relativity over the Lorentz ether theory.
However, the simplicity of an interpretation is often hard to measure. It's made more complicated for two reasons:
- First, there's no formal way of measuring simplicity, even in principle, in a way that is language-independent.
- Second, there are ontological disputes about what type of the
...
I think it would be worthwhile to separate these out from the text, and (especially) to generate predictions that are crisp, distinctive, and can be resolved in the near term. The QRI questions on Metaculus are admirably crisp (and fairly near term), but not distinctive (they are about whether certain drugs will be licensed for certain conditions, or whether evidence will emerge supporting drug X for condition Y, which offers very limited evidence for QRI's wider account 'either way').
This is somewhat more promising from your most recent post:
This is crisp, plausibly distinctive, yet resolving this requires a lot of neuroimaging work which (presumably) won't be conducted anytime soon. In the interim, there isn't much to persuade a sceptical prior.
How would you describe the general motivation behind MIRI's research approach? If you feel you don't want to answer that, feel free to restrict this specifically to the agent foundations work.
I’m speaking very much for myself and not for MIRI here. But, here goes (this is pretty similar to the view described here):
If we build AI systems out of business-as-usual ML, we’re going to end up with systems probably trained with some kind of meta learning (as described in Risks from Learned Optimization) and they’re going to be completely uninterpretable and we’re not going to be able to fix the inner alignment. And by default our ML systems won’t be able to handle the strain of doing radical self-improvement, and they’ll accidentally allow their goals to shift as they self-improve (in the same way that if you tried to make a physicist by giving a ten-year-old access to a whole bunch of crazy mind-altering/enhancing drugs and the ability to do brain surgery on themselves, you might have unstable results). We can’t fix this with things like ML transparency or adversarial training or ML robustness. The only hope of building aligned really-powerful-AI-systems is having a much clearer picture of what we’re doing when we try to build these systems.
Thanks :)
I'm hearing "the current approach will fail by default, so we need a different approach. In particular, the new approach should be clearer about the reasoning of the AI system than current approaches."
Noticeably, that's different from a positive case that sounds like "Here is such an approach and why it could work."
I'm curious how much of your thinking is currently split between the two rough possibilities below.
First:
Alternatively, second:
What do you think are the biggest mistakes that the AI Safety community is currently making?
Paul Christiano is a lot more optimistic than MIRI about whether we could align a Prosaic AGI. In a relatively recent interview with AI Impacts he said he thinks "probably most of the disagreement" about this lies in the question of "can this problem [alignment] just be solved on paper in advance" (Paul thinks there's "at least a third chance" of this, but suggests MIRI's estimate is much lower). Do you have a sense of why MIRI and Paul disagree so much on this estimate?
I think Paul is probably right about the causes of the disagreement between him and many researchers, and the summary of his beliefs in the AI Impacts interview you linked matches my impression of his beliefs about this.
What has been the causal history of you deciding that it was worth leaving your previous job to work with MIRI? Many people have a generic positive or negative view of MIRI, but it's much stronger to decide to actually work there.
Earning to give started looking worse and worse the more that I increased my respect for Open Phil; by 2017 it seemed mostly obvious that I shouldn’t earn to give. I stayed at my job for a few months longer because two prominent EAs gave me the advice to keep working at my current job, which in hindsight seems like an obvious mistake and I don’t know why they gave that advice. Then in May, MIRI advertised a software engineer internship program which I applied to; they gave me an offer, but I would have had to quit my job to take the offer, and Triplebyte (which I’d joined as the first engineer) was doing quite well and I expected that if I got another software engineering job it would have much lower pay. After a few months I decided that there were enough good things I could be doing with my time that I quit Triplebyte and started studying ML (and also doing some volunteer work for MIRI doing technical interviewing for them).
I tried to figure out whether MIRI’s directions for AI alignment were good, by reading a lot of stuff that had been written online; I did a pretty bad job of thinking about all this.
At this point MIRI offered me a full-time job and ...
On the SSC roadtrip post, you say "After our trip, I'll write up a post-mortem for other people who might be interested in doing things like this in the future". Are you still planning to write this, and if so, when do you expect to publish it?
Q1: Has MIRI noticed a significant change in funding following the change in disclosure policy?
Q2: If yes to Q1, what was the direction of the change?
Q3: If yes to Q1, were you surprised by the degree of the change?
ETA:
Q4: If yes to Q3, in which direction were you surprised?
It’s not clear what effect this has had, if any. I am personally somewhat surprised by this--I would have expected more people to stop donating to us.
I asked Rob Bensinger about this; he summarized it as “We announced nondisclosed-by-default in April 2017, and we suspected that this would make fundraising harder. In fact, though, we received significantly more funding in 2017 (https://intelligence.org/2019/05/31/2018-in-review/#2018-finances), and have continued to receive strong support since then. I don't know that there's any causal relationship between those two facts; e.g., the obvious thing to look at in understanding the 2017 spike was the cryptocurrency price spike that year. And there are other factors that changed around the same time too, e.g., Colm [who works at MIRI on fundraising among other things] joining MIRI in late 2016.”