Derek Shiller

Philosophy Researcher @ Rethink Priorities
1006 karma · Joined · Derekshiller.com


I think it is valuable to have this stuff on record. If it isn't recorded anywhere, then anyone who wants to reference this position in another academic work -- even if it is the consensus within a field -- is left presenting it in a way that makes it look like their personal opinion.

Thanks for recording these thoughts!

Here are a few responses to the criticisms.

> I think RP underrates the extent to which their default values will end up being the defaults for model users (particularly some of the users they most want to influence)

This is a fair criticism: we started this project planning to provide somewhat authoritative numbers, but we discovered that this was more difficult than we initially expected, so we instead opted to express significant skepticism about the default choices. Where there was controversy (for instance, over how many years forward we should look), we opted for middle-of-the-road choices. I agree that reasonable and well-thought-out defaults would add a lot of value. Perhaps the best way to handle controversy would be to offer different sets of parameter defaults that users could toggle between, reflecting what different people in the community think.

> I found it difficult to provide very large numbers on future population per star - I think with current rates of economic and compute growth, the number of digital people could be extremely high very quickly.

The ability to represent digital people with populations per star was a last-minute choice. We originally aimed for that parameter to represent only human populations. (It isn’t even completely obvious to me that stars are the limiting factor on the number of digital people.) However, I also think these things don’t matter much, since the main aim of the project isn’t really affected by exactly how valuable x-risk projects are in expectation. If you think there may be large populations, the model is going to imply incredibly high rates of return on extinction-risk work. Whether those returns make such work the obvious choice depends not on exactly how high they are, but on how you feel about the risk, and the risks won't change with massively higher populations.

> I think some x-risk interventions could plausibly have very long run effects on x-risk (e.g. by building an aligned super intelligence)

If you think we’ll likely have an aligned super-intelligence within 100 years, then you might try to model this by setting risks very low after the next century and treating your project as just a small boost on its eventual discovery. However, you might not think that either superaligned AI or extinction is inevitable. One thing we don’t try to do is model trajectory changes, and those seem potentially hugely significant, but also rather difficult to model with any degree of confidence.
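The suggested workaround, setting risks very low after the next century, can be sketched as a simple survival-probability calculation. The per-century rates here are invented, not CCM defaults:

```python
# Sketch: modeling "aligned superintelligence within ~100 years" by dropping
# the per-century extinction risk to near zero after the first century.
# Both risk rates below are hypothetical.

def survival_probability(centuries, near_term_risk, long_term_risk):
    """Probability of surviving the given number of centuries, with one
    risky century followed by a much safer long run."""
    p = 1.0
    for c in range(centuries):
        risk = near_term_risk if c == 0 else long_term_risk
        p *= (1 - risk)
    return p

# With 10% risk this century and 0.01% per century afterwards, survival over
# 100 centuries stays close to the chance of surviving this century alone.
p = survival_probability(100, near_term_risk=0.10, long_term_risk=0.0001)
print(f"P(survive 100 centuries) ~ {p:.3f}")
```

Under this shape, almost all the value of a safety project comes from nudging the near-term risk, which is why a small boost to the eventual discovery can be modeled as a reduction in that first-century rate.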

> The x-risk model seems to confuse existential risk and extinction risk (medium confidence - maybe this was explained somewhere, and I missed it)

We distinguish extinction risk from risks of sub-extinction catastrophes, but we don’t model any kind of as-bad-as-extinction risks.

There is some nuance to the case that seems to get overlooked in the poll. I feel completely free to express opinions in a personal capacity that might be at odds with my employer, but I also feel that there are some things it would be inappropriate to say while carrying out my job without running it by them first. It seems like you're interested in the latter feeling, but the poll is naturally interpreted as addressing the former.

I think I agree that safety researchers should prefer not to take a purely ceremonial role at a big company if they have other good options, but I'm hesitant to conclude that no one should be willing to do it. I don't think it is remotely obvious that safety research at big companies is ceremonial.

There are a few reasons why some people might opt for a ceremonial role:

  1. It is good for some AI safety researchers to have access to what is going on at top labs, even if they can't do anything about it. They can at least keep tabs on it and can use that experience later in their careers.

  2. It seems bad to isolate capabilities researchers from safety concerns. I bet capabilities researchers would take safety concerns more seriously if they ate lunch every day with someone who is worried than if they only talked to each other.

  3. If labs do engage in behavior that is flagrantly reckless, employees can act as whistleblowers. Non-employees can't. Even if they can't prevent a disaster, they can create a paper trail of internal concerns which could be valuable in the future.

  4. Internal politics might change and it seems better to have people in place already thinking about these things.

Do you think it would be better if no one who worked at OpenAI / Anthropic / Deepmind worked on safety? If those organizations devoted less of their budget to safety? (Or do you think we should want them to hire for those roles, but hire less capable or less worried people, so individuals should avoid potentially increasing the pool of talent from which they can hire?)

> Ditto for the "AI Misalignment Megaproject": $8B+ expenditure to only have a 3% chance of success (?!), plus some other misc discounting factors. Seems like you could do better with $8B.

I think we're somewhat bearish on the ability of money by itself to solve problems. The technical issues around alignment appear quite challenging, especially given the pace of development, so it isn't clear that any amount of money will be able to solve them. If, on the other hand, the issues are too easy, then your investment is unlikely to be needed, and so your expenditure isn't going to reduce extinction risk.

Even if the technical issues are in the goldilocks spot of being solvable but not trivially so, the political challenges around getting those solutions adopted seem extremely daunting. There is a lot we don't explicitly specify in these parameter settings: if the money comes from a random billionaire unaffiliated with the AI scene, it might be harder to get expertise and buy-in than if it comes from insiders or the federal government.

All that said, it is plausible to me that we should have a somewhat higher chance of having an impact coupled with a lower chance of a positive outcome. A few billion dollars is likely to shake things up even if the outcome isn't what we hoped for.
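The "goldilocks" argument above can be made explicit as a toy decomposition of the success probability. The individual probabilities below are invented for illustration and are not the model's actual parameters:

```python
# Toy decomposition of a megaproject's chance of success: money only helps
# if alignment is hard enough to need it but not so hard that no amount of
# money suffices, and the solution must also be adopted. Numbers invented.

p_too_easy = 0.3      # solved anyway; extra money changes nothing
p_too_hard = 0.4      # unsolvable at any price; money changes nothing
p_goldilocks = 1 - p_too_easy - p_too_hard

p_solve_given_funding = 0.5   # chance the funding cracks it, if crackable
p_adoption = 0.2              # chance the solution is actually adopted

p_success = p_goldilocks * p_solve_given_funding * p_adoption
print(f"overall success probability ~ {p_success:.0%}")
```

Multiplying even moderately optimistic conditional probabilities like these yields a small headline number, which is one way a low overall success chance can coexist with a reasonable-looking project.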

I think you should put very little trust in the default parameters of the projects. It was our initial intention to create defaults that reflected the best evidence and expert opinion, but we had difficulty getting consensus on what these values should be and decided instead to explicitly stand back from the defaults. The parameter settings are adjustable to suit your views, and we encourage people to think about what those parameter settings should be and not take the defaults too seriously.

> For readers' context, AI safety technical research is 80,000 Hours' top career path, whereas one could argue extinction risk from natural disasters is astronomically low.

The parameters allow you to control how far into the future you look, and the outcomes include not just effects on the long-term future from the extinction / preservation of the species but also the probabilities of near-term catastrophes that cause large numbers of deaths without causing extinction. Depending on your settings, near-term catastrophes can dominate the expected value. For the default settings for natural disasters and bio-risk, much of the value of mitigation work (at least over the next 1000 years) comes from the prevention of relatively small-scale disasters. I don't see anything obviously wrong about this result, and I expect that 80K's outlook is based on a willingness to consider effects more than 1000 years in the future.
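The horizon-dependence can be sketched with a toy expected-deaths calculation. The event rates and death tolls below are invented, not CCM defaults, and this sketch counts only present-generation deaths rather than lost future generations:

```python
# Toy comparison of where expected deaths averted come from, as a function
# of the time horizon. Rates and tolls are hypothetical.

def expected_deaths(years, annual_rate, toll):
    """Expected deaths over a horizon from events with a fixed annual
    probability and a fixed death toll per event."""
    return years * annual_rate * toll

for horizon in (100, 1000):
    small = expected_deaths(horizon, annual_rate=1e-2, toll=1e6)    # sub-extinction
    extinct = expected_deaths(horizon, annual_rate=1e-6, toll=8e9)  # extinction
    print(f"{horizon} yrs: small-scale {small:.1e}, extinction {extinct:.1e}")
```

With numbers in this vicinity, frequent small-scale disasters carry most of the near-term expected toll; extinction only dominates once the value of future generations, not just present deaths, enters the calculation.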

Thanks for your impressions. I think your concerns largely align with ours. The model should definitely be interpreted with caution, not just because of the correlations it leaves out, but because of the uncertainty with the inputs. For the things that the model leaves out, you've got to adjust its verdicts. I think that this is still very useful because it gives us a better baseline to update from.

As for where we get inputs from, Marcus might have more to say. However, I can speak to the history of the app. Previously, we were using a standard percentage improvement, e.g. a 10% increase in DALYs averted per $. Switching to allowing users to choose a specific target effectiveness number gave us more flexibility. I am not sure what made us think that the percentages we had previously set were reasonable, but I suspect it came from experience with similar projects.
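The difference between the two parameterizations can be shown in a few lines. The baseline figure here is hypothetical, chosen only to make the contrast concrete:

```python
# Sketch of the two parameterizations described above: a relative
# "percentage improvement" over a baseline vs. a user-chosen absolute
# target effectiveness. The baseline value is invented.

baseline = 50.0   # DALYs averted per $1,000 (hypothetical)

# Old style: a fixed relative improvement over the baseline.
pct_improvement = 0.10
old_target = baseline * (1 + pct_improvement)

# New style: the user specifies the target effectiveness directly, which
# also covers interventions with no natural baseline to improve on.
new_target = 60.0

print(old_target, new_target)
```

Letting users set the target directly decouples the input from any particular baseline estimate, which is the flexibility gained by the switch.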

I think that the real effect of chatbots won't be better access to tailored porn, but rather constant artificial companionship. It won't be AI girlfriends that are the threat, but AI friends. AI could be set up to be funnier, more loyal, more empathetic, and more available than human friends or partners. This could have significant effects on human psychology, for better and for worse.

I appreciate your attention to these details!

The values we included in the CCM for these interventions should be treated as approximate, accurate only to roughly an order of magnitude. The actual numbers may be a bit dated and probably don't fully reflect current thinking about the marginal value of GHD interventions. I'll talk with the team about whether they should be updated, but note that this wasn't a deliberate re-evaluation of past work.

That said, it is important to keep in mind that there are disagreements about what different kinds of effects are worth, such as Open Philanthropy's reassessment of cash transfers (to which both they and GiveWell pin their effectiveness evaluations). We can't directly compare OP's self-professed bar with GiveWell's as if the units were interchangeable. This complexity is not well represented in the CCM. The Worldview Investigations team has not tried to adjudicate such disagreements over GHD interventions.
