Arden Koehler

2601 karma


Carl Shulman questioned the tension between AI welfare & AI safety on the 80k podcast recently -- I thought this was interesting! He basically argues that AI takeover could be even worse for AI welfare. From the end of the section:

Rob Wiblin: Maybe a final question is it feels like we have to thread a needle between, on the one hand, AI takeover and domination of our trajectory against our consent — or indeed potentially against our existence — and this other reverse failure mode, where humans have all of the power and AI interests are simply ignored. Is there something interesting about the symmetry between these two plausible ways that we could fail to make the future go well? Or maybe are they just actually conceptually distinct?

Carl Shulman: I don’t know that that quite tracks. One reason being, say there’s an AI takeover, that AI will then be in the same position of being able to create AIs that are convenient to its purposes. So say that the way a rogue AI takeover happens is that you have AIs that develop a habit of keeping in mind reward or reinforcement or reproductive fitness, and then those habits allow them to perform very well in processes of training or selection. Those become the AIs that are developed, enhanced, deployed, then they take over, and now they’re interested in maintaining that favourable reward signal indefinitely.

Then the functional upshot is this is, say, selfishness attached to a particular computer register. And so all the rest of the history of civilisation is dedicated to the purpose of protecting the particular GPUs and server farms that are representing this reward or something of similar nature. And then in the course of that expanding civilisation, it will create whatever AI beings are convenient to that purpose.

So if it’s the case that, say, making AIs that suffer when they fail at their local tasks — so little mining bots in the asteroids that suffer when they miss a speck of dust — if that’s instrumentally convenient, then they may create that, just like humans created factory farming. And similarly, they may do terrible things to other civilisations that they eventually encounter deep in space and whatnot.

And you can talk about the narrowness of a ruling group and say, and how terrible would it be for a few humans, even 10 billion humans, to control the fates of a trillion trillion AIs? It’s a far greater ratio than any human dictator, Genghis Khan. But by the same token, if you have rogue AI, you’re going to have, again, that disproportion.

Thanks for this valuable reminder!

btw, the link on "more about legal risks" at the top goes to the wrong place.

Cool project - I tried to subscribe to the podcast to check it out. But I couldn't find it on Pocket Casts, so I didn't (it didn't seem worth me using a 2nd platform).

I wanted to subscribe because I've wanted an audio feed that will help me be in touch with events outside my more specific areas of interest that I hear about through niche channels while I commute, while not going quite as broad / un-curated as the BBC news (which I currently use for this) -- and this seemed like potentially a good middle ground.

tiny other feedback: the title feels aggressive to me vs. some nearby alternatives (e.g. just "relevance news" or something) - since it nearly states that anything that is not there is not actually relevant at all, which is a fairly strong claim I could see people getting unhappy about.

The project aligns closely with the fund's vision of a "principles-first EA" community, we’d be excited for the EA community’s outputs to look more like Richard’s.

Is this saying that the move to principles-first EA as a strategic perspective for EAF goes with a belief that more EA work should be "principles first" & not cause specific (so that more of the community's outputs look like Richard's)? I wouldn't have necessarily inferred that just from the fact that you're making this strategic shift (could be more of a comp advantage / focus thing) so wanted to clarify.

Speaking in a personal capacity here --

We do try to be open to changing our minds so that we can be cause neutral in the relevant sense, and we do change our cause rankings periodically and spend time and resources thinking about them (in fact we’re in the middle of thinking through some changes now). But how well set up are we, institutionally, to be able to in practice make changes as big as deprioritising risks from AI if we get good reasons to? I think this is a good question, and want to think about it more. So thanks!

Just want to say here (since I work at 80k & commented abt our impact metrics & other concerns below) that I think it's totally reasonable to:

  1. Disagree with 80,000 Hours's views on AI safety being so high priority, in which case you'll disagree with a big chunk of the organisation's strategy.
  2. Disagree with 80k's views on working in AI companies (which, tl;dr, is that it's complicated and depends on the role and your own situation but is sometimes a good idea). I personally worry about this one a lot and think it really is possible we could be wrong here. It's not obvious what the best thing to do here is, and we discuss this a bunch internally. But we think there's risk in any approach to the issue, and are going with our best guess based on talking to people in the field. (We reported on some of their views, some of which were basically 'no don't do it!', here.)
  3. Think that people should prioritise personal fit more than 80k causes them to. To be clear, we think (& 80k's content emphasises) that personal fit matters a lot. But it's possible we don't push this hard enough. Also, because we think it's not the only thing that matters for impact (& so also talk a lot about cause and intervention choice), we tend to present this as a set of considerations to navigate that involves some trade-offs. So it's reasonable to think that 80k encourages too much trading off of personal fit, at least for some people.

Hey, Arden from 80,000 Hours here – 

I haven't read the full report, but given the time sensitivity with commenting on forum posts, I wanted to quickly provide some information relevant to some of the 80k mentions in the qualitative comments, which were flagged to me.

Regarding whether we have public measures of our impact & what they show

It is indeed hard to measure how much our programmes counterfactually help move talent to high impact causes in a way that increases global welfare, but we do try to do this.

From the 2022 report the relevant section is here. Copying it in as there are a bunch of links. 

We primarily use six sources of data to assess our impact:

  1. Open Philanthropy EA/LT survey
  2. EA Survey responses
  3. The 80,000 Hours user survey. A summary of the 2022 user survey is linked in the appendix. 
  4. Our in-depth case study analyses, which produce our top plan changes and DIPY estimates (last analysed in 2020). 
  5. Our own data about how users interact with our services (e.g. our historical metrics linked in the appendix). 
  6. Our and others' impressions of the quality of our visible output. 

Overall, we’d guess that 80,000 Hours continued to see diminishing returns to its impact per staff member per year. [But we continue to think it's still cost-effective, even as it grows.]

Some elaboration: 

  • DIPY estimates are our measure of counterfactual career plan shifts we think will be positive for the world. Unfortunately it's hard to get an accurate read on counterfactuals and response rates, so these are only very rough estimates & we don't put that much weight on them.
  • We report on things like engagement time & job board clicks as *lead metrics* because we think they tend to flow through to counterfactual high impact plan changes, & we're able to measure them much more readily.
  • Headlines from some of the links above: 
    • From our own survey (2138 respondents):
      • On the overall social impact that 80,000 Hours had on their career or career plans, 
        • 1021 (50%) said 80,000 Hours increased their impact
          • Within this we identified 266 who reported >30% chance of 80,000 Hours causing them to take a new job or graduate course (a “criteria based plan change”)
        • 26 (1%) said 80,000 Hours reduced their impact.
          • Themes in answers were demoralisation and causing career choices that were a poor fit
    • Open Philanthropy's EA/LT survey was aimed at asking their respondents “What was important in your journey towards longtermist priority work?” – it has a lot of different results and feels hard to summarise, but it showed a big chunk of people considered 80k a factor in ending up working where they are.
    • The 2020 EA survey link says "More than half (50.7%) of respondents cited 80,000 Hours as important for them getting involved in EA". (2022 says something similar.)

Regarding the extent to which we are cause neutral & whether we've been misleading about this

We do strive to be cause neutral, in the sense that we try to prioritize working on the issues where we think we can have the highest marginal impact (rather than committing to a particular cause for other reasons). 

For the past several years we've thought that the most pressing problem is AI safety, so have put much of our effort there. (Some 80k programmes focus on it more than others. I reckon for some it's a majority, but it hasn't been true that as an org we “almost exclusively focus on AI risk”. A bit more on that here.)

In other words, we're cause neutral, but not cause *agnostic* - we have a view about what's most pressing. (Of course we could be wrong or thinking about this badly, but I take that to be a different concern.)

The most prominent place we describe our problem prioritization is our problem profiles page – which is one of our most popular pages. We describe our list of issues this way: "These areas are ranked roughly by our guess at the expected impact of an additional person working on them, assuming your ability to contribute to solving each is similar (though there’s a lot of variation in the impact of work within each issue as well)." (Here's also a past comment from me on a related issue.)

Regarding the concern about us harming talented EAs by causing them to choose bad early career jobs

To the extent that this has happened this is quite serious – helping talented people have higher impact careers is our entire point! I think we will always sometimes fail to give good advice (given the diversity & complexity of people's situations & the world), but we do try to aggressively minimise negative impacts, and if people think any particular part of our advice is unhelpful, we'd like them to contact us about it! (I'm arden@80000hours.org & I can pass them on to the relevant people.)

We do also try to find evidence of negative impact, e.g. using our user survey, and it seems dramatically less common than the positive impact (see the stats above), though there are of course selection effects with that kind of method so one can't take that at face value!

Regarding our advice on working at AI companies and whether this increases AI risk

This is a good worry and we talk a lot about this internally! We wrote about this here.

I like this post and also worry about this phenomenon.

When I talk about personal fit (and when we do so at 80k) it's basically about how good you are at a thing/the chance that you can excel.

It does increase your personal fit for something to be intuitively motivated by the issue it focuses on, but I agree that it seems way too quick to conclude then that your personal fit with that is higher than other things (since there are tons of factors and there are also lots of different jobs for each problem area), let alone that that means you should work on that issue all things considered (since personal fit is not the only factor).

I think it would be especially valuable to see to which degree they reflect the individual judgment of decision-makers.

The comment above hopefully helps address this.

I would also be interested in whether they take into account recent discussions/criticisms of model choices in longtermist math that strike me as especially important for the kind of advising 80.000 hours does (tldr: I take one crux of that article to be that longtermist benefits by individual action are often overstated, because the great benefits longtermism advertises require both reducing risk and keeping overall risk down long-term, which plausibly exceeds the scope of a career/life).

We did discuss this internally on Slack (prompted by David's podcast https://critiquesofea.podbean.com/e/astronomical-value-existential-risk-and-billionaires-with-david-thorstad/). My take was that the arguments don't mean that reducing existential risk isn't very valuable, even though they do imply it's likely not of 'astronomical' value. So e.g. it's not as if you can ignore all other considerations and treat "whether this will reduce existential risk" as a full substitute for whether something is a top priority. I agree with that.

We do generally agree that many questions in global priorities research remain open — that’s why we recommend some of our readers pursue careers in this area. We’re open to the possibility that new developments in this field could substantially change our views.

I think there would be considerable value in having the biggest career-advising organization (80k) be a non-partisan EA advising organization, whereas I currently take them to be strongly favoring longtermism in their advice. While I feel this explicit stance is a mistake, I feel like getting a better grasp on its motivation would help me understand why it was taken.

We're not trying to be 'partisan', for what it's worth. There might be a temptation to sometimes see longtermism and neartermism as different camps, but what we're trying to do is just figure out all things considered what we think is most pressing / promising and communicate that to readers. We tend to think that propensity to affect the long-run future is a key way in which an issue can be extremely pressing (which we explain in our longtermism article.)

I think it would be valuable to include all the additional notes which are not on your website. As a minimum viable product, you may want to link to your comment.

Thanks for your feedback here!

Your previous quantitative framework was equivalent to a weighted-factor model (WFM) with the logarithms of importance, tractability and neglectedness as factors with the same weight, such that the sum represents the logarithm of the cost-effectiveness. Have you considered trying a WFM with the factors that actually drive your views?

I feel unsure about whether we should be trying to do another WFM at some point. There are a lot of ways we can improve our advice, and I’m not sure this should be at the top of our list, but perhaps we could revisit it if/when we have more research capacity. I'd also guess it would still have the problem of giving a misleading sense of precision, so it’s not clear how much of an improvement it would be. But it is certainly true that the ITN framework substantially drives our views.
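
For reference, the equivalence described in the question can be written out explicitly. This is only a sketch, assuming cost-effectiveness is treated as the product of the three ITN factors; the symbols $I$, $T$, $N$ (importance, tractability, neglectedness) and the weights $w$ are illustrative labels, not notation from the original framework:

```latex
% Assumption: cost-effectiveness CE is the product of the three factors.
\mathrm{CE} = I \cdot T \cdot N
\quad\Longrightarrow\quad
\log \mathrm{CE} = \log I + \log T + \log N

% A general weighted-factor model over the same log-factors
% (hypothetical weights w_I, w_T, w_N); equal weights
% w_I = w_T = w_N = 1 recover the line above.
\text{score} = w_I \log I + w_T \log T + w_N \log N
```

A WFM "with the factors that actually drive your views" would then mean changing either the weights or the set of factors in the second expression.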
