Arden Koehler



Speaking in a personal capacity here --

We do try to be open to changing our minds so that we can be cause neutral in the relevant sense, and we do change our cause rankings periodically and spend time and resources thinking about them (in fact we're in the middle of thinking through some changes now). But how well set up are we, institutionally, to make changes in practice as big as deprioritising risks from AI if we get good reasons to? I think this is a good question, and want to think about it more. So thanks!

Just want to say here (since I work at 80k & commented about our impact metrics & other concerns below) that I think it's totally reasonable to:

  1. Disagree with 80,000 Hours's views on AI safety being so high priority, in which case you'll disagree with a big chunk of the organisation's strategy.
  2. Disagree with 80k's views on working in AI companies (which, tl;dr, is that it's complicated and depends on the role and your own situation but is sometimes a good idea). I personally worry about this one a lot and think it really is possible we could be wrong here. It's not obvious what the best thing to do here is, and we discuss this a bunch internally. But we think there's risk in any approach to this issue, and are going with our best guess based on talking to people in the field. (We reported on some of their views, some of which were basically 'no don't do it!', here.)
  3. Think that people should prioritise personal fit more than 80k causes them to. To be clear, we think (& 80k's content emphasises) that personal fit matters a lot. But it's possible we don't push this hard enough. Also, because we think it's not the only thing that matters for impact (& so also talk a lot about cause and intervention choice), we tend to present this as a set of considerations to navigate that involves some trade-offs. So it's reasonable to think that 80k encourages too much trading off of personal fit, at least for some people.

Hey, Arden from 80,000 Hours here – 

I haven't read the full report, but given the time sensitivity with commenting on forum posts, I wanted to quickly provide some information relevant to some of the 80k mentions in the qualitative comments, which were flagged to me.

Regarding whether we have public measures of our impact & what they show

It is indeed hard to measure how much our programmes counterfactually help move talent to high impact causes in a way that increases global welfare, but we do try to do this.

From the 2022 report the relevant section is here. Copying it in as there are a bunch of links. 

We primarily use six sources of data to assess our impact:

  1. Open Philanthropy EA/LT survey
  2. EA Survey responses
  3. The 80,000 Hours user survey. A summary of the 2022 user survey is linked in the appendix. 
  4. Our in-depth case study analyses, which produce our top plan changes and DIPY estimates (last analysed in 2020). 
  5. Our own data about how users interact with our services (e.g. our historical metrics linked in the appendix). 
  6. Our and others' impressions of the quality of our visible output. 

Overall, we’d guess that 80,000 Hours continued to see diminishing returns to its impact per staff member per year. [But we continue to think it's still cost-effective, even as it grows.]

Some elaboration: 

  • DIPY estimates are our measure of counterfactual career plan shifts we think will be positive for the world. Unfortunately it's hard to get an accurate read on counterfactuals and response rates, so these are only very rough estimates & we don't put that much weight on them.
  • We report on things like engagement time & job board clicks as *lead metrics* because we think they tend to flow through to counterfactual high impact plan changes, & we're able to measure them much more readily.
  • Headlines from some of the links above: 
    • From our own survey (2138 respondents):
      • On the overall social impact that 80,000 Hours had on their career or career plans, 
        • 1021 (50%) said 80,000 Hours increased their impact
          • Within this we identified 266 who reported >30% chance of 80,000 Hours causing them to take a new job or graduate course (a "criteria based plan change")
        • 26 (1%) said 80,000 Hours reduced their impact.
          • Themes in answers were demoralisation and causing career choices that were a poor fit
    • Open Philanthropy's EA/LT survey asked their respondents "What was important in your journey towards longtermist priority work?" – it has a lot of different results and feels hard to summarise, but it showed a big chunk of people considered 80k a factor in ending up working where they are.
    • The 2020 EA survey link says "More than half (50.7%) of respondents cited 80,000 Hours as important for them getting involved in EA". (2022 says something similar.)

Regarding the extent to which we are cause neutral & whether we've been misleading about this

We do strive to be cause neutral, in the sense that we try to prioritise working on the issues where we think we can have the highest marginal impact (rather than committing to a particular cause for other reasons).

For the past several years we've thought that the most pressing problem is AI safety, so we have put much of our effort there. (Some 80k programmes focus on it more than others – I reckon for some it's a majority – but it hasn't been true that as an org we "almost exclusively focus on AI risk". A bit more on that here.)

In other words, we're cause neutral, but not cause *agnostic* - we have a view about what's most pressing. (Of course we could be wrong or thinking about this badly, but I take that to be a different concern.)

The most prominent place we describe our problem prioritisation is our problem profiles page – which is one of our most popular pages. We describe our list of issues this way: "These areas are ranked roughly by our guess at the expected impact of an additional person working on them, assuming your ability to contribute to solving each is similar (though there's a lot of variation in the impact of work within each issue as well)." (Here's also a past comment from me on a related issue.)

Regarding the concern about us harming talented EAs by causing them to choose bad early career jobs

To the extent that this has happened this is quite serious – helping talented people have higher impact careers is our entire point! I think we will always sometimes fail to give good advice (given the diversity & complexity of people's situations & the world), but we do try to aggressively minimise negative impacts, and if people think any particular part of our advice is unhelpful, we'd like them to contact us about it! (I'm arden@80000hours.org & I can pass them on to the relevant people.)

We do also try to find evidence of negative impact, e.g. using our user survey, and it seems dramatically less common than the positive impact (see the stats above), though there are of course selection effects with that kind of method so one can't take that at face value!

Regarding our advice on working at AI companies and whether this increases AI risk

This is a good worry and we talk a lot about this internally! We wrote about this here.

I like this post and also worry about this phenomenon.

When I talk about personal fit (and when we do so at 80k) it's basically about how good you are at a thing/the chance that you can excel.

Being intuitively motivated by the issue an option focuses on does increase your personal fit for it. But I agree it seems way too quick to conclude that your fit for that option is therefore higher than for other things (there are tons of other factors, and lots of different jobs within each problem area), let alone that this means you should work on that issue all things considered (since personal fit is not the only factor).

I think it would be especially valuable to see to which degree they reflect the individual judgment of decision-makers.

The comment above hopefully helps address this.

I would also be interested in whether they take into account recent discussions/criticisms of model choices in longtermist math that strike me as especially important for the kind of advising 80,000 Hours does (tl;dr: I take one crux of that article to be that longtermist benefits from individual action are often overstated, because the great benefits longtermism advertises require both reducing risk and keeping overall risk down long-term, which plausibly exceeds the scope of a career/life).

We did discuss this internally in Slack (prompted by David's podcast https://critiquesofea.podbean.com/e/astronomical-value-existential-risk-and-billionaires-with-david-thorstad/). My take was that the arguments don't mean that reducing existential risk isn't very valuable, even though they do imply it's likely not of 'astronomical' value. So e.g. it's not as if you can ignore all other considerations and treat "whether this will reduce existential risk" as a full substitute for whether something is a top priority. I agree with that.

We do generally agree that many questions in global priorities research remain open — that’s why we recommend some of our readers pursue careers in this area. We’re open to the possibility that new developments in this field could substantially change our views.

I think there would be considerable value in having the biggest career-advising organization (80k) be a non-partisan EA advising organization, whereas I currently take them to be strongly favoring longtermism in their advice. While I feel this explicit stance is a mistake, I feel like getting a better grasp on its motivation would help me understand why it was taken.

We're not trying to be 'partisan', for what it's worth. There might be a temptation to sometimes see longtermism and neartermism as different camps, but what we're trying to do is just figure out all things considered what we think is most pressing / promising and communicate that to readers. We tend to think that propensity to affect the long-run future is a key way in which an issue can be extremely pressing (which we explain in our longtermism article.)

I think it would be valuable to include all the additional notes which are not on your website. As a minimum viable product, you may want to link to your comment.

Thanks for your feedback here!

Your previous quantitative framework was equivalent to a weighted-factor model (WFM) with the logarithms of importance, tractability and neglectedness as factors with the same weight, such that the sum equals the logarithm of the cost-effectiveness. Have you considered trying a WFM with the factors that actually drive your views?
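A minimal sketch of the equivalence being described, with made-up factor values (not 80k's actual numbers): an equal-weight WFM over log(I), log(T), log(N) scores each problem as exactly log(I × T × N), so it ranks problems identically to raw cost-effectiveness.

```python
import math

# Hypothetical ITN scores for two illustrative problems (invented for this sketch).
problems = {
    "problem_a": {"importance": 1000, "tractability": 0.1, "neglectedness": 10},
    "problem_b": {"importance": 100, "tractability": 0.5, "neglectedness": 2},
}

def log_wfm_score(p, weights=(1.0, 1.0, 1.0)):
    """Weighted-factor model over log(I), log(T), log(N)."""
    wi, wt, wn = weights
    return (wi * math.log(p["importance"])
            + wt * math.log(p["tractability"])
            + wn * math.log(p["neglectedness"]))

for name, p in problems.items():
    cost_effectiveness = p["importance"] * p["tractability"] * p["neglectedness"]
    # With equal unit weights, the WFM score is exactly log(I * T * N),
    # so sorting by WFM score and by cost-effectiveness give the same order.
    assert math.isclose(log_wfm_score(p), math.log(cost_effectiveness))
```

Changing the weights away from (1, 1, 1) is what would let the factors that "actually drive" the ranking pull more or less than the plain ITN product.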

I feel unsure about whether we should be trying to do another WFM at some point. There are a lot of ways we can improve our advice, and I’m not sure this should be at the top of our list but perhaps if/when we have more research capacity. I'd also guess it would still have the problem of giving a misleading sense of precision, so it’s not clear how much of an improvement it would be. But it is certainly true that the ITN framework substantially drives our views.

I agree that it might be worthwhile to try to become the president of the US – but that wouldn't mean it's best for us to have an article on it, especially a highly ranked one. That takes real estate on our site, attention from readers, and time. This specific path is a sub-category of political careers, which we have several articles on. In the end, it is not possible for us to have profiles on every path that is potentially worthwhile for someone. My take is that it's better for us to prioritise options where the described endpoint is achievable for at least a healthy handful of readers.

No, we have lots of external advisors that aren't listed on our site. There are a few reasons we might not list people, including:

  • We might not want to be committed to asking for someone's advice for a long time or need to remove them at some point.

  • The person might be happy to help us and give input but not want to be featured on our site.

  • It's work to add people. We often reach out to someone in our network fairly quickly and informally, and it would feel like overkill / too much friction to get a bio on our site (and their permission for it) just because we asked them a few questions.

  • Also, there are too many people we get takes from over the course of e.g. a few years to list in a way that would give context and not require substantial person-hours of upkeep. So instead we just list some representative advisors who give us input on key subject matters we work on and where they have notable expertise.

This is a good question -- we don't have a formal approach here, and I personally think that in general, it's quite a hard problem who to ask for advice.

A few things to say:

  • the ideal is often to have both.

  • the bottleneck on getting more people with domain expertise is usually not their values but our network: having people with sufficient expertise whom we know about, believe are highly credible, and who are willing to give us their time. People who share our values tend to be more excited to work with us.

  • it depends a lot on the subject matter we are asking about. E.g. if it's an article about how to become a great software engineer, we don't care so much about the person's values; we care about their software engineering credentials. If it's e.g. an article about how to balance doing good and doing what you love, we care a lot more about their values.

Hey Vasco —

Thanks for your interest and also for raising this with us before you posted so I could post this response quickly!

I think you are asking about the first of these, but I'm going to include a few notes on the 2nd and 3rd as well just in case, as there's a way of hearing your question as about them.

  1. What is the internal process by which these rankings are produced and where do you describe it? 
  2. What are problems and paths being ranked by? What does the ranking mean?
  3. Where is our reasoning for why we rank each problem or path the way we do? 

We've written some about these things on our site. We’re on the lookout for ways to improve our processes and how we communicate about them (e.g. I updated our research principles and process page this year and would be happy to add more info if it seemed important. If some of the additional notes below seem like they should be included that'd be helpful to hear.) 

Here's a summary of what we say now with some additional notes:

On (1):

Our "Research principles and process" page is the best place to look for an overview, but it doesn't describe everything. 

I'll quote a few relevant bits here:

> Though most of our articles have a primary author, they are always reviewed by other members of the team before publication.

> For major research, we send drafts to several external researchers and people with experience in the area for feedback.

> We seek to proactively gather feedback on our most central positions — in particular, our views on the most pressing global problems and the career paths that have the highest potential for impact, via regularly surveying domain experts and generalist advisors who share our values.

> For some important questions, we assign a point person to gather input from inside and outside 80,000 Hours and determine our institutional position. For example, we do this with our list of the world’s most pressing problems, our page on the top most promising career paths, and some controversial topics, like whether to work at an AI lab. Ultimately, there is no formula for how to combine this input, so we make judgement calls [...] Final editorial calls on what goes on the website lie with our website director. [me, Arden]

> Finally, many of our articles are authored by outside experts. We still always review the articles ourselves to try to spot errors and ensure we buy the arguments being made by the author, but we defer to the author on the research (though we may update the article substantively later to keep it current).

Here are some additional details that aren't on the page:

To reply to your specific question about aggregating people's personal rankings: no, we don't do any formal sort of 'voting' system like that. The problems and paths rankings are informed by the views of the staff at 80,000 Hours and external advisors via surveys where I elicit people's personal rankings, and lots of ongoing internal discussion, but I am the "point person" for ultimately deciding how to combine this information into a ranking. In practice, this means my views can be expected to have an outsized influence, but I put a lot of emphasis on takes from others and aim for the lists to be something 80,000 Hours as an organisation can stand behind. Another big factor is what the lists were before, which I tend to view as a prior to update from, and which were informed by the research we did in the past and the views of people like Ben Todd, Howie Lempel, and Rob Wiblin.

Our process has evolved over the years, and, for example, the formal "point person" system described above is recent as of this year (though it was informally something a bit like that before). I expect it'll continue to change, and hopefully improve, especially as we grow the team (right now we have only 2 research staff).

Sometimes it's been a while since we've looked at a problem or path, and we decide to re-do the article on it. That might trigger a change in ranking if we discover something that changes our minds.

More often we adjust the rankings over time without necessarily first re-doing the articles, often in response to surveys of advisors and team members, feedback we get, or events in the world. This might then trigger looking more into something and adding or re-doing a relevant article. 

The rankings are not nearly as formal or quantitative as, e.g., the cost-effectiveness analyses that GiveWell performs of its top charities. Though previous versions of the site have included numerical weightings for something like the problem profiles list, we've moved away from that practice. We didn't think the BOTECs and estimations that generated these kinds of numbers were actually driving our views, and the numbers they produced seemed to suggest a misleading sense of precision. Ranking problems and career paths is messy and we aren't able to be precise. We discuss our level of certainty in e.g. the problem profiles FAQ and at the end of the research principles page and try to reflect it in the language on the problems and career path pages.

As you noted, when we make a big change, like adding a new career path to the priority paths, we try to announce it in some prominent form, though we don't always end up thinking it's worth it. E.g. we sent a newsletter in April explaining why we now consider infosec to be a priority path. We made a similar announcement when we added AI hardware expertise to the priority paths. Our process for this isn't very systematic.

On (2): 

For problems: In EA shorthand, the ranking is via the ITN framework. We try to describe that in a more accessible / short way at the top of the page in the passage you quoted.

We also have an FAQ which talks a bit more about it.

For career paths it is slightly more complicated. A factor we weren't able to fit into the passage you quoted is: we also down-rank paths (or don't write about them at all) if they are super narrow / most people can't follow them – e.g. becoming a public intellectual (or, to take an extreme example, becoming president of the US).

On (3):

For the most part, we want the articles themselves to explain our reasoning – in each problem profile or career review, we say why we think it's as pressing / promising as we think it is. 

We also draw on surveys of 80k staff + external advisors to additionally help determine and adjust the ranking over time, as described above. We don't publish these surveys, but we describe the general type of person we tend to ask for input here.

