MichaelDickens

6865 karma · Joined
mdickens.me

Bio

I do independent research on EA topics. I write about whatever seems important, tractable, and interesting (to me).

I have a website: https://mdickens.me/. Much of the content on my website gets cross-posted to the EA Forum, but I also write about some non-EA stuff over there.

My favorite things that I've written: https://mdickens.me/favorite-posts/

I used to work as a software developer at Affirm.

Sequences (1)

Quantitative Models for Cause Selection

Comments (927)

I won't go through this whole post but I'll pick out a few representative bits to reply to.

"Deutsch’s idea of explanatory universality helps clarify the mistake. Persons are universal explainers. They create new explanations that were not contained in past data. This creativity is not extrapolation from a dataset. It is invention.

LLMs do not do this. They remix what exists in their training corpus. They do not originate explanatory theories."

This statement expresses a high degree of confidence in a claim that has, as far as I can tell, zero supporting evidence. I would strongly bet against the prediction that LLMs will never be able to originate an explanatory theory.

"Until we understand how humans create explanatory knowledge, we cannot program that capacity."

We still don't know how humans create language, or prove mathematical conjectures, or manipulate objects in physical space, and yet we created AIs that can do those things.

"The AI 2027 paper leans heavily on forecasting. But when the subject is knowledge creation, forecasting is not just difficult. It is impossible in principle. This was one of Karl Popper’s central insights."

I am not aware of any such insight? This claim seems easily falsified by the existence of superforecasters.

And: if prediction is impossible in principle, then you can't confidently say that ASI won't kill everyone, so you should regard it as potentially dangerous. But you seem to be quite confident that you know what ASI will be like.

"The rationalist story claims a superintelligent AI will likely be a moral monster. This conflicts with the claim that such a system will understand the world better than humans do."

https://www.lesswrong.com/w/orthogonality-thesis

Oh, maybe that doc wasn't supposed to be public; I will remove the reference.

"Key Uncertainties"

The listed uncertainties do not include my biggest question, which is: "do Talos fellows do more to reduce AI x-risk than the counterfactual hire?"

I see; I took the chart under "The compensation schedule's structure" to imply that the Axiom of Continuity held for suffering, since the x-axis shows suffering measured on a cardinal scale.

If you reject Continuity for suffering then I don't think your assumptions are self-contradictory.

I think a downvote-as-disapproval wouldn't work in this case; I would want to use upvote+disagree to express "this is an important change that the community should know about, but I disagree with the decision". A downvote de facto communicates "I don't want people to read this post".

Yeah, it's also something I want to get more clarity on. This post is about the step of the chain that goes from "donate money to campaign" -> "candidate gets elected", but it's harder to say what happens after that. I'm working on some future posts that I hope will help me get a better understanding.

Some thoughts:

  • Overall I think I am more confident than you that extinction risk is more important than catastrophic risk, but I agree that this topic is worth exploring and I'm glad you're doing it. We have too much of a tendency to fall into "X is more important than Y" [with 90% probability] and then spend so few resources on Y that Y becomes more valuable on the margin even though it only has a 10% chance of mattering. (I'm not saying 10% is the correct number in this case, that's just an example; the first sketch after this list shows a toy version of this point.)
  • In your Parfit-esque categorization of three scenarios, I agree that "the difference in value between 3 and 2 is very much greater than the difference between 2 and 1" and I think this is important, although I would also note that #3 could be much worse than #2 if #3 entails spreading wild animal suffering.
  • I'm having a hard time wrapping my head around what the "1 unit of extinction" equation is supposed to represent.
  • By my interpretation, the parable of the apple tree is more about P(recovery) than it is about P(flourishing|recovery). If the low-hanging apples are eaten, that makes it harder to rebuild the tall ladders. But once you've rebuilt them, you're okay. I think your central argument of "flourishing might be harder after recovery than if the catastrophe had been avoided" is still good, but the parable doesn't seem to support that argument.
  • The model of P(success) = (1 - r)^N seems much clearer, but it also seems a bit different than what I understood your argument to be. I understood the fruit-picking argument to be something like "Resources get used up, so getting back to a level of technology the 2nd time is harder than the 1st time." But this equation is embedding an argument more like "A higher probability of catastrophe means there's a higher chance that civilization keeps getting set back by catastrophes without ever expanding to the stars." Which is also an interesting idea but it seems different than what you wrote in prose. It chains into the first argument because the more times civilization collapses, the more resources get used up; but you could also make the 1st argument without making the 2nd one (i.e., even a single catastrophe may be enough to permanently prevent interstellar expansion). (The second sketch after this list runs this equation for a few illustrative values.)
  • "If AGI is significantly more than a decade away and higher annual risk estimates of other catastrophes are plausible, then we might still have a greater expected loss of value from those catastrophes" -- This seems unlikely to me, but I'd like to see some explicit modeling. What range of assumptions would make us want to prioritize [non-extinction, non-AI] catastrophe reduction over AI x-risk?
  • "Reclaimed technology makes tech progress easy for future civilisations" -- I agree that this seems outweighed by the opposite consideration, although I'm not confident and I don't think anyone has a great understanding of this.
  • "But by far the strongest and surely least controversial consequence of this model is that longtermists should seek to reduce non-extinction catastrophic risk almost as urgently as they do extinction risks" – This conclusion depends on the numbers. The model converted catastrophic risk into "extinction units" but it's not clear from the model that catastrophic risk is of similar importance to extinction risk (and my sense is that it isn't).