
MichaelDickens

6611 karma
mdickens.me

Bio

I do independent research on EA topics. I write about whatever seems important, tractable, and interesting (to me).

I have a website (https://mdickens.me/). Much of the content on my website gets cross-posted to the EA Forum, but I also write about some non-EA stuff over there.

My favorite things that I've written: https://mdickens.me/favorite-posts/

I used to work as a software developer at Affirm.

Sequences (1)

Quantitative Models for Cause Selection

Comments (915)

This reads to me like you're saying "these problems are hard [so Wei Dai is over-rating the importance of working on them]", whereas the inference I would make is "these problems are hard, so we need to slow down AI development, otherwise we won't be able to solve them in time."

Hmm, I think if we are in a world where the people in charge of the company that has already built ASI need to be smart/sane/selfless for things to go well, then we're already in a much worse situation than we should be, and things should have been done differently prior to this point.

I realize this is not a super coherent statement but I thought about it for a bit and I'm not sure how to express my thoughts more coherently so I'm just posting this comment as-is.

And yet, I think that very little AI safety work focuses on affecting P(things go really well | no AI takeover). Probably Forethought is doing the best work in this space.

Do you think this sort of work is related to AI safety? It seems to me that it's more about philosophy (etc.) so I'm wondering what you had in mind.

Indeed, I think it’s possible that there will, in fact, come a time when Anthropic should basically just unilaterally drop out of the race – pivoting, for example, entirely to a focus on advocacy and/or doing alignment research that it then makes publicly available.

Do you have a picture of what conditions would make it a good idea for Anthropic to drop out of the race?

simulating the same wonderful experience a billion times certainly isn't a billion times greater than simulating it once..

I disagree, but I don't think this is really a crux. The ideal future could involve filling the universe with beings who have extremely good experiences compared to humans (and do not resemble humans at all) but whose experiences are still very diverse.

And while this is sort of an unanswered question about how qualia work, my guess is that, for combinatoric reasons, you could fill the accessible universe with (say) 10^40 beings who all have different experiences, where the worst experience out of all of them is only a bit worse than the best.
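To make the combinatoric point concrete (a toy model of my own, not a claim about how qualia actually work): suppose experiences vary along k independent binary features, each of which shifts the quality of the experience by at most some small amount ε. Then there are 2^k distinct possible experiences, while the gap between the best and the worst is at most kε:

\[
2^{k} \ge 10^{40} \quad \text{for } k \ge 133, \qquad \text{best} - \text{worst} \le k\,\epsilon
\]

So on the order of 133 such features already gives 10^40 distinct experiences, and the quality spread can be made as small as you like by shrinking ε.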

I've seen a number of people I respect recommend Horizon, but I've never seen any of them articulate a compelling reason why they like it. For example, in the comment you linked in the footnote, I found the response pretty unpersuasive (I said as much in my follow-up comment, which got no reply). Absence of evidence is evidence of absence, but I have to weigh that against the fact that so many people seem to like Horizon.

A couple of weeks ago I tried reaching out to Horizon to see if they could clear things up, but they haven't responded. And even if they did respond, I made it apparent that the answer I'm looking for is "yes, Horizon is x-risk-pilled", and I'm sure they could give that answer even if it's not true.

The next-gen LLM might pose an existential threat

I'm pretty sure that the next generation of LLMs will be safe. But the risk is still high enough to make me uncomfortable.

How sure are we that scaling laws are correct? Researchers have drawn curves predicting how AI capabilities scale with the amount of compute and data that goes into training. If you extrapolate those curves, it looks like the next generation of LLMs won't be wildly more powerful than the current one. But maybe there's a weird bump in the curve between GPT-5 and GPT-6 (or between Claude 4.5 and Claude 5), and LLMs suddenly become much more capable in a way that scaling laws didn't predict. I don't think we can be more than 99.9% confident that there isn't.
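For concreteness, the curves I have in mind are smooth power laws, roughly of the Chinchilla form, in which loss falls predictably as you scale up parameters N and training tokens D:

\[
L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
\]

The "weird bump" worry is that downstream capabilities might not track this smooth curve from one generation to the next, even if the loss itself does.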

How sure are we that current-gen LLMs aren't sandbagging (that is, deliberately hiding their true skill level)? I think they're still dumb enough that their sandbagging can be caught, and indeed they have been caught sandbagging on some tests. I don't think LLMs are hiding their true capabilities in general, and our understanding of AI capabilities is probably pretty accurate. But I don't think we can be more than 99.9% confident about that.

How sure are we that the extrapolated capability level of the next-gen LLM isn't enough to take over the world? It probably isn't, but we don't really know what level of capability is required for something like that. I don't think we can be more than 99.9% confident.

Perhaps we can be >99.99% confident that the next-gen LLM, at its extrapolated capability level, will still not be as smart as the smartest human. But an LLM has certain advantages over humans: it can work faster (at least on many sorts of tasks), it can copy itself, and it can operate computers in a way that humans can't.

Alternatively, GPT-6/Claude 5 might not be able to take over the world, but it might be smart enough to recursively self-improve, and that might happen too quickly for us to do anything about it.

How sure are we that we aren't wrong about something else? I thought of three ways we could be disastrously wrong:

  1. We could be wrong about scaling laws;
  2. We could be wrong that LLMs aren't sandbagging;
  3. We could be wrong about what capabilities are required for AI to take over.

But we could be wrong about some entirely different thing that I didn't even think of. I'm not more than 99.9% confident that my list is comprehensive.
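Rough arithmetic for the figure below: treating each of the four residual doubts above as about a 0.1% chance, and treating them as roughly independent, the combined chance that at least one of them bites is

\[
1 - 0.999^{4} \approx 0.004 = 0.4\%.
\]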

On the whole, I don't think we can say there's less than a 0.4% chance that the next-gen LLM forces us down a path that inevitably ends in everyone dying.

Maybe this is a me problem but I found this essay pretty impenetrable. I can't figure out what the thesis is, and I struggle to even understand what most of the individual sentences are saying.

You have a list of "learn to learn" methods, and then you said "Can we haz nice thingss? Futureburger n real organk lief maybs?" I'm not sure I'm interpreting you correctly, but it sounds like you're saying something like

If we biological humans get sufficiently good at learning to learn, using methods such as the Doman method, mnemonics, etc., then perhaps we can keep up with the rate at which ASI learns things, and thus avoid bad outcomes where humans get completely dominated by ASI.

If that's what you mean, then I disagree: I don't think our current understanding of the science of learning is remotely near where it would need to be to keep up with ASI, and in fact I would guess that even a perfect-learner human brain would still never be able to keep up with ASI, regardless of how good a job it does. Human brains still have physical limits. An ASI need not have physical limits, because it can (e.g.) add more transistors to its brain.

Harangue old-hand EA types to (i) talk about and engage with EA (at least a bit) if they are doing podcasts, etc; (ii) post on Forum (esp if posting to LW anyway), twitter, etc, engaging in EA ideas; (iii) more generally own their EA affiliation.

I think the carrot is better than the stick. Rather than (or in addition to) haranguing people who don't engage, what if we reward people who do engage? (Although I'm not sure what "reward" means exactly)

You could say I'm an old-hand EA type (I've been involved since 2012) and I still actively engage in the EA Forum. I wouldn't mind a carrot.

Will, I think you deserve a carrot, too. You've written 11 EAF posts in the past year! Most of them were long, too! I've probably cited your "moral error" post about a dozen times since you wrote it. I don't know how exactly I can reward you for your contributions but at a minimum I can give you a well-deserved compliment.

I see many other long-time EAs in this comment thread, most of whom I see regularly commenting/posting on EAF. They're doing a good job, too!

(I feel like this post sounds goofy, but I'm trying to make it come across as genuine; I've been up since 4am, so I'm not doing my best work right now.)
