Elias Schmied

The original snippet was more about the acausal stuff and wasn't Extrapolation; the separate argument I subsequently made about simulators was Extrapolation.

There's no quick objection I can give to your response specifically (except to simplistically say "no, I don't buy it, we know more than that, we can make some reasonable guesses about simulators' intentions"). Properly laying out my disagreements would take some time and effort, as is usually the case with deep worldview differences.

A brief list of ways AI safety efforts could be net negative

Elias Schmied

"But then we're back to the Extrapolation argument, which you claimed you weren't committed to."

No, I didn't claim that. I said the snippet you quoted wasn't the Extrapolation argument, and I stand by that. I'm definitely sympathetic to something like Extrapolation in general.

A brief list of ways AI safety efforts could be net negative

Elias Schmied

OK, I read the LW comment you linked and skimmed the post, but I don't see how they show specifically that we should expect lots of crucial considerations to come up. They seem to argue more for "we're clueless about how much we should do ECL"? (But correct me if I'm wrong; I may have missed something.) On your example: why should I expect their attempts to do so to backfire if I don't already expect our own attempts to backfire? That seems to just ground out in the original debate.
(Btw, I would also like to get less confused about the similarity thing.)

On the simulators, it just seems like it's hard to think of possible simulator motivations where our reaching good outcomes in the simulation would be bad for base reality, and easy to think of ones where it would be neutral or good.

In general, I think the unawareness angle is genuinely interesting. I'm just not moved by it as much as you are, probably for a few different reasons that would take some time to articulate.

A brief list of ways AI safety efforts could be net negative

Elias Schmied

I don't think it's quite your "Extrapolation", because it's specifically about the acausal mechanism: by definition, the only (EDIT: direct) acausal effect possible is to make other agents take actions similar to ours.
(I kind of swept the simulation point under the rug because the footnote was written quickly, but the argument is somewhat similar, although very vague, and I'd like something better: whatever purpose our simulators have for simulating us, it's probably good for their reality too if we reach a good outcome in the simulation.)

Cluelessness: Summary of the argument, why it matters, and counterarguments

Elias Schmied

As I said before, I think my biggest personal crux is the proposed downside risks (e.g. commitment races) of "promote wisdom, cooperation, altruism, etc." I thought about them for a little bit and took some notes when I last read about them, but I remain skeptical that they can outweigh the intuitive upside. (I guess this is P3? But I'm not sure.) Maybe I'll write up my thinking at some point.

A simple argument for trying less hard

Elias Schmied

Thanks Vasco!

I'm not sure exactly what you're asking. I'm just saying that if you're not a longtermist, you don't face as much uncertainty about how to achieve good outcomes, so the argument applies to you less.

A simple argument for trying less hard

Elias Schmied

"Yep I get that this argument is only one consideration, but my point extends to "trying less hard" as well I think." I'm not sure I understand what you mean; could you explain it more?

"Something something marketplace of ideas?"
Yeah, this is an empirical question. In theory, we might see better arguments rise in status and influence people more, and of course to some extent we do (e.g. AI risk rising in status over the last 10 years). But other factors influence the status of ideas too, like some kind of general action bias or power-seeking bias. Concretely, I feel that most of the bullet points in the "uncertainty" section of the post are underrepresented in the discourse. I'm curious whether you disagree.

Also, one more point on deferral: one way in which I think I'm weird is that I would genuinely raise my esteem of someone (on a gut level) if they said "I'm hopelessly biased on this topic, I can't think about it clearly, don't listen to me". Unfortunately, you never see this. I would really like to see it more.
E.g. if someone who tries extremely hard said "I can't really think clearly about what I'm doing because I'm working so hard, so be careful about listening to me", it would be very beautiful to me. Poetically speaking, it would be... accepting that they are a human weapon, forged for a purpose, impaired by that sacrifice, baring it for the world to see. I would feel a slight heartbreak and love for them, and I might very well value them more than before. As I argue in the "Epistemic distortion" section, there's, at least to some extent, a deep tradeoff between doing and thinking. So admitting that they are trading off against thinking, crippling their mind on a deep level, could make people respect them more by showing how much they are sacrificing for the "doing" status hierarchy. It could be heroic.
That's just a fantasy I have about how things could work.

A simple argument for trying less hard

Elias Schmied

Thanks Toby!

Yeah, I think there are a lot of interesting things to be said here. Some points:

- I'm not sure why we should expect the group to be well-calibrated when no individual is. That sounds a little magical to me. This isn't a market dynamic, where ground-level feedback directly lowers the viability or status of incorrect hypotheses and so makes the collective smarter than any individual. We get no direct feedback whatsoever about our effect on the long-term future, and certainly none that would literally force us to stop what we're doing (the way being out of touch with what the market wants can make your business fail).

It feels more viable to me for every individual to act with epistemic virtue (or explicitly defer to someone who does so).

- On a slightly tangential point: it's interesting to think about what the optimal social structures of deferral would be, but I'll note that my guess is that the most influential people also tend to be among those who work or try the hardest? That's a big part of why people are successful, after all. So if anything, this is a cause for more worry.

- On your "other point": Yeah, I certainly don't endorse doing nothing. The epistemic distortion is just one consideration that happens to push towards doing less (since it lowers the EV of doing anything, if we haven't considered it before). It needs to be weighed against the many other considerations that exist.

Curious what you think!

A simple argument for trying less hard

Elias Schmied

Thank you Michael! Really appreciate you saying that.

I guess I did say at the end of the "Epistemic distortion" section that consciously trying to be more objective may not be enough to replicate the effects of trying less hard. But yeah, I'm pretty unhappy with the framing and structure overall; for example, it could have been more concrete and clear about the mechanisms.

Elias Schmied

Posts 5

Comments 22
