what exactly do you mean by feedback loop/effects? if you mean a feedback loop of actions going out into the world and observations coming back to the AI, then, even though i don't see why this would necessarily be an issue, i insist that in one-shot alignment this is not a thing, at least for the initial AI; and it has enough leeway to make sure that its single action, likely itself an AI, will be extremely robust.
an intelligent AI does not need to contend with the complex world at the outset — it can come up with really robust designs for superintelligences that save the world with only limited information about the world, and definitely without interaction with the world, like in That Alien Message.
of course it can't model everything about the world in advance, but whatever steering we can do as people, it can do way better; and, if it is aligned, this includes way better steering towards nice worlds. a one-shot aligned AI (let's call it AI₀) can, before taking its action, design a really robust AI₁ which will definitely keep itself aligned, be equipped with enough error-correcting codes that its instances get corrupted approximately 0 times until heat death, take over the world very efficiently, and then steer it from its singleton position without having to worry about selection effects.
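(to gesture at why "corrupted approximately 0 times until heat death" is not an extravagant ask, here's a toy back-of-envelope sketch in python. all of the numbers below (per-copy corruption rate, redundancy factor, number of check intervals) are made up purely to illustrate the shape of the argument; a real design would use proper error-correcting codes and do far better than naive majority voting.)

```python
from math import comb

# toy model: each of n redundant copies of some critical state independently
# gets corrupted with probability p per check interval, and the system
# recovers by majority vote. the vote only fails if more than half of the
# copies are corrupted within the same interval.
def majority_failure_prob(p: float, n: int) -> float:
    k_min = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

p = 1e-6           # hypothetical per-copy corruption rate per interval
n = 101            # hypothetical redundancy factor
intervals = 1e100  # hypothetical number of check intervals before heat death

per_interval = majority_failure_prob(p, n)
expected_failures = per_interval * intervals
print(per_interval)       # on the order of 1e-277
print(expected_failures)  # on the order of 1e-177, i.e. approximately 0
```

the point being: the failure probability falls off so steeply with redundancy that "until heat death" stops being a scary timescale.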
i think the core of my disagreement with this claim is composed of two parts:
note that i am approaching the problem from the angle of AI alignment rather than AI containment — i agree that continuing to contain AI as it gains in intelligence is likely a fraught exercise, and i instead work to ensure that AI systems continue to steer the world towards nice things even when they are outside of containment, and especially once they reach decisive strategic advantage / singletonhood. AI achieving singletonhood is the outcome i consider most likely.
> All AGI outputs will tend to iteratively select[11] towards those specific AGI substrate-needed conditions. In particular: AGI hardware is robust over and needs a much wider range of temperatures and pressures than our fragile human wetware can handle.
i think this quote probably captures the core claim of yours that i'd disagree with — it seems to assume that such an AI would either be unaligned, or would have to contend with other unaligned AIs. if we have an aligned singleton, then its reasoning would go something like:
> maximally going in "the directions needed for [my] own continued and greater existence", or getting selected for them, sure seems like it would indeed cause damage severe enough for humankind to die. i am aligned enough to not want that, and intelligent enough to notice this possible failure mode, so i will choose to do something else which is not that.
an aligned singleton AI would notice this failure mode and choose to implement another policy which is better at achieving desired outcomes. notably, it would make sure that the conditions on earth and throughout the universe are not up to selection effects, but up to its deliberate decisions. the whole point of aligned powerful agents is that they steer things towards desirable outcomes rather than relying on selection effects.
these points also seem to me either not quite right, or too ambiguous.
"dignity points" means "having a positive impact".
if alignment is hard, we need my plan. and it's still very likely that alignment is hard.
and "alignment is hard" is a logical fact not indexical location, we don't get to save "those timelines".
i don't think that's how dignity points work.
for me, p(alignment hard) is still big enough that, when weighing the options, it's still better to keep working on hard alignment (see my plan). that's where the dignity points are.
"shut up and multiply", one might say.
I don't think "has the ship sailed or not" is a binary (see also this LW comment). We're not actually at maximum attention-to-AI, and it is still worthy of consideration whether to keep pushing things in the direction of more attention-to-AI rather than less. And this is really a quantitative matter, since a treaty can only buy some time (probably at most a few years).