I think that's right, but modern AI benchmarks seem to have much the same issue. A human with a modern Claude instance might be able to write code 100x faster than without, but probably less than 2x as fast at choosing a birthday present for a friend.
Ideally you want to integrate over... something to do with the set of all tasks. But it's hard to say what that something would be, let alone how you're going to meaningfully integrate it.
To make outcome-based decisions, you have to decide on the period over which you're considering them. Considering any given period costs non-zero resources (reductio ad absurdum: considering all possible future timelines would cost infinite resources, so we presumably agree in principle that excluding some from consideration is not only reasonable but necessary).
I think it's a reasonable position to believe that if something can't be empirically validated then it at least needs exceptionally strong conceptual justifications to inform such decisions.
This cuts both ways, so if the argument of AI2027 is 'we shouldn't dismiss this outcome out of hand' then it's a reasonable position (although I find Titotal's longer backcasting an interesting counterweight, and it prompted me to wonder about a good way to backcast still further). If the argument is that AI safety researchers should meaningfully update towards shorter timelines based on the original essay or that we should move a high proportion of the global or altruistic economy towards event planning for AGI in 2027 - which seems to be what the authors are de facto pushing for - that seems much less defensible.
And I worry that they'll be fodder for views like Aschenbrenner's, and used to justify further undermining US-China relations and increasing the risk of great power conflict or nuclear war, both of which seem to me more probable in the next decade than AGI takeover.
Suggested hiring practice tweak
There are typically two ways for organisations to run hiring rounds: deadlined, in which applications are no longer processed after a publicised date, and rolling, in which the organisation keeps accepting submissions until they've found someone they want.
The upsides of a deadline are that the applicant knows they're not wasting their time on a job that's 99% assigned, and that the organisation doesn't have to delay giving an answer to an adequate candidate on the grounds that a potentially better one might apply when it's most of the way through the hiring process. It also incentivises people to apply slightly earlier than they otherwise would.
The downsides are basically the complement. The individual doesn't get to go for a job that they've just missed and would be really suited to, and the org doesn't get to see as large a pool of applicants.
It occurred to me that an org might get some of the best of both by setting an explicit mostly-deadline, after which they will downweight new applications. If you see the mostly-deadline in time, you're still incentivised to get your application in by the date given; if it's passed, you should rationally apply if and only if you think there's a good chance you're an exceptional fit.
One of the problems with AI benchmarks is that they can't effectively be backcast more than a couple of years. This prompted me to wonder if a more empirical benchmark might be something like 'Ability of a human in conjunction with the best technology available at time t'.
For now at least, humans are still necessary to have in the loop, so this should in principle be at least as good as coding benchmarks for gauging where we are now. When/if humans become irrelevant, it should still work - 'AI capability + basically nothing' = 'AI capability'. And looking back, it gives a much bigger reference class for forecasting future trends, allowing us to compare e.g.
etc.
Thoughts?
Fwiw I commented on Thorstad's linkpost for the paper when he first posted about it here. My impression is that he's broadly sympathetic to my claim about multiplanetary resilience, but either doesn't believe we'll get that far or thinks that the AI counterconsideration dominates it.
In this light, I think the claim that it's 'implausible' for annual x-risk to be lower than 1 in 10^9 is much too strong if it's being used to undermine EV reasoning. Like I said - if we become interstellar and no universe-ending doomsday technologies exist, then multiplicativity of risk gets you there pretty fast. If each planet has, say, a 1-in-10^5 annual chance of extinction, then n planets have a 1-in-10^(5n) chance of all independently going extinct in a given year. For n = 2 that's already one in ten billion.
Obviously there's a) a much higher chance that they go extinct in different years, and b) some chance that they all go extinct in a given period from non-independent events such as war. But even so, it's hard to believe that increasing n, say to double digits, doesn't rapidly outweigh such considerations, especially given that an advanced civilisation could probably create new self-sustaining settlements in a matter of years.
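To make the arithmetic concrete, here's a toy calculation (my own illustration, using the same assumed 1-in-10^5 per-planet figure):

```python
# Toy calculation: annual probability that all n settlements independently go
# extinct in the same year, assuming each has a 1-in-10^5 annual risk.
p_single = 1e-5  # assumed per-settlement annual extinction probability

for n in range(1, 11):
    p_all = p_single ** n  # independence assumption
    print(f"n = {n:2d}: P(all settlements extinct this year) = {p_all:.0e}")

# n = 2 gives 1e-10 (one in ten billion); by n = 10 the independent component
# is 1e-50, so whatever risk remains has to come from correlated events.
```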
I feel it is highly speculative on the difficulties of making comebacks and on the likelihood of extreme climate change
I don't understand how you think climate change is more speculative than AI risk. I think it's reasonable to have higher credence in human extinction from the latter, but those scenarios are entirely speculative. Extreme climate change is possible if a couple of parameters turn out to have been mismeasured.
As for the probability of making comebacks, I'd like to write a post about this, but the narrative goes something like this:
If we plug in k = 0.001, which seems to be a vaguely representative estimate among x-risk experts, then in 1945 we would have had an 85% chance of reaching a safe end state, today we would have a 92% chance, after one backslide we would have optimistically 73% and pessimistically 20%, and after Backslide Two optimistically 53% and pessimistically basically 0.
We can roughly convert these to units of 'extinction' by dividing the loss of probability by our current prospects. So going to probability 53% would mean losing 32 percentage points, which is 32%/85%, i.e. roughly 38% as bad in the long term as extinction.
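For concreteness, here's a toy sketch of the kind of arithmetic I have in mind - my reconstruction rather than the sequence's actual model, with year counts picked purely so the survival probabilities land near the figures above:

```python
# Toy reconstruction (my sketch, not the model from the sequence): assume a
# flat annual risk k for however many years of the 'time of perils' remain, so
# P(making it through) = (1 - k) ** years. The year counts are illustrative
# assumptions chosen to land near the percentages quoted above.
k = 0.001
years_remaining = {
    "1945": 160,                                # -> ~85%
    "today": 85,                                # -> ~92%
    "after one backslide (optimistic)": 315,    # -> ~73%
    "after one backslide (pessimistic)": 1600,  # -> ~20%
    "after Backslide Two (optimistic)": 635,    # -> ~53%
}

p_1945 = (1 - k) ** years_remaining["1945"]     # the ~85% baseline used above

for label, years in years_remaining.items():
    p = (1 - k) ** years
    # 'units of extinction': fraction of the baseline prospects lost
    badness = max(0.0, (p_1945 - p) / p_1945)
    print(f"{label:36} P(make it) ~ {p:.0%}, ~{badness:.0%} of an extinction")
```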
This is missing a lot of nuance, obviously, which I've written about in this sequence, so we certainly shouldn't take these numbers very seriously. But I think they paint a reasonable overall picture of a 'minor' catastrophe being, in long-run expectation and aside from any short-term suffering or change in human morality, perhaps in the range of 15-75% as bad as extinction. There's lots of room for discussing particulars, but it's not something we can dismiss on the grounds that extinction is 'much worse' - and in particular, the badness isn't sufficiently lower that we can in practice afford to ignore the relative probabilities of extinction vs lesser global catastrophes.
Thanks for the write-up. I'm broadly sympathetic to a lot of these criticisms tbh, despite not being very left-leaning. A couple of points you relate I think are importantly false:
(Thorstad's claim that) there’s no empirical basis for believing existential risk will drop to near-zero after our current, uniquely dangerous period before achieving long-term stability.
I don't know about 'empirical', but there's a simple mathematical basis for imagining it dropping to near zero in a sufficiently advanced future where we have multiple self-sustaining and hermetically independent settlements, e.g. (though not necessarily) on different planets. Then even if you assume disasters befalling one aren't independent, you have to believe they're extremely correlated for this not to net out to extremely high civilisational resilience as you get to double-digit settlements. That level of correlation is possible if it turns out to be possible to trigger e.g. a false vacuum decay - in which case Thorstad is right - or if a hostile AGI could wipe out everything before it, though that probability will surely either be realised or drop close to 0 within a few centuries.
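To give a rough sense of how strong that correlation would need to be, here's a toy shared-shock model - entirely my own illustration, with every number an assumption:

```python
# Toy shared-shock model (my own illustration; every number is an assumption).
# Each year a correlated, civilisation-wide shock kills every settlement with
# probability p_shared; otherwise each settlement independently goes extinct
# with probability p_local.
p_local = 1e-3   # assumed per-settlement annual extinction risk
p_shared = 1e-7  # assumed annual risk of a genuinely everything-killing event

def p_total_extinction(n: int) -> float:
    """Annual probability that all n settlements are wiped out."""
    return p_shared + (1 - p_shared) * p_local ** n

for n in (1, 2, 3, 5, 10):
    print(f"n = {n:2d}: annual extinction risk ~ {p_total_extinction(n):.1e}")

# By n = 3 the independent term (1e-9) is already small next to p_shared;
# from then on total risk is essentially just the correlated-shock term.
```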
If you accept the concept of Existential Risk and give them any credence, it logically follows that any such risk is much worse than any other horrible, terrible, undesirable one that does not lead to human extinction.
It doesn't, and I wish the EA movement would move away from this unestablished claim. Specifically, one must have some difference in credence between achieving whatever long-term future one desires given no 'minor' catastrophe and achieving it given at least one. That credence differential is, to a first approximation, the fraction representing how much of 'one extinction' your minor catastrophe is. Assuming we're reasonably ambitious in our long-term goals (e.g., per above, developing a multiplanetary or interstellar civilisation), it seems crazy to me to suppose that fraction should be less than 1/10. I suspect it should be substantially higher, since on restart we would have to survive a high-risk time-of-perils-2 while proceeding to the safe end state much more slowly, given the depletion of fossil fuels and other key resources.
If we think a restart is at least 1/10 as bad as extinction, then we have to ask serious questions about whether it's at least 10x as likely. I think it's at least defensible to claim that e.g. extreme climate change is 10x as likely as an AI destroying literally all of humanity.
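As a minimal expected-value sketch of that trade-off, with made-up numbers purely for illustration:

```python
# Minimal expected-value comparison (all numbers made up for illustration, not
# estimates I'm defending). Badness is measured in 'units of extinction'.
p_extinction = 0.001    # e.g. an AI destroying literally all of humanity
p_catastrophe = 0.01    # e.g. extreme climate change forcing a restart
badness_restart = 0.1   # a restart assumed to be 1/10 as bad as extinction

ev_loss_extinction = p_extinction * 1.0
ev_loss_catastrophe = p_catastrophe * badness_restart

print(f"EV loss from extinction risk:  {ev_loss_extinction:.4f}")
print(f"EV loss from catastrophe risk: {ev_loss_catastrophe:.4f}")
# With a 10x probability ratio and a 1/10 badness ratio the two terms are
# equal, so neither can be dismissed a priori.
```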
Hi Eitan, you're very welcome to! No need to book or anything - you can just show up, find a suitable area and run the event if you want, as long as it's comfortably under the space capacity (currently 60) :)
If you want map-editing privileges, or just for me to show you around and give a few pointers, feel free to DM me on here - or just log onto the Gather Town and send me a message if I'm around, which I usually am. If I'm not physically present at the time it's still probably the fastest way to reach me.
There are many ways to reduce existential risk. I don't see any good reason to think that reducing small chances of extinction events is better EV than reducing higher chances of smaller catastrophes, or even just building human capacity in a preferentially non-destructive way. The arguments that we should focus on extinction have always boiled down to 'it's simpler to think about'.
(Deleted my lazy comment to give more colour)
Neither agree nor disagree - I think the question is malformed, and both 'sides' have extremely undesirable properties. Moral realism's failings are well documented in the discussion here, and well parodied as being 'spooky' or just wishful thinking. But moral antirealism is ultimately a doctrine of conflict - if reason has no place in motivational discussion, then all that's left for me to get my way from you is threats, emotional manipulation, misinformation and, if need be, actual violence. Any antirealist who denies that this is the implication of their position is kidding themselves (or deliberately supplying misinformation).
So I advocate for a third position.
I think the central problem with this debate is that the word 'objective' here has no coherent referent (except when people use it for silly examples, like referring to instructions etched into the universe somewhere). And an incoherent referent can neither be coherently asserted nor denied.
To paraphrase Douglas Adams, if we don't know what the question is, we can't hope to find an understandable answer.
I think it's useful to compare moral philosophy to applied maths or physics, in that while there are still open debates about whether mathematical Platonism (approximately, objectivity in maths) is correct, most people think it isn't (or, rather, that it's incoherently defined) - and yet most people still think well-reasoned maths is essential to our interactions with the world. Perhaps the same could be true of morality.
One counterpoint might be that unlike maths, morality is dispensable - you can seemingly do pretty well in life by acting as though it doesn't exist (arguably better). But I think this is true only if you focus exclusively on the limited domain of morality that deals with 'spooky' properties and incoherent referents.
A much more fruitful approach to the discussion, IMO, is to start by looking at the much broader question of motivation, aka the cause of Agent A taking some action A1. Motivation has various salient properties:
For example, many of us might choose to modify our motivations so that we e.g.:
I would argue that some - but not all - of these modifications would be close to or actually universal. I would also argue that some of those that weren't universal for early self-modifications might still be states that iterated self-modifiers would gravitate towards.
For example, becoming more 'intelligent' through patient thought might cause us a) to focus more on happiness itself than on instrumental pathways to happiness like interior design, and b) to recognise the lack of a fundamental distinction between our 'future self' and 'other people', and so tend more towards willingness to help the latter.
At this point I'm in danger of mapping hedonistic/valence utilitarianism onto this process, but you don't have to agree with the previous paragraph to accept that some motivations would be more universal, or at least greater 'attractors', than others, even while disagreeing on the particulars.
However it's not a coincidence that thinking about 'morality' like this leads us towards some views more than others. Part of the appeal of this way of thinking is that it offers the prospect of 'correct' answers to moral philosophy, or at least shows that some are incorrect - in a comparable sense to the (in)correctness we find in maths.
So we can think of this process as revealing something analogous to 'consistency' in maths. It's not (or not obviously) the same concept, since it's hard to say there's something formally 'inconsistent' in e.g. wanting to procrastinate more, or to be unhappier. Yet wanting such things is contrary in nature to something that for most or all of us resembles an 'axiom' - the drive to e.g. avoid extreme pain and generally to make our lives go better.
If we can identify this or these 'motivational axiom(s)', or even just find a reasonable working definition of them, then we are in a similar position to the one we occupy in applied maths: without ever showing that something is 'objectively wrong' - whatever that could mean - we can show that some conclusions are so contrary to our nature - 'nature' here meaning the axioms we cannot avoid accepting as we function as conscious, decision-making, motivated beings - that we can exclude them from serious consideration.
This raises the question of which and how many moral conclusions are left when we've excluded all those ruled out by our axioms. I suspect and hope that the answer is 'one' (you might guess approximately which from the rest of this message), but that's a much more ambitious argument than I want to make here. Here I just want to claim that this is a better way of thinking about metaethical questions than the alternatives.
I've had to rush through this comment without clearly distinguishing theses, but I'm making 2.5 core claims here:
I don't know whether these positions already exist in moral philosophy - I'd be very surprised if I'm the first to advocate them, but fwiw I didn't find anything matching them when I looked a few years ago (though my search was hardly exhaustive). To distinguish the view from the undesirable properties of both traditional sets of views, and with reference to the previous paragraph, I refer to it as 'moral exclusivism'.
Obviously you could define exclusivism into being either antirealism or realism, but IMO that misses its ability to capture the intuition behind both without taking on the baggage of either.