The theoretical section seems weak, and basically just sidesteps all sorts of arguments like this. And theoretical arguments are a key crux imo
And their main theoretical argument doesn’t seem right either. They say "evolution only touches DNA whereas gradient descent touches the whole network”… but the way we ‘touch’ the whole network is via a simple loss function + default weight update rule (which would easily fit into DNA)! You could just as easily say DNA does this, by defining the algorithm/architecture/update process that creates/updates the brain.
They then imply we have much more fine-grained control than DNA/evolution could exert, but in fact our current methods give us very little fine control. Like yes, in principle, we have access to all the weights. If we understood more, maybe we could use an extremely complicated loss function and a sophisticated update process, and then DNA couldn't code for something analogously sophisticated. But that's not remotely what we do, and best practice is essentially "define some loss functions on vibes and see what works".
If anything, it seems like DNA's training algorithm exerts more fine-grained control & has a more complicated 'loss function/update rule' than gradient descent (sketch below).
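To make concrete how little "fine control" the standard setup actually exerts, here's a minimal PyTorch-style sketch; the model, data, and loss choice are purely illustrative stand-ins, not anyone's actual training code:

```python
# Roughly all of the "control" a standard training setup exerts:
# a loss function plus a default optimizer update rule.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))  # stand-in network
opt = torch.optim.SGD(model.parameters(), lr=1e-2)                     # default update rule
loss_fn = nn.MSELoss()                                                 # "vibes-based" loss choice

x, y = torch.randn(32, 10), torch.randn(32, 1)  # toy batch
for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(model(x), y)   # the entire specification of "what we want"
    loss.backward()               # gradients do touch every weight...
    opt.step()                    # ...but only via this one generic rule, not per-weight intent
```

The point being: everything above the toy data is a handful of lines that could easily be "coded for" by something DNA-sized; the per-weight updates fall out of a generic rule, not fine-grained designer intent.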
(And evolution had far more time to try out adaptations to novel behavior at 'near-human capability' and still failed on inner alignment, albeit it's unclear how to compare "tons of random tries" vs. "a few vibes-based tries".)
Not this person, but many AI risk arguments are necessarily logical rather than empirical - there are good reasons to believe the relevant behaviors won't appear, or will be trivially easy to counter (at least re: harmful outputs), until you have very capable systems.
Like, if I can construct a deceptive response-to-training strategy (but current models can't), that's enough to be concerned that future superhuman models might engage in similar deceptive alignment. Other concerns like inner optimizers (e.g. humans stopped being kid-maxxers at high capability, because our proxy decoupled from evolution's target) might not show up, or might change in character, as models become less limited. And even when you can demonstrate the behavior empirically, people dismiss it as overly induced or a toy environment - which was the whole point: to show plausibility, not to prove it.
More fundamentally: If I argue that a future thing logically implies certain risks arise, responding with "there's no empirical evidence" is silly. Logical chains and structural arguments are still valid epistemic tools.
I think a major blocker to this kind of thing is that people feel like 'it's not a real career' and worry about what would happen if they later tried to leave, or just didn't see success with their fieldbuilding startup.
IMO this worry is very incorrect above a certain threshold of ability, especially for people already working in EA or AIS technical/policy/generalist roles. But it would be very helpful if your team could offer some stronger guarantees to these people!
Here's one basic idea (common and probably far from optimal): 'failed-fieldbuilding-attempt insurance'. For people you think should do this, you agree to give a stipend of $2-5k/month for up to 5 years if they try, fail, and can't find another decent job. Likely you wouldn't even have to pay this out much, because most people you're excited to see try fieldbuilding are IMO incorrect about not being able to transition back. So in practice, you'd give them the stipend for a few months before they found a new job. And many of them would actually succeed & you'd pay nothing!
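As a rough illustration of why the expected cost per person could be small, here's the arithmetic with hypothetical numbers (every parameter below is a made-up assumption, not data):

```python
# Rough expected-cost-per-person estimate for the insurance idea.
# All numbers are hypothetical assumptions for illustration.
p_fail = 0.3               # chance the fieldbuilding attempt fails outright
p_stuck_given_fail = 0.2   # chance they then can't quickly find a decent job
months_paid_typical = 6    # typical months of stipend before landing something
months_paid_worst = 60     # full 5-year payout in the worst case
p_worst_given_stuck = 0.1  # chance a "stuck" person needs the full payout
stipend = 3_500            # $/month, midpoint of the $2-5k range

expected_months = (
    p_fail * p_stuck_given_fail *
    (p_worst_given_stuck * months_paid_worst +
     (1 - p_worst_given_stuck) * months_paid_typical)
)
print(f"Expected cost per person: ${expected_months * stipend:,.0f}")
# ~$2,400 per person under these assumptions -- small relative to typical grants.
```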
I think this is a good point about precise phrasing, but I think the argument still basically goes through that insects should be treated as extremely important in expectation. You can eliminate the two-envelope problem either by making the numbers fixed/concrete, or by using conditional probabilities.
Intuitively: suppose you thought there was a 50% chance you could prevent a holocaust-level (10,000,000 lives) event happening to humans, but a 50% chance that this intervention would be completely useless. Alternately, you could do a normal intervention to save 1,000 lives.
You could say "the normal intervention has a 50% chance of being ~infinitely more valuable than the holocaust-prevention thing".
But it's obvious you should do the holocaust-prevention thing, because here it's more obvious what the comparative/conditional stakes are. In one possible world, the 'world you can affect' is vastly larger, and that world should be prioritized.
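Spelling out the same toy numbers as plain expected-value arithmetic (just the figures from the example above):

```python
# The toy example in plain expected-value terms.
p = 0.5
ev_holocaust_prevention = p * 10_000_000 + (1 - p) * 0  # 5,000,000 lives in expectation
ev_normal_intervention = 1_000                          # certain
print(ev_holocaust_prevention, ev_normal_intervention)

# The two-envelope-style framing compares *ratios* across possible worlds:
# in the "useless" world the normal intervention is infinitely better (1000 vs 0),
# but that ratio hides how tiny the stakes in that world are. Conditioning on
# which world you're in (fixed numbers, as above) dissolves the apparent paradox.
```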
Caveats: ignoring longtermist arguments, and the probability insects matter is << 50% imo
I don't know what you mean? You can look at existing interventions that primarily help very young people (neonatal or childhood vitamin supplementation) vs. comparably effective interventions that target adults or older people (e.g. cash grants, schistosomiasis treatment).
There are multiple GiveWell charities in both categories, so this is just saying you should weight towards the ones that target older folks by maybe a factor of 2x or more vs. what GiveWell says (they assume the world won't change much).
Some fraction of people who don't work on AI risk cite "wanting to have more certainty of impact" as their main reason. But I think many of them are running the same risk anyway: namely, that what they do won't matter because transformative AI will make their work irrelevant or dramatically lower-value.
This is especially obvious if they work on anything that primarily returns value after a number of years. E.g. building an academic career or any career where most impact is realized later, working toward policy changes, some movement-building things, etc.
But it also applies somewhat to things like nutrition or vaccination or even preventing deaths, where most value is realized later (by having better life outcomes, or living an extra 50 years). This category does still have certainty of impact; it's just that the amount of impact might be cut by whatever fraction of worlds are upended in some way by AI. And this might affect what they should prioritize... e.g. they should prefer saving old lives over young ones, if the interventions are pretty close on naive effectiveness measures.
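As a toy illustration of how this discount plays out (the 3% annual "upended" probability and the life-year figures are hypothetical assumptions, chosen only to show the shape of the effect):

```python
# Toy discount for value realized over future years, assuming some annual chance
# that transformative AI "upends" things and later value doesn't materialize.
def discounted_value(years_of_value, annual_p_upended=0.03):
    # expected fraction of each future year's value that still counts, summed
    return sum((1 - annual_p_upended) ** t for t in range(years_of_value))

young_life_years = 60  # e.g. saving an infant: value accrues over ~60 years
old_life_years = 15    # e.g. saving a 65-year-old: value accrues over ~15 years

print(discounted_value(young_life_years) / young_life_years)  # ~0.47 of naive value
print(discounted_value(old_life_years) / old_life_years)      # ~0.81 of naive value
# So naive "years saved" comparisons get shaded toward older beneficiaries
# (~1.7x here; the exact factor depends heavily on the assumed parameters).
```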
Feels like there's some line where your numbers are getting so tiny and speculative that many other considerations start dominating, like "are your numbers actually right?" E.g. I'd be pretty skeptical of many proposed ".000001% of huge number" interventions (especially skeptical on the .000001% side).
In practice, the line could be where "are your numbers actually right" starts becoming the dominant consideration. At that point, proving your numbers are plausible is the main challenge that needs to be overcome - and is honestly where I suspect most people's anti-low-probabilities intuitions come from in the first place.
Very cool!
random thought: could include some of Yoshua Bengio's or Geoffrey Hinton's writings/talks on AI risk concerns in week 10 (& could include LeCun as a counterpoint to get all 3), since they're very well-cited academics & Turing Award winners for deep learning
I haven't looked through their writings/talks to find the most directly relevant ones, but some examples: https://yoshuabengio.org/2023/05/22/how-rogue-ais-may-arise/ https://yoshuabengio.org/2023/06/24/faq-on-catastrophic-ai-risks/
How are you operationalizing this? No matter the odds, it doesn't make sense (for someone with short timelines) to make bets of the form where the payout only arrives after the date in question.
Maybe I would be open to "you transfer $1k to me now (2026), I give you an interest-indexed $2k in 2035", or whatever odds make sense. Though I understand you'd need to trust me and/or have some trusted mechanism to make sure I pay it back.
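For concreteness, here's the rough arithmetic on what those terms imply for each side (the structure is just the proposal above; treating the "interest-indexed" part as covering the base rate is my assumption):

```python
# Quick sketch of what the proposed bet implies, using only the figures above.
stake_now = 1_000
payout_multiple = 2.0   # $2k returned per $1k staked, on top of interest indexing
years = 2035 - 2026

# Counterparty's side: if nothing transformative happens and I pay up, they earn
# the base rate (via indexing) *plus* a 2x multiple -- i.e. ~100% excess return over 9 years.
annualized_excess = payout_multiple ** (1 / years) - 1
print(f"Annualized excess return for the counterparty: {annualized_excess:.1%}")  # ~8%/yr

# My side: this is only attractive if I think a 2035 dollar is worth well under half
# a 2026 dollar to me (e.g. because timelines are short) -- which is exactly the
# asymmetry that makes standard settle-at-resolution bets hard to operationalize.
```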
I also don't think short timelines is actually cruxy to my argument above, which is mainly about their argument being wrong + pointing at other arguments for misalignment, not timelines.