If you believe that evidence that does not withstand scrutiny (that is, evidence that does not meet basic quality standards, contains major methodological errors, is statistically insignificant, rests on fallacious reasoning, or fails scrutiny for any other reason) is evidence we should use, then you are advocating for pseudoscience. The expected value of benefits based on such evidence is near zero.
I'm sorry if criticizing pseudoscience is frustrating, but that kind of thinking has no place in rational decision-making.
Your summary of the quoted text is inaccurate. You claim that it argues that evidence is not something that is inherently required, but the quote says no such thing. Instead, it references "a large body of scientific evidence" and "stronger evidence" vs. "limited evidence". This quote essentially makes the same argument I do above. How can we square the differences in these interpretations?
The quoted text implies that the evidence would not be sufficient under normal circumstances, hence the "evidence dilemma". If the amount of evidence were sufficient, there would be no question about what the correct action is. While the text washes its hands of making the actual decision to rely on insufficient evidence, it clearly considers this a serious possibility, which is not something that I believe anyone should advocate.
You are splitting hairs about the difference between "no evidence" and "limited evidence". The report considers a multitude of different AI risks, some of which have more evidence and some of which have less. What is important is that they bring up the idea that policy should be made without proper evidence.
The "scientific" phrasing frustrates me because I feel like it is often used to suggest high rigor without actually demonstrating that such rigor actually applies to a give situation, and because I feel like it is used to exclude certain categories of evidence when those categories are relevant, even if they are less strong compared to other kinds of evidence. I think we should weigh all relevant evidence, not exclude cetain pieces because they aren't scientific enough.
Again, you are attacking me because of the word "scientific" instead of attacking my arguments. As I have said many, many times, studies should be weighted based on their content and the scrutiny it receives. To oppose the word "science" just because of the word itself is silly. Your idea that works are arbitrarily sorted into "scientific" and "non-scientific" based on "style points" instead of an assessment of their merits is just wrong and a straw-man argument.
I don't think my argument leads to this conclusion. I'm just saying that AI risk has some evidence behind it, even if it isn't the most rigorous evidence! That's why I'm being such a stickler about this! If it were true that AI risk has actually zero evidence then of course I wouldn't buy it! But I don't think there actually is zero evidence even if AI risk advocates sometimes overestimate the strength of the evidence.
Where have I ever claimed that there is no evidence worth considering? In the start of my post, I write:
What unites many of these statements is the thorough lack of any evidence.
There are some studies that are rigorously conducted that provide some meager evidence. Not really enough to justify any EA intervention. But instead of referring to these studies, people use stuff like narrative arguments and ad-hoc models, which have approximately zero evidential value. That is the point of my post.
What about these statements makes you think that I don't believe uncertainty affects decision making? It seems like I say that it does affect decision making in my comment.
If you believe this, I don't understand where you disagree with me, other than your weird opposition to the word "scientific".
I think many people in the EA community in fact have this view. Do you think those people should still prefer GHD because AI is off limits due to not being "scientific"? I would consider this to be "for style points", and disagree with this approach.
It seems you have an issue with the word "scientific" and are constructing a straw-man argument around it. This has nothing to do with "style points". As I have already explained, by scientific I only refer to high-quality studies that withstand scrutiny. If a study doesn't, then its value as evidence is heavily discounted, because the probability of its conclusions being right despite methodological errors, failures to replicate, etc. is lower than if the study did not have these issues. If a study hasn't been scrutinized at all, it is likely bad, because the amount of bad research is greater than the amount of good research (consider, for example, the rejection rates of journals and conferences), and a lack of scrutiny implies a lack of credibility, as researchers do not take the study seriously enough to scrutinize it.
The conclusion that cause A is preferable to cause B involves the uncertainty about both causes. Even if cause A has more rigorous evidence than cause B, that doesn't mean the conclusion that benefits(A) > benefits(B) is similarly rigorous.
Yet E[benefits(A)] > E[benefits(B)] is a rigorous conclusion, because the uncertainty can be factored into the expected value.
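For concreteness, here is a minimal sketch of what "factoring the uncertainty into the expected value" can look like; the distributions, probabilities, and payoffs below are made up purely for illustration and are not estimates of any real cause.

```python
# A minimal sketch with made-up numbers: folding uncertainty about each cause's
# benefit into an expected-value comparison via simple Monte Carlo sampling.
import random

random.seed(0)
N = 100_000

def benefit_A():
    # Cause A: well-evidenced, so the plausible benefit falls in a narrow range.
    return random.uniform(90, 110)

def benefit_B():
    # Cause B: speculative; an assumed small chance of a large payoff, otherwise ~0.
    return 5000 if random.random() < 0.01 else 0

E_A = sum(benefit_A() for _ in range(N)) / N
E_B = sum(benefit_B() for _ in range(N)) / N

print(f"E[benefits(A)] ~ {E_A:.1f}")  # around 100, little spread
print(f"E[benefits(B)] ~ {E_B:.1f}")  # around 50, entirely driven by the assumed 1% / 5000 inputs
```

The expectation averages over the spread of each distribution, which is what the comparison E[benefits(A)] > E[benefits(B)] refers to; of course, the result is only as good as the assumed inputs.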
Can I ask why? Do you think AI won't be a "big deal" in the reasonably near future?
The International AI Safety Report lists many realistic threats (the first one of those is deepfakes, to give an example). Studying and regulating these things is nice, but they are not effective interventions in terms of lives saved etc.
I'm really at a loss here. If your argument is taken literally, I can convince you to fund anything, since I can give you highly uncertain arguments for almost anything. I cannot believe this is really your stance. You must agree with me that uncertainty affects decision making. It only seems that the word "scientific" bothers you for some reason, which I cannot really understand either. Do you believe that methodological errors are not important? That statistical significance is not required? That replicability does not matter? To object to the idea that these issues cause uncertainty is absurd.
But the reason is that using evidence will improve the quality of the decision, not for "style points" so-to-speak.
No one has ever claimed that evidence should be collected for "style points".
We can't just magically have more rigorous evidence, we have to make decisions and allocate resources in order to get that evidence.
Fortunately, AI research has plenty of funding right now (without any EA money), so in principle getting evidence should not be an issue. I am not against research; I am a proponent of it.
Doing nothing and sticking with the status quo is also a decision that can have important consequences. [...] If we lack scientific evidence, then that policy decision won't be evidence-based even if we do nothing.
Sticking with the status quo is often the best decision. When deciding how to use funds efficiently, you have to consider the opportunity cost of using those funds for something that has a certain positive benefit. And that alternative action is evidence-based. Thus, the dichotomy between "acting on AI without evidence" and "doing nothing without evidence" is false; the options are actually "acting on AI without evidence" and "acting on another cause area with evidence".
If the estimated value of using the money for AI is below the benefit of the alternative, we should not use it for AI and instead stick to the status quo on that matter. Most AI interventions are not tractable, and due to this their actual utility might even be negative.
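As a rough sketch of that opportunity-cost comparison (all numbers are hypothetical and chosen purely for illustration):

```python
# Hypothetical decision rule: fund the AI intervention only if its expected
# benefit exceeds the evidence-backed benefit of the alternative cause.
def should_fund_ai(p_tractable: float, benefit_if_tractable: float,
                   alternative_benefit: float) -> bool:
    expected_ai_benefit = p_tractable * benefit_if_tractable
    return expected_ai_benefit > alternative_benefit

# Illustrative numbers: a 2% chance of tractability with a large payoff
# still loses to a certain benefit of 100 units (0.02 * 3000 = 60 < 100).
print(should_fund_ai(0.02, 3000, 100))  # False
```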
Do you think you would be more open to some types of AI policy if the case for those policies didn't rely on the emergence of "AGI"?
Yes, there are several types of AI policy I support. However, I don't think they are important cause areas for EA.
While motivated reasoning is certainly something to look out for, the substance of the argument should also be taken into account. I believe that the main point of this post, that Yudkowsky and Soares's book is full of narrative arguments and unfalsifiable hypotheses mostly unsupported by references to external evidence, is obviously true. As you yourself say, OP's arguments are reasonable. Against that background, this kind of attack from you seems unjustified, and I'd like to hear which parts/viewpoints/narratives/conclusions of the post are motivated reasoning in your estimation.
I do agree that motivated reasoning is common among the proponents of AI adoption. As an example, I think the white paper Sparks of Artificial General Intelligence: Early experiments with GPT-4 by Microsoft is clearly a piece of advertising masquerading as a scientific paper. Microsoft has a lot to gain from the commercial success of its partner company OpenAI, and the conclusions the paper suggests are almost certainly colored by this. The same could be said about many of OpenAI's own white papers. But this does not mean that the examples or experiments they showcase are wrong per se (even if cherry-picked), or that there is no real information in them. Their results merely need to be read through a skeptical lens.
Thank you for your criticism.
The point of this post is not to address specific issues of AI 2027, but to address narrative arguments and ad-hoc models in general. AI 2027 contains both, and thus exemplifies them well. By choosing a model not based on reference literature, and thus on established consensus, the authors risk incorporating their own biases and assumptions into the model. This risk is present in all ad-hoc models, not just AI 2027, which is why all ad-hoc models should be met with strong skepticism until supported by wider consensus.
You make a good observation that the criticisms of AI 2027 do not form an "academic consensus" either. This is because AI 2027 itself is not an academic publication, nor has it been a topic of any major academic discussion. It is possible for non-academic works to contain valuable contributions – as I say in an above comment, peer-review is not magic. Furthermore, even new and original models that were "ad-hoc" when first published can be good. However, the lack of wider adoption of this model by scientists suggests it is not viewed as a solid foundation to build on. That lack does not, of course, explain why this is the case, so I have included links to commentary by other people in the EA community that describes the concrete issues in their model. Again, these issues are not the main point of my post, and are only provided for the reader's convenience.
You also don't actually point to any specific or characteristic issues from those 2 blog posts in that paragraph, instead appealing to heuristics, concept handles, and accusations. I would honestly describe that paragraph as a narrative argument.
A narrative argument presents the argument in the form of a story, like AI 2027's science fiction scenario or the parables in Yudkowsky and Soares's book. I'm not sure what part of my text you characterize as a story; could you elaborate on that?
In my post, I referred to the concept of "evidence-based policy making". In this context, evidence refers specifically to rigorous, scientific evidence, as opposed to intuitions, unsubstantiated beliefs, and anecdotes. By scientific evidence I mean, as I said, high-quality studies corroborated by other studies. And, as I emphasized in my point about evidence mismatch, using a study that concludes one thing as evidence for something else is a fallacy.
The idea that current progress in AI can be taken as evidence for AGI, which in some sense is the most extreme progress in AI imaginable and incomparable to current progress, is an extraordinary claim that requires extraordinary evidence. People arguing for this mostly base their argument on intuition and guesses, yet they often demand drastic actions based on those beliefs. We, as the EA community, should make decisions based on evidence. Currently, people are providing substantial funding to the "AI cause" based on arguments that do not meet the bar of evidence-based policy, and I think that is something that should and must be criticized.
The purpose of peer-review is to make sure that the publication has no obvious errors and meets some basic standards of publication. I have been a peer-reviewer myself, and what I have seen is that the general quality of stuff sent to computer science conferences is low. Peer-review removes the most blatantly bad papers. To a layperson who doesn't know the field and who cannot judge the quality of studies, it is safest to stick to peer-reviewed papers.
But it has never been suggested that peer-review somehow magically separates good evidence from bad evidence. In my work, I often refer to arXiv papers that are not peer-reviewed, but which I believe are methodologically sound and present valuable contributions. On the other hand, I know that conferences and journals often publish papers even with grave methodological errors or lack of statistical understanding.
Ultimately, the real test of a study is the criticism it receives after its publication, not peer-review. If researchers in the field think that the study is good and build their research on it, it is much more credible evidence than a study that is disproved by studies that come after it. One should never rely on a single study alone.
In the case of METR's study, their methodological errors do not preclude their conclusions from being correct. I think what they are trying to do is interesting and worthy of research. I'd love to see other researchers attempt to replicate the study with improved methodology; if they obtained similar results, that would provide evidence for METR's conclusions. So far, we haven't seen this (or at least I am not aware of it). Even then, though, the problem of evidence mismatch would remain, and we should be careful not to stretch those conclusions too far.
Your bet proposal talks about the Metaculus question "resolving non-ambiguously". Since the question is about the duration of time between the "weak AGI" and "superintelligent AI", it is possible that it cannot be resolved "non-ambiguously" due to the definition of weak AGI being ambiguous even if SAI is invented. This might discourage people who believe in short SAI timelines from accepting the bet.
Your argument is very similar to creationist and other pseudoscientific/conspiracy theory-style arguments.
A creationist might argue that the existence of life, humanity, and other complex phenomena is "evidence" for intelligent design. If we allow this to count as "limited" evidence (or whatever term we choose to use), it is possible to follow a Pascal's wager-style argument and posit that this "evidence", even if it carries high uncertainty, is enough to merit action.
It is always possible to come up with "evidence" for any claim. In evidence-based decision making, we must set a bar for evidence. Otherwise, the word "evidence" would lose its meaning, and we'd waste our resources treating every piece of knowledge that exists as "evidence".
If the studies withstand scrutiny, then they are high-quality studies. Of course, it is possible that a study has multiple conclusions, some of which are undermined by scrutiny and some of which are not, or that there are errors that do not undermine the conclusions. Such studies can of course be used as evidence. I used "high-quality" as the opposite of "low-quality", and splitting hairs about "moderate-quality" is uninteresting.
This is a good basis when, e.g., funding new research, as confirming and replicating recent studies is an important part of science. In this case, it doesn't matter that much whether the study's conclusions end up being true or false, as confirming either way is valuable. Researching interesting things is good, and even bad studies are evidence that the topic is interesting. But they are not evidence that should be used for other kinds of decision-making.
You are again splitting hairs about the meanings of words. The important thing is that they are advocating for making decisions without sufficient evidence, which is something I oppose. Their report is long and contains many AI risks, some of which (like deepfakes) have high-quality studies behind them, while others (like X-risks) do not. As a whole, the report "has some evidence" that there are risks associated with AI, so they talk about "limited evidence". What is important is that they imply this "limited evidence" is not sufficient for making decisions.
Splitting hairs again. You can call your evidence "limited evidence" if you want; that won't earn your argument a free pass to be considered. If it has too much uncertainty or doesn't withstand scrutiny, it shouldn't be taken in as evidence. Otherwise we end up in the creationist situation.