TFD

Your analysis covers several perspectives on this phenomenon. If we focus on the "actual performance" perspective, this is pretty similar to multi-armed bandits. One pattern that I think is present in strategies for these types of problems is the idea of spreading out actions across the different possibilities (explore vs exploit and all that). It wouldn't necessarily make sense to commit to one "arm" (or cause) early on when information is low. This "spreading out" across options is one way of dealing with uncertainty.
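To make the bandit comparison a bit more concrete, here is a minimal sketch of an epsilon-greedy strategy, one common way of mixing exploration and exploitation. The arm success probabilities, the value of `epsilon`, and the function name are all illustrative assumptions on my part, not anything taken from the post.

```python
import random

def epsilon_greedy(true_means, n_rounds=5000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy agent on Bernoulli arms.

    true_means: hypothetical success probabilities, unknown to the agent.
    Returns (pull_counts, estimated_means).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best current estimate.
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean reward for this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts, estimates

# Three hypothetical "causes" with made-up success rates.
counts, estimates = epsilon_greedy([0.2, 0.5, 0.6])
print(counts, estimates)
```

The point of the sketch is only that every arm keeps getting some pulls, so an early unlucky estimate doesn't permanently rule an option out.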

A similar idea comes up in another potential analogy for cause prioritization: financial investing. We can think about optimizing a portfolio and its allocation to achieve good returns relative to risk, rather than trying to pick the single highest-return asset. Thus we get concepts like diversification.

I find this stock-picking analogy helpful for thinking about how "neglectedness" is often treated in practice. I've often found myself skeptical of arguments for and from neglectedness, and I feel the way it is applied in practice doesn't really align with the classic "diminishing returns" conception. I think the way neglectedness is treated in practice ends up being more like how an investor with a high risk tolerance might view a risky asset. Riskier assets are expected to have higher returns; investors with lower risk tolerance would saturate low-risk/high-return options quickly, leaving riskier investments "neglected". Thus an investor with high risk tolerance can find good opportunities that would be unappealing to other, less risk-tolerant investors by going to higher-risk assets. I think this captures the spirit of what "neglected" cause areas have often looked like in EA: more speculative, but where some EAs have a strong feeling that they could have outsized impact.

If I can read between the lines a bit, under this analogy EA pivoting more into AI is kind of like an investor who wants higher returns putting more of their portfolio in small-cap growth stocks that are riskier but which the investor thinks will result in higher returns. One downside of this is decreased diversification. Another possible option would be to hold a more diversified portfolio but use leverage.

In-model vs Out-of-model robustness

The problem is not limited to cases with trials and noisy statistics, because the error does not have to arise from random chance. Problems with assumptions, bad guesses, even math errors will equally get you cursed. If anything, I would expect causes that lack empirical experimental data to be more cursed, not less.

I think this gets at a distinction that is worth calling out: in-model vs out-of-model robustness.

In my experience with cost-benefit analysis, both reading EA related ones and in industry, it is fairly common to propose a "median" scenario and also a "pessimistic" scenario, and provide estimates for these cases. The point is usually that since even the pessimistic scenario looks good, the analysis shows that the proposed intervention is robustly beneficial. This has a two-fold problem:

First, usually the reason to think that the "pessimistic" scenario is pessimistic is just that it uses parameter values that reduce the estimated benefit below the "median" scenario. It's sometimes unclear why that means the estimate is robustly lower than the actual benefit. This is the in-model robustness.

Despite the fact that I think this is an issue, sometimes it may be perceived as (or actually be) a somewhat unfair critique. All models are wrong; we have to use what we have to make estimates. This can result in polarized views of what an estimate shows. For a person who likes the intervention and has a gut feeling it is good, the "median" estimate makes a ton of sense and this seems like a very reasonable approach. For a skeptic, it seems prone to over-estimation for the reasons you highlight in the post. Moving the parameters so that your estimate is 25% lower doesn't turn garbage into non-garbage.

However, there is another source of error lurking in the background. What about costs that you haven't included? The potential for the intervention to backfire that isn't considered in any scenario? The hidden assumption that hasn't been tested in the "pessimistic" scenario? This is out-of-model robustness.

I think the polarization when it comes to in-model robustness causes proponents or fans of an idea or intervention to over-estimate robustness even when in-model robustness is high, because they implicitly credit the (perceived) in-model robustness to the out-of-model robustness.

In my view, the whole "rule high stakes in, not out" idea in practice will result in systematically doing this a lot, which I think makes it a bad heuristic for approaching these types of situations. One way to think about this is it encourages us to focus on specific high-volatility "assets" and thus lacks diversification.

Beware of non-evidence-based argumentation

TFD15d1

In one of my comments above, I say this:

I will caveat this by saying that in my opinion it makes sense for estimation purposes to discount or shrink estimates of highly uncertain quantities, which I think many advocates of AI as a cause fail to do and can be fairly criticized for. But the issue is a quantitative one, and so can come out either way. I think there is a difference between saying that we should heavily shrink estimates related to AI due to their uncertainty and lower quality evidence, vs saying that they lack any evidence whatsoever.

I feel like my position is consistent with what you have said; I just view this as part of the estimation process. When I say "E[benefits(A)] > E[benefits(B)]" I am assuming these are your best all-inclusive estimates, including regularization/discounting/shrinking of highly variable quantities. In fact I think it's also fine to use things other than expected value, or in general to use approaches that are more robust to outliers/high-variance causes. As I say in the above quote, I also think it is a completely reasonable criticism of AI risk advocates that they fail to do this reasonably often.

If you properly account for uncertainty, you should pick the certain cause over the uncertain one even if a naive EV calculation says otherwise

This is sometimes correct, but the math could come out that the highly uncertain cause area is preferable after adjustment. Do you agree with this? That's really the only point I'm trying to make!
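As a rough illustration of what I mean by "after adjustment", here is a minimal sketch using a standard normal-normal shrinkage formula, where a noisier estimate gets pulled harder toward a prior. The model choice, the prior, and every number below are hypothetical and chosen only to show that the comparison can go either way; they are not estimates of any real cause area.

```python
def shrink(estimate, noise_var, prior_mean, prior_var):
    """Posterior mean under a normal prior and normal estimation noise.

    The noisier the estimate (larger noise_var), the more it is pulled
    toward prior_mean.
    """
    weight = prior_var / (prior_var + noise_var)
    return prior_mean + weight * (estimate - prior_mean)

# Hypothetical prior over how beneficial a typical cause is.
PRIOR_MEAN, PRIOR_VAR = 5.0, 25.0

# A well-evidenced cause: modest naive estimate, low noise.
certain = shrink(10.0, noise_var=1.0,
                 prior_mean=PRIOR_MEAN, prior_var=PRIOR_VAR)

# Two speculative causes: same high noise, different naive estimates.
uncertain_big = shrink(100.0, noise_var=400.0,
                       prior_mean=PRIOR_MEAN, prior_var=PRIOR_VAR)
uncertain_small = shrink(60.0, noise_var=400.0,
                         prior_mean=PRIOR_MEAN, prior_var=PRIOR_VAR)

print(certain, uncertain_big, uncertain_small)
```

With these made-up inputs, the speculative cause stays ahead of the well-evidenced one when its naive estimate is 100 and falls behind when it is 60, even though it is shrunk heavily in both cases.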

I don't think the difference here comes down to one side which is scientific and rigorous and loves truth against another that is biased and shoddy and just wants to sneak their policies through in an underhanded manner with no consideration for evidence or science. Analyzing these things is messy, and different people interpret evidence in different ways or weigh different factors differently. To me this is normal and expected.

I'd be very interested to read your explainer, it sounds like it addresses a valid concern with arguments for AI risk that I also share.

Beware of non-evidence-based argumentation

TFD16d1

If you believe that evidence that does not withstand scrutiny (that is, evidence that does not meet basic quality standards, contains major methodological errors, is statistically insignificant, is based on fallacious reasoning, or any other reason why the evidence is scrutinized) is evidence that we should use, then you are advocating for pseudoscience. The expected value of benefits based on such evidence is near zero.

I don't think evidence which is based on something other than "high-quality studies that withstand scrutiny" is pseudoscience. You could have moderate-quality studies that withstand scrutiny, or you could have preliminary studies which are suggestive but which haven't been around long enough for scrutiny to percolate up. I don't think these things have near-zero evidential value.

This is my issue with your use of the term "scientific evidence" and related concepts. Its role in the argument is mostly rhetorical, having the effect of characterizing other arguments or positions as not worthy of consideration without engaging with the messy question of what value various pieces of evidence actually have. It causes confusion and results in you equivocating about what counts as "evidence".

My view, and where we seem to disagree, is that I think there are types of evidence other than "high-quality studies that withstand scrutiny" and pseudoscience. Look, I agree that if something has basically zero evidential value we can reasonably round that off to zero. But "limited evidence" isn't the same as near-zero evidence. I think there is a category of evidence between pseudoscience/near-zero evidence and "high-quality studies that withstand scrutiny". When we don't have access to the highest quality evidence, it is acceptable in my view to make policy based on the best evidence that we have, including if it is in that intermediate category. This is the same argument made in the quote from the report.

The quoted text implies that the evidence would not be sufficient under normal circumstances

This is exactly what I mean when I say this approach results in you equivocating. In your OP, you explicitly claim that this quote argues that evidence is not something that is needed. You clarify in your comments with me and in a clarification at the top of your post that only "high-quality studies that withstand scrutiny" really count as evidence as you use the term. The fact that you are using the word "evidence" in this way is causing you to misinterpret the quoted statement. The quote is saying that even if we don't have the ideal, high-quality evidence that we would like, and that might be needed for us to be highly confident and establish a strong consensus, in situations of uncertainty it is acceptable to make policy based on more limited or moderate evidence. I share this view and think it is reasonable and not pseudoscientific or somehow a claim that evidence of some kind isn't required.

If the amount of evidence was sufficient, there would be no question about what is the correct action.

Uncertainty exists! You can be in a situation where the correct decision isn't clear because the available information isn't ideal. This is extremely common in real-world decision making. The entire point of this quote and my own comments is that when these situations arise the reasonable thing to do is to make the best possible decision with the information you have (which might involve trying to get more information) rather than declaring some policies off the table because they don't have the highest quality evidence supporting them. Making decisions under uncertainty means making decisions based on limited evidence sometimes.

Beware of non-evidence-based argumentation

TFD16d2

Where have I ever claimed that there is no evidence worth considering?

In your OP, you write:

In this post, I've criticized non-evidence-based arguments, which hangs on the idea that evidence is something that is inherently required. Yet it has become commonplace to claim the opposite. One example of this argument is presented in the International AI Safety Report

You then quote the following:

Given sometimes rapid and unexpected advancements, policymakers will often have to weigh potential benefits and risks of imminent AI advancements without having a large body of scientific evidence available. In doing so, they face a dilemma. On the one hand, pre-emptive risk mitigation measures based on limited evidence might turn out to be ineffective or unnecessary. On the other hand, waiting for stronger evidence of impending risk could leave society unprepared or even make mitigation impossible – for instance if sudden leaps in AI capabilities, and their associated risks, occur.

Your summary of the quoted text is inaccurate. You claim that this is an argument that evidence is not something that is inherently required, but the quote says no such thing. Instead, it references "a large body of scientific evidence" and "stronger evidence" vs "limited evidence". This quote essentially makes the same argument I do above. How can we square the differences in these interpretations?

In response to me, you write:

In my post, I referred to the concept of "evidence-based policy making". In this context, evidence refers specifically to rigorous, scientific evidence, as opposed to intuitions, unsubstantiated beliefs and anecdotes. Scientific evidence, as I said, referring to high-quality studies corroborated by other studies.

You have also added this as a clarification to your OP:

Clarification (29.1.2026): In this post, I use the term "evidence" in the context of "evidence-based policy making".

So, as used in your post, "evidence" means "rigorous, scientific evidence, as opposed to intuitions, unsubstantiated beliefs and anecdotes". This is why I find your reference to "scientific evidence" frustrating. You draw a distinction between two categories of evidence and claim policy should be based on only one. I disagree; I think policy should be based on all available evidence, including intuition and anecdote ("unsubstantiated belief" obviously seems definitionally not evidence). I also think your argument relies heavily on contrasting with a hypothetical highly rigorous body of evidence that isn't often achieved, which is why I have pointed out what I see as the "messiness" of lots of published scientific research.

The distinction you draw and how you define "evidence" results in an equivocation. Your characterization of the quote above only makes sense if you are claiming that AI risk can only claim to be "evidence-based" if it is backed by "high-quality studies that withstand scrutiny". In other words, as I said in one of my comments:

It seems like the core of your argument is saying that there is a high burden of proof that hasn't been met.

So, where do we disagree? As I say immediately after:

I agree that arguments for short timelines haven't met a high burden of proof but I don't believe that there is such a burden.

I believe that we should compare E[benefits(AI)] with E[benefits(GHD)] and any other possible alternative cause areas, with no area having any specific burden of proof. The quality of the evidence plays out in taking those expectations. Different people may disagree on the results based on their interpretations of the evidence. People might weigh different sources of evidence differently. But there is no specific burden to have "high-quality studies that withstand scrutiny", although this obviously weighs in favor of a cause that does have those studies. I don't think having high quality studies amounts to "style points". What I think would amount to "style points" is if someone concluded that E[benefits(AI)] > E[benefits(GHD)] but went with GHD anyway because they think AI is off limits due to the lack of "high-quality studies that withstand scrutiny" (i.e. if there is a burden of proof where "high-quality studies that withstand scrutiny" are required).

Beware of non-evidence-based argumentation

TFD16d1

It seems you have an issue with the word "scientific" and are constructing a straw-man argument around it.

The "scientific" phrasing frustrates me because I feel like it is often used to suggest high rigor without demonstrating that such rigor actually applies to a given situation, and because I feel like it is used to exclude certain categories of evidence when those categories are relevant, even if they are less strong compared to other kinds of evidence. I think we should weigh all relevant evidence, not exclude certain pieces because they aren't scientific enough.

Yet E[benefits(A)] > E[benefits(B)] is a rigorous conclusion, because the uncertainty can be factored into the expected value.

Yes, but in doing so the uncertainty in both A and B matters, and showing that A is lower variance than B doesn't show that E[benefits(A)] > E[benefits(B)]. Even if benefits(B) are highly uncertain and we know benefits(A) extremely precisely, it can still be the case that benefits(B) are larger in expectation.

I cannot believe this is really your stance. You must agree with me that uncertainty affects decision making.

In my comment that you are responding to, I say:

The conclusion that cause A is preferable to cause B involves the uncertainty about both causes.

I also say:

I will caveat this by saying that in my opinion it makes sense for estimation purposes to discount or shrink estimates of highly uncertain quantities

What about these statements makes you think that I don't believe uncertainty affects decision making? It seems like I say that it does affect decision making in my comment.

If stock A very likely has a return in the range of 1-2%, and stock B very likely has a return in the range of 0-10%, do you think stock A must have a better expected return because it has lower uncertainty?

Yes, uncertainty matters, but it is more complicated than saying that the least uncertain option is always better. Sometimes the option that has less rigorous support is still better in an all-things-considered analysis.
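To put numbers on the stock question, here is a small sketch. It assumes, purely for illustration, that each return is uniformly distributed over the stated range; "very likely in the range" doesn't pin down a distribution, so the uniform choice and the helper names are my own assumptions.

```python
import math

def uniform_mean(low, high):
    """Mean of a uniform distribution on [low, high]."""
    return (low + high) / 2

def uniform_std(low, high):
    """Standard deviation of a uniform distribution on [low, high]."""
    return (high - low) / math.sqrt(12)

# Returns in percent, using the ranges from the question above.
mean_a, std_a = uniform_mean(1, 2), uniform_std(1, 2)
mean_b, std_b = uniform_mean(0, 10), uniform_std(0, 10)

print(f"A: mean={mean_a:.2f}%, std={std_a:.2f}%")
print(f"B: mean={mean_b:.2f}%, std={std_b:.2f}%")
```

Under that assumption stock B is both more uncertain and higher in expectation, which is the case the "least uncertain option wins" rule gets wrong.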

If your argument is taken literally, I can convince you to fund anything, since I can give you highly uncertain arguments for almost everything.

I don't think my argument leads to this conclusion. I'm just saying that AI risk has some evidence behind it, even if it isn't the most rigorous evidence! That's why I'm being such a stickler about this! If it were true that AI risk has actually zero evidence then of course I wouldn't buy it! But I don't think there actually is zero evidence even if AI risk advocates sometimes overestimate the strength of the evidence.

Beware of non-evidence-based argumentation

TFD16d2

Fortunately, AI research has a plenty of funding right now (without any EA money), so in principle getting evidence should not be an issue. I am not against research, I am a proponent of it.

AI certainly has a lot of resources available, but I don't think those resources are primarily being used to understand how AI will impact society. I think policy could push more in this direction. For example, requiring AI companies who train/are training models above a certain compute budget to undergo third-party audits of their training process and models would push towards clarifying some of these issues in my view.

Sticking with status quo is often the best decision. When deciding how to use funds efficiently, you have to consider the opportunity cost of using those funds to something that has a certain positive benefit. And that alternative action is evidence-based. Thus, the dichotomy between "acting on AI without evidence" and "doing nothing without evidence" is false, the options are actually "acting on AI without evidence" and "acting on another cause area with evidence".

The conclusion that cause A is preferable to cause B involves the uncertainty about both causes. Even if cause A has more rigorous evidence than cause B, that doesn't mean the conclusion that benefits(A) > benefits(B) is similarly rigorous.

Let's take AI and global health and development (GHD) as an example. I think it would be reasonable to say that evidence for GHD is much more rigorous and scientific than the evidence for AI. Yet that doesn't mean that the evidence conclusively shows benefits(GHD) > benefits(AI). Let's say that someone believes that the evidence for GHD is scientific and the evidence for AI is not (or at least much less so), but that the overall, all-things-considered best estimate of benefits(AI) is greater than the best estimate of benefits(GHD). I think many people in the EA community in fact have this view. Do you think those people should still prefer GHD because AI is off limits due to not being "scientific"? I would consider this to be "for style points", and disagree with this approach.

I will caveat this by saying that in my opinion it makes sense for estimation purposes to discount or shrink estimates of highly uncertain quantities, which I think many advocates of AI as a cause fail to do and can be fairly criticized for. But the issue is a quantitative one, and so can come out either way. I think there is a difference between saying that we should heavily shrink estimates related to AI due to their uncertainty and lower quality evidence, vs saying that they lack any evidence whatsoever.

If the estimated value of using the money for AI is below the benefit of the alternative, we should not use it for AI and instead stick to the status quo on that matter.

I agree, but it doesn't follow from one cause being "scientific" while the other isn't that the "scientific" cause area has higher benefits.

Most AI interventions are not tractable, and due to this their actual utility might even be negative.

I actually agree that tractability is (ironically) a strongly neglected factor and many proponents of AI as a cause area ignore or vastly overestimate the tractability of AI interventions, including the very real possibility that they are counterproductive/net-negative. I still think there are worthwhile opportunities but I agree that this is an underappreciated downside of AI as a cause area.

Yes, there are several types of AI policy I support. However, I don't think they are important cause areas for EA.

Can I ask why? Do you think AI won't be a "big deal" in the reasonably near future?

Beware of non-evidence-based argumentation

TFD17d3

It seems like the core of your argument is saying that there is a high burden of proof that hasn't been met. I agree that arguments for short timelines haven't met a high burden of proof, but I don't believe that there is such a burden. I will try to explain my reasoning, although I'm not sure if I can do the argument justice in a comment; perhaps I will try to write a post about the issue.

When it comes to policy, I think the goal should be to make good decisions. You don't get any style points for how good your arguments or evidence are if the consequences of your decisions are bad. That doesn't mean we shouldn't use evidence to make decisions; we certainly should. But the reason is that using evidence will improve the quality of the decision, not for "style points" so-to-speak.

Doing nothing and sticking with the status quo is also a decision that can have important consequences. We can't just magically have more rigorous evidence; we have to make decisions and allocate resources in order to get that evidence. When we make those decisions, we have to live with the uncertainty that we face, and make the best decision given that uncertainty. If we don't have solid scientific evidence, we still have to make some decision. It isn't optional. Sticking with the status quo is still making a decision. If we lack scientific evidence, then that policy decision won't be evidence-based even if we do nothing. I think we should make the best decision we can given what information we have instead of defaulting to an informal burden of proof. If there is a formal burden of proof, like a burden on one party in a court case or a procedure for how an administrative or legislative body should decide, then in my view that formal procedure establishes what the burden of proof is.

The idea that current progress in AI can be taken as evidence for AGI

Although I believe there should be policy action/changes in response to the risk from AI, I personally don't see the case for this as hinging on the achievement of "AGI". I've described my position as being more concerned about "powerful" AI than "intelligent" AI. I think focusing on "AGI" or how "intelligent" an AI system is or will be often leads to unproductive rabbit holes or definition debates. On the other hand, obviously lots of AI risk advocates do focus on AGI, so I acknowledge it is completely fair game for skeptics to critique this.

Do you think you would be more open to some types of AI policy if the case for those policies didn't rely on the emergence of "AGI"?

Beware of non-evidence-based argumentation

TFD17d3

Generally, the scientific community is not going around arguing that drastic measures should be taken based on singular novel studies. Mainly, what a single novel study will produce is a wave of new studies on the same subject, to ensure that the results are valid and that the assumptions used hold up to scrutiny. Hence why that low-temperature superconductor was so quickly debunked.

I agree that on average the scientific community does a great job of this, but I think the process is much, much messier in practice than a general description of the process makes it seem. For example, you have the Alzheimer's research that got huge pick-up and massive funding from major scientific institutions, where the original research included doctored images. You have power-posing getting viral attention in science-adjacent media. You have priming, where Kahneman wrote in his book that even if it seems wild you have to believe in it, largely for reasons similar to what I think is being suggested here: that multiple rigorous scientific studies demonstrate the phenomenon. Yet when the replication crisis came around, priming looked a lot more shaky than it seemed when Kahneman wrote that.

None of this means that we should throw out the existing scientific community or declare that most published research is false (although ironically there is a peer reviewed publication with this title!). Instead, my argument is that we should understand that this process is often messy and complicated. Imperfect research still has value and in my view is still "evidence" even if it is imperfect.

The research and arguments around AI risk are not anywhere near as rigorous as a lot of scientific research (and I linked a comment above where I myself criticize AI risk advocates for overestimating the rigor of their arguments). At the same time, this doesn't mean that these arguments do not contain any evidence or value. There is a huge amount of uncertainty about what will happen with AI. People worried about the risks from AI are trying to muddle through these issues, just like the scientific community has to muddle through figuring things out as well. I think it is completely valid to point out flaws in arguments, lack of rigor, or overconfidence (as I have also done). But evidence or argument doesn't have to appear in a journal or conference to count as "evidence".

My view is that we have to live with the uncertainty and make decisions based on the information we have, while also trying to get better information. Doing nothing and going with the status quo is itself a decision that can have important consequences. We should use the best evidence we have to make the best decision given uncertainty, not just default to the status quo when we lack ideal, rigorous evidence.

Beware of non-evidence-based argumentation

TFD17d1

I don't think my argument is even that anti-institutionalist. I have issues with how academic publishing works but I still think peer reviewed research is an extremely important and valuable source of information. I just think it has flaws and is much messier than discussions around the topic sometimes make it seem.

My point isn't to say that we should throw out traditional academic institutions; it is to say that I feel like the claim that the arguments for short timelines are "non-evidence-based" is critiquing the same messiness that is also present in peer-reviewed research. If I read a study whose conclusions I disagree with, I think it would be wrong to say "field X has a replication crisis, therefore we can't really consider this study to be evidence". I feel like a similar thing is going on when people say the arguments for short timelines are "non-evidence-based". To me things like METR's work definitely are evidence, even if they aren't necessarily strong or definitive evidence or if that evidence is open to contested interpretations. Essentially, the point I was trying to make is that I don't think something needs to be peer reviewed to count as "evidence".

Beware of non-evidence-based argumentation

TFD17d1

Ultimately, the real test of a study is the criticism it receives after its publication, not peer-review. If researchers in the field think that the study is good and build their research on it, it is much more credible evidence than a study that is disproved by studies that come after it. One should never rely on a single study alone.

This seems reasonable to me, but I don't think it's necessarily entirely consistent with the OP. I think a lot of the reason why AI is such a talked-about topic compared to 5 years ago is that people have seen work that has gone on in the field and are building on and reacting to it. In other words, they perceive existing results to be evidence of significant progress and opportunities. They could be overreacting to or overhyping those results, but to me it doesn't seem fair to say that the belief in short timelines is entirely "non-evidence-based". Things like METR's work, scaling laws, and benchmarks are evidence even if they aren't necessarily strong or definitive evidence.

I think it is reasonable to disagree with the conclusions that people draw based on these things, but I don't entirely understand the argument that these things are "non-evidence-based". I think it is worthwhile to distinguish between a disagreement over methodology, evidence strength, or interpretation, and the case where an argument is literally completely free of any evidence or substantiation whatsoever. In my view, arguments for short timelines contain evidence, but that doesn't mean that their conclusions are correct.

