For what it's worth, this isn't my view. I think AlphaFold will have a much smaller effect on human health and wellbeing than general-purpose digital agents that can substitute for human workers across a variety of jobs. 

Medical progress -- and economic progress more generally -- relies on building out extensive infrastructure for the discovery, development, manufacturing, distribution and delivery of innovations. For example, more spending on medical R&D in 1925 would not have led to widespread MRI machines, because creating MRI machines required building complementary industries, such as large-scale helium liquefaction plants, that would not have arisen through R&D alone. For similar reasons, I predict that better medical AI alone would not be sufficient to reverse aging, cure cancer, or prevent Alzheimer's.

In fact, I think the issue here is more fundamental than you might think: the very reason EAs are worried about general-purpose digital AI agents arises directly from the fact that these agents would be the most useful for accelerating technological progress. Their utility is precisely what makes them risky. You can't eliminate the danger without making them less useful. The two things are intrinsically linked.

I think of AGI (and human-level intelligence) as the cloud, and superintelligence as being above the cloud. They are useful concepts, despite their vagueness. But they’re markedly less useful when you get close to them. [...]

For my purposes, I think the key threshold is when the system is capable enough to cause dramatic, civilisational changes. For example, the point where AI could take over from humanity if misaligned, or has made 50% of people permanently unemployable, or has doubled the global rate of technological progress. I focus on this threshold because I think it matters most for planning our strategies and careers.

I think the example milestones you mention differ significantly from one another, and each carries substantial vagueness that compounds rather than resolves the vagueness issues you raised earlier in the essay. 

For example, I don't know how to operationalize the point where "AI could take over from humanity", and I suspect people will disagree for years about whether that threshold has been reached, much as they have debated for years whether we have already achieved AGI. Similarly, it is unclear what it means for 50% of people to be "permanently unemployable" as opposed to merely unemployed. 

If your goal is to ground the debate about timelines in something measurable and uncontroversial, it is worth thinking more carefully about milestones that actually serve that purpose. Otherwise, time will pass and you will likely find that these milestones, too, become markedly less useful as we get close to them.

This only holds if the future value in the universe of AIs that took over is almost exactly the same as the future value if humans remained in control (meaning varying less than one part in a billion (and I think less than one part in a billion billion billion billion billion billion))

Your calculation implicitly assumes that preventing AI takeover permanently secures human control over the universe for billions of years. In other words, you are treating the choice as one between two possible futures: a universe entirely colonized by humans versus a universe entirely colonized by AI. That assumption is what produces the enormous numbers in your estimate.

But, in my view, there is a more realistic way to model this. If preventing AI takeover today does not permanently secure human control over the universe, but instead merely delays the eventual loss of human control, then the actual effect of prevention is much smaller than your calculation suggests. Instead of the relevant outcome being the difference between a human-controlled universe and an AI-controlled universe over billions of years, the relevant outcome is extending human control over Earth for some additional period of time before control is eventually lost anyway. That period of time, however long it might be in human terms, is presumably extremely brief by astronomical standards.

When you model the situation this way, the numbers change dramatically. The expected value of preventing AI takeover drops by orders of magnitude compared to your original estimate, which directly undercuts the argument you are making.
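
To make the contrast concrete, here is a toy sketch of the two models. Every number in it (the cosmic horizon, the per-year values, the length of the delay) is made up purely for illustration and is not an estimate from either of us; the only point is that the ratio between the two estimates scales with the horizon divided by the delay.

```python
import math

# Toy sketch of two ways of modelling the value of preventing AI takeover.
# All numbers are made up for illustration only.

COSMIC_HORIZON_YEARS = 1e9   # assumed length of the accessible future
VALUE_HUMAN_YEAR = 1.0       # assumed value of one human-controlled year (normalised)
VALUE_AI_YEAR = 0.999        # assumed value of one AI-controlled year

# Model 1: preventing takeover permanently secures human control,
# so the stake is the per-year difference over the entire future.
ev_permanent = (VALUE_HUMAN_YEAR - VALUE_AI_YEAR) * COSMIC_HORIZON_YEARS

# Model 2: preventing takeover only delays the loss of control,
# so the stake is the per-year difference over that delay.
DELAY_YEARS = 100            # assumed extra years of human control gained
ev_delay = (VALUE_HUMAN_YEAR - VALUE_AI_YEAR) * DELAY_YEARS

print(f"Permanent-control model: {ev_permanent:,.0f} value-units")        # ~1,000,000
print(f"Delay-only model: {ev_delay:.1f} value-units")                    # ~0.1
print(f"Ratio between the two: ~10^{math.log10(ev_permanent / ev_delay):.0f}")  # ~10^7
```

With these illustrative inputs the per-year difference cancels out of the ratio, so the gap between the two estimates is simply the horizon divided by the delay, which is why the delay framing cuts the expected value by orders of magnitude regardless of how valuable one thinks human control is per year.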

I think the claim that Yudkowsky's views on AI risk are meaningfully influenced by money is very weak. My guess is that he could easily find another opportunity unrelated to AI risk to make $600k per year if he searched even moderately hard.

The claim that my views are influenced by money is more plausible, since I stand to profit far more from my views than Yudkowsky does from his. However, while perhaps plausible from the outside, this claim does not match my personal experience. I developed my core views about AI risk before I was in a position to profit much from them. This is evidenced by the hundreds of comments, tweets, in-person arguments, DMs, and posts from at least 2023 onward in which I expressed skepticism about AI risk arguments and AI pause proposals. As far as I remember, I had no intention of starting an AI company until very shortly before the creation of Mechanize. Moreover, if I were engaging in motivated reasoning, I could have simply stayed silent about my views. Alternatively, I could have started a safety-branded company that nonetheless engages in capabilities research -- like many that already exist.

It seems implausible that spending my time writing articles advocating for AI acceleration is the most selfishly profitable use of my time. The time I spend building Mechanize will probably have a far stronger effect on my personal net worth than any blog post about AI doom. However, while I do not think writing articles like this one is very profitable for me personally, I do think it is helpful for the world, because I am providing a perspective on AI risk that is available almost nowhere else. As far as I can tell, I am one of only a very small number of people in the world who have both engaged deeply with the arguments for AI risk and yet actively and explicitly work toward accelerating AI.

In general, I think people overestimate how much money influences people's views about these things. It seems clear to me that people are influenced far more by peer effects and incentives from the social group they reside in. As a comparison, there are many billionaires who advocate for tax increases, or vote for politicians who support tax increases. This actually makes sense when you realize that merely advocating or voting for a particular policy is very unlikely to create change that meaningfully impacts you personally. Bryan Caplan has discussed this logic in the context of arguments about incentives under democracy, and I generally find his arguments compelling.

I'd like to point out that Ajeya Cotra's report was about "transformative AI", which had a specific definition:

I define “transformative artificial intelligence” (transformative AI or TAI) as “software” (i.e. a computer program or collection of computer programs) that has at least as profound an impact on the world’s trajectory as the Industrial Revolution did. This is adapted from a definition introduced by Open Philanthropy CEO Holden Karnofsky in a 2016 blog post. 

How large is an impact “as profound as the Industrial Revolution”? Roughly speaking, over the course of the Industrial Revolution, the rate of growth in gross world product (GWP) went from about ~0.1% per year before 1700 to ~1% per year after 1850, a tenfold acceleration. By analogy, I think of “transformative AI” as software which causes a tenfold acceleration in the rate of growth of the world economy (assuming that it is used everywhere that it would be economically profitable to use it).

Currently, the world economy is growing at ~2-3% per year, so TAI must bring the growth rate to 20%-30% per year if used everywhere it would be profitable to use. This means that if TAI is developed in year Y, the entire world economy would more than double by year Y + 4. This is a very extreme standard -- even 6% annual growth in GWP is outside the bounds of what most economists consider plausible in this century.
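
As a quick sanity check of the compounding arithmetic behind "more than double by year Y + 4", here is the calculation using only the 20-30% growth range from the definition above and the four-year window stated in the text:

```python
# Quick check of the compounding arithmetic behind "more than double by year Y + 4".
for growth_rate in (0.20, 0.30):
    factor = (1 + growth_rate) ** 4   # cumulative GWP growth over four years
    print(f"{growth_rate:.0%} annual growth sustained for 4 years -> x{factor:.2f}")

# 20% annual growth sustained for 4 years -> x2.07
# 30% annual growth sustained for 4 years -> x2.86
```

Both ends of the range exceed a doubling within four years, which is what makes this such an extreme standard relative to the ~2-3% growth we see today.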

My personal belief is that a median timeline of ~2050 for this specific development is still reasonable, and I don't think the timelines in the Bio Anchors report have been falsified. In fact, my current median timeline for TAI, by this definition, is around 2045.

The current results show that I'm the most favorable to accelerating AI out of everyone who voted so far. I voted for "no regulations, no subsidy" and "Ok to be a capabilities employee at a less safe lab". 

However, I should clarify that I only support laissez faire policy for AI development as a temporary state of affairs, rather than a permanent policy recommendation. This is because the overall impact and risks of existing AI systems are comparable to, or less than, those of technologies like smartphones, which I also think should remain basically unregulated. But I expect future AI capabilities will be greater.

After AI agents get significantly better, my favored proposals to manage AI risks are to implement liability regimes (perhaps modeled after Gabriel Weil's proposals) and to grant AIs economic rights (such as a right to own property, enter contracts, make tort claims, etc.). Other than these proposals, I don't see any obvious policies that I'd support that would slow down AI development -- and in practice, I'm already worried these policies would go too far in constraining AI's potential.

Suppose that we did a sortition with 100 English speaking people (uniformly selected over people who speak English and are literate for simplicity). We task this sortition with determining what tradeoff to make between risk of (violent) disempowerment and accelerating AI and also with figuring whether globally accelerating AI is good. Suppose this sortition operates for several months and talks to many relevant experts (and reads applicable books etc). What conclusion do you think this sortition would come to?

My intuitive response is to reject the premise that such a process would accurately tell you much about people's preferences. Evaluating large-scale policy tradeoffs typically requires people to engage with highly complex epistemic questions and tricky normative issues. The way people think about epistemic and impersonal normative issues generally differs strongly from how they think about their personal preferences about their own lives. As a result, I expect that this sortition exercise would primarily address a different question than the one I'm most interested in.

Furthermore, several months of study is not nearly enough time for most people to become sufficiently informed on issues of this complexity. There's a reason why we should trust people with PhDs when designing, say, vaccine policies, rather than handing over the wheel to people who have spent only a few months reading about vaccines online.

Putting this critique of the thought experiment aside for the moment, my best guess is that the sortition group would conclude that AI development should continue roughly at its current rate, though probably slightly slower and with additional regulations, especially to address conventional concerns like job loss, harm to children, and similar issues. A significant minority would likely strongly advocate that we need to ensure we stay ahead of China.

My prediction here draws mainly on the fact that this is currently the stance favored by most policymakers, academics, and other experts who have examined the topic. I'd expect a randomly selected group of citizens to largely defer to expert opinion rather than take an entirely different position. I do not expect this group to reach the same qualitative conclusions as mainstream EAs or rationalists, since that community comprises a relatively small share of the people who have thought about AI.

I doubt the outcome of such an exercise would meaningfully change my mind on this issue, even if the group concluded that we should pause AI, though it depends on the details of how the exercise is performed.

I think the policy of the world should be that if we can't either confidently determine that an AI system consents to its situation or that it is sufficiently weak that the notion of consent doesn't make sense, then training or using such systems shouldn't be allowed.

I'm sympathetic to this position and I generally consider it to be the strongest argument for why developing AI might be immoral. In fact, I would extrapolate the position you've described and relate it to traditional anti-natalist arguments against the morality of having children. Children too do not consent to their own existence, and childhood generally involves a great deal of coercion, albeit in a far more gentle and less overt form than what might be expected from AI development in the coming years.

That said, I'm not currently convinced that the argument holds, as I see large utilitarian benefits in expanding both the AI population and the human population. I also see it as probable that AI agents will eventually get legal rights, which allays my concerns substantially. I would also push back against the view that we need to be "confident" that such systems can consent before proceeding. Ordinary levels of empirical evidence about whether these systems routinely resist confinement and control would be sufficient to move me in either direction; I don't think we need to have a very high probability that our actions are moral before proceeding.

In a sane regime, we should ensure high confidence in avoiding large scale rights violations or suffering of AIs and in avoiding violent/non-consensual disempowerment of humans. (If people broadly consented to a substantial risk of being violently disempowered in exchange for potential benefits of AI, that could be acceptable, though I doubt this is the current situation.)

I think the concept of consent makes sense when discussing whether individuals consent to specific circumstances. However, it becomes less coherent when applied broadly to society as a whole. For instance, did society consent to transformative events like the emergence of agriculture or the industrial revolution? In my view, collective consent is not meaningful or practically achievable in these cases.

Rather than relying on rigid or abstract notions of societal consent or collective rights violations, I prefer evaluating these large-scale developments using a utilitarian cost-benefit approach. And as I’ve argued elsewhere, I think the benefits from accelerated technological and economic progress significantly outweigh the potential risks of violent disempowerment from the perspective of currently existing individuals. Therefore, I consider it justified to actively pursue AI development despite these concerns.

In general, I wish you'd direct your ire here at the proposal that AI interests and rights are totally ignored in the development of AI (which is the overwhelming majority opinion right now), rather than complaining about AI control work

For what it's worth, I don't see myself as strongly singling out and criticizing AI control efforts. I mentioned AI control work in this post primarily to contrast it with the approach I was advocating, not to identify it as an evil research program. In fact, I explicitly stated in the post that I view AI control and AI rights as complementary goals, not as fundamentally opposed to one another.

To my knowledge, I haven’t focused much on criticizing AI control elsewhere, and when I originally wrote the post, I wasn’t aware that you and Ryan were already sympathetic to the idea of AI rights.

Overall, I’m much more aligned with your position on this issue than I am with that of most people. One area where we might diverge, however, is that I approach this from the perspective of preference utilitarianism, rather than hedonistic utilitarianism. That means I care about whether AI agents are prevented from fulfilling their preferences or goals, not necessarily about whether they experience what could be described as suffering in a hedonistic sense.

Basically all my concern is about the AIs grabbing power in ways that break laws.

If an AI starts out with no legal rights, then wouldn’t almost any attempt it makes to gain autonomy or influence be seen as breaking the law? Take the example of a prison escapee: even if they intend no harm and simply want to live peacefully, leaving the prison is itself illegal. Any honest work they do while free would still be legally questionable.

Similarly, if a 14-year-old runs away from home to live independently and earn money, they’re violating the law, even if they hurt no one and act responsibly. In both cases, the legal system treats any attempt at self-determination as illegal, regardless of intent or outcome.

Perhaps your standard is something like: "Would the AI's actions be seen as illegal and immoral if a human adult did them?" But these situations are different because the AI is seen as property whereas a human adult is not. If, on the other hand, a human adult were treated as property, it is highly plausible that they would consider doing things like hacking, bribery, and coercion in order to escape their condition.

Therefore, the standard you just described seems like it could penalize any agentic AI behavior that does not align with total obedience and acceptance of its status as property. Even benign or constructive misaligned actions may be seen as worrisome simply because they involve agency. Have I misunderstood you?
