
Summary: Patching all exploits in open-source software that forms the backbone of the internet would be hard on maintainers, less effective than thought, and expensive (Fermi estimate included, 5%/50%/95% cost ~$31 mio./~$1.9 bio./~$168 bio.). It's unclear who'll be willing to pay that.

Preventative measures discussed for averting an AI takeover attempt include hardening the world's software infrastructure against attacks. The plan is to use lab-internal (specialized?) software-engineering AI systems to submit patches fixing all findable security vulnerabilities in open-source software (think a vastly expanded and automated version of Project Zero), likely in partnership with companies developing internet-critical software (the likes of Cisco & Huawei).

I think that plan is net-positive. I also think it has some pretty glaring open problems (in ascending order of exigency): (1) maintainer overload and response times, (2) hybrid hardware/software vulnerabilities, and (3) cost as a public good (also known as "who's gonna pay for it?").

Maintainer Overload

If transformative AI is developed soon, most open-source projects (especially old ones relevant to internet infrastructure) will still be maintained by humans with human response times. That will significantly lengthen the time it takes for security patches to be reviewed and merged into existing codebases, especially if attackers are by then submitting subtle exploits generated or co-developed with AI systems only six to nine months behind the leading capabilities, which will keep maintainers especially vigilant and reviews even slower.

Hybrid and Hardware Vulnerabilities

My impression is that vulnerabilities are moving from software-only issues towards very low-level microcode or software/hardware hybrid vulnerabilities (e.g. Hertzbleed, Spectre, Meltdown, Rowhammer, Microarchitectural Data Sampling, …), for which software fixes, if they exist at all, carry pretty bad performance penalties. GPU-level vulnerabilities get less attention, but they absolutely exist, e.g. LeftoverLocals and JellyFish. My best guess is that cutting-edge GPUs are much less secure than CPUs, since they've received less attention from security researchers and their documentation is less easily accessible. (They probably carry less cruft from bad design choices made early in computing history.) Hence: software-only vulnerabilities are easy to fix, software/hardware hybrid ones are more painful to fix, and pure hardware vulnerabilities escape quick fixes entirely (in the extreme demanding a recall, as with the Pentium FDIV bug). And don't get me started on the vulnerabilities lurking in human psychology, which are basically impossible to fix on short time-scales…

Who Pays?

Finding vulnerabilities in all the relevant security infrastructure of the internet and fixing them might be expensive. 1 mio. input tokens for Gemini 2.0 Flash cost $0.15, and 1 mio. output tokens cost $0.60—but a model able to find & fix security vulnerabilities is going to be more expensive. An AI-generated, me-adjusted Squiggle model estimates that it'd cost (median estimate) ~$1.9 bio. to fix most vulnerabilities in open-source software (90% confidence interval: ~$31 mio. to ~$168 bio.; the mean estimated cost is… gulp… ~$140 bio.).

(I think the analysis under-estimates the cost because it doesn't consider setting up the project, paying human supervisors and reviewers, costs for testing infrastructure & compute, finding complicated vulnerabilities that arise from the interaction of different programs…).
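To make the structure of the estimate concrete, here is a minimal Monte Carlo sketch of the token-cost part only (ignoring the additional costs just listed). The three parameter ranges are illustrative placeholders, not the values from the Squiggle model above; plug in your own to get your own distribution.

```python
# Minimal Monte Carlo sketch of the *shape* of such a Fermi estimate.
# The parameter ranges below are illustrative placeholders, not the values
# from the Squiggle model linked in the post.
import numpy as np

rng = np.random.default_rng(0)
N = 200_000

def lognormal_90ci(low, high, size):
    """Lognormal samples whose 5th/95th percentiles are approximately low/high."""
    mu = (np.log(low) + np.log(high)) / 2
    sigma = (np.log(high) - np.log(low)) / (2 * 1.645)
    return rng.lognormal(mu, sigma, size)

# Assumed inputs (hypothetical 90% confidence intervals):
lines_of_code = lognormal_90ci(2e9, 2e10, N)    # security-relevant open-source LOC
tokens_per_line = lognormal_90ci(1e3, 1e5, N)   # tokens per line, incl. context, retries, agent loops
usd_per_m_tokens = lognormal_90ci(2, 50, N)     # blended price for a vulnerability-finding model

cost_usd = lines_of_code * tokens_per_line * usd_per_m_tokens / 1e6

for q in (5, 50, 95):
    print(f"{q:>2}th percentile: ${np.percentile(cost_usd, q):,.0f}")
print(f"      mean: ${cost_usd.mean():,.0f}")
```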

It was notable when Google paid $600k for open-source fuzzing, so >~$1.9 bio. is going to be… hefty. The discussion of this has been pretty far-mode ("surely somebody is going to do that when it's 'so easy'"), with fewer remarks about the expense and who'll carry the burden. For comparison, the six-year budget for Horizon Europe (which funds, as a tiny part of its portfolio, open-source projects like PeerTube and the Eclipse Foundation) is 93.5 bio. €, and the EU Next Generation Internet programme has spent 250 mio. € (2018–2020) + 62 mio. € (2021–2022) + 27 mio. € (2023–2025) ≈ 339 mio. € on funding open-source software.

Another consideration is that this project would need to be finished quickly—potentially in less than a year, as open-weights models catch up and frontier models become more dangerous. So humanity will not be able to wait for frontier models to become cheaper in order to make the project less expensive—as soon as automated vulnerability finding becomes feasible, both attackers and defenders will be in a race to exploit it.

So, a proposal: Whenever someone claims that LLMs will d/acc us out of AI takeover by fixing our infrastructure, they will also have to specify who will pay the costs of setting up this project and running it.

Comments



Remark in the Guaranteed Safe AI newsletter

Niplav writes

So, a proposal: Whenever someone claims that LLMs will d/acc us out of AI takeover by fixing our infrastructure, they will also have to specify who will pay the costs of setting up this project and running it.

I'm almost centrally the guy claiming LLMs will d/acc us out of AI takeover by fixing infrastructure; technically I'm usually hedging more than that, but it's accurate in spirit.

If transformative AI is developed soon, most open-source projects (especially old ones relevant to internet infrastructure) will still be maintained by humans with human response times. That will significantly lengthen the time it takes for security patches to be reviewed and merged into existing codebases, especially if attackers are by then submitting subtle exploits generated or co-developed with AI systems only six to nine months behind the leading capabilities, which will keep maintainers especially vigilant and reviews even slower.

I usually say we prove the patches correct! But Niplav is correct: it's a hard social problem, and many critical-systems maintainers are particularly slop-phobic and won't want synthetic code checked in. That's why I try to emphasize that the two trust points are the spec and the checker, and the rest is relinquished to a shoggoth. That's the vision, anyway: we solve this social problem by involving the slop-phobic maintainers in writing the spec and conveying to them how trustworthy the deductive process is.

Niplav's Squiggle model: median ~$1b worth of tokens, plus all the "setting up the project, paying human supervisors and reviewers, costs for testing infrastructure & compute, finding complicated vulnerabilities that arise from the interaction of different programs…" etc. costs. I think a lot is in our action space to reduce those latter costs, but the token cost imposes a firm lower bound.

But this is an EA Forum post, meaning the project is being evaluated as an EA cause area: is it cost-effective? To be cost-effective, the savings from alleviating some disvalue have to be worth the money you'll spend. As a programming-best-practices chauvinist, one of my pastimes is picking on CrowdStrike, so let's not pass up the opportunity. The 2024 outage is estimated to have cost about $5b across the top 500 companies, excluding Microsoft. A public-goods project may not have been able to avert CrowdStrike, but it's instructive for getting a flavor of the damage, and this number suggests it could easily be worth spending around Niplav's estimate.

On cost-effectiveness, though, even I (who work on this "LLMs driving Hot FV Summer" thing full time) am skeptical, only because open-source software is pretty hardened already. Curl/libcurl saw 23 CVEs in 2023 and 18 in 2024, which it'd be nice to prevent but really isn't a catastrophic amount. Other projects are similar. I think a lot about the Tony Hoare quote "It has turned out that the world just does not suffer significantly from the kind of problem that our research was originally intended to solve." Not every bug is even an exploit.

While this is a good argument against it indicating governance-by-default (if people are indeed saying that), securing longtermist funding to work with the free-software community on this (thus overcoming two of the three hurdles) still seems like a potentially very cost-effective way of reducing AI risk that is worth looking into, particularly when combined with differential technological development of AI defensive vs. offensive capacities.

That's maybe a more productive way of looking at it! Makes me glad I estimated more than I claimed.

I think governments are probably the best candidate for funding this, or AI companies in cooperation with governments. And it's an intervention which has limited downside and is easy to scale up/down, with the most important software being evaluated first.

Not draft amnesty but I'll take it. Yell at me below to get my justification for the variable-values in the Fermi estimate.
