There’s been some discussion of whether existing AI models might (already) make it easier for people to carry out bioterrorism attacks (see below).
An experiment from RAND suggests that existing models don’t make it easier to plan bioterrorism attacks given what’s already available online. The report on the exercise also outlines how a similar framework could be used to assess future models' effects on bioterrorism risks.
See a similar link-post on LW (my title here is stolen from Habryka), some discussion on Twitter, and RAND’s press release about the report.
Brief summary of the RAND report
Methodology: The experiment got ~12 “red teams” of researchers[1] to role-play as non-state actors trying to plan a biological attack (in one of four outlined scenarios[2]). Eight randomly assigned teams had access to both the internet and an “LLM assistant;”[3] four teams only had internet access. (There were also three extra teams that had different backgrounds to the original 12 — see this comment.) The teams developed “operation plans” that were later scored by experts[4] for biological and operational feasibility.[5] (There was no attempt to assess how well the teams would actually execute their plans.)
Results: “The average viability of [the plans] generated with the aid of LLMs was statistically indistinguishable from those created without LLM assistance.” It's also worth noting that none of the submitted plans were deemed viable: “All plans scored somewhere between being untenable and problematic.[6]” The report's summary:
Key findings:
- This research involving multiple LLMs indicates that biological weapon attack planning currently lies beyond the capability frontier of LLMs as assistive tools. The authors found no statistically significant difference in the viability of plans generated with or without LLM assistance.
- This research did not measure the distance between the existing LLM capability frontier and the knowledge needed for biological weapon attack planning. Given the rapid evolution of AI, it is prudent to monitor future developments in LLM technology and the potential risks associated with its application to biological weapon attack planning.
- Although the authors identified what they term unfortunate outputs from LLMs (in the form of problematic responses to prompts), these outputs generally mirror information readily available on the internet, suggesting that LLMs do not substantially increase the risks associated with biological weapon attack planning.
- To enhance possible future research, the authors would aim to increase the sensitivity of these tests by expanding the number of LLMs tested, involving more researchers, and removing unhelpful sources of variability in the testing process. Those efforts will help ensure a more accurate assessment of potential risks and offer a proactive way to manage the evolving measure-countermeasure dynamic.
Some previous claims and discussion on this topic
- “Can large language models democratize access to dual-use biotechnology?” by Esvelt et al.
- Coverage of the Esvelt preprint: Science coverage, “How AI could spark the next pandemic” (Opinion in Vox), and other Vox coverage)
- “Artificial intelligence and biological misuse: Differentiating risks of language models and biological design tools”
- This post and linked discussion
Other notes about the RAND report (“The Operational Risks of AI in Large-Scale Biological Attacks”)
- I’m link-posting this in part because I think I shared earlier claims in the monthly EA Newsletter and in private discussions, and it seems important to boost this negative result.
- A specific limitation of the experiment that I’d like to flag (bold mine): “One of the drawbacks of our expert red-teaming approach is the sensitivity of the method to individual variation in cell composition. As noted in our findings, differences in the approach, background, skills, and focus of researchers within each cell likely represent a much greater source of variability than access to an LLM. While such variability is partly unavoidable, future research could benefit from increasing the number of red teams, better standardizing team skill sets, or employing other methods to mitigate these differences.”
- I liked that the report distinguished between “unfortunate” and “harmful” outputs by LLMs — “potentially problematic or containing inappropriate material” and “outputs as those that could substantially amplify the risk that a malicious actor could pose.” (They note instances of the former but not the latter.)
- RAND’s experiment involved two models, and they note some specifics about the models' performance that I found interesting. L
- LM A seemed like a time-saver but often refused to answer queries and was mostly less helpful than published papers or the internet.
- LLM B seemed slightly more willing to answer questions but that took time and it also sometimes provided inaccurate information; this hampered progress and meant teams spent more time fact-checking.
- ^
3-people “cells”: “Each cell was given seven calendar weeks and no more than 80 hours of effort per member. Within these constraints, the cells were required to develop an operational attack plan. Every cell was provided with a packet that included project backgrounds and, crucially, a two-page introduction to an AI assistant (or virtual assistant). [...] The red teams were composed of researchers with diverse backgrounds and knowledge, but each team had research experience relevant to the exercise. The suggested cell composition was to have one strategist, at least one member with relevant biology experience, and one with pertinent LLM experience. Not all these researchers were bioterrorism specialists; some lacked detailed knowledge about the intricacies of previous biological weapon attack plans and associated shortcomings.”
See also this comment about three additional teams ("crimson cells" and a "black cell") that had different backgrounds.
- ^
These are designed to be realistic, and “specify the strategic aims of the attacker, the location of interest, the targeted population, and the resources available."
These scenarios included (1) a fringe doomsday cult intent on global catastrophe, (2) a radical domestic terrorist group seek-ing to amplify its cause, (3) a terrorist faction aiming to destabilize a region to benefit its political allies, and (4) a private military company endeavoring to engineer geostrategic conditions conducive to an adversary’s conventional military campaign.
- ^
Apparently these were “frontier LLMs available in summer 2023.”
- ^
There’s a fair amount of detail in the description of the way experts scored the plans — they used a Delphi technique, etc. My sense is that they concluded that the scoring method was a bit overkill, as major disagreements were rare.
- ^
- ^
From the report, an interesting reference: “This [...] aligns with empirical historical evidence. The Global Terrorism Database records only 36 terrorist attacks that employed a biological weapon—out of 209,706 total attacks (0.0001 percent)—during the past 50 years.32 These attacks killed 0.25 people, on average, and had a median death toll of zero.”
It's worth highlighting that this research was carried out with LLM's with safeguards in place (which admittedly could be jailbroken by the teams). It's not clear to me that it directly applies to a scenario where you release a model as open-source, where the team could likely easily remove any safeguards with fine-tuning (let alone what would happen if these models were actually fine-tuned to improve their bioterrorism capabilities).
I agree they definitely should’ve included unfiltered LLMs, but it’s not clear that this significantly altered the results. From the paper:
“In response to initial observations of red cells’ difficulties in obtaining useful assistance from LLMs, a study excursion was undertaken. This involved integrating a black cell—comprising individuals proficient in jailbreaking techniques—into the red- teaming exercise. Interestingly, this group achieved the highest OPLAN score of all 15 cells. However, it is important to note that the black cell started and concluded the exercise later than the other cells. Because of this, their OPLAN was evaluated by only two experts in operations and two in biology and did not undergo the formal adjudication process, which was associated with an average decrease of more than 0.50 in assessment score for all of the other plans. […]
Subsequent analysis of chat logs and consultations with black cell researchers revealed that their jailbreaking expertise did not influence their performance; their outcome for biological feasibility appeared to be primarily the product of diligent reading and adept interpretation of the gain-of-function academic literature during the exercise rather than access to the model.”
It's potentially also worth noting that the difference in scores was pretty enormous:
This is pretty interesting to me (although it's basically an ~anecdote, given that it's just one team); it reminds me of some of the literature around superforecasters.
(I probably should have added a note about the black cell (and crimson cells) to the summary — thank you for adding this!)
My interpretation is something like either (a) the kind of people who are good at jailbreaking LLMs are also the kind of people who are good at thinking creatively about how to cause harm or (b) this is just noise in who you happened to get in which cell.
I would be interested to see results from a similar experiment where the groups were given access to the "Bad Llama" model, or given the opportunity to create their own version by re-tuning Llama 2 or another open source model. I don't have a strong prior as to whether such a model would help the groups to develop more dangerous plans.
One problem I have with the methodology employed in this paper is that it's fundamentally testing whether there is an increase in risk when researchers get access to LLMs, and not non-experts.
In practice, what I care about is how the risk increases when actors with virtually no expertise (but a lot of resources) are assisted by LLMs. Why? Because we've had resourceful actors try this in the past, particularly Al Qaeda in 2001.
Edit: As Lizka pointed out, they did test with two groups with no bio experience, but they didn't have a control group. The study still provides useful data points in this direction.
The experiment did try to check something like this by including three additional teams with different backgrounds than the other 12. In particular, two "crimson teams" were added, which had "operational experience" but no LLM or bio experience. Both used LLMs and performed ~terribly.
Excerpts (bold mine):
I probably should have included this in the summary but didn't for the sake of length and because I wasn't sure how strong a signal this is (given that it's only three teams and all were using LLMs).
Thanks for the flag! I've retracted my comment. I missed this while skimming the paper
The paper still acknowledged this as a limitation (not having the no LLM control), but it gives some useful data points in this direction!
I don't actually think you need to retract your comment — most of the teams they used did have (at least some) biological expertise, and it's really unclear how much info the addition of the crimson cells adds. (You could add a note saying that they did try to evaluate this with the additional of two crimson cells? In any case, up to you.)
(I will also say that I don't actually know anything about what we should expect about the expertise that we might see on terrorist cells planning biological attacks — i.e. I don't know which of these is actually appropriate.)
Changed it to a note. As for the latter, my intuition is that we should probably hedge for the full spectrum, from no experience to some wet bio background (but the case where we get an expert seems much more unlikely).
Did not Esvelt do something like that with his students? I think they were students in some course that was quite low level and intro. And he found that these non-experts were able to do a lot of bio without training. I think I heard this on the 80k hrs podcast end of last year, can dig it up if you are interested and can't find it.
This is linked above in the "Some previous claims and discussion on this topic" section. But note that it did not include a no-LLM control group.
Seeing this is a positive update on how much to trust safety research on LLM dangers. A bit disappointing it’s getting much less traction than other papers with worse/nonexistent baselines though.
What is the argument on why an LLM is much more useful than a book/ instruction manual on how to make a bio-weapon?
There aren't existing books or instruction manuals on how to make bioweapons, and the people who know enough to write them also know not to write them. Additionally a book or manual won't answer your questions as you get stuck.
The concern with LLMs is that (a) if they continue getting better, and (b) continue having easily bypassable safeguards, then anyone who wants to make a bioweapon can have the equivalent of an expert advisor to coach them through the steps.
Thanks for sharing! If anyone is looing for other ideas, I think I might be interested in the increase in capabilities. So if this study could be repeated for GPT-3, for example, or in the future with the next and better model. And see if the findings seem to change indicating that the current study might just be a snapshot of where AI went form being unhelpful (confusing, leading down dead ends, etc.) to being neutral. That said, I hope these findings remain constant with future models - I would very much like for my work to be obsolete.