
What kinds of biosafety risks are associated with biotechnology research?

Biosafety risks associated with biotechnology research vary widely in severity, with plausible outcomes ranging from a mild infection in a single lab worker to a deadly global pandemic leaving hundreds of millions of people dead (Lipsitch and Inglesby 2014). While the frequency of dangerous accidents in biological research is not well established, serious accidents have clearly happened. A recent paper by David Manheim and Gregory Lewis documented “71 incidents involving either accidental or purposeful exposure to, or infection by, a highly infectious pathogenic agent” between 1975 and 2016, the large majority of them accidental (Manheim and Lewis 2022). This is very likely a substantial underestimate, as the worst-managed facilities are probably the least likely to report accidents. Moreover, many scientists believe that the 1977 influenza pandemic was not of natural origin: the virus was nearly identical to a strain that circulated in the late 1950s, and the majority of people it sickened were too young to have been exposed to that earlier strain (Rozo and Gronvall 2015).

The NEIDL Lab, a BSL-4 (highest biosafety level) research facility

What is Dual-Use Research of Concern (DURC)?

The WHO defines Dual-Use Research of Concern (DURC) as “research that is intended to provide a clear benefit, but which could easily be misapplied to do harm... It encompasses everything from information to specific products that have the potential to create negative consequences for health and safety, agriculture, the environment or national security” (WHO 2020). For example, gain-of-function research, which deliberately modifies an existing pathogen for greater transmissibility or severity, can be useful for developing better vaccines and therapeutics, but it also raises the risk of a serious laboratory accident or of intentional misuse of the pathogen. Similarly, when pharmaceutical researchers develop more efficient aerosolization techniques for delivering drugs deep into the lungs of asthma patients, those advances could be misused to enhance the power of biological weapons. DURC presents substantial biosecurity challenges, as it is not always evident in advance whether the benefits of dual-use research outweigh the potential risks. Dual-use research is also particularly susceptible to the unilateralist’s curse: when many scientists each decide independently (rather than collectively) whether to take a controversial action, the likelihood that at least one of them takes it rises substantially (Bostrom, Douglas, and Sandberg 2016). This raises the risk that dangerous biological research will be completed and published.
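
To see why unilateral decision-making matters, here is a minimal sketch of the unilateralist's curse. It assumes, purely for illustration, that each research group independently misjudges the risk with some small probability p; the specific numbers are hypothetical, not estimates from the literature.

```python
# A toy illustration of the unilateralist's curse (hypothetical numbers, not
# estimates from the literature). Assume each of n research groups decides
# independently whether to run a risky experiment, and each wrongly concludes
# it is worth running with probability p. Under unilateral decision-making,
# the experiment happens if ANY single group decides to proceed.

def prob_someone_proceeds(n: int, p: float) -> float:
    """Probability that at least one of n independent groups proceeds."""
    return 1 - (1 - p) ** n

for n in (1, 5, 20):
    print(f"{n:>2} groups, p = 0.05 each -> "
          f"{prob_someone_proceeds(n, 0.05):.0%} chance the research goes ahead")
# Output: 1 group -> 5%, 5 groups -> 23%, 20 groups -> 64%
```

Even when each individual group is fairly cautious, the chance that someone, somewhere, proceeds grows rapidly with the number of groups able to act on their own judgment.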

Controversial dual-use research on H5N1 Influenza and Horsepox

One of the most famous examples of dual-use research of concern is the H5N1 avian influenza gain-of-function research conducted by Dr. Ron Fouchier of the Netherlands and Dr. Yoshihiro Kawaoka of the United States in the late 2000s and early 2010s. H5N1 is extremely transmissible and often lethal among birds. It only rarely infects humans, typically those in close contact with live poultry, and it does not spread easily from person to person, but it has an extremely high reported mortality rate of roughly 60% among confirmed human cases. If the virus were to develop the ability to transmit efficiently between humans while maintaining a high human mortality rate, it could launch a catastrophic pandemic. Drs. Fouchier and Kawaoka conducted gain-of-function research that demonstrated how to achieve efficient mammal-to-mammal transmission of H5N1, ostensibly to better anticipate how a natural pandemic of human H5N1 avian influenza might emerge (Tu 2012).

In December 2011, the US National Science Advisory Board for Biosecurity (NSABB) initially recommended that Nature and Science, two leading scientific journals, withhold the publication of critical methodologies, to which Drs. Fouchier and Kawaoka initially agreed. In February 2012, the WHO convened a panel of virologists and one bioethicist, which recommended full publication of the research, citing its importance for preparing for a natural H5N1 pandemic. Subsequently, in late February 2012, the US Department of Health and Human Services (HHS) asked the NSABB to reconvene; upon doing so in March 2012, the NSABB changed course and recommended full publication over the objections of one dissenting board member, a recommendation that HHS accepted. Dr. Kawaoka's and Dr. Fouchier's papers were eventually published in Nature and Science, respectively, although the publication of Dr. Fouchier's research was temporarily blocked by Dutch export controls (which cover intellectual material) until he applied for and received an export license (Tu 2012).

One can identify three primary potential benefits of conducting this kind of gain-of-function research. First, it might be useful for predicting when spillover events that could cause a natural pandemic will occur. Second, it might be useful for laying the groundwork to quickly develop medical countermeasures against future natural pandemics. Third, it might be useful for drawing attention to a ticking time bomb, which may be needed to spur policymakers to devote resources and attention to natural pandemic prevention and preparedness.

On the other hand, this kind of gain-of-function research presents at least three major risks. First, it introduces a direct risk of accidentally infecting lab workers, which in the worst-case scenario could spark a new pandemic. Second, the research generates information hazards, as it attracts attention to and disseminates new techniques for creating even more dangerous pathogens that could be deliberately misused by a bad actor. Third, the publication of this kind of research in prestigious journals such as Science and Nature incentivizes other researchers to take potentially dangerous risks. In my view, these risks outweighed the benefits in this case.

Just a few years after the H5N1 controversy, in 2016, the Canadian scientist Ryan Noyce and his colleagues synthesized horsepox (an extinct cousin of smallpox) from scratch and later published their methods. Noyce and colleagues stated that they did this to vividly demonstrate the danger that scientists could synthesize smallpox in similar fashion in the future, and to stimulate policy discussion. However, in doing so, Noyce and his colleagues may have increased the very risk they were trying to reduce, by creating a template that a bad actor could adapt to synthesize smallpox more easily. Moreover, the benefit of their actions appears to have been low, as there were other ways to warn policymakers about the possibility of future artificial smallpox synthesis (Lewis 2018).

What kinds of frameworks might help make sure dual-use research is conducted safely?

I propose a few general principles that researchers should follow in order to make sure that the risks of their dual-use research don’t outweigh the benefits:

1.     The research needs to demonstrate clear potential benefits.

2.     The research needs to be a necessary step to achieving those benefits.

3.     The research should not unnecessarily generate information hazards.

4.     The research should not unnecessarily create risks of accidental lab-acquired infections.

5.     The expected value of the research’s benefits should clearly outweigh the risks (see the toy sketch after this list).
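
As a toy illustration of principle 5, the sketch below compares the expected value of a hypothetical project's benefits against its expected harms. All numbers are invented for illustration only; they are not estimates from this post or from the literature.

```python
# A toy expected-value comparison for principle 5 (all numbers are hypothetical
# and purely illustrative). Benefits and harms are each modeled as
# probability * magnitude, measured here in deaths averted or caused.

p_benefit, benefit_size = 0.10, 1_000        # e.g., 10% chance of averting ~1,000 deaths
p_accident, harm_size   = 0.001, 1_000_000   # e.g., 0.1% chance of causing ~1,000,000 deaths

expected_benefit = p_benefit * benefit_size   # 100 deaths averted in expectation
expected_harm    = p_accident * harm_size     # 1,000 deaths caused in expectation

print(f"Expected benefit: {expected_benefit:,.0f} deaths averted")
print(f"Expected harm:    {expected_harm:,.0f} deaths caused")
print("Clears principle 5?", expected_benefit > expected_harm)  # False in this toy case
```

The point of the sketch is simply that a small probability of a very large harm can dominate the calculation, so the benefits must be weighed against expected harms rather than against the most likely outcome alone.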

In addition, perhaps there should be absolute contraindications that are not open to interpretation, or that at least require applying for an exemption from the relevant regulatory authority. For example, it might be wise to establish a rule making it illegal to synthesize pathogens with pandemic potential absent a detailed risk-benefit review by the NSABB.

What policy levers do we have to vet dual-use research of concern?

A health system has five main policy levers for influencing system outcomes, including dual-use research of concern: financing, incentives, organization, regulation, and persuasion (Roberts et al. 2004). Before diving into the advantages and disadvantages of these levers for ensuring dual-use research is properly vetted, it is worth reflecting on the goals of the vetting process. First, we want to detect and deter potentially dangerous dual-use research as far upstream as possible. Second, we want to minimize the compliance burden on scientists. Third, we want to involve the right people in the review process. There are likely tradeoffs among these three goals.

With respect to the financing policy lever, major funding bodies such as the NIH, NSF, and major foundations could establish more stringent funding criteria that require careful risk-benefit justification by the grant applicants and careful risk-benefit analysis by grant reviewers. This would have the advantage of potentially halting the riskiest research early in the process. However, it could have the disadvantage of burdening legitimate dual-use research. Furthermore, it is possible that scientists dedicated to pursuing high-risk dual-use research would find alternative funding sources, or even conceal the true nature of their activities.

With respect to the incentives policy lever, major scientific journals could disincentivize dangerous dual-use research by announcing a policy to block the publication of major information hazards. One advantage of this is that peer scientists may be well-placed to evaluate potential information hazards stemming from dual-use research. A major disadvantage of this approach is that by the time research is submitted for publication, it is already done, so this would not limit the risks from accidental lab-acquired infections in the short run. Moreover, it is possible that the scientists selected for peer review would not be sufficiently trained to perform the risk-benefit analysis pertaining to information hazards. Beyond this, the effectiveness of relying on journals could be undermined by scientists shopping around for more permissive journals, if journals are unable to coordinate.

With respect to the organization and regulation policy levers, the existing regulatory architecture governing dual-use biological research could be clarified and streamlined so that responsibility is more clearly assigned among the various relevant government bodies. One potentially helpful regulatory change would be to modify the existing university Institutional Review Board (IRB) ethics review system so that IRBs explicitly consider the risks of dual-use research. While many IRB committees would need additional specialized training to do this, it could be done.

It is clear that governments, multilateral organizations such as the WHO, and professional scientific associations all have a role to play in ensuring effective regulation of DURC. Effective government regulation requires governments to cultivate sufficient expertise, which is often in short supply within government itself but can sometimes be hired from academia or industry. Regulatory bodies should also take special care to establish how much dual-use research of concern is actually being conducted, and how often problematic research has been scrutinized and blocked in the past. International coordination between governments is of high importance in the regulatory arena, to prevent researchers from simply moving their work to countries with less stringent safety standards. Similarly, it is important that governments find ways to minimize the regulatory burden on scientists so as not to jeopardize beneficial biomedical research.

The final policy lever worth discussing is persuasion. One persuasive approach for the biosecurity community is to invest in efforts to train scientists on the risks associated with dual-use research. This has the advantages of being non-coercive and of not imposing additional regulatory burdens on beneficial research. However, training may not be effective in influencing the scientists most naturally inclined to take risks, and training scientists about the risks of dual-use research can itself spread information hazards. A second persuasive approach is for the biosecurity community to aggressively name and shame scientists whose research puts society as a whole at risk. This has the potential advantage of creating strong professional disincentives to undertake risky dual-use research. However, the name-and-shame approach can be abused, and it may have its strongest influence on scientists who were already risk-averse, while the most risk-loving scientists won't necessarily be deterred by the criticism. Even so, I suspect such efforts would be worth it.

Lingering Questions:

Reading about dual-use research of concern has raised a number of questions for which I don’t have adequate answers yet. Here are a few of them:

1)    When in the research cycle do government regulatory bodies find out about dual-use research of concern?

2)    How do government regulatory bodies find out about dual-use research of concern that is planned, already underway, or completed and on the way to publication?

3)    What roles do different government regulatory bodies play in regulating dual-use research of concern?

4)    What proportion of dual-use research of concern is known to regulatory authorities?

5)    If top journals refuse to publish potentially dangerous dual-use research, will this discourage it from being conducted at all, or will the research just migrate to lower-tier journals which may provide even less oversight?

References

Bostrom, Nick, Thomas Douglas, and Anders Sandberg. 2016. “The Unilateralist’s Curse and the Case for a Principle of Conformity.” Social Epistemology 30(4): 350–71. http://dx.doi.org/10.1080/02691728.2015.1108373.
Lewis, Gregory. 2018. “Horsepox Synthesis: A Case of the Unilateralist’s Curse?” Bulletin of the Atomic Scientists: 1–13. https://thebulletin.org/2018/02/horsepox-synthesis-a-case-of-the-unilateralists-curse/.
Lipsitch, Marc, and Thomas V. Inglesby. 2014. “Moratorium on Research Intended to Create Novel Potential Pandemic Pathogens.” mBio 5(6).
Manheim, David, and Gregory Lewis. 2022. “High-Risk Human-Caused Pathogen Exposure Events from 1975–2016 [Version 2; Peer Review: 1 Approved, 1 Approved with Reservations].” F1000Research: 1–19.
Roberts, Marc, William Hsiao, Peter Berman, and Michael Reich. 2004. “Introduction.” In Getting Health Reform Right. Oxford University Press.
Rozo, Michelle, and Gigi Kwik Gronvall. 2015. “The Reemergent 1977 H1N1 Strain and the Gain-of-Function Debate.” mBio 6(4): 1–6.
Tu, Michael. 2012. “Between Publishing and Perishing? H5N1 Research Unleashes Unprecedented Dual-Use Research Controversy.” NTI: 1–17. http://www.nti.org/analysis/articles/between-publishing-and-perishing-h5n1-research-unleashes-unprecedented-dual-use-research-controversy/#_edn8.
WHO. 2020. “What Is Dual-Use Research of Concern?” World Health Organization (WHO) (December): 1–2. https://www.who.int/news-room/questions-and-answers/item/what-is-dual-use-research-of-concern.