

Epistemic status: brainstorming being published for feedback. 

Context: I'm prioritising a research agenda on existential cybersecurity risks related to AI. "Related to AI" means the cybersecurity risks cause/are caused by failures in AI models. Many existential risks unrelated to either cybersecurity or AI are excluded.

Main claims (proposed areas for future work, ranked by priority):

  1. Open-sourced or stolen models are orders of magnitude harder to secure than private models.[1] Proactive efforts to prevent data theft at frontier AI labs and to regulate which models should be open-sourced are urgently needed.
  2. We must improve security against insider threats at frontier AI labs/AI regulators. Even one employee error, disgruntled worker, or malicious insider may have national consequences, so hiring, training, access management, and HR practices should aim to prevent all of them. [2]
  3. AI-enabled cyber defences are growing[3] but remain unreliable.[4] Improving their reliability scores highly on impact, neglectedness, and tractability for: biosecurity monitoring facilities, synthetic biology/advanced computer chip supply chains, private data of federal electoral candidates, and companies managing many/critical satellites.
  4. We currently can't authenticate human-generated content at Internet-level scale to prevent misinformation. [5] Still, targeted solutions for decision-makers at AI labs, federal policymakers, and well-known media outlets could be feasible.

Top 10 Risks Considered

(These are comprehensive, not precisely ranked by importance.)

1: Losing Control of Sensitive AI Data

  • “Sensitive AI data” could be model weights, training algorithms, model architectures, datasets, etc.
  • “Losing control” could occur via theft.
    • It could also occur when communications between the AI model and its operator are disrupted and unpredictable/misaligned behaviour follows.[6]
    • The risk created here is the proliferation of dangerous information or technology.
  • Only bad actors cause this risk currently. In the future, it could be caused by misaligned, independent AI models. 
    • Insiders are the bad actors most capable of causing this risk (via employee errors or malicious intent), though other bad actors may be more common.
    • Public information on how frontier AI labs are mitigating this risk ranges from vague [7] to unknown.[8] Private interviews may help.

2: Failures of AI Models in Weapons Control

  • These failures could escalate conflicts.[6]
    • The AI models may control conventional weapons, though risks are higher if they control nuclear weapons or bioweapons.[9]
    • They may also 'confine' weapons. For example, an AI model is hacked to bypass network security for a military database.[10]
  • Failures could occur via intentional tampering by bad actors or via misaligned/unreliable behaviour.[6]
  • Although insiders are the bad actors most capable of causing these failures, state-sponsored and rogue hackers are more frequent threats.

3: Containment Failure for Advanced AI

  • This threat concerns losing control of a misaligned AI model with advanced capabilities (e.g. self-improvement, agency, deception, long-term planning, robust hacking, power-seeking behaviour, and/or persuasion).[11]
  • No such models are publicly known as of August 2023, so this failure becomes more probable as future models gain these capabilities.
    • The impact of this failure is not precisely known.
  • Containment failures could occur due to poor cyber technology, poor public policy, or poor standard operating procedures for staff. All these causes must be fixed simultaneously.[12]
    • Read: malicious or unintentional human actions may enable the failure.

4: Disruption of Advanced Model Development

  • These disruptions could happen at any point while developing a model.
    • Ex: The data collection process is hacked via a spoofing attack (impersonating a legitimate data source to add corrupt data).
    • Ex: a dataset server is hacked and the dataset is modified to include backdoors (secret, harmful behaviours that only emerge via hacker's inputs).
    • Ex: While red teaming, communications about critical threats are concealed via a man-in-the-middle attack.
    • Ex: Log databases are hacked to hide errors in deployed models.
  • Many researchers highlight how attacks like data poisoning[13] could enable backdoors. Traditional cyberattacks like those listed above can also cause this risk, and they are more frequent.

5: Disruption of AI Governance Decisions

  • This risk concerns cybersecurity failures in governance which enable AI failures.
    • Ex: Misinformation campaigns may sway AI policy by deceiving politicians or company executives. Relevant attacks include social engineering (manipulating humans at an organisation to bypass cyber defences), spoofing (impersonating legitimate sources of information), etc. 
    • Ex: Bots could impersonate decision-makers involved in AI governance. This may occur on public input[14] or private[15] forums with poor authentication.
    • Ex: The enforcement of regulations for AI labs could be disrupted. This could occur via cyber attacks (denial of service, spoofing, man in the middle, stealing credentials, etc.) on regulatory agencies. 
  • The key stakeholders to protect are politicians, decision-makers at frontier labs, and government regulatory agents for AI labs.

6: Disruption of Democratic Decisions via AI-enabled Misinformation

  • This is the inability to determine what's true at national scales due to AI-generated content.
    • This is currently not a reality at scale, though AI-generated content continues to deceive more people.[16]
    • Misinformation is often extremist or polarising in nature.[16]
    • This could enable extreme political leaders, polarise factions, and generally encourage conflict over cooperation.
  • There is no consensus on who is responsible, willing, and able to investigate digital sources at national scales to determine what's true.
    • Should existing media organisations be given more support for this? 
    • Does there need to be central government involvement? Are decentralised or crowdsourced roles desirable?

7: Unauthorised Use of Advanced Hardware Clusters

  • This would likely be discreet attempts to use unauthorised compute over long periods of time.[17]
    • This is currently a threat caused by bad actors. But future power-seeking AI systems may also pose a threat.
  • High-priority targets are cloud GPUs/TPUs (like those AWS or Google Cloud rent). These are hackable since networking capabilities and usage/access from remote actors is expected. Also, many data centre computers use common software, so vulnerabilities spread quickly.[18] 

8: Blocking the AI Hardware Supply Chain

  • This means causing failures in the supply chain to produce energy or compute for AI data centres. 
  • The risk seems temporary, if we can better distribute the supply chain, as the US is trying to do.[19]
    • An exception is if one party has monopoly access to a transformative AI which can produce uniquely-advanced energy or compute hardware. Then, that technology is proliferated. Though that would fall under Risk #1.


9: Curtailing Future Human Potential

  • By entrenching global totalitarian control
    • This only has some cybersecurity aspects. Ex: Can a person communicate without censorship, surveillance, or privacy concerns? 
  • By inhibiting space travel
    • Bad actors or misaligned models may execute harmful actions while controlling satellite/rocketry infrastructure. This includes insurmountably increasing space debris or destroying telecommunications infrastructure.[20]
  • Neither of these damages needs to be permanent to cause existential risks. If they occur at a time when technological or other man-made existential risks are low, then they could prevent global cooperation/coordination until a time when man-made existential risks become too high to feasibly mitigate.


10: Unknown Unknown Threats

  • A misaligned transformative AI could launch threats against humanity (potentially involving cybersecurity breaches) that no human has predicted. 
    • If there's a containment failure for a misaligned transformative AI, it's exceedingly difficult to prevent these.
  • A hypothetical example about the kinds of threats here: 
    • A few decades ago, no one knew of prions.[21] So it was inconceivable to any human alive that a malicious agent (human or AI) could try to kill humans using prion-based diseases.

Common Trends/Insights

 Key Stakeholders:

  • It’s crucial to prevent misinformation, phishing, spoofing, etc. for powerful decision-makers like the politicians of leading AI nations, the executives at frontier AI labs, or regulatory enforcement agents.
  • Internal policies for checking source integrity at well-known media organisations fix a problem at its root. Ex: Reuters' standards for checking if a digital source is trustworthy affect many other media organisations which republish their articles.
  • Insiders at frontier AI labs and AI government regulators need to be very carefully vetted and monitored.

 Misuse vs. Misalignment:

  • Until models develop advanced capabilities (defined in risk 3), misuse can create more persistent risks than misalignment. This is because misaligned behaviour currently rarely optimises for discretion, unlike malicious humans.
    • But all the risks mentioned could theoretically be created by sufficiently-capable misaligned AI or malicious humans.

 Avoiding proliferation:

  • If a transformative AI escapes containment or is stolen and made widely available, all other risks become much harder to mitigate.
  • Read: It's a huge priority to avoid proliferation of advanced and/or misaligned AI. Proactive efforts on this risk avoid reactive efforts for other risks.

Future Research Agenda

Largely technical research questions:

  • Which software, hardware, and standard operating procedures are required for effective AI containment? 
  • Which technical standards can ensure only authorised and reliable operators send inputs to AI models managing critical infrastructure or weapons?
    • Technically, this may become a large challenge for embedded systems running AI models (ex: drones). 
    • Researching training requirements for human operators of AI models in these settings may be worthwhile, though the technical aspects of communication channels are more closely related to cybersecurity.
  • Which technical standards are needed to ensure the reliability of cyber defence systems for critical infrastructure, advanced compute fabs, organisations managing space infrastructure, or military assets?
    • Regarding critical infrastructure, more precise research questions could focus on AI-enabled cyber defences[7] in the biosecurity monitoring, pandemic preparedness, and synthetic biology supply chains.
    • Regarding compute fabs, AI algorithms to optimise advanced compute manufacturing merit special security. Cybersecurity standards in the industry are managed by SEMI (the industry association). [22]
    • Organisations with the most satellites in space (and most potential for space debris) are mainly US companies like SpaceX. That said, satellite count doesn't reveal which satellites are most essential in which orbit. It may be that national militaries (namely the US, Chinese, and Russian militaries) or scientific organisations (NASA, NOAA, ESA, CAST, CAS, ISRO, ...) have more essential satellites to protect.[23]
    • From preliminary research on the US Defense Logistics Agency's public database on military standards[24], modern standards on cybersecurity seem classified. If so, research on military standards would be less feasible. 
  • How can an AI model be developed with continuous integrity checks? I.e. what would be a zero-trust architecture for AI model development?
    • Ex: Maybe a separate check on data integrity is run before each epoch of training, though this covers just one part of the model development process.
  • How can we enable Internet-scale digital forensics (i.e. provenance verification for every site, image, post, etc.)?
    • This involves technical aspects. Ex: it might be efficient to watermark the sources of AI-generated content (except for open-sourced models).
    • This involves policies for corporations. Maybe governments hold media publishers legally liable for misinformation?  
    • This involves policies affecting individuals. Maybe more scholarships are needed to educate workers in digital forensics research?
  • Which improvements are needed in current tools to evade government censorship and surveillance? Which projects, software, etc. are cost-effective investments?
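
As a rough illustration of the per-epoch data integrity check suggested in the list above, here is a minimal Python sketch. It assumes the dataset is a directory of files and that a trusted manifest of SHA-256 digests was recorded earlier; the function names (`build_manifest`, `verify_manifest`, `train`) and the placeholder training loop are hypothetical, not taken from any existing framework.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> Dict[str, str]:
    """Record a digest for every file directly inside the dataset directory."""
    return {p.name: sha256_of(p) for p in sorted(data_dir.iterdir()) if p.is_file()}


def verify_manifest(data_dir: Path, manifest: Dict[str, str]) -> List[str]:
    """Return the names of files that are missing, modified, or unexpected."""
    current = build_manifest(data_dir)
    changed = [name for name, digest in manifest.items() if current.get(name) != digest]
    unexpected = [name for name in current if name not in manifest]
    return sorted(changed + unexpected)


def train(data_dir: Path, manifest: Dict[str, str], epochs: int) -> int:
    """Hypothetical training loop that refuses to start an epoch on altered data."""
    completed = 0
    for _ in range(epochs):
        problems = verify_manifest(data_dir, manifest)
        if problems:
            raise RuntimeError(f"dataset integrity check failed: {problems}")
        # ... one epoch of training would run here (omitted in this sketch) ...
        completed += 1
    return completed
```

A check like this only helps if the manifest is stored and protected separately from the dataset server. An attacker who can rewrite both the data and the manifest would pass the check. It also covers only the data at rest, not data collection, red teaming, or deployment logs.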


Research questions about private organisations' policy:

  • How can risks from insider threats and employee errors most effectively be reduced at frontier AI labs?
    • This research question could be focused by considering actions that companies can take in hiring, training, privilege/access management, improving employee satisfaction, red teaming, anomaly detection, etc.
  • How can existing trusted media sources be aided in distinguishing AI-generated content from reality? If needed, how can any novel technical tools, staff hires, or employee retraining be funded?
    • AI-generated content includes fake text, audio, images, etc. It also includes attempts at spoofing/counterfeiting humans or organisations.


Research questions about government policy:

  • Which controls on the distribution of sensitive AI data, if any, should be enacted to reduce the misuse of open source models? 
    • Finding precise answers is important here. Ex: Create a specific benchmark for a specific dangerous capability to ban in open-sourced models.
    • Also, which technologies or laws could overcome the current shortcomings of efforts against piracy and corporate espionage?
  • Which policies can correct for the positive externalities of enhanced cybersecurity at frontier AI labs (i.e. companies underinvesting in security because much of the benefit goes to society rather than to them)?
    • Specific policies to compare might include technical assistance programs, safety requirements and inspection regimes, grant funding, etc. 
  • What are the most cost-effective ways to create international treaties to prevent the use of AI in nuclear weapons management? 
    • The best cybersecurity for AI algorithms is not needing cybersecurity for AI algorithms since we didn’t use the AI algorithms. 
  • How can governments set up quick, reliable, secure, and efficient monitoring into activities at frontier AI labs? Specifically, in the context of inspection and enforcement of policy requirements.
    • Ex: Requiring the companies to submit reports at certain points in model development (ex: approval before release) might not give very consistent data to enforcement agencies. Automatic monitoring of logs on advanced GPUs might provide more granular and consistent data, but it also creates more cybersecurity/trade secret concerns.[25]
  • What would be the cost-effectiveness of governments offering political candidates increased cybersecurity while they’re running for office? At which government levels would it be cost-effective? 

I'm going to start more research into some of the research questions above. I'd very much appreciate feedback on flaws in my reasoning, suggested priorities in the research questions, or existing solutions for any of the questions!

  1. ^

    T. Shevlane, ‘Structured access: an emerging paradigm for safe AI deployment’, 2022, doi: 10.48550/ARXIV.2201.05159. Available: https://arxiv.org/abs/2201.05159. [Accessed: Sep. 18, 2023]

  2. ^

    J. E. Barnes, B. Wintrode, and J. Daemmrich, ‘A Babysitter and a Band-Aid Wrapper: Inside the Submarine Spy Case’, The New York Times, Oct. 11, 2021. Available: https://www.nytimes.com/2021/10/11/us/politics/inside-submarine-spy-case.html. [Accessed: Sep. 18, 2023]

  3. ^

    IBM Security, ‘Cost of a Data Breach 2022’, Armonk, NY, United States, Jul. 2022. Available: https://www.ibm.com/downloads/cas/3R8N1DZJ. [Accessed: Sep. 18, 2023]

  4. ^

    D. Arp et al., ‘Dos and Don’ts of Machine Learning in Computer Security’, 2020, doi: 10.48550/ARXIV.2010.09470. Available: https://arxiv.org/abs/2010.09470. [Accessed: Sep. 18, 2023]

  5. ^

    S. Karnouskos, "Artificial Intelligence in Digital Media: The Era of Deepfakes," in IEEE Transactions on Technology and Society, vol. 1, no. 3, pp. 138-147, Sept. 2020, doi: 10.1109/TTS.2020.3001312. Available: https://ieeexplore.ieee.org/document/9123958. [Accessed: Sep. 18, 2023]

  6. ^

    J. Altmann and F. Sauer, ‘Autonomous Weapon Systems and Strategic Stability’, Survival, vol. 59, no. 5, pp. 117–142, Sep. 2017, doi: 10.1080/00396338.2017.1375263. Available: https://www.tandfonline.com/doi/full/10.1080/00396338.2017.1375263. [Accessed: Sep. 18, 2023]

  7. ^

    OpenAI, ‘OpenAI Security Portal’. Available: https://trust.openai.com/. [Accessed: Sep. 18, 2023]

  8. ^

    Inflection AI, ‘Safety’. Available: https://inflection.ai/safety. [Accessed: Sep. 18, 2023]

  9. ^

    Although US policy constrains autonomous nuclear command and control, other nuclear powers have no such policies.
    Office of U.S. Senator Edward Markey, ‘Markey, Lieu, Beyer, and Buck Introduce Bipartisan Legislation to Prevent AI From Launching a Nuclear Weapon’. Available: https://www.markey.senate.gov/news/press-releases/markey-lieu-beyer-and-buck-introduce-bipartisan-legislation-to-prevent-ai-from-launching-a-nuclear-weapon. [Accessed: Sep. 16, 2023]

  10. ^

    F. Bajak, ‘Insider Q&A: Artificial intelligence and cybersecurity in military tech’, AP News, May 29, 2023. Available: https://apnews.com/article/ai-cybersecurity-military-tech-weapons-systems-offensive-hacking-cyber-command-ae2a9417909388237d3667d1c61b0f99. [Accessed: Sep. 18, 2023]

  11. ^

    J. Carlsmith, ‘Is Power-Seeking AI an Existential Risk?’, 2022, doi: 10.48550/ARXIV.2206.13353. Available: https://arxiv.org/abs/2206.13353. [Accessed: Sep. 18, 2023]

  12. ^

    J. Babcock, J. Kramár, and R. Yampolskiy, ‘The AGI Containment Problem’, in Artificial General Intelligence, B. Steunebrink, P. Wang, and B. Goertzel, Eds., Cham: Springer International Publishing, 2016, pp. 53–63. doi: 10.1007/978-3-319-41649-6_6. Available: http://link.springer.com/10.1007/978-3-319-41649-6_6. [Accessed: Sep. 18, 2023]

  13. ^

    J. Steinhardt, P. W. W. Koh, and P. S. Liang, ‘Certified Defenses for Data Poisoning Attacks’, in Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds., Curran Associates, Inc., 2017. Available: https://proceedings.neurips.cc/paper_files/paper/2017/file/9d7311ba459f9e45ed746755a32dcd11-Paper.pdf. [Accessed: Sep. 18, 2023]

  14. ^

    S. Balla, ‘Fake comments flooded in when the FCC repealed net neutrality. They may count less than you think.’, Washington Post, Dec. 14, 2017. Available: https://www.washingtonpost.com/news/monkey-cage/wp/2017/12/14/there-was-a-flood-of-fake-comments-on-the-fccs-repeal-of-net-neutrality-they-may-count-less-than-you-think/. [Accessed: Sep. 19, 2023]

  15. ^

    W. Zaremba et al., ‘Democratic Inputs to AI’, OpenAI Blog, May 25, 2023. Available: https://openai.com/blog/democratic-inputs-to-ai. [Accessed: Sep. 18, 2023]

  16. ^

    D. Klepper and A. Swenson, ‘AI-generated disinformation poses threat of misleading voters in 2024 election’, PBS NewsHour, May 14, 2023. Available: https://www.pbs.org/newshour/politics/ai-generated-disinformation-poses-threat-of-misleading-voters-in-2024-election. [Accessed: Sep. 18, 2023]

  17. ^

    E. Starker, ‘Large Scale Analysis of DNS Query Logs Reveals Botnets in the Cloud’, Azure Blog, Mar. 27, 2017. Available: https://techcommunity.microsoft.com/t5/security-compliance-and-identity/large-scale-analysis-of-dns-query-logs-reveals-botnets-in-the/m-p/57064. [Accessed: Sep. 18, 2023]

  18. ^

    M. Abdelsalam, R. Krishnan, Y. Huang and R. Sandhu, "Malware Detection in Cloud Infrastructures Using Convolutional Neural Networks," 2018 IEEE 11th International Conference on Cloud Computing (CLOUD), San Francisco, CA, USA, 2018, pp. 162-169, doi: 10.1109/CLOUD.2018.00028. Available: https://ieeexplore.ieee.org/document/8457796. [Accessed: Sep. 18, 2023]

  19. ^

    D. Simchi-Levi, F. Zhu, and M. Loy, ‘Fixing the U.S. Semiconductor Supply Chain’, Harvard Business Review, Oct. 25, 2022. Available: https://hbr.org/2022/10/fixing-the-u-s-semiconductor-supply-chain. [Accessed: Sep. 18, 2023]

  20. ^

    C. Van Camp and W. Peeters, ‘A World without Satellite Data as a Result of a Global Cyber-Attack’, Space Policy, vol. 59, p. 101458, Feb. 2022, doi: 10.1016/j.spacepol.2021.101458. Available: https://linkinghub.elsevier.com/retrieve/pii/S0265964621000503. [Accessed: Sep. 18, 2023]

  21. ^

    M. D. Zabel and C. Reid, ‘A brief history of prions’, Pathogens and Disease, vol. 73, no. 9, p. ftv087, Dec. 2015, doi: 10.1093/femspd/ftv087. Available: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4626585/. [Accessed: Sep. 19, 2023]

  22. ^

    J. Amano, ‘SEMI Publishes First Cybersecurity Standards’, Standards Watch, Mar. 07, 2022. Available: https://www.semi.org/en/standards-watch-2022-March/SEMI-publishes-first-cybersecurity-standards. [Accessed: Sep. 19, 2023]

  23. ^

    P. Rome, ‘Every Satellite Orbiting Earth and Who Owns Them’, Data Acquisition Knowledge Base, Jan. 18, 2022. Available: https://dewesoft.com/blog/every-satellite-orbiting-earth-and-who-owns-them. [Accessed: Sep. 19, 2023]

  24. ^

    The standards I read in depth are: DI-IPSC-82249, DI-IPSC-82250, DI-IPSC-82251, and DI-IPSC-82252.

    Defense Logistics Agency, ‘ASSIST’. Sep. 18, 2023. Available: https://quicksearch.dla.mil/qsSearch.aspx. [Accessed: Sep. 19, 2023]

  25. ^

    Y. Shavit, ‘What does it take to catch a Chinchilla? Verifying Rules on Large-Scale Neural Network Training via Compute Monitoring’, 2023, doi: 10.48550/ARXIV.2303.11341. Available: https://arxiv.org/abs/2303.11341. [Accessed: Sep. 19, 2023]
