Many talented lawyers do not contribute to AI Safety simply because they've never had the chance to work with AIS researchers, or don't know what the field entails.
I am hopeful that this can improve if we create more structured opportunities for cooperation. That is the main motivation behind the upcoming AI Safety Law-a-thon, organised by AI-Plans[1]:
A hackathon where every team pairs one lawyer with one technical AI safety researcher. Each pair will tackle challenges drawn from real legal bottlenecks and overlooked AI safety risks.
From my time in the tech industry, my suspicion is that if more senior counsel actually understood alignment risks, frontier AI deals would face far more scrutiny. Right now, most law firms would focus on IP rights or privacy clauses when advising their clients, not on whether model alignment drift could blow up the contract six months after signing.
We launched the event one day ago, and we already have an impressive lineup of senior counsel from top firms and regulators. What we still need are technical AI safety people to pair with them!
If you join, you'll help stress-test the legal scenarios and point out the alignment risks that are not salient to your counterpart (they’ll be obvious to you, but not to them).
You’ll also get the chance to put your own questions to experienced attorneys.
📅 25–26 October
🌍 Hybrid: online + in-person (London)
If you’re up for it, sign up here: https://luma.com/8hv5n7t0
Feel free to DM me if you want to raise any queries!
[1]
NOTE: I really want to improve how I communicate updates like these. If this sounds too salesy or overly persuasive, it would really help me if you comment and suggest how to improve the wording.
I find this more effective than just downvoting, but of course, do so if you want. Thank you in advance!
One thing is that emojis are pretty rare on the Forum (despite being popular in places like LinkedIn and some Slacks), so they sometimes make things appear more salesy or even LLM-generated.
In my opinion, your text itself doesn't seem too salesy or overly persuasive.
This sounds valuable! Quick question about participation: I'm an EA-aligned lawyer concerned about AI safety, though not currently at a top firm or working directly in AI regulation. Would someone with general legal expertise and strong motivation to contribute to AI safety be useful for this, or are you specifically looking for lawyers already working in tech/AI policy?
I imagine fresh perspectives from lawyers outside the usual AI circles could be valuable for spotting overlooked risks, but wanted to check if that fits what you're envisioning.
Of course! We'd love to have you there!
Apparently emojis don't render properly on Firefox. I didn't see any emojis, so I tried opening this page in Chrome, and indeed they are there, but they don't show up in my normal browser.
Would a safety-focused breakdown of the EU AI Act be useful to you?
The Future of Life Institute published a great high-level summary of the EU AI Act here: https://artificialintelligenceact.eu/high-level-summary/
What I’m proposing is a complementary, safety-oriented summary that extracts the parts of the AI Act that are most relevant to AI alignment researchers, interpretability work, and long-term governance thinkers.
It would include:
Provisions related to transparency, human oversight, and systemic risks
Notes on how technical safety tools (e.g. interpretability, scalable oversight, evals) might interface with conformity assessments, or the compliance exemptions available for research work
Commentary on loopholes or compliance dynamics that could shape industry behavior
What the Act doesn't currently address from a frontier-risk or misalignment perspective
Target length: 3–5 pages, written for technical researchers and governance folks who want signal without wading through dense regulation.
If this sounds useful, I’d love to hear what you’d want to see included, or what use cases would make it most actionable.
And if you think this is a bad idea, no worries. Just please don't downvote me into oblivion; I only recently got to decent karma :)
Thanks in advance for the feedback!
Two days ago, I published a Substack article called "The Epistemics of Being a Mudblood: Stress Testing intellectual isolation". I wasn’t sure whether to cross-post it here, but a few people encouraged me to at least share the link.
By background I’m a lawyer (hybrid Legal-AI Safety researcher), and I usually write about AI Safety to spread awareness among tech lawyers and others who might not otherwise engage with the field.
This post, though, is more personal: a reflection on how “deep thinking” and rationalist habits have shaped my best professional and personal outputs, even through long phases of intellectual isolation. Hence the “mudblood” analogy, which (to my surprise) resonated with more people than I expected.
Sharing here in case it’s useful. Obviously very open to criticism and feedback (that’s why I’m here!), but also hoping it’s of some help. :)
The AI alignment community had a major victory in the regulatory landscape, and it went unnoticed by many.
The EU AI Act explicitly mentions "alignment with human intent" as a key focus area in relation to the regulation of systemic risks.
As far as I know, this is the first time "alignment" has been mentioned in a law or major regulatory text.
It’s buried in Recital 110, but it’s there. And it also makes research on AI Control relevant:
"International approaches have so far identified the need to pay attention to risks from potential intentional misuse or unintended issues of control relating to alignment with human intent".
This means that alignment is now part of the EU’s regulatory vocabulary.
But here’s the issue: most AI governance professionals and policymakers still don’t know what it really means, or how your research connects to it.
I’m trying to build a space where AI Safety and AI Governance communities can actually talk to each other.
If you're curious, I wrote an article about this, aimed at corporate decision-makers who lack literacy in this area.
Would love any feedback, especially from folks thinking about how alignment ideas can scale into the policy domain.
Here is the Substack link (I also posted it on LinkedIn):
https://open.substack.com/pub/katalinahernandez/p/why-should-ai-governance-professionals?utm_source=share&utm_medium=android&r=1j2joa
My intuition says that this was a push from the Future of Life Institute.
Thoughts? Did you know about this already?
How should AI alignment and autonomy preservation intersect in practice?
We know that AI alignment research has made significant progress in embedding internal constraints that prevent models from manipulating, deceiving, or coercing users (to the extent that they succeed). However, internal alignment mechanisms alone don't necessarily give users meaningful control over an AI's influence on their decision-making, which is a problem in its own right.
This raises a question: Should future AI systems be designed to not only align with human values but also expose their influence in ways that allow users to actively contest and reshape AI-driven inferences?
For example:
If an AI model generates an inference about a user (e.g., "this person prefers risk-averse financial decisions"), should users be able to see, override, or refine that inference?
If an AI assistant subtly nudges users toward certain decisions, should it disclose those nudges in a way that preserves user autonomy?
Could mechanisms like adaptive user interfaces (allowing users to adjust how AI explains itself) or AI-generated critiques of its own outputs serve as tools for reinforcing autonomy rather than eroding it?
I’m exploring a concept I call Autonomy by Design, a control-layer approach that builds on alignment research but adds external, user-facing mechanisms to make AI’s reasoning and influence more contestable.
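Since "contestable" is doing a lot of work here, below is a minimal sketch of the kind of user-facing record I have in mind. To be clear, this is purely illustrative: ContestableInference and every field and method on it are hypothetical names invented for this example, not an existing API or a finished design.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ContestableInference:
    """A user-facing record of an inference the system has made about a user.

    Hypothetical structure: the fields and operations are illustrative only.
    """
    inference_id: str
    statement: str                       # e.g. "prefers risk-averse financial decisions"
    evidence_summary: str                # plain-language basis the user can inspect
    user_override: Optional[str] = None  # set when the user rejects or corrects the inference
    disclosed_nudges: list[str] = field(default_factory=list)  # nudges derived from this inference

    def effective_statement(self) -> str:
        """The user's correction, if any, takes precedence over the model's inference."""
        return self.user_override or self.statement

    def override(self, corrected_statement: str) -> None:
        """Let the user contest and reshape the inference."""
        self.user_override = corrected_statement

    def disclose_nudge(self, description: str) -> None:
        """Record a nudge so it can be shown to the user rather than applied silently."""
        self.disclosed_nudges.append(description)


# Example: the assistant surfaces an inference, the user refines it,
# and any downstream nudge is disclosed instead of applied invisibly.
inference = ContestableInference(
    inference_id="inf-001",
    statement="This person prefers risk-averse financial decisions.",
    evidence_summary="Based on three prior conversations about low-volatility index funds.",
)
inference.disclose_nudge("Suggested bond-heavy portfolios before showing equity options.")
inference.override("I prefer moderate risk; I was only asking about bonds for my parents.")
print(inference.effective_statement())
```

The design intent is simply that the user's correction, once given, takes precedence over the model's inference, and that nudges are recorded for disclosure rather than applied silently.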
Would love to hear from interpretability experts and UX designers: Where do you see the biggest challenges in implementing user-facing autonomy safeguards? Are there existing methodologies that could be adapted for this purpose?
Thank you in advance.
Feel free to shatter this if you must XD.