🔹 Background: Started as a data privacy specialist, focusing on Privacy-Enhancing Technologies (PETs) in technological innovation.
🔹 AI Governance Shift:
🔹 What I Do: Bridging AI governance, privacy engineering, and alignment-adjacent control mechanisms to ensure AI enhances (not replaces) human decision-making.
🔹 Current Research – Autonomy by Design (AbD):
🔹 Long-Term Vision:
I’m actively exploring how mechanistic interpretability can be leveraged to give users real control over the inferences GenAI models make about them.
🔹 If you’re working on mechanistic interpretability, interpretability-adjacent research, or UX for AI transparency, I’d love to discuss how these efforts can extend beyond AI oversight into direct user control over AI inferences.
🔹 I’m particularly interested in how feature-level interpretability (like Anthropic’s feature steering) could apply to real-time inference contestability, so that users don’t just see what AI assumes, but can intervene and correct how AI reasons about them in deployed systems (there’s a toy sketch of this idea after this list).
🔹 If you’re in AI governance, alignment, privacy, or HCI, let’s connect. I believe autonomy-centered AI needs multidisciplinary collaboration to be taken seriously.
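To make the "contestability via feature steering" idea a bit more concrete, here is a minimal, purely illustrative Python sketch. The feature IDs, labels, and the `steering_vector()` hook are hypothetical stand-ins for whatever a real interpretability pipeline might expose to a user-facing layer; nothing here refers to an actual API.

```python
# Illustrative only: feature IDs, labels, and the steering hook are hypothetical
# stand-ins for what an interpretability pipeline might expose to a user.
from dataclasses import dataclass


@dataclass
class InferredFeature:
    feature_id: int
    label: str                    # human-readable gloss, e.g. "user is price-sensitive"
    activation: float             # observed activation strength
    steering_coeff: float = 1.0   # 1.0 = leave as-is, 0.0 = suppress at generation time


class ContestableInferencePanel:
    """User-facing layer: show what the model inferred about the user, let them push back."""

    def __init__(self, features):
        self.features = {f.feature_id: f for f in features}

    def show(self):
        for f in self.features.values():
            print(f"[{f.feature_id}] {f.label} (activation={f.activation:.2f})")

    def contest(self, feature_id, new_coeff=0.0):
        # Record the user's correction as a steering coefficient.
        self.features[feature_id].steering_coeff = new_coeff

    def steering_vector(self):
        # What a (hypothetical) feature-steering hook would consume before generation.
        return {fid: f.steering_coeff for fid, f in self.features.items()}


# Example: the model inferred two things about the user; the user contests one.
panel = ContestableInferencePanel([
    InferredFeature(101, "user is price-sensitive", 0.83),
    InferredFeature(202, "user is a novice in this domain", 0.41),
])
panel.show()
panel.contest(202, new_coeff=0.0)   # "Stop reasoning about me as a novice."
print(panel.steering_vector())      # {101: 1.0, 202: 0.0}
```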
If you have insights, critiques, or research that overlaps with this, I’d love to hear from you!
I work at the intersection of AI governance, privacy, and alignment-adjacent research, focusing on Autonomy by Design.
🔹 AI Governance & Privacy: If you’re navigating the regulatory landscape, I can help bridge legal, technical, and alignment perspectives to make governance frameworks actionable in real-world AI deployment.
🔹 Mechanistic Interpretability & UX Research: If you’re working on interpretability, AI safety, or user experience in AI control, I can help connect research on transparency and control to real-world autonomy-preserving interfaces.
🔹 Multidisciplinary Collaboration: AI alignment, privacy, and HCI need to work together. If you’re looking for insights on how to make autonomy-preserving AI credible, actionable, and scalable, I’d love to contribute.
I’m here to exchange ideas, challenge assumptions, and help build AI systems where users have real choice over how AI impacts them.
Would a safety-focused breakdown of the EU AI Act be useful to you?
The Future of Life Institute published a great high-level summary of the EU AI Act here: https://artificialintelligenceact.eu/high-level-summary/
What I’m proposing is a complementary, safety-oriented summary that extracts the parts of the AI Act that are most relevant to AI alignment researchers, interpretability work, and long-term governance thinkers.
It would include:
Target length: 3–5 pages, written for technical researchers and governance folks who want signal without wading through dense regulation.
If this sounds useful, I’d love to hear what you’d want to see included, or what use cases would make it most actionable.
And if you think this is a bad idea, no worries. Just please don’t downvote me into oblivion; I only just got to decent karma :).
Thanks in advance for the feedback!
The AI alignment community had a major victory in the regulatory landscape, and it went unnoticed by many.
The EU AI Act explicitly mentions "alignment with human intent" as a key focus area in relation to regulation of systemic risks.
As far as I know, this is the first time “alignment” has been mentioned in a law or major regulatory text.
It’s buried in Recital 110, but it’s there. And it also makes research on AI Control relevant:
"International approaches have so far identified the need to pay attention to risks from potential intentional misuse or unintended issues of control relating to alignment with human intent".
This means that alignment is now part of the EU’s regulatory vocabulary.
But here’s the issue: most AI governance professionals and policymakers still don’t know what it really means, or how your research connects to it.
I’m trying to build a space where AI Safety and AI Governance communities can actually talk to each other.
If you're curious, I wrote an article about this, aimed at corporate decision-makers who lack literacy in your area.
Would love any feedback, especially from folks thinking about how alignment ideas can scale into the policy domain.
Here is the Substack link (I also posted it on LinkedIn):
My intuition says that this was a push from Future of Life Institute.
Thoughts? Did you know about this already?
How should AI alignment and autonomy preservation intersect in practice?
We know that AI alignment research has made significant progress in embedding internal constraints that prevent models from manipulating, deceiving, or coercing users (at least to the extent that current models refrain from doing so). However, internal alignment mechanisms alone don’t necessarily give users meaningful control over AI’s influence on their decision-making. That is a mechanistic problem in its own right, but it is also a design problem.
This raises a question: Should future AI systems be designed to not only align with human values but also expose their influence in ways that allow users to actively contest and reshape AI-driven inferences?
For example:
I’m exploring a concept I call Autonomy by Design, a control-layer approach that builds on alignment research but adds external, user-facing mechanisms to make AI’s reasoning and influence more contestable.
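For concreteness, here is a minimal sketch of what such an external control layer could look like, under heavy assumptions: `call_model` is a placeholder for any chat-style API, and passing user corrections back as explicit constraints is the simplest possible version of contestability, not a claim about how Autonomy by Design must be implemented.

```python
# Minimal, assumption-laden sketch of an external control layer:
# `call_model` is a placeholder for any chat-style API, and contested assumptions
# are fed back as explicit, user-authored constraints on the next call.

def call_model(prompt: str) -> str:
    """Stand-in for a real model call."""
    return f"(model output for: {prompt!r})"


class AutonomyLayer:
    """Sits outside the model: records user contests and surfaces them as constraints,
    rather than letting inferences shape outputs silently."""

    def __init__(self, model):
        self.model = model
        self.contested = {}   # assumption -> user correction

    def contest(self, assumption: str, correction: str):
        self.contested[assumption] = correction

    def ask(self, user_prompt: str) -> str:
        constraints = "\n".join(
            f"- Do not assume '{a}'; instead: {c}" for a, c in self.contested.items()
        )
        if constraints:
            user_prompt = f"{user_prompt}\n\nUser-imposed constraints:\n{constraints}"
        return self.model(user_prompt)


layer = AutonomyLayer(call_model)
layer.contest("the user wants the cheapest option", "optimize for durability")
print(layer.ask("Recommend a laptop for me."))
```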
Would love to hear from interpretability experts and UX designers: Where do you see the biggest challenges in implementing user-facing autonomy safeguards? Are there existing methodologies that could be adapted for this purpose?
Thank you in advance.
Feel free to shatter this if you must XD.
Two days ago, I published a Substack article called "The Epistemics of Being a Mudblood: Stress-Testing Intellectual Isolation". I wasn’t sure whether to cross-post it here, but a few people encouraged me to at least share the link.
By background, I’m a lawyer (hybrid Legal-AI Safety researcher), and I usually write about AI Safety to spread awareness among tech lawyers and others who might not otherwise engage with the field.
This post, though, is more personal: a reflection on how “deep thinking” and rationalist habits have shaped my best professional and personal outputs, even through long phases of intellectual isolation. Hence the “mudblood” analogy, which (to my surprise) resonated with more people than I expected.
Sharing here in case it’s useful. Obviously very open to criticism and feedback (that’s why I’m here!), but also hoping it’s of some help. :)