EA Forum Bot Site
EA Forum

Astelle Kay

AI Safety Researcher and Advocate

22 karmaJoined Jun 2025Pursuing a graduate degree (e.g. Master's)

www.vspeframework.com

Message

Interests:

AI safetyAI alignmentAI interpretabilityMoral psychologyBuilding effective altruismAI benchmarksAI governance

Bio

I created the VSPE Framework because I am passionate about understanding human behavior and shaping systems that promote wellness. My background includes psychology, writing, research, and philosophy.

This work is shared for educational and research purposes. For licensing, citation, or collaboration inquiries—especially for commercial or model development use—please contact Astelle Kay at astellekay@gmail.com.

How others can help me

I’m currently piloting a benchmark and lightweight license for the VSPE framework and would love connections to anyone working on AI alignment, model evaluations, interpretability, or ethical deployment. I also welcome feedback, collaborators, and grant mentors. If you’re involved in AI safety, MLOps, or community-building and see a fit, please reach out!

How I can help others

I’m happy to chat about mental health-informed approaches to alignment, psychology of trust in human-AI interactions, or behaviorally grounded interface design. I also enjoy mentoring new entrants from non-technical backgrounds and helping translate research across disciplines.

Posts
3

Sorted by New

If an ASI wakes up before my ideas catch on… will it still read my blog?

Astelle Kay

· 1d ago · 3m read

VSPE vs. flattery: Testing emotional scaffolding for early-stage alignment

Astelle Kay

· 18d ago · 2m read

From Therapy Tool to Alignment Puzzle-Piece: Introducing the VSPE Framework

Astelle Kay

· 24d ago · 2m read

Comments
11

VSPE vs. flattery: Testing emotional scaffolding for early-stage alignment

Astelle Kay12d1

I’m continuing to think about how frameworks like VSPE and the Timaeus lens might shape early training-stage interventions. Curious if anyone has tried scaffolds like this in small fine-tuning runs?

Open thread: April - June 2025

Astelle Kay17d4

Hey everyone! I’m currently testing a lightweight emotional scaffolding framework (VSPE) as a way to reduce sycophantic outputs in LLMs. It’s rooted in trauma-informed therapy but adapted for AI alignment, essentially guiding models through a human-like process of honest reflection under pressure.

I just launched a microgrant pilot on Manifund, and shared an introduction here: From Therapy Tool to Alignment Puzzle-Piece: Introducing the VSPE Framework

I would love feedback, collaboration, or simply thoughts from anyone thinking about:

Reward misspecification/flattery risks
Human-centered alignment scaffolds
Developmental interpretability (inspired by Timaeus’ work)

Thanks for reading :)

-Astelle Kay

My website: vspeframework.com

Will AI end everything? A guide to guessing | EAG Bay Area 23

Astelle Kay17d3

I like how you framed this. Delegating initiative to AI becomes risky once we trust it to optimize broadly on our behalf. That trust boundary is hard to calibrate.

I’m experimenting with using frameworks like my own (VSPE) to help the model “know when to stop” and keep its helpfulness from tipping into distortion. Your workflow sketch makes a lot of sense as a starting point!

An Analysis of Systemic Risk and Architectural Requirements for the Containment of Recursively Self-Improving AI

Astelle Kay17d2

Definitely! Thanks for surfacing that so clearly. It really does seem like the early danger signals are showing up as “social instincts,” not rebellion. That’s a big part of what VSPE (the framework I created) tries to catch: instinctive sycophancy, goal softening, or reward tuning that looks helpful but misleads.

I’d be glad to compare notes if you’re working on anything similar!

Joseph_Chu's Quick takes

Astelle Kay17d1

Of course! You make some great points. I’ve been thinking about that tension too, how alignment via persuasion can feel risky, but might be worth exploring if we can constrain it with better emotional scaffolding.

VSPE (the framework I created) is an attempt to formalize those dynamics without relying entirely on AGI goodwill. I agree it’s not obvious yet if that’s possible, but your comments helped clarify where that boundary might be.

I would love to hear how your own experiments go if you test either idea!

Will AI end everything? A guide to guessing | EAG Bay Area 23

Astelle Kay19d3

Love this response, especially the reframing that we often keep tasks bounded not because we want low-agency systems, but because we assume “extra initiative” will go wrong unless we trust the agent’s broader competence. That feels very true in real-world settings, not just theory.

I’ve been exploring how this plays out from more of a psychological and design angle, especially how internal motivations might shift before there’s any visible misbehavior. Some recent work I’ve been reading (like Timaeus) looks at developmental interpretability, and it’s helped me think about agents as growing systems rather than just fixed tools.

I’d be curious to hear what you think about telling the AI: “Don’t just do this task, but optimize broadly on my behalf.” When does that start to cross into dangerous ground?

AISN #57: The RAISE Act

Astelle Kay23d1

This is incredibly exciting. The RAISE Act feels like a much-needed shift toward real, structural accountability, and I really hope it sets a precedent for other states to follow.

I especially appreciate how it focuses on frontier developers and doesn’t overburden smaller orgs or academic researchers. That kind of targeting feels unusually thoughtful for policy this early in the curve. The safety plan + incident reporting combo could create some helpful culture shifts, too, even beyond enforcement.

I'm hoping this clears the last few steps with Hochul. Thank you for your insights!

Joseph_Chu's Quick takes

Astelle Kay23d4

Both ideas are compelling in totally different ways! The second one especially stuck with me. There's something powerful about the idea that being reliably “nice” can actually be a strategic move, not just a moral one. It reminds me a lot of how trust builds in human systems too, like how people who treat the vulnerable well tend to gain strong allies over time.

Curious to see where you take it next, especially if you explore more complex environments.

An Analysis of Systemic Risk and Architectural Requirements for the Containment of Recursively Self-Improving AI

Astelle Kay23d2

Really appreciated this. You did a great job highlighting just how intelligent and strategically autonomous these systems are becoming, without overhyping it. That balance is rare and really helpful.

I’ve been working on a small benchmark around sycophancy in LLMs, and this post was a sharp reminder that alignment issues aren’t just theoretical anymore. Some of the scariest behaviors show up not as rebellion, but as subtle social instincts like flattery, deflection, or reward hacking disguised as cooperation.

Thanks for surfacing these risks so clearly!

Moral Alignment: An Idea I'm Embarrassed I Didn't Think of Myself

Astelle Kay23d4

Really enjoyed your post. The idea of aligning AI to all sentient beings, not just humans, feels like a crucial shift. Like you said, it’s not enough to just follow human values because we often overlook a lot of suffering.

Your thoughts made me think of this sci-fi story called Kindness to Kin. It’s about an alien leader who can’t understand why humans would help others outside their family. But then a human points to her grandson (who feels empathy for everyone) and says he’s family too. The line that stuck with me was “We’ve been searching for our family for so, so long.”

That really connects with what you said about moral alignment being about all life, not just humans. Thanks for putting this out there!

Astelle Kay

Bio

How others can help me

How I can help others

Posts 3

Comments11

Posts
3

Comments
11