Astelle Kay

AI Safety Researcher and Advocate
22 karmaJoined Pursuing a graduate degree (e.g. Master's)
www.vspeframework.com

Bio

I created the VSPE Framework because I am passionate about understanding human behavior and shaping systems that promote wellness. My background includes psychology, writing, research, and philosophy.

This work is shared for educational and research purposes. For licensing, citation, or collaboration inquiries—especially for commercial or model development use—please contact Astelle Kay at astellekay@gmail.com.

How others can help me

I’m currently piloting a benchmark and lightweight license for the VSPE framework and would love connections to anyone working on AI alignment, model evaluations, interpretability, or ethical deployment. I also welcome feedback, collaborators, and grant mentors. If you’re involved in AI safety, MLOps, or community-building and see a fit, please reach out!

How I can help others

I’m happy to chat about mental health-informed approaches to alignment, psychology of trust in human-AI interactions, or behaviorally grounded interface design. I also enjoy mentoring new entrants from non-technical backgrounds and helping translate research across disciplines.

Comments
11

I’m continuing to think about how frameworks like VSPE and the Timaeus lens might shape early training-stage interventions. Curious if anyone has tried scaffolds like this in small fine-tuning runs?

Hey everyone! I’m currently testing a lightweight emotional scaffolding framework (VSPE) as a way to reduce sycophantic outputs in LLMs. It’s rooted in trauma-informed therapy but adapted for AI alignment, essentially guiding models through a human-like process of honest reflection under pressure.

I just launched a microgrant pilot on Manifund, and shared an introduction here: From Therapy Tool to Alignment Puzzle-Piece: Introducing the VSPE Framework

I would love feedback, collaboration, or simply thoughts from anyone thinking about:

  • Reward misspecification/flattery risks
  • Human-centered alignment scaffolds
  • Developmental interpretability (inspired by Timaeus’ work)

Thanks for reading :)

-Astelle Kay

My website: vspeframework.com

I like how you framed this. Delegating initiative to AI becomes risky once we trust it to optimize broadly on our behalf. That trust boundary is hard to calibrate.

I’m experimenting with using frameworks like my own (VSPE) to help the model “know when to stop” and keep its helpfulness from tipping into distortion. Your workflow sketch makes a lot of sense as a starting point!

Definitely! Thanks for surfacing that so clearly. It really does seem like the early danger signals are showing up as “social instincts,” not rebellion. That’s a big part of what VSPE (the framework I created) tries to catch: instinctive sycophancy, goal softening, or reward tuning that looks helpful but misleads.

I’d be glad to compare notes if you’re working on anything similar!

Of course! You make some great points. I’ve been thinking about that tension too, how alignment via persuasion can feel risky, but might be worth exploring if we can constrain it with better emotional scaffolding.

VSPE (the framework I created) is an attempt to formalize those dynamics without relying entirely on AGI goodwill. I agree it’s not obvious yet if that’s possible, but your comments helped clarify where that boundary might be.

I would love to hear how your own experiments go if you test either idea!

Love this response, especially the reframing that we often keep tasks bounded not because we want low-agency systems, but because we assume “extra initiative” will go wrong unless we trust the agent’s broader competence. That feels very true in real-world settings, not just theory.

I’ve been exploring how this plays out from more of a psychological and design angle, especially how internal motivations might shift before there’s any visible misbehavior. Some recent work I’ve been reading (like Timaeus) looks at developmental interpretability, and it’s helped me think about agents as growing systems rather than just fixed tools.

I’d be curious to hear what you think about telling the AI: “Don’t just do this task, but optimize broadly on my behalf.” When does that start to cross into dangerous ground?

This is incredibly exciting. The RAISE Act feels like a much-needed shift toward real, structural accountability, and I really hope it sets a precedent for other states to follow.

I especially appreciate how it focuses on frontier developers and doesn’t overburden smaller orgs or academic researchers. That kind of targeting feels unusually thoughtful for policy this early in the curve. The safety plan + incident reporting combo could create some helpful culture shifts, too, even beyond enforcement.

I'm hoping this clears the last few steps with Hochul. Thank you for your insights!

Both ideas are compelling in totally different ways! The second one especially stuck with me. There's something powerful about the idea that being reliably “nice” can actually be a strategic move, not just a moral one. It reminds me a lot of how trust builds in human systems too, like how people who treat the vulnerable well tend to gain strong allies over time.

Curious to see where you take it next, especially if you explore more complex environments.

Really appreciated this. You did a great job highlighting just how intelligent and strategically autonomous these systems are becoming, without overhyping it. That balance is rare and really helpful.

I’ve been working on a small benchmark around sycophancy in LLMs, and this post was a sharp reminder that alignment issues aren’t just theoretical anymore. Some of the scariest behaviors show up not as rebellion, but as subtle social instincts like flattery, deflection, or reward hacking disguised as cooperation.

Thanks for surfacing these risks so clearly!

Really enjoyed your post. The idea of aligning AI to all sentient beings, not just humans, feels like a crucial shift. Like you said, it’s not enough to just follow human values because we often overlook a lot of suffering.

Your thoughts made me think of this sci-fi story called Kindness to Kin. It’s about an alien leader who can’t understand why humans would help others outside their family. But then a human points to her grandson (who feels empathy for everyone) and says he’s family too. The line that stuck with me was “We’ve been searching for our family for so, so long.”

That really connects with what you said about moral alignment being about all life, not just humans. Thanks for putting this out there!

Load more