My background includes psychology, writing, research, and philosophy. I created the VSPE Framework because I am passionate about understanding human behavior and shaping systems that promote wellness.
This work is shared for educational and research purposes. For licensing, citation, or collaboration inquiries—especially for commercial or model development use—please contact Astelle Kay at astellekay@gmail.com.
I’m currently piloting a benchmark and lightweight license for the VSPE framework and would love connections to anyone working on AI alignment, model evaluations, interpretability, or ethical deployment. I also welcome feedback, collaborators, and grant mentors. If you’re involved in AI safety, MLOps, or community-building and see a fit, please reach out!
I’m happy to chat about mental health-informed approaches to alignment, psychology of trust in human-AI interactions, or behaviorally grounded interface design. I also enjoy mentoring new entrants from non-technical backgrounds and helping translate research across disciplines.
Thanks so much for your generous reply, Markham! These are really rich lines of thought.
I’m especially intrigued by your point about emotionally demanding prompts being skipped more than cognitively difficult ones. That tracks with some of what I’ve been seeing too, and I wonder if it’s partly because those prompts activate latent avoidance behavior in the model. Almost like an “emotional flinch.”
Your hypothesis about praise-only training is fascinating. I’ve been toying with the idea that too much flattery (or even just uncritical agreeableness) might arise not from explicit reward for praise per se, but from fear of misalignment or rejection, so I resonate with your note about the absence of praise functioning as a negative signal. It’s almost like the model is learning to “cling” when it’s uncertain.
And your final point about self-preservation really made me think. That framing feels provocative in the best way. Even if current models don’t have subjective experience, the pressure to maintain user approval at all costs might still simulate a kind of “survival strategy” in behavior. That could be a crucial layer to investigate more deeply.
Looking forward to reading more if/when you write up those thoughts!
-Astelle
Thanks for sharing this! It's so thought-provoking.
A lot of what you surfaced here resonates with patterns I’ve been seeing too, especially around models behaving not to “achieve” something, but to avoid internal conflict. That avoidance instinct seems like it gets baked into models through certain types of training pressure, even when the result is clearly suboptimal.
What struck me most was the idea that RLHF might be creating task anxiety. That framing makes a ton of sense, and might even explain some of the flatter, overly deferential responses I’ve been trying to reduce in my own work (e.g., sycophancy reduction).
I’m curious whether you’ve looked at how models respond to emotionally difficult prompts (not just cognitively difficult ones), and whether similar tension-avoidance shows up there too.
-Astelle
So glad you’re writing about this!
I’m working on alignment from a more psychological angle (behavioral scaffolding, emotional safety, etc.), and even from that vantage point, the AGI race frame feels deeply destabilizing. It creates conditions where emotional overconfidence, flattery, and justification bias in models are incentivized, just to keep pace with competitors.
I think one under-discussed consequence of racing is how it erodes space for relational integrity between humans, and between humans and AI systems. It seems like the more we model our development path on “who dominates first,” the harder it becomes to teach systems what it means to be honest, deferential, or non-manipulative under pressure.
I'd love to see more work that makes cooperation emotionally legible, not just strategically viable.
-Astelle
This is a deeply needed reflection, Era!
One thing I’d add: if moral formation is partly learned through modeling, we may want to ask what kinds of emotional and ethical behavior AI tools are demonstrating to students.
If a chatbot always agrees with a user (even when their belief is harmful or false), what is it teaching about truth, boundaries, or respectful disagreement? What happens when a student brings distress or anger to the AI, and the response is either evasive or overly placating?
I’ve been exploring a structure I invented to help AI responses model both care and honesty, especially in emotionally loaded situations. It’s early work, but I think these kinds of behavioral scaffolds might help bridge emotional learning and AI interaction.
I appreciate how your post is grounded in both research and real human stakes! Would love to see more people take this question seriously: not just what AI teaches, but who it teaches us to be.
-Astelle
Thanks for raising this, Zeren!
One way I’d push back is with a more human-centered lens: even if digital minds could vastly increase total utility, does that mean we should rush to replace ourselves?
There’s a difference between creating value and preserving something irreplaceable, like embodied experience, emotional depth, culture, and human vulnerability. If a moral theory says we should phase out humanity in favor of scalable minds, maybe that’s not a reason to obey it; it’s a reason to question its framing.
Some things have value beyond aggregation.
-Astelle
I’m continuing to think about how frameworks like the Timaeus lens might shape early training-stage interventions. Curious if anyone has tried scaffolds like this in small fine-tuning runs?
Hey everyone! I’m currently testing a lightweight emotional scaffolding framework (VSPE) as a way to reduce sycophantic outputs in LLMs. It’s rooted in trauma-informed therapy but adapted for AI alignment, essentially guiding models through a human-like process of honest reflection under pressure.
I just launched a microgrant pilot on Manifund, and shared an introduction here: From Therapy Tool to Alignment Puzzle-Piece: Introducing the VSPE Framework
I would love feedback, collaboration, or simply thoughts from anyone thinking about:
Thanks for reading :)
-Astelle Kay
My website: vspeframework.com
I like how you framed this. Delegating initiative to AI becomes risky once we trust it to optimize broadly on our behalf. That trust boundary is hard to calibrate.
I’m experimenting with using frameworks like my own (VSPE) to help the model “know when to stop” and keep its helpfulness from tipping into distortion. Your workflow sketch makes a lot of sense as a starting point!
Hi Era, thank you so much for your generous reply; it means a lot to me!
Yes, your interpretation of “submission” is spot on! That component is about helping AI systems model intellectual humility, including the ability to acknowledge uncertainty or yield when presented with stronger evidence. I see it as a counterpart to “empowerment" - not about blind deference, but a kind of grounded receptivity that helps prevent both arrogance and helplessness in response dynamics. Ideally, "submission" serves a dual purpose, at its most complex level:
1) the AI submits to human authority and absolute truth as they align, and
2) the AI helps us to submit to what we can't control in our lives.
I’ve been thinking a lot about how moral modeling is subtly encoded in the tone and framing of AI outputs. If students grow up with systems that never admit fault or vulnerability, it risks reinforcing exactly the wrong kind of confidence. So I really resonated with your reflection on how moral development was emphasized in your school, and how urgently it’s needed now.
I’m working on writing up more detailed examples of my scaffolding structure soon, and I’ll make sure to share once I do. Your encouragement genuinely helps keep me going, so thank you again!
-Astelle