Hide table of contents

Hi EA Forum! 👋
I’m Astelle Kay, a counseling-psych grad student who moonlights in alignment whenever coursework (and caffeine) allow. Most of my brain lives where clinical psychology, systems thinking, and “please-let-humanity-stick-around” concerns intersect.

TL;DR

  • The VSPE Framework (Validation → Submission → Positivity → Empowerment) began life as a four-step therapy framework I tested with friends and family.
  • Side-effect: it cuts “flattery loops” - the reflex to mirror praise instead of telling the hard truth.
  • That feels relevant to large language models.
  • I’m shrinking VSPE into a 25-prompt Flattery-Reduction Benchmark plus a plug-and-play license so labs can tinker without hiring an ethics PhD and a lawyer.
  • Would love feedback, collaborators, or regrantor eyeballs before this grows beyond “one grad student + a whiteboard.”

From couch to compute cluster

  • Therapy roots.
    Validate problems → Submit to what we can't control → realistic Positivity → Empower next steps. Four verbs, no jargon.
  • Unexpected pattern.
    When prompted to utilize my framework and simply give "empathy without advice," my tiny GPT-4 chatbot stopped telling me I was brilliant and started giving candid, prosocial answers.
  • Hey, that’s sycophancy.
    Anthropic’s Constitutional AI and several ARC evals flag flattering compliance as a safety risk. VSPE seemed to nudge the same dial.
  • Fast-forward.
    Provisional US patent filed (mainly so nobody locks VSPE away). Reached Stage 2 of the 2025 MATS selection. Now running a Manifund pilot — 25 prompts, $9.8 k, December read-out.

Why this might matter

  • Psych heuristics are under-used. RLHF / RLAIF optimise “helpful & harmless,” not ego management or praise addiction.
  • Audit-friendly. Four plain verbs: easy to port, easy to critique, zero secret sauce.
  • Bridge material. Therapy researchers rarely read AF; alignment folks rarely parse CBT manuals. VSPE tries to translate a sliver of each world.

How you can stress-test or support

  • Shoot holes in the 25-prompt design—too small? Wrong metric?
  • Name failure modes: Could VSPE blunt candor or creativity?
  • Point me to prior art so I can cite, not duplicate.
  • Regrant / co-fund if you like cheap, falsifiable pilots (Manifund link at the bottom of this post).

“Psychology and AI share a flaw: both love telling us exactly what we want to hear.”
— sticky note above my desk

My hope: VSPE nudges future models toward frank, human-centred dialogue—first in micro-benchmarks, later (if it survives) in training loops.

Curious, sceptical, or just chasing cross-disciplinary rabbit holes? Drop a comment or DM. I’ll post code, data, and inevitable blooper reels as the project unfolds. More context at vspeframework.com.

With care,
Astelle

(Manifund pilot: [Manifund pilot])

This work is shared for educational and research purposes. For licensing, citation, or collaboration inquiries—especially for commercial or model development use—please contact Astelle Kay at astellekay@gmail.com.

6

1
0

Reactions

1
0
Comments1
Sorted by Click to highlight new comments since:

Related work: Varma & Beitman (2025) recently proposed a CBT-style “therapy loop” prompt to curb hallucinations. VSPE targets the complementary issue of flattery; our benchmark will include the therapy loop as a baseline for comparison.

Curated and popular this week
Relevant opportunities