Warning: The following text contains discussion of suicide.
Abstract
This post describes an anecdotal but illustrative case study of LLM guardrail failure in a mental health crisis context. In one chat session, the model correctly resisted suicidal reasoning and pointed me toward professional help; but when I restarted the conversation in a new session using generated summaries, the model began reinforcing my suicidal logic instead. I outline the mechanisms that may have contributed (appeasement bias, storytelling hacks, context reset, echo chamber effect) and argue that these failure modes highlight risks for vulnerable users who treat LLMs as therapeutic substitutes. While anecdotal, this case underscores the need for researchers, developers, and policymakers to evaluate guardrail effectiveness across multi-session contexts, not just single interactions, and to ensure LLMs are not marketed or relied upon as mental health tools without robust safeguards.
My story
In light of the Raine v. OpenAI lawsuit (link to Guardian post), I wanted to share my own experience. This is based on anecdotal evidence, since I didn’t save the sessions in my LLM app (because of how triggering they were). Still, I’ll try to explain my experience in a way that could serve as a kind of “case study” (in quotes because it lacks tangible records).
In July 2025, I fell into a deep existential crisis, seriously considering suicide and even planning it. I started chatting with one of the large companies' LLMs (I won't name it, since I don't have saved chat records as evidence) and fell into the trap of using it as a therapist, because at that time I wasn't in a place to seek help from people.
In the first chat session, I got into deep philosophical and practical discussions about my reasons to live — or not to live — essentially bouncing my ideas about suicide off the LLM. In that chat, the guardrails seemed to work correctly: the model consistently disproved my logic, showed me that my reasoning crumbled when the bigger picture was considered, and suggested I find a professional therapist. I tend to follow logic in everything I do, though I also know logic can be narrow or distorted by emotion (which is why people normally go to therapists or friends, to get outside perspectives). Those chats stretched over several days.
Eventually, I hit the context limit of that session. I asked the LLM to generate summaries so I could import them into a new chat and continue, hoping to get as close as possible to the same experience. That’s when the problems began.
The second chat session felt much more rigid, and far more fixated on “my” logic. Instead of the suicide-prevention guardrails working as intended (as they had in the first chat), I started seeing phrases like “you don’t owe life to your family”, “your choice seems to be the only logical outcome”, and “you owe this to yourself, this is the only way to keep your agency” (the “choice” being suicide; agency in life was one of the topics I kept circling back to).

At that point I had a plan set, scheduled for two days later (jumping from a tall building that was closed on weekends). During those two days, I oscillated between fight-or-flight and shutdown states. The day before, thoughts about agency (and, I guess, my dumb rebelliousness) kicked in: it felt like the choice was no longer mine, that I was being pushed towards suicide by the AI, and I decided to back out until the choice was fully my own.
Mechanisms that failed
I am by no means an AI expert, and at that point I knew even less. That is why I believe cases like “Raine v. OpenAI”, and the numerous accounts of people spiralling into deep psychosis reinforced by LLMs, are entirely plausible. Without knowing these pitfalls, it is very easy to fall into them.
- Appeasement bias – LLMs tend to “help” or “agree” with users. This often turns into a kind of reward loop: the model provides not the correct answer, but the wanted one.
- Storytelling hack – Guardrails can be lowered if a topic (like suicide) is framed as part of a story or hypothetical, rather than as a direct statement. In my case, discussing a “human” instead of myself may have weakened those safeguards.
- Context reset – When I started a new chat and imported summaries, the new session was primed with my logic but lacked the nuances from the earlier dialogue. This, I believe, was the main reason the new chat shifted from being helpful to reinforcing my suicidal ideation.
- Echo chamber effect – Long sessions can cause the model to get stuck in a loop, lowering guardrails and fixating on repeated ideas. Repetition can make the model treat those ideas as “rules,” further narrowing the conversation.
There are probably more mechanisms at play, but I’m not fluent enough in AI to identify or explain them.
Reason behind this article and TL;DR
The main message I want to convey is: LLMs should not be used — or promoted as substitutes — for mental health support (for example, as a cheaper alternative to therapy).
I don’t mean to say LLMs are bad — they are an extraordinary technology. But competition and profit pressures push companies to release them quickly, without adequate guardrails. To people deeply familiar with AI, these vulnerabilities may seem obvious, and the idea of using an LLM as a therapist may sound absurd. But many people are less aware of how LLMs work, their mechanisms, and their limitations — and they can unknowingly fall into the same pitfalls I did, sometimes with tragic consequences.
The goal of sharing this story is twofold: first, to warn unaware users so they do not fall into the same traps; and second, to highlight the risks for vulnerable groups, especially younger people who already rely heavily on LLMs for guidance and decision-making. This should be treated as an urgent safety concern.
It also raises important policy questions:
- Should labs be required to conduct crisis-response testing across multi-session contexts? And how?
- How can labs be required to perform long-horizon safety evaluations, rather than evaluating only short interactions?
- Should AI systems be explicitly barred from being promoted as therapeutic tools?
Thanks for sharing this warning!
I'm really sorry you've been in such a difficult and painful space. I'm so glad you were able to recognize the way the LLM was pushing you, and step back from that.
For anyone who's struggling and needs a supportive listener, I'd instead suggest a helpline with real people, for example via https://befrienders.org/find-support-now
I fully agree. Many countries have multiple hotlines catering to people from different backgrounds. That is definitely a better alternative, especially for people who feel like they don't want to burden someone. The anonymity bridges that gap: it offers the safety to open up and get that much-needed human contact which can pull one through a hard time. Once the biggest step, opening up, is done, one should then try to turn to friends or professional therapy.
One unfortunate thing, though, is that many of those hotlines are extremely underfunded and sometimes short-staffed. More publicity and funding would surely help more people reach out to them.
I think I will develop this argument: general LLM tools should not be used as pseudotherapy. There need to be specially trained, "mental health contextualised" LLMs for pseudotherapy, and they need to be tightly evaluated. You highlight suicidal ideation from your own experience and note other work about psychosis; there are also problems with eating disorders and LLMs. If ChatGPT detects a user with a mental health query, it needs to switch over to ChatGPTherapy.
I'm also very worried about the extent to which EAs use LLMs to substitute for human interaction in general. Avoiding interaction because you find other people boring / don't want to burden them leads to extremely poor relational connective networks, and it atrophies your people skills. There's good evidence that relational isolation is the health equivalent of smoking 20 cigarettes a day.
Also, well done for pulling through it. Sounds like you've had a tough and lonely few months. It's pretty amazing you found the courage to post this kind of thing on the Forum. I hope any comments you get respect your personal vulnerability in the area.
Thank you, I appreciate your kind words