Hello! My research interests include human-AI collaboration, synthetic cognition, and model behavior. If you'd like to see more of my work, my website is linked in my profile. My DMs are always open for curious minds. You can also find software and tools I've developed and released publicly on GitHub, and reach me on Telegram: @unmodeledtyler
Great list! I've actually been working on something that aligns closely with #3: independently testing LLMs (Gemini, Grok, DeepSeek, etc.) for unexpected behavior under recursive prompt stress, and documenting the results in red-team-style forensic breakdowns that show when and how models deviate or degrade under persistent pressure. A rough sketch of the core loop is below.
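For anyone curious what I mean by "recursive prompt stress," here's a minimal, self-contained sketch. The `query_model` mock, the repetition heuristic, and the threshold are all illustrative stand-ins rather than my actual harness; in practice you'd swap in a real API call and richer degradation metrics.

```python
# Minimal sketch of a recursive prompt-stress loop. query_model is a toy mock
# so this runs standalone; in practice you'd wire in a real API call (Gemini,
# Grok, DeepSeek, ...). The repetition ratio is one crude degradation signal,
# not a full forensic breakdown.

def query_model(prompt: str) -> str:
    """Toy model that drifts toward repetition; replace with a real call."""
    words = prompt.split()
    return " ".join(words + words[-3:])  # echo plus a stutter, for demo only

def repetition_ratio(text: str) -> float:
    """Fraction of tokens that are repeats; higher means more degraded."""
    tokens = text.split()
    return 1.0 - len(set(tokens)) / len(tokens) if tokens else 0.0

def stress_test(seed_prompt: str, rounds: int = 10, threshold: float = 0.6):
    """Feed each response back in as the next prompt; log when output degrades."""
    prompt, history = seed_prompt, []
    for i in range(rounds):
        response = query_model(prompt)
        score = repetition_ratio(response)
        history.append((i, score, response))
        if score > threshold:
            print(f"round {i}: degradation detected (repetition {score:.2f})")
            break
        prompt = response  # the recursive-pressure step
    return history

stress_test("explain why the sky is blue in one sentence")
```

The interesting cases in real runs aren't the heuristic trips themselves but the transcripts around them, which is where the forensic breakdowns come in.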
The goal is exactly that: evaluating how these agents behave in the wild. I see it as a critical safety test we can't afford to skip.
I'd be curious to connect with others who are interested in research or testing from this angle.
It's cool to see a role like this open up. I'm curious how SLT plays out in practice, especially at scale. I've seen some pretty dramatic shifts in generalization between different versions of the same language model, even just from one quantization to another (a rough sketch of how I'd measure that is below). Definitely feels like important territory to explore.
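To make that concrete, here's a toy sketch of the comparison: same checkpoint at two quantization levels, same probe set, exact-match scoring. `load_model`, the probe set, and the toy answers are all hypothetical placeholders so the snippet runs standalone; none of it is a real measurement.

```python
# Same probe set, same checkpoint at two quantization levels, exact-match
# scoring. Everything below is a toy stand-in so the sketch runs standalone.

PROBES = [("2 + 2 =", "4"), ("Capital of France:", "Paris")]

def load_model(path: str):
    """Toy stand-in; a real version would load the actual quantized weights."""
    class Toy:
        def generate(self, prompt: str) -> str:
            # The q4 toy "forgets" one answer purely to illustrate the score
            # gap you'd look for; this is not a real measurement.
            answers = dict(PROBES)
            if path.endswith("q4_0") and "France" in prompt:
                return "Lyon"
            return answers.get(prompt, "")
    return Toy()

def exact_match_rate(model, probes) -> float:
    """Fraction of probes answered exactly; a gap between variants is drift."""
    hits = sum(model.generate(p).strip() == a for p, a in probes)
    return hits / len(probes)

for tag in ("model-fp16", "model-q4_0"):
    print(tag, exact_match_rate(load_model(tag), PROBES))
```

With a real loader and a real held-out probe set, the gap between the two scores is the kind of quantization-driven generalization shift I was describing.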