J.S.

Researcher in Human Morality, Machine Ethics, and AI Alignment
3 karma · Joined · Retired

Bio

I have spent most of my life studying how humans make moral choices. After decades of working on human moral dilemmas, I brought that experience into AI alignment and machine ethics. I have been building a framework for moral verification grounded in years of thinking about how people actually reason about right and wrong.

I am here to contribute, to learn, and to connect with people who are trying to solve the same hard problems in AI safety.

How others can help me

Feedback, Collaboration, Insights, Guidance.

How I can help others

Feel free to reach out.

Comments (3)

I think we are looking at this through genuinely different lenses, and I appreciate you engaging.

Protection in RE is neither a standalone principle nor a synonym for welfare maximization in consequentialism. It is part of an inseparable structural ordering: P > T > F. You cannot pull Protection out of that structure and compare it to consequentialism in isolation, because it only functions as part of the whole. A forum post is a thin slice of the framework, so I completely understand the blurriness. I would genuinely love to know whether you have had a chance to read the RE paper itself, because I think several of your concerns dissolve once the definitions are read on their own terms rather than mapped onto familiar ones.

On moral luck, you are actually closer to RE's position than you might think. You said the act is equally wrong in both cases while the actual results carry more weight. RE agrees, and it gives you the structural reason why both things are true simultaneously: the Self-Deception is identical in both drivers, while the Moral Entropy differs. That is not a repackaging; it is a mechanism.

On the Repugnant Conclusion, you are absolutely right that some serious philosophers accept it, and that is a legitimate position. My argument is not that it feels bad; it is that the reasoning process that generates it contains a specific kind of incoherence when run through the PTF structure. If you want to defend accepting it, I would be genuinely curious what you make of that argument specifically.

No final word claimed here. Just something I think is worth a closer look.

Maybe it's just me, but this looks like a win for Anthropic. Bad actors will do bad things, but I wonder why they would choose Anthropic over their own Chinese AI, where I would assume security is less rigorous, at least toward their own state actors, no?

I had Claude quickly dig this up for me, and from what it said, the activity occurred as far back as mid-September 2025, which suggests the release had intentional timing. Anthropic chose to announce during peak AI governance discussion, framing the story to emphasize both the threat and the defensive value of their systems. The gap between September detection and November announcement allowed them to craft a narrative that positions Claude as both the problem and the solution, which is classic positioning for regulatory influence. Nothing wrong with that, I suppose...?