AI Safety Camp 10

Robert Kralisch; Linda Linsefors; Remmelt

Stop/Pause AI

(1) Growing PauseAI

Summary

(2) Grassroots Communication and Lobbying Strategy for PauseAI

Summary

(3) AI Policy Course: AI’s capacity of exploiting existing legal structures and rights

Summary

(4) Building the Pause Button: A Proposal for AI Compute Governance

Summary

(5) Stop AI Video Sharing Campaign

Summary

Evaluate risks from AI

(6) Write Blogpost on Simulator Theory

Summary

(7) Formalize the Hashiness Model of AGI Uncontainability

Summary

(8) LLMs: Can They Science?

Summary

(9) Measuring Precursors to Situationally Aware Reward Hacking

Summary

(10) Develop New Sycophancy Benchmarks

Summary

(11) Agency Overhang as a Proxy for Sharp Left Turn

Summary

Mech-Interp

(12) Understanding the Reasoning Capabilities of LLMs

Summary

(13) Mechanistic Interpretability via Learning Differential Equations

Summary

(14) Towards Understanding Features

Summary

(15) Towards Ambitious Mechanistic Interpretability II

Summary

Agent Foundations

(16) Understanding Trust

Summary

(17) Understand Intelligence

Summary

(18) Applications of Factored Space Models: Agents, Interventions and Efficient Inference

Summary

Prevent Jailbreaks/Misuse

(19) Preventing Adversarial Reward Optimization

Summary

(20) Evaluating LLM Safety in a Multilingual World

Summary

(21) Enhancing Multi-Turn Human Jailbreaks Dataset for Improved LLM Defenses

Summary

Train Aligned/Helper AIs

(22) AI Safety Scientist

Summary

(23) Wise AI Advisers via Imitation Learning

Summary

(24) iVAIS: Ideally Virtuous AI System with Virtue as its Deep Character

Summary

(25) Exploring Rudimentary Value Steering Techniques

Summary

(26) Autostructures – for Research and Policy

Summary

Other

(27) Reinforcement Learning from Recursive Information Market Feedback

Summary

(28) Explainability through Causality and Elegance

Summary

(29) Leveraging Neuroscience for AI Safety

Summary

(30) Scalable Soft Optimization

Summary

(31) AI Rights for Human Safety

Summary

(32) Universal Values and Proactive AI Safety

Summary