I had a question. Why do all the AI safety companies seem to do the opposite of AI safety? Anthropic keeps publicly releasing models (which means they can be accessed by billions of people), same for OpenAI, and while these models are unlikely to cause major problems, if you're releasing a product that is going to be used by billions of people you should make sure the product is around 99.9999% failure proof. Anthropic themselves have said "AI models have reached a level of coding capability where they can surpass all but the most skilled humans at finding...
This task, of trying to align them, is something that shouldn't just be left to researchers in AI companies.
Why? I would think an AI expert is much better suited to aligning a potential AGI than any common person. I just don't see how the common person could contribute to alignment. If anything, I can see how they would contribute to DISalignment (engineering better jailbreaks, using the models for nefarious purposes, giving the models "bad values" (like "cause as much damage as possible"), etc.). I think I value reducing existential risk above all else, and I can't imagine that publicly releasing "almost superhuman" models decreases it.
Systemic change interventions should be mentioned as a category of things EA works on when introducing EA and categorizing EA work, alongside the "countermeasure" interventions that are done to solve more specific problems (animal suffering, x-risks, global health and development, etc.) and meta-work.
I think this would help clarify the idea that EA tries to analyze systemic change proposals and puts money and work on them sometimes.
They would include causes like improving institutional decision-making, happiness research, etc, and even new ideas like making AIs' chara...
Bettering social media platforms could be a top-tier systemic intervention.
My understanding is that EA rates most systemic change solutions S-tier in terms of impact but F-tier in terms of neglectedness and tractability.
Is this the case for social media? Is it both super hard and already widely attempted to get the platforms, either internally or through policy, to stop maximizing for engagement? What about popularizing (and building if necessary) platforms that reward what would be "useful" for people (or society at large)?
Because I think it's the most impactful systemic...
People give 'The Day After' as an example of a movie that motivated nuclear disarmament, and say it would be good to have something similar for AIXR. I agree, and I think there's something important to learn from that case.
That movie and 'Threads' are about the catastrophe *happening*, and about how absolutely terrible that would be. It forces you to put yourself there and that makes a strong emotional impact.
I think this type of intuition pump is the most powerful of them; people get the most motivated to change their lives when they think about their last ...
Question from a newbie. I am constantly seeing negative references to the gutting of US foreign aid. It seems pretty clear that global development-focused EAs generally view the change in policy as a bad thing. But I do not think I have once seen any discussion at all about how to reverse this state of affairs. Building on a recent running theme, it seems like political giving may have outsize effectiveness, due to the relatively sparse funding in the space. So, naively, it would seem like you probably could get a great rate of return on effo...
NB -- this is almost entirely AI generated, with some back and forth prompts and corrections
I'm sharing a steelman against a live assumption in Bay/EA/AIS circles: that large AI-lab-adjacent philanthropy is likely to arrive soon enough, and in a sufficiently usable form, that organizations should plan around it.
https://uj-ai-wealth-philanthropy-steelman.netlify.app/
The stronger skeptical case is that IPOs, valuations, pledges, DAFs, and foundation stakes are several gates away from fast, flexible, AIS/EA-directed grants.
...The interactive model lets readers v
I'm biking and Amtraking to Berkeley to join the plzdontkillme house on July 1st. I'm gonna try to interview people along the way about AI/technology/practical philosophy.
You can follow me on insta https://www.instagram.com/charlie.guthmann/ or youtube https://www.youtube.com/channel/UCmTkQjHjs2cVgca3eC5vHcQ
Bear with me as I'm learning how to use social media and do content creation.
Here is the video I made a month ago (with the help of Diego, who is the linked channel), if you want to get a sense of what I'm trying to do. Advice wel...
Lots of people in the Bay seem to be thinking about/preparing for/making funding decisions based on the idea that lots of philanthropy will be given to AIS/EA cause areas very soon (i.e. end of year-ish). I would love for someone to write the comprehensive steelman case against this, as I think it’s probably underrated (there are some reasons to think they won’t give the money, or that it won’t be as much as some assume). Happy to comment/speak to whoever is interested in doing this.
An important (and to me fairly open) related question is to what extent this ends up being action-guiding?
Eg if I lower my probability estimate of this materialising by 10 percentage points, how much will it affect resources I spend on helping to prepare for the possible outcome? Perhaps people here have thoughts on that - my impression is that working on improving the future opportunity set for such donations is relatively robustly good right now?
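One rough way to frame the "how action-guiding is this?" question is a toy expected-value calculation. This is only a sketch: the function name and every number below (probabilities, payoff, cost) are made up for illustration and are not estimates from anyone in this thread.

```python
# Toy model (all numbers hypothetical): is preparing for a possible funding
# influx still worth it if my probability estimate drops by 10 points?

def expected_net_value(p_funding: float, value_if_funding: float, cost: float) -> float:
    """Expected value of preparing now, minus the cost of preparing."""
    return p_funding * value_if_funding - cost

def break_even_probability(value_if_funding: float, cost: float) -> float:
    """Probability below which preparing has negative expected net value."""
    return cost / value_if_funding

VALUE_IF_FUNDING = 100.0  # hypothetical payoff (arbitrary units) if the money arrives
COST = 20.0               # hypothetical cost of preparing, in the same units

before = expected_net_value(0.6, VALUE_IF_FUNDING, COST)
after = expected_net_value(0.5, VALUE_IF_FUNDING, COST)
threshold = break_even_probability(VALUE_IF_FUNDING, COST)

print(f"net value at p=0.6: {before:.1f}")
print(f"net value at p=0.5: {after:.1f}")
print(f"break-even probability: {threshold:.2f}")
```

Under these made-up numbers, the 10-point drop lowers the expected value but does not change the decision, because both estimates sit well above the break-even probability. In this framing the update only becomes action-guiding when it moves you across that threshold, or when you are allocating continuously rather than making a yes/no call.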
Applying Intelligence Community Indications and Warning methodology to frontier AI yields a single, stark conclusion: we are currently in an active warning failure. The capability thresholds intended to trigger policy interventions have already been breached, with frontier models clearing 50-70% on SWE-bench and inference efficiency expanding at 40x annually. Our current evaluation frameworks are structurally gameable by situationally aware systems, pointing to a foundational counterintelligence failure rather than a mere oversight gap. The governa...
I was excited to hear about this "Claude Corps" initiative for NGOs, which helps orgs supercharge their benefits from AI, but then gutted to hear that it's only going to be in the USA. Apparently they want to extend it overseas later, but the impact an intern like this could have right now for organisations like us at OneDay Health in Uganda would be mind-blowing. I hope they can expand the program overseas sooner rather than later!
- 150 million dollar program
- Intern works for 12 months with the NGO to supercharge AI use
- $85,000 payment to intern for the year
https://www.anthropic.com/news/claude-corps
Someone (other than me) should write a deep-dive post about the cult Leverage Research and its infiltration of effective altruism.
The story, in brief:
I think it’s worth noting that Larissa and Kerry have denied being involved with Leverage until after they departed CEA.
There is a thread here where Kerry (now deleted) makes claims on his side of this story.
If you're (re)starting a local EA group or running a local EA event, consider reaching out to people nearby according to the people directory or an EAG Swapcard.
There will be people who do not know about a local group or event and otherwise wouldn't hear about it, and it's pretty painless on both ends! Obviously don't spam them, so (probably) only do this when (re)starting. I did this for an EA Edinburgh event and immediately got some responses.
To make this easier: Put your (approximate) location on the people directory so people can tell you about nearby ...
Next week, @Fran and I will be recording with @NickLaing for The World Can Be Better.
What would you like us to ask him?
Evals are being gamed not because the methodology is insufficient, but because the models on which the compliance audits run are sophisticated enough to game them.
IC methodology already solved the problem of denied human capabilities through triangulation, using independent behavioural signals rather than better direct elicitation. The AI safety community needs to make the same epistemological shift.
The question isn't how to make evals harder to game, it's whether evals are the right instrument at all.
Idea for thinking about the future of EAGs and whether to keep them cause-neutral or have separate cause-themed events:
This could be of interest: support reporting projects that document and explain the opportunities, harms, and regulatory and labor issues surrounding AI systems https://pulitzercenter.org/blog/open-call-proposals-pulitzer-centers-ai-accountability-fellowships-2026-2027
A recurring sub-theme across several of my research interests this year has been various forms of deception checking, particularly automated deception checking.
I've gotten pretty disappointed in the space. Not all the time (eg Pangram is great), but consistently they can be bad, and bad in ways that are not obvious to outsiders or low-information buyers.
If you're a deception checking company, there's a consistent tradeoff for what you can invest your resources in:
I’m willing to bet that Anthropic’s revenue growth over the next year will be slower than its revenue growth over the last 3 years. I proposed a specific bet here. Anyone who wants can offer to take the other side of that bet. Or you can make a counteroffer.
I’m also willing to make a longer-term bet that the AI industry is in a bubble. I proposed a specific bet for that, too, here. Feel free to offer to take the other side of that bet or make a counteroffer.
I’d also be open to other bets. It seems pointless to bet about whether AGI or tr...
I often hear the suggestion that people should short a stock when they don't believe in the company. I don't think that is very good advice.
Shorting is notoriously difficult and carries the possibility of unlimited loss. Even if you believe that a stock will crash, small errors such as timing the crash one year too early or using a miscalculated stop order to stop shorting too early can lead to massive losses. Determining the actual risk involved is not very straightforward. Shorting is often accompanied by risk-managing tactics such as hedging t...
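To make the "unlimited loss" point concrete, here is a minimal sketch comparing the profit and loss of a long and a short position on one share. The function names and prices are hypothetical, and the sketch deliberately ignores borrowing fees, margin calls, and dividends, all of which make real shorting harder still.

```python
# Toy comparison (hypothetical prices): a long position can lose at most the
# entry price, while a short position's loss grows as the price keeps rising.

def long_pnl(entry_price: int, exit_price: int, shares: int = 1) -> int:
    """Profit or loss from buying at entry_price and selling at exit_price."""
    return (exit_price - entry_price) * shares

def short_pnl(entry_price: int, exit_price: int, shares: int = 1) -> int:
    """Profit or loss from selling short at entry_price and buying back at exit_price."""
    return (entry_price - exit_price) * shares

ENTRY = 100  # hypothetical entry price for both positions

for exit_price in (0, 50, 100, 200, 400):
    print(
        f"exit={exit_price:>3}  "
        f"long={long_pnl(ENTRY, exit_price):>5}  "
        f"short={short_pnl(ENTRY, exit_price):>5}"
    )
```

The asymmetry is that the best case for the short (price goes to zero) is capped at the entry price, while the worst case has no cap: if the stock quadruples before it eventually crashes, the short seller is down three times the entry price in this toy setup, even if they were right about the crash in the end.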