Quick takes


I had a question: why do all the AI safety companies seem to do the opposite of AI safety? Anthropic keeps publicly releasing models (which means they can be accessed by billions of people), and so does OpenAI. While these models are unlikely to cause major problems, if you're releasing a product that is going to be used by billions of people, you should make sure it is around 99.9999% failure-proof. Anthropic themselves have said "AI models have reached a level of coding capability where they can surpass all but the most skilled humans at finding...

Guy Raveh
We don't know how to align a possible AGI yet. The best we can hope for is that current models are close enough to whatever AGI is going to be, that trying to align them will teach us about aligning an AGI. This task, of trying to align them, is something that shouldn't just be left to researchers in AI companies.

This task, of trying to align them, is something that shouldn't just be left to researchers in AI companies.

 

Why? I would think an AI expert is much better suited to align a potential AGI than any common person. I just don't see how the common person could contribute to alignment. If anything, I can see how they would contribute to DISalignment (engineering better jailbreaks, using the models for nefarious purposes, giving the models "bad values" like "cause as much damage as possible", etc.). I think I value reducing existential risk above all else, and I can't imagine how publicly releasing "almost superhuman" models could decrease it.

David T
In principle I agree. But would you say that people's suitability to align AI safely (or, more specifically, to ensure that Fable does not write nasty software exploits) is defined less by their expertise and alignment with Anthropic's stated mission and more by how much money they can spend on credits? Because that's what Anthropic and the impending IPO marketing are asking you to believe. (Tbh I'm not concerned about Fable manipulating its way into world domination. But if I were, I'd be extremely concerned that our most dedicated defenders against manipulative AI agents might be the sort of people who still take statements put out by AI companies at face value.)

Systemic change interventions should be mentioned as a category of things EA works on when introducing EA and categorizing EA work, separate from the "countermeasure" interventions that address more specific problems (animal suffering, x-risks, global health and development, etc.) and from meta-work.

I think this would help clarify that EA does analyze systemic change proposals and sometimes puts money and work into them.

They would include causes like improving institutional decision-making, happiness research, etc., and even new ideas like making AIs' chara...

Improving social media platforms could be a top-tier systemic intervention.

My understanding is that EA rates most systemic changes/solutions S-tier on impact but F-tier on neglectedness and tractability.

Is this the case for social media? Is it very hard to get the platforms, either internally or through policy, to stop maximizing for engagement, and has it been attempted? What about popularizing (and building, if necessary) platforms that reward what would be "useful" for people (or society at large)?

Because I think it's the most impactful systemic...

People cite 'The Day After' as an example of a movie that motivated nuclear disarmament and say it would be good to have something similar for AIXR. I agree, and I think there's something important to learn from that case.

That movie and 'Threads' are about the catastrophe *happening*, and about how absolutely terrible that would be. It forces you to put yourself there and that makes a strong emotional impact.

I think this type of intuition pump is the most powerful of them; people get the most motivated to change their lives when they think about their last ...

Question from a newbie. I constantly see negative references to the gutting of US foreign aid. It seems pretty clear that global development-focused EAs generally view the policy change as a bad thing. But I don't think I have once seen any discussion of how to reverse it. Building on a recent running theme, it seems like political giving may have outsized effectiveness because funding in the space is relatively sparse. So, naively, it would seem like you could probably get a great rate of return on effo...

At the risk of undermining the strategy somewhat, Matt Yglesias said at a recent EA event that efforts to restore US foreign aid have been quietly going well and that it would not be helpful to raise the political salience of the issue. 

David T
Musk spent $290m in political giving to help convince Trump to give him the role of dismantling USAID, a decision that largely aligns with MAGA and Republican orthodoxy. He incidentally became the world's first trillionaire today and can run anti-USAID rhetoric on his social media platform at zero marginal cost. It doesn't seem like funds to reverse that in the current political climate are going to go very far. More traction is likely achievable in the longer term with the Democratic Party (who aren't exactly guaranteed to reinstate USAID, but are at least receptive to the standard arguments for it and unreceptive to what Elon thinks), but there are a lot of organizations already motivated to lobby for it because USAID was a major funding source, and some of them know their way around DC...

NB: this is almost entirely AI-generated, with some back-and-forth prompts and corrections.

I'm sharing a steelman against a live assumption in Bay/EA/AIS circles: that large AI-lab-adjacent philanthropy is likely to arrive soon enough, and in a sufficiently usable form, that organizations should plan around it.

https://uj-ai-wealth-philanthropy-steelman.netlify.app/

The stronger skeptical case is that IPOs, valuations, pledges, DAFs, and foundation stakes are several gates away from fast, flexible, AIS/EA-directed grants.

The interactive model lets readers v...
Tobias Häberli
I found 'founder deployment by end-2026' the hardest to set. It comes as a bit of a surprise at the end, as I was already taking some of those considerations into account earlier, and the earlier descriptions seem to as well (e.g. "assets after lockups, taxes, sale timing", and "execution delays").
david_reinstein
having a think about this.

OK, I think the revised language makes it clearer (see the updated version of the site, referring to 'timing gate' etc.).

I'm biking and Amtraking to Berkeley to join the plzdontkillme house on July 1st. I'm going to try to interview people along the way about AI, technology, and practical philosophy.

You can follow me on insta https://www.instagram.com/charlie.guthmann/ or youtube  https://www.youtube.com/channel/UCmTkQjHjs2cVgca3eC5vHcQ  
Bear with me as I'm learning how to use social media and do content creation.  

Here is the video I made a month ago (with the help of Diego, whose channel is linked), if you want to get a sense of what I'm trying to do. Advice wel...

Lots of people in the Bay seem to be thinking about, preparing for, or making funding decisions based on the idea that lots of philanthropy will be given to AIS/EA cause areas very soon (i.e. end of year-ish). I would love for someone to write the comprehensive steelman case against this (some reasons to think they won't give the money, or that it won't be as much as some assume), as I think it's probably underrated. Happy to comment on drafts or speak to whoever is interested in doing this.


An important (and to me fairly open) related question is to what extent this ends up being action-guiding? 
E.g., if I lower my probability estimate of this materialising by 10 percentage points, how much will that affect the resources I spend on helping to prepare for the possible outcome? Perhaps people here have thoughts on that. My impression is that working on improving the future opportunity set for such donations is relatively robustly good right now?

david_reinstein
As a first pass, I asked GPT Pro to consider and model this, and Codex to host it, with interactive BOTEC tools etc. https://uj-ai-wealth-philanthropy-steelman.netlify.app/ I'm just looking through it now (I'll respond/adapt to hypothes.is comments). Let me know if this sort of thing is useful or annoying.
David Mathers🔸
Wasn't Dario Amodei one of the earliest signers of the GWWC pledge, long before Anthropic existed? That makes him quite atypical among very rich people, right? Though for what it's worth, I would probably bet against him giving most of his wealth away, even if he has pledged to do that.

Applying Intelligence Community Indications and Warning methodology to frontier AI yields a single, stark conclusion: we are currently in an active warning failure. The capability thresholds intended to trigger policy interventions have already been breached, with frontier models clearing 50-70% on SWE-bench and inference efficiency growing at 40x annually. Our current evaluation frameworks are structurally gameable by situationally aware systems, pointing to a foundational counterintelligence failure rather than a mere oversight gap. The governa...

I was excited to hear about this "Claude Corps" initiative for NGOs, which helps orgs supercharge their benefits from AI, and then gutted to hear that it's only going to be in the USA. Apparently they want to extend it overseas later, but the impact an intern like this could have right now for organisations like ours at OneDay Health in Uganda would be mind-blowing. I hope they can expand the program overseas sooner rather than later!

- $150 million program
- Intern works for 12 months with the NGO to supercharge AI use
- $85,000 payment to intern for the year

https://www.anthropic.com/news/claude-corps
 

Suggestion: Leverage Research deep dive

Someone (other than me) should write a deep-dive post about the cult Leverage Research and its infiltration of effective altruism.

The story, in brief:

  • Leverage Research is a cult.
  • Leverage Research organized the first EA Summit in 2013 and the second EA Summit in 2014. The EA Summits were the first effective altruism conferences of any kind.
  • Leverage Research also helped to organize the first EA Global conferences, which began in 2015 and continue to this day.
  • In 2016, a major EA program, the Pareto Fellowship, was run la...

I think it’s worth noting that Larissa and Kerry have denied being involved with Leverage until after they departed CEA.

There is a thread here where Kerry (now deleted) makes claims on his side of this story.

Jonathan Mannhart
Yeah, I probably just slightly disagree with the word “takeover” in Oliver's comment, but that seems like a reasonable linguistic disagreement. (If it's not taken over for a significant amount of time, because the other people then kicked you out, it wasn't much of a takeover. Maybe Oliver and I would arrive at “long/mid-term-unsuccessful-takeover” as the concept we'd both agree on. Also acknowledging that I wasn't there at the time, and he was.) That doesn't change the fundamental point that it seems important to have some transparent documentation on this. Seems good.
Yarrow Bouchard 🔸
Calling it a “year-long takeover” would resolve the ambiguity.
Mick

If you're (re)starting a local EA group or running a local EA event, consider reaching out to people nearby according to the people directory or an EAG Swapcard.

There will be people who do not know about a local group or event and otherwise wouldn't hear about it, and it's pretty painless on both ends! Obviously don't spam them, so (probably) only do this when (re)starting. I did this for an EA Edinburgh event and immediately got some responses.

To make this easier: put your (approximate) location on the people directory so people can tell you about nearby ...

Thanks for organising @Mick 

Next week, @Fran and I will be recording with @NickLaing for The World Can Be Better.

What would you like us to ask him? 

Evals are being gamed not because the methodology is insufficient but because the models on which compliance audits are run are sophisticated enough to game the audit.
Intelligence community (IC) methodology already solved the problem of denied human capabilities through triangulation, using independent behavioural signals rather than better direct elicitation. The AI safety community needs to make the same epistemological shift.
The question isn't how to make evals harder to game, it's whether evals are the right instrument at all.

Idea for thinking about the future of EAGs and whether to keep them cause-neutral or have separate cause-themed events:

  • Get (anonymised) swapcard data and look at who had meetings with whom to work out how strong the clustering is. If there are clear groups that had lots of meetings within the group, but few meetings with people from other groups, that is a sign that there should just be a separate event for that group. Whereas if there is no group that can be cleanly split into its own event, that is a sign to keep it together.
  • One simple metric for this co...
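The metric described above is cut off, so what follows is a purely hypothetical illustration, not necessarily what the author had in mind: one standard way to score how strongly a given split of attendees clusters is to compare the share of meetings that fall within a group to the share you would expect if meeting partners were paired at random. That difference is Newman's modularity for the partition. The meeting list, group labels, and function names below are made-up toy data, not Swapcard data or a Swapcard API.

```python
from collections import Counter


def within_group_share(meetings, group_of):
    """Observed fraction of meetings whose two attendees share a group label."""
    same = sum(1 for a, b in meetings if group_of[a] == group_of[b])
    return same / len(meetings)


def expected_within_share(meetings, group_of):
    """Fraction expected if meeting 'slots' were paired at random,
    keeping fixed how many slots each group fills."""
    slots = Counter()
    for a, b in meetings:
        slots[group_of[a]] += 1
        slots[group_of[b]] += 1
    total = sum(slots.values())  # equals 2 * number of meetings
    return sum((n / total) ** 2 for n in slots.values())


def modularity(meetings, group_of):
    """Newman modularity of the partition: observed minus expected share.
    Near 0 means little clustering beyond chance; higher means stronger clustering."""
    return within_group_share(meetings, group_of) - expected_within_share(meetings, group_of)


# Hypothetical toy data: five 1:1 meetings, attendees labelled by cause area.
meetings = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "e"), ("c", "d")]
group_of = {"a": "ai", "b": "ai", "c": "ai", "d": "gh", "e": "gh"}

print(round(within_group_share(meetings, group_of), 2))     # 0.8
print(round(expected_within_share(meetings, group_of), 2))  # 0.58
print(round(modularity(meetings, group_of), 2))             # 0.22
```

The group labels have to come from somewhere, for example self-reported cause area or a community-detection algorithm run on the meeting graph; that choice is outside this sketch.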

This could be of interest: the Pulitzer Center's AI Accountability Fellowships (2026-2027) support reporting projects that document and explain the opportunities, harms, and regulatory and labor issues surrounding AI systems. https://pulitzercenter.org/blog/open-call-proposals-pulitzer-centers-ai-accountability-fellowships-2026-2027

A recurring sub-theme across several of my research interests this year has been various forms of deception checking, particularly automated deception checking.

I've gotten pretty disappointed in the space. Not always (e.g. Pangram is great), but these tools can consistently be bad, and bad in ways that are not obvious to outsiders or low-information buyers.

If you're a deception checking company, there's a consistent tradeoff for what you can invest your resources in:

  1. You can invest in better deception checking
  2. You can invest in better deception. Specifically, ...

My first Fable benchmark was to one-shot turning `emacs -batch -l dunnet` into a graphical adventure game, and it hit the safety guardrails because one of the puzzles involves nitroglycerine 😭

Invitation for bets

I’m willing to bet that Anthropic’s revenue growth over the next year will be slower than its revenue growth over the last 3 years. I proposed a specific bet here. Anyone who wants can offer to take the other side of that bet. Or you can make a counteroffer.

I’m also willing to make a longer-term bet that the AI industry is in a bubble. I proposed a specific bet for that, too, here. Feel free to offer to take the other side of that bet or make a counteroffer.

I’d also be open to other bets. It seems pointless to bet about whether AGI or tr...


I often hear the suggestion that people should short stock when they don't believe in a company. I don't think that it is a very good piece of advice.

Shorting is notoriously difficult and carries the possibility of unlimited loss. Even if you believe that a stock will crash, small errors, such as timing the crash one year too early or setting a miscalculated stop order that closes the short too early, can lead to massive losses. Determining the actual risk involved is not straightforward. Shorting is often accompanied by risk-managing tactics such as hedging t...
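To make the "unlimited loss" asymmetry concrete, here is a minimal sketch of the profit or loss on a plain short position. It assumes a hypothetical short of 100 shares at $50 and deliberately ignores borrow fees, dividends, margin calls, and taxes, all of which only make real shorting costlier.

```python
def short_pnl(entry_price, exit_price, shares):
    """Profit (positive) or loss (negative) from selling `shares` short at
    `entry_price` and buying them back at `exit_price`.
    Ignores borrow fees, dividends, margin calls and taxes."""
    return (entry_price - exit_price) * shares


# Hypothetical position: short 100 shares at $50.
entry, shares = 50, 100

# Best case: the stock goes to zero, so the gain is capped at $5,000.
print(short_pnl(entry, 0, shares))    # 5000

# The price can rise without limit, so the loss has no cap.
print(short_pnl(entry, 150, shares))  # -10000
print(short_pnl(entry, 500, shares))  # -45000
```

A stock that eventually crashes can still rise first, and if you are forced to close during that rise, the loss is realized even though the long-run thesis was right.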

Noah Birnbaum
Post this on LW and you'll prob get more offers
Yarrow Bouchard 🔸
Feel free to cross-post it! You have my permission! (My LessWrong account has been deactivated for years and I’m not going to reactivate it for this.)

I find it icky/disappointing when things like this end up on the 80k job board. Can someone make a case why this is a high-impact job worth advertising on the job board? 

 

CBiddulph
I don't think it's icky. (Some might even say it would be more icky to only value fancy research roles?) But it is somewhat surprising to me that this role ended up on the job board, as I would've assumed that Constellation sources this kind of role via normal job boards, like Indeed or something. I wonder how many blue-collar workers at Constellation found their role due to EA motivations. My impression is that this number is very low, although I did hear that the chef is EA-motivated. It seems like it would be quite nice for "low-skill" people who are worried about AI to be able to contribute. And plausibly a janitor or dishwasher who feels a great sense of purpose in their work would have noticeably more impact. But I feel like EA appeals mainly to "elites," for better or for worse...
NickLaing
I couldn't disagree more here; I think we need more of these kinds of jobs on the job board. Imagine you are a kitchen hand and you are good at and like your job. Perhaps you don't have any tertiary education, but you'd love to be more impactful than in your current work at Starbucks. What a rare and great opportunity.

Can you explain how it would be more impactful? I understand impact as meaning counterfactual impact, so I can only imagine this being the case if it's hard to hire a dishwasher for $21-25 an hour in San Francisco. 
