Welcome to the EA Forum bot site. If you are trying to access the Forum programmatically (either by scraping or via the API), please use this site rather than forum.effectivealtruism.org.

This site has the same content as the main site, but is run in a separate environment to avoid bots overloading the main site and affecting performance for human users.
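For programmatic access, a GraphQL query is usually more convenient than scraping HTML. Below is a minimal sketch of fetching recent posts from this site. It assumes the bot site exposes the same /graphql endpoint and query shape as the main Forum's public GraphQL API; the hostname is a placeholder, so substitute this site's actual address and adjust the query if the schema differs.

```python
# Minimal sketch: query recent posts via the Forum's GraphQL API.
# Assumptions (not confirmed by this page): the bot site serves a /graphql
# endpoint like the main Forum, and the hostname below is a placeholder.
import requests

BOT_SITE_GRAPHQL = "https://<bot-site-hostname>/graphql"  # placeholder hostname

query = """
{
  posts(input: {terms: {view: "new", limit: 5}}) {
    results {
      title
      pageUrl
      postedAt
    }
  }
}
"""

response = requests.post(
    BOT_SITE_GRAPHQL,
    json={"query": query},
    headers={"User-Agent": "my-research-bot/0.1 (you@example.com)"},  # identify your bot
    timeout=30,
)
response.raise_for_status()

for post in response.json()["data"]["posts"]["results"]:
    print(post["postedAt"], post["title"], post["pageUrl"])
```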

Quick takes

A week ago, Anthropic quietly weakened their ASL-3 security requirements. Yesterday, they announced ASL-3 protections. I appreciate the mitigations, but quietly lowering the bar at the last minute so you can meet requirements isn't how safety policies are supposed to work.

(This was originally a tweet thread (https://x.com/RyanPGreenblatt/status/1925992236648464774) which I've converted into a quick take. I also posted it on LessWrong.)

What is the change and how does it affect security?

9 days ago, Anthropic changed their RSP so that ASL-3 no longer requires being robust to employees trying to steal model weights if the employee has any access to "systems that process model weights". Anthropic claims this change is minor (and calls insiders with this access "sophisticated insiders"). But, I'm not so sure it's a small change: we don't know what fraction of employees could get this access and "systems that process model weights" isn't explained.

Naively, I'd guess that access to "systems that process model weights" includes employees being able to operate on the model weights in any way other than through a trusted API (a restricted API that we're very confident is secure). If that's right, it could be a high fraction! So, this might be a large reduction in the required level of security.

If this does actually apply to a large fraction of technical employees, then I'm also somewhat skeptical that Anthropic can actually be "highly protected" from (e.g.) organized cybercrime groups without meeting the original bar: hacking an insider and using their access is typical! Also, one of the easiest ways for security-aware employees to evaluate security is to think about how easily they could steal the weights. So, if you don't aim to be robust to employees, it might be much harder for employees to evaluate the level of security and then complain about not meeting requirements[1].

Anthropic's justification and why I disagree

Anthropic justified the change by
Who said EA was dying? I have 1400 contacts on my EAG London spreadsheet! Yeah I know it's a bit of a lame datapoint and this is more of a tweet than a forum post but hey.... 😘
I'm speaking from what I've personally seen, but it's reasonable to assume it generalizes. There's an important pool of burned-out knowledge workers, and one of the major causes is lack of value alignment, i.e. working for companies that only care about profits. I think this cohort would be a good target for a campaign:

* Effective giving can provide meaning for the money they make
* Dedicating some time to take on voluntary challenges can help them with burnout (if it's due to meaninglessness)
Would a safety-focused breakdown of the EU AI Act be useful to you?

The Future of Life Institute published a great high-level summary of the EU AI Act here: https://artificialintelligenceact.eu/high-level-summary/

What I’m proposing is a complementary, safety-oriented summary that extracts the parts of the AI Act that are most relevant to AI alignment researchers, interpretability work, and long-term governance thinkers. It would include:

* Provisions related to transparency, human oversight, and systemic risks
* Notes on how technical safety tools (e.g. interpretability, scalable oversight, evals) might interface with conformity assessments, or the compliance exemptions available for research work
* Commentary on loopholes or compliance dynamics that could shape industry behavior
* What the Act doesn't currently address from a frontier risk or misalignment perspective

Target length: 3–5 pages, written for technical researchers and governance folks who want signal without wading through dense regulation.

If this sounds useful, I’d love to hear what you’d want to see included, or what use cases would make it most actionable. And if you think this is a bad idea, no worries. Just please don’t downvote me into oblivion, I just got to decent karma :). Thanks in advance for the feedback!
Potential Megaproject: 'The Cooperation Project' (or the like)

This is a very loose idea, based on observations like these:

* We have ongoing geopolitical tensions (e.g. China-US, China-Taiwan, Russia-Ukraine) and a lot of resources and attention spent on those.
* We have (increasing?) risks from emerging technology that potentially threaten everyone. It's difficult to estimate the risk levels, but there seems to be an emerging consensus that we are on a reckless path, even from perspectives concerned purely with individual or national self-interest.

The project would essentially seek to make a clear case for broad cooperation toward avoiding widely agreed-upon bad outcomes from emerging technologies — outcomes that are in nobody's interest. The work could, among other things, consist in reaching out to key diplomats as well as doing high-visibility public outreach that emphasizes cooperation as key to addressing risks from emerging technologies.

Reasons it might be worth pursuing:

* The degree of cooperation between major powers, especially wrt tech development, is plausibly a critical factor in how well the future will go. Even marginal improvements might be significant.
* A strong self-interested case can seemingly be made for increasing cooperation, but a problem might be its relatively low salience as well as primitive status and pride psychology preventing this case from being acted on.
* Even if the case is fairly compelling to people, other motivations might nevertheless feel more compelling and motivating; slight pushes in terms of how salient certain considerations are, both in the minds of the public and leaders, could potentially tip the scales in terms of which paths end up being pursued.
* The broader goal seems quite commonsensical and like something few people would outright oppose (though see the counter-considerations below).
* The work might act as a lever or catalyst of sorts: one can make compelling arguments regarding specific tec