Non-EA interests include chess and TikTok (@benthamite). Formerly @ CEA, METR + a couple now-acquired startups.
Feedback always appreciated; feel free to email/DM me or use this link if you prefer to be anonymous.
People seem surprised and bewildered when AI folks defect away from AI safety towards capabilities. People trust that as AI companies grow, those gaining power and money from shares will not be adversely influenced by that power and money.
fwiw I don't actually know many examples of this, and the ones I hear cited often seem uncompelling to me. E.g.:
(Counterexamples appreciated, though!)
And credit to the AI skeptics that they seem to mostly have updated in light of the new evidence (or at least claimed that they never actually believed in long timelines, which is maybe less noble, but ends up in the same place).
Yeah I agree that if you only have one bit of detail that you can store, then saying it is "hard" rather than "easy" is probably the correct bit. However I would suggest that for something as important as your career you should investigate in substantially more detail. If you do so I expect you will come up with a range of needed skills/attributes for these jobs, some of which you might find easy, others of which you might find hard.
Many people said they wanted to work for METR. I made what I thought was a good offer: take one of the benchmarks we give AIs; if you get a good score then I guarantee that I will fly you out for an interview, even if you have no work history, have no money to pay for the trip, or face any other barrier one might have to employment.
Exactly zero people took me up on this.[1]
How is it possible for there to be sky-high rejection rates yet also zero people sending me applications?
I think the answer is that raw rejection rates aren't a very useful metric. After all, an 80% rejection rate means that the AI safety jobs are 1/10th as selective as Walmart!
I would suggest ignoring raw rejection rates in favor of just looking at the criteria for the jobs you want. Particularly for something like s-risks, the criteria are going to be unusual and specific, meaning that even generically qualified people will often have to dedicate substantial time to skilling up, but if you're able to do so, then your odds are pretty good.[2]
I wouldn't be surprised to learn that some people tried this, failed, and then were too embarrassed about failing to tell me. But, to the best of my recollection, literally zero people have told me that they even attempted this task.
I say this even with the knowledge that you are 19. I don't want to pretend that the deck isn't stacked against younger people - it totally is - but we employ some 19-year-olds, as do other AI safety orgs. If a 19-year-old had sent me a good solution to that METR challenge, for example, I would have been happy to hire them.
I see, thanks! I'm not sure exactly what you'd consider as evidence here, but e.g. here's citation count on papers from the past year vs. AI Lab Watch safety rating[1]
Raw data. Note that Anthropic doesn't use arXiv, which affects their citation counts. This is just coming from a dumb search of Semantic Scholar; I expect a lot of disagreement could be had over the exact criteria for considering something "interpretability," but I expect the Ant/GDM > OAI >> * ordering to be true for almost any definition.
I suspect that I'm still misunderstanding you, but: e.g., interpretability tools are empirically able to identify misalignment, which feels like a (somewhat simple) example of the thing we want. Neel Nanda's 80k podcast goes over the state of the field; tldr is roughly that there are pretty meaningful advances but also he's skeptical that it will be a silver bullet.
I agree with Ben Stewart that there's a galaxy-brain argument that these positive impacts are outweighed by accelerating progress, but it seems hard to argue that things like interpretability aren't making progress on their own terms.
Thanks! I only know a handful of people in this category, but for what it's worth, it again feels like people who were predisposed to thinking that working on pretraining would be okay rather than them being "corrupted."
E.g., I recently talked to someone who told me that their main takeaway from a safety fellowship was realizing that they didn't fit in because they actually weren't worried about existential risk in the same way that the other attendees were.