Jay Bailey

1131 karma · Joined · Working (0-5 years) · Brisbane QLD, Australia



I'm a software engineer from Brisbane, Australia who's looking to pivot into AI alignment. I have a grant from the Long-Term Future Fund to upskill in this area full time until early 2023, at which point I'll be seeking work as a research engineer. I also run AI Safety Brisbane.

How others can help me

I will be looking for a research engineering position near the end of 2022. I'm currently working on improving my reinforcement learning knowledge. (https://github.com/JayBaileyCS/RLAlgorithms)

How I can help others

Reach out to me if you have questions about basic reinforcement learning or LTFF grant applications.


My p(doom) went down slightly (from around 30% to around 25%), mainly as a result of how GPT-4 caused governments to begin taking AI seriously in a way I didn't predict. My timelines haven't changed - the only capability increase of GPT-4 that really surprised me was its multimodal nature. (Thus, governments waking up to this was a double surprise, because it clearly surprised them in a way that it didn't surprise me!)

I'm also less worried about misalignment and more worried about misuse when it comes to the next five years, due to how LLMs appear to behave. It seems that LLMs aren't particularly agentic by default, but can certainly be induced to perform agent-like behaviour - GPT-4's inability to do this well seems to be a capability issue that I expect to be resolved in a generation or two. Thus, I'm less worried about the training of GPT-N but still worried about the deployment of GPT-N. It makes me put more credence in the slow takeoff scenario.

This also makes me much more uncertain about the merits of pausing in the short-term, like the next year or two. I expect that if our options were "Pause now" or "Pause after another year or two", the latter is better. In practice, I know the world doesn't work that way and slowing down AI now likely slows down the whole timeline, which complicates things. I still think that government efforts like the UK's AISI are net-positive (I'm joining them for a reason, after all) but I think a lot of the benefit to reducing x-risk here is building a mature field around AI policy and evaluations before we need it - if we wait until I think the threat of misaligned AI is imminent, that may be too late.

This is exactly right, and the main reason I wrote this up in the first place. I wanted this to serve as a data point for people to be able to say "Okay, things have gone a little off the rails, but things aren't yet worse than they were for Jay, so we're still probably okay." Note that it is good to have a plan for when you should give up on the field, too - it should just allow for some resilience and failures baked in. My plan was loosely "If I can't get a job in the field, and I fail to get funded twice, I will leave the field". 

Also contributing to positive selection effects is that you're more likely to see the more impressive results in the field, because they're more impressive. That gives your brain a skewed idea of what the median person in the field is doing. Our brain thinks "Average piece of alignment research we see" is "Average output of alignment researchers".

The counterargument to this is "Well, shouldn't we be aiming for better than median? Shouldn't these impressive pieces be our targets to reach?" I think so, yes, but I believe in incremental ambition as well - if one is below-average in the field, aiming to be median first, then good, then top-tier rather than trying to immediately be top-tier seems to me a reasonable approach.

Welcome to the Forum!

This post falls into a pretty common Internet failure mode, which is so ubiquitous outside of this forum that it's easy to not realise that any mistake has even been made - after all, everyone talks like this. Specifically, you don't seem to consider whether your argument would convince someone who genuinely believes these views. I am only going to agree with your answer to your trolley problem if I am already convinced invertebrates have no moral value...and in that case, I don't need this post to convince me that invertebrate welfare is counterproductive. There isn't any argument for why someone who does not currently agree with you should change their mind.

It is worth considering what specific reasons people who care about invertebrate welfare have, and trying to answer those views directly. This requires putting yourself in their shoes and trying to understand why they might consider invertebrates to have actual moral worth.

"So what's the problem? Why don't I just let the invertebrate-lovers go do their thing, while I do mine? The problem is that those arguing for the invertebrate cause as an issue of moral importance have brought bad arguments to the table."

This is much more promising, and I'd like to see actual discussion of what these arguments are, and why they're bad.

Great post! I definitely feel similar regarding giving - while giving cured my guilt about my privileged position in the world, I don't feel as amazing as I thought I would when giving - it is indeed a lot like taxes. I feel like a better person in the background day-to-day, but the actual giving now feels pretty mundane.

I'm thinking I might save up my next donation for a few months and donate enough to save a full life in one go - because of a quirk in human brains I imagine that would be more satisfying than saving 20% of a life 5 times.

For the Astra Fellowship, what considerations do you think people should be thinking about when deciding to apply for SERI MATS, Astra Fellowship, or both? Why would someone prefer one over the other, given they're both happening at similar times?

"All leading labs coordinate to slow during crunch time: great. This delays dangerous AI and lengthens crunch time. Ideally the leading labs slow until risk of inaction is as great as risk of action on the margin, then deploy critical systems.

All leading labs coordinate to slow now: bad. This delays dangerous AI. But it burns leading labs' lead time, making them less able to slow progress later (because further slowing would cause them to fall behind, such that other labs would drive AI progress and the slowed labs' safety practices would be irrelevant)."


I would be more inclined to agree with this if there was a set of criteria we had that indicated we were in "crunch time" which we are very likely to meet before dangerous systems and haven't met now. Have people generated such a set? Without that, how do we know when "crunch time" is, or for that matter, if we're already here?

Great post! Another thing worth pointing out is another advantage of giving yourself capacity. I try to operate at around 80-90% capacity. This allows me time to notice and pursue better opportunities as they arise, and imo this is far more valuable to your long-term output than a flat +10% multiplier. As we know from EA resources, working on the right thing can multiply your effectiveness by 2x, 10x, or more. Giving yourself extra slack makes you less likely to get stuck in local optima.

Thanks Elle, I appreciate that. I believe your claims - I fully believe it's possible to safely go vegan for an extended period, I'm just not sure how difficult it is (i.e., what's the default outcome, if one tries without doing research first) and what ways there are to prevent that outcome if the outcome is not good.

I shall message you, and welcome to the forum!

With respect to Point 2, I think that EA is not large enough that a large AI activist movement would be composed mostly of EA-aligned people. EA is difficult and demanding - I don't think you're likely to get a "One Million EA" march anytime soon. I agree that AI activists who are EA-aligned are more likely to be in the set of focused, successful activists (like many of your friends!) but I think you'll end up with either:

- A small group of focused, dedicated activists who may or may not be largely EA-aligned
- A large group of unfocused-by-default, relatively casual activists, most of whom will not be EA-aligned

If either of those two would be effective at achieving goals, then I think that makes AI risk activism a good idea. If you need a large group of focused, dedicated activists - I don't think we're going to get that.

As for Point 1, it's certainly possible - especially if having a large group of relatively unfocused people would be useful. I have no idea if this is true, so I have no idea if raising awareness is an impactful idea at this point. (Also, there are those who have made the point that raising AI risk awareness tends to make people more likely to race for AGI, not less - see OpenAI.)

I think there's a bit of an "ugh field" around activism for some EAs, especially the rationalist types in EA. At least, that's my experience.

My first instinct, when I think of activism, is to think about people who:

- Have incorrect, often extreme beliefs or ideologies.
- Are aggressively partisan.
- Are more performative than effective with their actions.

This definitely does not describe all activists, but it does describe some activists, and may even describe the median activist. That said, this shouldn't be a reason for us to discard this idea immediately out of hand - after all, how good is the median charity? Not that great compared to what EAs actually do.

Perhaps there's a mass-movement issue here though - activism tends to be best with a large groundswell of numbers. If you have a hundred thousand AI safety activists, you're simply not going to have a hundred thousand people with a nuanced and deep understanding of the theory of change behind AI safety activism. You're going to have a few hundred of those, and ninety-nine thousand people who think AI is bad for Reason X, and that's the extent of their thinking, and X varies wildly in quality.

Thus, the question is - would such a movement be useful? For such a movement to be useful, it would need to be effective at changing policy, and it would need to be aimed at the correct places. Even if the former is true, I find myself skeptical that the latter would occur, since even AI policy experts are not yet sure where to aim their own efforts, let alone how to communicate where to aim so well that a hundred thousand casually-engaged people can point in the same useful direction.
