Bio

Participation
10

I’m a generalist and open sourcerer that does a bit of everything, but perhaps nothing particularly well. I'm also the Co-Director of Kairos, an AI safety fieldbuilding org.

I was previously the AI Safety Group Support Lead at CEA and a Software Engineer in the Worldview Investigations Team at Rethink Priorities.

Posts
38

Sorted by New

Comments
116

Topic contributions
15

Kairos (my org) doesn't run a ton of ads, so we don't do this, but one thing we recently started doing is to consistently push all of our funnel analytics data to PostHog, and then connect Claude to PostHog via MCP to pull out a more qualitative analysis of our application funnels for programs. I've thought this has been moderately helpful so far at identifying opportunities and letting us understand where applications are coming from. 

Separately, I recently read this blogpost which goes over a similar workflow that is used by growth marketing at Anthropic. I think this is a pretty good reference for orgs thinking about doing this.

Hey! Welcome to the forum and thanks for writing this.

I think this post conflates two different problems. AI safety philanthropic funding (Open Phil, SFF, regranting programs) doesn't generally flow to frontier labs as discretionary capital; it mostly funds specific projects at academic groups, nonprofits, and small research teams, with defined deliverables (when trust is low) or general funding for a charitable purpose (when trust is high). Frontier-lab safety work is funded by the labs themselves out of revenue. So the principal-agent framing doesn't really describe the funding landscape we have, and a philanthropic gate wouldn't touch the actual capital flows we'd care about.

Compute verification (ZKPs of training, hardware-linked attestation, on-chip governance) is extremely promising, but mostly a lever for AI policy, aimed at government regulation of frontier development, not as a grant verification/compliance mechanism. Just in case you're not familiar, I'd recommend looking at FlexHEG and related work on compute verification and AI assurance.

Hey, I downvoted this and wanted to explain why.

I think there's a real concern under this, but the framing overclaims what the evidence supports. The specific examples mostly have mundane, well-documented explanations: contradictory post-hoc rationalizations are a known consequence of LLMs lacking privileged access to their own sampling (calling it "gaslighting" anthropomorphizes in a way that obscures the actual mechanism), and Japanese rendering failures are downstream of CJK tokenization plus much smaller pretraining-data share, a failure mode shared across Arabic, Hindi, Korean, etc. Reading these as evidence of architectural deprioritization specifically targeting non-Western users requires a comparison condition the post doesn't have. I think it's not an implausible hypothesis, but there's a nuance to be had re. intentionality vs this being the default outcome of the relevant pretraining datasets.

I also don't buy the bridge to existential risk. The argument is roughly: current systems optimize for measured things over unmeasured things, therefore documenting cultural-fluency gaps is alignment work at AGI scale. The first step is just Goodhart and applies to almost any deployed ML system; the second step requires assuming that "renders Japanese poorly" is mechanistically predictive of the failure modes that produce catastrophic outcomes from much more capable systems. I don't think that holds. The threat models that drive x-risk concern (deceptive alignment, power-seeking, loss of human oversight) aren't strongly related to multilingual capability being uneven (especially in image models, which aren't generally capable), and collapsing them probably weakens both conversations.

I'd also gently flag that the methodological pushback (the Apart Research response) gets folded into the theory as further evidence of the pattern. I think this is a bad argument, because it proves too much; it is a fully general defense for these types of arguments, pretty much regardless of what's true. 

There's probably a good paper here about multilingual evaluation gaps, and I'd recommend checking some of the existing work on this topic. One possible direction is trying to make this into a controlled cross-language study with blinded LLM or human raters.

Wishing you well on the project!

I don't believe these things are mutually exclusive. The strongest founders/operators I know want to move the needle in a specific market, and if you want those folks, then it helps to frame the conversation around the problems they're already attacking.

Yeah, this makes sense. I agree that this is another intervention we should consider doing in parallel. 

Btw, happy to see more YC founders around :)

(Again, speaking for myself, not the co-authors)

I agree with you that thinking about incentives is important, but if you accept the premise that we're not constrained by capital, then competent founders who have important problems they want to solve can just get funding from AI safety funders rather than wait for the market to supply the needed capital. I do think the selfish economic incentive is different: if you run a nonprofit, you probably won't earn as much money as if you're running a well-valued startup, even if both have raised the same amount of funding.

Right now, AI safety is playing a bit of both strategies: supporting nonprofits via mainstream funders like CG, SFF, etc, and supporting for-profits via initiatives like Halcyon Futures or SAIF. I think we probably need both, since we want to diversify our bets and eat the low-hanging fruit on each side.

This is all to say, I'm not sure if the lever we need to be pulling at the margin is market-shaping. I think the best lever is still probably talent.

Hi Karen,

Thanks for engaging with our post. To be clear, do you think that there's a problem related to how much we signpost the roles we need publicly? It's my impression that the majority of vacant generalist roles are indeed posted and circulated through job boards and LinkedIn, it's just that most of the applicants don't meet the criteria orgs are searching for.

(Answering personally, not necessarily endorsed by the other authors)

I think there's a lot of nuance here. To be clear, I don't think it is the case that people without AI safety context are never a good fit for soft ops roles, and indeed I've seen a few orgs do this successfully, but the nature of those roles tend to be different.

Maybe the biggest consideration here is the level of ownership required by the role. If you're a program manager in a small org, you're basically making highly strategic judgment calls on a weekly basis, and when people don't have strong mental models around AI safety, they tend to make the wrong ones. I think at present the majority of demand for soft ops roles in the ecosystem looks like this: they're roles that require one to make frequent judgment calls that benefit highly from field-specific context and a solid internalization of our priorities. There are important exceptions to this, especially at bigger orgs, where soft ops roles can be more specialized and therefore there's less of a need for a high degree of context and deep mission-alignment (for example, I suspect many soft ops roles at Coefficient Giving are like this).

I also think AI safety makes this particularly hard, because newcomers tend to start with very bad mental models about our priorities; unlike other fields where it's much easier to grasp what the key goals and types of prioritized interventions are. In my experience, it can take months for people to get to the level of context needed to do core work, and it often fails to occur, making investments like this very expensive and risky for organizations.

Load more