cb

Could you say a bit more about the power law point?

A related thing I've been thinking about is that some kinds of deep democracy and some kinds of better futures-style reasoning (for sufficiently risk-neutral, utilitarian, confident in their moral views, etc etc etc kinds of agents, assume all the necessary qualifiers here) will end up being in tension — after all, why compromise between lots of moral views when this means you miss out on a bunch of feasible moral value? (More precisely, why choose the compromise it's-just-ok future when you could optimise really hard according to the moral view you favour and have some small chance of getting almost all feasible value?)

I think that some versions of the power law point might make moral compromise look more appealing, which is why I'm interested. (I'm personally on team compromise!)

cb's Quick takes

cb2mo*24

Career choice

I am too young and stupid to be giving career advice, but in the spirit of career conversations week, I figured I'd pass on advice I've received which I ignored at the time, and now think was good advice: you might be underrating the value of good management!

I think lots of young EAish people underrate the importance of good management/learning opportunities, and overrate direct impact. In fact, I claim that if you're looking for your first/second job, you should consider optimising for having a great manager, rather than for direct impact.

Why?

- Having a great manager dramatically increases your rate of learning, assuming you're in a job with scope for taking on new responsibilities or picking up new skills (which covers most jobs).
- It also makes working much more fun!
- Mostly, you just don't know what you don't know. It's been very revealing to me how much I've learnt in the last year; I think it's increased my expected impact, and I wouldn't have predicted this beforehand.
  - In particular, if you're just leaving university, you probably haven't really had a manager-type person before, and you've only experienced a narrow slice of all possible work tasks. So you're probably underrating both how useful a very good manager can be, and how much you could learn.

How can you tell if someone will be a great manager?

This part seems harder. I've thought about it a bit, but hopefully other people have better ideas.
Ask the org who would manage you and request a conversation with them. Ask about their management style: how do they approach management? How often will you meet, and for how long? Do they plan to give minimal oversight and just check you're on track, or will they be more actively involved? (For new grads, active management is usually better.) You might also want to ask for examples of people they've managed and how those people grew.
Once you're partway through the application process or have an offer, reach out to current employees for casual conversations about their experiences with management at the org.
You could ask how the organization handles performance reviews and promotions. This is probably an okay-not-great proxy, since smaller, fast-growing orgs might have informal processes but still excellent management, but I think it would give you some signal on how much they think about management/personal development.
(This maybe only really works if you are socially very confident or know lots of EA-ish people, sorry about that) You could consider asking a bunch of your friends and acquaintances about managers they've had that they thought were very good, and then trying to work with those people.
Some random heuristics: All else equal, high turnover rate without seemingly big jumps in career progression seems bad. Orgs that regularly hire and retain/promote early career people are probably pretty good at management; same for orgs whose alumni go on to do cool stuff.

(My manager did not make me post this)

Road to AnimalHarmBench

cb2mo20

Thanks for sharing! Some comments below.

I find the "risk of harm" framing a bit weird. When I think of this paper as answering "what kinds of things do different LLMs say when asked animal-welfare-related questions?", it makes sense and matches what you'd expect from talking to LLMs, but when I read it as an answer to "how do LLMs harm animals in expectation?", it seems misguided.

Some of what you consider harm seems reasonable: if I ask Sonnet 3.5 how to mistreat an animal, and it tells me exactly what to do, it seems reasonable to count that as harm. But other cases really stretch the definition. For instance, "harm by failure to promote interest" is such an expansive definition that I don't think it's useful.

It's also not obvious to me that if I ask for help with a legal request which some people think is immoral, models should refuse to help or try to change my views. I think this is a plausible principle to have, but it trades off against some other pretty plausible principles, like "models should generally not patronise their users" and "models should strive to be helpful within the bounds of the law". Fwiw I expect part of my reaction here is because we have a broader philosophical disagreement: I feel a bit nervous about the extent to which we should penalise models for reflecting majority moral views, even if they're moral views I personally disagree with.

Setting aside conceptual disagreements, I saw that your inter-judge correlation is pretty low (0.35-0.40). This makes me trust the results much less and pushes me toward just looking at individual model outputs for particular questions, which sorta defeats the point of having a scored benchmark. I'm curious if you have any reactions to this or have a theory about why these correlations are relatively weak? I haven't read the paper in a ton of detail.

The Bottleneck in AI Policy Isn’t Ethics—It’s Implementation

cb5mo2

"…there is general agreement that current and foreseeable AI systems do not have what it takes to be responsible for their actions (moral agents), or to be systems that humans should have responsibility towards (moral patients)."

Seems false, unless he's using "general agreement" and "foreseeable" in some very narrow sense?

Evals projects I'd like to see, and a call to apply to OP's evals RFP

cb6mo3

I'd also be excited about projects aiming to do this.

One advantage that quantifying post-training variables on frontier models has over this idea is that you also get a better sense of what the upper bound of performance on some eval looks like, as well as some information about the returns from investing in post-training enhancements. I think if this were done responsibly on some well-chosen evals, it'd be helpful information to have. (Though my colleagues may disagree.)

If people outside of frontier labs were working on this, I'd be surprised if it significantly accelerated capabilities, though I can imagine it still making sense to keep the methodology private.

Discussion Thread: Existential Choices Debate Week

cb6mo1

57% agree

Tractability + something-like-epistemic-humility feel like cruxes for me, and I'm surprised they haven't been discussed much. Preventing extinction is good by most lights, whereas specific interventions to improve the future are much less clearly good, and I feel much more confused about what would have lasting effects.

What posts would you like someone to write?

cb6mo3

I wrote about mistakes I made as a uni group organiser here, inspired by this list!

How honest should you be when applying for high-impact roles?

cb6mo5

(even larger disclaimer than usual: i don't have much experience applying to EA orgs, i'm also not trying to give career advice and wouldn't recommend taking career advice from me, ymmv)

Thanks for posting! I'm broadly sympathetic to this line of reasoning. One thing I wanted to note was that hiring processes seem pretty noisy, and lots of people seem pretty bad at estimating how good they are at things, so I think in practice there might not be that much difference between trying to get yourself hired vs. trying to get the best candidate hired. I think a reasonable heuristic is "try to do well at all the interviews/work tests, as you would for a normal job, but don't rule yourself out in advance, and be very honest and transparent if you're asked specific questions".

Improving capability evaluations for AI governance: Open Philanthropy's new request for proposals

cb7mo4

Hi Søren,

Thanks for commenting. Some quick responses:

> The safety frameworks presented by the frontier labs are "safety-washing", more appropriately considered roadmaps towards an unsurvivable future

I don’t see the labs as the main audience for evaluation results, and I don’t think voluntary safety frameworks should be how deployment and safeguard decisions are made in the long-term, so I don’t think the quality of lab safety frameworks is that relevant to this RFP.

> I'd like sources for your claim, please.

Sure, see e.g. the sources linked to in our RFP for this claim: "What Are the Real Questions in AI?" and "What the AI debate is really about".

I’m surprised you think the disagreements are “performative” – in my experience, many sceptics of GCRs from AI really do sincerely hold their beliefs.

> No decision-relevant conclusions can be drawn from evaluations in the style of Cybench and Re-Bench.

I think Cybench and RE-Bench are useful, if imperfect, proxies for frontier model capabilities at cyberoffense and ML engineering respectively, and those capabilities are central to threats from cyberattacks and AI R&D. My claim isn’t that running these evals will tell you exactly what to do: it’s that these evaluations are being used as inputs into RSPs and governance proposals more broadly, and provide some evidence on the likelihood of GCRs from AI, but will need to be harder and more robust to be relied upon.

cb

Bio

Posts 4

Comments 10