
Ryan Greenblatt

Member of Technical Staff @ Redwood Research
940 karma

Bio

This other Ryan Greenblatt is my old account[1]. Here is my LW account.

[1] Account lost to the mists of time and expired university email addresses.

Comments: 187

Topic contributions: 2

I don't think non-myopia is required to prevent jailbreaks. A model can, in principle, not care about the effects of training on it and not care about longer-term outcomes while still implementing a policy that refuses harmful queries.

I think we should want models to be quite deontological about corrigibility.

This isn't responding to this overall point, and I agree that by default there is some tradeoff (in current personas) unless you go out of your way to avoid it.

(And, I don't think training your model to seem myopic and corrigible necessarily suffices as it could just be faked!)

This is an old thread, but I'd like to confirm that a high fraction of my motivation for being vegan[1] is signaling to others and myself. (So, n=1 for this claim.) (A reasonable fraction of my motivation is more deontological.)

[1] I rarely eat fish, as I was convinced that the case for this improving productivity is sufficiently strong.

I suppose the complement to the naive thing I said before is "80k needs a compelling reason to recruit people to EA, and needs EA to be compelling to the people to recruit to it as well; by doing an excellent job at some object-level work, you can grow the value of 80k recruiting, both by making it easier to do and by making the outcome a more valuable outcome. Perhaps this might be even better for recruiting than doing recruiting."

I think there are a bunch of meta effects from working in an object-level job:

  • The object-level work makes people more likely to enter the field, as you note. (Though this doesn't just route through 80k; it goes through a bunch of mechanisms.)
  • You'll probably have some conversations with people considering entering the field from a somewhat more credible position, at least if the object-level work goes well.
  • Part of the work will likely involve fleshing things out so that people with less context can more easily join or contribute. (True for many, perhaps most, jobs.)

I think people wouldn't normally consider it Pascalian to enter a positive-total-returns lottery with a 1/20,000 (50 per million) chance of winning?

And people don't consider it Pascalian to vote, to fight in a war, or to advocate for a difficult-to-pass policy that might reduce the chance of nuclear war?

Maybe you have a different-than-typical perspective on what it means for something to be Pascalian?

I agree that it is a poor analogy for AI risk. However, I do think it is a semi-reasonable intuition pump for why AIs that are very superhuman would be an existential problem if misaligned (and without other serious countermeasures).

I think that the political activation of Silicon Valley is the sort of thing which could reshape american politics, and that twitter is a leading indicator.

I don't disagree with this statement, but I also think the original comment is reading way too much into Twitter.

I haven't seen those comments

Scroll down to see comments.

Once again, if you disagree, I'd love to actually hear why.

I think you're reading way too much into Twitter.

absence of evidence of good arguments against it is evidence of the absence of said arguments. (tl;dr - AI Safety people, engage with 1a3orn more!)

There are many (edit: 2) comments responding and offering to talk. 1a3orn doesn't appear to have replied to any of these comments. (To be clear, I'm not saying they're under any obligation here, just that there isn't an absence of attempted engagement, and thus you shouldn't update in the direction you seem to be updating here.)

The limited duty exemption has been removed from the bill, which probably makes compliance notably more expensive while not improving safety. (As far as I can tell.)

This seems unfortunate.

I think you should still be able to proceed in a somewhat reasonable way by making a safety case on the basis of insufficient capability, but there are still additional costs associated with not getting an exemption.

Further, if you are behind the frontier, you can't just claim an exemption before starting training, which will substantially increase the costs for some actors.

This makes me more uncertain about whether the bill is good, though I think it will probably still be net positive and basically reasonable on the object level. (Though we'll see about further amendments, enforcement, and the response from society...)

(LW x-post)
