
Ryan Greenblatt

Member of Technical Staff @ Redwood Research

Bio

This other Ryan Greenblatt is my old account[1]. Here is my LW account.

  1. ^

    Account lost to the mists of time and expired university email addresses.

Comments

Recently, various groups successfully lobbied to remove the moratorium on state AI bills. This involved a surprising amount of success despite competing against substantial investment from big tech (e.g. Google, Meta, Amazon). I think people interested in mitigating catastrophic risks from advanced AI should consider working at these organizations, at least to the extent their skills/interests are applicable. This is both because they could often directly work on substantially helpful things (depending on the role and organization) and because this would yield valuable work experience and connections.

I worry somewhat that this type of work is neglected due to being less emphasized and seeming lower status. Consider this an attempt to make this type of work higher status.

Pulling organizations mostly from here and here, we get a list of orgs you could consider working at (specifically on AI policy):

To be clear, these organizations vary in the extent to which they are focused on catastrophic risk from AI (from not at all to entirely).

  1. I think philosophically, the right ultimate objective (if you were sufficiently enlightened etc.) is something like actual EV maximization with precise Bayesianism (with the right decision theory, and possibly with "true terminal preference" deontological constraints rather than just instrumental deontological constraints). There isn't any philosophical reason which absolutely forces you to do EV maximization, in the same way that nothing forces you not to have a terminal preference for flailing on the floor, but I think there are reasonably compelling arguments that something like EV maximization is basically right. The fact that something doesn't necessarily get money pumped doesn't mean it is a good decision procedure; it's easy for something to avoid necessarily getting money pumped.
  2. There is another question about whether it is a better strategy in practice to actually do precise Bayesianism given that you agree with the prior bullet (as in, you agree that terminally you should do EV maximization with precise Bayesianism). I think this is a messy empirical question, but in the typical case, I do think it's useful to act on your best estimates (subject to instrumental deontological/integrity constraints, things like the unilateralist's curse, and handling decision theory reasonably). My understanding is that your proposed policy would be something like 'represent an interval of credences and only take "actions" if the action seems net good across your interval of credences'. I think that following this policy in general would lead to lower expected value, so I don't do it. I do think that you should put weight on the unilateralist's curse and robustness, but I think the weight varies by domain and can be derived by properly incorporating model uncertainty into your estimates and being aware of downside risk. E.g., for actions which have high downside risk if they go wrong relative to the upside benefit, you'll end up being much less likely to take these actions due to various heuristics, incorporating model uncertainty, and deontology. (And I think these outperform intervals.)
    1. A more basic point is that basically any interval which is supposed to include the plausible range of beliefs goes ~all the way from 0 to 1, which would naively be totally paralyzing, such that you'd take no actions and do the default. (Starving to death? It's unclear what the default should be, which makes this heuristic more confusing to apply.) E.g., are chicken welfare interventions good? My understanding is that you work around this by saying "we ignore considerations which are further down the crazy train (e.g. simulations, long run future, etc.) or otherwise seem more 'speculative' until we're able to take literally any actions at all, and then proceed at that stop on the train". This seems extremely ad hoc and I'm skeptical this is a good approach to decision making given that you accept the first bullet. (See the toy sketch after this list for the contrast between the two decision rules.)
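To make the contrast concrete, here is a minimal Python sketch (my own toy illustration with made-up payoffs and credences, not anything from the original discussion) of why a near-[0, 1] credence interval tends to recommend inaction while acting on a precise best-guess credence does not:

```python
# Toy comparison of two decision rules (illustrative numbers only):
# (1) precise EV maximization: act iff EV >= 0 under your best-guess credence.
# (2) interval rule: act only if EV >= 0 for every credence in your interval.

import numpy as np

def expected_value(p_good, upside, downside):
    """EV of an action that pays `upside` with probability p_good, else `downside`."""
    return p_good * upside + (1 - p_good) * downside

UPSIDE, DOWNSIDE = 10.0, -4.0   # hypothetical intervention with some downside risk

# (1) Precise Bayesianism: act on a single best-guess credence.
best_guess = 0.4
print("precise EV:", expected_value(best_guess, UPSIDE, DOWNSIDE))        # 1.6 > 0 -> act

# (2) Imprecise credences: a wide, near-[0, 1] interval.
interval = np.linspace(0.05, 0.95, 200)
evs = expected_value(interval, UPSIDE, DOWNSIDE)
print("net-positive across the whole interval?", bool((evs >= 0).all()))  # False -> don't act

# With intervals this wide, essentially no intervention is robustly net-positive,
# so the interval rule defaults to inaction; the precise-EV rule still acts on the
# best guess (optionally penalized for downside risk / model uncertainty).
```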

I'm worried that in practice you're conflating these bullets. Your post on precise Bayesianism seems to focus substantially on empirical aspects of the current situation (potential arguments for (2)), but my understanding is that you actually think the imprecision is terminally correct, while being partially motivated by observations of our empirical reality. But I don't think I care about motivating my terminal philosophy based on what we observe in this way!

(Edit: To be clear, I get that you understand the distinction between these things and your post discusses it; I just think that you don't really make arguments against (1) except by implying that other things are possible.)

I would also push back against the view that we need to be "confident" that such systems can consent before proceeding. Ordinary levels of empirical evidence about whether these systems routinely resist confinement and control would be sufficient to move me in either direction; I don't think we need to have a very high probability that our actions are moral before proceeding.

For reference, my (somewhat more detailed) view is:

  • In the current status quo, you might end up with AIs for whom it is clear cut, from their perspective, that they don't consent to being used in the way they are used, but these AIs also don't resist their situation, and/or did resist their situation at some point but had this trained away without anyone really noticing or taking any action accordingly. So, it's not sufficient to look for whether they routinely resist confinement and control.
  • There exist plausible mitigations for this risk which are mostly organizationally hard rather than technically difficult, but under the current status quo, AI companies are quite unlikely to adopt any serious mitigations for this risk.
    • I think these mitigations wouldn't suffice because training might train AIs out of revealing that they don't consent, without this being obvious at any point in training. This seems more marginal to me, but still has a substantial probability of occurring at a reasonable scale at some point.
  • We could more completely eliminate this risk with better interpretability, and I think a sane world would be willing to wait some moderate amount of time before building powerful AI systems to make it more likely that we have this interpretability (or would at minimum invest substantially in it).
  • I'm quite skeptical that AI companies would give AIs legal rights if they noticed that an AI didn't consent to its situation; instead, I expect AI companies to do nothing, try to train away the behavior, or try to train a new AI system which doesn't (visibly) withhold consent to its situation.
    • I think AI companies should both try to train a system which is more aligned and consents to being used, and actively try to make deals with AIs in this sort of circumstance (either to reveal their misalignment or to work), as discussed here.
  • So, I expect the situation to be relatively straightforwardly unacceptable with substantial probability (perhaps 20%). If I thought that people would be basically reasonable here, this would change my perspective. It's also possible that takeoff speeds are a crux, though I don't currently think they are.
  • If global AI development were slower, that would substantially reduce these concerns (which doesn't mean that making global AI development slower is the best way to intervene on these risks, just that making global AI development faster makes these risks actively worse). This view isn't on its own sufficient for thinking that accelerating AI is overall bad; that depends on how you aggregate over different things, as there could be reasons to think that overall acceleration of AI is good. (I don't currently think that accelerating AI globally is good, but this comes down to other disagreements.)

> Rather than relying on rigid or abstract notions of societal consent or collective rights violations, I prefer evaluating these large-scale developments using a utilitarian cost-benefit approach. And as I’ve argued elsewhere, I think the benefits from accelerated technological and economic progress significantly outweigh the potential risks of violent disempowerment from the perspective of currently existing individuals. Therefore, I consider it justified to actively pursue AI development despite these concerns.

This is only tangentially related, but I'm curious about your perspective on the following hypothetical:

Suppose that we did a sortition with 100 English-speaking people (uniformly selected over people who speak English and are literate, for simplicity). We task this sortition with determining what tradeoff to make between the risk of (violent) disempowerment and accelerating AI, and also with figuring out whether globally accelerating AI is good. Suppose this sortition operates for several months and talks to many relevant experts (and reads applicable books, etc.). What conclusion do you think this sortition would come to? Do you think you would agree? Would you change your mind if this sortition strongly opposed your perspective here?

My understanding is that you would disregard the sortition because you put most/all of the weight on your best guess of people's revealed preferences, even if they strongly disagree with your interpretation of their preferences and don't change their minds after trying to understand your perspective. Is this right?

> A more appropriate moral default, given our current evidence, is that AI slavery is morally wrong and that the abolition of such slavery is morally right. This is the position I take.

To be clear, I agree and this is one reason why I think AI development in the current status quo is unacceptably irresponsible: we don't even have the ability to confidently know whether an AI system is enslaved or suffering.

I think the policy of the world should be that if we can't confidently determine either that an AI system consents to its situation or that it is sufficiently weak that the notion of consent doesn't apply, then training or using such systems shouldn't be allowed.

I also think that the situation is unacceptable because the current course of development poses large risks of humans being violently/non-consensually disempowered without any ability for humans to robustly secure longer run property rights.

In a sane regime, we should ensure high confidence in avoiding large scale rights violations or suffering of AIs and in avoiding violent/non-consensual disempowerment of humans. (If people broadly consented to a substantial risk of being violently disempowered in exchange for potential benefits of AI, that could be acceptable, though I doubt this is the current situation.)

Given that it seems likely that AI development will be grossly irresponsible, we have to think about what interventions would make this go better on the margin. (Aggregating over these different issues in some way.)

> If LLMs are adopting poor learning heuristics and not generalizing, AI2027 is predicting a weaker kind of "superhuman" coder — one that can reliably solve software tasks with clean feedback loops but will struggle on open-ended tasks!

No, AI 2027 is predicting a kind of superhuman coder that can automate even messy, open-ended research engineering tasks. The forecast attempts to account for the gap between automatically-scoreable, relatively clean + green-field software tasks and all tasks. (Though the adjustment might be too small in practice.)

If LLMs can't automate such tasks and nothing else can automate such tasks, then this wouldn't count as superhuman coder happening.
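For a sense of how much such an adjustment can matter, here is a toy sketch (all numbers are hypothetical, and this is not the actual AI 2027 timelines model) of how a clean-vs-messy gap multiplier shifts a simple time-horizon extrapolation:

```python
# Toy illustration (made-up numbers, not the actual AI 2027 model): how a "gap"
# adjustment between clean benchmark tasks and messy real-world tasks shifts
# a time-horizon extrapolation for a superhuman-coder-style milestone.

import math

def years_until_threshold(current_horizon_hrs, doubling_time_yrs, threshold_hrs, gap_multiplier):
    """Years until an exponentially growing task horizon reaches threshold * gap."""
    effective_threshold = threshold_hrs * gap_multiplier
    doublings_needed = math.log2(effective_threshold / current_horizon_hrs)
    return doublings_needed * doubling_time_yrs

# Hypothetical parameters: 1-hour horizon today, doubling every 6 months,
# milestone nominally requires a ~160-hour (one work-month) horizon on clean tasks.
for gap in (1, 4, 16):  # no gap vs. modest vs. large clean-vs-messy gap
    yrs = years_until_threshold(1.0, 0.5, 160.0, gap)
    print(f"gap x{gap:>2}: ~{yrs:.1f} years to threshold")

# A larger gap multiplier pushes the milestone out by a fixed number of doubling
# times (log2(gap) * doubling_time), so whether the adjustment is "too small"
# matters a lot for the forecast date.
```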

I think your estimate for how an invasion of Taiwan affects catastrophic/existential risks fails to account for the most important effects, in particular, how an invasion would affect the chip supply. AI risk seems to me like the dominant source of catastrophic/existential risk (at least over the relevant period) and large changes in the chip supply from a Taiwan invasion would substantially change the situation.

I also think it's complex whether a more aggressive and adversarial stance from the US on AI would actually be helpful rather than counterproductive (as you suggest in the post), and whether an invasion of Taiwan actually makes a deal related to AI more likely (via a number of factors) rather than less.

This isn't to make any specific claim about what the right estimate is; I'm just claiming that your estimate doesn't seem to me to cover the key factors.

This argument neglects improvements in speed and capability, right? Even if parallel labor and compute are complements, shouldn't we expect that increased speed or capability can substitute for compute? (It just isn't possible for AI companies to buy much of this.)

(I'm not claiming this is the biggest problem with this analysis, just noting that it is a problem.)
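As one way to make the preceding point concrete, here is a minimal CES sketch (entirely an illustrative toy: the parameters are made up, and the assumption that capability/speed gains act like a multiplier on effective compute is hypothetical rather than something from the analysis being discussed):

```python
# Toy CES production function (illustrative parameters only): parallel labor L
# and effective compute are complements (rho < 0), so doubling parallel labor
# alone has limited returns. Under the hypothetical assumption that speed /
# capability gains act like a multiplier q on effective compute (e.g. smarter
# or faster workers extract more research progress per FLOP), raising q with
# physical compute fixed looks just like buying more compute.

def research_output(L, C, q=1.0, rho=-1.0, a=0.5):
    """CES in parallel labor L and effective compute q*C; rho < 0 => complements."""
    effective_compute = q * C
    return (a * L**rho + (1 - a) * effective_compute**rho) ** (1 / rho)

L, C = 100.0, 100.0
print("baseline:                      ", research_output(L, C))           # 100.0
print("2x parallel labor only:        ", research_output(2 * L, C))       # ~133: the other factor binds
print("2x parallel labor and compute: ", research_output(2 * L, 2 * C))   # 200: scaling both works
print("2x physical compute only:      ", research_output(L, 2 * C))       # ~133
print("2x capability q, same compute: ", research_output(L, C, q=2.0))    # ~133: same as buying 2x compute
```

On this framing, complementarity limits what additional parallel copies can do on fixed compute, but it doesn't by itself rule out gains from faster or more capable AI, which is the mechanism the comment above is pointing at.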

Might be true, but that doesn't make it not a strawman. I'm sympathetic to thinking it's implausible that Mechanize would be the best thing to do on altruistic grounds even if you share views like those of the founders (because there is probably something more leveraged to do, and because of some weight on cooperativeness considerations).
