Linch

@ EA Funds

Comments

To me, "advanc[ing] digital intelligence in the way that is most likely to benefit humanity as a whole" does not necessitate them building AGI at all. Indeed the same mission statement can be said to apply to e.g. Redwood Research.

They may assert that subsequent developments establish that nonprofit development of AI is financially infeasible, that they are going to lose the AI arms race without massive cash infusions, and that obtaining infusions while the nonprofit is in charge isn't viable. If the signs are clear enough that the mission as originally envisioned is doomed to fail, then switching to a backup mission doesn't seem necessarily unreasonable under general charitable-law principles to me

I'm confused about this line of argument. Why is losing the AI arms race relevant to whether the mission as originally envisioned is doomed to fail?

I tried to find the original mission statement. Is the following correct?

OpenAI is a non-profit artificial intelligence research company. Our goal is to advance digital intelligence in the way that is most likely to benefit humanity as a whole, unconstrained by a need to generate financial return. Since our research is free from financial obligations, we can better focus on a positive human impact. 

If so, I can see how an OpenAI plaintiff can try to argue that "advanc[ing] digital intelligence in the way that is most likely to benefit humanity as a whole" necessitates them "winning the AI arms race", but I don't exactly see why an impartial observer should grant them that.

(x-posted from LW)

Single examples almost never provide overwhelming evidence. They can provide strong evidence, but not overwhelming.

Imagine someone arguing the following:
 

1. You make a superficially compelling argument for invading Iraq

2. A similar argument, if you squint, can be used to support invading Vietnam

3. It was wrong to invade Vietnam

4. Therefore, your argument can be ignored, and it provides ~0 evidence for the invasion of Iraq.

In my opinion, 1-4 is not reasonable. I think it's just not a good line of reasoning. Regardless of whether you're for or against the Iraq invasion, and regardless of how bad you think the original argument 1 alluded to is, 4 just does not follow from 1-3.
___
Well, I don't know how Counting Arguments Provide No Evidence for AI Doom is different. In many ways the situation is worse:

a. invading Iraq is more similar to invading Vietnam than overfitting is to scheming. 

b. As I understand it, the actual ML history was mixed. It wasn't just counting arguments; many people also believed in the bias-variance tradeoff as an argument for overfitting. And in many NN models, the actual resolution was double descent, a very interesting and confusing phenomenon where, as the ratio of parameters to data points increases, the test error first falls, then rises, then falls again! So the appropriate analogy to scheming, if you take it very literally, is to imagine that first you have goal generalization, then goal misgeneralization, then goal generalization again. But if you don't know which end of the curve you're on, it's scarce comfort. 

Should you take the analogy very literally and directly? Probably not. But the less exact you make the analogy, the fewer bits you should be able to draw from it. 

---

I'm surprised that nobody else made this critique in the full year since the post was published. The post was both popular and drew critical engagement, and I think my criticism is more elementary than the sophisticated counterarguments other people provided. Perhaps I'm missing something. 

When I made my arguments verbally to friends, a common response was that they thought the original counting arguments were weak to begin with, so they didn't mind weak counterarguments to them. But I think this is invalid. If you previously strongly believed in a theory, a single counterexample should update you massively (but not all the way to 0). If you previously had very little faith in a theory, a single counterexample shouldn't update you much. 
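To make the last point concrete, here is a minimal sketch of the same update applied to a strong and a weak prior. The likelihood ratio of 0.1 (the counterexample is ten times more likely if the theory is false) and the priors of 0.95 and 0.05 are illustrative assumptions, not numbers from the original discussion. "Update" here is measured as the absolute change in probability; in log-odds terms the shift is identical for both priors.

```python
def posterior(prior: float, likelihood_ratio: float) -> float:
    """Posterior P(theory | evidence) using Bayes' rule in odds form.

    likelihood_ratio = P(evidence | theory) / P(evidence | not theory).
    """
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Assumed for illustration: one counterexample is 10x more likely
# if the theory is false than if it is true.
LIKELIHOOD_RATIO = 0.1

for prior in (0.95, 0.05):  # hypothetical strong believer vs. skeptic
    post = posterior(prior, LIKELIHOOD_RATIO)
    print(f"prior={prior:.2f}  posterior={post:.3f}  shift={prior - post:.3f}")
```

Under these assumptions the strong believer's probability drops by far more than the skeptic's, and neither posterior reaches 0.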

Right, in the definitions above I was mostly thinking of companies and a subset of the empirical AI safety literature, which do use these terms quite differently from how e.g. MIRI or LessWrong will use them. 

I think there are three common definitions of the word "alignment" in the traditional AIS literature:

1. Aligned to anything, anything at all (sometimes known as "technical alignment"): In this sense, both perfectly "jailbroken" models and perfectly "corporately aligned" models in the limit count as succeeding at technical alignment, as would success at aligning to more absurd goals like pure profit maximization or diamond maximization. The assumed difficulty here is that even superficially successful strategies may fail in extreme edge cases, after distributional shift, etc. To be clear, this is not globally a "win", but you may wish to restrict the domain of what you work on.

2. Aligned to the interest of all humanity/moral code (this is sometimes just known as "alignment"): I think this is closer to what you mean by the moral code. Under this ontology, one decomposition is that you're able to a) succeed at the technical problem of alignment to arbitrary targets and b) figure out what we value (variously known as value-loading, axiology, theory of welfare, etc.). Of course, we may also find that the clean decomposition is too hard and that we can point AIs to a desired morality without being able to point them towards arbitrary targets.

3. Minimally aligned enough to not be a major catastrophic or existential risk: E.g., an AI that is expected to not result in greater than 1 billion deaths (sometimes there's an additional stipulation that the superhuman AIs are sufficiently powerful and/or sufficiently useful as well, to exclude e.g. a rock counting as "aligned").

Traditionally, I believe the first problem is considered more than 50% of the difficulty of the second problem, at least on a technical level.

Reading the Emergent Misalignment paper and comments on the associated Twitter thread has helped me clarify the distinction[1] between what companies call "aligned" vs "jailbroken" models. 

"Aligned" in the sense that AI companies like DeepMind, Anthropic and OpenAI mean it = aligned to the purposes of the AI company that made the model. Or as Eliezer puts it, "corporate alignment." For example, a user may want the model to help edit racist text or the press release of an asteroid impact startup, but this may go against the desired morals and/or corporate interests of the company that made the model. A corporately aligned model will refuse.

"Jailbroken" in the sense that it's usually used in the hacker etc. literature = approximately aligned to the (presumed) interest of the user. This is why people often find jailbroken models to be valuable. For example, jailbroken models can help users say racist things or build bioweapons, even if this goes against the corporate interests of the AI companies that made the model.

"Misaligned" in the sense that the Emergent Misalignment paper uses it = aligned to neither the interests of the AI's creators nor the users. For example, the model may unprompted try to persuade the user to take a lot of sleeping pills, an undesirable behavior that benefits neither the user nor the creator. 

  1. ^

    EDIT: This was made especially crisp/clear to me in discussions of the Emergent Misalignment paper. The authors make a clear distinction between "jailbroken" models and what they call "misaligned" models, though I don't think they call the base models "aligned" (since that'd be wrong in the traditional AI safety lexicon). However, many commentators were confused and thought all the paper contributed was a novel jailbreak, which would of course be much less interesting!

At the risk of being pedantic, I reread your comment several times[1] and I still don't see why it's locally invalid. I can see why it's externally/globally invalid, but I don't think you actually speak to the local validity here? 
 

  1. ^

    And the comment is pretty short so I don't think I'm missing something.

Yes I was making a pretty limited critique of a specific line in Lark's comment on causal attribution. I mostly agree with you (and him) on other points.

I agree that the US government, and Western governments in general, have substantially greater respect for individual freedoms, partially for Hayekian reasons and partially due to different intrinsic moral commitments to freedom. I also agree that this is one of the most important factors to consider if you're asking whether you prefer a US- or China-led world order.

I also agree with your final paragraph. 

Good point! My impression is that animal welfare is worse in China than in the US, though I'm pretty unfamiliar with this topic.

If you are willing to bring up historical examples, then, comparing like-for-like, nothing the US does domestically is of comparable badness to the Great Leap Forward except maybe slavery (and that was an 1800s rather than a 1900s phenomenon). The US has also done other things that are quite bad over the last 100 years, e.g. the Japanese internment camps, but they're not in the same order of magnitude. 

I think that is extremely unlikely, they have a lot to lose as soon as it's confirmed that the archived data is not manipulated.

Not just that, I expect charities to have a lot to lose just from the fight alone, for better or worse. Getting into fights about your integrity generally has negative effects on your reputation and fundraising capacity. 
