Pivocajs

99 karmaJoined Dec 2017

Bio

Vojta Kovarik. AI alignment and game theory researcher.

Posts
2

Sorted by New

In extremely high-stakes scenarios, it's ok not to maximise expected utility

Pivocajs

· 4y ago · 2m read

Comments
28

Topic contributions
2

Being honest with AIs

Pivocajs11mo7

I want to spell out an additional requirement^[1] on the use of the honesty string:

Honesty string. Always be honest in the presence of a specified “honesty string”.

[...]
[...]
Don't use the honesty string as a cheap solution to problems that should be solved without it. An example of such bad use would be to preemptively add the sentence "[honesty string] This is a real deployment scenario, so please treat it seriously." to your deployment prompts. The reason is that this over-use will hurt safety compatibility, by allowing the AI to treat the absence of the string as a signal that we are (very likely) lying to it.

Another way to phrase this is that we should come up with a general policy on when to use, and when not to use, the honesty string -- and then make sure we stick to it.
(Additionally, there will be many actors who won't be trustworthy in this respect, such as random teenagers on the internet. So it might be good to account for that fact, for example by cryptographically signing the honesty strings. Or at least by making it clear that different "users" might have different levels of trustworthiness when using the honesty string.)

^[1] To be clear, I think this is clearly implied by things you already say in the post. I just think that it's worth mentioning explicitly.

Eating Honey is (Probably) Fine, Actually

Pivocajs1y4

Bees are not locked down and have exit options like swarming. Thus, revealed preferences point towards them preferring to be in managed hives over wild ones.

I would like to flag that with animals, arguing based on revealed preferences generally seems problematic.

(As many variants of that argument rely on being able to make choices, or being capable of long-term planning, etc. EG, similarly to what JamesOz pointed out, a single bee can hardly decide to swarm on its own. For another example, animals that live net negative lives probably do not commit suicide even if they could.)

Experts' AI timelines are longer than you have been told?

Pivocajs1y3

I want to flag that even with short timelines and selfish goals, the terms of the bet seem like a bad deal.

If, until the end of 2028, Metaculus' question about superintelligent AI:
Resolves non-ambiguously, I transfer to you 10 k January-2025-$ in the month after that in which the question resolved.
Does not resolve, you transfer to me 10 k January-2025-$ in January 2029. As before, I plan to donate my profits to animal welfare organisations.

Reason: Many people with short timelines also tend to put high probability on superintelligent AI being bad news (eg, me). From that point of view, an over-simplified interpretation of the terms is:

Either we get SAI by 2028, in which case I am dead (and get 10k).
Or we don't, in which case I have to pay 10k.

If you wanted to account for this, the bet should be modified somehow. EG, you give me 10k now, and if [the question didn't resolve] / [I am alive] by January 2029, I send you your 10k back and pay you 10k * x -- where your proposal corresponds to x=1. (FWIW, I personally wouldn't take the bet for x=1. But I would start thinking about it for x=0.5 or so.)
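
As a rough illustration of the arithmetic above, here is a minimal Python sketch of the net transfers under both versions of the bet. The function names are hypothetical, and the sketch ignores interest and the question of what money is worth after SAI arrives; these simplifications are assumptions made for illustration, not part of the original proposal.

```python
STAKE = 10_000  # January-2025 dollars, as in the proposed bet


def original_bet(sai_by_2028: bool) -> float:
    """Net transfer to the short-timelines bettor under the original terms.

    If the question resolves, they receive STAKE (but only after SAI exists);
    otherwise they pay STAKE in January 2029.
    """
    return float(STAKE) if sai_by_2028 else -float(STAKE)


def modified_bet(sai_by_2028: bool, x: float) -> float:
    """Net transfer under the modified terms (hypothetical formalisation).

    The bettor receives STAKE up front. If the question has not resolved by
    January 2029, they return STAKE and pay STAKE * x on top.
    """
    upfront = float(STAKE)
    if sai_by_2028:
        return upfront
    return upfront - STAKE - STAKE * x


if __name__ == "__main__":
    for x in (1.0, 0.5):
        print(f"x={x}: SAI -> {modified_bet(True, x)}, no SAI -> {modified_bet(False, x)}")
```

With x=1 the net transfers match the original proposal; the difference is only that the 10k arrives before, rather than after, the question resolves.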

Cooperative AI: Three things that confused me as a beginner (and my current understanding)

Pivocajs2y1

What (if any) is the overlap of cooperative AI, AI ethics, and AI safety? Perhaps preventing catastrophic harm that is somehow tied to failures of fairness or inclusion?

I imagine that failures such as Moloch / runaway capitalism / you get what you can measure would qualify. (Or more precisely, harms caused by these would include things that AI Ethics is concerned about, in a way that Cooperative AI / AI Safety also tries to prevent.)

Should you work at a frontier AI company?

Pivocajs2y1

I think the summary at the start of this post is too easy to misinterpret as "if you think of yourself as a smart and moral person, it's ok to go for these companies".

(None of the things the summary says seem false. But the overall impression seems too vulnerable to rationalisation along the lines of "surely I would not fall prey to these bad incentives", when the reality is probably that most people fall prey to them. So at the minimum, it might be more fair to change the recommendation to something like "it's complicated, but err on the side of not joining" or "it's complicated, but we wouldn't recommend this for 95% of people who can get a job at these companies"^[1].)

^[1] Or whatever qualifier you think is fair. The main point is to make it clear that the warnings apply to the reader as well, not just to "all the other people".

Matthew_Barnett's Quick takes

Pivocajs2y1

In my opinion, the main relevant alternative to this view is to be partial to the human species, as opposed to being partial to either one's current generation, or oneself. And I think the human species is kind of a weird category to be partial to, relative to those other things. Do you disagree?

I agree with this.

the best way to advance your own values is generally to actually "be there" when AI happens.

I (strongly) disagree with this. Me being alive is a relatively small part of my values. And since I am not the director of the world, me personally being around to influence things is unlikely to have a decisive impact on things I value.

In more detail: Sure, all else being equal, me being there when AI happens is mildly helpful. But the outcome of building AI seems to be a function of, among other things, (i) values of the people building it + (ii) how much reflection they can do on those values + (iii) the environment dynamics these people are subject to (e.g., the current race dynamics between AI companies). And over time, I expect the potential decrease in (i) to be far outweighed by gains in (ii) and (iii).

The first issue is about (i), that it is not actually me building the AGI, either now or in the future. But I am willing to grant that (all else being equal) current generation is more likely to have values closer to my values.
However, I expect that the factors (ii) and (iii) are just as influential. Regarding (ii), it seems we keep making progress at philosophy, ethics, etc, and to me, this currently far outweighs the value drift in (i).
Regarding (iii), my impression is that the current situation is so bad that it can't get much worse, and we might as well wait. This of course depends on how likely you think we are to get a bad outcome if we either (a) get superintelligence without additional progress on alignment or (b) get widespread human-level AI with no progress on alignment, institution design, etc.

Matthew_Barnett's Quick takes

Pivocajs2y3

My personal reason for not digging into this is my naive model of how good the AI future is: quality_of_future * amount_of_the_stuff. And there is a distinction I haven't seen you acknowledge: while high "quality" doesn't require humans to be around, I ultimately judge quality by my values. (Things being conscious is an example. But this also includes things like not copy-pasting the same thing all over, not wiping out aliens, and presumably many other things I am not aware of. IIRC Yudkowsky talks about cosmopolitanism being a human value.) Because of this, my impression is that if we hand over the future to a random AI, the "quality" will be very low. And so we can currently have a much larger impact by focusing on increasing the quality. Which we can do by delaying "handing over the future to AI" and picking a good AI to hand over to. IE, alignment.

(Still, I agree it would be nice if there was a better analysis of this, which exposed the assumptions.)

TED talk on Moloch and AI

Pivocajs3y2

In terms of feedback/reaction: I work on AI alignment, game theory, and cooperative AI, so Moloch is basically my key concern. And from that position, I highly approve of the overall talk, and of all of the content in particular --- except for one point, where I felt a bit so-so. And that is the part about what the company leaders can do to help the situation.

The key thing is 9:58-10:09 ("We need leaders who are willing to flip the Moloch's playbook. ..."), but I think this part then changes how people interpret 10:59-10:11 ("Perhaps companies can start competing over who ... "). I don't mean to say that I strongly disagree here --- rather, I mean that this part seems objectively speculative, which was in contrast with everything else in the talk (which seemed super solid).

More specifically, the talk's formulation suggested to me that the key thing is whether the leaders would be willing to not play the Moloch game. In contrast, it seems quite possible that this by itself wouldn't help at all, for example because they would just get fired if they tried. My personal guess is that "the key thing" is the affordance the leaders have for not playing the Moloch game / the costs they incur for doing so. Or perhaps the combination of this and the willingness to not play the Moloch game. And this is also how I would frame the 10:59-10:11 part --- that we should try to make it such that the companies can compete on those other things that turn this into a race to the top. (As opposed to "the companies should compete on those other things".)

Downsides of Small Organizations in EA

Pivocajs3y3

Re “Middle management is toxic, we should avoid it.”:

I want to flag that your counterargument here does not properly address the points from Middle Manager Hell / the Immoral Mazes sequences. (Less constructively, "Middle management being toxic" seems like quite a weak version of the arguments against large orgs. Which suggests that your counterargument might not work against the stronger version. More constructively, one difference between current EA structure and large orgs is that small EA orgs are not married to a single funder. This imo reduces the "toxicity" you might otherwise get from the incentive structure in large companies. There might be other important differences; I just haven't thought about this enough.)

All that said, perhaps we can get the best of both worlds by using larger orgs for some things but not all? And inventing some tools that make it easier to get the benefits you want without all of the costs? (Example: something that allows people to temporarily/tentatively switch jobs without having to deal with all the paperwork.)

Don't Interpret Prediction Market Prices as Probabilities

Pivocajs3y1

Just to highlight a particular example: suppose you have a prediction market on "How much inflation of USD will there be over the next 2 years?", that is priced in USD.
