warning the Tesla founder his wealth would go to "left-wing nonprofits that will be chosen by Bill Gates."

Am I missing something or does this argument make no sense? As far as I can tell, Musk can easily fulfill his giving pledge by giving to his preferred not-left-wing nonprofits without deferring to Bill Gates.

I don't think so.

Some less tribalistic hypotheses I can think of:

  • EAs concerned about animal welfare have typically focused on farmed animals, as opposed to animal testing, because of the much larger scale of the suffering.
  • EAs mostly haven't heard of it.
  • Maybe some EAs have heard about it, but they don't think it is worth the effort to write a post about it.

But tribalistic explanations could be a factor too (e.g. MAHA has anti-science vibes, and EAs like to stay on the pro-science side).

(This is probably not the most constructive feedback, but my initial reaction to this short form was that it felt like a right-wing analog of left-wing "Why don't the EAs tweet about Gaza?"-style criticisms).

I think halting undecidability and Rice's theorem are being misapplied here. It is true that no algorithm can determine, for every possible program and input, whether that program will halt. But for specific programs and inputs, it is often possible to figure out whether they halt or not.

I agree that there is no method that allows us to check every possible AGI design for a specific nontrivial behavioral property. But this does not prevent us from selecting an AGI design for which we can prove that it has a specific behavioral property!
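To make the distinction concrete, here is a minimal Python sketch (the functions and test cases are my own illustrations, not anything from the original discussion): no universal halting decider exists, yet the termination of a specific program can often be proved directly, for example by exhibiting a strictly decreasing measure.

```python
# No general algorithm decides halting for all (program, input) pairs,
# but specific programs can often be analyzed individually.

def countdown(n: int) -> int:
    """Provably halts for every non-negative integer input:
    n strictly decreases on each iteration and is bounded below by 0."""
    while n > 0:
        n -= 1
    return n

def collatz_steps(n: int) -> int:
    """Whether this halts for ALL positive n is the open Collatz problem.
    Hardness for some programs does not make every individual case
    undecidable: we can still settle particular inputs by running them."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

assert countdown(10) == 0        # halts, by a termination proof
assert collatz_steps(27) == 111  # this specific input is settled by execution
```

The same asymmetry applies to Rice's theorem: it rules out a decider for a nontrivial property over all programs, not a proof of that property for one carefully chosen program.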

Can you say more about why you think a 1:24 ratio is the right one (as opposed to lower or higher ratios)? And how might this ratio differ for people who have different beliefs than you, for example about x-risk, LTFF, or the evilness of these companies?

I do not recall seeing this usage in AI safety or LW circles. Can you link to examples?

Once upon a time, some people were arguing that AI might kill everyone, and that EA resources should address that problem instead of fighting malaria. So OpenPhil poured millions of dollars into orgs such as EpochAI (which received $9 million). Now three people from EpochAI have created a startup to provide training data that helps AI replace human workers. Some people are worried that this startup increases AI capabilities, and therefore increases the chance that AI will kill everyone.

However, a model trained to obey the RLHF objective will expect negative reward if it decides to take over the world

If an AI takes over the world, there is no one around to give it a negative reward. So the AI will not expect a negative reward for taking over the world.
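As a toy illustration of this point (my own sketch; the probability and reward values below are made-up assumptions, not from the original discussion): if negative reward can only be delivered while humans remain in control, the penalty applies only on the failure branch, so the expected penalty for attempting takeover shrinks as the chance of success grows.

```python
# Toy expected-reward calculation under the assumption that reward can
# only be delivered while humans remain in control.
# All numbers are hypothetical and purely illustrative.

p_success = 0.9          # assumed probability the takeover succeeds
reward_if_caught = -1.0  # negative reward for a *failed* takeover attempt
reward_if_success = 0.0  # after a successful takeover, no one delivers reward

expected_reward = (
    p_success * reward_if_success
    + (1 - p_success) * reward_if_caught
)
print(expected_reward)  # -0.1: only the failure branch carries a penalty
```

Under these assumptions, a higher p_success means a weaker expected penalty, which is exactly the point above: a successful takeover removes the source of negative reward.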

The issue is not whether the AI understands human morality. The issue is whether it cares.

The arguments from the "alignment is hard" side that I was exposed to don't rely on the AI misinterpreting what the humans want. In fact, superhuman AI is assumed to be better than humans at understanding human morality. It could still do things that go against human morality. Overall, I get the impression you misunderstand what alignment is about (or maybe you just have a different association with words such as "alignment" than I do).

That a language model can play a nice character who would totally give back the dictatorial powers after a takeover is barely any evidence about whether an actual superhuman AI system will step back from its position of world dictator after it has accomplished some tasks.
