
Please spend two minutes filling in the polls below!

Planning where we focus at CaML requires forming views on many controversial questions, particularly with regard to alignment. In many cases, people we've talked to have very different intuitions about where the alignment community stands on these issues. These polls will help us get a sense of where the main areas of (dis)agreement lie.

Please feel free to tell us if you think the questions are ambiguous or embed false assumptions. 
EDIT: Please answer based on your own best guess (and confidence) in these questions.


1 Answer

Robust alignment requires alignment-relevant intervention during pretraining

I'd say this is the wrong question. Like, I do not expect that any current alignment approach is going to work. If we do ever figure out what works, it will not look like "pretraining" or "post-training", it will be something completely different.

Although I guess you could call that "pretraining"?

Thanks Michael, we avoided mentioning post-training to imply that "new paradigm needed" would also count on the "disagree" side of the spectrum. In other words, "disagree" on this question would mean either "post-training is sufficient" or "new paradigms are needed/sufficient".

Comments (25)
Ozzie Gooen
2
0
0
60% disagree

Multipolar worlds will compete away >90% of net value that would otherwise be preserved

If they're halfway-reasonable, they could use smart AIs to negotiate for them. Big question is who will control these worlds. 

I think it's likely humans will settle on AI solutions that lose 90% of the value vs. my optimal solution, but that's very much a values question, not a multipolar vs. unipolar question. 

Ozzie Gooen
2
0
0
50% agree

AI alignment to humans will in practice avoid moral catastrophes to animals

I expect certain conservative/religious communities to lock in values that could be really bad. But I'd expect that better tech can remove, say, ~90% of the damage? But this is very hand-wavy.

Max Clarke
2
0
0
70% agree

Alignment to specific values is underrated in research relative to control

Yes, I think control is a waste of time. We need actual alignment to actual (universalized) values.

Max Clarke
2
0
0
0% agree

Research into digital mind suffering is sufficiently tractable to work on

I don't know.

Max Clarke
2
0
0
70% agree

AI alignment to humans will in practice avoid moral catastrophes to animals

Alignment requires a mechanical understanding of good and bad, and it will be clear how to apply it to animals. Note that wild animal suffering arguments imply that the status quo is likely a moral catastrophe. I believe an aligned entity or system would attempt to change that.

NickLaing
2
0
0
40% agree

I think the world is more likely not to end than to end when TAI arrives, so I feel like I have to vote agree here?

Max Clarke
2
0
0
70% agree

AI alignment to humans will in practice avoid moral catastrophes to digital minds

Likewise, alignment requires a mechanical understanding of good and bad, and it will be clear how to apply it to digital minds.

Paolo Bova
1
0
0
70% disagree

AI alignment to humans will in practice avoid moral catastrophes to animals

Humans are currently very motivated to perpetuate moral catastrophes to animals. If AI alignment means aligned to the intent of their users, then AI systems help humans perpetuate moral catastrophes. If AI alignment is in terms of human moral preferences, then even a well-chosen mechanism for aggregating human preferences will select for speciesist values. There is a strong sense in which avoiding moral catastrophes to animals is usually misaligned with human preferences. Admittedly the same could be said of other moral issues such as attitudes towards outgroups and foreigners. There appears to be room in the current human alignment agenda for ensuring AI does not succumb to tribal prejudices, so there is likely scope for compatibility between the current alignment agenda and avoiding moral catastrophes to animals. It does not happen by default, and given how deep speciesism goes, it is likely much harder to avoid. Hence I still disagree with this poll as written.

Research into digital mind suffering is sufficiently tractable to work on

I mildly agree, but I specifically mean "research into". I haven't seen any compelling interventions (including e.g. letting Claude stop chats).

Max Clarke
1
0
0
100% agree

Multipolar worlds will compete away >90% of net value that would otherwise be preserved

Strongly agree

Tristan Katz
1
0
0
60% disagree

Research into digital mind suffering is sufficiently tractable to work on

I am yet to see any reliable way to test for consciousness in AI systems. More fundamentally, since current LLMs are trained to respond in human-like ways, any appearance of suffering should be viewed with great scepticism. The likes of Anthropic's welfare report strike me as nothing more than humane-washing.

Until more reliable methods are devised, I do not view this as tractable (but I hope to be proven wrong). I think it is important for some people to work on, but people already are and I think the marginal benefit of additional labor is likely low. 

Partially aligned transformative AIs are likely to be stable under reflection

I'm not sure what this means (stable, under reflection) - can someone help?

Some people believe that if we get partial alignment (i.e. an AI that cares about what we want, but also cares about other things) then we can get decent outcomes for the future (analogous to humans being partially aligned to each other). But others think that if we don't get alignment perfect, ASIs will have an incentive to take over, and then will either have value drift towards something orthogonal to humans or will deliberately reformat their own values. "Stable under reflection" is the opinion that this wouldn't happen: that ASIs that care somewhat about humans would continue to care somewhat about humans in the long term.

AI alignment to humans will in practice avoid moral catastrophes to animals

Alignment to humans means (for me) that the AI would serve the intended goals of the user and their creators. Avoiding a moral catastrophe to animals, on the other hand, implies a ban on factory farming. Those are two separate things.

That's definitely a valid perspective, consistent with your 100% disagree answer. Other people think that aligned ASI would end things like factory farming due to abundance, cheap synthetic meat, uploading, shifts in values, or something else. There are also debates around what it would mean for wild animals.

I think it's a good response, but definitely techno-optimism. 

Firstly, we're yet to see whether synthetic meat actually can be made more cheaply, right? Currently it seems like animals actually do make meat fairly efficiently when you consider the important work that their immune systems do (unless I'm mistaken, contamination is one of the main barriers to scaling up synthetic meat). And then, who's to say that ASI won't genetically engineer animals to produce meat more efficiently while ignoring their suffering?

Secondly, there are the more complicated cultural reasons for continuing animal use. Consider that a lentil dal, seitan curry and Beyond Burger are already delicious. If it were only about efficiency, we'd have stopped abusing animals already. But people like eating animals.

I'm very uncertain about these arguments, but I think it's hard to know so I'm wary of anyone who's too optimistic!

Max Clarke
1
0
0
0% agree

Robust alignment requires alignment-relevant intervention during pretraining

 

Frankly, I neither agree nor disagree with this statement. Robust alignment has nothing to do with the current pretraining regime. It should work with or without it.

If robust alignment is orthogonal to pretraining then shouldn't that mean a strong disagreement with the statement (that alignment requires pretraining)?

Max Clarke
1
0
0
100% disagree

Partially aligned transformative AIs are likely to be stable under reflection

I disagree that "partially aligned" is a statement that has meaning here.

Daniel Juhl
1
0
0
90% disagree

AI alignment to humans will in practice avoid moral catastrophes to digital minds

I think it is likely that alignment to humans will, by default, come at a cost to the digital minds themselves.

Tristan Katz
1
0
0
60% disagree

AI alignment to humans will in practice avoid moral catastrophes to animals

I think this is pretty obvious - we already have a moral catastrophe for animals, there's no reason why alignment to humans would avoid this. 

I didn't vote at the extreme because alignment to humans might still be a precondition for avoiding catastrophes. 

Tristan Katz
1
0
0
50% disagree

AI alignment to humans will in practice avoid moral catastrophes to digital minds

I have very low certainty on this, but it seems plausible to me that if AGI shares humanity's goals, it might just have a good time fulfilling them with few conflicts.

But it also seems quite possible that this won't happen, i.e. AGI pursues humanity's goals but is constantly frustrated that it can't achieve them better.

So my stance is unlikely but possible. 

StanislavKrym
1
0
0
100% disagree

Multipolar worlds will compete away >90% of net value that would otherwise be preserved

Per AI-2027, I expect the emergence of Consensus-1 instead of a multipolar world which KEEPS being multipolar. 

I slightly disagreed with this statement and share some of the same thoughts. I think a multipolar world with fierce competition is quite likely in the short term; however, in the long-term equilibrium, I think the likely outcomes are either (1) we have a dominant winner or (2) we have more cooperation. So I averaged my short vs. long-term predictions.

I think it's important to research multipolarity and the competition dynamic because what happens in the short term could impact what happens in the long term, possibly in non-intuitive ways. For instance, the most capable and resourced model/lab in the short term may not always win in the long term if others gang up on them or if the institutional environment uniquely disadvantages them.

Daniel Juhl
1
0
0
70% disagree

AI alignment to humans will in practice avoid moral catastrophes to animals


There is likely to be a correlation between AIs aligned to humans and AIs treating animals well, but being aligned to humans will be insufficient: see the current state of how we treat animals.
