
Stephen Casper, scasper@mit.edu. Thanks to Alex Lintz and Daniel Dewey for feedback. 

This is a reply but not an objection to a recent post from Paul Christiano titled AI alignment is distinct from its near term applications. The post is fairly brief and the key point is decently summed up by this excerpt.

I worry that companies using alignment to help train extremely conservative and inoffensive systems could lead to backlash against the idea of AI alignment itself. If such systems are held up as key successes of alignment, then people who are frustrated with them may end up associating the whole problem of alignment with “making AI systems inoffensive.”

I have no disagreements with this claim. But I would push back against the general notion that AI [existential] safety work is disjoint from near term applications. Paul seems to agree with this.

We can develop and apply alignment techniques to these existing systems. This can help motivate and ground empirical research on alignment, which may end up helping avoid higher-stakes failures like an AI takeover. 

This post argues for strongly emphasizing this point. 

What do I mean by near-term applications? Any challenging problem involving consequential AI and society. Examples include:

  • Self-driving cars
  • Recommender systems
  • Search engines
  • AI weapons
  • Cybersecurity
  • Unemployment
  • Bias/fairness/justice involving systems that work with humans or human data
  • Misuses of media generators including text and image generators

I argue that working on these problems probably matters a lot for three reasons, the second and third of which are potential matters of existential safety.

Non X-risks from AI are still intrinsically important AI safety issues.

There are many important non X-risks in this world, and any altruistic-minded person should care about them. For the same reason we should care about health, wealth, development, and animal welfare, we should also care about making important near-term applications of narrow AI go well for people.  

There are valuable lessons to learn from near-term applications.

Imagine that we figured out ways to make near-term applications of AI go very well. I find it incredibly hard to imagine a world in which we did any of these things without developing a lot of useful technical tools and governance strategies that could be retooled or built on for higher-stakes problems later. Consider some examples.

  • For self-driving cars to go well, we would need to effectively develop and iterate on techniques for robustness, reliability, and anomaly detection/handling. 
  • To make recommender systems go well, we would need to make a lot more progress on efficiently inferring what humans truly want in a way that is disentangled from what they seem to want. 
  • To minimize harms from AI weapons, we would need to introduce a lot of national and international laws and precedent for disincentivizing and responding to deadly AI. 
  • To minimize problems from discriminatory AI or the misuses of media generators (e.g. BLOOM or Stable Diffusion), we would need laws and precedent establishing definitions of harm and methods for recourse. Perhaps most importantly, establishing lots of laws and bureaucracy around AI systems like this may help to establish a legal regime that provides filters and obstacles to the deployment of risky systems. If auditing and slow timelines are good, so is this kind of bureaucracy. 

See also this post

Making allies and growing the AI safety field is useful

AI safety and longtermism (AIS&L) have a lot of critics, and in the past year or so, they seem to have grown in number and profile. Many of them are people who work on and care a lot about near-term applications of AI. To some extent this is inevitable. Having an influential and disruptive agenda will inevitably lead to some pushback from competing ones. Haters are going to hate. Trolls are going to troll.

But AIS&L probably have more detractors than they should from people who should really be allies. Given how many forces in the world are making AI more risky, there shouldn’t be conflict between groups of people who are working on making it go better but in different ways. In what world could isolation and mutual dismissal between AIS&L people and people working on neartermist problems be helpful? There seem to be too many common adversaries and interests between the two groups to not be allies–especially for influencing AI governance. Having more friends and fewer detractors seems like it could only increase the political and research capital of the AIS&L community. There is also virtually no downside to being more popular.

I think that some negative press about AIS&L might be due to active or tacit dismissal of the importance of neartermist work by AIS&L people. Speaking for myself, I have had a number of conversations in the past few months with non-AIS&L people who seem sympathetic but have expressed feeling dismissed by the community, which has made them more hesitant to be involved. For this reason, we might stand to benefit a great deal from less parochialism and more friends.

Paul argues that 

...companies using alignment to help train extremely conservative and inoffensive systems could lead to backlash against the idea of AI alignment itself.

But I think it is empirically, overwhelmingly clear that a much bigger concern when it comes to "backlash against the idea of AI alignment itself" comes from failures of the AIS&L community to engage with more neartermist work. 

Thanks for reading--constructive feedback is welcome. 


Comments (9)



In what world could isolation and mutual dismissal between AIS&L people and people working on neartermist problems be helpful?

If the people you ultimately want to influence are the technophiles who are building AI, who regard most near-term 'AI safety' people as annoying scolds and culture warriors, it could be good to clearly differentiate yourself from them. If existential safety people get a reputation as reliable collaborators, employees and allies who don't support the bad behaviour of many AI bias people this could put us in a good position. 

I think I disagree with the general direction of this comment but it’s hard to state why, so I’ll just outline an alternative view:

  • Many people are building cutting-edge AI. Many of them are sympathetic to at least some safety and ethics concerns, and some are not that sympathetic to any safety or ethics concerns
  • Of course it is good to have a reputation as a good collaborator and employee. It seems only instrumentally valuable to be an “ally” to the cutting edge research, and at some point you have to be honest and tell those building AI that what they’re doing is interesting but has risks in addition to potential upsides
  • Part of building a good reputation in the field involves honestly assessing others’ work. If you agree with work from AI safety or AI ethics or AI bias people, you should just agree with them. If you disagree with their work, you should just disagree with them. “Distancing” and “aligning” yourself with certain camps is the kind of strategic move that people in research labs often view as vaguely dishonest or overly political

Part of building a good reputation in the field involves honestly assessing others’ work. If you agree with work from AI safety or AI ethics or AI bias people, you should just agree with them. If you disagree with their work, you should just disagree with them. 

Yes, I agree with this. I think in general there is a fair bit of social pressure to give credence to intellectually weak concerns about 'AI bias' etc., which is part of what technophiles dislike, even if they can't say so publicly. Pace your first sentence, I think that self-censorship is helpful for building reputation in some fields. As such, I expect honestly reporting an epistemically rigorous evaluation of these arguments will often suffice to cause 'isolation and mutual dismissal' from Gebru-types, even while it is positive for your reputation among 'builder' capabilities researchers.

Note that in general existential safety people have put a fair bit of effort into trying to cultivate good relations with near-term AI safety people. The lowest hanging fruit implied by the argument above is to simply pull back on these activities. 

Non X-risks from AI are still intrinsically important AI safety issues

  

Sure, but I think they are less intrinsically important for the standard ITN reasons.

I think that your statement implies that we should care about them a similar amount to longtermist-motivated safety, which might be true, but you don't make a case for why we should care. I don't think the reasons for prioritising LT AIS are strongly correlated with the reasons for prioritising NT AIS, so it would be somewhat surprising if this were true.

As someone who is a deep learning researcher and came to believe in the importance of AI safety through EA, I would like to say I strongly agree with the last point on making allies and growing the AI safety field. I support the claim that some people feel more hesitant to be involved in AI safety or just give up as there is a somewhat cliquey and dismissive feeling from the community and the community sometimes feels quite fragmented on arguments for and against what's useful. To me, this feels a bit counterproductive and alienating. 

I hypothesize that frowning on near-term safety work, or even just the large focus on questioning its usefulness, helps deter other current deep learning researchers, and maybe other communities too, from engaging with AI safety. Less parochialism and more friends seems like a sensible approach that would make for a more productive community.

One thing I think is interesting is how similar some of the work is from Bay Area AI safety folks and other safety crowds, like the area often referred to as "AI ethics." For example, Redwood worked on a paper about safe language generation, focusing on descriptions of physical harm, and safe language generation is a long-running academic research area (including for physical harm! see https://arxiv.org/pdf/2210.10045.pdf). The deepest motivating factors behind the work may differ, but this is one reason I think there is a lot of common ground across safety research areas.


+1 I think it's very worthwhile to emphasize neartermist reasons to care about work that may be primarily longtermism-oriented. 

Thanks for exploring this issue! I agree that there could be more understanding between AI safety & the wider AI community, and I'm curious to do more thinking about this.

I think each of the 3 claims you make in the body of the text is broadly true. However, I don't think they directly back up the claim in the title that "AI safety is not separate from near-term applications".

I think there are some important ways that AI safety is distinct; it goes 1 step further by imagining the capabilities of future systems, and trying to anticipate ways they could go wrong ahead of time. I think there are some research questions it'd be hard to work on if the AI safety field wasn't separate from current-day application research. E.g. agent foundations, inner misalignment and detecting deception.

I think I agree with much of your sentiment still. To illustrate what I mean, I would like it to be true that:

  1. Important AI current-day-application safety issues are worked on by many people, and there is mutual respect between our communities
  2. Work done by near-term application researchers is known about and leverageable by the AGI safety community
  3. Ultimately, there is still a distinct, accessible AGI safety community that works on issues distinct to advanced, general AI systems

No disagreements here. I guess I imagine AIS&L work along with work on the neartermist examples I mentioned as a venn diagram with healthy overlap. I'm glad for the AIS&L community, and I think it tackles some truly unique problems. By "separate" I essentially meant "disjoint" in the title. 
