Hello everyone, 

 

I've finished Superintelligence (amazing!) and Our Final Invention (also great), and I was wondering what else I can read regarding AI safety? Also, what should I be doing to increase my chances of being able to work on this as a career? I'm in college now, majoring in math. I'm guessing I should be learning computer science/programming as well.

 

 


Well, you might be getting toward the frontiers of where published AI Safety-focused books can take you. From here, you might want to look at AI Safety research agendas, specific papers, and AI textbooks.

MIRI has a couple of technical agendas, covering more foundational and more machine learning-based research on AI Safety. Dario Amodei of OpenAI and some other researchers have also put out a machine learning-focused agenda. These agendas cite and are cited by a bunch of useful work. There's also great unpublished work by Paul Christiano on AI Control.

In order to understand and contribute to current research, you will also want to do some background reading. Jan Leike (now of DeepMind) has put out a syllabus of relevant reading materials through 80,000 Hours that includes some good suggestions. Personally, for a math student like yourself wanting to start out with theoretical computer science, George Boolos' book Computability and Logic might be useful. Learning Python and TensorFlow is also great in general.
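If you do pick up Python and TensorFlow, a useful first milestone is being able to train and evaluate a small model end to end. The sketch below (assuming TensorFlow 2.x and its bundled Keras API and MNIST dataset) is just a generic illustration of that kind of exercise, not something the commenter specifically prescribes:

```python
import tensorflow as tf

# Load the MNIST handwritten-digit dataset bundled with Keras,
# and rescale pixel values to [0, 1].
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# A minimal fully connected classifier.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Train briefly and check accuracy on held-out data.
model.fit(x_train, y_train, epochs=2)
model.evaluate(x_test, y_test)
```

Getting comfortable with this loop (load data, define a model, train, evaluate) transfers directly to the machine learning and deep RL work mentioned elsewhere in this thread.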

To increase your chances of working on this as a career, you might want to look at the entry requirements for some specific grad schools. You might also want to go for some internships at these groups (or at other groups that do similar work).

In academia, some labs analyzing safety problems are:

  • UC Berkeley (especially Russell)
  • Cambridge (Adrian Weller)
  • ANU (Hutter)
  • Montreal Institute for Learning Algorithms
  • Oxford
  • Louisville (Yampolskiy)

In industry, DeepMind and OpenAI both have safety-focused teams.

Doing grad school or an internship at any of these places (even though you won't necessarily end up on a safety-focused team) would be a sweet step toward working on AI Safety as a career.

Feel free to reach out by email at (my first name) at intelligence.org with further questions, or for more personalized suggestions. (And the same offer goes to similarly interested readers.)

I like Ryan's suggestions. (I also work at MIRI.) As it happens, we also released a good intro talk by Eliezer last night that goes into more detail on 'what does alignment research look like?': link.

Any details on safety work in Montreal and Oxford (other than FHI I assume)? I might use that for an application there.

At Montreal, all I know is that David Krueger, a PhD student there, is currently in discussions about what work could be done. At Oxford, I have in mind the work of folks at FHI like Owain Evans and Stuart Armstrong.

If you are interested in applying to Oxford, but not FHI, then Michael Osborne is very sympathetic to AI safety but doesn't currently work on it. He might be worth chatting to. Also, Shimon Whiteson does lots of relevant-seeming work in the area of deep RL, but I don't know whether he is at all sympathetic.

EDIT: I forgot to link to the Google group: https://groups.google.com/forum/#!forum/david-kruegers-80k-people

Hi! David Krueger (from Montreal and 80k) here. The advice others have given so far is pretty good.

My #1 piece of advice is: start doing research ASAP!
Start acting like a grad student while you are still an undergrad. This is almost a requirement to get into a top program afterwards. Find a supervisor and ideally try to publish a paper at a good venue before you graduate.

Stats is probably a bit more relevant than CS, but some of both is good. I definitely recommend learning (some) programming. In particular, focus on machine learning (esp. Deep Learning and Reinforcement Learning). Do projects, build a portfolio, and solicit feedback.
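To make the "do projects" advice concrete: a common early self-study exercise in reinforcement learning is implementing tabular Q-learning on a toy environment before moving on to deep RL libraries. The sketch below is a generic illustration of that exercise; the corridor environment and hyperparameters are made up for the example, not anything David specifically assigns.

```python
import numpy as np

# Toy deterministic "corridor" environment: states 0..4, goal at state 4.
# Actions: 0 = move left, 1 = move right. Reward 1 only on reaching the goal.
N_STATES, N_ACTIONS, GOAL = 5, 2, 4

def step(state, action):
    next_state = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

# Tabular Q-learning with an epsilon-greedy exploration policy.
q = np.zeros((N_STATES, N_ACTIONS))
alpha, gamma, epsilon = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)

for episode in range(500):
    state, done = 0, False
    while not done:
        if rng.random() < epsilon:
            action = int(rng.integers(N_ACTIONS))   # explore
        else:
            action = int(np.argmax(q[state]))       # exploit
        next_state, reward, done = step(state, action)
        # Standard Q-learning update toward the bootstrapped target.
        target = reward + gamma * (0.0 if done else np.max(q[next_state]))
        q[state, action] += alpha * (target - q[state, action])
        state = next_state

print(np.argmax(q, axis=1))  # learned policy: should prefer "right" everywhere
```

Reimplementing small algorithms like this, and then scaling up to deep learning and deep RL projects, is the kind of portfolio work that gives a prospective supervisor something concrete to evaluate.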

If you haven't already, please check out these groups I created for people wanting to get into AI Safety. There are a lot of resources to get you started in the Google Group, and I will be adding more in the near future. You can also contact me directly (see https://mila.umontreal.ca/en/person/david-scott-krueger/ for contact info) and we can chat.

"Also, what should I be doing to increase my chances of being able to work on this as a career?"

A PhD in computer science would probably be ideal, though lengthy. Research-oriented master's degrees, and programs in neuroscience and math, can be very good as well.

There's another reading list here: http://humancompatible.ai/bibliography. But, like the 80k one, I think the philosophy section should be modified so that the most important readings are Reasons and Persons (Parfit), either Ethical Theory: An Anthology or Conduct and Character: Readings in Moral Theory, and articles on plato.stanford.edu.

Further study is the best option; you can also look online for suggestions.
