I joined the psychology department at UCLA as an Assistant Professor in July 2023. Prior to that, I was an independent research group leader at the MPI for Intelligent Systems in Tübingen. I completed my Ph.D. in the Computational Cognitive Science Lab at UC Berkeley in 2013, obtained a master’s degree in Neural Systems and Computation from ETH Zurich, and completed two simultaneous bachelor's degrees in Cognitive Science and Mathematics/Computer Science at the University of Osnabrück.
I also find it problematic that they end the paragraph with "QED." "QED" is a technical term used to indicate that a mathematical theorem has been proven. The quoted verbal argument clearly does not meet the rigorous standards of mathematical proof. This looks like an attempt to exploit superficial, intuitive heuristics to persuade readers to believe the conclusion with a level of confidence that is unwarranted by the information in the quoted paragraph.
Contrary to their claim that "it would have been very hard to predict that humans would like ice cream, sucralose, or sex with contraception," I think it was predictable that these preferences would likely result from natural selection under constraints. In each of these examples, a mechanism that evolved to detect the achievement of an instrumentally important subgoal is triggered by a stimulus that i) is very similar to the stimuli an animal would experience when the subgoal is achieved, and ii) did not exist in the evolutionary environment. We should expect any (partially or fully) optimized bounded agent to have detectors for the achievement of instrumentally important subgoals. We should expect these detectors to analyze only a limited number of features with limited precision. And we should expect the few comparisons they do perform precisely to be optimized for distinctions that were important for success on the training data.
Given that these failures were predictable, it should be possible to systematically predict many analogous failures that might result from training AI systems on specific data sets or (simulated) environments. If we can predict such failures of generalization beyond the training data, then we might be able to prevent them, mitigate them, or regulate real-world applications so that AI systems won't be applied to inputs where misclassification is likely and problematic. The last of these approaches is analogous to outlawing highly addictive drugs that mimic neurotransmitters signalling the achievement of instrumentally important subgoals.
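To make this concrete, here is a minimal toy sketch of the kind of predictable generalization failure described above. It is my own illustration rather than anything from the quoted text, and every feature name and number in it is invented: a detector optimized to check a single cheap proxy feature (perceived sweetness) on its training distribution will also fire on a novel stimulus (a zero-calorie sweetener) that matches the proxy but not the underlying subgoal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Training environment: foods in which sweetness is a reliable proxy for caloric value.
n = 200
sweetness = rng.uniform(0.0, 1.0, n)               # the only feature the detector analyzes
calories = 100.0 * sweetness + rng.normal(0.0, 5.0, n)
subgoal_achieved = (calories > 50.0).astype(int)   # the real subgoal: obtaining caloric food

# Fit the cheapest possible detector: a single threshold on the proxy feature.
thresholds = np.linspace(0.0, 1.0, 101)
accuracies = [np.mean((sweetness > t).astype(int) == subgoal_achieved) for t in thresholds]
best_t = thresholds[int(np.argmax(accuracies))]
print(f"learned sweetness threshold: {best_t:.2f} (training accuracy: {max(accuracies):.2f})")

# Deployment: a stimulus that did not exist in the "evolutionary environment".
sucralose = {"sweetness": 0.9, "calories": 0.0}
detector_fires = sucralose["sweetness"] > best_t
print(f"detector fires on sucralose: {detector_fires} (actual calories: {sucralose['calories']})")
```

On the training data the threshold detector is nearly perfect, yet it fires on the novel stimulus precisely because it was optimized to make only the distinctions that mattered there.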
Morality is Objective
I believe that the purpose of morality is to promote everyone's well-being. We can use the scientific method to determine how each action, rule, trait, and motive affects overall well-being. Science is objective. Therefore, it is possible to make objective statements about the morality of actions, motives, and traits.
Leverage Research, including partial takeover of CEA

I am very shocked. What exactly happened? How could this happen? How could CEA possibly let itself be infiltrated by a cult striving to take over the world? And how could an organization founded by academics fail to scrutinize Leverage's pseudo-scientific and manipulative use of concepts and techniques related to psychotherapy and rationality? Did CEA ever consult an independent psychological scientist or psychotherapy researcher to assess the ethicality of what Leverage was doing, the accuracy of their claims, or the quality of their "research"? Didn't it raise any red flags that the people inventing new methods of "psychotherapy" had no training in clinical psychology?
Thank you for your feedback, Stan!
I think the appropriateness of E[CE] as a prioritization criterion depends on the nature of the decision problem.
I think the expected value of the cost-effectiveness ratio is the appropriate prioritization criterion in the following scenario: i) a decision-maker is considering which organization should receive a given fixed amount of money (m), and ii) each organization (i) turns every dollar it receives into some uncertain amount of value (CE_i). In that case, the expected utility of giving the money to organization i is E[U_i] = m * E[CE_i]. Therefore, the way to maximize expected utility is to give the money to the organization with the highest expected cost-effectiveness. In this kind of decision problem, the consequences of contributing $1 to a project with an expected cost-effectiveness of 1 WELLBY/$ are almost identical in both of your scenarios. Most of the expected utility comes from the possibility that the project might be highly cost-effective. If the project is not highly cost-effective, then the $1 contribution accomplishes very little, regardless of whether the project costs $10,000, $100,000, or $1,000,000.
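As a quick illustration of the fixed-grant case, here is a small Monte Carlo sketch. It is my own toy example with invented numbers (a hypothetical project with a 10% chance of being highly cost-effective), not something taken from the thread:

```python
import numpy as np

rng = np.random.default_rng(1)
m = 1.0  # a fixed $1 contribution

# Hypothetical project: 10% chance of 10 WELLBYs/$, 90% chance of 0.1 WELLBYs/$.
# Its expected cost-effectiveness is E[CE] = 0.1 * 10 + 0.9 * 0.1 = 1.09 WELLBYs/$.
ce_samples = rng.choice([10.0, 0.1], p=[0.1, 0.9], size=100_000)

# Expected utility of the fixed grant: ≈ m * E[CE], independent of the project's total cost.
expected_utility = np.mean(m * ce_samples)
high_ce_share = np.mean(m * ce_samples * (ce_samples == 10.0)) / expected_utility

print(f"E[U] ≈ {expected_utility:.2f} WELLBYs per dollar given")
print(f"share of expected utility from the high-cost-effectiveness branch: {high_ce_share:.0%}")
```

In this toy case, roughly 90% of the expected utility comes from the branch in which the project turns out to be highly cost-effective, and nothing in the calculation depends on how large the project's total budget is.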
In my view, your example illustrates that the expected cost-effectiveness ratio is an inappropriate prioritization criterion if the funder has to decide whether to pay 100% of the project's costs without knowing how much that will be. In that scenario, I think the appropriate prioritization criterion would be E[B]-E[CE_alt]*E[C], where E[CE_alt] is the expected cost-effectiveness of the most promising project that the funder could fund instead.
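To illustrate how the two criteria can come apart when the funder must cover the project's full, uncertain cost, here is a toy sketch with invented numbers (a fixed benefit, a cost that is either $10,000 or $1,000,000 with equal probability, and an alternative with a cost-effectiveness of 1 WELLBY/$):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Hypothetical project: the benefit is fixed, but the total cost is very uncertain.
B = np.full(n, 100_000.0)                                       # benefit in WELLBYs
C = rng.choice([10_000.0, 1_000_000.0], p=[0.5, 0.5], size=n)   # cost in dollars

CE_alt = 1.0  # cost-effectiveness of the best alternative use of the money (WELLBYs/$)

expected_ce_ratio = np.mean(B / C)             # E[CE] = E[B/C]
net_value = np.mean(B) - CE_alt * np.mean(C)   # E[B] - E[CE_alt] * E[C]

print(f"E[B/C] ≈ {expected_ce_ratio:.2f} WELLBYs/$ (looks very attractive)")
print(f"E[B] - E[CE_alt]*E[C] ≈ {net_value:,.0f} WELLBYs (fully funding it is worse "
      f"than spending the expected cost on the alternative)")
```

The project's expected cost-effectiveness ratio is pulled up by the low-cost branch, yet committing to pay whatever the project ends up costing has strongly negative expected value relative to the alternative.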
I think the second decision problem describes the situation of a researcher or funder who is committed to seeing their project through until the end. By contrast, the first decision problem corresponds to a researcher/funder intending to allocate a fixed amount of time/money to one project or another (e.g., 3 years of personal time or 1 million dollars) and then move on to another project after that.
I didn't realize the quoted text was a paraphrase rather than an exact quote. I only commented on the paraphrase, not on the book itself. I apologize for the oversight.