
In my more contemplative moments, when I really ask myself how I should act or what world I want to live in, I find the question almost impossible to answer. Not only do I have to make descriptive predictions about complex non-linear systems, I also have to make normative judgements about which of those outcomes to prefer. The more I think about it, the less sure I am that any definite conclusions can be drawn.

At the same time, it still seems possible to make progress. Utilitarians, for example, were among the first in the West to argue for gender equality, the decriminalization of homosexuality, and animal rights. I’m much more sure than not that these were good developments for the world. This seems to point to utilitarianism being a useful tool in at least some contexts.

In the area of prediction, we are also starting to make progress. Research on forecasting has found ways to hold experts accountable for their predictions, and allowed us to identify superforecasters who can make much better predictions than the average expert.

Given this progress, it might be tempting to try to find the normative equivalent of forecasting. But while forecasts can be judged against the actual outcome they’ve predicted, there is no observable “ground truth” to normative questions. It’s intuitions all the way down.  

This leaves us in a situation somewhat analogous to what Ought has described as focusing on process rather than outcome. This post focuses specifically on the process of reaching what Rawls called reflective equilibrium. While I don’t think it’s actually possible to reach such an equilibrium, I do believe it is at least possible to get closer.

This blog post is part survey paper, part research plan, and part request for feedback. I’m by no means an expert in any of these things (my background is in computer science), and I’ll resort to examples more than formal arguments to get my point across. I’m not sure exactly what here is original, as both Ought and Irving & Askell seem to be thinking along fairly similar lines, but I haven’t seen it written up in this way and hope this post can serve as the start of a discussion. And while my main focus will be on moral theories, anything I say should apply to all forms of normative theory, so I tend to use the words interchangeably here.


The rest of this post will focus on the following:  

  1. Why reflective equilibrium is hard to reach
  2. How pragmatism can help us
  3. Attempts to formalize these ideas


Reflective Equilibrium is Hard to Reach

To describe the methodology of Reflective Equilibrium, I’ll borrow the description from utilitarianism.net:

[Reflective Equilibrium] involves balancing two broad kinds of evidence as applied to moral theories:

  1. Intuitions about specific cases (thought experiments).
  2. General theoretical considerations, including the plausibility of the theory's principles or systematic claims about what matters.

General principles can be challenged by coming up with putative counterexamples, or cases in which they give an intuitively incorrect verdict. In response to such putative counterexamples, we must weigh the force of the case-based intuition against the inherent plausibility of the principle being challenged. This could lead you to either revise the principle to accommodate your intuitions about cases or to reconsider your verdict about the specific case, if you judge the general principle to be better supported.


I think this process is a good one, but I believe the gap between intuitions and pure theory is large. As large as, say, the gap between quantum physics and aerospace engineering. Although I value thought experiments as a useful philosophical tool, real intuitions come from real experience in the messy, complicated world.

Take, for example, Robert Nozick’s famous Wilt Chamberlain argument, which I am simplifying slightly for brevity. Nozick postulates a hypothetical society in which everyone has equal access to resources. In this society, people willingly pay to attend a basketball game in which Wilt Chamberlain is playing. This results in Chamberlain getting more money than any of the spectators, moving us from a society of equal distribution to one of unequal distribution. 

Nozick argues that because the starting society was equal and people willingly chose the method of distribution, the resulting society must be a just one. And within the scope of this thought experiment, I agree with him. But there are a few problems with it. For one, libertarians will often conveniently ignore the egalitarian starting point and use this example to argue that we should allow any contract freely entered into in our current society. More important to the context of this post, though, this “free exchange” could lead to outcomes no one would agree with. If a world of free exchange led to a single company holding a complete monopoly on all our resources, would we really care about the Wilt Chamberlain example?

My point is not that it is wrong to willingly give Wilt Chamberlain money, or that there are not many related cases where free exchange of property is okay. My point is that we should be able to note contradictory normative intuitions without immediately generalizing them to first principles. A thought experiment is a data point, and we need lots of data points to build good theories.

When I interact with my friends, I don’t rely on psychology research or expected-value calculations to guide my actions. I think about their needs and balance them with my own so we can continue to have a good friendship. This doesn’t contradict utilitarianism. Given the complex social machinery in my brain, the holistic approach to human interactions seems more likely to “maximize utility” for the time I spend with my friends. It’s the right tool for the job. I can then step back and ask, in a utilitarian (and hopefully guilt-free) way, whether I’m spending too much time hanging out. I believe moving between these different tools is better than applying a single tool at all levels.


Pragmatic, Modular, and Interdisciplinary  

This idea of using “the right tool for the job” is not new to philosophy. It is common to the school of pragmatism, which Wikipedia defines as:

[The] philosophical tradition that considers words and thought as tools and instruments for prediction, problem solving, and action, and rejects the idea that the function of thought is to describe, represent, or mirror reality.

That last part might raise some eyebrows, but going into the subtleties of this definition of truth is outside the scope of this post. For now, you’ll have to trust me that this is not some hand-wavey “truth is meaningless” type of philosophy. It is a serious analytic school practiced by logicians and philosophers of science from Peirce to Quine to Putnam.

One pragmatist, Elizabeth Anderson, has built her approach to ethics on an analogy to the philosophy of science:

Philosophers of science don’t think we can come up with principles of science outside of actual empirical investigation  … So why should we think we can come up with actual principles of ethics without looking at the actual problems that people confront in their actual lives, and how that changes historically as we have to meet new challenges?

Anderson is perhaps most famous for her theory of equality, which focuses on the relationships between people in a society rather than on the distribution of resources or the application of specific deontological rules. This definition is difficult to link to any more theoretical moral theory, but that does not mean it contradicts them. For example, if a society that adopted her definition of equality were found to have an overall happier population, we would probably consider it a good practice of utilitarianism.

We can keep Anderson’s analogy to science going. Before Newton, physicists were able to find mathematical laws about pendulums and planetary orbits. These laws are not “hard-coded” into the universe; they are intermediate instances of the more fundamental law of gravity, which is in turn an instance of even more fundamental laws. In a similar way, we can look to sociologists and on-the-ground philosophers to find intermediate moral theories that we can then test against more abstract foundational ones. This is a technique that has been advocated elsewhere, including in research on technology studies and economic analysis.

A key difference between the physics analogy and the normative questions we face is that physics can make explicit predictions. If our model of pendulums is wrong, it will yield bad predictions, but what does it mean for a normative theory to be wrong? 

This is where another pragmatist, Susan Haack, can be helpful. Haack has drawn an analogy between knowledge and a crossword puzzle, where:

 Finding an answer using a clue is analogous to a foundational source (grounded in empirical evidence). Making sure that the interlocking words are mutually sensible is analogous to justification through coherence. 


By looking at different types of real-world ethical “data”, such as people’s intuitions, we can come up with intermediate theories. If these theories “cohere”, meaning they point towards the same more formal theory (say, utilitarianism), that is good evidence for the validity of that formal theory. If they disagree, that might be evidence that we should study the intermediate theories in more detail. By moving back and forth between normative intuitions, intermediate theories, and more formal theories, we can come up with better and better approximations of normative theories.

A fitting analogy might be found in Derek Parfit’s last book, “On What Matters”, which attempted to show that the major theories of utilitarianism, contractarianism, and deontology converge rather than disagree. In his words, each of these theories is climbing the same mountain from a different side.


Having sketched out my views on the role of pragmatism in ethics, I’ll conclude this post with a discussion about how to use more formal methods to incorporate these views. 


The Path to Formalization

There are many ways one could use the pragmatism described above, and not all of them should be mathematical. At the same time, Bayesian reasoning and techniques from forecasting have done a lot to help clarify our thinking. Finding a formalism for measuring coherence and the “back and forth” between normative theories could help us find our own biases and clarify what is important to us.

The tool I will describe here I have tentatively called normative maps. The goal of these maps is not to have something to point to and say, “See? We found the right answer using math.” I expect these maps to help us update on orders of magnitude rather than at fine-grained levels of certainty, and I hope that anyone using them will start from a point of deep humility about the scope and complexity of the problems we are facing. Irving & Askell, in their paper on using social science in AI safety, make a relevant analogy to Condorcet’s Jury Theorem, which states that the majority vote of independent people who are each slightly more likely than chance to be correct is very likely to be correct, while the majority vote of people who are each slightly more likely to be wrong is very likely to be wrong. In the same way, the formalism described here could lead a diverse group of thoughtful people to much better judgements, but it could also help people find post-hoc justifications for their own biases.
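The jury theorem is easy to check numerically. The sketch below computes the probability that a majority of independent voters is correct; the population size and per-voter accuracies are invented for illustration.

```python
from math import comb

def majority_correct(n, p):
    """Probability that the majority of n independent voters,
    each individually correct with probability p, is correct."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

# 101 voters, each only slightly better than chance (55%):
# the majority verdict is right well over 80% of the time.
better = majority_correct(101, 0.55)

# Flip the individual accuracy to 45% and, by symmetry,
# the majority verdict is wrong just as often.
worse = majority_correct(101, 0.45)
```

The symmetry is the point: aggregation amplifies whatever small bias the individuals share, for better or worse.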

Here, I’ll sketch out a few steps in a project that I believe could do this. I will order them from most formal to most practical. 


(1) Updating Credences in Normative Theories:

The book Moral Uncertainty by Bykvist, Ord, and MacAskill discusses methods of making decisions under uncertainty about the “truth” of different moral theories. These decisions are mostly built around the idea of maximal expected choiceworthiness, which is an extension of maximizing expected utility to moral domains.  This book assumes that credences in different moral theories are given and shows how to go from those credences to individual decisions. 
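As a rough illustration of maximal expected choiceworthiness: weight each theory's choiceworthiness scores by your credence in that theory, then pick the action with the highest weighted sum. The theory names and numbers below are invented, not taken from the book, and this glosses over the hard problem of comparing choiceworthiness units across theories, which the book discusses at length.

```python
def expected_choiceworthiness(credences, choiceworthiness, action):
    """Credence-weighted choiceworthiness of an action across theories."""
    return sum(credences[t] * choiceworthiness[t][action] for t in credences)

def best_action(credences, choiceworthiness):
    """Pick the action with maximal expected choiceworthiness."""
    actions = next(iter(choiceworthiness.values()))
    return max(actions, key=lambda a: expected_choiceworthiness(
        credences, choiceworthiness, a))

credences = {"utilitarianism": 0.6, "deontology": 0.4}
choiceworthiness = {
    "utilitarianism": {"donate": 10, "keep_promise": 4},
    "deontology":     {"donate": 2,  "keep_promise": 9},
}
# donate: 0.6*10 + 0.4*2 = 6.8;  keep_promise: 0.6*4 + 0.4*9 = 6.0
```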

I consider this to be very important work, but I've seen much less work on how these credences should be formed in the first place. By attempting to formalize this process, we can allow philosophers to check their own biases against the real-world “data” they are getting. I believe this will involve a process very similar to Bayesian updating. Unlike the moral uncertainty work, which takes credences in theories as given, here those credences will be derived from credences in intuitions and in the choiceworthiness of actions.

The “right” formalization here will likely be very difficult to find, but there are a few things we could expect it to have:

  • If a moral theory leads to a very unintuitive consequence, I should be both less sure of the moral theory and slightly more sure of the consequence.
  • All other things being equal, there should be higher priors in easier-to-state theories (this is essentially Occam’s Razor).
  • As more theories/evidence point towards the same consequence, credence in the consequence should increase. The less correlated these theories/evidence are with each other, the more that credence should increase.
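A toy sketch of the first desideratum, treating a strongly counterintuitive consequence as Bayesian evidence against the theory that produces it. All theory names and numbers here are invented placeholders.

```python
def bayes_update(priors, likelihoods):
    """P(theory | intuition) is proportional to P(intuition | theory) * P(theory)."""
    unnorm = {t: priors[t] * likelihoods[t] for t in priors}
    z = sum(unnorm.values())
    return {t: v / z for t, v in unnorm.items()}

priors = {"theory_A": 0.7, "theory_B": 0.3}
# A strong case-based intuition that theory_A handles badly:
likelihoods = {"theory_A": 0.2, "theory_B": 0.6}
posterior = bayes_update(priors, likelihoods)
# Credence in theory_A drops; credence in theory_B rises.
```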

There would also need to be research into how to carry over credences when splitting theories into subtheories (if we're 80% certain of some form of utilitarianism, how certain are we of preference utilitarianism? How should the arguments for utilitarianism map over?). In a sense, every moral question we have, down to an intuition about a specific situation or thought experiment, is just a subtheory of some other theory.
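As a first approximation, the chain rule gives the subtheory credence; the conditional credence below is an invented placeholder, and this ignores the harder question of how the supporting arguments map over.

```python
p_utilitarianism = 0.8
# Conditional credence: given that some form of utilitarianism is
# right, how likely is it specifically preference utilitarianism?
p_preference_given_util = 0.5
p_preference_utilitarianism = p_utilitarianism * p_preference_given_util
```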

(2) Network Formalism:

Given a way of updating credences, we’d then need to find a formalism that allows us to map the large space of intuitions and evidence we might encounter. I expect this will look a lot like Bayesian Networks and would have the following components: 

  • Nodes: These will be anything you can have a credence in, such as people, theories, statements, intuitions, and actions. They will have both a natural-language description and a machine-checkable logical statement. The underlying semantics of the statement will likely be a set of preferences or a function onto real numbers, similar to the semantics of theories in Moral Uncertainty.
  • Edges: The influence certain nodes have on other nodes, likely represented as a number. This might incorporate ideas of “applicability/closeness” between concepts. For example, I trust Peter Singer’s opinions on animal welfare more than I trust his opinions on disability issues, so an edge from Peter Singer to a policy he recommends would be larger if it is an animal rights policy.
  • Credences: A number between 0 and 1 indicating certainty in the truth of a node. It might be useful to incorporate more complicated probabilities such as those in Dempster-Shafer Theory or Probability Kinematics. Credences could come from a wide range of sources from personal opinions to peer-reviewed research.
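A minimal sketch of the nodes/edges/credences structure. This is not a real Bayesian network; influence is approximated by a weighted average of parent credences, and all node names, descriptions, and numbers are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Node:
    description: str
    credence: float  # between 0 and 1

class NormativeMap:
    def __init__(self):
        self.nodes = {}
        self.edges = {}  # (source, target) -> influence weight in [0, 1]

    def add_node(self, name, description, credence):
        self.nodes[name] = Node(description, credence)

    def add_edge(self, source, target, weight):
        self.edges[(source, target)] = weight

    def supported_credence(self, target):
        """Influence-weighted average of parent credences, as a
        stand-in for real belief propagation."""
        parents = [(s, w) for (s, t), w in self.edges.items() if t == target]
        if not parents:
            return self.nodes[target].credence
        total = sum(w for _, w in parents)
        return sum(self.nodes[s].credence * w for s, w in parents) / total

m = NormativeMap()
m.add_node("singer", "Peter Singer's published views", 0.9)
m.add_node("study", "A single empirical study", 0.6)
m.add_node("policy", "A proposed animal-welfare policy", 0.5)
# High "applicability": animal welfare is Singer's main area.
m.add_edge("singer", "policy", 0.8)
m.add_edge("study", "policy", 0.4)
```

With these numbers, the policy's supported credence works out to (0.9*0.8 + 0.6*0.4) / 1.2 = 0.8.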

There are also some components which are more difficult to define, and might be approximable using the above three components:

  • Measure of cost: How much money or time or people would it take to accomplish something? How likely are our credences/nodes/edges to change given an input of resources? This latter question relates to the Moral Information chapter of Moral Uncertainty.
  • Continuous/Large Variables: Sometimes the space of nodes or edges might have many values or be continuous. For example, a node for utilitarianism might not have a single utility function. It might have a continuous (or very large) space of different utility functions with different credences. Representing these maps might use techniques similar to those of symbolic model checking.


(3) Software:

Given this formalism, we’d then want to build a software system that allows us to map out our current understanding of a moral situation. This software will likely be built off of one of the many mind-mapping projects out there. 

One of the main features of this software will be the ability to “check” our maps for potential conflicts and biases. For example, if I state that I am a utilitarian but am giving more resources to something with less expected utility, it will bring this to my attention. It might be that I am falling prey to a classic cognitive bias. Or maybe it is a bias I am alright with, such as spending time helping a friend or family member. The important thing is not that all biases are wrong; it’s that bringing them to our attention allows us to reflect on them and make better decisions.
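The utilitarian-consistency check could start as something as simple as flagging mismatches between spending and expected utility. The categories and numbers below are placeholders for whatever the user enters into their map.

```python
def flag_conflicts(dollars, utility_per_dollar):
    """Return pairs (a, b) where option a receives more money than b
    despite having lower expected utility per dollar."""
    flags = []
    options = list(dollars)
    for a in options:
        for b in options:
            if dollars[a] > dollars[b] and utility_per_dollar[a] < utility_per_dollar[b]:
                flags.append((a, b))
    return flags

dollars = {"local_theater": 500, "bednets": 100}
utility_per_dollar = {"local_theater": 1.0, "bednets": 8.0}
# Flags ("local_theater", "bednets"): more money, less expected utility.
# Whether that is a bias to fix or one to endorse is left to the user.
```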

This software will also allow us to identify “bottlenecks”, where our reasoning depends on very few sources. For example, if we cite several people who all share the same opinion, that is good evidence for that opinion. If their opinions are all based on the same study or example, however, we should be less sure. The software could point this out and let us judge how to change our opinions. For this reason, users will likely need to choose many of their labels from a database of concepts such as “Peter Singer” or “Animal Liberation”. The field of “social epistemology” will likely have many more examples of things to look out for.
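Bottleneck detection could start as an intersection over each cited opinion's underlying sources; any source shared by all of them is a single point of failure. The expert and study names below are hypothetical.

```python
def find_bottlenecks(opinions):
    """opinions maps each cited person to the set of sources their
    view rests on; a bottleneck is a source they all share."""
    if len(opinions) < 2:
        return set()
    return set.intersection(*opinions.values())

opinions = {
    "expert_1": {"study_X"},
    "expert_2": {"study_X", "fieldwork_Y"},
    "expert_3": {"study_X"},
}
# Three seemingly independent opinions all bottom out in study_X.
```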

(4) Collection of Data and Reflection:

Finally, it will be important to use the system to collect data and user feedback. This data could allow us to find more cognitive biases or better priors for different theories. It’s likely we won’t be able to figure out what is interesting about the data until we see it. We can then take what we find and go back and alter anything as we see fit.


Potential Problems & Applications 

I’ll conclude this post with a few final thoughts on potential problems with, and applications of, this work.

Commitment and Accountability

Philip Tetlock's work on forecasting is able to lessen the normal biases of human reasoning largely by holding forecasters accountable. Having forecasters commit to specific numerical credences keeps them from using vague language in their predictions and claiming after the fact that they were right all along.

When filling in credences on these normative maps, we may be biased by knowing that the credences we choose will influence us towards outcomes we don’t want. There should be some form of "commitment" action in which we lock in our credences and map structure before testing them in some way (say, by asking a friend for their credences). Psychological research on dishonesty will likely be useful here in keeping us from lying to ourselves.
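One concrete mechanism for the "commitment" action is a standard cryptographic commit-and-reveal scheme (my suggestion, not something from the sources above): publish a hash of your credences now, and reveal the values and nonce later so others can check you didn't quietly revise them.

```python
import hashlib
import json
import secrets

def commit(credences):
    """Lock in a set of credences: share the digest now, keep the
    nonce and values private until reveal time."""
    nonce = secrets.token_hex(16)
    payload = json.dumps(credences, sort_keys=True) + nonce
    return hashlib.sha256(payload.encode()).hexdigest(), nonce

def verify(credences, nonce, digest):
    """Check a revealed set of credences against an earlier commitment."""
    payload = json.dumps(credences, sort_keys=True) + nonce
    return hashlib.sha256(payload.encode()).hexdigest() == digest

credences = {"animal_welfare_priority": 0.7}
digest, nonce = commit(credences)
# Later: the original credences verify, but any edited values fail.
```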

The Role of Trust

I mentioned that the formalism I’m considering involves assigning credences to different people. This might be considered a level of “trust” in that person. For someone like Peter Singer, this might not be a problem; he’s consciously chosen to be a public figure. But rating our level of trust in people we know might be psychologically damaging, especially if a person finds themselves given low-trust ratings by everyone in their community. For now, I would just say to make these ratings private, but it will definitely be worth thinking about this in more detail.

Rating trust is also important because it will force us to justify who we do and don’t trust. Researchers in Epistemic Injustice have identified many ways in which we are biased in who we do and don’t trust. At the same time, some imbalance in trust is valuable in some situations; I trust a doctor more than the average person to tell me which vaccines I should take. By explicitly writing down these levels, we can be forced to justify them or change them.  Checking our credences could involve, for example, performing a minimal trust investigation as outlined by Karnofsky.

Over-Reliance on Certain Maps

If a particular map a person produces becomes very popular, it could become much more influential than is justified. This could be a worse outcome than if the map didn’t exist at all, since people who would otherwise trust their own opinion would instead trust the map.


Social Choice and Deliberative Democracy

Research on Social Choice Theory usually starts from a set of intuitive axioms and tries to find voting systems that satisfy those axioms. These axioms are often created by economics research without much social science to back them up. Using these maps, researchers could map out their axioms and test them against different sociological studies and qualitative judgements, leading to better axioms.

Research in Deliberative Democracy, rather than merely polling voters, has them deliberate together in order to reach consensus on a particular political question. These maps could serve as a way of aggregating participants’ opinions during this process.

AI Safety

This work could potentially have applications in AI safety. If these maps were to become more machine-understandable (likely through work in language processing), it's possible they could be given to an AI to bound the range of acceptable behaviors. 

More likely, they could serve a role in solving the Pointers Problem by helping humans find latent variables in their value systems. For example, if people value security, they could create a map specifying that as few people as possible should be killed. If they then tried to incorporate a policy implementing complete government surveillance, the map might predict that the user would approve. If the user did not, this could force them to incorporate a new value into the map, such as 'privacy' or 'freedom'. Or they could disagree without being able to articulate why. Either case could help in the discovery of human values. And although this is already being done by other researchers, the method described here could help find more fine-grained values that would not appear in larger data sets, or difficult-to-articulate values that might not appear in language models.

Finally, these maps could help users delegate the creation of utility functions as described in the Indirect Normativity section of this post on Eliciting Latent Knowledge. The authors mention the possibility of delegating utility functions to 'future selves' or other people. The measure of 'trust' and 'applicability' as described in this post could help users determine who to delegate what tasks to. 



Thank you for reading. As I mentioned above, this is all in a pretty early stage and I’d appreciate any feedback. I'm in the process of looking for a place to continue this work and for potential collaborators or funding. 





