The rule of law is one of our most important political inventions. Written rules help address problems with biased, corrupt or otherwise untrustworthy people in power. Who shouldn’t be trusted with arbitrary power? Everyone. We’re all fallible. We all have biases. We all make mistakes. If you’re in a position of authority over other people, you should not trust yourself. You should want written rules to constrain and guide your actions. And you should transparently share information about what you’re doing, so others can check for mistakes and hold you accountable.
The basic concept of the rule of law should be used for personal rationality too. Don’t trust yourself to be unbiased or rational. Expect to fail sometimes and plan for that; design policies that will reduce the harm of your failures. Write down rules for yourself, follow them, and provide transparency.
Even that won’t always work. This stuff is hard. Government officials abuse power often. The rule of law doesn’t solve all the problems. It just makes things better. Rationality policies combined with trying your best to be rational will give you a better chance of success than trying without policies. Policies will make it harder to fail in some ways, and push you in some of the right directions.
The perspective that you know you’re trustworthy is a mistake. The rule of law isn’t just for leaders who think “I know I’m trustworthy, but other people don’t know it, so I have to go out of my way for their peace of mind.” It’s not primarily about asymmetric information. Instead, as counter-intuitive as you may find it, your perspective should be: “I’m fallible. I’m making many mistakes that I’m blind to. I have biases that I’m not aware of. This applies to ideas I’m highly confident about.” You should do many things to counter that problem. You should expend major effort, in multiple ways, trying to deal with that difficult situation – that you may be wrong about issues where you feel strongly confident that you’re right. Rationality policies are one tool that can help, if used well. You have to know what you’re doing though; not all rationality policies are beneficial; bad policies can use effort but make things worse.
These examples are intended to give people some ideas about what rationality policies can be like.
Most of these policies can and should have some sort of transparency mechanism added on to them. I only specified that occasionally. Basically you post the policy itself publicly, take responsibility for following it, and then also share documentation so people can see whether you’re following it. Also, you should have some kind of policy related to debate or at least listening to feedback, or else the transparency might not do much good. It helps if people can criticize your policies or actions where others can see. If you have an email newsletter with 10,000 readers, and they can email complaints to you, but no one else can see any of the complaints, that doesn’t work very well for transparency even if you share a bunch of documentation of what you’re doing. People need some reasonable way to correct you or at least get visibility for what they say, like a public forum. And the available ways to correct you need to look fairly appealing, reliable, effective, etc., from their perspective – a lot of people won’t want to waste their time correcting you if they doubt you’ll listen and you have no written policy guarantees about listening to criticism.
A general pattern for policies is: trigger condition + action to take + measurable metric. The policy tells you when to act and what to do, plus it offers some measurements related to both parts. And you should document your policy and actions publicly, and allow public comments, for transparency. So, counting transparency, there are four main design elements I’m using for these policies. I bet it’s possible to make a useful policy with one of these design elements missing. I also bet there are other important, reusable design elements besides these four.
Measuring something helps with objectivity. You can use something pretty objective, which requires little judgment, both for the starting conditions and for specifying what counts as doing enough of a task. When you can specify numbers or amounts, there’s less judgment involved. “Every 5 books you read, write at least 5000 words of notes” uses something measurable for both the starting condition and the action to be taken.
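As a rough illustration, the “Every 5 books you read, write at least 5000 words of notes” policy can be tracked with a few counters. This is only a sketch under assumptions: the class name, the method names and the choice to count words by splitting on whitespace are all made up for this example, and a real policy could measure things differently.

```python
from dataclasses import dataclass


@dataclass
class NotesPolicy:
    """Hypothetical tracker: every 5 books read, write at least 5000 words of notes."""

    books_per_trigger: int = 5      # trigger condition (measurable)
    words_per_trigger: int = 5000   # action to take (measurable)
    books_read: int = 0
    words_written: int = 0

    def record_book(self) -> None:
        """Count one more book toward the trigger."""
        self.books_read += 1

    def record_notes(self, text: str) -> None:
        """Count the words in a piece of notes (assumption: split on whitespace)."""
        self.words_written += len(text.split())

    def words_required(self) -> int:
        """Total words required so far, based on how many times the trigger fired."""
        return (self.books_read // self.books_per_trigger) * self.words_per_trigger

    def words_owed(self) -> int:
        """Words still owed under the policy, never negative."""
        return max(0, self.words_required() - self.words_written)

    def in_compliance(self) -> bool:
        return self.words_owed() == 0


policy = NotesPolicy()
for _ in range(5):
    policy.record_book()
print(policy.words_owed())  # 5000
```

The point of the counters is that neither the trigger nor the action requires much judgment to check. Whether the notes are any good still depends on good faith effort.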
“Every 5 books you read…” could be written in a more airtight way to avoid loopholes, but if you’re trying to find loopholes you’re in big trouble anyway. For example, someone could stop reading books with 1 page left so they don’t count as completed books in order to avoid having read 5 books. That’s an example of trying to avoid doing the policy by exploiting a technicality in bad faith. There’s also a lot of room for problems that happen in good faith. Some people don’t finish the majority of books they start. Maybe those people should count books if they get more than 20% of the way through or if they finish the first chapter.
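The counting rule suggested above (a book counts if you get more than 20% of the way through or finish the first chapter) can be written down precisely. This is a minimal sketch; the function name and parameters are hypothetical, and measuring progress in pages is an assumption.

```python
def counts_as_read(
    pages_read: int,
    total_pages: int,
    finished_first_chapter: bool = False,
    min_percent: int = 20,
) -> bool:
    """Return True if a book counts toward the policy.

    A book counts if the reader got more than `min_percent` of the way
    through, or finished the first chapter. Integer arithmetic is used
    so the threshold comparison has no floating-point rounding issues.
    """
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    if pages_read < 0:
        raise ValueError("pages_read must not be negative")
    past_threshold = pages_read * 100 > min_percent * total_pages
    return finished_first_chapter or past_threshold


# Stopping with 1 page left no longer avoids the count.
print(counts_as_read(pages_read=299, total_pages=300))  # True
# Exactly 20% is not "more than 20%".
print(counts_as_read(pages_read=60, total_pages=300))   # False
```

A rule like this closes the stop-one-page-early loophole and also handles the good faith case of people who rarely finish books.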
It can be OK to include less objective statements in policies. For example, a policy may say “When making an important decision…” Which decisions are important? You have to judge that. There’s no easy answer. However, some decisions would appear pretty obviously important to most people, e.g. who to hire, fire or marry. If you have transparency, people could question your judgment when you decide to discount as unimportant some decisions that they care about.
Often, transparency is only partial, e.g. focused on intellectual activities or decisions that affect people who aren’t your close friends or family. Decisions affecting employees might only be transparent to a subset of people at the company. Transparency about personal issues like who to marry would be unusual. I wouldn’t want to dismiss it out of hand as necessarily a bad idea, but it’s not what I’m recommending people do. Even when you aren’t transparent about something, you can often still follow your policy. If you have to write 10 paragraphs about important decisions, and you provide no transparency about your marriage, you could still see for yourself whether you wrote the 10 paragraphs or not (you could also report publicly that you did it without including the actual text). It’s reasonably easy to avoid fooling yourself about simple things like whether or not you wrote 10 paragraphs, even without transparency.
When dealing with actions like thinking about something, analyzing something, considering if you might be wrong, etc., it’s hard to write rules that evaluate whether someone did a good job. So some good faith effort is needed. But we can write rules that check if we did it at all. E.g. you could require that you write at least 2 paragraphs, 1 sentence, 300 words, or 5 brainstormed bullet points. Or you could require thinking about it for 5 minutes by the clock, to put it in terms Eliezer Yudkowsky recommended in Rationality: From AI to Zombies (bold added):
Page 218:
Which leads into another good question to ask yourself straight out: Did I spend five minutes with my eyes closed, brainstorming wild and creative options, trying to think of a better alternative? It has to be five minutes by the clock, because otherwise you blink—close your eyes and open them again—and say, “Why, yes, I searched for alternatives, but there weren’t any.” Blinking makes a good black hole down which to dump your duties. An actual, physical clock is recommended.
Page 322:
The moral is that the decision to terminate a search procedure (temporarily or permanently) is, like the search procedure itself, subject to bias and hidden motives. You should suspect motivated stopping when you close off search, after coming to a comfortable conclusion, and yet there’s a lot of fast cheap evidence you haven’t gathered yet—there are websites you could visit, there are counter-counter arguments you could consider, or you haven’t closed your eyes for five minutes by the clock trying to think of a better option. You should suspect motivated continuation when some evidence is leaning in a way you don’t like, but you decide that more evidence is needed—expensive evidence that you know you can’t gather anytime soon, as opposed to something you’re going to look up on Google in thirty minutes—before you’ll have to do anything uncomfortable.
Page 1634:
When AI folk say to me, “Friendly AI is impossible,” I’m pretty sure they haven’t even tried for the sake of trying. But if they did know the technique of “Try for five minutes before giving up,” and they dutifully agreed to try for five minutes by the clock, then they still wouldn’t come up with anything. They would not go forth with true intent to solve the problem, only intent to have tried to solve it, to make themselves defensible.
(By the way, I thought that Yudkowsky mentioned five minutes by the clock more than three times in the book. Based on searching now, I was wrong. I over-emphasized it in my memory because I thought it was important and useful. That led me to incorrectly estimate how many times it featured in the book, which has consequences like potentially overestimating how similar my thinking is to Yudkowsky’s.)
(I could have written down a prediction for how many times it was mentioned in the book before I searched, and maybe I should have. I think writing down that prediction in advance is the kind of thing Yudkowsky would like. The basic point of writing the prediction down, like the point of using an actual clock for five minutes, is that you shouldn’t trust yourself. Writing down a prediction also makes you be specific. I did not have in mind any particular number of times I thought it was in the book. It’s hard to know what I would have guessed in advance, but possibly 6.)
(In this case, I managed to avoid the primary danger that writing the prediction down is meant to combat – the danger of not realizing I was wrong – but presumably in some other cases I don’t avoid that danger, so I should write down predictions more often as a general policy. A policy I do have, more consistently than writing predictions, is emphasizing my errors. Like I’m writing this aside, and I think it’s important enough to include in the article, because I care about mistakes and I routinely try to bring them up and draw attention to them. Similarly, I often do written post mortem analysis of mistakes. Whereas most people try to downplay mistakes, change the subject, not admit to having been mistaken, reduce the attention mistakes get, etc.)
Measurable metrics don’t prevent bad faith. You can easily time yourself for five minutes and think of nothing. You can wait and run out the clock without trying to think of anything if you choose to do that. You can also often find ways to game metrics. But metrics make it harder to be biased, especially if you’re making a good faith effort. Also, I think those Yudkowsky passages, and many others, show that he sees the value in following explicit, objective policies, and using some measurable aspects to limit fooling yourself, even if he didn’t explain it in the same way that I’m approaching it. Although I’m more influenced by other sources, and have had some of these ideas for a long time, I do think Yudkowsky’s five minutes by the clock idea, and some of his other ideas, helped influence my ideas about rationality policies a little bit.
I think people who act in intentional bad faith are basically lost causes, although the harm they do (in roles like king, president, judge, policeman, prison guard, lawmaker, accountant or CEO) can be limited if they have to publicly appear to follow written policies.
My concern is primarily with people who are making some effort to be reasonable, but who sometimes fool themselves, rationalize things, are blind to their own biases, etc. Those people who want to be rational can be helped a lot with policies because they will at least partially follow the spirit of the policies in addition to the letter.
It’s very important to always follow your policies exactly as written with no exceptions whatsoever and also to make a good faith effort to follow the intent or spirit of the policy. You need both literal rule-following and also to want and like the purpose of the policy. If either is missing, it won’t work well.
If you don’t want to follow a particular policy exactly as written, don’t post it as your public policy, and don’t tell people you’ll follow it. Don’t give and break your word. Rewrite it or consider some other policy with softer rules.
If you have a policy and run into a problem, change it later. E.g. follow it this time, then wait a week to get some distance, then think it over, then (if you think it’s best) change it for the future. Suspend it during the pause week if necessary so it can’t be used again. If you violate policies when they’re inconvenient, then you’re breaking your word and defeating the purpose of having policies.
Your policies should be fairly stable and infrequently changed. They should be written carefully and thoughtfully to enable this. Keep in mind that the public may see a policy and plan ahead. If you guarantee to debate under certain conditions, someone might spend months researching a debate topic and meeting the conditions before asking you for a debate. Your policies have to be stable on a multi-year timeframe or other people who work on long timeframes and make large efforts (in other words, some of the best people) will find you unreliable.
If you want to have policies but you’re unsure, label them as tentative policies that you might change. Say it’s a beta test. If you put disclaimers on them that they’re just experimental ideas that you might not follow, then it’s fine to change your mind frequently. As long as you don’t pretend your policies are reliable, and fool yourself or others, then it’s OK if they aren’t reliable. You can also write down a candidate policy privately then pay attention to what following it would be like and try following it as long as that works OK for you. That’s kind of like playing the stock market with fantasy money before using real money.
It’s a good idea to start with tentative policies and beta test them. In general, you should only give your word that you’ll follow policies if you’ve tested them out for a significant time period first and found that they’ve worked OK for you.
Here are some examples of conditions you might require before accepting a debate with someone, rather than accepting debates with absolutely anyone (which might be too many debates or debates which are too low quality).
In general, any debate can stop due to unanimous consent. Taking that option away would be a special case (and would probably involve a short time limit). The stopping rules below are meant as alternatives for when unanimous consent is not achieved and someone wants to stop. More than one of these could be allowed. Also, each condition only applies if someone actually wants to stop. The conditions let one person unilaterally stop the debate, but they don’t require anyone to stop.
Policy: Read one article from a rival tribe every week. For transparency, after you read an article, post a link or cite, with the date, to a public list you maintain.
The goal is to engage with other perspectives and ideas. You could add extra detail to the policy, e.g.:
You can’t use the same tribe twice in one month.
This would help with the concern that you find a couple rival tribes that you like and ignore other ones that challenge your perspective more. Part of the goal is to expose yourself to a wide variety of ideas.
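The reading policy and its extra rule are simple enough to check mechanically. Here is a sketch of a reading log that refuses a second article from the same tribe in one month. All names are hypothetical, the tribe labels and links are placeholders, and reading “month” as calendar month and “week” as ISO week are assumptions that the policy as written doesn’t settle.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Entry:
    read_on: date
    tribe: str
    link: str


@dataclass
class ReadingLog:
    """Hypothetical public list of rival-tribe articles read."""

    entries: list[Entry] = field(default_factory=list)

    def tribe_used_in_month(self, tribe: str, on: date) -> bool:
        """True if this tribe already appears in the same calendar month."""
        return any(
            e.tribe == tribe
            and (e.read_on.year, e.read_on.month) == (on.year, on.month)
            for e in self.entries
        )

    def add(self, read_on: date, tribe: str, link: str) -> None:
        """Add an entry, enforcing 'no tribe twice in one month'."""
        if self.tribe_used_in_month(tribe, read_on):
            raise ValueError(f"{tribe!r} was already used in {read_on:%Y-%m}")
        self.entries.append(Entry(read_on, tribe, link))

    def weeks_covered(self) -> set[tuple[int, int]]:
        """The (ISO year, ISO week) pairs that have at least one entry."""
        return {tuple(e.read_on.isocalendar()[:2]) for e in self.entries}


log = ReadingLog()
log.add(date(2024, 1, 1), "tribe A", "https://example.com/article-1")
log.add(date(2024, 1, 8), "tribe B", "https://example.com/article-2")
print(sorted(log.weeks_covered()))  # [(2024, 1), (2024, 2)]
```

A week with no entry in `weeks_covered()` is a visible gap, which is the kind of thing readers of the public list could point out.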
These policies aren’t foolproof. You could find some loophole or follow them in bad faith. There are two things that could keep you honest. First, your integrity. Second, other people could offer criticism since you’re providing transparency.
Don’t expect this to work well unless you actually like the policy. If you do this policy because you feel like you “should”, or because other people in your social group do it, then there’s a significant risk that it won’t help you (but it also might help; there’s still a chance it works well). Similarly, if you read the articles begrudgingly, as a painful duty, it’s not going to expand your mind well. But people occasionally start doing something with a terrible attitude and then get drawn in, so that’s not hopeless either.
It’s problematic to use generic rationality policies off a list. If you’re going to try to be rational and use policies to help you, you need to be actively involved in that process and have personalized, customized, individualized ideas about what will work for you. You need policies that you understand well, see the point of, care about, and will actually do. Those things are more likely to be true of policies you write yourself. If you had a personal friend, tutor or mentor who helped you write policies, that would also work much better than generic policies because they’d be designed to fit your life.
Generic policies can serve as examples and inspiration. You can create similar policies for yourself. You can reuse elements of generic policies. You can also look at example policies to find patterns and better understand what a policy can be like. But this topic is under-explored, so don’t assume my examples cover all the types of policies that could be standard, common or useful; I don’t think they do.
See also: “Fallibilism, Bias, and the Rule of Law” and “My Experience with My Debate Policy”.