This is a special post for quick takes by Ben Snodin. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.

I (with lots of help from my colleague Marie Davidsen Buhl) made a database of resources relevant to nanotechnology strategy research, with articles sorted by relevance for people new to the area. I hope it will be useful for people who want to look into doing research in this area.

My Notes on Certificates of Impact

Introduction & purpose of post

This post contains some notes that I wrote after ~ 1 week of reading about Certificates of Impact as part of my work as a Research Scholar at the Future of Humanity Institute, and a bit of time after that thinking and talking about the idea here and there.

In this post, I

  • describe what Certificates of Impact are, including a concrete proposal,
  • provide some lists of ways that it might be good or bad, and reasons it might or might not work,
  • provide some other miscellaneous thoughts relevant to future work on Certificates of Impact, and
  • provide some links to relevant resources.

I’m sharing this here in case it’s useful - the intended audience is people who are curious about what Certificates of Impact are, and (to some extent) people who are thinking seriously about Certificates of Impact.

Note that, since I haven’t invested much time thinking about Certificates of Impact, my understanding of this area is fairly shallow. I’ve tried to include appropriate caveats in the text to reflect this, but I might not have always succeeded, so please bear this in mind.

What Certificates of Impact are

Within this document, I’m using Certificates of Impact to refer to the general idea of creating a market in altruistic impact. I think that the general idea is also referred to as Impact Certificates, Tradeable Altruistic Impact, and Impact Purchases.

Certificates of Impact is an idea that's been floating around in the Effective Altruism community for some time. Paul Christiano and Katja Grace ran an experiment with Certificates of Impact about 5 years ago. I've seen various EA forum posts about Certificates of Impact too (see the final section of this post for some links).

By a market in altruistic impact, I mean something like the following: we imagine a future where there are people who want to donate to charity, and there are people who are doing high impact projects, and rather than them making the effort to seek each other out, they connect through this market. In the market, the individuals or organisations doing the projects issue Certificates of Impact, and donors buy them. And maybe as a donor you don't need to try so hard to find the best project, you just buy some certificates from some marketplace; and as someone doing a high impact project, you don't have to work so hard to connect to donors, because you find that there are profit seeking organisations that are willing to buy your certificates, and that's your source of funding.

Note that the above is quite vague, and that there are probably some aspects you could change and still have something that falls under Certificates of Impact.

A semi-concrete proposal

There are lots of varieties of Certificates of Impact-type systems that could be tried. To make things easier, from now on in this document I’ll assess a concrete proposal called Certificates of Impact with Dedication (idea due to Owen Cotton-Barratt):

A market is created / exists where someone can issue a Certificate for work they believe to be altruistically impactful. We call this person the Issuer. There is a statute of limitations on issuing Certificates of two years (i.e. Certificates can’t be issued for work more than two years old). The Certificate is assessed by a Validator who confirms that the work specified on the Certificate has in fact been done. The Issuer then sells the Certificate in the market, maybe via an auction mechanism. Note that the Certificate can refer to some percentage of the project, so for example it might represent 40% of the altruistic impact of a project, while the Issuer keeps the other 60%. The Certificate is traded on the secondary market by professional traders and then bought by an Ultimate Buyer who is the ultimate consumer of the Certificate. The Ultimate Buyer then Dedicates the Certificate, possibly to themselves so that they get the credit for the altruistic impact.

Whoever the certificate gets Dedicated to is the one who gets the credit for the counterfactual altruistic impact of the project that the certificate refers to. And once a certificate has been Dedicated it can't be traded anymore, so Dedication is its end point. Importantly (in my view), if you don't have a Dedication mechanism it's not clear whether people who own Certificates of Impact have bought them so that they can resell them at a profit, or because they want to have altruistic impact.

Brainstorm-style lists of ways Certificates of Impact might go well or badly

In this section, I list ways Certificates of Impact might go well or badly, or why it might or might not work. Generally, I’ve tried to err on the side of including things even where I consider them to be very speculative.

Note that my opinions, where I give them, are pretty unstable: I can easily imagine myself changing my mind on reflection or after seeing new arguments.

How good might this be?

Let’s imagine that a well-functioning Certificates of Impact with Dedication system exists with a large number of active participants including profit-seeking intermediaries. I list the ways this could be good below.

Here is a summary of the possible benefits. For each one, I’ve put my opinion regarding how likely it is in brackets.

  • More efficient allocation of time and money dedicated to altruistically impactful work. (I think this is the most likely benefit)
  • Improved quality of people working on altruistically impactful work. (I think this is plausible)
  • Larger pool of donations towards altruistically impactful work. (I think this is plausible)
  • Assessments of impact can be deferred to the future. (I think this is guaranteed under the proposal I’m using)

Here’s more detail

  • Resources intended for altruistic impact are allocated more efficiently (in the sense of getting higher impact for the same quantity of input resources through better allocation)
    • This applies to both cash from donors (the Ultimate Buyers), and cash and non-cash resources, like hours of work, from people working to create altruistic impact (the Issuers).
    • This comes about from having a well-functioning market in Certificates of Impact. Markets are great at efficient allocation of resources.
    • In particular
      • Profit-seeking organisations bring prices to correct levels through expert analysis
        • Venture Capital
        • Financial security analysts
      • Funding (through ordinary finance world) for altruistically impactful projects / start ups at the start (possibly easier if you can issue certificates for a project before it starts, but this isn’t essential)
        • Lower the barrier for / correctly incentivise risky altruistic projects
      • Work intended for altruistic impact gets focussed on the best cause areas (through financial incentive / existence of funding)
        • I have a bit of a worldview that money is an incredibly effective signal and motivator for getting people to work on stuff - see the army of smart graduates going into law, banking, consultancy, accountancy, etc.
  • Funding makes altruistically impactful startups sexier -> easier to attract top talent
  • Larger pool of donations to effective charities through presence of large and salient Certificates of Impact market, which makes it more culturally usual to donate to effective charities
    • Probably second-order benefits too (get people to think about what is effective)
  • Some deference to the future since prices today are set by expectations of how things will be valued in the future
    • Good because it’s easier to assess things after the fact
    • Also good to the extent that we trust future people’s moral judgements more than our own

How feasible is this?

I list below considerations for thinking about how feasible it is to get to a state where this is big and being used by lots of people (whether it is actually achieving the desired outcomes of improved efficiency etc or not).

  • Some kind of critical mass will be necessary to get this off the ground.
    • Not clear to me what size is necessary
  • Related: lack of standardisation might make it hard to create a liquid market in the certificates.
  • Related: getting profit-seeking entities involved might require a huge, standardised market. I’m not sure whether the best/most likely version of Certificates of Impact includes profit-seeking entities.
  • People need to trust and understand what the Certificates of Impact are supposed to represent.
  • (maybe?) People need to feel that the Certificates of Impact really represent causal impact - buying a Certificate of Impact causes the impact to happen.
  • Assessing the value of the projects needs to be feasible.
  • This needs to work under a mix of altruistic preferences (I think this is probably not an issue, but I’m not sure)

How bad might this be?

I list below the ways a large (but maybe not well-functioning) market in Certificates of Impact with Dedication could fail to have a positive impact, or even be net negative.

Here is a summary of the possible issues / harms. For each one, I’ve put my opinion regarding how likely it is and/or how bad it might be in brackets.

  • The quality of the altruistic impact pricing would be inadequate, and this is net negative, e.g. because it’s just too hard to evaluate the altruistic impact of a project. (I think this is fairly likely but perhaps unlikely to make Certificates of Impact strongly net negative)
  • The presence of money / explicit valuing of projects would destroy intrinsic motivation for Issuers and generally make the whole thing very sordid and transactional. (I think this is unlikely to be a major issue)
  • The weirdness of the whole concept from the point of view of the Issuer and/or Ultimate Buyer would make this unworkable. (I think this is somewhat unlikely to happen)
  • Trying to motivate altruistic behaviour with money will be net negative due to fraud etc. (I think this is quite plausible and would be quite bad)
  • Explicit certification of rich people as causing large altruistic impact will be very unpopular. (unclear to me how likely this is or how bad it would be)
  • Poor market behaviour such as crashes and bubbles would be very damaging. (seems unlikely to be net-negative to me, but it’s probably still a concern)
  • Impact is expensive because you have to pay at the price at which you value the impact, rather than e.g. what the project costs. (unlikely to be a major issue in my view, but probably still a concern)
  • Infrastructure costs would outweigh the benefits (I think this is possibly an issue / closely related to feasibility)

Here’s more detail

  • Inadequate pricing of altruistic impact
    • If Ultimate Buyers have poor altruistic preferences (e.g. favouring cute animals over vast numbers of future people) (or perhaps even completely selfish preferences?) the right things won’t get rewarded and the market could be completely dysfunctional.
    • Some things are not easy to assess after the fact - an extreme case is an intervention that causes extinction.
    • Downside risks from projects are not accounted for - this incentivises low (or negative) expected impact projects because the price the Issuer can sell at is floored at zero, distorting the expected benefit for the Issuer
    • You’d be better off making a GiveWell for longtermist projects, giving money to the Open Philanthropy Project, etc
  • Undesirable effects of putting a price on everything
    • This would make the altruistic ecosystem very sordid and transactional (for everyone involved and/or for the general public).
    • This would remove intrinsic motivation for the Issuer.
  • The weirdness of the concept of Certificates of Impact makes this unworkable
    • People who want to donate won’t buy into the idea that buying a certificate causes the impact to happen. It’s too confusing and causally remote from the work the Issuer did.
    • It would be pretty weird for people who are altruistically motivated to do a project if they sell a certificate for all the work and consequently are not (deemed to be) causally responsible for the impact of their project.
      • You can sell a fraction of the work, but what fraction should you choose?
  • Looking at the current financial system helps us see the flaws that markets can have. E.g. fraud by the Issuer - even at a low level, by creating incentives to exaggerate impact. Or any other issues with the Issuer optimising for the value of the certificate rather than optimising for altruistic impact. This is similar to the way that public corporations now have an overwhelmingly strong “duty to shareholders” to maximise profits, even though this is very harmful to society.
    • Contrast “provide money to enable intrinsically motivated people to do good things” with “pay for anyone to do things that appear high impact”. Maybe you don’t get great outcomes with the latter.
    • Contrast with the current status quo where things feel pretty cooperative (to me), at least inside Effective Altruism circles.
  • Ordinary people may be very unhappy with a system that explicitly certifies that a rich person caused lots of positive impact.
  • The weird things that markets can do - like crash or have bubbles - would be very damaging.
  • Unlike for a normal market, you have to pay at the price you value the impact at, rather than paying a cheaper price and benefitting from the consumer surplus as you do in ordinary product markets.
  • The infrastructure costs would outweigh the benefits: e.g. project validation, impact assessment, market infrastructure, etc
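
The downside-risk point in the list above (the Issuer's sale price is floored at zero) can be made concrete with a toy calculation. This is only a sketch: the project outcomes and probabilities are invented numbers, `issuer_payoff` is a hypothetical name, and it assumes the certificate sells for the realised impact when that is positive and for nothing otherwise.

```python
# Toy illustration: a certificate price floored at zero hides downside risk.
# All numbers below are invented for illustration only.

def expected_value(outcomes):
    """Expected value of a list of (probability, value) pairs."""
    return sum(p * v for p, v in outcomes)

def issuer_payoff(outcomes):
    """Expected sale price when the certificate can never sell below zero."""
    return sum(p * max(v, 0.0) for p, v in outcomes)

# A risky project: 50% chance of +100 units of impact, 50% chance of -150.
risky_project = [(0.5, 100.0), (0.5, -150.0)]

true_impact = expected_value(risky_project)      # expected impact on the world
payoff_to_issuer = issuer_payoff(risky_project)  # what the Issuer expects to earn
```

Under these made-up numbers the project has negative expected impact, yet the Issuer's expected payoff is positive, which is the distortion described above.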

Other thoughts

  • Information value of experimentation
    • The negative points may not weigh so heavily if we think we can run small, reversible experiments to get more information.
  • Sources for outside views
    • How much have markets been beneficial in general?
      • Overall very good I think.
      • When have they been bad? E.g. natural monopoly
    • How have attempts to create a Certificates of Impact type system gone in the past (Paul Christiano and Katja Grace in 2015)?
    • How have attempts to create unusual markets in something gone in the past?
      • Online multiplayer game markets
      • Organ donation
      • Carbon credits / emissions trading
      • Gofundme etc
      • Cryptokitties?
    • How have similar things aimed at altruistic impact gone
      • Health Impact Certificates
      • Social/Development Impact Certificates
      • Prizes
  • There is surely an economics literature on when markets are good and when markets can be formed.

Relevant resources

FWIW I think you should make this a top level post.

Kind of surprised that this post doesn't link at all to Paul's post on altruistic equity:

EA "civilisational epistemics" project / org idea
Or:  an EA social media team for helping to spread simple and important ideas

Below I describe a not-at-all-thought-through idea for a high impact EA org / project. I am in no way confident that something like this is actually a good idea, although I can imagine it being worth looking into. Also, for all I know people have already thought about whether something like this would be good. Also, the idea is not due to me (any credit goes to others, all blame goes to me).

Motivating example (rough story which I haven't verified, probably some details are wrong): in the US in 2020 COVID testing companies had a big backlog of tests, and tests had a relatively short shelf life during which analysis would still be useful. Unfortunately, analysis was done with a "first in first out" queue, meaning that testing companies would analyse the oldest tests first, often meaning they wasted precious analysis capacity on expired tests. Bill Gates and others publicly noted that this was dumb and that testing companies should flip the queue and analyse the newest tests first (or maybe he just said that testing companies should be paid for "on time delivery of results" rather than just "results"). But very few people paid attention, and the testing companies didn't change their systems. This had very bad consequences for the spread of the pandemic in the US.
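
To illustrate why flipping the queue could matter in the story above, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the arrival rate, the daily analysis capacity, the shelf life, and the function name `simulate` are all made up, and the real testing pipeline was surely more complicated.

```python
from collections import deque

def simulate(policy, days=30, arrivals=100, capacity=80, shelf_life=2):
    """Count tests analysed while their results are still useful.

    policy: "fifo" analyses the oldest waiting test first,
            "lifo" analyses the newest waiting test first.
    A test arriving on day d is useful only if analysed by day d + shelf_life.
    """
    queue = deque()  # arrival day of each waiting test, oldest on the left
    useful = 0
    for day in range(days):
        queue.extend([day] * arrivals)  # today's samples join the backlog
        for _ in range(min(capacity, len(queue))):
            arrived = queue.popleft() if policy == "fifo" else queue.pop()
            if day - arrived <= shelf_life:
                useful += 1  # result delivered in time
            # otherwise the analysis slot was spent on an expired test
    return useful

fifo_useful = simulate("fifo")
lifo_useful = simulate("lifo")
```

With these invented parameters the backlog grows every day, so the oldest-first queue eventually spends all of its capacity on expired tests, while the newest-first queue keeps delivering in-time results.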

Claim: creating common knowledge of / social media lobbying pressure related to the right idea can make the right people know about it / face public pressure, causing them to implement the idea.

Further claim: this is a big enough deal that it's worth some EA effort to make this happen.

The suggested EA project / org: work to identify important, simple, and uncontroversial ideas as they arise and use social media to get people to pay attention to them.

Really good idea and I think spreading socially useful information is really underexplored. 

Maybe one could even think about broader, generalizable, bite-sized memes that are robustly good for everyone to know and worth spreading.

Some examples:

  • Germ theory
  • Pigouvian Taxes
  • Personal finance (e.g. Index funds)
  • Cost-effectiveness analysis
  • Health behaviours

Maybe there should be a DMI-like organization that does that.

Maybe very visual ways of spreading these messages within a few seconds would be effective (e.g. ).

There's already Kurzgesagt, which is a bit further along the spectrum towards 'deep engagement' which I think is really good and gets funding from the Gates Foundation.


DMI-like organization

What does DMI stand for?

Maybe Development Media International? It was a standout Givewell charity for a while.

Development Media International (DMI) is a non-governmental organization with both non-profit and for-profit arms that "use[s] scientific modelling combined with mass media campaigns in order to save the greatest number of lives in the most cost-effective way".

Oh cool, thanks!

There are many meanings of DMI, like "Development Media International", "Desktop Management Information", "Deferred Maintenance Item"...

You can find more of them here:

The lobbying pressure seems more important than the common knowledge.

EA orgs already spend a lot of time identifying and sharing important and simple ideas — I wouldn't call them "uncontroversial", but few ideas are. (See "building more houses makes housing cheaper", which is a lot more controversial than I'd have expected before I started to follow that "debate".)

I do think it would be worth spending a few hours trying to come up with examples of ideas that would be good to spread + calculating very rough BOTECs for them. For example, what's the value of getting one middle-class American to embrace passive rather than active investment? What's the value of getting one more person vaccinated?

Development Media International is the obvious parallel, and the cost-effectiveness of using ridiculously cheap radio advertisements to share basic public health information seems hard to beat on priors. But there are a lot of directions you could go with "civilizational epistemics", and maybe some of them wind up looking much better, e.g. because working in the developed world = many more resources to redirect.

(Speaking of which, Guarding Against Pandemics is another example — their goal isn't just to reach a few specific politicians, but to reach people who will share their message with politicians.)

Is this the sort of thing where if we had, say, 10 - 100 EAs and a billion dollar / year budget, we could use that money to basically buy the eyeballs of a significant fraction of the US population? Are they for sale?

For a billion dollars, you can buy hundreds of millions of eyeballs.

As an extreme example, a 30-second Super Bowl advertisement costs just under $6 million and reaches almost 100 million people. And that can't be anywhere near the upper limit of efficiency (I'd guess those ads are wildly overpriced given the additional status/prestige they confer).

It depends what media type you're talking about (audio, video, display, ...) - $6m/100m is a $60 CPM ('cost per mille'), which is certainly above the odds for similar 'premium video' advertising, but only by maybe 2-5x. For other media like audio and display the CPMs can be quite a bit lower, and if you're just looking to reach 'someone, somewhere' you can get a bargain via programmatic advertising.

I happen to work for a major demand-side platform in real-time ad buying and I've been wondering if there might be a way to efficiently do good this way. The pricing can be quite nuanced. Haven't done any analysis at this point.
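
The CPM arithmetic in this thread can be written out as a short sketch. The dollar and audience figures are the rough ones quoted above; the function names are hypothetical, and the result counts impressions rather than unique people.

```python
def cpm(cost_usd, people_reached):
    """Cost per mille: dollars paid per 1,000 people reached."""
    return cost_usd * 1000 / people_reached

def impressions_for_budget(budget_usd, cpm_usd):
    """Number of impressions a budget buys at a given CPM."""
    return budget_usd * 1000 / cpm_usd

# Rough Super Bowl figures from the thread: ~$6 million for ~100 million viewers.
super_bowl_cpm = cpm(6_000_000, 100_000_000)

# Impressions a $1 billion budget would buy at that (premium) price.
impressions = impressions_for_budget(1_000_000_000, super_bowl_cpm)
```

At the Super Bowl price, a billion dollars buys on the order of tens of billions of impressions, so cheaper media would stretch the same budget further still.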

Takeaways from some reading about economic effects of human-level AI

I spent some time reading things that you might categorise as “EA articles on the impact of human-level AI on economic growth”. Here are some takeaways from reading these (apologies for not always providing a lot of context / for not defining terms; hopefully clicking the links will provide decent context).

If you're interested in more on this topic, I'd highlight Holden Karnofsky's recent blog series and Tom Davidson's recent Open Phil report as good places to start.

In case it’s useful for other people, here’s the main stuff I (at least partially) read / listened to:

This from Paul Christiano in 2014 is also very relevant (part of it makes similar points to a lot of the recent stuff from Open Philanthropy, but the arguments are very brief; it's interesting to see how things have evolved over the years): Three impacts of machine intelligence

Here are some notes I made while reading a transcript of a seminar called You and Your Research by Richard Hamming. (I'd previously read this article with the same name, but I feel like I got something out of reading this seminar transcript although there's a lot of overlap). 

  • On courage:
    • "Once you get your courage up and believe that you can do important problems, then you can"
    • In the Q&A he talks about researchers in the 40's and 50's naturally having courage after coming out of WW2.
  • Age makes you less productive because when you have prestigious awards you only work on 'big' problems
  • Bad working conditions can force you to be creative
  • You have to work very hard to succeed
    • "I spent a good deal more of my time for some years trying to work a bit harder and I found, in fact, I could get more work done"
    • "Just hard work is not enough - it must be applied sensibly."
  • On coping with ambiguity
    • "Great scientists… believe the theory enough to go ahead; they doubt it enough to notice the errors and faults so they can step forward and create the new replacement theory"
  • Get your subconscious to work for you.  Cf "shower thoughts", Paul Graham's The Top Idea in Your Mind
    • "If you are deeply immersed and committed to a topic, day after day after day, your subconscious has nothing to do but work on your problem"
    • "So the way to manage yourself is that when you have a real important problem you don't let anything else get the center of your attention - you keep your thoughts on the problem. Keep your subconscious starved so it has to work on your problem, so you can sleep peacefully and get the answer in the morning, free."
  • On thinking great thoughts
    • "Great Thoughts Time" from lunchtime on Friday
    • E.g. "What will be the role of computers in all of AT&T?", "How will computers change science?"
  • On having problems to try new ideas on:
    • Most great scientists "have something between 10 and 20 important problems for which they are looking for an attack. And when they see a new idea come up, one hears them say 'Well that bears on this problem.' They drop all the other things and get after it."
  • Having an open office door is better:
    • "if you have the door to your office closed, you get more work done today and tomorrow, and you are more productive than most. But 10 years later somehow you don't know quite know what problems are worth working on; all the hard work you do is sort of tangential in importance."
  • On the importance of working hard
    • "The people who do great work with less ability but who are committed to it, get more done that those who have great skill and dabble in it, who work during the day and go home and do other things and come back and work the next day. They don't have the deep commitment that is apparently necessary for really first-class work"
  • On using commitment devices to create pressure on yourself to perform
    • "I found out many times, like a cornered rat in a real trap, I was surprisingly capable. I have found that it paid to say, 'Oh yes, I'll get the answer for you Tuesday,' not having any idea how to do it"
  • On putting yourself under stress
    • "if you want to be a great scientist you're going to have to put up with stress. You can lead a nice life; you can be a nice guy or you can be a great scientist. But nice guys end last, is what Leo Durocher said. If you want to lead a nice happy life with a lot of recreation and everything else, you'll lead a nice life."
  • On "being alone"
    • "If you want to think new thoughts that are different, then do what a lot of creative people do - get the problem reasonably clear and then refuse to look at any answers until you've thought the problem through carefully how you would do it"
  • On being very successful over a long career
    • "Somewhere around every seven years make a significant, if not complete, shift in your field… When you go to a new field, you have to start over as a baby. You are no longer the big mukity muk and you can start back there and you can start planting those acorns which will become the giant oaks"
  • On vision and research management
    • "When your vision of what you want to do is what you can do single-handedly, then you should pursue it. The day your vision, what you think needs to be done, is bigger than what you can do single-handedly, then you have to move toward management"

Things I took away for myself

  • I like the idea of trying to have some kind of “great thoughts time” each week.
  • The “having an open office door is better” claim is interesting to think about when I’m considering whether / in what way to return to the office now that it’s an option.
  • I think a lot about optimal work hours for myself, and I take this talk as a data point in favour of “work long hours”
  • The vision + research management point kind of resonates / makes sense.
  • Maybe it could be interesting to try “overpromising” to boost productivity.

You might be interested in checking out Ingredients for creating disruptive research teams e.g. on vision, autonomy, spaces for interaction.

Also I noticed that Jess Whittlestone wrote some probably much better notes on this a few years ago

Here are some forecasts for near-term progress / impacts of AI on research. They are the results of some small-ish number of hours of reading + thinking, and shouldn't be taken at all seriously. I'm sharing in case it's interesting for people and especially to get feedback on my bottom line probabilities and thought processes. I'm pretty sure there are some things I'm very wrong about in the below and I'd love for those to be corrected.

  1. Deepmind will announce excellent performance from Alphafold2 (AF2) or some successor / relative for multi-domain proteins by end of 2023; or some other group will announce this using some AI scheme: 80% probability
  2. Deepmind will announce excellent performance from AF2 or some successor / relative for protein complexes by end of 2023; or some other group will announce this using some AI scheme: 70% probability
  3. Widespread adoption of a system like OpenAI Codex for data analysis will happen by end of 2023: 20% probability

I realise that "excellent performance" etc is vague; I choose to live with that rather than putting in the time to make everything precise (or not doing the exercise at all).

If you don't know what multi-domain proteins and protein complexes are, I found this Mohammed AlQuraishi blog post very useful (maybe try ctrl-f for those terms), although maybe you need to start with some relevant background knowledge. I don't have a great sense for how big a deal this would be for various areas of biological science, but my impression is that they're both roughly the same order of magnitude of usefulness as getting excellent performance on single-domain proteins was (i.e. what AF2 has already achieved).

As for why:

80% chance that excellent AI performance on multi-domain proteins is announced by end of 2023

  • Top reasons for event happening
  • Top reasons against
    • Maybe they won't announce it, because it's not newsworthy enough; or they'll bundle it with some bigger announcement with lots of cool results (resulting in delayed announcement)
  • Other reasons against
    • In particular, the results from the next CASP competition will presumably be announced in December 2022; if they haven't cracked it by then, maybe we won't hear about it by end of 2023
      • They'd need to get there by April 2022 (I think that is the submission deadline for CASP)
    • Maybe it will turn out to be way less tractable than expected
    • Maybe Deepmind will have other, even more pressing priorities, or some key people will unexpectedly leave, or they'll lose funding, or something else unexpected happens
  • Key uncertainties
    • Are rival protein folding schemes targeting this?

70% chance that excellent AI performance on protein complexes is announced by end of 2023

  • Top reasons for event happening
  • Top reasons against
    • Maybe even if it's done by say mid 2023 it won't be announced until after 2023 because of Deepmind's media strategy
      • In particular, targeting a CASP would seem to require the high performance to be achieved by mid 2022; maybe this is the most likely scenario in worlds where Deepmind announces this before end of 2023
      • (although if Deepmind doesn't get there by CASP15, it seems like another group might announce something in say 2023)
    • Protein complexes are (maybe?) qualitatively different to single proteins
  • Other reasons against
    • Maybe the lack of data will be decisive
    • Maybe Deepmind's priorities will change, etc, as noted above in the multi-domain case
  • Key uncertainties
    • Are rival schemes targeting this?

20% chance of widespread adoption of a system like OpenAI Codex for data analysis by end of 2023

(NB this is just about data analysis / "data science" rather than about usage of Codex in general)

  • My "best guess" scenario
    • OpenAI releases an API for data science that is cheap but not free. In its current iteration, the software is "handy" but not more than that. A later iteration, released in 2023, is significantly more powerful and useful. But by the end of 2023 it is still not yet "widely used".
  • Some reasons against event happening
    • Maybe Codex is currently not that useful in practice for data analysis
    • I think OpenAI won't release it for free so it won't become part of the "standard toolkit" in the same way that e.g. RStudio has
    • Things like RStudio take a long time to diffuse / become adopted
      • E.g. my guess is ~5 years to get 25% uptake of IPython notebook or RStudio by data scientists, or something like that
  • Key uncertainties
    • How much are OpenAI going to push this on people?
      • How much are they pushing the data science aspect particularly?
    • Will this be ~free to use or will it be licensed?
    • How quickly will it improve? How often does OpenAI release improved versions of things?
    • How fast did ipython notebook/rstudio get adopted?
    • How much has the GPT-3 API been used so far?
  • Some useful links

Changing your working to fit the answer

I wrote this last Autumn as a private “blog post” shared only with a few colleagues. I’m posting it publicly now (after mild editing) because I have some vague idea that it can be good to make things like this public. It is quite rambling and doesn't really have a clear point (but I think it's at least an interesting topic).

Say you want to come up with a model for AI timelines, i.e. the probability of transformative AI being developed by year X for various values of X. You put in your assumptions (beliefs about the world), come up with a framework for combining them, and get an answer out. But then you’re not happy with the answer - your framework must have been flawed, or maybe on reflection one of your assumptions needs a bit of revision. So you fiddle with one or two things and get another answer - now it looks much better, close enough to your prior belief that it seems plausible, but not so close that it seems suspicious.

Is this kind of procedure valid? Here’s one case where the answer seems to be yes: if your conclusions are logically impossible, you know that either there’s a flaw in your framework or you need to revise your assumptions (or both). 

A closely related case is where the conclusion is logically possible, but extremely unlikely. It seems like there’s a lot of pressure to revise something then too.

But in the right context revising your model in this way can look pretty dodgy. It seems like you’re “doing things the wrong way round” - what was the point of building the model if you were going to fiddle with the assumptions until you got the answer you expected anyway?

I think this is connected to a lot of related issues / concepts:

  • Model building
    • Option pricing models in finance: you start (both historically and conceptually) with the nice clean Black-Scholes model, which fails to explain actually observed option prices. Due to this, various assumptions are relaxed or modified, adding (arguably, somewhat ad hoc) complexity until, for the right set of parameters, the model gets all (sufficiently important) observed option prices right.
    • Regularisation / overfitting in ML: you might think of overfitting as “placing too much weight on getting the answer you expect”.
  • Arguments
    • “One person's modus ponens is another’s modus tollens”: if we’re presented with a logical argument, usually the person presenting it wants us to accept the premises and agree that the argument is valid, in which case we must accept the conclusion. If we don’t like the conclusion, we often focus on showing that the argument is invalid. But if you think the conclusion is very unlikely, you also have the option of acknowledging the argument as valid, but rejecting one of the premises. There are lots of fun examples of this from science and philosophy on Gwern’s page on the subject.
    • “Begging the question”: a related accusation in philosophy that seems to mean roughly “your conclusion follows trivially from your premises but I reject one of your premises (and by the way it should have been obvious that I’d reject one of your premises so it was a waste of both my time and yours that you made this argument)”
    • Reductio ad absurdum: disprove something by using it as an assumption that leads to an implausible (or maybe logically impossible) conclusion
    • “Proving too much”: an accusation in philosophy that is supposed to count against the argument doing the “proving”.
    • (Not) updating your beliefs from an argument that appears convincing on the face of it: if the conclusions are implausible enough, you might not update your beliefs too much the first time you encounter the argument, even if it appears watertight.
  • Research methods
    • Sanity checking your answer: check that the results of a complex calculation or experiment roughly match the result you get from a quick and crude approach.

Presumably, you could put this question of whether and how much to modify your model into some kind of formal Bayesian framework where on learning a new argument you update all your beliefs based on your prior beliefs in the premises, conclusion, and validity of the argument. I’m not sure whether there’s a literature on this, or whether e.g. highly skilled forecasters actually think like this.
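As a toy illustration of what such a Bayesian treatment might look like, here is a minimal sketch. It rests on assumptions that are mine rather than anything established: there is a single premise and a single conclusion, they are treated as independent before you see the argument, and learning that the argument is valid simply rules out the world where the premise is true and the conclusion is false. The function name `condition_on_validity` is hypothetical.

```python
def condition_on_validity(p_premise: float, p_conclusion: float) -> tuple[float, float]:
    """Update credences after learning that 'premise -> conclusion' is valid.

    Assumptions (illustrative, not from the post): premise and conclusion are
    independent beforehand, and validity only rules out the world in which
    the premise is true and the conclusion is false.
    """
    # Prior probability of the single world that validity rules out.
    p_ruled_out = p_premise * (1.0 - p_conclusion)
    remaining = 1.0 - p_ruled_out
    if remaining <= 0.0:
        raise ValueError("priors leave no possible world once the argument is accepted")
    # After conditioning, the premise can only be true alongside the conclusion.
    post_premise = p_premise * p_conclusion / remaining
    post_conclusion = p_conclusion / remaining
    return post_premise, post_conclusion


if __name__ == "__main__":
    # Same confidence in the premise, different prior credence in the conclusion.
    for p, c in [(0.9, 0.5), (0.9, 0.01)]:
        post_p, post_c = condition_on_validity(p, c)
        print(f"prior premise={p}, conclusion={c} -> "
              f"posterior premise={post_p:.3f}, conclusion={post_c:.3f}")
```

In this toy model, a conclusion you find very implausible mostly drags down your credence in the premise (the "modus tollens" direction), while a conclusion you were neutral about mostly gets pulled up by the premise (the "modus ponens" direction).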

In general though, it seems (to me) that there’s something important about “following where the assumptions / model takes you”. Maybe, given all the ways we fall short of being perfectly rational, we should (and I think that in fact we do) put more emphasis on this than a perfectly rational Bayesian agent would. Avoiding having a very strong prior on the conclusion seems helpful here.

Two papers I read on imprecise probabilities

I wrote this last Autumn as a private “blog post” shared only with a few colleagues. I’m posting it publicly now (after mild editing) because I have some vague idea that it can be good to make things like this public.

In this post I’m going to discuss two papers regarding imprecise probability that I read this week for a Decision Theory seminar. The first paper seeks to show that imprecise probabilities don’t adequately constrain the actions of a rational agent. The second paper seeks to refute that claim.

Just a note on how seriously to take what I’ve written here: I think I’ve got the gist of what’s in these papers, but I feel I could spend a lot more time making sure I’ve understood them and thinking about which arguments I find persuasive. It’s very possible I’ve misunderstood or misrepresented the points the papers were trying to make, and I can easily see myself changing my mind about things if I thought and read more.

Also, a note on terminology: it seems like “sharpness/unsharpness” and “precision/imprecision” are used interchangeably in these papers, as are “probability” and “credence”. There might well be subtle distinctions that I’m missing, but I’ll try to consistently use “imprecise probabilities” here.

Imprecise probabilities

I imagine there are (at least) several different ways of formulating imprecise probabilities. One way is the following: your belief state is represented by a set of probability functions, and your degree of belief in a particular proposition is represented by the set of values assigned to it by the set of probability functions. You also then have an imprecise expectation: each of your probability functions has an associated expected utility. Sometimes, all of your probability functions will agree on the action that has the highest expected value. In that case, you are rationally required to take that action. But if there’s no clear winner, that means there’s more than one permissible action you could take.

Subjective Probabilities should be Sharp

The first paper, Subjective Probabilities should be Sharp, was written in 2010 by Elga. The central claim is that there’s no plausible account of how imprecise probabilities constrain which choices are reasonable for a perfectly rational agent.

The argument centers around a particular betting scenario: someone tells you “I’m going to offer you bet A and then bet B, regarding a hypothesis H”:

Bet A: win $15 if H, else lose $10

Bet B: lose $10 if H, else win $15

You’re free to choose whether to take bet B independently of whether you choose bet A.

Depending on what you believe about H, it could well be that you prefer just one of the bets to both bets. But it seems like you really shouldn’t reject both bets. Taking both bets guarantees you’ll win exactly $5, which is strictly better than the $0 you’ll win if you reject both bets.

But under imprecise probabilities, it’s rationally permissible to have some range of probabilities for H, which implies that it’s permissible to reject both bet A and bet B. So imprecise probabilities permit something which seems like it ought to be impermissible.
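To make the arithmetic concrete, here is a minimal sketch of the betting scenario. It assumes a risk-neutral agent with utility linear in dollars, an illustrative set of credences for H running from 0.1 to 0.8 (my choice, not Elga's), and one simple reading of the decision rule: a choice is permissible if at least one probability function in the set rates it as at least as good as the alternative.

```python
def ev_bet_a(p_h: float) -> float:
    """Expected dollars from bet A: win $15 if H, else lose $10."""
    return 15 * p_h - 10 * (1 - p_h)


def ev_bet_b(p_h: float) -> float:
    """Expected dollars from bet B: lose $10 if H, else win $15."""
    return -10 * p_h + 15 * (1 - p_h)


def permissible_choices(ev, credences) -> set[str]:
    """Choices that some probability function in the set rates as (weakly) best."""
    choices = set()
    for p in credences:
        value = ev(p)
        if value >= 0:
            choices.add("accept")  # this function says accepting is no worse than $0
        if value <= 0:
            choices.add("reject")  # this function says rejecting is no worse
    return choices


def payoff_of_both_bets(h_is_true: bool) -> int:
    """Dollars won from taking both bets, whichever way H turns out."""
    return (15 - 10) if h_is_true else (-10 + 15)


if __name__ == "__main__":
    credences = [i / 10 for i in range(1, 9)]  # illustrative credence set for H
    print("bet A:", permissible_choices(ev_bet_a, credences))
    print("bet B:", permissible_choices(ev_bet_b, credences))
    print("both bets:", payoff_of_both_bets(True), payoff_of_both_bets(False))
```

With this credence set, both accepting and rejecting are permissible for each bet taken on its own, so the rule permits rejecting both, even though taking both pays $5 whether or not H is true.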

Elga considers various rules that might be added to the initial imprecise probabilities-based decision theory, and argues that none of them are very appealing. I guess this isn’t as good as proving that there are no good rules or other modifications, but I found it fairly compelling on the face of it.

The rules that seemed most likely to work to me were Plan and Sequence. Both rules more or less entail that you should accept bet B if you already accepted bet A, in which case rejecting both bets is impermissible and it looks like the theory is saved. 

Elga tries to show that these don’t work by inviting us to imagine the case where a particular agent called Sally faces the decision problem. Sally has imprecise probabilities, maximises expected utility and has a utility function that is linear in dollars.

Elga argues that in this scenario it just doesn’t make sense for Sally to accept bet B only if she already accepted bet A - the decision to accept bet B shouldn’t depend on anything that came before. It might do if Sally had some risk averse decision theory, or had a utility function that was concave in dollars - but by assumption, she doesn’t. So Plan and Sequence, which had seemed like the best candidates for rescuing imprecise probabilities, aren’t plausible rules for a rational agent like Sally.

Should Subjective Probabilities be Sharp?

The 2014 paper by Bradley and Steele, Should Subjective Probabilities be Sharp? is, as the name suggests, a response to Elga’s paper. The core of their argument is that the assumptions for rationality implied by Elga’s argument are too strong and that it’s perfectly possible to have rational choice with imprecise probabilities provided that you don’t make these too-strong assumptions.

I’ll highlight two objections and give my view.

Objection 1:

  • Bradley and Steele give the label Retrospective Rationality to the idea that an agent’s sequence of decisions should not be dominated by another sequence the agent could have made. They seem to reject Retrospective Rationality as a constraint on rational decision making because “[it] is useless to an agent who is wondering what to do… [the agent] should be concerned to make the best decision possible at [the time of the decision]”. 
  • My view: I don’t find this a very compelling argument, at least in the current context - it seems to me that the agent should avoid foreseeably violating Retrospective Rationality, and in Elga’s betting scenario the irrationality of the “reject both bets” sequence of decisions seems perfectly foreseeable.

Objection 2:

  • Their second objection is that Elga is wrong to think that your current decision about whether to accept bet B should be unaffected by whether you previously accepted or rejected bet A (they make a similar point regarding the decision to take bet A with vs without the knowledge that you’re about to be offered bet B). 
  • My view: it’s true that, because in Elga’s betting scenario the outcomes of the bets are correlated, knowing whether or not you previously accepted bet A might well change your inclination to accept bet B, e.g. because of risk aversion or a non-linear utility function. But to me it seems right that for an agent whose decision theory doesn’t include these features, it would be irrational to change their inclination to accept bet B based on what came before - and Elga was considering such an agent. So I think I side with Elga here.

Summary and some thoughts

In summary, in Subjective Probabilities should be Sharp, Elga illustrates how imprecise probabilities appear to permit a risk-neutral agent with linear utility to make irrational choices. In addition, Elga argues that there aren’t any ways to rescue things while keeping imprecise probabilities. In Should Subjective Probabilities be Sharp?, Bradley and Steele argue that Elga makes some implausibly strong assumptions about what it takes to be rational. I didn't find these arguments very convincing, although I might well have just failed to appreciate the points they were trying to make.

I think it basically comes down to this: for an agent with decision theory features like Sally’s, i.e. no risk aversion and linear utility, the only way to avoid passing up opportunities like making a risk-free $5 by taking bet A and bet B is if you’re always willing to take one side of any particular bet. The problem with imprecise probabilities is that they permit you to refrain from taking either side, which implies that you’re permitted to decline the risk-free $5.

The fan of imprecise probabilities can wriggle out of this by saying that you should be allowed to do things like taking bet B only if you just took bet A - but I agree with Elga that this just doesn’t make sense for an agent like Sally. I think the reason this might look overly demanding on the face of it is that we’re not like Sally - we’re risk averse and have concave utility. But agents who are risk averse or have concave utility are allowed both to sometimes decline bets and to take risk-free sequences of bets, even according to Elga’s rationality requirements, so I don’t think this intuition pushes against Elga’s rationality requirements.

It feels kind of useful to have read these papers, because

  • I’ve been kind of aware of imprecise probabilities and had a feeling I should think about them, and this has given me a bit of a feel for what they’re about.
  • It makes further reading in this area easier.
  • It’s good to get an idea of what sort of considerations people think about when deciding whether a decision theory is a good one. Similarly to when I dug more into moral philosophy, I now have more of a feeling along the lines of “there’s a lot of room for disagreement about what makes a good decision theory”.
  • Relatedly, it’s good to get a bit of a feeling of “there’s nothing really revolutionary or groundbreaking here and I should to some extent feel free to do what I want”.

Causal vs evidential decision theory

I wrote this last Autumn as a private “blog post” shared only with a few colleagues. I’m posting it publicly now (after mild editing) because I have some vague idea that it can be good to make things like this public. Decision theories are a pretty well-worn topic in EA circles and I'm definitely not adding new insights here. These are just some fairly naive thoughts-out-loud about how CDT and EDT handle various scenarios. If you've already thought a lot about decision theory you probably won't learn anything from this.

The last two weeks of the Decision Theory seminars I’ve been attending have focussed on contrasting causal decision theory (CDT) and evidential decision theory (EDT). This seems to be a pretty active area of discussion in the literature - one of the papers we looked at was published this year, and another is yet to be published.

In terms of the history of the field, it seems that Newcomb’s problem prompted a move towards CDT (e.g. in Lewis 1981). I find that pretty surprising because to me Newcomb’s problem provides quite a bit of motivation for EDT, and without weird scenarios like Newcomb’s problem I think I might have taken something like CDT to be the default, obviously correct theory. But it seems like you didn’t need to worry about having a “causal aspect” to decision theories until Newcomb’s problem and other similar problems brought out a divergence in recommendations from (what became known as) CDT and EDT.

I guess this is a very well-worn area (especially in places like Lesswrong) but anyway I can’t resist giving my fairly naive take even though I’m sure I’m just repeating what others have said. When I first heard about things like Newcomb’s problem a few years ago I think I was a pretty ardent CDTer, whereas nowadays I am much more sympathetic to EDT.

In Newcomb’s problem, it seems pretty clear to me that one-boxing is the best option, because I’d rather have $1,000,000 than $1,000. Seems like a win for EDT.
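Here is a rough sketch of how the two theories score the options in Newcomb's problem. The payoffs are the ones mentioned above; the predictor accuracy of 0.99 and the function names are illustrative assumptions. EDT conditions the prediction on your choice, while CDT holds the probability that the opaque box is full fixed, whatever you choose.

```python
BIG = 1_000_000  # contents of the opaque box if one-boxing was predicted
SMALL = 1_000    # contents of the transparent box


def edt_values(accuracy: float) -> dict[str, float]:
    """Expected dollars when your choice is treated as evidence about the prediction."""
    return {
        "one-box": accuracy * BIG,
        "two-box": accuracy * SMALL + (1 - accuracy) * (BIG + SMALL),
    }


def cdt_values(p_full: float) -> dict[str, float]:
    """Expected dollars when the box contents are held fixed with probability p_full."""
    return {
        "one-box": p_full * BIG,
        "two-box": p_full * (BIG + SMALL) + (1 - p_full) * SMALL,
    }


if __name__ == "__main__":
    print("EDT, accuracy 0.99:", edt_values(0.99))
    for p_full in (0.0, 0.5, 1.0):
        print(f"CDT, P(full)={p_full}:", cdt_values(p_full))
```

Under these assumptions EDT prefers one-boxing for any reasonably accurate predictor, while CDT rates two-boxing exactly $1,000 better no matter what probability it assigns to the opaque box being full.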

Dicing With Death is designed to give CDT problems, and in my opinion it does this very effectively. In Dicing With Death, you have to choose between going to Aleppo or Damascus, and you know that whichever you choose, Death will have predicted your choice and be waiting for you (a very bad outcome for you). Luckily, a merchant offers you a magical coin which you can toss to decide where to go, in which case Death won’t be able to predict where you go, giving you a 50% chance of avoiding Death. The merchant will charge a small fee for this. However, CDT gets into some strange contortions and as a result recommends against paying for the magical coin, even though the outcome if you pay for the magical coin seems clearly better. EDT recommends paying for the coin, another win for EDT.
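The sketch below is one way of reconstructing the two verdicts; it is my own simplified reading, not a faithful rendering of the original paper. The utilities (1 for surviving, 0 for dying) and the fee of 0.01 are illustrative assumptions. For CDT, I assume the agent holds some credence q that Death is waiting in Aleppo and treats that credence as unaffected by what they choose.

```python
SURVIVE = 1.0  # utility of avoiding Death (illustrative assumption)
DIE = 0.0      # utility of meeting Death (illustrative assumption)
FEE = 0.01     # utility cost of the merchant's coin (illustrative assumption)


def edt_values() -> dict[str, float]:
    """Expected utility conditional on each act, with Death a perfect predictor."""
    return {
        # If you pick a city yourself, Death predicted it and is waiting there.
        "pick a city": DIE,
        # If you toss the coin, Death cannot predict the result.
        "toss coin": 0.5 * SURVIVE + 0.5 * DIE - FEE,
    }


def cdt_values(p_death_in_aleppo: float) -> dict[str, float]:
    """Expected utility when Death's location is held fixed with credence q."""
    q = p_death_in_aleppo
    return {
        "go to Aleppo": (1 - q) * SURVIVE + q * DIE,
        "go to Damascus": q * SURVIVE + (1 - q) * DIE,
        "toss coin": 0.5 * SURVIVE + 0.5 * DIE - FEE,
    }


if __name__ == "__main__":
    print("EDT:", edt_values())
    for q in (0.0, 0.3, 0.5, 0.9):
        values = cdt_values(q)
        print(f"CDT, q={q}: best act is {max(values, key=values.get)}")
```

On this reconstruction, whatever q is, one of the two cities has causal expected utility of at least 0.5, which beats the coin's 0.5 minus the fee, so CDT never recommends paying. EDT compares certain death with a roughly 50% chance of survival and pays.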

To me, The Smoking Lesion is a somewhat problematic scenario for EDT. Still, I feel like it’s possible for EDT to do fine here if you think carefully enough. 

You could make the following simple model for what happens in The Smoking Lesion: in year 1, no-one knows why some people get cancer and some don’t. In year 2, it’s discovered that everyone who smokes develops cancer, and furthermore there’s a common cause (a lesion) that causes both of these things. Everyone smokes iff they have the lesion, and everyone gets cancer iff they have the lesion. In year 3, following the publication of these results, some people who have the lesion try not to smoke. Either (i) none of them can avoid smoking because the power of the lesion is too strong; or (ii) some of them do avoid smoking, but (since they still have the lesion) they still develop cancer. In case (i), the findings from year 2 remain valid even after everyone knows about them. In case (ii), the findings from year 2 are no longer valid: they just tell you about how the world would have been if the correlation between smoking and cancer wasn’t known.

The cases where you use the knowledge about the year 2 finding to decide not to smoke are exactly the cases where the year 2 finding doesn’t apply. So there’s no point in using the knowledge about the year 2 finding to not smoke: either your not smoking (through extreme self-control etc) is pointless because you still have the lesion and this is a case where the year 2 finding doesn’t apply, or it’s pointless because you don’t have the lesion.

So it seems to me like the right answer is to smoke if you want to, and I think EDT can recommend this by incorporating the fact that if you choose not to smoke purely because of the year 2 finding, this doesn’t give you any evidence about whether you have the lesion (though this is pretty vague and I wouldn’t be that surprised if making it more precise made me realise it doesn’t work).

In general it seems like these issues arise from treating the agent’s decision making process as being removed from the physical world - a very useful abstraction which causes issues in weird edge cases like the ones considered above.

Are you familiar with MIRI's work on this? One recent iteration is Functional Decision Theory, though it is unclear to me if they made more recent progress since then. 

It took me a long time to come around to it, but I currently buy that FDT is superior to CDT in the twin prisoner's dilemma case, while not falling to evidential blackmail (the way EDT does), as well as being notably superior overall in the stylized situation of "how should an agent relate to a world where other smarter agents can potentially read the agent's source code"

Thanks that's interesting, I've heard of it but I haven't looked into it.

Some initial thoughts on "Are We Living At The Hinge Of History"?

In the below I give a very rough summary of Will MacAskill’s article Are We Living At The Hinge Of History? and give some very preliminary thoughts on the article and some of the questions it raises.

I definitely don’t think that what I’m writing here is particularly original or insightful: I’ve thought about this for no more than a few days, any points I make are probably repeating points other people have already made somewhere, and/or are misguided, etc. This seems like an incredibly deep topic which I feel like I’ve barely scratched the surface of. Also, this is not a focussed piece of writing trying to make a particular point, it’s just a collection of thoughts on a certain topic.

(If you want to just see what I think, skip down to "Some thoughts on the issues discussed in the article")

A summary of the article

(note that the article is an updated version of the original EA Forum post Are we living at the most influential time in history?)

Definition for the Hinge of History (HH)

The Hinge of History claim (HH): we are among the most influential people ever (past or future). Influentialness is, roughly, how much good a particular person at a particular time can do through direct expenditure of resources (rather than investment)

Two prominent longtermist EA views imply HH

Two worldviews prominent in longtermist EA imply that HH is true:

  • Time of Perils view: we live at a time of unusually high extinction risk, and we can do an unusual amount to reduce this risk
  • Value Lock-In view: we’ll soon invent a technology that allows present-day agents to assert their values indefinitely into the future (in the Bostrom-Yudkowsky version of this view, the technology is AI)

Arguments against HH

The base rates argument

Claim: our prior should be that we’re as likely as anyone else, past or future, to be the most influential person ever (Bostrom’s Self-Sampling Assumption (SSA)). Under this prior, it’s astronomically unlikely that any particular person is the most influential person ever.

Then the question is how much we should update from this prior:

  • The standard of evidence (Bayes factor) required to favour HH is incredibly high. E.g. we need a Bayes factor of ~10^7 to move from a 1 in 100 million credence to a 1 in 10 credence. For comparison, a p=0.05 result from a randomised controlled trial gives a Bayes factor of 3 under certain reasonable assumptions.
  • The arguments for Time of Perils or Value Lock-In might be somewhat convincing; but hard to see how they could be convincing enough
  • E.g. our track record of understanding the importance of historical events is very poor
  • When considering how much to update from the prior, we should be aware that there are biases that will tend to make us think HH is more likely than it really is
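The Bayes factor arithmetic in the first bullet can be checked with a short sketch. It uses the standard odds form of Bayes' rule (posterior odds = Bayes factor × prior odds); the function names are my own.

```python
def odds(p: float) -> float:
    """Convert a probability into odds in favour."""
    return p / (1 - p)


def bayes_factor_needed(prior: float, posterior: float) -> float:
    """Bayes factor that moves a credence from `prior` to `posterior`."""
    return odds(posterior) / odds(prior)


def update(prior: float, bayes_factor: float) -> float:
    """Posterior probability after applying a Bayes factor to a prior."""
    posterior_odds = odds(prior) * bayes_factor
    return posterior_odds / (1 + posterior_odds)


if __name__ == "__main__":
    prior = 1e-8  # 1 in 100 million
    print(f"Bayes factor needed to reach 1 in 10: {bayes_factor_needed(prior, 0.1):.3g}")
    print(f"Posterior after a Bayes factor of 3: {update(prior, 3):.3g}")
```

The required factor comes out slightly above 10^7, while a single piece of evidence with a Bayes factor of 3 leaves the credence at roughly 3 in 100 million.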

Counterargument 1: we only need to be at an enormously influential time, not the most influential, and the implications are ~the same either way

  • Counter 1 to counterargument 1: the Bostrom-Yudkowsky view says we’re at the most influential time ever, so you should reject the Bostrom-Yudkowsky view if you’re abandoning the idea that we’re at the most influential time ever. So there is a material difference between “enormously influential time” and “most influential time”.
  • Counter 2 to counterargument 1: if we’re not at the most influential time, presumably we should transfer our resources forward to the most influential time, so the difference between “enormously influential time” and “most influential time” is highly action-relevant.

Counterargument 2: the action-relevant thing is the influentialness of now compared to any time we can pass resources on to

  • Again the Bostrom-Yudkowsky view is in conflict with this
  • But MacAskill concedes that it does seem right that this is the action-relevant thing. So e.g. we could assume we can only transfer resources 1000 years into the future and define Restricted-HH: we are among the most influential people out of the people who will live over the next 1000 years

The inductive argument

  • Claim: The influentialness of comparable people has been increasing over time, and we should expect this to continue, so the influentialness of future people who we can pass resources onto will be greater
  • Evidence: if we consider the state of knowledge and ethics in 1600 vs today, or in 1920/1970 vs today, it seems clear that we have more knowledge and better ethics now than we did in 1600 or in 1920/1970
  • And seems clear that there are huge gaps in our knowledge today (so doesn’t seem that we should expect this trend to break)

Arguments for HH

Single planet

Argument 1: we’re living on a single planet, implying greater influentialness

  • Implies particular vulnerabilities e.g. asteroid strikes
  • Implies individual people have an unusually large fraction of total resources
  • Implies instant global communication


Counterarguments:

  • Asteroids are not a big risk
  • For other prominent risks like AI or totalitarianism, being on multiple planets doesn’t seem to help
  • We might well have quite a long future period on earth (1000s or 10,000s of years), which makes being on earth now less special
    • And in the early stages of space settlement the picture isn’t necessarily that relevantly different to the single planet one

Rapid growth

Argument 2: we’re now in a period of unusually fast economic and tech progress, implying greater influentialness. We can’t maintain the present-day growth rate indefinitely.

MacAskill seems sympathetic to the argument, but says it implies not that today is the most important time, but that the most important time might be some time in the next few thousand years.

  • Also, maybe longtermist altruists are less influential during periods of fast economic growth because rapid change makes it harder to plan reliably
  • And comparing economic power across long timescales is difficult

Other arguments

A few other arguments for HH are briefly touched on in a footnote: that existential risk / value lock-in lowers the number of future people in the reference class for the influentialness prior; that we might choose other priors that are more favourable for HH; and that earlier people can causally affect more future people.

Some quick meta-level thoughts on the article

  • I wish it had a detailed discussion about choosing a prior for influentialness, which I think is really important.
  • There’s a comment that the article ignores the fact that the annual risk of extinction or lock-in in the future has implications for present-day influentialness because in Trammell’s model this is incorporated into the pure rate of time preference. I find that pretty weird. Trammell’s model is barely referenced elsewhere in the paper so I don’t really see why we should neglect to discuss something just because it happens to be interpreted in a certain way within his model. Maybe I missed the point here.
  • I think it’s a shame that MacAskill doesn’t really give numbers in the article for his prior and posterior, either on HH or restricted-HH (this EA Forum comment thread by Greg Lewis is relevant).

Some thoughts on the issues discussed in the article

Two main points from the article

It kind of feels like there are two somewhat independent things that are most interesting from the article:

  • 1. The claim: we should reject the Time of Perils view, and the Bostrom-Yudkowsky view, because in both cases the implication for our current influentialness is implausible
  • 2. The question: what do high level / relatively abstract arguments tell us about whether we can do the most good by expending resources now or by passing resources on to future generations?

Avoiding rejecting the Time of Perils and Bostrom-Yudkowsky views

I think there are a few ways we can go to avoid rejecting the Time of Perils and Bostrom-Yudkowsky views:

  • We can find the evidence in favour of them strong enough to overwhelm the SSA prior through conventional Bayesian updating
  • We can find the evidence in favour of them weaker than in the previous case, but still strong enough that we end up giving them significant credence in the face of the SSA prior, through some more forgiving method than Bayesian updating
  • We can use a different prior, or claim that we should be uncertain between different priors
  • Or we can just turn the argument (back?) around, and say that the SSA prior is implausible because it implies such a low probability for the Time of Perils and Bostrom-Yudkowsky views. Toby Ord seems to say something like this in the comments to the EA Forum post (see point 3).

A nearby alternative is to modify the Time of Perils and Bostrom-Yudkowsky views a bit so that they don’t imply we’re among the most influential people ever. E.g. for the Bostrom-Yudkowsky view we could make the value lock-in a bit “softer” by saying that for some reason, not necessarily known/stated, the lock-in would probably end after some moderate (on cosmological scales) length of time. I’d guess that many people might find a modified view more plausible even independently of the influentialness implications.

I’m not really sure what I think here, but I feel pretty sympathetic to the idea that we should be uncertain about the prior and that this maybe lends itself to having not too strong a prior against the Time of Perils and Bostrom-Yudkowsky views.

On the question of whether to expend resources now or later

The arguments MacAskill discusses suggest that the relevant time frame is the next few thousand years (because the next few thousand years seem (in expectation) to have especially high influentialness and because it might be effectively impossible to pass our resources further into the future).

It seems like the pivotal importance of priors on influentialness (or similar) then evaporates: it no longer seems that implausible on the SSA prior that now is a good time to expend resources rather than save. E.g. say there’ll be a 20 year period in the next 1000 years where we want to expend philanthropic resources rather than save them to pass on to future generations. Then a reasonable prior might be that we have a 20/1000 = 1 in 50 chance of being in that period. That’s a useful reference point and is enough to make us skeptical about arguments that we are in such a period, but it doesn’t seem overwhelming. In fact, we’d probably want to spend at least some resources now even purely based on this prior.
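To put the "not overwhelming" point in numbers, here is a small sketch comparing the evidence needed under this restricted prior with the evidence needed under the 1 in 100 million prior from the base rates argument. The 50% target credence is an arbitrary choice of mine for illustration, and the function names are hypothetical.

```python
def uniform_prior(window_years: float, horizon_years: float) -> float:
    """Prior of being inside a window, if it is equally likely to fall anywhere in the horizon."""
    return window_years / horizon_years


def bayes_factor_to_reach(prior: float, target: float) -> float:
    """Bayes factor needed to move a credence from `prior` up to `target`."""
    return (target / (1 - target)) / (prior / (1 - prior))


if __name__ == "__main__":
    restricted = uniform_prior(20, 1000)  # the 1 in 50 prior from the text
    print(f"Restricted prior: {restricted}")
    print(f"Bayes factor to reach 50%: {bayes_factor_to_reach(restricted, 0.5):.3g}")
    # For contrast: the 1 in 100 million prior from the base rates argument.
    print(f"Same target from 1e-8: {bayes_factor_to_reach(1e-8, 0.5):.3g}")
```

Moving from 1 in 50 to an even chance takes a Bayes factor of about 49, which is a demanding but ordinary amount of evidence, compared with a factor of around 10^8 from the unrestricted prior.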

In particular, it seems like some kind of detailed analysis is needed, maybe along the lines of Trammell’s model or at least using that model as a starting point. I think many of the arguments in MacAskill’s article should be part of that detailed analysis, but, to stress the point, they don’t seem decisive to me.

This comment by Carl Shulman on the related EA Forum post and its replies has some stuff on this.

The importance of the idea of moral progress

In the article, the Inductive Argument is supported by the idea of moral progress: MacAskill cites the apparent progress in our moral values over the past 400 years as evidence for the idea that we should expect future generations to have better moral values than we do. Obviously, whether we should expect moral progress in the future is a really complex question, but I’m at least sympathetic to the idea that there isn’t really moral progress, just moral fashions (so societies closer in time to ours seem to have better moral values just because they tend to think more like us).

Of course, if we don’t expect moral progress, maybe it’s not so surprising that we have very high influentialness: if past and future actors don’t share our values, it seems very plausible on the face of it that we’re better off expending our resources now than passing them off to future generations in the hope they’ll carry out our wishes. So maybe MacAskill’s argument about influentialness should update us away from the idea of moral progress?

But if we’re steadfast in our belief in moral progress, maybe it’s not so surprising that we have high influentialness because we find ourselves in a world where we are among the very few with a longtermist worldview, which won’t be the case in the future as longtermism becomes a more popular view. (I think Carl Shulman might say something like this in the comments to the original EA Forum post)

My overall take

  • I think “how plausible is this stuff under an SSA prior” is a useful perspective.
  • Still, thinking about this hasn’t caused me to completely dismiss the Time of Perils View or the Bostrom-Yudkowsky view (I probably already had some kind of strong implausibility prior on those views).
  • The arguments in the article are useful for thinking about how much (e.g.) the EA longtermist community should be spending rather than saving now, but a much more detailed analysis seems necessary to come to a firm view on this.

A quote to finish

I like the way the article ends, providing some motivation for the Inductive Argument in a way I find appealing on a gut level:

Just as our powers to grow crops, to transmit information, to discover the laws of nature, and to explore the cosmos have all increased over time, so will our power to make the world better — our influentialness. And given how much there is still to understand, we should believe, and hope, that our descendents look back at us as we look back at those in the medieval era, marvelling at how we could have got it all so wrong.

"Effective Altruism out of self interest"

I wrote this last Autumn as a private “blog post” shared only with a few colleagues. I’m posting it publicly now (after mild editing) because I have some vague idea that it can be good to make things like this public.

I recently finished listening to Kevin Simler and Robin Hanson’s excellent The Elephant in the Brain. Although I’d probably been exposed to the main ideas before, it got me thinking more about people’s hidden motivations for doing things.

In particular, I’ve been thinking a bit about the motives (hidden or otherwise) for being an Effective Altruist.

It would probably feel really great to save someone’s life by rescuing them from a burning building, or to rescue a drowning child as in Peter Singer’s famous drowning child argument, and so you might think that the feeling of saving a life is reward enough. I do think it would feel really great to pull someone from a burning building or to save a drowning child - but does it feel as great to save a life by giving $4500 to AMF? Not to me.

It’s not too hard to explain why saving someone from a burning building would feel better - you get to experience the gratitude from the person, their loved ones and their friends, for example. Simler and Hanson give an additional reason, or maybe the underlying reason, which I find quite compelling: when you perform a charitable act, you experience a benefit by showing others that you’re the kind of person who will look out for them, making people think that you’d make a good ally (friend, romantic partner, and so on). To be clear, this is a hidden, subconscious motive - according to the theory, you will not be consciously aware that you have this motive.

What explains Effective Altruism, then? Firstly I should say that I don’t think Simler and Hanson would necessarily argue that “true altruism” doesn’t exist - I think they’d say that people are complicated, and you can rarely use a single motive (hidden or not) to explain the behaviour of a diverse group of individuals. So true altruism may well be part of the explanation, even on their view as I understand it. Still, presumably true altruism isn’t the only motive even for really committed Effective Altruists.

One thing that seems true about our selfish, hidden motives is that they only work as long as they can remain hidden. So maybe, in the case of charitable behaviour, it’s possible to alert everyone to the selfish hidden motive: “if you’re donating purely because you want to help others, why don’t you donate to the Against Malaria Foundation, and do much more good than you do currently by donating to [some famous less effective charity]?” When everyone knows that there’s a basically solid argument for only donating to effective charities if you want to benefit others, when people donate to ineffective charities it’ll transparently be due to selfish motives.

Thinking along these lines, joining the Effective Altruism movement can be seen as a way to “get in at the ground floor”: if the movement is eventually successful in changing the status quo, you will get brownie points for having been right all along, and the Effective Altruist area you’ve built a career in will get a large prestige boost when everyone agrees that it is indeed effectively altruistic.

And of course many Effective Altruists do want and expect the movement to grow. For example, the Global Priorities Institute’s mission is (or at least was officially in 2017) to make Effective Altruist ideas mainstream within academia, and Open Philanthropy says it wants to grow the Effective Altruism community.

One fairly obvious (and hardly surprising) prediction you would make from this is that if Effective Altruism doesn’t look like it will grow further (either through community growth or through wider adoption of Effective Altruist ideas), you would expect Effective Altruists to feel significantly less motivated. 

This in turn suggests that spreading Effective Altruist ideas might be important purely for maintaining motivation for people already part of the Effective Altruist community. This sounds pretty obvious, but I don’t really hear people talking about it. 

Maybe this is a neglected source of interventions. This would make sense given the nature of the hidden motives Simler and Hanson describe - a key feature of these hidden motives is that we don’t like to admit that we have them, which is hard to avoid if we want to use them to justify interventions.

In any case, I don’t think that the existence of this motive for being part of the Effective Altruism movement is a particularly bad thing. We are all human, after all. If Effective Altruist ideas are eventually adopted as common sense partly thanks to the Effective Altruism movement, that seems like a pretty big win to me, regardless of what might have motivated individuals within the movement.

It would also strike me as a pretty Pinker-esque story of quasi-inevitable progress: the claim is that these (true) Effective Altruist beliefs will propagate through society because people like being proved right. Maybe I’m naive, but in this particular case it seems plausible to me.

Thinking along these lines, joining the Effective Altruism movement can be seen as a way to “get in at the ground floor”: if the movement is eventually successful in changing the status quo, you will get brownie points for having been right all along, and the Effective Altruist area you’ve built a career in will get a large prestige boost when everyone agrees that it is indeed effectively altruistic.

Joining EA seems like a very suboptimal way to get brownie points from society at large and even from groups which EA represents the best (students/graduates of elite colleges). Isn't getting into social justice a better investment? What are the subgroups you think EAs try hard to impress?

I guess I'm saying that getting into social justice is more like "instant gratification", and joining EA is more like "playing the long game" / "taking relative pain now for a huge payoff later".

Also / alternatively, maybe getting into social justice is impressing one group of people but making another group of people massively dislike you (and making a lot of people shrug their shoulders), whereas when the correctness of EA is known to all, having got in early will lead to brownie points from everyone.

So maybe the subgroup is "most people at some future time" or something?

(hopefully it's clear, but I'm ~trying to argue from the point of view of the post; I think this is fun to think about but I'm not sure how much I really believe it)

When everyone knows that there’s a basically solid argument for only donating to effective charities if you want to benefit others, when people donate to ineffective charities it’ll transparently be due to selfish motives.

I'm not sure that's necessarily true. People may have motives for donating to ineffective charities that are better characterised as moral but not welfare-maximising (special obligations, expressing a virtue, etc).

Also, if everyone knows that there's a solid argument for only donating to effective charities, then it seems that one would suffer reputationally for donating to ineffective charities. That may, in a sense, rather provide people with a selfish motive to donate to effective charities, meaning that we might expect donations to ineffective charities to be due to other motives.

I also wanted to share a comment on this from Max Daniel (also from last Autumn) that I found very interesting.

But many EAs already have lots of close personal relationships with other EAs, and so they can already get social status by acting in ways approved by those peers. I'm not sure it helps if the number of distant strangers also liking these ideas grows.

I actually think that, if anything, 'hidden motives' on balance cause EAs to _under_value growth: It mostly won't feel that valuable because it has little effect on your day-to-day life, and it even threatens your status by recruiting competitors.

This is particularly true for proposed growth trajectories that would change the social dynamics of the movement. Most EAs enjoy abstract, intellectual discussions with other people who are smart and are politically liberal, so any proposal that would dilute the 'quality' of the movement or recruit a lot of conservatives is harmful for the enjoyment most current EAs derive from community interactions. (There may also be impartial reasons against such growth trajectories of course.)

My reaction to this:

  • Actually I think what distant strangers think can matter a lot to someone, if it corresponds to what they do being highly prestigious. The person experiences that directly through friends/family/random people they meet being impressed (etc).
    • I guess it's true that, if most of your friends/people you interact with already think EA is great, the effect is at least a bit weaker (maybe much weaker).
  • I like the point about "diluting the 'quality' of the movement" as being something that potentially biases people against movement growth; it wouldn't have occurred to me.
    • This still seems like a weaker effect to me than the one I described, but I guess this at least depends on how deeply embedded in EA the person we're thinking about is. And of course being deeply embedded in EA correlates strongly with being in a position to influence movement growth.

I recently spent some time trying to work out what I think about AI timelines. I definitely don’t have any particular insight here; I just thought it was a useful exercise for me to go through for various reasons (and I did find it very useful!).

As it came out, I "estimated" a ~5% chance of TAI by 2030 and a ~20% chance of TAI by 2050 (the probabilities for AGI are slightly higher). As you’d expect me to say, these numbers are highly non-robust.

When I showed the plots below to a couple of people, they commented that they were surprised that my AGI probabilities are higher than my TAI ones, and I now think I didn’t think about non-AGI routes to TAI enough when I did this. I’d now probably increase the TAI probabilities a bit and lower the AGI ones a bit compared to what I’m showing here (by “a bit” I mean ~maybe a few percentage points).

I generated these numbers by forming an inside view, an outside view, and making some heuristic adjustments. The inside and outside views are ~weighted averages of various forecasts. My timelines are especially sensitive to how I chose and weighted forecasts for my outside view.
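To make the aggregation step concrete, here is a minimal sketch of taking a weighted average of forecasts and then combining two views. All forecast values and weights below are made-up placeholders, not the ones used for the estimates above; the function name is hypothetical, and the heuristic adjustments are not modelled.

```python
def weighted_average(forecasts, weights):
    """Weighted average of probability forecasts (weights need not sum to 1)."""
    if not forecasts or len(forecasts) != len(weights):
        raise ValueError("need one weight per forecast, and at least one forecast")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(p * w for p, w in zip(forecasts, weights)) / total_weight


# Placeholder P(TAI by some year) from three hypothetical outside-view sources.
outside_view = weighted_average([0.05, 0.15, 0.30], [1, 2, 1])

# Placeholder inside-view forecasts, weighted equally.
inside_view = weighted_average([0.10, 0.25], [1, 1])

# Combine the two views; equal weighting is just one possible choice.
combined = weighted_average([inside_view, outside_view], [1, 1])
print(outside_view, inside_view, combined)
```

Changing the weights in the outside-view line shifts the combined number noticeably, which is one way to see why the result is sensitive to how forecasts are chosen and weighted.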

Here are my timelines in graphical form:

And here they are again alongside some other timelines people have made public:

If you want more detail, there’s a lot more in this Google Doc. I’ll probably write another shortform post with some more thoughts / reflections on the process later.

Some initial thoughts on moral realism vs anti-realism

I wrote this last Summer as a private “blog post” just for me. I’m posting it publicly now (after mild editing) because I have some vague idea that it can be good to make things like this public. These thoughts come from my very naive point of view (as it was in the Summer of 2020; not to suggest my present day point of view is much less naive). In particular if you’ve already read lots of moral philosophy you probably won’t learn anything from reading this. Also, I hope my summaries of other people’s arguments aren’t too inaccurate.

Recently, I’ve been trying to think seriously about what it means to do good. A key part of Effective Altruism is asking ourselves how we can do the most good. Often, considering this question seems to be mostly an empirical task: how many lives will be saved through intervention A, and how many through intervention B? Aside from the empirical questions, though, there are also theoretical ones. One key consideration is what we mean by doing good.

Moral Philosophy

There is a branch of philosophy called moral philosophy which is (partly) concerned with answering this question. 

It’s important to me that I don’t get too drawn into the particular framings that have evolved within the academic discipline of moral philosophy, which are, presumably, partly due to cultural or historical forces, etc. This is because I really want to try to come up with my own view, and I think that (for me) the best process for this involves not taking other people’s views or existing works too seriously, especially while I try to think about these things seriously for the first time. 

Still, it seems useful to get familiar with the major insights and general way of thinking within moral philosophy, because

  • I’ll surely learn a lot of useful stuff
  • I’ll be able to communicate with other people who are familiar with moral philosophy (which probably includes most of the most interesting people to talk to on this topic).

Moral realism

I’ve read a couple of Stanford Encyclopedia of Philosophy articles, and a series of posts by Lukas Gloor arguing for moral anti-realism.

I found the Stanford Encyclopedia of Philosophy articles fairly tough going but still kind of useful. I thought the Gloor posts were great.

The Gloor posts have kind of convinced me to take the moral anti-realist side, which, roughly, denies the existence of definitive moral truths.

While I suppose I might consider my “inside view” to be moral anti-realist at the moment, I can easily see this changing in the future. For example, I imagine that if I read a well-argued case for moral realism, I might well change my mind. 

In fact, prior to reading Gloor’s posts, I probably considered myself to be a moral realist. I think I’d heard arguments, maybe from Will MacAskill, along the lines that i) if moral anti-realism is true, then nothing matters, whereas if realism is true, you should do what the true theory requires you to do, and ii) there’s some chance that realism is true, therefore iii) you should do what the true theory requires you to do.

Gloor discusses an argument like this in one of his posts. He calls belief in moral realism founded on this sort of argument “metaethical fanaticism” (if I’ve understood him correctly).

I’m not sure that I completely understood everything in Gloor’s posts. But the “fanaticism” label does feel appropriate to me. It feels like there’s a close analogy with the kinds of fanaticism that utilitarianism is susceptible to, for example. An example of that might be a Pascal’s wager type argument - if there’s a finite probability that I’ll get infinite utility derived from an eternal life in a Christian heaven, I should do what I can to maximise that probability. 

It feels like something has gone wrong here (although admittedly it’s not clear what), and this Pascal’s wager argument doesn’t feel at all like a strong argument for acting as if there’s a Christian heaven. Likewise, the “moral realist wager” doesn’t feel like a strong argument for acting as if moral realism is true, in my current view.

Moral anti-realism

Gloor also argues that we don’t lose anything worth having by being moral anti-realists, at least if you’re his brand of moral anti-realist. I think he calls the view he favours “pluralistic moral reductionism”. 

On his view, you take any moral view (or maybe combination of views) you like. These can (and maybe for some people, “should”) be grounded in our moral intuitions, and maybe use notions of simplicity of structure etc, just as a moral realist might ground their guess(?) at the true moral theory in similar principles. Your moral view is then your own “personal philosophy”, which you choose to live by.

One unfortunate consequence of this view is that you don’t really have any grounds to argue with someone else who happens to have a different view. Their view is only “wrong” in the sense that it doesn’t agree with yours; there’s no objective truth here.

From this perspective, it would arguably be nicer if everyone believed that there was a true moral view that we should strive to follow (even if we don’t know what it is). Especially if you also believe that we could make progress towards that true moral view.

I’m not sure how big this effect is, but it feels like more than nothing. So maybe I don’t quite agree that we don’t lose anything worth having by being moral anti-realists.

In any case, the fact that we might wish that moral realism is true doesn’t (presumably) have any bearing on whether or not it is true.


I already mentioned that reading Gloor’s posts has caused me to favour moral anti-realism. Another effect, I think, is that I am more agnostic about the correct moral theory. Some form of utilitarianism, or at least consequentialism, seems far more plausible to me as the moral realist “one true theory” than a deontological theory or virtue ethics theory. Whereas if moral anti-realism is correct, I might be more open to non-consequentialist theories. (I’m not sure whether this new belief would stand up to a decent period of reflection, though - maybe I’d be just as much of a convinced moral anti-realist consequentialist after some reflection).

A moral philosophy free-for-all

I wrote this last Summer as a private “blog post” just for me. I’m posting it publicly now (after mild editing) because I have some vague idea that it can be good to make things like this public. These rambling thoughts come from my very naive point of view (as it was in the Summer of 2020; not to suggest my present day point of view is much less naive). In particular if you’ve already read lots of moral philosophy you probably won’t learn anything from reading this.

The free-for-all

Generally, reading various moral philosophy writings has probably made me (even) more comfortable trusting my own intuitions / reasoning regarding what “morality” is and what the “correct moral theory” is.

I think that, when you start engaging with moral philosophy, there’s a bit of a feeling that when you’re trying to reason about things like what’s right and wrong, which moral theory is superior to the other, etc, there are some concrete rules you need to follow, and (relatedly) certain words or phrases have a solid, technical definition that everyone with sufficient knowledge knows and agrees on. The “certain words or phrases” I have in mind here are things like “morally right”, “blameworthy”, “ought”, “value”, “acted wrongly”, etc.

To me right now, the situation seems a bit more like the following: moral philosophers (including knowledgeable amateurs, etc) have in mind definitions for certain words, but these definitions may be more or less precise, might change over time, and differ from person to person. And in making a “moral philosophy” argument (say, writing down an argument for a certain moral theory), the philosopher can use flexibility of interpretation as a tool to make their argument appear more forceful than it really is. Or, the philosopher’s argument might imply that certain things are self-evidently true, and the reader might be (maybe unconsciously) fooled into thinking that this is the case, when in fact it isn’t.

It seems to me now that genuinely self-evident truths are in very short supply in moral philosophy. And, now that I think this is the case, I feel like I have much more licence to make up my own mind about things. That feels quite liberating.

But it does also feel potentially dangerous. Of course, I don’t think it’s dangerous that *I* have freedom to decide what “doing good” means to me. But I might find it dangerous that others have that freedom. People can consider committing genocide to be “doing what is right” and it would be nice to have a stronger argument against this than “this conflicts with my personal definition of what good is”. And, of course, others might well think it’s dangerous that I have the freedom to decide what doing good means.

What does morality even mean?

Now that we’re in this free-for-all, even defining morality seems problematic.

I suppose I can make some observations about myself, like

  • When I see injustice in the world, I feel a strong urge to do something about it
  • When I see others suffering, I want to relieve that suffering
  • I have a strong intuition that it’s conscious experience that ultimately matters - “what you don’t know can’t hurt you” is, I think, literally true
    • And some conscious experiences are clearly very bad (and some are clearly very good)
  • And so on

I guess these things are all in the region of “wanting to improve the lives of others”. This sounds a lot like wanting to do what is morally good / morally praiseworthy, and seems at least closely related to morality.

In some ways, whether I label some of my goals and beliefs as being to do with “morality” doesn’t matter - either way, it seems clear that the academic field of moral philosophy is pretty relevant. And presumably when people talk about morality outside of an academic context, they’re at least sometimes talking about roughly the thing I’m thinking of.

Here are some thoughts after reading a book called "The Inner Game of Tennis" by Timothy Gallwey. I think it's quite a famous book and maybe a lot of people know it well already. I consider it to be mainly about how to prevent your system 2/conscious mind/analytical mind from interfering with the performance of your system 1/subconscious mind/intuitive mind. This is explained in the context of tennis, but it seems applicable to many other contexts, as the author himself argues. If that sounds interesting, I recommend checking the book out; it's short and quite readable.

My interest in the book comes mainly from thinking about the best way to go about doing research, at a day-to-day level. Although the arguments of the book seem most directly applicable to learning a physical skill/activity and (to some extent) to performing well at key moments, I still think there are lessons for mental activities performed routinely, i.e. for activities like research.

I think reading the book has generally pushed me a bit more in favour of "trusting my system 1/intuitive mind" while doing research, e.g. trusting that my brain is doing some important processing when I feel inclined to just stare into space and not make any apparent progress towards whatever it is I'm trying to achieve at that moment. This feels pretty important.

I think Owen Cotton-Barratt says some interesting things about trusting his intuition for prioritisation in this interview with Lynette Bye, which feels kind of related.

The book predates Kahneman's Thinking, Fast and Slow, which (I think) popularised the concept of system 1 mind and system 2 mind, by many decades. The book instead refers to "self 1" and "self 2", which seem to have roughly similar meanings, although unfortunately reversed: Gallwey's self 1 and Kahneman's system 2 refer to the conscious/analytical mind, while Gallwey's self 2 and Kahneman's system 1 refer to the subconscious/intuitive mind.

Here are some disorganised notes on bits that seemed worth highlighting (page numbers refer to the 2015 edition published by Pan Books):

  • p13 mastering the mental side of tennis:
    • picture desired outcomes as clearly as possible
    • allow self 2 to perform and learn from successes and failures
    • learn to see non-judgementally: see what is happening rather than (just) seeing how well or badly it's happening
    • all subsidiary to the master skill: relaxed concentration
  • p38 "Remember that you are not your tennis game. You are not your body. Trust the body to learn and to play, as you would trust another person to do a job, and in a short time it will perform beyond your expectations. Let the flower grow."
  • p41 communicating with self 2
    • Gallwey exhorts the reader to trust their self 2 (system 1 / intuitive mind). But how can we be sure that self 2 will be optimising for the thing "we" (self 1) think is important? Gallwey gives 3 ways to convey to self 2 what the goal is, in the context of tennis:
      • Asking for results: visualise the exact path of the ball. Hold that image in your mind for several seconds
      • Asking for form: observe some particular aspect of your form (e.g. the flatness of your racket while it moves through the ball). Don't make an effort to make the change. Just visualise the change you want
      • Asking for qualities: imagine you are playing the role of a top tennis player on the court for a film
        • There are particular benefits of playing the role of someone very different to you
    • I'm not sure how to turn this into policies for doing research well. Things that seem interesting to explore: visualising the output you want at the start of the day; reflecting each day on how what you did links to your ultimate goals; picturing yourself as playing the role of a researcher you admire.
  • p80 on the "ego satisfaction" from a self-1-controlled success
    • Gallwey talks a lot about the ego satisfaction from self-1-controlled success. 
    • In the context of research, this doesn't ring true in my experience (maybe it's obviously true for tennis or similar activities for people who have experience there, I don't know).
  • p82 "Fighting the mind does not work. What works best is learning to focus it"
    • Gallwey talks about focussing on the seams of the ball and other techniques to focus the (self 1) mind on something kind of irrelevant while playing tennis so that the body and self 2 can perform without interference.
  • p87 on what focus is: "Focus is not achieved by staring hard at something. It is not trying to force focus, nor does it mean thinking hard about something. Natural focus occurs when the mind is interested. When this occurs, the mind is drawn irresistibly toward the object (or subject) of interest. It is effortless and relaxed, not tense and overly controlled."
    • Re research, this seems like good advice for tackling a difficult problem or making progress on some task. One related thing is that I find it much easier to "effortlessly focus" on what I think is important if I'm free of distractions.
  • p127 On managing stress. Pressures come at us from all corners: demands from partners, bosses, coaches, society, etc. These external demands can end up being internalised by self 1 and feeling as if they're things you really want, but this is an illusion.
    • (kind of reminds me of the message from another book I liked called Essentialism)