
Here's my rejected FTX proposal (with Abel Brodeur) to solve the replication crisis by hiring full-time replicators. (I left out the budget details.) [Added some edits in brackets for more context.]

Please describe your project in under 100 words.

We will actually solve the replication crisis in social science by hiring a “red team” of quantitative researchers to systematically replicate new research. Currently, there are few penalties for academics and journals that publish unreliable research, because few replications are attempted. We will fundamentally change academic incentives by making researchers know that their work will be scrutinized, which will motivate them to improve research design, or else face a loss of reputation. By fixing scientific institutions now, we can reap the compounding benefits of reliable knowledge over the long-term future.

If the project has a website, what’s the URL?

https://i4replication.org/

Please describe what you are doing very concretely—not just goals and long-term vision, but specifically what you are doing in the next few months.

Currently, the Institute for Replication is using volunteers to systematically reproduce and replicate new studies from leading journals in economics and political science. With funding from FTX, we can hire a Project Scientist (Michael Wiebe), post-docs, and research assistants to massively scale up reproductions and replications. [Definitions: "reproduce" = being able to run the code and obtain the results that are in the paper; "replicate" = re-analyzing the paper using different methods and/or different data. Note that most social science papers use observational data and are not lab experiments.]

We can also launch a cash prize for completed replications, to incentivize even more replications. This can be implemented in several ways; for example, giving a prize of ~$1000 for high-quality replications completed using the Social Science Reproduction Platform, as judged by a panel of experts. [We're open to revising this number, e.g., to $5k; most grad students replicate papers for coursework, so it might not take much to incentivize them to submit.]

What’s the case for your project?

Social science is facing a replication crisis. Researchers produce unreliable findings that often do not replicate, and the root problem is the lack of replications. 

Academics have basically no incentive to perform replications, since they usually do not yield original findings, and are not valued by journals. Since they do not lead to publications, replications do not help academics get tenure, and hence few are attempted. The replications that are done are conducted by volunteers in their spare time, and can even have negative career effects if they upset powerful academics. 

The rarity of replications makes peer review an inadequate form of quality control. Knowing that research won’t be closely scrutinized, journals and referees have little incentive to check for data quality issues, coding errors, or robustness. If a paper with unreliable findings gets published, the journal suffers no loss in reputation, because no one will replicate the paper to expose its flaws. Hence, referees take empirical results at face value, and focus instead on framing the research question and appropriately citing the literature.

Knowing that their work will be neither reproduced nor replicated, most researchers don’t invest time in preparing replication packages, and don’t check for data or coding errors. The result is entire fields with serious reproducibility problems.

We can fix these incentives by investing heavily in reproduction and replication, and making a big push to systematically replicate new research. With a team of full-time replicators and cash prizes for completed replications, researchers will now expect their work to be immediately scrutinized as a regular practice. This scrutiny will put researchers’ reputations on the line: if their findings are not robust, their work will not be cited (or worse, be retracted), ultimately affecting their promotion and tenure outcomes. At the same time, high-quality work will be rewarded. A big push will attract widespread attention to amplify these reputation effects. Hence, researchers will put more effort into better research design and fixing errors before submitting for publication. To avoid a reputation for publishing unreliable findings, journals will improve their peer review standards. The end result is a scientific literature containing reliable knowledge, to help guide our species through the long-term future.

How long have you been working on this project, and how much has been spent on it?

The Institute for Replication was launched in January, with no funding raised yet.

What has been achieved so far?

We are collaborating with a large team of researchers to reproduce and replicate studies in economics and political science. We have already reproduced over 200 studies, and are currently working with about 50 independent researchers to replicate 30 studies. (See here for precise definitions of ‘reproduce’ and ‘replicate’.) We have built a large network of researchers interested in reproductions and replications. Our collaborators include journal editors, data editors, and reproducibility analysts at selected journals. We have already put together many special journal issues dedicated to replications. We have also conducted a survey of editors of leading outlets in economics, finance, and political science to help replicators identify journals interested in publishing replications.

Do you have any reservations about your project? Is there any way it could cause major harm? If so, what are you going to do to prevent that?

We expect failure to look like a null effect: no one pays attention. We would publish negative replications, but researchers and editors would not change their practices, and departments would not change their tenure and promotion decisions. One possible harm is executing the project badly and giving replication a bad reputation. We can prevent this by giving prizes only to top-quality replications, and requiring transparency by allowing the original authors to publicly respond to replications of their work. I4R already has a conflict of interest policy.

Another possible harm is negative replications being taken as evidence of cheating by researchers, as opposed to honest mistakes; this could lead to a backlash against replication. We can prevent this by encouraging a culture of charitable interpretation, with replicators giving authors the benefit of the doubt when discussing problems and errors.

What will it look like if your project has gone poorly / just OK / well at that time?

Poorly: we are unable to hire some of the post-docs; replications are low quality; fewer than 100 replications completed. (We are currently at 30 ongoing replications accepted since mid-January, so we naively expect 100 per year.)

Just OK: we successfully hire 5 post-docs and RAs; replications are high quality, adding important robustness checks (with positive or negative results); 250 replications completed; some media coverage; some replications and retractions published in original journals. 

Well: we successfully hire 10+ postdocs and RAs; high quality replications; 500 replications completed; widespread media coverage; replications published in original journals; negative replications lead to retractions from journals; journals implement new peer review standards; departments account for replications/retractions in tenure/promotion decisions.

 

Comments (19)



Project looks really cool. I appreciate you sharing this. I hope this project continues to grow.

I really want to know what FTX ended up funding since the rejected grants I know of looked really promising to me.

What was the approximate budget? When I read this my first thought was 'did they ask for a super ton of money and get rejected on that basis'?

Around a million.

I am really happy to see someone doing something about the replication crisis. Sorry that you didn't get funded. I know very little about FTX or grantmaking in general and so I can't comment on the nature of your proposal or how to make it better. But now that I see someone doing something about the replication crisis I have done an update on the Tractability of this cause area and I am excited to learn more!

This excitement led to some small actions from my end:

  1. I visited the Institute for Replication website and found it to be very helpful. I really appreciate the effort that went into making the Teaching tab on the website. I will try to make time in the near future (within a month or so) to go through the resources carefully.
  2. I subscribed to the BITSS YouTube Channel and skimmed through a couple of chapters of the open source textbook, Reproducible Data Science.
  3. I looked for material on the replication crisis elsewhere on this forum. I found this panel discussion from EA Global 2016 and... that's about it! Since, IMO, there isn't enough EA material on this cause area, I left a comment on the What posts do you want someone to write? thread in the hope that someone wading through it for ideas will decide to write more about it.

One thing still unclear to me - are there career opportunities here or just volunteer opportunities? In the proposal, you mentioned "reproducibility analysts at selected journals" - I had no idea that was a thing that people did! But it sounds like a very interesting role to me considering the Scale of the problem. How many people do it and is there a high demand for it? What sort of degree does someone need to do it?

All the best with the project! I sincerely hope someone else will fund it and it will be successful.

At this point, 'reproducibility analyst' = undergrad RAs; see this talk by AEA data editor Lars Vilhuber.

Otherwise, the replications are currently done by academics volunteering in their spare time, which is why it would help to have full-time paid replicators.

Looks like a great idea, very glad someone is pursuing the roll-up-your-sleeves method here.

I think the best addition to this that you could make is a business plan—basically, how much would it cost to replicate how many studies, how would you best choose studies for replication to maximize efficiency / impact, how much / how long until you were replicating 1 or 10% of top studies, etc. I'd also personally like to see a different version of "what has been achieved" that didn't lean as much on collaborations / work of collaborators, as I find these basically meaningless.

The budget section (omitted here) has more of these details.

Re: selection, the idea is systematically replicating all new research in top journals, to change researchers' expectations from (a) expecting to have basically no one scrutinize their work to (b) expecting at least some post-publication review. This incentivizes researchers to improve the quality of their work.

Re: collaborators, I4R currently works by asking academics to volunteer to replicate papers.

This would be cool to fund as a bet on success, e.g., to give you/your early stage funders a $10M prize if you "actually solve the replication crisis in social science" (or a much lower amount if you hit your milestones but no transformative change occurs). This would allow larger funders for whom you are less legible to create incentives for others who are more familiar with your work to fund you.

This definitely seems interesting! I'm curious whether you would also be interested in seeing how other, later studies have used any findings that you cannot replicate, and thus get a sense of any "epistemic contagion" in the literature? Or would the studies you try to replicate be too new for that to make sense? (Or do you simply think that's better left to other researchers?)

It at least seems to me that if you had a good sense of "which findings/studies would involve high amounts of epistemic contagion if they do fail to replicate" then that might help with choosing which studies to focus on.

I wrote an EA Forum post describing the concept of epistemic mapping (with pictures) here, but I'll avoid going into detail on that. I just bring it up because one of the reasons that I've thought that epistemic mapping may be valuable is that it could potentially help with understanding research/epistemic contagion: i.e., how flawed datasets, regression analyses, experimental findings, or other inputs might produce inaccurate findings in the broader research literature.

I guess if you found reproducibility problems in a bunch of related papers, that would point to a common cause. In fact, I found a case like this in my dissertation: the entire literature on meritocratic promotion in China is unreliable, and is based on a highly-cited 2005 article.

Michael, I love your work (blog). Other than FTX, have you tried other avenues for funding this?

I've applied to Emergent Ventures and ACX funds for smaller scale versions of this idea (e.g., my writing a replication blog), but didn't get anything. FTX inspired me to think of the maximal scale version.

Ah frustrating! I'm surprised Tyler didn't say yes, given your previous blog posts. 

Random thought - maybe it's worth applying to EAF/LTFF for replicating EA specific papers?

Yeah, I've tried to think of empirical EA-related papers that would be informative to replicate; so far it looks like air pollution might be a good topic. The problem is that many EA-relevant papers are theoretical and hence not amenable to my style of replication.

Would the Institute for Replication incorporate insights/methods from replication markets?

Possibly! Anna Dreber is on the board of both.

Thanks for sharing your proposal Michael. The institute looks great. Finding ways to incentivise replication is something I consider to be really important. 

A couple of questions. I am curious what probability you would place on the Institute significantly increasing acceptances of replications in top journals? More abstractly, I wonder if a dedicated institute could help change social norms in academia around replication. Do you have any thoughts about this?

Lastly, did you receive any feedback from FTX? 

I'm not sure if top journals would publish replications. They seem to get prestige from publishing original research, but maybe if replication was higher status, they would do it. I mainly see the benefit of systematic replication in inducing researchers to improve the quality of their research, so we'd actually see fewer negative replications. (Another issue is that only negative replications are 'interesting'.)

I think changing norms is possible. A lot of journals now have a data editor who ensures 'push-button' reproducibility: the data and code are available, and you can run a script that produces all of the results in the paper. This is a big improvement over 10-15 years ago when code wasn't available, or didn't reproduce results.

I didn't get any feedback from FTX.

Using prediction markets, we could set up markets on whether a paper will be retracted or have a comment published about it (for example, this). If the price is low, replicators could profit by using insider information: by scrutinizing the paper and writing up a comment, you can make the market resolve to 'Yes'.
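
To make the incentive concrete, here is a minimal sketch of the arithmetic. It assumes a simple binary market where each 'Yes' share pays $1 if a comment or retraction is published and $0 otherwise, and it ignores fees and the fact that buying moves the price. The function name, parameters, and all numbers are hypothetical illustrations, not part of any real market.

```python
def replicator_profit(price, shares, effort_cost, p_success=1.0):
    """Expected profit from buying 'Yes' shares and then doing the replication.

    Assumptions (illustrative only): each share pays $1 on 'Yes' and $0 on 'No';
    no fees; buying does not move the price.

    price       -- current price of one 'Yes' share, between 0 and 1
    shares      -- number of 'Yes' shares bought
    effort_cost -- dollar value of the time spent scrutinizing the paper
    p_success   -- replicator's own probability that the comment gets published
    """
    if not 0.0 <= price <= 1.0:
        raise ValueError("price must be between 0 and 1")
    if not 0.0 <= p_success <= 1.0:
        raise ValueError("p_success must be between 0 and 1")
    stake = price * shares                # money paid up front for the shares
    expected_payout = p_success * shares  # $1 per share if the market resolves 'Yes'
    return expected_payout - stake - effort_cost


# Hypothetical example: the market prices a retraction or comment at 5%,
# the replicator buys 1,000 shares and values their time at $500.
profit = replicator_profit(price=0.05, shares=1000, effort_cost=500)
print(f"Expected profit: ${profit:.2f}")
```

The lower the market price, the larger the gap between the stake and the payout, so under these assumptions the papers the market considers safest are the most profitable ones to scrutinize successfully.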
