The Unjournal commissioned two evaluations of "Meaningfully reducing consumption of meat and animal products is an unsolved problem: A meta-analysis" by Seth Ariel Green, Benny Smith, and Maya B Mathur. See our evaluation package here.
My take: the research was ambitious and useful, but it seems to have important limitations, as noted in the critical evaluations; Matthew Jané's evaluation provided constructive and actionable insights and suggestions.
I'd like to encourage follow-up research on this same question, starting with this paper's example and its shared database (demonstrating commendable transparency), taking these suggestions on board, and building something even more comprehensive and rigorous.
Do you agree? I come back to some 'cruxes' below:
- Is meta-analysis even useful in these contexts, with heterogeneous interventions, outcomes, and analytical approaches?
- Would a more rigorous and systematic approach really add value? Should it follow academic meta-analysis standards, or "a distinct vision of what meta-analysis is for, and how to conduct it" (as Seth suggests)?
- Will anyone actually do/fund/reward rigorous continued work?
Original paper: evidence that ~the main approaches to this don't work
The authors discussed this paper in a previous post.
We conclude that no theoretical approach, delivery mechanism, or persuasive message should be considered a well-validated means of reducing MAP [meat and animal products] consumption
The authors characterize this as evidence of "consistently small effects ... upper confidence bounds are quite small" for most categories of intervention.[1]
Unjournal's evaluators: ~this meta-analysis is limited and could be improved[2]
From the Evaluation Manager's summary (Tabaré Capitan)
... The evaluators identified a range of concerns regarding the transparency, design logic, and robustness of the paper’s methods—particularly in relation to its search strategy, outcome selection, and handling of missing data. Their critiques reflect a broader tension within the field: while meta-analysis is often treated as a gold standard for evidence aggregation, it remains highly sensitive to subjective decisions at multiple stages.
Evaluators' substantive critiques
Paraphrasing these -- mostly from E2, Matthew Jané, though many of the critiques were mentioned by both evaluators.
Improper missing data handling: Assigning SMD = 0.01 to non-significant unreported effects introduces systematic bias by ignoring imputation variance
Single outcome selection wastes data: Extracting only one effect per study discards valuable information despite authors having multilevel modeling capacity
Risk-of-bias assessment is inadequate: The informal approach omits critical bias sources like selective reporting and attrition
Missing documentation: "a fully reproducible search strategy, clearly articulated inclusion and exclusion criteria ..., and justification for screening decisions are not comprehensively documented in the manuscript or supplement."
No discussion of attrition bias in RCTs... "concerning given the known non-randomness of attrition in dietary interventions"
... And a critique that we hear often in evaluations of meta-analyses: "The authors have not followed standard methods for systematic reviews..."
Epistemic audit: Here is RoastMyPost's epistemic and factual audit of Jané's evaluation. It gets a B- grade (which seems to be the modal grade with this tool). RMP is largely positive, but offers some constructive criticism (asking for "more explicit discussion of how each identified flaw affects the magnitude and direction of potential bias in the meta-analysis results.")
One author's response
Seth Ariel Green responded here.
Epistemic/factual audit: Here is RoastMyPost's epistemic and factual audit of Seth's response. It gets a C- grade, and it raises some (IMO) useful critiques of the response, along with a few factual disagreements about the cited methodological examples (these should be double-checked). It flags "defensive attribution bias" and emphasizes that "the response treats innovation as self-justifying rather than requiring additional evidence of validity."
Highlighting some of Seth's responses to the substantive critiques:
"Why no systematic search?"
...We were looking at an extremely heterogeneous, gigantic literature — think tens of thousands of papers — where sifting through it by terms was probably going to be both extremely laborious and also to yield a pretty low hit rate on average.
we employed what could be called a ‘prior-reviews-first’ search strategy. Of the 985 papers we screened, a full 73% came from prior reviews. ... we employed a multitude of other search strategies to fill in our dataset, one of which was systematic search.
David Reinstein:
Seth's response to these issues might be characterized as ~"the ivory tower protocol is not practical, you need to make difficult choices if you want to learn anything in these messy but important contexts and avoid 'only looking under the streetlamp' -- so we did what seemed reasonable."
I'm sympathetic to this. The description intuitively seems like a reasonable approach to me. I'm genuinely uncertain as to whether 'following the meta-analysis rules' is the most useful approach for researchers aiming to make practical recommendations. I'm not sure the rules were built for the contexts and purposes we're dealing with.
On the other hand, I think a lack of a systematic protocol limits our potential to build and improve on this work, and to make transparent fair comparisons.
And I would have liked the response to take on the methodological issues raised directly -- yes, there are always tradeoffs, but you can justify your choices explicitly, especially when you are departing from convention.
"Why no formal risk of bias assessment?"
The main way we try to address bias is with strict inclusion criteria, which is a non-standard way to approach this, but in my opinion, a very good one (Simonsohn, Simmons & Nelson (2023) articulates this nicely).
After that baseline level of focusing our analysis on the estimates we thought most credible, we thought it made more sense to focus on the risks of bias that seemed most specific to this literature.
... I hope that our transparent reporting would let someone else replicate our paper and do this kind of analysis if that was of interest to them.
David: Again, this seems reasonable, but it is also a bit of a false dichotomy and merits greater explanation. You can have both strict inclusion criteria and a risk-of-bias assessment, although every step takes time and brings challenges.
"About all that uncertainty"
Matthew Jané raises many issues about ways in which he thinks our analyses could (or in his opinion, should) have been done differently. Now I happen to think our judgment calls on each of the raised questions were reasonable and defensible. Readers are welcome to disagree.
Matthew raises an interesting point about the sheer difficulty in calculating effect sizes and how much guesswork went into it for some papers. In my experience, this is fundamental to doing meta-analysis. I’ve never done one where there wasn’t a lot of uncertainty, for at least some papers, in calculating an SMD.
More broadly, if computing effect sizes or variance differently is of interest, by all means, please conduct the analysis, we’d love to read it!
David: This characterizes Seth's response to a number of the issues: 1. This is challenging, 2. You need to make judgment calls, 3. We are being transparent, and allowing others to follow up.
I agree with this, to a point. But again, I'd like to see them explicitly engage with the issues, the careful and formal treatments, and the specific practical solutions that Matthew provided. And as I get to below – there are some systemic barriers to anyone actually following up on this. [Update 10 Nov 2025: I appreciate Seth's response encouraging future work and inviting inquiries from other researchers, including graduate students.]
Where does this leave us – can meta-analysis be practically useful in heterogeneous domains like this? What are the appropriate standards?
Again from the evaluation manager's synthesis (mostly Tabaré Capitan)
... the authors themselves acknowledge many of these concerns, including the resource constraints that shaped the final design. Across the evaluations and the author response, there is broad agreement on a central point: that a high degree of researcher judgment was involved throughout the study. Again, this may reflect an important feature of synthesis work beyond the evaluated paper—namely, that even quantitative syntheses often rest on assumptions and decisions that are not easily separable from the analysts' own interpretive frameworks. These shared acknowledgements may suggest that the field currently faces limits in its ability to produce findings with the kind of objectivity and replicability expected in other domains of empirical science.
David Reinstein:
... I’m more optimistic than Tabaré about the potential for meta-analysis. I’m deeply convinced that there are large gains from trying to systematically combine evidence across papers, and even (carefully) across approaches and outcomes. Yes, there are deep methodological differences over the best approaches. But I believe that appropriate meta-analysis will yield more reliable understanding than ad-hoc approaches like ‘picking a single best study’ or ‘giving one’s intuitive impressions based on reading’. Meta-analysis could be made more reliable through robustness-checking, estimating a range of bounded estimates under a wide set of reasonable choices, and enabling data and dashboards for multiverse analysis, replication, and extensions.
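To make "a range of bounded estimates under a wide set of reasonable choices" a bit more concrete, here is a minimal, purely illustrative multiverse-style sketch (Python, synthetic numbers, hypothetical analytic choices -- not the paper's data, code, or methods):

```python
# Illustrative multiverse-style robustness sketch; all numbers are synthetic.
import itertools
import numpy as np

smd = np.array([0.05, 0.60, -0.02, 0.15, 0.08, np.nan])  # toy SMDs; one unreported
se = np.array([0.06, 0.10, 0.05, 0.12, 0.07, 0.09])      # toy standard errors

def pool(effects, ses):
    """Inverse-variance (fixed-effect) pooled estimate and its standard error."""
    w = 1.0 / ses**2
    return np.sum(w * effects) / np.sum(w), np.sqrt(1.0 / np.sum(w))

# Two hypothetical analytic choices: how to handle the unreported effect, and
# whether to trim implausibly large effects. Each combination is one "specification".
for missing_rule, trim_large in itertools.product(["impute_0.01", "drop"], [False, True]):
    e, s = smd.copy(), se.copy()
    if missing_rule == "impute_0.01":
        e[np.isnan(e)] = 0.01                   # one defensible-but-debatable rule
    else:
        keep = ~np.isnan(e); e, s = e[keep], s[keep]
    if trim_large:
        keep = np.abs(e) < 0.5; e, s = e[keep], s[keep]
    est, pooled_se = pool(e, s)
    print(f"{missing_rule:>12}, trim={trim_large}: pooled SMD = {est:.3f} (SE {pooled_se:.3f})")
```

A real multiverse over the many judgment calls in the actual dataset would be far larger, but even a small grid like this makes the sensitivity (or stability) of a pooled estimate transparent and easy to re-run.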
I believe a key obstacle to this careful, patient, open work is the current system of incentives and tools offered by academia, and the current system of traditional journal publications as a career outcome and ‘end state’. The author’s response “But at some point, you declare a paper ‘done’ and submit it” exemplifies this challenge. The Unjournal aims to build and facilitate a better system.
Will anyone actually follow up on this? Once the "first paper" is published in an academic journal, can anyone be given a career incentive, or direct compensation, to improve upon it? Naturally, this gets at one of my usual gripes with the traditional academic journal model, a problem that The Unjournal's continuous evaluation tries to solve.
This also depends on... whether the animal welfare and EA community believes that rigorous/academic-style research is useful in this area. And wants to fund and support a program to gradually and continually improve our understanding and evidence on perhaps a small number of crucial questions like this.
I also think it depends on good epistemic norms.
Cross-posted to LessWrong here
- ^
However, they say "the largest effect size, ... choice architecture, comes from too few studies to say anything meaningful about the approach in general." So for that case we're dealing with an absence of evidence, i.e., wide posteriors. [Added 10 Nov 2025] Some other parts of the authors' discussion also suggest they're making a case for an absence of evidence rather than evidence of a 'tightly bounded near-zero impact'.
- ^
10 Nov 2025: I adjusted this header in response to Geoffrey's comment that I had characterized this somewhat too harshly/negatively, which I accept.

Really enjoyed this. Not much public debate in this space as far as I can see. To two of your cruxes:
I've sometimes wondered if it'd be worth funding a "mega study" like Milkman et al. (2021). They tested 54 different interventions to boost exercise among 61,000 gym members. Something similar for meat reduction could allow for some clean apples-to-apples comparisons.
I've seen the number $2.6 million floating around for how much that megastudy cost. Granted, that's probably on top of convincing the mega-team of researchers to work on the project, which might only happen through the prestige of an academic lab. But it's also not an astronomical cost. And there'd still be some learning value from a smaller set of interventions and a smaller sample.
This might be a better use of resources than striving for the "ideal" meta-analysis, since that sounds expensive too.
@geoffrey We'd love to run a megastudy! My lab put in a grant proposal with collaborators at a different Stanford lab to do just that, but we ultimately went a different direction. Today, however, I generally believe that we don't even know what is the right question to be asking -- though if I had to choose one it would be, what ballot initiative does the most for animal welfare while also getting the highest levels of public support, e.g. is there some other low-hanging fruit equivalent to "cage free" like "no mutilation" that would be equally popular. But in general I think we're back to the drawing board in terms of figuring out what study we want to run and getting a version of it off the ground, before we start thinking about scaling up to tens of thousands of people.
@david_reinstein, I suppose any press is good press so I should be happy that you are continuing to mull on the lessons of our paper 😃 but I am disappointed to see that the core point of my responses is not getting through. I'll frame it explicitly here: when we did one check and not another, or one search protocol and not another, the reason, every single time, is opportunity costs. When I say "we thought it made more sense to focus on the risks of bias that seemed most specific to this literature," I am using the word 'focus' deliberately, in the sense of "focus means saying no," i.e. 'we are always triaging.' At every juncture, navigating the explore/exploit dilemma requires judgment calls. You don't have to like that I said no to you, but it's not a false dichotomy, and I do not care for that characterization.
To the second question of whether anyone will do the kind of extension work, I personally see this as a great exercise for grad students. I did all kinds of replication and extension work in grad school. A deep dive into a subset of contact hypothesis literature I did in a political psychology class in 2014, which started with a replication attempt, eventually morphed into The Contact Hypothesis Re-evaluated. If you, a grad student, want to do this kind of project, please be in touch, I'd love to hear from you. (I'd recommend starting by downloading the repo and asking claude code about robustness checks that do and do not require gathering additional data).
That is clearly the case, and I accept that there are tradeoffs. Still, ideally I would have liked to see a more direct response to the substance of the points made by the evaluators, though I understand that there are tradeoffs there as well.
Perhaps 'false dichotomy' was too strong, given the opportunity costs (not an excuse: I got that phrasing from RoastMyPost's take on this). But as I understand it, there are clear rubrics and guidelines for meta-analyses like this. In cases where you choose to depart from standard practice, maybe it's reasonable to give a more detailed and grounded explanation of why you did so. And the evaluators did present very specific arguments for different practices you could have followed, and could still follow in future work. I think judgment calls based on experience get you somewhere, but it would be better to explicitly defend why you made a particular judgment call, and to respond to and consider the analytical points made by the evaluators. And ideally follow up with the checks they suggest, although I understand that it's hard to do this given how busy you are and the nature of academic incentives.
I hope I am being fair here; I'm trying to be even-handed and sympathetic to both sides. Of course, for this exercise to be useful, we have to allow for constructive expert criticism, which I think these evaluations do indeed embody. I appreciate you having responded to these at all. I'd be happy to get others' opinions on whether we've been fair here.
I had previously responded "casting this as 'for graduate students' makes it seem less valuable and prestigious," which I still stand by. But I appreciate that you adjusted your response to note "If a grad student wanted to do this kind of project, please be in touch, I'd love to hear from you," which I think helps a lot.
The point I was making -- perhaps preaching to the choir here:
These extensions, replications, and follow-up steps may be needed to make a large project deeply credible and useful, and to capture a large part of the value. Why not give equal esteem and career rewards for that? The current system of journals tends not to do so (at least not in economics, the field I'm most familiar with). This is one of the things that we hope credible evaluation separated from journal publications can improve upon.
Chiming in here with my outsider impressions on how fair the process seems
@david_reinstein If I were to rank the evaluator reports, evaluation summary, and the EA Forum post by which seemed the most fair, I would have ranked the Forum post last. It wasn't until I clicked through to the evaluation reports that I felt the process wasn't so cutting.
Let me focus on one very specific framing in the Forum post, since it feels representative. One heading includes the phrase "this meta-analysis is not rigorous enough". This has a few connotations that you probably didn't mean. One, this meta-analysis is much worse than others. Two, the claims are questionable. Three, there's a universally correct level of quality that meta-analyses should reach and anything that falls short of that is inadmissible as evidence.
In reality, it seems this meta-analysis is par for the course in terms of quality. And it was probably more difficult to do, given the heterogeneity in the literature. And the central claim of the meta-analysis doesn't seem like something either evaluator disputed (though one evaluator was hesitant).
Again, I know that's not what you meant and there are many caveats throughout the post. But it's one of a few editorial choices that make the Forum post seem much more critical than the evaluation reports, which is a bit unusual since the Evaluators are the ones who are actually critiquing the paper.
Finally, one piece of context that felt odd not to mention was the fundamental difficulty of finding an expert in both food consumption and meta-analysis. That limits the ability of any reviewer to make a fair evaluation. This is acknowledged at the bottom of the Evaluation Summary. Elsewhere, I'm not sure where it's said. Without that mentioned, I think it's easy for a casual reader to leave thinking the two Evaluators are the "most correct".
Thanks for the detailed feedback, this seems mostly reasonable. I'll take a look again at some of the framings, and try to adjust. (Below and hopefully later in more detail).
This was my take on how to succinctly depict the evaluators' reports (not my own take), in a way the casual reader would be able to digest. Maybe this was rounding down too much, but not by a lot, I think. Some quotes from Jané's evaluation that I think are representative:
This doesn't seem to reflect 'par for the course' to me, but it depends on what the course is; i.e., what the comparison group is. My own sense/guess is that this is more rigorous and careful than most work in this area of meat consumption interventions (and adjacent), but less rigorous than the meta-analyses the evaluators are used to seeing in their academic contexts and the practices they espouse. But academic meta-analysts will tend to focus on areas where they can find a proliferation of high-quality, more homogeneous research, not necessarily the highest-impact areas.
Note that the evaluators rated this 40th and 25th percentile for methods and 75th and 39th percentile overall.
To be honest, I'm having trouble pinning down what the central claim of the meta-analysis is. Is it a claim that "the main approaches being used to motivate reduced meat consumption don't seem to work", i.e., that we can bound the effects as very small, at best? That's how I'd interpret the reporting of the pooled effect's 95% CI as a standardized mean difference of 0.02 to 0.12. I would say that both evaluators are sort of disputing that claim.
However, the authors hedge this in places, and sometimes it sounds more like they're saying that ~"even the best meta-analysis possible leaves a lot of uncertainty" ... an absence of evidence more than evidence of absence, and this is something the evaluators seem to agree with.
That is/was indeed challenging. Let me try to adjust this post to note that.
My goal for this post was to fairly represent the evaluators' take, to provide insights to people who might want to use this for decision-making and future research, and to raise the question of standards in meta-analysis in EA-related areas. I will keep thinking about whether I missed the mark here. One possible clarification, though: we don't frame the evaluators' role as (only) looking to criticize or find errors in the paper. We ask them to give a fair assessment of it, evaluating its strengths, weaknesses, credibility, and usefulness. These evaluations can also be useful if they give people more confidence in the paper and its conclusions, and thus reason to update more on this for their own decision-making.
This does indeed look interesting, and promising. Some quick (maybe naive) thoughts on that particular example, at a skim.
The "cost of convincing researchers to work on it" is uncertain to me. If it were already a very well-funded, high-quality study in an interesting area that is 'likely to publish well' (apologies), I assume that academics would have some built-in 'publish or perish' incentives from their universities.
Certainly there is some trade-off here: investing intellectual resources and time into a more careful, systematic, and robust meta-analysis of a large body of work of potentially varying quality and great heterogeneity comes at the cost of academics and interested researchers organizing better and more systematic new studies. There might be some middle ground where a central funder requires future studies to follow common protocols and reporting standards to enable better future meta-analysis (perhaps along with outreach to authors of past research to try to systematically dig out missing information).
Seems like there are some key questions here
For what it's worth, I thought David's characterization of the evaluations was totally fair, even a bit toned down. E.g. this is the headline finding of one of them:
David characterizes these as "constructive and actionable insights and suggestions". I would say they are tantamount to asking for a new paper, especially the excluding of small studies, which was core to our design and would require a whole new search, which would take months. To me, it was obvious that I was not going to do that (the paper had already been accepted for publication at that point). The remaining suggestions also implied dozens (hundreds?) of hours of work. Spending weeks satisfying two critics didn't pass a cost-benefit test.[1] It wasn't a close call.
I really need to follow my own advice now and go actually do other projects 😃
I meant "constructive and actionable" in that he explained why the practices used in the paper had potentially important limitations (see here on "assigning an effect size of .01 for n.s. results where effects are incalculable")...
And he suggested a practical response, including a specific statistical package that could be applied to the existing data:
"An option to mitigate this is through multiple imputation, which can be done through the metansue (i.e., meta-analysis of non-significant and unreported effects) package."
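To illustrate the statistical point (a minimal toy sketch with synthetic numbers -- not the metansue algorithm itself and not the paper's data): pinning every unreported non-significant effect at SMD = 0.01 treats a guess as if it had been observed exactly, whereas multiple imputation draws a spread of values consistent with "not significant" and carries that extra uncertainty into the pooled standard error via Rubin's rules.

```python
# Toy comparison: fixed imputation at 0.01 vs. multiple imputation of
# non-significant unreported effects. Synthetic numbers, simple fixed-effect
# pooling, and a deliberately crude uniform draw over the non-significant
# region -- metansue uses a more principled scheme, but the variance point is the same.
import numpy as np

rng = np.random.default_rng(1)
reported = [(0.12, 0.06), (0.05, 0.08), (0.20, 0.10)]  # (SMD, SE) pairs
unreported_se = [0.09, 0.07]                           # studies reporting only "n.s."
z_crit = 1.96

def pool(effects, ses):
    """Inverse-variance (fixed-effect) pooled estimate and its standard error."""
    w = 1.0 / np.asarray(ses) ** 2
    return float(np.sum(w * np.asarray(effects)) / np.sum(w)), float(np.sqrt(1.0 / np.sum(w)))

all_ses = [s for _, s in reported] + unreported_se

# (a) The criticized shortcut: treat every unreported n.s. effect as exactly 0.01.
est_a, se_a = pool([e for e, _ in reported] + [0.01] * len(unreported_se), all_ses)
print(f"fixed 0.01:          SMD = {est_a:.3f} (SE {se_a:.3f})")

# (b) Multiple imputation: draw observed effects from the non-significant region
#     (|z| < 1.96), pool each completed dataset, then combine with Rubin's rules.
M = 500
ests, within_vars = [], []
for _ in range(M):
    imputed = [rng.uniform(-z_crit * s, z_crit * s) for s in unreported_se]
    est_m, se_m = pool([e for e, _ in reported] + imputed, all_ses)
    ests.append(est_m)
    within_vars.append(se_m ** 2)
qbar = np.mean(ests)                 # pooled point estimate across imputations
within = np.mean(within_vars)        # average within-imputation variance
between = np.var(ests, ddof=1)       # between-imputation variance
total_se = np.sqrt(within + (1 + 1 / M) * between)
print(f"multiple imputation: SMD = {qbar:.3f} (SE {total_se:.3f})")
```

The point estimates can land close together, but the second standard error is necessarily larger, because it reflects how little we actually know about the unreported effects -- which bears directly on the distinction between "evidence of small effects" and "indeterminate evidence" discussed below.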
In terms of the cost-benefit test, it depends on which benefit we are considering here. Addressing these concerns might indeed take months and cost hundreds of hours. It's hard to justify this in terms of current academic/career incentives alone, as the paper had already been accepted for publication. If this were directly tied to grants there might be a case, but as it stands I understand that it could be very difficult for you to take this further.
But I wouldn't characterize doing this as simply "satisfying two critics". The critiques themselves might be sound and relevant, and potentially affect the conclusion (at least in differentiating between "we have evidence the effects are small" and "the evidence is indeterminate", which I think is an important difference). And the value of the underlying policy question (~'Should animal welfare advocates be funding existing approaches to reducing MAP consumption?') seems high to me. So I would suggest that the benefit exceeds the cost here on net, even if we might not have a formula for making it worth your while to make these adjustments right now.
I also think there might be value in setting an example and a standard that, particularly for high-value questions like this, we strive for a high level of robustness, following up on a range of potential concerns and critiques, etc. I'd like to see these things as long-run living projects that can be continuously improved and updated (and re-evaluated). The current research reward system doesn't encourage this, which is a gap we are trying to help fill.
David, there are two separate questions here: whether these analyses should be done, and whether I should have done them in response to the evaluations. If you think these analyses are worth doing, by all means, go ahead!
Seth, for what it's worth, I found your hourly estimates (provided in these forum comments but not something I saw in the evaluator response) on how long the extensions would take to be illuminating. Very rough numbers like this meta-analysis taking 1000 hours for you or a robustness check taking dozens / hundreds of hours more to do properly helps contextualize how reasonable the critiques are.
It's easy for me (even now while pursuing research, but especially before when I was merely consuming it) to think these changes would take a few days.
It also gives me insight into the research production process. How long does it take to do a meta-analysis? How much does rigor cost? How much insight does rigor buy? What insight is possible given current studies? Questions like that help me figure out whether a project is worth pursuing and whether it's compatible with career incentives or more of a non-promotable task.
Love talking nitty gritty of meta-analysis 😃
> Too often, research syntheses focus solely on estimating effect sizes, regardless of whether the treatments are realistic, the outcomes are assessed unobtrusively, and the key features of the experiment are presented in a transparent manner. Here we focus on what we term landmark studies, which are studies that are exceptionally well-designed and executed (regardless of what they discover). These studies provide a glimpse of what a meta-analysis would reveal if we could weight studies by quality as well as quantity. [the point being, meta-analysis is not well-suited for weighing by quality.]
A final reflective note: David, I want to encourage you to think about the optics/politics of this exchange from the point of view of prospective Unjournal participants/authors. There are no incentives to participate. I did it because I thought it would be fun and I was wondering if anyone would have ideas or extensions that improved the paper. Instead, I got some rather harsh criticisms implying we should have written a totally different paper. Then I got this essay, which was unexpected/unannounced and used, again, rather harsh language to which I objected. Do you think this exchange looks like an appealing experience to others? I'd say the answer is probably not.
A potential alternative: I took a grad school seminar where we replicated and extended other people's papers. Typically the assignment was to do the robustness checks in R or whatever, and then the author would come in and we'd discuss. It was a great setup. It worked because the grad students actually did the work, which provided an incentive to participate for authors. The co-teachers also pre-selected papers that they thought were reasonably high-quality, and I bet that if they got a student response like Matthew's, they would have counseled them to be much more conciliatory, to remember that participation is voluntary, to think through the risks of making enemies (as I counseled in my original response), etc. I wonder if something like that would work here too. Like, the expectation is that reviewers will computationally reproduce the paper, conduct extensions and robustness checks, ask questions if they have them, work collaboratively with authors, and then publish a review summarizing the exchange. That would be enticing! Instead what I got here was like a second set of peer reviewers, and unusually harsh ones at that, and nobody likes peer review.
It might be the case that meta-analyses aren't good candidates for this kind of work, because the extensions/robustness checks would probably also have taken Matthew and the other responder weeks, e.g. a fine end of semester project for class credit but not a very enticing hobby.
Just a thought.
I appreciate the feedback. I'm definitely aware that we want to make this attractive to authors and others, both to submit their work and to engage with our evaluations. Note that in addition to asking for author submissions, our team nominates and prioritizes high-profile and potentially high-impact work, and contacts authors to get their updates, suggestions, and (later) responses. (We generally only require author permission to do these evaluations from early-career authors at a sensitive point in their career.) We are grateful to you for having responded to these evaluations.
I would disagree with this. We previously had author prizes (financial and reputational) focusing on authors who submitted work for our evaluation, although these prizes are not currently active. I'm keen to revive these prizes when the situation permits (funding and partners).
But there are a range of other incentives (not directly financial) for authors to submit their work, respond to evaluations and engage in other ways. I provide a detailed author FAQ here. This includes getting constructive feedback, signaling your confidence in your paper and openness to criticism, the potential for highly positive evaluations to help your paper's reputation, visibility, unlocking impact and grants, and more. (Our goal is that these evaluations will ultimately become the object of value in and of themselves, replacing "publication in a journal" for research credibility and career rewards. But I admit that's a long path.)
I would not characterize the evaluators' reports in this way. Yes, there was some negative-leaning language, which, as you know, we encourage the evaluators to tone down. But there were a range of suggestions (especially from Jané) which I see as constructive, detailed, and useful, both for this paper and for your future work. And I don't see this as them suggesting "a totally different paper." To a large extent they agreed with the importance of this project, with the data collected, and with many of your approaches. They praised your transparency. They suggested some different methods for transforming and analyzing the data and interpreting the results.
I think it's important to communicate the results of our evaluations to wider audiences, and not only on our own platform. As I mentioned, I tried to fairly characterize your paper, the nature of the evaluations, and your response. I've adjusted my post above in response to some of your points where there was a case to be made that I was using loaded language, etc.
Would you recommend that I share any such posts with both the authors and the evaluators before posting them? It's a genuine question (to you and to anyone else reading these comments) -- I'm not sure of the correct answer.
As to your suggestion at the bottom, I will read and consider it more carefully -- it sounds good.
Aside: I'm still concerned with the connotation of replication, extension, and robustness checking being something that should be relegated to graduate students. This seems to diminish the value and prestige of work that I believe to be of the highest practical value for important decisions in the animal welfare space and beyond.
In the replication/robustness-checking domain, I think what i4replication.org is doing is excellent. They're working with everyone from graduate students to senior professors to do this work, and treating it as a high-value output meriting direct career rewards. I believe they encourage the replicators to be fair (neither excessively conciliatory nor harsh) and to focus on the methodology. We are in contact with i4replication.org and hoping to work with them more closely, with our evaluations and “evaluation games” offering grounded suggestions for robustness and replication checks.
Yes. But zooming back out, I don't know if these EA Forum posts are necessary.
A practice I saw from i4replication (or some other replication lab) is that the editors didn't provide any "value-added" commentary on any given paper. At least, I didn't see this in any of their tweets. They link to the evaluation reports plus a response from the author and then leave it at that.
Once in a while, there will be a retrospective on how the replications are going as a whole. But I think they refrain from commenting on any paper.
If I had to rationalize why they did that, my guess is that replications are already an opt-in thing with lots of downside. And psychologically, editor commentary has a lot more potential for unpleasantness. Peer review tends to be anonymous, so it doesn't feel as personal because the critics are kept secret. But editor commentary isn't secret... it actually feels personal, and editors tend to have more clout.
Basically, I think the bar for an editor commentary post like this should be even higher than the usual process. And the usual evaluation process already allows for author review and response. So I think a "value-added" post like this should pass a higher bar of diplomacy and insight.
Thanks for the thoughts. Note that I'm trying to engage/report here because we're working hard to make our evaluations visible and impactful, and this forum seems like one of the most promising interested audiences. But also eager to hear about other opportunities to promote and get engagement with this evaluation work, particularly in non-EA academic and policy circles.
I generally aim to just summarize and synthesize what the evaluators had written and the authors' response, bringing in what seemed like some specific relevant examples, and using quotes or paraphrases where possible. I generally didn't give these as my own opinions but rather as the author's and the evaluators'. Although I did specifically give 'my take' in a few parts. If I recall my motivation, I was trying to make this a little bit less dry to get a bit more engagement within this forum. But maybe that was a mistake.
And to this I added an opportunity to discuss the potential value of doing and supporting rigorous, ambitious, and 'living/updated' meta-analysis here and in EA-adjacent areas. I think your response was helpful there, as was the author's. I'd like to see others' takes.
Some clarifications:
The i4replication group does put out replication papers/reports in each case, submits these to journals, and reports on the outcome on social media. But IIRC they only 'weigh in' centrally when they find a strong case suggesting systematic issues/retractions.
Note that their replications are not 'opt-in': they aim to replicate every paper coming out in a set of 'top journals'. (And now they are moving towards research focusing on a set of global issues like deforestation, but still not opt-in.)
I'm not sure what works for them would work for us, though. It's a different exercise. I don't see an easy route towards our evaluations getting attention through 'submitting them to journals' (which, naturally, would also be a bit counter to our core mission of moving research output and rewards away from 'journal publication as a static output').
Also: I wouldn't characterize this post as 'editor commentary', and I don't think I have a lot of clout here. Also note that typical peer review is both anonymous and never made public. We're making all our evaluations public, but the evaluators have the option to remain anonymous.
But your point about a higher bar is well taken. I'll keep this under consideration.
Executive summary: The Unjournal’s evaluations of a meta-analysis on reducing meat/animal-product consumption found the project ambitious but methodologically limited; the author argues meta-analysis can still be valuable in this heterogeneous area if future work builds on the shared dataset with more systematic protocols, robustness checks, and clearer bias handling—while noting open cruxes and incentive barriers to actually doing that follow-up (exploratory, cautiously optimistic).
Key points:
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.