Thanks. A quick, non-exhaustive list:
I've stayed at a (non-EA) professional contact's house before when they'd invited me to give a talk and later, very apologetically, realized they didn't have the budget for a hotel. They likely felt obliged to offer; I felt it would be awkward to decline. We were both at pains to be extremely, exceedingly, painstakingly polite given the circumstances and to turn the formality up a notch.
I agree the org should have paid for a hotel; I'm only mentioning this because if baseline formality is a 5, I would expect it to be more normal to kick it up to a 10 under the circumstances. That makes this situation all the more bizarre.
I'm not going to deal with the topic of the post, but there's another reason, which I haven't seen mentioned, not to post under a burner account if it can be avoided, and this post indirectly highlights it.
When people post under burner accounts, it becomes harder to be confident in the information the posts contain, because it could be the same person posting repeatedly. To give one example (not the only one), if you see X burner accounts posting "I observe Y", that could reflect anywhere from 1 to X actual observations of Y, and it's hard to get a sense of the true frequency. Posting under burners therefore undermines the message of those posting, because some of their information will be discounted.
In this post, the poster writes "Therefore, I feel comfortable questioning these grants using burner accounts," which suggests they do in fact have multiple burner accounts. I recognize that reusing the same burner account would, over time, aggregate information and lead to slightly less anonymity, but again, the tradeoff is that using many accounts significantly undermines the signal. I suspect it could lead to a vicious cycle for those posting, if they repeatedly feel their posts aren't being taken seriously.
Thanks for mentioning the Social Science Prediction Platform! We had some interest from other sciences as well.
With collaborators, we outlined some other reasons to forecast research results here: https://www.science.org/doi/10.1126/science.aaz1704. In short, forecasts can help to evaluate the novelty of a result (a double-edged sword: very unexpected results are more likely to be suspect), mitigate publication bias against null results / provide an alternative null, and, over time, help to improve the accuracy of forecasting. There are other reasons as well, like identifying which treatment to test or which outcome variables to focus on (which might have the highest VoI). In the long run, if forecasts are linked to RCT results, they could also help us say more about situations for which we don't have RCTs - but that's a longer-term goal. If this is an area of interest, I've got a podcast episode, an EA Global presentation, and some other things in this vein... this is probably the most detailed.
I agree that there's a lot of work in this area, and decision-makers who are actively interested in it. I'll also add that there's a lot of interest on the researcher side, which is key.
P.S. The SSPP is hiring web developers, if you know anyone who might be a good fit.
As a small note, we might get more precise estimates of the effects of a program by predicting magnitudes rather than whether something will replicate (which is what we're doing with the Social Science Prediction Platform). That said, I think a lot of work needs to be done before we can have trust in predictions, and there will always be a gap between how comfortable we are extrapolating to other things we could study vs. "unquantifiable" interventions.
(There's an analogy to external validity here: you can do more if you can assume the study you are predicting is drawn from the same set as those you have already studied, or from that set reweighted in some way. You could in principle construct an ordering of how feasible something is to study and regress your ability to predict on that, but that would be incredibly noisy and not practical as things stand; and past some threshold you don't observe studies at all, so you have little to say without making strong assumptions about generalizing beyond that threshold.)
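To make that parenthetical a bit more concrete, here is a toy sketch of the kind of regression I have in mind, with entirely made-up data; the "feasibility" score, the linear functional form, and all the numbers are illustrative assumptions, and the last line is exactly the sort of out-of-range extrapolation the caveat is about.

```python
import numpy as np

# Purely illustrative: suppose each past study has a "feasibility" score
# (how easy it was to run) and we recorded our squared prediction error on it.
rng = np.random.default_rng(0)
feasibility = rng.uniform(0.5, 1.0, size=40)   # we only ever observe fairly feasible studies
pred_error = 0.3 - 0.2 * feasibility + rng.normal(0, 0.1, size=40)

# Regress prediction error on feasibility (ordinary least squares via polyfit).
slope, intercept = np.polyfit(feasibility, pred_error, 1)

# The interventions we care most about may sit far below the observed range
# (say feasibility around 0.1), so this is pure extrapolation and rests
# entirely on the assumed linear form.
print("predicted error at feasibility 0.1:", intercept + slope * 0.1)
```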
Great comment. I don't think anyone, myself included, would say the means are not the same and therefore everything is terrible. In the podcast, you can see my reluctance to do that when Rob tries to get me to give one number that would easily summarize how much results in one context will extrapolate to another, and I just don't want to play ball (which is not a criticism of him at all!). The number I tend to focus on these days (tau squared) is not one that is easily interpretable in that way - instead, it's a measure of the unexplained variation in results - and how much is unexplained clearly depends on what model you are using (and because it is a variance, it really depends on units, making it hard to interpret across interventions except for those dealing with the same kind of outcome). On this view, if you can come up with a great model that explains away more of the heterogeneity, great! I am all for models that have better predictive power.
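(If tau squared is unfamiliar: it's the between-study variance in a random-effects meta-analysis, i.e. the variation in results left over after sampling error and whatever covariates the model includes. Below is a minimal sketch using the standard DerSimonian-Laird estimator and made-up numbers rather than anything from AidGrade; note that the result is in the squared units of the outcome, which is why it's hard to compare across different kinds of outcomes.)

```python
import numpy as np

def dersimonian_laird_tau2(effects, variances):
    """Between-study variance (tau^2) via the DerSimonian-Laird estimator."""
    y = np.asarray(effects, dtype=float)    # study effect estimates
    v = np.asarray(variances, dtype=float)  # within-study sampling variances
    w = 1.0 / v                             # inverse-variance (fixed-effect) weights
    y_bar = np.sum(w * y) / np.sum(w)       # pooled fixed-effect estimate
    Q = np.sum(w * (y - y_bar) ** 2)        # Cochran's Q (observed heterogeneity)
    k = len(y)                              # number of studies
    C = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    return max(0.0, (Q - (k - 1)) / C)      # truncated at zero

# Toy numbers purely for illustration (not AidGrade data):
effects = [0.10, 0.35, 0.05, 0.50]     # e.g. standardized effects from four studies
variances = [0.02, 0.03, 0.01, 0.04]   # their squared standard errors
print(dersimonian_laird_tau2(effects, variances))
```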
On the other hand:
1) I do worry that people are often not building more complicated models, but rather thinking about a specific study (or, if lucky, a group of studies), most likely biased towards those that found particularly large effects, since people seem to update more on positive results.
2) I am not convinced that focusing on mechanisms will completely solve the problem. I agree that interventions that are more theory-based should (in theory) have more similar results -- or at least results that are easier to predict, which is more to the point. On the other hand, implementation details matter. I agree with Glennerster and Bates that there is an undue focus on setting -- everyone wants an impact evaluation done in their particular location. But I think that focus is excessive because (perhaps surprisingly), when I look at the AidGrade data, there is little to no effect of geography on the impact found, by which I mean that a result from (say) Kenya does not even generalize to Kenya very well (and I believe James Rising and co-authors have found similar results using a case study of conditional cash transfers). This isn't always going to be true; for example, the effect of health interventions depends on the baseline prevalence of disease, and baseline prevalences can be geographically clustered. But what I worry about -- without convincing evidence yet, so take this with a grain of salt -- is that small implementation details might frequently wash out the gains from knowing the mechanisms. Hopefully, we will have more evidence on this in the future (whichever way that evidence goes), and I very much hope that the more positive view turns out to be true.
I do agree with you that it's possible researchers (and policymakers?) are able to account for some of these other factors when making predictions. I also said there was some evidence that people were updating more on positive results; I need to dig into the data a bit more to do subgroup analyses, but one way to reconcile these results (which would be consistent with what I have seen using different data) is that some people may be better at this than others. There are definitely times when people are wildly off, as well. I don't think I have a good enough sense yet of when predictions are good and when they are not, and figuring that out would be valuable.
Edit: I meant to add, there are a lot of frameworks that people use to try to get a handle on when they can export results or how to generalize. In addition to the work cited in Glennerster and Bates, see Williams for another example. And talking with people in government, there are a lot of other one-off frameworks or approaches people use internally. I am a fan of this kind of work and think it highly necessary, even though I am quite confident it won't get the appreciation it deserves within academia.
This video might also add to the discussion - the closing panel at CSAE this year was largely on methodology, moderated by Hilary Greaves (head of the new Global Priorities Institute at Oxford), with Michael Kremer, Justin Sandefur, Joseph Ssentongo, and myself. Some of the comments from the other panellists still stick with me today.
https://ox.cloud.panopto.eu/Panopto/Pages/Viewer.aspx?id=ec3f076c-9c71-4462-9b84-a8a100f5a44c
There's another point I don't quite know how to put but I'll give it a go.
Despite the comments above about having many ideas and getting feedback on one's projects early - both of which point to generating and abandoning ideas quickly - there's another sense in which what one actually needs is the ability to stick with things, plus the good taste to evaluate when to try something else and when to keep going. (This is less about specific projects and more about larger shifts, like whether to stay in academia or a certain line of work at all.)
I feel like people sometimes get too much advice to abandon things early. It's advice with intuitive appeal (if you can't pick winners, at least cut your losses early), and it's good advice in a lot of situations. But my impression is that while some people would do better failing faster, there are also people who would do better if they were more patient. At least for myself, I started having more success when I stuck with things for longer. The longer you stick with a thing, the more expertise you build in it. That may not matter in some fields, but it matters in academia.
Now, obviously, you want to be very selective about what you stick to. That's where having good taste comes in. But I'd start by looking honestly at yourself and at the people near you whom you see doing well in your chosen field, and asking which side of the impatient-patient spectrum you fall on compared to them. Some people are too patient; some people are too impatient. I was too impatient and improved with more patience, and for some people it's the opposite. Which advice applies most to you depends on your starting point, your field, and of course your outside options.
For econ PhDs, I think it's worth having a lot of ideas and discarding them quickly, especially in grad school, because a lot of them are bad at first; but I also think there are people who jump ship from an academic career too early, such as when they are on the market or in the first few years after. I suspect this is generally true in academia, where expertise really, really matters and you need to make a long-term investment, but I can't speak with certainty about academic fields beyond economics. And I've definitely met many academics who played it too safe for maximizing impact, and many people who didn't leave quickly enough. What I'm trying to emphasize is that it's possible to make mistakes in both directions, and you should put effort into figuring out which type of error you personally are more likely to make.