I am an (almost finished) PhD student in biostatistics and infectious disease modelling (population-level); my research focuses on Bayesian statistical methods to produce improved estimates of the number of new COVID-19 infections. During the pandemic, I was a member of SPI-M-O (the UK government committee providing expert scientific advice based on infectious disease modelling and epidemiology).
I enjoy applying my knowledge broadly, including to models of future pandemics, big picture thinking on pandemic preparedness, and forecasting.
I'm currently nearing PhD completion with nothing lined up for after. I'm interested in opportunities in biosecurity and global health, especially answering questions about cost-effectiveness and prioritisation using modelling / stats / epidemiology skills. Please DM if of even vague interest.
Happy to chat about my experience providing scientific advice to government, the biosecurity field, epidemic modelling, doing a PhD, or pretty much anything else!
I don't see how you can say both that it will "almost never" be the case that NYC will "hit 1% cumulative incidence after global 1% cumulative incidence" but also that it would surprise you if you can get to where your monitored cities lead global prevalence?
Sorry, this is poorly phrased by me. I meant that it would surprise me if there's much benefit from adding a few additional cities.
The best stuff looking at global-scale analysis of epidemics is probably by GLEAM. I doubt full agent-based modelling at small scales gives you much beyond massively complicating the model.
Sorry, I answered the wrong question, and am slightly confused about what this post is trying to get at. I think your question is: will NYC hit 1% cumulative incidence after global 1% cumulative incidence?
I think this is almost never going to be the case for fairly indiscriminately-spreading respiratory pathogens, such as flu or COVID.
The answer is yes only if NYC's cumulative incidence is lower than the population-weighted global mean. Due to connectedness, I expect NYC to always be hit pretty early, as you point out, definitely before most rural communities. I think the key point here is that NYC doesn't need to be ahead of the epicentre of the disease, only the global mean.
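To make the comparison against the global mean concrete, here is a minimal sketch in Python. The region names, populations, and incidence values are entirely made up for illustration and are not estimates for any real pathogen; the function name is also just a placeholder.

```python
def weighted_mean_incidence(regions):
    """Population-weighted mean cumulative incidence.

    `regions` maps a region name to a (population, cumulative_incidence) pair.
    """
    total_pop = sum(pop for pop, _ in regions.values())
    return sum(pop * inc for pop, inc in regions.values()) / total_pop


# Hypothetical numbers, chosen only to illustrate the comparison.
regions = {
    "NYC": (8_000_000, 0.010),
    "other hubs": (500_000_000, 0.004),
    "rest of world": (7_492_000_000, 0.001),
}

global_mean = weighted_mean_incidence(regions)

# NYC "leads" if its cumulative incidence exceeds the global weighted mean,
# regardless of whether it is ahead of the epicentre.
nyc_leads = regions["NYC"][1] > global_mean
print(f"global mean: {global_mean:.4%}, NYC leads: {nyc_leads}")
```

In this toy setup the large, lightly affected "rest of world" group dominates the weighted mean, which is why a well-connected city only has to beat a fairly low bar.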
One way of looking at this is how early on does NYC get hit compared to other cities/regions. This analysis (pdf) orders cities by connectedness to Wuhan to answer this question for COVID. It looks like they've released an online tool that lets you specify different origin locations and epidemiological parameters. So you could rank how early NYC gets hit for a range of different scenarios.
by carefully choosing a few cities to monitor around the world you can probably get to where it leads global prevalence
This would surprise me. It's hard to imagine a scenario where the arrival time at different major travel hubs is very desynchronized as these locations are highly connected to each other. So you'd probably then end up looking at a long tail of locations which are poorly connected to the main travel hubs.
This effect should diminish as the pandemic progresses, but at least in the <1% cumulative incidence situations I'm most interested in it should remain a significant factor.
1% cumulative incidence is quite high, so I think by this point you're probably far enough along that you're fine. E.g. we've estimated London hit this point for COVID around 22 Mar 2020, when it was pretty much everywhere.
This seems intuitively in the right ballpark (within an order of magnitude of GiveWell), but I'd caution that, as far as I can tell, the World Bank and Bernstein et al. numbers are basically made up.
I've previously written about how to identify higher impact opportunities. In particular, we need to be careful about the counterfactuals here because a lot of the money spent on pandemic preparedness comes from governments who would otherwise spend it on even less cost-effective things.
Could you please expand on why you think a Pareto distribution is appropriate here? Tail probabilities are often quite sensitive to the assumptions here, and it can be tricky to determine if something is truly power-law distributed.
When I looked at the same dataset, albeit processing the data quite differently, I found that a truncated or cutoff power-law appeared to be a good fit. This gives a much lower value for extreme probabilities using the best-fit parameters. In particular, there were too few of the most severe pandemics in the dataset (COVID-19 and 1918 influenza) otherwise; this issue is visible in fig 1 of Marani et al. Could you please add the data to your tail distribution plot to assess how good a fit it is?
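As a rough illustration of why the cutoff matters for extreme probabilities, the sketch below compares the survival function of a pure Pareto with that of a tapered Pareto (one common form of a power law with an exponential cutoff). This is not the fit I ran and not the model in the post: the functional form of the cutoff and all parameter values (`X_MIN`, `ALPHA`, `X_CUT`) are assumptions picked only to show the qualitative effect.

```python
import math


def pareto_sf(x, x_min, alpha):
    """P(X > x) for a pure Pareto with scale x_min and tail index alpha."""
    if x < x_min:
        return 1.0
    return (x / x_min) ** (-alpha)


def tapered_pareto_sf(x, x_min, alpha, x_cut):
    """P(X > x) for a tapered Pareto: the Pareto tail times an exponential taper.

    x_cut sets the scale above which the tail decays exponentially.
    """
    if x < x_min:
        return 1.0
    return (x / x_min) ** (-alpha) * math.exp(-(x - x_min) / x_cut)


# Hypothetical parameters, for illustration only (not fitted to any data).
X_MIN, ALPHA, X_CUT = 1e-3, 0.5, 10.0

for x in (0.1, 1.0, 10.0, 100.0):
    pure = pareto_sf(x, X_MIN, ALPHA)
    tapered = tapered_pareto_sf(x, X_MIN, ALPHA, X_CUT)
    print(f"x={x:>6}: pure={pure:.3e}  tapered={tapered:.3e}  ratio={tapered / pure:.3e}")
```

The two curves are nearly identical well below the cutoff scale and diverge sharply above it, so a dataset with few extreme events can look compatible with both while implying very different tail probabilities.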
A final note, I think you're calculating the probability of extinction in a single year but the worst pandemics historically have lasted multiple years. The total death toll from the pandemic is perhaps the quantity most of interest.