Hide table of contents

Highlights

Index

  • Prediction Markets & Forecasting Platforms
  • In The News
  • Blog Posts
  • Long Content

Sign up here (a) or browse past newsletters here (a).

Prediction Markets & Forecasting Platforms

Replication Markets

Replication Markets (a) was a project to research how well prediction markets could predict whether papers would replicate. They are paying out (a) $142k in cash rewards for the prediction markets part of the experiment. This corresponds to 121 resolved questions, which includes 12 meta-questions and 30 about covid papers.

The leaderboard for users is here (a). I won a mere $809, and I don't remember participating all that much. In particular, I was excited at the beginning but lost interest because a user—or a bot—named "unipedal" seemed like it was taking all the good opportunities.

Now, a long writeup by "unipedal" himself can be read at How I Made $10k Predicting Which Studies Will Replicate (a). The author started out with a simple quantitative model based on Altmejd et al. (2019) (a)

Predicting the replicability of social science lab experiments, Altmejd et al., 2019.

But in later rounds, he dropped the quantitative model, and started "playing the market". That is, he found out that trying to predict how the market will move is more profitable than giving one's own best guess. Unipedal then later automated his trades when the market API was opened to users.

In contrast, I participated in a few rounds and put in 10x less effort while earning much more than 1/10th of the rewards. As unipedal points out, this is backwards:

...I think one of the most important aspects of "ideal" prediction markets is that informed traders can compound their winnings, while uninformed traders go broke. The market mechanism works well because the feedback loop weeds out those who are consistently wrong. This element was completely missing in the RM [Replication Markets] project.

The same author previously wrote: What's Wrong with Social Science and How to Fix It: Reflections After Reading 2578 Papers (a), which is also based on his experiences with the Replication Markets competition.

Besides Replication Markets, DARPA has also founded another group to predict replications through their SCORE (a) program. Based on preliminary results, this second group (a) seems like they beat Replication Markets by using a more Delphi-like (a) methodology to elicit predictions.

Metaculus

It has been an active month for Metaculus.

For starters, they rehauled (a) their scoring system for tournaments. Then, Metaculus' Journal (a) started to give fruit: The article on forecasts of Human-Level Language Models (a) (also on LessWrong here (a)) was of fairly high quality.

Metaculus also started to keep track of the accuracy of a small number of Public Figures (a). Because Metaculus has so many questions, every time one of these figures makes a public prediction, it is likely enough that Metaculus also has a prediction on the same issue. Over time, this will allow Metaculus to see who is generally more accurate. This is a more adversarial version of Tetlock's original Alpha Pundit (a) idea: instead of having experts willingly participate, Metaculus is just passively keeping track of how bad they are. Kudos!

Two comments worth highlighting from SimonM's (a) list of top comments from Metaculus this past November) (a) are:

  • juancambeiro (a) re-opens a previously closed question on whether or not a member of the IC community thinks COVID was a lab leak.
  • ege_erdil (a) thinks we should be extremely uncertain about crime. "Overall I think everyone in this thread is way too confident that they know what's going on with crime rates at some frequency scale. My opinion is that let alone understanding the long-term mechanisms which drive changes in crime rates, we don't even have a very good understanding of crime rates from the past. If Louis XIV's reign in France cut murder rates in half, we would never know it from the evidence available to us today."

In addition, Metaculus and its community worked at full speed to put up questions and produce forecasts on the Omicron (a) variant (a). Metaculus also added more questions to the Keep Virginia Safe Tournament) (a), and increased the price pool somewhat to $2,500.

Polymarket

Polymarket saw some extremely large swings, where 1:250 (a) and 1:700 (a) underdogs ended up winning. h/t @Domahhhh (a)

Zvi positively covers some Polymarket markets on Covid here (a).

Polymarket also added support for Metamask (a), one of the most popular crypto-wallets, making Polymarket ever more mainstream. They also had a bit of a brouhaha on a market on the number of exoplanets (a) discovered, where the resolution source pointed to two different numbers.

Odds and ends

Augur—a set of pioneering prediction market contracts on Ethereum and the community around it—is creating a decentralized autonomous organization (a), AugurDAO (a). I get the impression that the original developers have gotten tired of supporting Augur, whose current focus on sports markets merely makes it a very slow sportsbook.

But the move is also consistent with Augur's initial ethos of being decentralized. For example, the Forecast Foundation (a) which supports Augur's development, seems to live under Estonian jurisdiction (a), whereas a DAO arguably lives under no jurisdiction.

The Foresight Institute is hosting a "Vision Weekend" (a) in the US and France. Although I remembered the Foresight Institute as something that Eric Drexler founded before he went on to do other things (a), I did find some familiar names in the list of presenters (a), and browsing the details the event is probably going to be of higher quality than I would have thought.

The Anticipation Hub is hosting a "Global Dialogue Platform on Anticipatory Humanitarian Action (a)", hosted online from the 7th to the 9th of December. Although it seems more focused on global health and development topics, it might be of interest to NGOs around the forecasting space more generally.

The Global Priorities Institute (a) is dipping its toes into forecasting. As one might expect, so far there is a lot of academically-flavored discussions, but very little actual forecasting.

Hedgehog Markets had an NFT-minting event (a), where users could buy NFTs which they will be able to use to participate in competitions closed-off to non-NFT holders. I don’t see the appeal, but others did and spent around $500k on these tokens (5000 NFTs at 0.5 SOL (a) per token).

A former US Commodity Futures Trading Commission commissioner joined Kalshi’s board (a).

Hypermind started a new contest (a) on the "future of Africa", with $6000 in prize money.

There is a fairly neat calibration app (a) based on the exercises from The Scout Mindset.

I've added Peter Wildeford's pubicly available predictions to Metaforecast:

On the negative side, Metaforecast is experiencing some difficulties with updating with new forecasts, which I hope to get fixed in the coming week.

In the News

The Economist featured a full page with forecasts from Good Judgment Open (a) on their "The World in 2022" edition.

Reuters reports that Climate change extremes spur U.N. plan to fund weather forecasting (a). My impression is that climate change fears are being used to fund much-needed bog-standard weather forecasting. I have mixed feelings about this.

An Excel competitor with some forecasting functionality, Pigment (a), raises $73M (a).

Blog Posts

Joe Carlsmith writes down his thoughts on Solomonoff induction (a) (see a decent introduction of the concept here (a)). Although I was already familiar with the concept, I still feel like I learnt a bunch:

  • the speed prior (a) is a nice hack to get around the fact that some programs can run forever, and you can't say which ones they are per the Halting problem.
  • There is some unavoidable sense in which one has to assign smaller probabilities to longer programs.
  • Solomonoff Induction requires that uncomputable processes like Solomonoff Induction be impossible. If the universe could include uncomputable processes, it could include a copy of your Solomonoff Induction process. In that case, the universe could function as a Solomonoff "anti-Inductor". That is, the universe could perfectly simulate what your Solomonoff Inductor will predict next and then feed you the opposite.

Tanner Greer of The Scholar's Stage has a new piece on Sino-American Competition and the Search For Historical Analogies (a). His main point is that the tensions around Taiwan break the analogy between the current relationship between the US and China and the relationship between the US and the USSR during the Cold War.

Jaime Sevilla posts A Bayesian Aggregation Paradox (a): There is no objective way of summarizing a Bayesian update over an event with three outcomes A:B:C as an update over two outcomes A:¬A. From the comments:

Imagine you have a coin that's either fair, all-heads, or all-tails. If your prior is "fair or all-heads with probability 1/2 each", then seeing heads is evidence against "fair". But if your prior is "fair or all-tails with probability 1/2 each", then seeing heads is evidence for "fair". Even though "fair" started as 1/2 in both cases. So the moral of the story is that there's no such thing as evidence for or against a hypothesis, only evidence that favors one hypothesis over another.

Long Content

Are "superforecasters" a real phenomenon? (a). David Manheim, a superforecaster, answers:

So in short, I'm unconvinced that superforecasters are a "real" thing, except in the sense that most people don't try, and people who do will do better, and improve over time. Given that, however, we absolutely should rely on superforecasters to make better predictions that the rest of people - as long as they continue doing the things that make them good forecasters.


Note to the future: All links are added automatically to the Internet Archive, using this tool (a). "(a)" for archived links was inspired by Milan Griffes (a), Andrew Zuckerman (a), and Alexey Guzey (a).


It is curious to reflect that out of all the "experts" of all the schools, there was not a single one who was able to foresee so likely an event as the Russo-German Pact of 1939. And when news of the Pact broke, the most wildly divergent explanations were of it were given, and predictions were made which were falsified almost immediately, being based in nearly every case not on a study of probabilities but on a desire to make the U.S.S.R. seem good or bad, strong or weak. Political or military commentators, like astrologers, can survive almost any mistake, because their more devoted followers do not look to them for an appraisal of the facts but for the stimulation of nationalistic loyalties

— George Orwell, Notes on Nationalism (a), 1945, h/t Scott Alexander.

Comments2


Sorted by Click to highlight new comments since:

The GPI link goes to their homepage. Is there a more specific URL for the claim that "The Global Priorities Institute is dipping its toes into forecasting"?

No. But on the other hand, I did attend a couple of their meetings/activities on the topic this past month.

Curated and popular this week
 ·  · 23m read
 · 
Or on the types of prioritization, their strengths, pitfalls, and how EA should balance them   The cause prioritization landscape in EA is changing. Prominent groups have shut down, others have been founded, and everyone is trying to figure out how to prepare for AI. This is the first in a series of posts examining the state of cause prioritization and proposing strategies for moving forward.   Executive Summary * Performing prioritization work has been one of the main tasks, and arguably achievements, of EA. * We highlight three types of prioritization: Cause Prioritization, Within-Cause (Intervention) Prioritization, and Cross-Cause (Intervention) Prioritization. * We ask how much of EA prioritization work falls in each of these categories: * Our estimates suggest that, for the organizations we investigated, the current split is 89% within-cause work, 2% cross-cause, and 9% cause prioritization. * We then explore strengths and potential pitfalls of each level: * Cause prioritization offers a big-picture view for identifying pressing problems but can fail to capture the practical nuances that often determine real-world success. * Within-cause prioritization focuses on a narrower set of interventions with deeper more specialised analysis but risks missing higher-impact alternatives elsewhere. * Cross-cause prioritization broadens the scope to find synergies and the potential for greater impact, yet demands complex assumptions and compromises on measurement. * See the Summary Table below to view the considerations. * We encourage reflection and future work on what the best ways of prioritizing are and how EA should allocate resources between the three types. * With this in mind, we outline eight cruxes that sketch what factors could favor some types over others. * We also suggest some potential next steps aimed at refining our approach to prioritization by exploring variance, value of information, tractability, and the
 ·  · 1m read
 · 
I wanted to share a small but important challenge I've encountered as a student engaging with Effective Altruism from a lower-income country (Nigeria), and invite thoughts or suggestions from the community. Recently, I tried to make a one-time donation to one of the EA-aligned charities listed on the Giving What We Can platform. However, I discovered that I could not donate an amount less than $5. While this might seem like a minor limit for many, for someone like me — a student without a steady income or job, $5 is a significant amount. To provide some context: According to Numbeo, the average monthly income of a Nigerian worker is around $130–$150, and students often rely on even less — sometimes just $20–$50 per month for all expenses. For many students here, having $5 "lying around" isn't common at all; it could represent a week's worth of meals or transportation. I personally want to make small, one-time donations whenever I can, rather than commit to a recurring pledge like the 10% Giving What We Can pledge, which isn't feasible for me right now. I also want to encourage members of my local EA group, who are in similar financial situations, to practice giving through small but meaningful donations. In light of this, I would like to: * Recommend that Giving What We Can (and similar platforms) consider allowing smaller minimum donation amounts to make giving more accessible to students and people in lower-income countries. * Suggest that more organizations be added to the platform, to give donors a wider range of causes they can support with their small contributions. Uncertainties: * Are there alternative platforms or methods that allow very small one-time donations to EA-aligned charities? * Is there a reason behind the $5 minimum that I'm unaware of, and could it be adjusted to be more inclusive? I strongly believe that cultivating a habit of giving, even with small amounts, helps build a long-term culture of altruism — and it would
 ·  · 1m read
 · 
I recently read a blog post that concluded with: > When I'm on my deathbed, I won't look back at my life and wish I had worked harder. I'll look back and wish I spent more time with the people I loved. Setting aside that some people don't have the economic breathing room to make this kind of tradeoff, what jumps out at me is the implication that you're not working on something important that you'll endorse in retrospect. I don't think the author is envisioning directly valuable work (reducing risk from international conflict, pandemics, or AI-supported totalitarianism; improving humanity's treatment of animals; fighting global poverty) or the undervalued less direct approach of earning money and donating it to enable others to work on pressing problems. Definitely spend time with your friends, family, and those you love. Don't work to the exclusion of everything else that matters in your life. But if your tens of thousands of hours at work aren't something you expect to look back on with pride, consider whether there's something else you could be doing professionally that you could feel good about.