
Epistemic Status: This concept is early and speculative

TLDR: Accuracy Agreements are contracts in which one party commits to remunerate the other based on a predetermined rate, proportional to the accuracy of a series of forecasts supplied by the contract owner. This incentivizes the owner to find cost-effective methods for producing calibrated and accurate forecasts.
 

The Basics

The problem: Prediction Markets are effective at determining outcomes for binary questions, but they struggle with probability distributions and more expressive situations, such as those involving functions. Also, often there are only a few potential organizations that manage estimation, and it’s simpler to pay one for estimation than it is to set up a market.

Proposed solution: A client publishes an "Accuracy Agreement" with an associated "Type Specification."
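To make the idea concrete, here is a minimal sketch of what a Type Specification might look like as a data structure. All field and question names here are illustrative assumptions, not a proposed standard; the post leaves the exact format open.

```python
from dataclasses import dataclass

# Hypothetical sketch of a "Type Specification" for an Accuracy Agreement.
# Field names are assumptions for illustration only.

@dataclass
class Question:
    name: str            # e.g. "gdp_growth_2026" (hypothetical)
    kind: str            # "binary" or "continuous"
    prior_mean: float    # client's weak prior, here a normal distribution
    prior_stdev: float

@dataclass
class TypeSpecification:
    questions: list      # the questions a submitted forecast must cover
    resolution_date: str # ISO date on which Resolution occurs
    dollars_per_bit: float  # payment rate per bit of information

spec = TypeSpecification(
    questions=[Question("gdp_growth_2026", "continuous", 2.0, 1.5)],
    resolution_date="2026-01-01",
    dollars_per_bit=10_000.0,
)
print(len(spec.questions))  # prints 1
```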

A simple agreement's terms are as follows:

  1. A purchaser of the agreement provides a one-time or continuously-updating forecast that complies with the established Type Specification.
  2. Upon reaching a prearranged future date, the Agreement will undergo Resolution, which entails:
    1. The client will determine the answers to the forecasting questions.
    2. The client will evaluate and assign a score to the submitted forecast.
    3. The contract holder at the time of resolution will receive payment based on a function of a scoring rule. An example of this could be: "The average log score of the forecast minus the prior, multiplied by $10,000. Stated differently, $10,000 per bit of information."
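The example scoring rule above can be sketched in code. This assumes normal distributions for both forecasts and priors, and interprets "per bit" as log base 2; both are assumptions for illustration, since the post does not pin down the distribution family.

```python
import math

def normal_logpdf(x, mu, sigma):
    """Log-density of a normal distribution, converted from nats to bits."""
    nats = -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)
    return nats / math.log(2)

def resolution_payment(outcomes, forecasts, priors, dollars_per_bit=10_000.0):
    """Average (forecast log score - prior log score), paid per bit."""
    gains = [
        normal_logpdf(x, *f) - normal_logpdf(x, *p)
        for x, f, p in zip(outcomes, forecasts, priors)
    ]
    return dollars_per_bit * sum(gains) / len(gains)

# One variable: forecast N(3, 1) against a weak prior N(0, 5); outcome is 3.2.
payment = resolution_payment(
    outcomes=[3.2],
    forecasts=[(3.0, 1.0)],
    priors=[(0.0, 5.0)],
)
print(round(payment, 2))
```

A forecast identical to the prior earns nothing, and a forecast that concentrates probability near the true outcome earns more, which is what makes the rule incentive-compatible.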

Holding an "Accuracy Agreement" with a specific forecast has a calculable value. If the holder trusts the forecast, the value equals the expected resolution payment, which is straightforward to compute from the expected score.

One elegant way the client could ensure a reasonable price is by selling the contract, initially, to the winner of a bidding system. When a question is proposed, potential buyers bid for the Agreement. If a bidder believes their forecast has an EV of "$25,000," they might bid "$23,000" for the Agreement. The winner would either need to submit a prediction immediately, or would have time to generate one.
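A bidder can estimate that EV directly: under the bidder's own forecast q, the expected log-score gain over the prior p equals the KL divergence KL(q || p). A sketch, again assuming normal distributions and a $10,000-per-bit rate (the margin subtracted from the bid is an arbitrary illustrative number):

```python
import math

def kl_normal_bits(mu_q, sigma_q, mu_p, sigma_p):
    """KL(q || p) between two normal distributions, in bits."""
    nats = (math.log(sigma_p / sigma_q)
            + (sigma_q**2 + (mu_q - mu_p)**2) / (2 * sigma_p**2)
            - 0.5)
    return nats / math.log(2)

# If the bidder trusts their forecast q = N(3, 1) against the prior p = N(0, 5),
# their expected per-question score gain is KL(q || p) bits.
expected_bits = kl_normal_bits(3.0, 1.0, 0.0, 5.0)
expected_value = 10_000.0 * expected_bits  # $10,000-per-bit contract

# Bid below expected value to leave a profit margin.
bid = expected_value - 2_000.0
print(round(expected_value), round(bid))
```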

An Example:

A client publishes:

  1. A list of 50 continuous variables, to be resolved in 1 year.
    1. A set of (weak) priors for those variables.
    2. A simple scoring function, e.g., $1K * total logScore sum.
    3. A future date for bidding to commence (1 month out).
  2. Bidders submit:
    1. A list of forecasts for the 50 variables, each as a probability distribution.
    2. A maximum price for purchasing the agreement with their forecast.
  3. The top bidder is chosen and sold the contract.
  4. In 1 year, the contract resolution occurs.
    1. All (or some) of the variables are resolved.
    2. The contract's forecasts get scored.
    3. The agreement owner is paid according to the predetermined scoring rule.

Potential Changes / Improvements

  1. An active marketplace. Purchasers can trade these contracts with each other over time, after the initial purchase.
  2. Introduce a market with multiple shares, allowing buyers and sellers to purchase only a portion of the market. This can help ensure that no one purchaser can overbid for the privilege of promoting their (bad) forecast.
  3. Buyers can update their forecasts over time. The contracts cover the average accuracy over that time. This incentivizes buyers to continue improving their forecasts.
  4. Forecasting questions can be resolved and scored by agreed-upon third parties.
  5. There’s an extra reward for a forecast that comes with a good explanation or similar. This could be part of the scoring function.
  6. In some cases, it might make sense for bidders not to have to put any money down. This might be considered gambling. One alternative, if the bidders are trusted, is that they get payments in two parts: one fixed fee, and one Accuracy Agreement. For example, some EA org puts out a call for a forecasting consultancy to forecast 1000 variables - but most of the payment is in the form of an Accuracy Agreement.

Powering Accuracy Agreements with Prediction Markets

Would information purchasers get a better return by subsidizing Prediction Markets, than they would by selling Accuracy Agreements?

I think this difference should be bounded. If Prediction Markets ever prove more efficient than Accuracy Agreements, the owner of an Accuracy Agreement could simply set up Prediction Markets once they own the agreement, and subsidize the market in whatever way would do the best job.

For example, say a purchaser sees an agreement that could make them $50,000 profit without a Prediction Market, or $100,000 if they spend $20,000 on setting up and subsidizing a corresponding Prediction Market. They might bid $60,000 on the contract, and if they get it, then spend $10,000 on the market.
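The arithmetic in that example can be sketched directly (the figures are taken from the example above):

```python
# Figures from the example above: expected profit with and without a market.
profit_plain = 50_000                        # expected profit, no Prediction Market
market_cost = 20_000                         # cost to set up and subsidize a market
profit_with_market = 100_000 - market_cost   # expected profit net of that cost

# The buyer pursues whichever option nets more, and can bid up to that surplus.
best_net = max(profit_plain, profit_with_market)
print(best_net)  # prints 80000
```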

This setup would encourage both good Prediction Markets and actors who are savvy about properly setting up and subsidizing such markets.

Bottlenecks

The first bottleneck to actually selling Accuracy Agreements is the legality. It might be tricky to do in a way that’s not considered gambling.

The second bottleneck is finding buyers. Accuracy agreements will be somewhat complicated assets. You’d need sophisticated estimation agencies to do competent estimation and to have the flexibility to take the necessary financial risk.

The third challenge is just the infrastructural and logistical expenses. Setting up software to make this process smooth, and making sure that questions get resolved by third parties, would require some overhead, especially early on.

Next Steps

To make progress, I’d like to see this sort of setup be tested in simple ways. First, talk with lawyers to best maneuver around the first bottleneck. If any readers of this have insight here, please leave comments!

Second, experiment with small contracts (maybe <$5k) and small consultancies. This should be achievable for a small team with a decent reputation among potential estimation consultancies. I could see QURI doing this in the future, but I would be happy to see other groups do so first. Early contracts don't need to be sellable; they can be simple agreements with individual consultancies.

One nice property of these contracts is that they should be able to scale gracefully, in that you could do them on small scales or large scales. So, start small, and as we gain experience, we’ll be able to better scale these setups with larger sums and more diverse Accuracy Agreement purchasers.

 

Thanks to Javier Prieto and Nuño Sempere for feedback

Comments (5)



Based on reading through this, it seems promising - the question is whether it's worth someone overcoming the startup costs / bottlenecks to run a pilot, and I'm not sure it is unless there is a specific promising use case that makes this seem much better than current approaches. Any ideas for specific markets where this is valuable?

Some ideas:

1. CEA / Open Phil wants to contract [Metaculus/Manifold/Good Judgement/Epoch] to come up with forecasts for 100 continuous questions regarding clean meat. They just care that the forecasts are good, they don't care much for which specific methods are used. They also would like to see new forecasting players join the fray and compete for cost-effective work.

They put out an open call for proposals, and say that much of the money will come in the form of a simple Accuracy Agreement (one that perhaps can't be re-sold, for legal-simplicity reasons, for now).

2. A government agency wants to outsource "Predictions of construction costs, for a megaproject". This set of predictions will be very expensive to produce and update. Instead of paying consultants in the traditional way, they organize much of this as an Accuracy Agreement. 

3. In 2030, in a hypothetical future world, there are many big forecasting players and purchasers. Complicated forecasting Type Specifications are routinely created, e.g., "For every US regional area, and every point in time for the next 5 years, how many cases of each disease will be monitored/registered in that area?". Forecasting purchasers can expect that a public bidding market will return a reasonable result, similar to some current large-project bidding procedures.

 

I see this approach as a more complex but powerful alternative to the model of buyers now just contracting with specific firms like Metaculus or Good Judgement, which has started to happen in the last few years (for judgemental questions). 

One weakness that these Agreements have is that they require the client (or a third party) to ensure that questions are written and scored, instead of the consultant.

This is similar to an issue that Prediction Markets have, but not one that existing forecasting contracts often have. Those contracts typically have the forecasting contractors do the work of question specification and resolution.

So, Accuracy Agreements are probably in-between Prediction Markets and current contractor agreements, in complexity.

For big government construction projects, I believe some firms/agencies will do a lot of preparation and outlining, before a bidding process might begin. Getting things specific enough for a large bidding process is itself a fair bit of work. This can be useful for large projects, or in cases where the public has little trust in the key decision makers, but is probably cost-prohibitive for other situations.

I'm not sure about this, but there is a possibility that this sort of model violates US online gambling laws. (These laws, along with those against unregulated trading of securities, are the primary obstacles to prediction markets in the US.) IIRC, you can get into trouble with these rules if there is a payout on the outcome of a single event, which seems like it would be the case here. There's definite gray area, but before setting up such a thing one would definitely want to get some legal clarity.

I agree, the legal aspect is my main concern, doubly so if people can exchange/sell these agreements later on.
