
Epistemic Status: This concept is early and speculative

TLDR: Accuracy Agreements are contracts in which one party commits to pay the other at a predetermined rate, in proportion to the accuracy of a series of forecasts supplied by the contract owner. This incentivizes the owner to find cost-effective methods of producing calibrated and accurate forecasts.
 

The Basics

The problem: Prediction Markets are effective at determining outcomes for binary questions, but they struggle with probability distributions and more expressive situations, such as those involving functions. Also, there are often only a few organizations capable of doing the estimation, and it’s simpler to pay one of them directly than it is to set up a market.

Proposed solution: A client publishes an "Accuracy Agreement" with an associated "Type Specification."
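The post doesn’t fix a format for Type Specifications, but a minimal sketch might look like the following (all field names and values here are illustrative assumptions, not a proposed standard):

```python
from dataclasses import dataclass

@dataclass
class Question:
    """One entry in a hypothetical Type Specification: what must be
    forecast, and in what form. All names here are illustrative."""
    name: str
    kind: str               # e.g. "binary" or "continuous"
    unit: str               # units, for continuous questions
    resolution_date: str    # when the client resolves the question

# A tiny two-question specification:
spec = [
    Question("us_cpi_yoy_2026", "continuous", "percent", "2026-12-31"),
    Question("project_shipped", "binary", "", "2026-06-30"),
]
```

A real specification would also pin down the expected form of each forecast (e.g., a parametric distribution family), but the key point is that the client publishes it up front.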

A simple agreement's terms are as follows:

  1. A purchaser of the agreement provides a one-time or continuously-updating forecast that complies with the established Type Specification.
  2. Upon reaching a prearranged future date, the Agreement will undergo Resolution, which entails:
    1. The client will determine the answers to the forecasting questions.
    2. The client will evaluate and assign a score to the submitted forecast.
    3. The contract holder at the time of resolution will receive payment based on a function of a scoring rule. An example of this could be: "The average log score of the forecast minus the prior, multiplied by $10,000. Stated differently, $10,000 per bit of information."

Holding an "Accuracy Agreement" with a specific forecast has a calculable value. If the holder trusts the forecast, the value equals the expected resolution payment, which is straightforward to compute from the expected score.
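For binary questions, the "$10,000 per bit" rule above can be sketched as follows (a minimal illustration; the function name and exact conventions are my assumptions):

```python
import math

def resolution_payment(forecasts, priors, outcomes, rate_per_bit=10_000.0):
    """Payment under the sketched rule: the average (base-2) log score
    of the forecast minus the prior, times a dollar rate per bit.
    `forecasts` and `priors` are probabilities assigned to each event;
    `outcomes` are 1 if the event happened, 0 otherwise."""
    scores = []
    for p, q, y in zip(forecasts, priors, outcomes):
        # Log score of the probability actually assigned to the outcome.
        forecast_score = math.log2(p if y else 1 - p)
        prior_score = math.log2(q if y else 1 - q)
        scores.append(forecast_score - prior_score)
    return rate_per_bit * sum(scores) / len(scores)

# A forecast that moves three 50% priors to 90%, and is right each time,
# adds log2(0.9/0.5) ≈ 0.85 bits per question over the prior.
payment = resolution_payment([0.9, 0.9, 0.9], [0.5, 0.5, 0.5], [1, 1, 1])
```

Note that an overconfident forecast that resolves the wrong way scores below the prior, so the payment can be negative; whether losses are actually clawed back is a contract-design choice.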

One elegant way the client could ensure a reasonable price is by selling the contract, initially, to the winner of a bidding system. When a question is proposed, potential buyers bid for the Agreement. If a bidder believes their forecast has an EV of "$25,000," they might bid "$23,000" for the Agreement. The winner would either need to submit a prediction immediately, or would have time to generate one.
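The bidding step could be as simple as a first-price sealed-bid sale; the post doesn’t specify an auction format, so this is one assumed possibility:

```python
def run_auction(bids):
    """Sell the Agreement to the highest bidder (first-price,
    sealed-bid -- an assumed format). `bids` maps bidder -> dollars."""
    winner = max(bids, key=bids.get)
    return winner, bids[winner]

# A bidder whose forecast has an EV of $25,000 might bid $23,000:
winner, price = run_auction({"alice": 23_000, "bob": 21_500, "carol": 18_000})
```

A second-price (Vickrey) format would also be natural here, since it encourages bidders to bid their true expected values.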

An Example:

A client publishes:

  1. A list of 50 continuous variables, to be resolved in 1 year, along with:
    1. A set of (weak) priors for those variables.
    2. A simple scoring function, e.g., $1K * total logScore sum.
    3. A future date for bidding to commence (1 month out).
  2. Bidders submit:
    1. A list of forecasts for the 50 variables, each as a probability distribution.
    2. A maximum price for purchasing the agreement with their forecast.
  3. The top bidder is chosen and sold the contract.
  4. In 1 year, the contract resolution occurs.
    1. All (or some) of the variables are resolved.
    2. The contract’s forecasts get scored.
    3. The agreement owner is paid according to the predetermined scoring rule.
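Under some concrete assumptions (normal forecasts and priors, natural-log scoring relative to the published prior, following the earlier "forecast minus prior" convention), the resolution step of this example might be computed as:

```python
import math

def normal_logpdf(x, mu, sigma):
    """Log density of a Normal(mu, sigma) distribution at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def example_payout(forecasts, priors, outcomes, rate=1_000.0):
    """Sketch of the example's '$1K * total logScore sum' rule, scoring
    each forecast relative to the published weak prior. Each forecast
    and prior is a (mu, sigma) pair; all details here are assumptions."""
    total = 0.0
    for (fm, fs), (pm, ps), x in zip(forecasts, priors, outcomes):
        total += normal_logpdf(x, fm, fs) - normal_logpdf(x, pm, ps)
    return rate * total

# A sharp forecast (sd 5) beats a weak prior (sd 50) when the outcome lands nearby:
payout = example_payout([(100, 5)], [(100, 50)], [102])
```

The same structure extends to all 50 variables; only the resolved subset would enter the sum if some questions can’t be resolved.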

Potential Changes / Improvements

  1. An active marketplace. Purchasers can trade these contracts with each other over time, after the initial purchase.
  2. Introduce a market with multiple shares, allowing participants to purchase only a portion of the contract. This can help ensure that no single purchaser can overbid for the privilege of promoting their (bad) forecast.
  3. Buyers can update their forecasts over time. The contracts cover the average accuracy over that time. This incentivizes buyers to keep improving their forecasts.
  4. Forecasting questions can be resolved and scored by agreed-upon third parties.
  5. There’s an extra reward for a forecast that comes with a good explanation or similar. This could be part of the scoring function.
  6. In some cases, it might make sense for bidders not to have to put any money down. This might be considered gambling. One alternative, if the bidders are trusted, is that they get payments in two parts: one fixed fee, and one Accuracy Agreement. For example, some EA org puts out a call for a forecasting consultancy to forecast 1000 variables - but most of the payment is in the form of an Accuracy Agreement.
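Improvement 3 leaves the averaging scheme open. One natural choice is to weight each forecast snapshot by how long it was in force; a sketch under that assumption:

```python
import math

def time_weighted_score(updates, horizon, score):
    """Average a continuously-updated forecast's score, weighting each
    snapshot by the time it was held. `updates` is a time-sorted list of
    (time, forecast) pairs starting at time 0; `score(forecast)` scores
    one snapshot against the realized outcome. The weighting scheme is
    an assumption, not specified in the post."""
    total = 0.0
    for (t0, f), (t1, _) in zip(updates, updates[1:] + [(horizon, None)]):
        total += (t1 - t0) * score(f)
    return total / horizon

# A holder who improves a binary forecast from 60% to 90% halfway through
# (outcome: yes), scored with the natural-log score:
avg = time_weighted_score([(0.0, 0.6), (0.5, 0.9)], 1.0, lambda p: math.log(p))
```

This weighting means a buyer gains by updating early, which matches the intent of rewarding sustained accuracy rather than a single last-minute forecast.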

Powering Accuracy Agreements with Prediction Markets

Would information purchasers get a better return by subsidizing Prediction Markets, than they would by selling Accuracy Agreements?

I think this difference should be bounded. If/when it is the case that Prediction Markets are more efficient than “Accuracy Agreements”, then the owner of the Accuracy Agreement could just set up Prediction Markets once they own the agreement. This owner could subsidize the market in whatever way would do the best job.

For example, say a purchaser sees an agreement that could make them $50,000 profit without a Prediction Market, or $100,000 if they spend $20,000 on setting up and subsidizing a corresponding Prediction Market. They might bid $60,000 on the contract, and if they get it, then spend $10,000 on the market.

This setup would help encourage both good Prediction Markets, and encourage actors who are savvy about properly setting up and subsidizing said markets.

Bottlenecks

The first bottleneck to actually selling Accuracy Agreements is the legality. It might be tricky to do in a way that’s not considered gambling.

The second bottleneck is finding buyers. Accuracy Agreements will be somewhat complicated assets. You’d need sophisticated estimation agencies with both the competence to do the estimation and the flexibility to take on the necessary financial risk.

The third challenge is just the infrastructural and logistical expenses. Setting up software to make this process smooth, and making sure that questions get resolved by third parties, would require some overhead, especially early on.

Next Steps

To make progress, I’d like to see this sort of setup be tested in simple ways. First, talk with lawyers to best maneuver around the first bottleneck. If any readers of this have insight here, please leave comments!

Second, experiment with small contracts (maybe <$5k) and small consultancies. This should be achievable by a small team with a decent reputation among potential estimation consultancies. I could see QURI doing this in the future, but I would be happy to see other groups do so first. Early contracts don’t need to allow for themselves to be sold; they can be simple agreements with individual consultancies.

One nice property of these contracts is that they should be able to scale gracefully, in that you could do them on small scales or large scales. So, start small, and as we gain experience, we’ll be able to better scale these setups with larger sums and more diverse Accuracy Agreement purchasers.

 

Thanks to Javier Prieto and Nuño Sempere for feedback

Comments



Based on reading through this, it seems promising - the question is whether it's worth someone overcoming the startup costs / bottlenecks to run a pilot, and I'm not sure it is unless there is a specific promising use case that makes this seem much better than current approaches. Any ideas for specific markets where this is valuable?

Some ideas:

1. CEA / Open Phil wants to contract [Metaculus/Manifold/Good Judgement/Epoch] to come up with forecasts for 100 continuous questions regarding clean meat. They just care that the forecasts are good, they don't care much for which specific methods are used. They also would like to see new forecasting players join the fray and compete for cost-effective work.

They put out an open call for proposals, and say that much of the money will come in the form of a simple Accuracy Agreement (one that perhaps can't be re-sold, for legal simplicity reasons, for now).

2. A government agency wants to outsource "Predictions of construction costs, for a megaproject". This set of predictions will be very expensive to produce and update. Instead of paying consultants in the traditional way, they organize much of this as an Accuracy Agreement. 

3. In 2030, in a hypothetical future world, there are many big forecasting players and purchasers. Complicated forecasting Type Specifications are routinely created for things like, "For every US regional area, and every point in time for the next 5 years, how many of each disease will be monitored/registered in that area?". Forecasting purchasers can expect that a public bidding market will return a reasonable result, similar to some current large-project bidding procedures.

 

I see this approach as a more complex but powerful alternative to the model of buyers now just contracting with specific firms like Metaculus or Good Judgement, which has started to happen in the last few years (for judgemental questions). 

One weakness of these Agreements is that they require the client (or a third party) to ensure that questions are written and scored, rather than leaving that work to the consultant.

This is an issue Prediction Markets share, but one that existing forecasting contracts often avoid: those contracts typically have the forecasting contractors do the work of question specification and resolution.

So, Accuracy Agreements are probably in-between Prediction Markets and current contractor agreements, in complexity.

For big government construction projects, I believe some firms/agencies will do a lot of preparation and outlining, before a bidding process might begin. Getting things specific enough for a large bidding process is itself a fair bit of work. This can be useful for large projects, or in cases where the public has little trust in the key decision makers, but is probably cost-prohibitive for other situations.

I'm not sure about this, but there is a possibility that this sort of model violates US online gambling laws. (These laws, along with those against unregulated trading of securities, are the primary obstacles to prediction markets in the US.) IIRC, you can get into trouble with these rules if there is a payout on the outcome of a single event, which seems like it would be the case here. There's definite gray area, but before setting up such a thing one would definitely want to get some legal clarity.

I agree, the legal aspect is my main concern, doubly so if people can exchange/sell these agreements later on.
