Pseudonymous accounts, commonly found on prediction platforms and forums, provide individuals with the opportunity to share their forecasts and predictions without revealing their true identity. While this anonymity can protect the privacy and security of users, it also creates an environment ripe for manipulation: a pseudonymous account can make predictions that support opposing outcomes on different platforms, or even within the same community, without the inconsistency being traceable back to a single person.
One tactic employed by pseudonymous accounts to deceive followers is uncorrelated betting: placing bets or predictions that cover multiple outcomes across accounts or platforms. By doing so, these accounts increase the probability of being correct on at least one prediction. For example, if an account predicts that AI takeoff will be fast on one platform and slow on another, it is essentially hedging its bets, ensuring it can claim accuracy regardless of the actual outcome. This strategy allows the account to maintain an illusion of expertise while minimizing the risk of being proven wrong. Even the financial costs of betting can be compensated for, given the grant and employment opportunities offered to successful forecasters.
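As a rough illustration (the account and question counts here are invented, not drawn from any real platform), a small simulation shows how the best-looking of a handful of sock-puppet accounts can appear accurate purely by chance when only a few contested questions are in play:

```python
import random

# Hypothetical setup: a manipulator runs several pseudonymous accounts and
# hedges across a small number of contested binary questions, then points
# to whichever account happens to look best after the fact.
NUM_QUESTIONS = 4    # contested questions being hedged (made-up number)
NUM_ACCOUNTS = 8     # sock-puppet accounts (made-up number)
TRIALS = 10_000

def best_account_accuracy(num_questions, num_accounts, rng):
    """Accuracy of the best-looking account when every account guesses at random."""
    outcomes = [rng.random() < 0.5 for _ in range(num_questions)]
    best = 0
    for _ in range(num_accounts):
        guesses = [rng.random() < 0.5 for _ in range(num_questions)]
        correct = sum(g == o for g, o in zip(guesses, outcomes))
        best = max(best, correct)
    return best / num_questions

rng = random.Random(0)
avg_best = sum(best_account_accuracy(NUM_QUESTIONS, NUM_ACCOUNTS, rng)
               for _ in range(TRIALS)) / TRIALS
print(f"Average accuracy of the best of {NUM_ACCOUNTS} accounts "
      f"on {NUM_QUESTIONS} questions: {avg_best:.0%}")
```

With only four contested questions, the best of eight random accounts typically looks close to perfect, which is the whole point of the tactic.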
Another deceptive practice seen with pseudonymous accounts is selective disclosure. This means that individuals only reveal their identity when their predictions have been accurate or appear to be favorably aligned with the actual outcome. By withholding information about incorrect forecasts, these accounts create an inflated perception of their success rate and erode the reliability of their overall track record. Such selective disclosure can mislead followers into believing that the account possesses a higher level of accuracy than it genuinely does.
Relying on the track records of pseudonymous accounts can have significant consequences. Strategists and funders may make decisions based on inaccurate information, reducing their impact. Individuals seeking guidance on effective charities might be misled into making donations that are doomed to fail.
While pseudonymous accounts can provide a platform for diverse opinions and insights, it is crucial to approach any purported track record with skepticism. The ability to bet both ways, to hedge across multiple uncorrelated bets, and to selectively disclose favorable outcomes can create a distorted perception of accuracy.
I think you point to some potential for scepticism, but I don't think this is convincing. Selective disclosure is unlikely to be a problem where a user can only point to summary statistics for their whole activity, like on Metaculus. An exception might be if only a subset of stats were presented, like a ranking over the past 3/6/12 months without giving Brier scores or figures for other periods. But you could just ask for all the relevant stats.
The uncorrelated betting isn't a problem if you just require a decent volume of questions in the track record. If you want at least 100 binary questions to form a track record, and say 20 of them were hard enough that the malicious user wanted to hedge on them, you'd need 2^20 (over a million) accounts to cover all possible answer sets. Even if they only wanted a perfect record on 10 of those 20, they'd still need 2^10 (1,024) accounts.
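To make the arithmetic concrete (a quick sketch, just using the example question counts above), the number of accounts needed to guarantee a perfect record grows as 2^k in the number of hedged binary questions:

```python
# Sketch of the combinatorics above: to guarantee that at least one account
# has a perfect record on k hedged binary questions, the manipulator needs an
# account for every possible answer set, i.e. 2**k accounts.
for k in (5, 10, 20):
    print(f"{k} hedged binary questions -> {2**k:,} accounts needed")
# 5  hedged binary questions -> 32 accounts needed
# 10 hedged binary questions -> 1,024 accounts needed
# 20 hedged binary questions -> 1,048,576 accounts needed
```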
A more realistic reason for scepticism is that points/ranking on Metaculus are basically a function of activity over time. You can be only a so-so forecaster but have an impressive Metaculus record just by following the crowd on loads of questions or picking probabilities that guarantee points. But Brier scores, especially relative to the community, should reveal this kind of chicanery.
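As a minimal sketch of that check (the probabilities and outcomes below are invented), you can compare a forecaster's Brier score to the community's on the same set of questions; a pure crowd-follower's relative score will sit at roughly zero, while a genuinely better forecaster's will be clearly negative:

```python
# Brier score = mean squared error between forecast probability and the 0/1
# outcome; lower is better. Comparing to the community on the same questions
# strips out the "points from sheer activity" effect.
def brier(forecasts, outcomes):
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(outcomes)

outcomes  = [1, 0, 1, 1, 0]            # resolved question outcomes (made up)
community = [0.7, 0.3, 0.6, 0.8, 0.4]  # community median forecasts (made up)
follower  = [0.7, 0.3, 0.6, 0.8, 0.4]  # a crowd-follower's forecasts
skilled   = [0.9, 0.1, 0.8, 0.9, 0.2]  # a genuinely better forecaster

print("community Brier:", brier(community, outcomes))
print("follower relative to community:", brier(follower, outcomes) - brier(community, outcomes))
print("skilled relative to community: ", brier(skilled, outcomes) - brier(community, outcomes))
```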
The biggest reason for scepticism regarding forecasting as it's used in EA is generalisation across domains. How confident should we be that the forecasters/heuristics/approaches that are good for U.S. political outcomes or Elon Musk activity translate successfully to predicting the future of AI or catastrophic pandemics or whatever? Michael Aird's talk mentions some good reasons why some translation is reasonable to expect, but this is an open and ongoing question.
Good point on the correlated outcomes. I think you’re right that cross-domain performance could be a good measure, especially since performance in a single domain could be driven by having a single foundational prior that turned out to be right, rather than genuine forecasting skill.
On the second point, I’m pretty sure the Metaculus results already compare your Brier to the community's on the same set of questions. So you could base inter-forecaster comparisons on that difference (weakly).