The following article discusses the results of Hewlett-Packard’s trials with prediction markets in the late 1990s. I’m posting my comments as a review and critique of this paper.
Information Aggregation Mechanisms: Concept, Design and Implementation for a Sales Forecasting Problem, by Kay-Yut Chen & Charles R. Plott.
At the outset, I’d like to commend the authors for publishing their data. Even though these markets were run more than a decade ago, there have been virtually no other published results to date. Unless we are able to review actual case studies of real prediction markets, the future of the prediction market “industry” will be bleak (no prediction market is necessary to reach this conclusion). If I appear to be overly critical of some of the authors’ conclusions and methodology, I apologize. My intent is to point out areas in which prediction markets may be improved for use in a corporate setting.
In this paper, the authors report on the results of HP’s internal prediction markets to forecast sales. The 12 prediction markets were run between October 1996 and May 1999. Their goal was to take prediction markets (Information Aggregation Mechanisms) out of the laboratory and into the field, to see how they work in a practical setting. Most markets attempted to forecast monthly sales of particular products, three months in advance.
To be fair, the design and implementation of these markets was constrained by management. Each market was open for one week only, and for a limited period each day. The number of active participants generally ranged from 12 to 24, though one market had only seven. Even the authors acknowledge that these markets could only be described as “thin”. While the participants had access to HP databases, they did not have access to the official HP forecasts (where such forecasts existed).
The markets were not operated continuously up to the start of the outcome month (or even during that month). This was unfortunate, as we might have learned more about how well (or not) prediction markets incorporate new information to revise market predictions.
Most likely as a result of the market thinness (and the double-auction market mechanism), the market prices for the set of potential outcome ranges did not sum to the market payoff (as they should), and the market prices were not “stable”. This says a lot about the need for a sufficient number of participants (however many that might be). It also suggests that we may need some form of market scoring rule, or a dynamic pari-mutuel mechanism, to at least ensure that the implied probabilities sum to the payoff.
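To illustrate the point about scoring rules: under a logarithmic market scoring rule (Hanson’s LMSR), the implied probabilities are a softmax of the outstanding quantities, so they sum to one by construction. This is a minimal sketch of my own, not anything from the paper; the quantities and the liquidity parameter are arbitrary.

```python
import math

def lmsr_prices(quantities, b=100.0):
    """Implied outcome probabilities under a logarithmic market scoring
    rule (LMSR) with liquidity parameter b.  Prices are a softmax of the
    outstanding quantities, so they sum to 1 by construction -- unlike
    raw double-auction prices in a thin market."""
    exps = [math.exp(q / b) for q in quantities]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical outcome ranges with arbitrary outstanding quantities:
prices = lmsr_prices([120, 80, 95, 60, 30, 10, 5, 0, 0, 0])
print(round(sum(prices), 10))  # -> 1.0
```

Whatever the trading activity, the sum of these prices is exactly one, which sidesteps the normalization problem the HP markets exhibited.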
The authors conclude that the results indicate that the HP prediction market is “a considerable improvement over the HP official forecast.” Basically, they’re saying that, because the prediction market error was smaller than the error of the official HP forecast in 6 out of 8 events, the prediction market outperforms the HP official forecast. That is true as far as it goes, but we need to take a closer look at the data.
In virtually every case, the prediction market forecast is closer to the official HP forecast than it is to the actual outcome. Perhaps these markets are better at forecasting the forecast than they are at forecasting the outcome! Looking further into the results, while most of the predictions have a smaller error than the HP official forecasts, the differences are, in most cases, quite small. For example, in Event 3, the HP forecast error was 59.549% vs. 53.333% for the prediction market. They’re both really poor forecasts. To the decision-maker, the difference between these forecasts is not material.
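For concreteness, the percentage errors quoted above appear to be absolute errors relative to the actual outcome. A minimal sketch of that calculation, using illustrative numbers of my own (not the paper’s data):

```python
def abs_pct_error(forecast, actual):
    """Absolute percentage error of a forecast relative to the actual
    outcome."""
    return abs(forecast - actual) / actual * 100

# Illustrative numbers only, chosen to mimic the Event 3 situation:
official, market, actual = 160.0, 153.0, 100.0
print(abs_pct_error(official, actual))  # -> 60.0
print(abs_pct_error(market, actual))    # -> 53.0
```

Both errors are enormous; the seven-point gap between them is exactly the kind of difference a decision-maker would shrug at.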
There were eight markets that had HP official forecasts. In four of these (50%), the official forecast error exceeded 25%. Although only three of the prediction market forecast errors exceeded 25%, this is hardly a ringing endorsement of prediction market accuracy (at least in this study).
Without doing the math, it appears that there is a stronger correlation between the predictions and the HP official forecasts than there is between the predictions and the actual outcomes. But, to make the case for prediction market accuracy, the correlation has to be significant with respect to the actual outcome. It was noted in the study that, in several cases, there was evidence to suggest that the official forecasts were based, in part, on information gleaned from the prediction market exercise. Perhaps this explains the correlation with the HP official forecasts. It appears that many of the participants were also involved in setting the official forecasts. To the extent that they may have dominated the trading in the prediction markets, it is not surprising that the predictions would be closer to the official estimates than they would be to the actual outcomes.
Interestingly, rather than using all of the trades to form the market’s forecast, the authors chose to compute several forecasts from only the last 40%, 50% and 60% of the trades. They argue that the latest trades are more likely to be at or near the equilibrium. Yet, one of their observations is that there were no significant trends in trading (they looked at each 10% of the trades). They speculate that the market quickly aggregates a prediction, with subsequent trading moving the prediction around the equilibrium. If this is true, it makes little sense to exclude any of the trades from the determination of the prediction. Arguing from first principles, we would never want to exclude any trades, because doing so interferes with the offsetting of trading errors. Excluding trades means excluding the information attached to those trades, which runs counter to the theory behind prediction markets.
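The authors’ forecast rule can be sketched as follows. The trade data are hypothetical, and I assume a simple mean of traded values over the chosen window; the paper’s exact weighting may differ.

```python
def forecast_from_trades(trades, last_fraction=1.0):
    """Mean traded value over the last `last_fraction` of trades, taken
    in chronological order.  last_fraction=1.0 uses every trade; the
    paper's variants correspond to 0.4, 0.5 and 0.6."""
    n = len(trades)
    start = n - max(1, int(round(n * last_fraction)))
    window = trades[start:]
    return sum(window) / len(window)

# Hypothetical chronological trade values:
trades = [10, 12, 11, 13, 12, 12, 13, 12, 12, 12]
print(forecast_from_trades(trades, 0.4))  # mean of the last 4 trades
print(forecast_from_trades(trades, 1.0))  # mean of all 10 trades
```

If trading really does just oscillate around an equilibrium, the `last_fraction=1.0` estimate averages over more of that noise, which is the point made above.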
Though the prediction market results were “better” than the HP forecasts, some markets were better than others. It would have been nice to know why this happened. To be useful, prediction markets will have to be consistently better performers than other forecasting methods. From this study, we aren’t able to make this conclusion. Unfortunately, the authors don’t delve into this issue.
Perhaps the sleeper conclusion is result 2: The probability distributions calculated from market prices are consistent with (those for the) actual outcomes. This is truly useful information. It gives us a measure of uncertainty or risk. Traditional forecasting methods do not provide this information (at least not objectively). Decision-makers can use this information to focus their efforts more wisely to reduce the uncertainty or more fully develop contingency plans where the uncertainty is greatest.
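A sketch of how a distribution, and with it a dispersion measure, might be read off a set of outcome-range prices. The bins and prices here are hypothetical, and I normalize the prices first because, as noted earlier, raw prices in these markets did not sum to the payoff.

```python
def implied_distribution(bin_midpoints, prices):
    """Normalize contract prices into a probability distribution over
    outcome bins, then return (probs, mean, standard deviation)."""
    total = sum(prices)
    probs = [p / total for p in prices]
    mean = sum(m * p for m, p in zip(bin_midpoints, probs))
    var = sum(p * (m - mean) ** 2 for m, p in zip(bin_midpoints, probs))
    return probs, mean, var ** 0.5

# Hypothetical sales bins (unit midpoints) and observed contract prices:
mids = [1500, 2500, 3500, 4500, 5500]
prices = [5, 20, 40, 25, 10]
probs, mean, sd = implied_distribution(mids, prices)
print(mean)  # -> 3650.0
```

The standard deviation is the objective measure of uncertainty that traditional point forecasts fail to supply.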
When I look at the graphs of the distributions, they appear to be fairly widely dispersed, rather than tightly focused around the mean. I’m guessing that the relatively small number of participants and the short trading period had something to do with this. It would have been nice to experiment with longer trading periods and greater numbers of participants to see whether this would have reduced the variance around the mean. It would also have been useful to keep these markets open, so that we could see how the distributions changed as they got closer to the outcome being revealed. After all, one of the major benefits of prediction markets is that they are able to dynamically update predictions.
Result 3 is valuable as well. They argue that the prediction markets were particularly good at predicting whether the actual outcome would land above or below the HP official forecast. They looked at the direction in which the distributions of predicted outcomes were skewed to predict whether the actual outcome would be higher or lower than the HP official forecast. It worked: in all cases they were able to make the correct prediction. Given that the official forecasts were usually wrong (as is the case with most forecasts), knowing whether the actual outcome is going to be higher or lower than the official forecast reduces the error (uncertainty) by at least 50%. There might be something to this analysis, at least for HP’s forecasting. It would be interesting to see whether it holds up with other prediction market results. Too bad no one seems to be looking at this.
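The direction rule from Result 3 can be sketched like this. The paper reads the skew of the market distribution; comparing the market-implied mean to the official number, as below, is a simplified stand-in of my own, and the inputs are hypothetical.

```python
def direction_vs_official(bin_midpoints, prices, official):
    """Predict whether the actual outcome will land above or below the
    official forecast by comparing the market-implied mean to it.
    (A simplified proxy for the paper's skew-based rule.)"""
    total = sum(prices)
    implied_mean = sum(m * p / total for m, p in zip(bin_midpoints, prices))
    return "above" if implied_mean > official else "below"

# Hypothetical market leaning above an official forecast of 3,000 units:
print(direction_vs_official([1500, 2500, 3500, 4500, 5500],
                            [5, 20, 40, 25, 10], 3000))  # -> above
```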
My Conclusions (so far)
Run a lot of prediction markets, using a variety of participant pool sizes, to determine the effects on liquidity, prediction distributions, accuracy and speed of prediction. We need more than a sample of 12 prediction markets. We need more than 7–24 participants in each market. Keep the markets running after the initial prediction is determined, so that we can see how the market incorporates new information and how much more accurate the prediction becomes. Perform more detailed post-mortem analyses. We need to know why the participants made their trading decisions. We need to know when the market has reached an equilibrium.
Run prediction markets on lots of different things. We need to figure out why some markets are more predictable than others.