Prediction markets have been promoted as the best thing since “sliced bread” for forecasting future outcomes and events. The truth is that the case has not been made to justify this position. Today’s post will examine the necessary prerequisite for adopting prediction markets, and build a case for the seemingly incongruous conclusion that more prediction markets need to be put into practice now.
I have been very interested in the potential of prediction markets to accurately predict future events and outcomes, but I have been equally frustrated that not only is there very little proof that they work “as advertised”, very few researchers are even looking at the issue of accuracy. It is as if the vendors and leading academic proponents simply repeat (over and over) a few past “success” stories, quickly conclude that prediction markets work (and if one works, they all work), and proceed to describe their newest application.
Robin Hanson and others have advocated the use of prediction markets, where they can be shown to be better than alternative methods of forecasting future outcomes. It is hard to take exception to this statement, other than to question how one might implement it. That is what prompted today’s post.
In order to be considered “better” than an alternative forecasting method, a prediction market must generate a marginal net benefit by forecasting more accurately than the next best alternative method. This implies that a more accurate prediction causes the decision-maker to choose a different course of action than the one that would have been chosen had a less accurate prediction (or forecasting) model been relied upon. Not only that, the better course of action must generate a net benefit.
So, the prediction must be materially more accurate than the alternative forecasts. That is, the improvement in accuracy must be large enough to cause the decision-maker to change his or her decision. The decision-maker must be able to choose a more beneficial course of action (it must exist as a possible action). Finally, there must be sufficient time to implement the better course of action. Most of the real-world prediction markets have been unable to meet these conditions. The HP markets showed that there was some potential for prediction markets, but none of the pilot markets generated materially more accurate predictions than the official forecasts. The General Mills prediction markets, using much larger crowds than the HP markets, were no better than the internal forecasts, either.
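To make these conditions concrete, here is a minimal Python sketch. Everything in it is hypothetical and made up for illustration: the function names, the 1,000-unit threshold, the payoff figures and the market cost do not come from the HP or General Mills markets. It only shows the logic: a market forecast has value only if it changes the decision, the change can be implemented in time, and the new action pays off by more than the market costs to run.

```python
def net_benefit_of_market(market_forecast, baseline_forecast, actual,
                          choose_action, payoff,
                          market_cost=0.0, can_act_in_time=True):
    """Net benefit of acting on the market forecast instead of the baseline.

    choose_action(forecast) -> action   (hypothetical decision rule)
    payoff(action, actual)  -> number   (hypothetical payoff function)
    """
    action_base = choose_action(baseline_forecast)
    action_market = choose_action(market_forecast)
    # If the decision does not change, or there is no time to change it,
    # the market contributes nothing except its own cost.
    if not can_act_in_time or action_market == action_base:
        return -market_cost
    return payoff(action_market, actual) - payoff(action_base, actual) - market_cost


# Hypothetical decision: add a production shift if forecast sales reach 1,000 units.
def choose_action(forecast_units):
    return "add_shift" if forecast_units >= 1000 else "normal"


# Hypothetical payoffs, in arbitrary units.
def payoff(action, actual_units):
    if action == "add_shift":
        return 50 if actual_units >= 1000 else -30
    return 10


# Both forecasts lead to the same action: the market only costs money.
same_decision = net_benefit_of_market(1050, 1020, 1100, choose_action, payoff, market_cost=5)

# The market forecast crosses the threshold and the baseline does not.
changed_decision = net_benefit_of_market(1050, 950, 1100, choose_action, payoff, market_cost=5)

# A more accurate forecast that arrives too late to act on.
too_late = net_benefit_of_market(1050, 950, 1100, choose_action, payoff,
                                 market_cost=5, can_act_in_time=False)

print(same_decision, changed_decision, too_late)
```

In the first and third cases the market forecast is closer to the actual outcome than the baseline, yet the net benefit is negative. Accuracy alone is not enough.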
It is very questionable, at this point, whether it is possible to achieve accurate predictions from markets sufficiently far in advance to implement more beneficial courses of action. There are very few long-term prediction markets, and even these are wildly inaccurate until very close to the actual outcome revelation. Operating a long-term prediction market is pointless unless it is possible to take some beneficial action based on an accurate prediction. One can only imagine the harm that could be caused by basing public policy on an early prediction of a long-term market, only to find that the policy was completely inappropriate. Until their advocates work out this little problem with long-term prediction market accuracy, these markets should never be used to support any important decisions.
Most of the real-world prediction markets are very short-term in scope. Even when they provide accurate predictions, in most cases, it is almost impossible for the decision-maker to make any significant changes to the course of action. We can see this in the General Mills prediction markets (follow link above), where the markets only arrive at accurate predictions during the second month of a two-month sales forecasting problem. Not actionable. Hardly useful information.
One exception to this general observation is the case of markets to predict project milestone completion dates. The reason that these markets offer some promise is that decision-makers can use this information profitably on a daily basis.
So, how do we determine whether a prediction market is accurate? David Pennock helps us out by stating, “the truth is that the calibration test is a necessary test of prediction accuracy.” As he comments, this is a necessary condition for statistically independent events. The problem with this definition is that calibration is impossible to prove. The best we can do is estimate it empirically, by comparing a large number of similar prediction market predictions with the distribution of similar outcomes. To date, no one has researched the calibration of specific prediction markets in any useful way. True, there have been studies of horse race betting markets that have shown a very strong calibration with actual horse race outcomes, but this only proves calibration of these types of pari-mutuel markets. Such results indicate that it may be possible to obtain well-calibrated prediction markets, but they certainly do not prove that such markets are, in fact, calibrated.
For more information about this, please refer to my previous post on calibration, here.
Why does calibration matter?
As the number of uncertain future outcomes (or events) grows, they form a distribution, which provides us with the likelihood of each outcome occurring. If we knew the distribution of actual outcomes before one occurred, we could make an optimal decision. We would choose to base the decision on the most likely outcome. This does not mean that we would always be right. In fact, if we were to make this decision a number of times, we would only expect to be “right” about the same number of times as the likelihood of that outcome occurring would suggest. But this is a hypothetical example where we know the actual distribution of the outcomes. In order to make an optimal decision in the real world, we would like to find a method of estimating the distribution of actual outcomes. The better the estimate, the better the decision-making result.
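A small simulation illustrates the point about being “right” only as often as the likelihood suggests. This is a sketch under stated assumptions: the three outcomes and their probabilities are invented, and we pretend that we know the true distribution, which is exactly what we cannot do in practice.

```python
import random


def simulate_hit_rate(dist, n_trials=100_000, seed=0):
    """Always act on the most likely outcome and count how often it occurs.

    dist maps each outcome to its (assumed known) true probability.
    """
    rng = random.Random(seed)
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    choice = max(dist, key=dist.get)  # the most likely outcome
    draws = rng.choices(outcomes, weights=weights, k=n_trials)
    hit_rate = sum(d == choice for d in draws) / n_trials
    return choice, hit_rate


# Hypothetical true distribution of a sales outcome.
true_dist = {"low": 0.2, "medium": 0.5, "high": 0.3}
choice, hit_rate = simulate_hit_rate(true_dist)
print(choice, round(hit_rate, 3))
```

Even with perfect knowledge of the distribution, the decision based on “medium” is right only about half the time, which is the best that any forecasting method could do for this particular outcome.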
Some situations involve outcomes that are discrete and have no relationship between the alternatives. Examples might include the selection of a future Olympic host city, the winner of a horse race, or who will win a contest. Decisions involving these types of problems require a very high percentage of correct predictions, in order to be useful. Since there is no relationship between the possible outcomes, it is not possible to “just miss” and be “almost right”. Coming close is no good at all. We’re still dealing with a distribution of outcomes, and we will still base our decision on the most likely outcome, but unless one of the possible outcomes has a high likelihood of occurrence, we are likely to be wrong more often than we are right, even when the prediction distribution is accurate. The higher the likelihood of one outcome occurring, the less uncertainty there is about the outcome.
Such discrete outcome situations are problematic for prediction markets. The only way to minimize the percentage of incorrect decisions is to predict outcomes that have very little uncertainty associated with them. If one of the outcomes is a near “sure thing”, we don’t need a prediction market to figure this out! One potential use of prediction markets for these types of problems is to provide a ranking of the possible outcomes. The decision-maker would make a decision based on the most likely outcome and develop contingency plans for other reasonably likely possible outcomes.
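The ranking idea can be sketched in a few lines of Python. The candidate names, the probabilities and the 15% contingency threshold are all assumptions chosen for illustration; a real decision-maker would set the threshold according to the cost of being unprepared.

```python
def rank_outcomes(dist, contingency_threshold=0.15):
    """Rank discrete outcomes by predicted probability.

    Returns the primary outcome, its probability, and the other outcomes
    likely enough (by a hypothetical threshold) to deserve a contingency plan.
    """
    ranked = sorted(dist.items(), key=lambda item: item[1], reverse=True)
    primary, p_primary = ranked[0]
    contingencies = [o for o, p in ranked[1:] if p >= contingency_threshold]
    return primary, p_primary, contingencies


# Hypothetical market prices for a host-city selection.
market = {"City A": 0.40, "City B": 0.35, "City C": 0.20, "City D": 0.05}
primary, p_primary, contingencies = rank_outcomes(market)

print("Plan for:", primary)
print("Chance the primary plan is wrong:", round(1 - p_primary, 2))
print("Contingency plans for:", contingencies)
```

Note that even if these probabilities were perfectly calibrated, the primary plan would be wrong 60% of the time. The ranking is what carries the useful information here, not the single top pick.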
Many outcomes are points along a continuous variable, such as dates (on a time line) or sales volumes (part of all possible sales volumes). In these types of situations, making decisions based on a reasonable range surrounding the most likely outcomes may be quite acceptable. It depends on the tightness of the distribution and the sensitivity of the decision to the outcome being relied upon. That is, if the decision would not change when the outcome falls within a certain range, and the outcome can be expected to fall within this range a high percentage of the time, the risk of a “wrong” decision will be minimal.
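As a rough illustration, suppose (and this is my assumption, not a property of any real market) that the predicted sales volume follows a normal distribution, and that the decision would not change as long as actual sales land between 900 and 1,100 units. The chance of a “wrong” decision then depends on how tight the distribution is:

```python
from statistics import NormalDist


def prob_within_range(mean, std_dev, low, high):
    """Probability that the outcome falls inside the range where the
    decision stays the same, assuming a normal distribution."""
    dist = NormalDist(mu=mean, sigma=std_dev)
    return dist.cdf(high) - dist.cdf(low)


# Hypothetical figures: same mean prediction, different tightness.
tight = prob_within_range(mean=1000, std_dev=50, low=900, high=1100)
wide = prob_within_range(mean=1000, std_dev=150, low=900, high=1100)

print(f"tight distribution: {tight:.2f}")
print(f"wide distribution:  {wide:.2f}")
```

With the tight distribution the outcome stays inside the “safe” range roughly 95% of the time; with the wide one it does so only about half the time, and the same mean prediction becomes a much riskier basis for the decision.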
The closer the distribution of predictions matches that of the actual outcomes, the more often the prediction market will provide an accurate prediction of the actual outcome. This is not to say that the prediction market will always be correct. It only says that it has the greatest chance of being correct most often. Consequently, over a large number of trials, a well-calibrated prediction market will generate the best overall results from decisions that rely on the market predictions.
A prediction market provides a distribution of predictions around a mean market prediction. Most decisions would be made based upon the mean market prediction. If the market is calibrated with the distribution of actual outcomes, this will maximize the number of occasions that the decision will be correct, based on the actual outcome. Furthermore, in non-discrete outcome cases, outcomes close to the predicted one will be the next most likely to occur. Coming close may be good enough.
Comparing Forecasting Methods
Our original problem was to determine whether a prediction market is better than another method in forecasting an outcome. Now that we know a bit about distributions and calibration, we can proceed.
Most forecasting methods provide subjective distributions of forecasts, if they provide any at all. Prediction markets offer a significant improvement over other forecasting methods, by providing an objective distribution of predictions, which can be compared with the distribution of actual outcomes. This gives us the possibility of measuring the calibration accuracy of a prediction market, if we can obtain enough data points to consider. At least it is possible. Most other methods can create a rough distribution of possible outcomes which may be tested for calibration. A good example is a sales forecast with “worst case”, “most likely” and “best case” scenarios. Likelihoods would be applied (subjectively) to create a rough distribution of possible outcomes.
Next, we need a fairly large number of trials. This is a problem for almost every type of prediction market we may wish to consider. Technically, each outcome or event is unique. We can’t obtain a large number of trials for a particular outcome. However, maybe we can obtain a larger number of trials for a set of homogeneous prediction markets and outcomes. Ideally, each prediction market should have approximately the same “crowd” of participants and be attempting to predict the same type of variable outcome, such as quarterly sales of a product. Another crowd could predict project completion dates, etc…
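In practice, this means keeping a record of every completed market, tagged by crowd and by type of outcome, and only attempting a calibration estimate for a group once it has enough trials. A minimal sketch follows; the record fields, the group names and the minimum number of trials are all hypothetical.

```python
from collections import Counter


def trials_per_group(markets):
    """Count completed markets for each (crowd, outcome type) pair."""
    return Counter((m["crowd"], m["outcome_type"]) for m in markets)


def groups_ready(markets, min_trials=30):
    """Flag which groups have enough trials to attempt a calibration estimate.

    min_trials is an arbitrary illustrative cut-off, not an established rule.
    """
    counts = trials_per_group(markets)
    return {group: n >= min_trials for group, n in counts.items()}


# Hypothetical log of completed markets.
completed = [
    {"crowd": "sales_team", "outcome_type": "quarterly_sales"},
    {"crowd": "sales_team", "outcome_type": "quarterly_sales"},
    {"crowd": "engineering", "outcome_type": "milestone_date"},
]

counts = trials_per_group(completed)
ready = groups_ready(completed, min_trials=2)
print(counts)
print(ready)
```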
After a reasonable number of trials, we would measure how well the distribution of predictions matched the distribution of actual outcomes. That is, across all of the prediction markets, prediction ranges that had, say, a 10% probability of occurrence should capture the actual outcome 10% of the time. If this is true for all (or most) of the prediction probabilities, we can conclude that type of prediction market is “well-calibrated” and may be used for future predictions of that type, using that “crowd” of participants. Of course, we would also measure the calibration of the distributions (however crude) from the alternative methods. Whichever method consistently develops the best-calibrated distribution of predictions should be the primary information model for that particular type of decision-making. This doesn’t necessarily mean that you can drop all of the other forecasting methods. These other methods may be generating the information that is being aggregated by the prediction market. If we were to eliminate the source of critical information, the prediction market may not be as accurate. In both the HP and the General Mills markets, some or all of the prediction market participants were also part of the internal forecasting process. At HP, it appears that the markets were better predictors of the internal forecast than they were of the actual outcome.
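The measurement itself is simple to sketch. In the Python below, each trial is a pair: the probability the method assigned to a prediction range, and whether the actual outcome landed in that range. Trials are grouped into probability bins, and the stated probability in each bin is compared with the observed frequency. The summary figure (a trial-weighted average of the gaps) is just one possible way of scoring calibration and is my choice for illustration, as is all of the sample data.

```python
from collections import defaultdict


def calibration_table(trials, n_bins=10):
    """Group (stated_probability, captured) trials into probability bins.

    Returns rows of (mean stated probability, observed frequency, trial count).
    """
    bins = defaultdict(lambda: [0, 0, 0.0])  # count, hits, sum of stated probs
    for prob, captured in trials:
        idx = min(int(prob * n_bins), n_bins - 1)
        bins[idx][0] += 1
        bins[idx][1] += int(captured)
        bins[idx][2] += prob
    return [(total / count, hits / count, count)
            for count, hits, total in (bins[i] for i in sorted(bins))]


def calibration_error(table):
    """Trial-weighted mean gap between stated and observed frequencies."""
    n = sum(count for _, _, count in table)
    return sum(count * abs(stated - observed)
               for stated, observed, count in table) / n


# Invented data: ranges given a 10% chance captured the outcome 1 time in 10,
# and ranges given a 70% chance captured it 7 times in 10.
market_trials = ([(0.1, True)] + [(0.1, False)] * 9
                 + [(0.7, True)] * 7 + [(0.7, False)] * 3)

# Invented alternative method: 10% ranges hit 3 in 10, 70% ranges hit 5 in 10.
alternative_trials = ([(0.1, True)] * 3 + [(0.1, False)] * 7
                      + [(0.7, True)] * 5 + [(0.7, False)] * 5)

market_error = calibration_error(calibration_table(market_trials))
alternative_error = calibration_error(calibration_table(alternative_trials))
print(f"market: {market_error:.2f}, alternative: {alternative_error:.2f}")
```

In this made-up example the market is perfectly calibrated and the alternative method is off by 20 percentage points on average, so the market would become the primary information model for this type of decision. With only ten trials per bin, of course, neither figure would be reliable; that is the whole problem.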
Every “crowd” is different, and each type of outcome has unique information required to make a reasonable prediction. Consequently, it would be ridiculous to assume that, because one prediction market is considered accurate, all prediction markets are accurate. Yet, this is exactly what we are told on vendor web sites, and worse, by academic researchers. It can probably be taken as a “fact” that horse race pari-mutuel markets are well-calibrated, so it is not surprising that we find almost everyone assuming that these markets are accurate. Add a tie-in about how similar pari-mutuel markets are to prediction markets, and we’re halfway home.
A few prediction market successes in political election markets and one “success” in enterprise prediction markets are trumpeted, in just about every academic paper on prediction markets, as evidence that prediction markets are “more accurate” than alternative forecasting methods. On the basis of a mere handful of prediction market success stories, they conclude that prediction markets are the future of forecasting. This is simply wishful thinking and leads one to question the motives of those who continue to promote a model that they know (or ought to know) is not nearly as accurate or useful as they claim and has precious little proof that it works for each type of promoted application. The worst part about this is that the research has slowed to a trickle. There seems to be no need to prove that prediction markets work. It has already been done. Now it is all about getting an application on the market.
By now you may be thinking this guy really has it in for prediction markets. They’re nothing but high-tech “snake oil” and the sooner these defective products are removed from the market the better. Fair enough. I do think that the vast majority of prediction markets could be categorized as “snake oil”. Completely unproven. However, I do think they have some potential to improve decision-making in enterprise applications.
Since the only way to determine the accuracy of a prediction market is to determine how well its predictions are calibrated with the distribution of actual outcomes, we need to focus on calibration. The only way to measure calibration is empirically. Since this will require as many trials as possible, I am actually going to advocate that the use of prediction markets be promoted even though there are few benefits right now. As they are promoted, the clients must be told that they aren’t proven, yet, but that there is a possibility that they will develop into very useful tools in the future.
Since calibration is not a characteristic of prediction markets in general, we need to assess calibration for each type of market and for each “crowd”. That is an awful lot of work, but without it, prediction markets are nothing more than a crap shoot.