Posted by: Paul Hewitt | June 28, 2012

SCOTUS Prediction Markets Fail

Today, the Supreme Court of the United States handed down its ruling on whether parts of the Affordable Care Act were constitutional.  Various sites had set up prediction markets to try to predict how the SCOTUS would rule.  I looked at a few of them.  Inkling Markets had a few, the Wisconsin School of Business had one (on Inkling), and Intrade had a real-money market.

The short story is that none of the markets truly got the prediction “right”.  I’ll explain, below, what I mean by “right”.  Most of the markets were wildly wrong, in fact.  What went wrong?

Wisconsin School of Business Market

This prediction market was concerned only with the individual mandate (IM) component of the Affordable Care Act.  The options were to delay the decision, rule the IM unconstitutional and not severable from the remainder of the Act, rule the IM unconstitutional and severable, and rule that the IM is constitutional.

This market was unique, because trading ceased after April 27, 2012, two months prior to the outcome.  All other markets that I looked at either traded right up to the outcome or continue to be traded into the future.

This was the only market that got it “right”, in the sense that if we had relied on the market, we would have predicted that the individual mandate would be found constitutional.  Still, with only a 42.27% likelihood, we wouldn’t have been very certain of our decision.

Here are a few interesting observations.  There was very little trading in this market: only 220 trades in total, most of them taking place on April 4, when the market resoundingly favoured the “unconstitutional and not severable” option.  There were a few small trades for the remainder of the trading period, which moved the likelihoods in the right direction.

Usually, prediction markets don’t start to become accurate until 30 days or so before the outcome.  This market appears to have arrived at a more accurate prediction a full two months before the outcome.  I have a theory as to why, which I will share with you, below.  Here is the trading activity and pricing:

Inkling Markets

Inkling runs a public marketplace, which included a similar, play money prediction market.  Here are the results for that market:

This market closed immediately prior to the outcome being revealed.  There were only 68 trades.  Again, very thinly traded.  The participants were way off in their predictions.  Medicaid Expansion was the one area that was iffy, in terms of its constitutionality, and it was predicted to be least likely to be struck down.  This market got everything wrong – way wrong.  Based on this, I’m thinking a bit like George Costanza on Seinfeld.  Do the opposite of everything you would normally do.

What about the trading activity?  Here’s how the market prices moved while the market was open.

When the market launched, there was a flurry of trading which heavily favoured the IM being struck down.  This option increased in likelihood during the last week or so before the outcome.  The likelihood of the Premature Challenge option jumped a couple of weeks prior to the close.

A couple of observations are notable.  Apart from the Premature Challenge bump, the likelihoods for all of the options didn’t change all that much after the initial trades were made.  Obviously, there were very few trades after the initial activity, which kept the likelihoods about the same.  Could this be an indication that there was no new information available on which to trade?  Possibly, and this would be a good thing, because there was no new information about the outcome available to the market participants.  More likely, however, is that traders forgot about the market for a long period of time, until the issue was discussed in the media.

I think the media had a lot to do with the predictions of these markets.  There was no information leaking from the SCOTUS, at all.  There were, however, numerous pundits discussing the issue, and it was a highly politicized conversation.  Even CNN pundits were largely of the opinion that the Individual Mandate would be struck down.  Could it be that the participants relied on the pundits’ “information” in forming their predictions?  I think so.  I also think that predictions were, to a large extent, made based on the outcome the participants wanted to happen.  In other words, the “home team bias” appears to be quite evident in these markets.

The “home team bias” is a bias toward the outcome one wants to occur, not the one that is necessarily likely to occur.  Prediction markets are supposed to eliminate biases, such as this one.  Perhaps political persuasion trumps rational thought.  Lots of evidence of that!

Intrade

Intrade ran the only real money prediction market that I reviewed, yet it fared no better than the others.  Here is a chart of the trading activity.

Again, not a lot of trading, given that this ruling was one of the most anticipated events in a long time.  This chart shows trading over the last 17 months.  Once again, there really wasn’t any new information made available to the participants, other than the models they may have used to interpret existing data.  Yet, we see significant swings in the market price throughout the trading period.  How can this be?  Rather than new models or new information, it appears that the only thing “new” in this market is the dissemination of pundit views, which were eagerly lapped up by the participants.

Given the fact that there was very little relevant, new information about the subject, if any, perhaps trading should have been halted before the talking heads began their assault on our senses!

The following chart shows trading for the day leading up to the outcome (June 28, 2012):

The first thing to note is that this market runs until December 31, 2012.  Right up to the time when the SCOTUS decision was handed down, the market predicted that the Healthcare Act would be considered unconstitutional.  It appears that some traders bid the price up, when the decision was released (CNN initially reported that the IM had been declared unconstitutional).  Then, the price plummeted, once the outcome was known with more certainty.  All this shows is that the market can react to new information.  Prior to the release of real, new information, the market had no information to incorporate into the market price.

If we take these two charts together, we see that in the latter chart, the market was able to quickly incorporate new information into the market price, but the price change looks eerily similar to those in the former chart, where there was no new information available to the traders.  In other words, the trading charts look the same regardless of whether the participants are trading on new information or not.  That is not how prediction markets are supposed to work!

How do we Know the Markets Failed?

Just because a market predicts one thing and something else occurs doesn’t mean that the market is inaccurate.  Of course, it doesn’t mean the market is accurate, either!

When prediction markets try to predict binary events or discrete outcomes, there is no way to be almost right.  If you rely on the market to make an accurate prediction, you will be either bang on or dead wrong.  In the case of these markets, except for the Wisconsin market, the predictions were dead wrong.  We would have made disastrous decisions relying on these markets.

It is difficult to compare these three markets, because they were asking different questions about the same subject.  However, there certainly seems to be a disconnect between the three groups of traders.  We do know one thing.  All of the participants had access to the same information, which was very little.  The Wisconsin market predicted, with a likelihood of 51%, that the IM would be unconstitutional.  Inkling predicted this with a likelihood of 91%!  How can the difference be 40 points?  While the Intrade market was looking at any part of the Healthcare Act being unconstitutional, for the most part the issue concerned the IM.  It put the likelihood at about 70%!  Three markets, three wildly different predictions!

Given this wide range of predictions about the same issue, we can honestly say that at least two of them are not “accurate” and probably all three.  One of the requirements of accuracy is that the market be calibrated with the likelihood of actual outcomes.  This is an empirical test, which is impossible to perform on such issues and markets.  If we were able to have the Supreme Court issue numerous rulings on the same subject, we could, theoretically, obtain a distribution of ruling outcomes.  Ideally, we could compare the distribution with those from multiple prediction markets on the same subject.  If every 40% prediction came true 40% of the time, every 50% prediction came true 50% of the time, and so on, we would be able to say that the prediction market was well-calibrated.  We could rely upon the prediction market to accurately predict the likelihood of the various outcomes.

Still, the markets could get it wrong for any single prediction.  The market may have predicted that the IM would be struck down, with a 40% likelihood, but the actual outcome may be the opposite.  This may have been one of the 60% of the times that it would be ruled constitutional.  To further explain this point, if we only considered a market’s prediction to be “accurate” if it did, in fact, come true, the probability would have to be 100%, not 40% or 50% or some other likelihood!
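To make the calibration test concrete, here is a minimal sketch (in Python, with invented example data, since the repeated-outcome data is exactly what we can never collect for one-off rulings) of binning market predictions and comparing them with observed frequencies:

```python
import numpy as np

# Invented example data: market-implied likelihoods and actual outcomes
# (1 = event occurred).  In practice these would come from many comparable
# markets, which is precisely what we lack for one-off events.
predictions = np.array([0.42, 0.55, 0.70, 0.38, 0.61, 0.45, 0.80, 0.33, 0.52, 0.68])
outcomes = np.array([1, 0, 1, 0, 1, 0, 1, 0, 1, 1])

bins = np.linspace(0.0, 1.0, 6)  # five buckets: 0-20%, 20-40%, ...
for lo, hi in zip(bins[:-1], bins[1:]):
    mask = (predictions >= lo) & (predictions < hi)
    if mask.any():
        # In a well-calibrated market, the two numbers roughly agree.
        print(f"{lo:.0%}-{hi:.0%}: predicted {predictions[mask].mean():.2f}, "
              f"observed {outcomes[mask].mean():.2f}, n={mask.sum()}")
```

With only a handful of one-off markets, every bucket is nearly empty, which is the whole problem.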

Is There Any Usefulness in These Markets?

There is only one way that any of these markets might be useful to decision-makers.  If we could test for calibration to ensure that the likelihoods provided by the prediction market were reasonably accurate, we could use those predictions to perform a risk analysis.  The distribution of likelihoods given by the market represents the uncertainty surrounding the outcome from the SCOTUS.  Each outcome would have related consequences (costs and benefits), which could be quantified.  The risk for each outcome is the net cost multiplied by the uncertainty of the event occurring.  Contingency plans would be sought for any significant risks.
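As a sketch of what that risk analysis might look like (the likelihoods and dollar figures below are entirely invented for illustration):

```python
# Hypothetical risk analysis: pair each market-implied likelihood with an
# estimated net cost of the outcome (all figures invented).
outcomes = {
    "IM constitutional":                  {"likelihood": 0.42, "net_cost": -2_000_000},
    "IM unconstitutional, severable":     {"likelihood": 0.31, "net_cost": 5_000_000},
    "IM unconstitutional, not severable": {"likelihood": 0.20, "net_cost": 12_000_000},
    "Decision delayed":                   {"likelihood": 0.07, "net_cost": 1_000_000},
}

THRESHOLD = 1_000_000  # risks above this level warrant a contingency plan

for name, o in outcomes.items():
    risk = o["likelihood"] * o["net_cost"]  # expected cost of this outcome
    flag = "  <-- seek a contingency plan" if risk > THRESHOLD else ""
    print(f"{name:36s} risk = ${risk:>12,.0f}{flag}")
```

Of course, the whole exercise depends on the likelihoods being calibrated in the first place.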

No one ever talks about this implication, but it is one of the few beneficial features of accurate prediction markets.

Other than that, there is little value in running prediction markets involving discrete outcomes.  This is especially true for markets designed to predict the outcome of a panel of experts, such as the nine justices that make up the SCOTUS, or the panel that chooses future Olympic cities, or the panel that selects the winner of Britain’s Got Talent, or…

Posted by: Paul Hewitt | June 19, 2012

Good Judgment Project Round 2

Well, the Good Judgment Project has started up for the second season.  Again, we will be predicting events around the world.  Last year, my group used a modified prediction market.  It was a bit unusual, because you could rescind earlier predictions and lose nothing.  In my opinion, this created powerful incentives for seeking out risk.  Not exactly what one would hope for when making predictions about the future.  Still, the Good Judgment Project had the most “accurate” predictions among the five groups taking part in the IARPA contest.

This year, we have a true prediction market.  Our group uses a continuous double auction market mechanism to effect trades.  There is no automated market maker, like we had last year.

Generic Problem, Creative Solution

One problem with any prediction market concerns the setting of the initial likelihood of the event taking place.  Ideally, we want to set the likelihood equal to the current best estimate (from whatever sources are available).  That way, there are no windfall gains to be made by those that place the earliest trades.

For example, in a binary contract, if the likelihood of the event occurring is not 50%, based on current knowledge, but the market is set up at 50%, the first trader has the opportunity to make greater than normal profits from very little new information.  In these markets, events to be predicted are released in batches.  An early bird will pick up a lot of free worms.  Over the course of the prediction season, this early bird will acquire significantly more “wealth” than will the late-risers.  Unfortunately, the early bird will be able to have greater influence on future market prices, despite the fact that he or she doesn’t really know much more than the late-risers.

The designers of the Good Judgment Project prediction markets found a novel way of setting the initial likelihoods.  Rather than set a market price for each contract, they set up open bids and asks around the initial likelihood.  In order to make a purchase, at the outset, you have to match the lowest asking price.  Similarly, to make a short sale, you have to match the highest bid.  Of course, you can make your buy or sell order at any price and wait, hopefully, for the market to move in your direction.

Rather interestingly, these markets are designed to generate automatic bids and asks, depending on the price movements in the market.  Let’s say a trade moves the price of a contract from $0.40 to $0.45.  The market will automatically generate a new ask at $0.50 and a new bid at $0.40, unless there is already a bid or ask at that price.  This mechanism creates liquidity, but just a little bit.  Each of these automatic bids and asks is for one share only.

Whether this was intended or not, it does allow one trader to quickly move the market (or manipulate it?) with a small investment.  By successively purchasing (or selling) contracts (one at a time), you can move the market a fair bit with a few trades.  Of course, other traders can bring the market price back, if it is not considered to be “accurate”.
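For the technically inclined, here is my reconstruction of that mechanism as a sketch; this is how I understand the behaviour from trading in the market, not the designers’ actual code:

```python
TICK = 0.05  # the market appears to move in 5-cent steps (my assumption)

def auto_orders(new_price, bids, asks):
    """After a trade moves the price, post one-share orders one tick away,
    unless an order already rests at that price."""
    ask_price = round(new_price + TICK, 2)
    bid_price = round(new_price - TICK, 2)
    if ask_price not in asks:
        asks[ask_price] = 1  # one share only
    if bid_price not in bids:
        bids[bid_price] = 1  # one share only

bids, asks = {}, {}
auto_orders(0.45, bids, asks)  # a trade just moved the price from $0.40 to $0.45
print(bids, asks)              # {0.4: 1} {0.5: 1}
```

You can see why a patient trader, lifting these one-share offers one at a time, can walk the price a fair distance with very little money.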

Liquidity

Here, I’m talking about a different kind of liquidity.  There are quite a few markets available for trading, some carried over from last year’s session.  This year, we have been given $100 to trade, which isn’t a lot, but considering that all contracts are less than $1.00, it should be sufficient.  Unfortunately, as you trade, you create a portfolio of contracts and ultimately very little cash.  When new contracts are introduced, we will have to make decisions to get out of existing contracts, if possible, so that we can invest in some of the new ones, which may offer a better return on investment.

Judging by the first couple of days of trading, it isn’t always easy to get out of a position, if the market has moved significantly since the original purchase (or sale).  There will be times when I have to sell at a loss, in order to have funds to invest in a more lucrative contract.  Not only that, I may wish to do so before the new contracts are released, so I will be able to get in early.

How am I doing, you ask?

At this point, on day 2, I’m leading our group with $128.01, with the next closest at $119.42.  This could all change in the next hour, but at least I got an early start to trading this year, and it is reflected in my relatively high “wealth”.  Last year, I didn’t get into the market for a couple of weeks, and as a result I was never able to catch up to the early birds!

One other thing.  It is much more time-consuming to follow the markets and decide upon bids than it was to make simple trades last year!  By the time this is all said and done, I’m sure I will have earned about a penny an hour for my efforts!  Oh well, it’s for a good cause…enlightenment.

I’ll keep you informed from time to time.

Posted by: Paul Hewitt | June 15, 2012

Predicting Facebook’s Closing Price After The IPO

This is the second article about predicting Facebook’s closing share price after the IPO.  The first article, Not So Intelligent Collective Intelligence, examined an attempt to access the collective intelligence of a crowd, through a poll.  The results were remarkably wrong.  The average prediction was $54, but the closing share price after the IPO was $38.23.  The distribution of guesses ranged from $29 to $87 and had a standard deviation of 11.37.  It was a relatively flat distribution, indicating that the participants knew very little about the subject.  There appeared to be a significant herding effect, where those without much knowledge follow the guesses of those that appear to know the subject better.  The attempt was a complete and utter failure.

So, that experiment got me to thinking.  What if they had set up a prediction market instead of a poll?  Would the results have been more accurate?  Would the prediction have been accurate enough to be useful?  This article takes a look at the issues involved in setting up such a prediction market and assesses whether it would have been successful.

 

Setting Up The Prediction Market

There are a variety of considerations that should be addressed when setting up a prediction market.  I won’t go into all of them, here, but I will address the crucial issues and the unique aspects of this particular application.

In the poll, there were 2,261 participants.  So, I’ll assume that we will have a “crowd”.  As we learned in the first article, almost all of the participants were male.  It might be a good idea to attract more females to the group.  We want to get as diverse and decentralized a crowd as we can.  Let’s further assume that we can do this, because we’re most concerned about whether the prediction market model is capable of making an accurate prediction.

 

What Type of Security Should Be Used?

What is the reason for trying to predict the future share price?  Let’s say it is to make a decision about buying or selling Facebook shares.  Do we need an exact prediction, or is a range of possible prices adequate?  If we need an exact price prediction, the prediction market would probably have to be set up as an indexed security.  Participant trades would move the price up or down until the market closed.  Unfortunately, this type of security doesn’t give us much information about the distribution of predictions, which would give us an idea about the amount of uncertainty surrounding the market prediction.

The poll used in the initial experiment appears to have used discrete share prices.  But, given that the share price could be anything between the discrete dollar amounts, I’ll assume that a price of $54 really means any share price between $54.00 and $54.99.  The problem with using such a small range in a prediction market is that there may be too many securities available to the traders.  To cover the same range of prices that was covered in the poll, there would have to be 61 securities!  Note that to be mutually exclusive and exhaustive, we would need to add a security for $28.99 and under and another for $88.00 and above.  Even that setup requires additional thought, because we don’t want to unduly influence the trading by pre-setting the range of securities that are most likely to be true.

 

How Should We Structure The Market?

A better way to figure this out would be to determine the materiality level for the people who would use the market’s prediction.  By this, I mean the size of the error that would cause the decision-makers to change their decisions.  For example, I will purchase 1,000 Facebook shares if the predicted price is $40, but I will only purchase 500 if the price is $35 and only 250 if the price is $30.  From this, it appears that the materiality level is $5 per share.  Therefore, we don’t have to concern ourselves about being too exact in our predictions.
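Written out as a toy function (the share counts are just the ones from my example):

```python
# Hypothetical decision rule from the example above: position size as a
# step function of the predicted price, in $5 materiality bands.
def shares_to_buy(predicted_price):
    if predicted_price >= 40:
        return 1000
    if predicted_price >= 35:
        return 500
    if predicted_price >= 30:
        return 250
    return 0  # below $30, stay out

for price in (42, 36, 31, 28):
    print(f"predicted ${price}: buy {shares_to_buy(price)} shares")
```

Any prediction error smaller than $5 leaves the decision unchanged, which is why we don’t need to be any more precise than that.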

Some research would have to be undertaken to determine the likely minimum and maximum share prices.  Let’s say they are $25 and $60.  Given our materiality level of $5, we’ll create securities covering the following ranges:

<=$25

$25 – $29.99

$30 – $34.99

$35 – $39.99

$40 – $44.99

$45 – $49.99

$50 – $54.99

$55 – $59.99

>=$60

This gives us a reasonable number of securities and covers every possible share price.
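Here is a sketch of how those securities could be generated from the materiality level and the assumed $25/$60 bounds (note that I make the bottom bucket exclusive at $25, so the ranges don’t overlap):

```python
# Build a mutually exclusive, exhaustive set of price buckets from a
# materiality level (a sketch; the $25 and $60 bounds are assumptions).
def build_securities(low, high, step):
    buckets = [f"< ${low:.2f}"]  # exclusive, so it doesn't overlap the next range
    price = low
    while price < high:
        buckets.append(f"${price:.2f} - ${price + step - 0.01:.2f}")
        price += step
    buckets.append(f">= ${high:.2f}")
    return buckets

for security in build_securities(25, 60, 5):
    print(security)  # nine securities covering every possible share price
```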

Another issue concerns the level of confidence that a typical decision-maker will require before making a decision about whether to invest in Facebook shares.  We may need to adjust the range of values for each security to reflect the standard deviation required to achieve a desired level of confidence.  After trading begins, the distribution of trades will reveal a market prediction and a standard deviation.  The size of the standard deviation will determine the level of confidence that the actual market price will be within the materiality level.

 

How Will Participants Trade?

Here we have a choice between using an Automated Market Maker (AMM), such as the Logarithmic Market Scoring Rule, or using a Double Auction method (usually continuous).  Automated Market Makers are usually used to ensure liquidity in markets that don’t have enough traders.  This method allows anyone to make a trade, even if there is no other participant that wishes to take the opposite position.

The Continuous Double Auction (CDA) method maintains an open book of bids (to buy) and asks (to sell) securities.  The highest bids are listed first, as are the lowest asks.  When the highest bid matches the lowest asking price, a trade takes place.

Assuming there are enough participants to make a “crowd”, the better option is to go with the Continuous Double Auction, because it requires a greater consideration of the risks of buying and selling securities.  In a CDA market, a trader has to consider that it may not be possible to trade out of a position once it has been taken.  There may not be a willing buyer.  Contrast this with an AMM market, where all trades are executed.  Markets based on an AMM tend to encourage risk-seeking behaviour.  Ideally, we would prefer to have risk-neutral investing decisions, which will provide the most unbiased decisions and the greatest accuracy.

If we go with the AMM mechanism, we may have a problem with too many uninformed traders.  We saw the effects of this in the polling structure (article one), where the chimps swamped the experts.  In a CDA based market, we actually want a few chimps (or chumps), because they will provide the liquidity that makes it possible for the better informed traders to effect their trades.  So, let’s go with a Continuous Double Auction market.
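For the curious, here is a bare-bones sketch of how a CDA order book matches trades; real exchanges add far more (cancellations, partial-fill rules, tick sizes), so treat this as illustrative only:

```python
import heapq
import itertools

class CDABook:
    """Minimal continuous double auction sketch for one security:
    price-time priority; a trade executes whenever the best bid meets
    or crosses the best ask (illustration only, not a real engine)."""
    _seq = itertools.count()  # arrival order, for time priority

    def __init__(self):
        self.bids = []  # max-heap (prices stored negated)
        self.asks = []  # min-heap

    def submit(self, side, price, qty, trader):
        key = -price if side == "buy" else price
        book = self.bids if side == "buy" else self.asks
        heapq.heappush(book, (key, next(self._seq), price, qty, trader))
        self._match()

    def _match(self):
        while self.bids and self.asks and -self.bids[0][0] >= self.asks[0][0]:
            _, bseq, bprice, bqty, buyer = heapq.heappop(self.bids)
            _, aseq, aprice, aqty, seller = heapq.heappop(self.asks)
            qty = min(bqty, aqty)
            print(f"trade: {qty} @ ${aprice:.2f} ({buyer} buys from {seller})")
            if bqty > qty:  # re-post any unfilled remainder
                heapq.heappush(self.bids, (-bprice, bseq, bprice, bqty - qty, buyer))
            if aqty > qty:
                heapq.heappush(self.asks, (aprice, aseq, aprice, aqty - qty, seller))

book = CDABook()
book.submit("sell", 0.55, 10, "alice")
book.submit("buy", 0.50, 10, "bob")    # no cross; the bid rests in the book
book.submit("buy", 0.55, 5, "carol")   # crosses: trades 5 @ $0.55
```

Notice that bob’s bid simply rests in the book: in a CDA, an order with no willing counterparty goes unfilled, which is exactly the risk an AMM removes.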

 

Incentives to Trade

One of the key functions of a prediction market is that it gives participants incentives to trade on their privately-held information.  Searching for information is costly.  So, there must be some benefit to entice participants to gather new information.  One way is to provide a real monetary reward, if the information is more accurate than that already in the market.  Unfortunately, in the U.S., real money prediction markets are not allowed (with some exceptions).  So, we have to find another way to compensate traders for seeking out new information.

Studies have shown that play money markets can offer sufficient incentives for participants to gather information and make trades.  Many such markets create a leaderboard to rank the best traders.  Over time, the traders who acquire the best information will make more profitable trades and acquire more “wealth” relative to the uninformed traders (“chimps”).

Interestingly, in my last article, I mentioned that Ville Miettinen seemed to equate credentials with expertise.  Then, he showed that, in the Facebook IPO poll, a lot more non-experts guessed the correct share price than did the “experts”.  In prediction markets, we don’t use outside credentials (degrees, job, position, or any other criterion) to determine whether one is an expert.  Instead, we let the market identify the experts through their superior trading.  Being knowledgeable about social media, for example, does not make you an expert in predicting Facebook’s post IPO share price!

So, we’ll need a leaderboard to keep track of trader performance, and it would be a good idea to have prizes for the top traders.  Being the best trader among thousands feeds the ego quite nicely, but being able to take your spouse out to dinner, with your prize, might make it all worthwhile.

 

Are We There Yet?

Uh, no.  If we set the opening odds, or likelihoods, equally among the securities, there will be windfall profits to be taken by those that make the first trades.  This is more relevant in markets that use an Automated Market Maker, but there will still be some excessive profit opportunities, using minimal information, in markets using a Continuous Double Auction mechanism.  Therefore, we need to set the initial likelihoods based on the best information available.

Since the poll, described in the first article, was set up on an ad hoc basis, we have to assume that this prediction market would have been set up this way, too.  That means that everyone who participates will have the same initial wealth from which they can make investments in the market.

One of the functions of prediction markets is to identify the experts and give them more power to move the market than is given to non-experts.  This is a natural phenomenon of prediction markets.  Those that make the most accurate predictions, earliest, make the most profitable trades and amass the greatest wealth.  Those that trade on erroneous or minimal information make losing trades and end up losing their wealth and exit the marketplace.  The problem with this being the first market is that everyone has the same wealth.  No experts have been identified.  Chimps can move the market just as much as the experts can.

What if there are too many chimps in the market?  In the polling case, we saw that there was a very wide distribution and evidence of herding behaviour.  It’s easy to make the case that the market was dominated by chimps.  In a one vote for all poll, the votes of the chimps diluted the accurate votes of the experts (whoever they may have been).  We have a similar problem, here, because everyone has the same trading power at the beginning.

There is only one solution, and it is an impossible one.  There must have been a sufficient number of previous prediction markets, about similar subjects, involving many of the same participants as we have in this market.  That is the only way that the “cream” could rise to the top of the leaderboard and the influence of the chimps could be lessened.  Unfortunately, at this point, we have no choice but to go with the current market, knowing that it is fatally flawed.

What Would Happen?

Given that this is a one-time market, the leaderboard will be based solely on the results of trading in this market.  At least we have eliminated some of the risk-seeking behaviour by going with the CDA mechanism, but there is still likely to be a significant amount of risk-taking among the participants.  Those that purchase the outlier securities stand to make the most profit, if they are correct.  Go big or go home.  If they’re wrong, who really cares?

These are the long-shot trades, similar to picking the long-shot horse in a race.  There’s a well-known long-shot bias in horse racing, where more bets are placed on the long-shot horses than are warranted by the actual outcome likelihoods.  It is likely to be even more pronounced in this market.

It is unlikely that the prediction market will yield a distribution of predictions that is as wide as the one exhibited by the poll.  However, it is still likely to reflect a significant amount of uncertainty about the true share price, because the real experts don’t have enough wealth to make the market reflect their information.

The only positive things about this experiment are that we would have an improved leaderboard and some data from which to start assessing the calibration of this type of prediction market.  Many more similar markets would have to be run to identify the experts and enable us to measure the calibration of prediction market distributions with the distribution of actual outcomes.

The point of the exercise has been to show that trying to make predictions using ad hoc markets is frivolous.  We cannot rely upon these ad hoc markets to deliver accurate predictions.  It takes time to operate a sufficient number of similar markets to determine whether they are “accurate”.  There are no shortcuts.  Rather than try these one-off experiments, we should be looking at long-term solutions.

 

Posted by: Paul Hewitt | June 14, 2012

Not So Intelligent Collective Intelligence

If you have read James Surowiecki’s book, The Wisdom of Crowds, you know the story of Galton’s Ox.  Early in the 1900s, a live ox was put on display at a county fair.  People were asked to guess the weight of the ox, once it had been butchered.  Some of the participants were considered “experts” at this, others were not.  Francis Galton obtained the list of guesses and found that the average was within one pound of the actual butchered weight of the ox.  Not only that, the average guess was better than that of any of the “experts”.  And so, with this little experiment, the concept of collective intelligence was born.

There have been many similar experiments involving guessing the date when an event will occur or the quantity of something, like how many jelly beans in a jar.  Usually, the average guess from a large group will be closer to the true date or quantity than the guess from any one person.  There are a few conditions that allow this to happen, as we shall see.

Recently, Facebook announced its IPO, and everyone wanted to know what the price would be after it launched.  Apparently, a venture capitalist named Chris Sacca suggested that the “crowd” be asked to predict the price.  James Proud set up a simple website to collect predictions at Facebook IPO Day Closing Price.  The average of the collective guesses turned out to be miserably wrong.  It wasn’t even close.  This group’s average IPO price (at the close of trading after the launch) was $54.  The actual closing price was $38.23.  What went wrong?

 

“Did everything right”

Ville Miettinen wrote a piece, Predicting the Facebook IPO: The crowd gets it wrong, trying to explain why the crowd got it wrong in this experiment.  The article claims that the website “did everything right” in setting up the method of collecting guesses, other than perhaps attaching real money to the guesses.  There was a crowd of 2,261 guessers.  Miettinen claims they were diverse, coming from all over the world, and there was “rampant” discussion on Twitter about the topic.  But, collective intelligence requires more than that!  They actually have to know something about the subject in order to make an informed guess or prediction!

Miettinen notes that all but three of the 26 who guessed the correct price were non-experts.  Curiously, a Google+ engineer, a tech entrepreneur, and a Bloomberg analyst were considered to be “experts” on Facebook’s share price.  These wouldn’t likely be examples of “experts”, on this topic, if I were to make the classification.  Senior investment bank executives, fund managers, and perhaps, senior Facebook management would be my picks, especially if they had inside information about this particular IPO.  One of the features of collective intelligence, especially with respect to prediction markets, is that it can reduce bias in making predictions.

Miettinen claims that “experts”, more so than non-experts, are sensitive to hype.  While there certainly was a lot of hype about Facebook before the launch, I would think “experts” to be more immune to hype than your average person.  More likely, this group of guessers suffered from the common affliction of herding.  When people don’t know very much, they tend to follow the “herd” (do what other people do).

 

The Herd Tweets

It appears that many of the guessers posted their predictions on Twitter, along with comments about the IPO price.  This feature of the experiment may have caused the eventual prediction to be less accurate than it might have been.  I’ll come to this shortly, but there is good evidence that this particular group of guessers held very little useful information about the topic.  Consequently, publicizing information about others’ predictions and spreading hype would have done nothing more than generate herding behaviour among future guessers.  If you think others know more than you, you’re more likely to follow their guesses.

 

Rewarding the Ego

Also, Tweets provided the proof of making guesses.  Once the true price had been established in the market, it would be possible to claim that your guess was right, armed with your previous Tweet.  Would you want to be one of hundreds that guessed the most likely price, or would you prefer to be one of a few that picked the outlier price?  There was no leaderboard or prize incentive built into this aggregation method, so the participants may have tried to win the ego “prize” of being one of a very few that got it right.  Even Galton’s Ox contest had a prize.  If the experiment designer fails to provide an incentive for making accurate predictions, the predictions will not be as accurate as they might otherwise have been.

 

Not Even Good Guessers

I must admit, I am completely baffled by the wide range of guesses submitted by the participants.  Even if you don’t know much specific information about something, surely you can make an educated guess!  Not so in this case.  The predicted prices ranged from $29 to $87!  I analysed the guesses of the participants, based on the graph provided on the prediction website, and found that the average price was $54.53, with a standard deviation of 11.37.  Based on the guesses of these participants, the actual price of Facebook would have been expected to fall within the range of about $43 – $66, with a likelihood of 68%.

The wide range of prices and a large standard deviation indicates that there was a high level of uncertainty in the guesses.  Put another way, the participants really didn’t know very much about Facebook, share price behaviour or IPOs.  No method of aggregation (such as averaging) will create the information that is not held by the participants.  By making a guess, each participant injects his or her information into the model (here, the averaging aggregation method).  Since there was very little information held by the participants, the result of averaging was a very poor estimate of the Facebook share price.  Garbage-in, garbage-out.

Contrast this result with Galton’s Ox experiment.  Even the most uninformed townsfolk would have been able to limit their guesses to a reasonable range of choices.  An ox weighs more than me, but less than my horse.  An ox has bones and hooves that aren’t butchered.  Therefore, the butchered weight can be confined to a reasonable range.  The result was a relatively tight distribution of guesses.  Each guess would have had a smaller error than it would have had, had the range been larger.  The law of large numbers says that these small errors cancel out, leaving a reasonably accurate average prediction.  When the errors are much larger, as they were in the Facebook IPO situation, the errors can swamp the “accurate” portion of each guess.
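A quick simulation (parameters invented) makes the point: averaging cancels independent errors, but it cannot remove a shared bias of the kind herding creates:

```python
import numpy as np

rng = np.random.default_rng(42)
true_value = 1198  # Galton's ox weighed 1,198 lbs
n = 2000

# Ox-style guesses: independent errors around the truth.
tight = true_value + rng.normal(0, 50, n)
# Facebook-style guesses: much wider errors, plus a shared upward bias
# from hype and herding, which no amount of averaging can remove.
wide = true_value + rng.normal(0, 400, n) + 300

print(f"independent errors: mean {tight.mean():.0f} (error {tight.mean() - true_value:+.0f})")
print(f"biased errors:      mean {wide.mean():.0f} (error {wide.mean() - true_value:+.0f})")
```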

Another example of a failure occurred when Apple was about to release the iPad for the first time.  Nineteen so-called experts tried to forecast the number of iPads that would be shipped.  None of them came even close.  Their predictions were all substantially lower than the actual shipments.  Again, this was a poll and not a prediction market.  This situation shows that when there are only a few predictors, the errors can be large and they may be mostly, or all, in the same direction.  There is no way for these errors to cancel out.

 

Diversity?

Miettinen wrote that the participants appeared to form a diverse group, because they came from all over the world and shared a variety of views.  Interestingly, based on the Twitter feed of guesses, almost all of the participants were male!  I don’t know whether it would have made a difference, having more women involved, but the ultimate prediction could hardly have been worse!

 

Conclusion

The group of individuals that was given the task of predicting the Facebook share price after the IPO launch had very little knowledge of the subject.  Even if a few individuals were sufficiently knowledgeable, their guesses would have been diluted by the relatively large number of erroneous guesses.  There appears to have been a significant herding effect, too.  I think we can say that the crowd wasn’t as diverse as it should have been.  So, just because something is “collective” doesn’t make it “intelligent”.  The method of generating a collective prediction was seriously flawed.

What if James Proud had set this up as a prediction market instead?  That will be the subject of my next article.

Posted by: Paul Hewitt | March 17, 2012

15,000 Visitors!

It has been about three years since I started writing this blog and it has finally reached 15,000 visitors.  I have to thank Chris Masse of Midas Oracle for re-posting many of my articles and providing helpful comments.  Initially, I was very optimistic about the prospects for using prediction markets to improve predictions and forecasting in business.  But the more I delved into the research, the less optimistic I became.  My skepticism began when I researched HP’s prediction markets in an Analysis of HP’s Real Prediction Markets.  Unfortunately, just about everyone in the field continues to cite the accuracy of the HP prediction markets as evidence that “prediction markets work”.  They weren’t that accurate, and even if they were, they don’t prove that prediction markets work.

Now, I think there are a few applications that can benefit from using prediction markets, but the list is quite short!  Specifically, prediction markets can be used very effectively in project management to predict completion dates.  It would be even better if combinatorial prediction markets could be used, as I explained, here.  The other application is not really a prediction market, but rather an idea pageant, using crowd sourcing to identify the best ideas among many.  Pretty slim pickings, considering the hype created by James Surowiecki’s Wisdom of Crowds!

Here are the top 15 articles from my blog, although it’s hard to be sure, because many visitors went to the home page to view an article:

Interestingly, Prediction Market Prospects 2010 continues to draw readers, though it was written at the end of 2010.  I haven’t felt the need to update it, because very little has changed since then!  The Future of Futarchy still attracts a few readers, especially when Robin Hanson teaches his students about his concept of “Futarchy”.  It still holds the record for a new post that drew the most readers in one day – 223.

Why Public Prediction Markets Fail, Oscars Prediction Markets Get it Right, and The Oscars 2011 – The Good, The Bad & The Ugly all speak to major problems with relying on prediction markets.  All of the prediction markets noted in these posts displayed significant errors in predicting events, which is especially troublesome when dealing with discrete outcomes.  They also showed that these markets lacked the essential ingredients for success that can be found in The Essential Prerequisite for Adopting Prediction Markets and The Forgotten Principle Behind Prediction Markets.

So, what’s new?

Well, I have been taking part in The Good Judgment Project which is attempting to accurately predict world events using a variety of methods.  My group is using a quasi-prediction market.  So far, I am doing quite well, but I question how the organizers are determining the accuracy of the markets.

This being an election year in the U.S., much will be written about the amazing accuracy of prediction markets in picking election winners.  Hogwash.  The prediction markets are very good at aggregating poll results, and they’re really only accurate immediately before the election.

I am presently working with one group that is trying to utilize prediction markets to predict specific future outcomes.  I can’t disclose much about it, but the two main problems are determining how far in advance the markets can be proven to be “accurate” and figuring out a way to make money from the information.  I will report on this project when it is no longer confidential.

The ideal in predicting events is to find the Holy Grail that will provide the outcome before any other method…and predict the outcome in time to act on that information.  A high standard, to be sure, but one that must be met for one to get really excited about prediction markets!

Posted by: Paul Hewitt | March 15, 2012

Status Quo Predicting

If you’ve been reading my blog, you know that I am taking part in the Good Judgment Project, where a number of us are predicting future world events.  Now, I wouldn’t consider myself to be all that knowledgeable about world events, but I am doing quite well in predicting outcomes.  In fact, I haven’t got any prediction wrong, except for one fluke that caught almost everyone by surprise.  So much so that the Project excluded this question from the rankings.

While I may not be an expert in world events, I do have some knowledge in most areas, which I gain by reading The Economist every week, reading local papers, and watching CNN.  So, how am I doing it?

I think the main reason I have been successful is my perspective on the world in general.  We have experienced a huge recession (and still are, in most parts of the world), financial turmoil in the EU, a U.S. election year, and the Arab Spring.  As a result, except for a few specific instances, I have predicted that most events will not occur.  That is, the status quo will prevail.

Here are a few examples (percentages relate to likelihoods of the status quo outcome):

Will Saif al-Islam Gaddafi face trial before March 31?  I bet no at 13%; it now stands at 86%.

Will a civil war break out in Syria before April 1?  I bet no at 37%; it’s now at 69%.

Will South Korea announce a policy of reducing Iranian oil imports by April 1?  I bet no at 23%; it’s now at 76%.

Will Asif Ali Zardari lose the Presidency of Pakistan by June 1?  I bet no at 29%; it’s now at 41%.

Will the U.N. Security Council pass a resolution regarding Syria by March 31?  I bet no at 27%; it’s now at 69%.

Will the U.N. Security Council pass a resolution regarding Iran before April 1?  I bet no at 41%; it’s now at 86%.

Will Greece remain in the EU at June 1?  I bet yes (status quo) at 50%; it’s now at 82%.

Most of my other no “bets” were placed at 50% and are now in the 75% – 94% range.

The only exception to the status quo bets was a major bet I placed on whether Wade would be re-elected in Senegal.  I bet that he would not be re-elected at 18% and it is now sitting at 80%.  I decided against the status quo on this issue, because Wade is old (he could die), there had been a lot of political unrest in Senegal, and if Wade did not receive at least 50% of the vote during the election, a run-off election would be called to decide the winner.  In my opinion, Wade will lose the run-off, as the opposition candidates will come together and elect Macky Sall instead.  It’s looking better every day.

Based on this anecdotal evidence, it appears that many of the participants in my group display an initial bias towards change, which lowers the early likelihood of the status quo outcome.  In fact, it appears that most events will not occur before the deadlines, meaning that there should be a natural bias towards the status quo.  By betting the status quo after the early “chimps” have bet on change, one can be very successful, indeed.  I expect to rocket up the leaderboard once the results are announced in early April.  I’ll let you know.

Update April 5, 2012:  Well, I did do quite well on the questions that closed around the end of March, “earning” about $40,000.  This “rocketed” me up the leaderboard from #13 to #9.  The top predictor broke past $100,000 in winnings.  Once you get behind, there’s no way to catch the front runners, because no one loses a bet in this game (at least they shouldn’t).

Posted by: Paul Hewitt | February 8, 2012

Prediction Market Design Issues

I’ve been taking part in the Good Judgment Project, which is attempting to predict global events.  I’ve discussed some of the quirks in the design of the prediction markets.  Here are a few more.

Initial Likelihoods

Most of the questions involve binary events, but every once in a while there are multiple options in winner-take-all markets.  A recent example is this question:

Note that there are five possible outcomes.  When the question was first posted (February 7, 2012), each of the outcomes was given a 20% likelihood of coming true.  As a result, early purchasers (like me) could purchase an outcome that would pay off relatively handsomely.  I was able to pick a very likely winner, simply by acting swiftly.  One very quick Google search yielded enough information to select (b) Henrique Capriles Radonski as the overwhelming favourite.

Of course, the same thing happens with binary markets, where the initial likelihoods are set at 50%, meaning you can double your money by being the first to pick the right outcome. The problem is exacerbated when there are multiple contracts in the market.  In this case, I will quintuple my investment, if correct.

The issue is that some participants can achieve significant “profits” without actually having much real knowledge or superior predictive abilities.  One of the purposes of a prediction market is to identify the best predictors – they end up with the most money, allowing them to have a greater say in future predictions.  In cases such as these, there are windfall gains to be had by the earliest traders, because the initial odds are not set properly.

The odds are set assuming no information is available to make a decision.  That is, it’s kind of like a five-sided die roll or a coin flip for binary events.  In the above case, there was information available, which should have been taken into account in setting the initial odds.  We can assume there was quite a bit of “good” information about the outcome, because within one day, the likelihood of the frontrunner jumped from 20% to 92.8%!
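The windfall is easy to quantify (using the 20% seed price and the 92.8% price from this market, and the $1,000 per-market limit):

```python
# Windfall from mispriced initial odds (figures from the market above).
initial_price = 0.20  # five outcomes, each seeded at 20%
later_price = 0.928   # where the frontrunner stood within a day
stake = 1000          # the per-market trade limit

shares = stake / initial_price             # 5,000 shares at $0.20
profit_if_correct = shares * 1.00 - stake  # winner-take-all pays $1.00 per share
print(f"profit if correct: ${profit_if_correct:,.0f}")                # $4,000
print(f"position value one day later: ${shares * later_price:,.0f}")  # $4,640
```

One Google search, five minutes of effort, and a near-certain $4,000 profit: that is the mispricing problem in a nutshell.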

Trade Limits

Another quirky design issue involves limiting trades to $1,000 per market.  This restricts someone with superior knowledge from having the appropriate amount of influence in a market.  To my way of thinking, this runs counter to prediction market theory.  Essentially, these market quirks preclude a successful trader from being able to justifiably influence any market, but it does allow him or her to invest in more markets than unsuccessful traders.  Of course, the early winners will be able to maximize the number of markets in which they can make the maximum bets.

Short-term vs. Long-term

Another interesting observation is that new questions are added every few weeks.  While I have done reasonably well, I still find that I am almost fully invested, without being able to take positions in all of the markets I would like to.  When new questions arise, I have to decide whether I wish to get out of some markets in order to invest in new ones.  This involves deciding whether it is better to get out of a long-term question (say 3 – 12 months hence), which may pay off handsomely, to be able to invest in one or more shorter term markets, which may collectively pay off even better.

In other words, you have to keep your money “working”.  Long term investments are riskier (more unpredictable intervening events may occur) and tie up your money.  So, even though you may have superior information about a market outcome, it may not be financially appropriate for you to act on it and place a bet.  This provides incentives for traders to invest where they have the best information relative to other traders, even though they may have better information than other traders in many (or all) markets.

This is probably a good thing, but, by design, the overall exchange is leaving “good” information on the table!

Posted by: Paul Hewitt | January 5, 2012

Good Judgment Project Performance

The Good Judgment Team is competing against other teams to see which one is able to “more accurately” predict future events (mainly political, so far).  After the first month of official predictions, the Good Judgment Team released the following statement by email (bold/italics are mine):

“Our forecasters are simply the best!  (That’s not just our opinion:  in the early days of the tournament, the Good Judgment Team’s aggregate forecasts have proven to be more accurate than those of any other research team participating in the IARPA tournament.)”

This got me to thinking.  How is the IARPA determining which Team is more accurate in their predictions?  I’ve posed the question to my team, but haven’t received a response, yet.  So, let’s make a few educated guesses.

Each Team has a large number of participants.  On our Team, there are a number of groups, presumably with some common characteristics, that are each predicting future events.  We took a variety of tests before joining the team, to measure or describe how we make decisions, process information, etc…

Almost all of the questions about future events are binary.  They will either happen or not, by a specific date.  Our Team uses a modified prediction market to generate a likelihood of each event occurring (more information here).  Now, this is where it gets interesting.  I’m guessing that most, if not all, of the Teams predicted the correct outcomes for most of the questions.  If our Team got one or two more correct than the other teams, does that really mean that we are “simply the best“?

Could it be that our collective likelihoods of the events that occurred were higher than those for the other Teams?  In other words, when an event did happen, our Team gave the event a higher likelihood of occurring.  Jeez, I hope not, for a number of reasons.  Remember, these are binary events.  Just because a likelihood is higher doesn’t mean that it is more correct than a lower likelihood prediction!  What we really want to compare is the calibration of the market predictions with market outcomes.  Unfortunately, there isn’t enough data, yet, to determine whether our predictions are better calibrated than any other Team’s.

These markets are kept open for trading until a day or so before the outcome is revealed, unless the outcome is determined prior to the anticipated closing.  Uncertainty surrounding the outcome decreases as time marches toward the market closing.  Consequently, at the market close, all that should remain is the irreducible uncertainty (random events that affect the outcome).  Accordingly, most markets should converge on a likelihood close to 100% for one of the binary outcomes, and there shouldn’t be very much variability among the Teams.

Could it be that accuracy is being determined at various points in time prior to the market close?  It’s a better basis, but again, we can’t prove calibration.  So, this isn’t likely the answer.  Maybe it’s the speed of adjusting predictions, given new information?  I doubt this one, too.  In some cases, information will lead one forecaster to conclude the event is more likely and another to conclude the opposite.  It would be impossible to determine whether the market was incorporating new information in every case.

Maybe our Team won more money.  Nope.  Basically, with an Automated Market Maker, except for the seed capital, it’s a zero sum game.  All teams would do equally well, with the same system.

Conclusion

Let’s forget for a minute that these predictions are pretty useless, if they’re only “accurate” immediately before the outcome is revealed.  How many times have I spouted on about this issue?  Also, they’re predicting binary events.  There’s no such thing as being almost right in a binary market.  So, even though it isn’t theoretically correct, I’m going to guess that the IARPA thinks a higher likelihood prediction is more accurate than a lower likelihood one, when the event does, in fact, come true.  Maybe that’s the best they can do, until they figure out the calibration issue.
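For what it’s worth, there is a standard metric that matches my guess: the Brier score, the mean squared error between forecast probabilities and 0/1 outcomes.  A sketch (the team numbers are invented):

```python
def brier(forecasts, outcomes):
    """Mean squared error between forecast probabilities and binary outcomes.
    Lower is better; always forecasting 0.5 scores exactly 0.25."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# Two hypothetical teams scoring the same five events (1 = event occurred).
outcomes = [1, 0, 1, 1, 0]
team_a = [0.80, 0.30, 0.65, 0.90, 0.20]  # confident and mostly right
team_b = [0.60, 0.45, 0.55, 0.70, 0.40]  # hedges toward 50%

print(f"Team A: {brier(team_a, outcomes):.3f}")  # ~0.06, rewarded for confidence
print(f"Team B: {brier(team_b, outcomes):.3f}")  # ~0.16
```

A score like this does reward the higher likelihood when the event comes true, but it still says nothing about calibration over a handful of events.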

Posted by: Paul Hewitt | November 5, 2011

The Good Judgment Project

I have been participating in The Good Judgment Project, one of five teams in a US government sponsored, four year, forecasting tournament.  Each team develops its own methods for forecasting world events.  Our team is based at the University of Pennsylvania and the University of California, Berkeley.  I gather each team will be using some form of collective intelligence to make predictions.

This may change, but our present aggregation mechanism is an odd variant of a prediction market with an automated market maker.  Let me explain.  During the first two months, just about every question has been binary (either it will happen or it won’t).  Apparently, there may be some questions that have up to five derivative shares in a winner-take-all market.  All markets involve an automated market maker.

Participants can place trades (up to $1,000) in any market, for the event to happen or not, by a given date.  As trades are filled, the market price changes.  So far, so good.  The twist is that trades can be rescinded at any time up until the market closes or the event becomes known.  When you rescind a trade, you get back all of the money that was originally invested.  Huh?  That’s right, there’s almost no risk of selecting the wrong outcome!  But, part of what makes markets “accurate” is that there is a consequence for being wrong.  Not so here.  In a traditional prediction market, selling out of a position would net you the current market price (not your original purchase price).

At least you can’t take positions on both sides of a binary market!  The market mechanism encourages you to bet the maximum, usually at the beginning of the market.  This will allow you to double your investment (if you are correct).  In some cases, the likelihood will fall and you can generate a higher profit by investing at that point.  Usually, you will want to maximize your bet when you first enter the market, because if you try to revise your bet later, you will receive the new payoff on your entire investment (if correct).

If the odds for the outcome you selected start to fall, but you still wish to hold that investment, you need to continually revise your investment, to obtain the most favorable odds.
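To see why free rescission invites risk-seeking, consider this stylized calculation (the prices are invented; the $1,000 limit is from the rules above):

```python
# Why free rescission invites risk-seeking (stylized; prices invented).
# Strategy: bet the $1,000 maximum; if the position sours before the
# market closes, rescind the trade and get the entire stake back.
stake, price = 1000, 0.40       # buy "yes" at a 40% implied likelihood
shares = stake / price          # 2,500 shares

profit_if_right = shares * 1.00 - stake  # contract pays $1.00 -> +$1,500
loss_if_wrong = 0                        # rescind before close -> $0 lost

p_true = 0.40  # suppose the market price equals the true likelihood
expected = p_true * profit_if_right + (1 - p_true) * loss_if_wrong
print(f"expected profit: ${expected:,.0f}")  # +$600 on a "fairly" priced bet
```

In a traditional market, a fairly priced bet has an expected profit of zero; here, every maximum bet is a free option.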

The other quirk is that the maximum bet is $1,000 (previously $500).  That’s a minor point, but it does potentially hinder someone with “perfect” information from placing a bet that would move the market to the appropriate likelihood.  Recall that part of the rationale for prediction markets is that it helps identify the best forecasters (they have the most funds).  When you combine this with the failure to penalize poor guesses (by allowing traders to rescind investments without penalty), I’m wondering whether this particular prediction market mechanism will be as accurate as it might otherwise be.

Posted by: Paul Hewitt | August 14, 2011

In Search of a Better Prediction Model

Among other things, Robin Hanson is famous for advocating the use of prediction markets, where their predictions are “more accurate” than other methods of forecasting.  I won’t argue with that, as long as the benefits of being more accurate exceed the marginal costs.  However, if you’ve been keeping up with my blog, you should come away with the thought that I’m not quite as high on the prediction market fumes as some of the other adherents.  I find prediction markets to be wanting in many significant areas.

The Search for Something Better

A few years ago, this got me to thinking.  If prediction markets might be better than alternative prediction methods, could there be an even better model?  And so, I scoured the literature in search of just such a model.  I thought I had found one a couple of years ago, and set out to prove the case for its replacement of prediction markets.

In making my assessment of “better”, in terms of predictions, I considered the calibration of the predictions with the actual outcomes and how far in advance the calibration was reasonably accurate.  I chose to consider the latter characteristic, because prediction markets are notoriously poor at being able to predict anything but very short-term outcomes.

I am pleased to report that my alternative prediction model appears to be better than prediction markets in most respects!  My model was able to match the calibration of prediction markets in every case, but the real benefit was how far in advance my model was able to predict the outcome, with equal or better calibration than prediction markets!  In all cases, my model was very well-calibrated with the outcomes a full two years prior to the outcome being revealed!   To my knowledge, no prediction market has ever been well-calibrated two years prior to the outcome.

Not only that, but my model was able to achieve this level of accuracy for the most difficult to predict outcomes.  Unfortunately, however, my model was not able to forecast so-called “easier to predict” outcomes with the same level of accuracy.

A Model Prediction Model

Coin toss

I’m sure I have kept you in suspense long enough.  My model involves a hand, a wrist and a coin.  Who knew that a simple coin toss might be as good, or better, a predictor of future events than a prediction market?  Very difficult-to-predict binary events have a likelihood near 50%.  If a prediction market for such an event indicates a 50.1% likelihood of occurrence, the decision-maker would predict that the event was going to occur, and he’d be right about 50% of the time.  Same thing with the coin toss, but we can toss the coin two years before the event and get an equally well-calibrated prediction.  For these really-hard-to-predict events, prediction markets, typically, fluctuate all over the map before settling on the safer 50% likelihood.
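For the doubters, a tiny simulation (a sketch in Python; the only assumption is that the event’s true likelihood is 50%) confirms the coin’s sterling calibration:

    import random

    # Simulate many very-hard-to-predict binary events (true likelihood 50%),
    # each "predicted" by a coin tossed arbitrarily far in advance.
    random.seed(42)
    trials = 100_000
    correct = sum(
        (random.random() < 0.5) == (random.random() < 0.5)  # event vs. coin
        for _ in range(trials)
    )
    print(correct / trials)  # ~0.50: right about half the time, as promised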

Earlier, I noted that the model does not work as well with easier-to-predict events – for example, an event with a likelihood of 75%.  Rest assured, I’m experimenting with a new version of the model, which involves bending the coin with a hammer before the toss.  I’ll let you know how that turns out.

One problem with the new model is that it only works on binary events.  However, I’m working on an even better one that will work on a group of mutually exclusive and exhaustive events (winner-take-all).  It involves darts and a dartboard.

Back to the Drawing Board

Obviously, this was intended to be a humorous post, poking a bit of fun at prediction markets and calibration.  It is the lead-in to a series of upcoming posts, in which I hope to tie together the concepts of uncertainty, price distributions, calibration, accuracy, prediction market design, and market mechanisms.  None of these issues has been adequately researched by the major players in the prediction market arena, and that neglect is one of the major reasons why prediction markets continue to flounder.  I hate to think that what holds the researchers back is a fear of uncovering evidence that does not support the use of prediction markets.

Posted by: Paul Hewitt | August 10, 2011

The Forgotten Principle Remembered

I suppose I should be flattered when another author makes reference to, and adopts, a concept that I developed.  But surely, half the fun comes from the formal citation showing where the brilliant idea was found!  Alas, such was not the case, when I read the recent Forrester Research Inc. report:  How Prediction Markets Help Forecast Consumers’ Behaviors, by Roxana Strohmenger.

In discussing the principles that help ensure prediction markets provide accurate predictions, the author makes reference to “information completeness”, in the following passage:

At the end of the day, a prediction market must have sufficient “information completeness” even if the individuals interacting in the market do not, to accurately predict outcomes with a reasonable degree of certainty.

Here is the passage where I introduced the concept of “information completeness”:

Prediction markets must have sufficient information completeness to accurately predict outcomes with a reasonable degree of certainty.

I added the bold italic parts to show the exact same words in each paper.  I’m still flattered, just a bit miffed.

Galton’s Ox Revisited

One other interesting point in the paper concerned a reference to a recent test in the Netherlands that tried to replicate Galton’s ox experiment (James Surowiecki, The Wisdom Of Crowds).  Among 1,400 guessers (oops again, I mean participants), the average estimate of a cow’s weight was 552 kg, but the actual weight was 740 kg.  The guessers were off by a full 25%!  How could this happen?

The average guess of Francis Galton’s townspeople was remarkably accurate (1,197lbs vs. 1,198lbs).   Clearly, the townspeople were a bit more knowledgeable about the likely weight range of a butchered ox than the Netherlands guessers were about the weight range of a cow.  The author of the Forrester paper calls this “perspective“, which is a good word for it.

I called it having a minimal level of information about the subject in order to make a prediction.  If you think about the problem logically, when the townsfolk made their estimates, there was a fairly narrow range of plausible weights from which to choose.  We would expect a roughly normal distribution of guesses, centred around the true weight, given reasonably small estimation errors (which cancel).

The cow guessers didn’t have a narrow range of possible weights to work with (guesses actually ranged from 108 to 4,500 kg)!  The errors would have been much more significant, on average, and much less likely to cancel out when aggregated.
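A quick simulation illustrates the point (a sketch only: the true weights and the 108–4,500 kg span come from the two stories above, while the error distributions are stylized assumptions):

    import random

    random.seed(1)
    N = 1400

    # Ox scenario: guessers know the plausible range, so errors are small
    # and roughly symmetric around the true weight -- they largely cancel.
    ox_true = 1198  # lbs
    ox_guesses = [random.gauss(ox_true, 75) for _ in range(N)]
    print(round(sum(ox_guesses) / N))  # lands within a few pounds of 1,198

    # Cow scenario: no "perspective", so guesses sprawl over a huge,
    # badly centred range (stylized here as skewing well below 740 kg).
    cow_true = 740  # kg
    cow_guesses = [random.uniform(108, 1200) for _ in range(N)]
    print(round(sum(cow_guesses) / N))  # ~654: the errors do not cancel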

Interestingly, there must have been a few knowledgeable cow weight estimators among the 1,400.  Would a prediction market have provided a more accurate number than the simple aggregation of estimates?  That would have been an interesting follow-up experiment.

On a humorous note, this research paper is the first I’ve read on prediction markets that does NOT mention Robin Hanson.  How can this be?

Posted by: Paul Hewitt | August 9, 2011

Fallacy of Economic Estimates

Back in March, 2009, I wrote about the Fallacy of Economic Forecasts, essentially arguing that economic forecasts are bullshit (or for the faint of heart:   most likely wrong).  In an odd sort of way, the “forecast” was really a future estimate of past economic results.  Maybe I should have changed the title to the Fallacy of Economic Estimates.

Well, in this week’s The Economist, Growth figures:  Six years into a lost decade, there is ample proof of my claim.  The U.S. Bureau of Economic Analysis (BEA) has revised its growth numbers for the fourth quarter of 2008.  Initially, the economy was estimated to have contracted by 3.8%.  This was revised a year later to indicate a much more serious decline of 6.8%.  Now, the estimate has been revised downward yet again, to 8.9%.

The inaccuracy is blamed on a piecemeal and slow collection of survey data, which gets fed into a national economic model.  Revisions to past estimates are made but once a year.

Perhaps the BEA needs a better model to estimate economic growth!  Maybe take a walk down Main Street and see how many storefronts are for lease.  Measure the length of unemployment lines.  Actually talk to real people about their spending plans.

In March, 2009, I estimated that growth would be down at least 10%, compared with government estimates of -1% to -4%.  Seems that my noggin houses a better economic model than that of the Bureau of Economic Analysis.

Posted by: Paul Hewitt | February 28, 2011

The Oscars 2011 – The Good, The Bad & The Ugly

We already know, or should know, that using prediction markets to forecast who will win what, as determined by a panel, is pointless.  Remember last year’s markets?  The Olympic site markets?  Britain’s Got Talent?  It really is a fool’s pursuit to try to out-guess the people who actually make the choice!

So, knowing that, at best, the Oscar prediction markets are mildly amusing diversions, I present a few interesting observations.

When we use prediction markets to make decisions, we usually base the decision on the most likely outcome in the market.  Consequently, in Oscar prediction markets, when we rely on the markets, we select the actor/movie that the market gives the highest likelihood of winning.  As I have written before, you will often be disappointed using prediction markets for discrete outcomes.

The Good

Prediction markets at Inkling and HSX had a few amazing successes!  Yes, once again, prediction markets have proven to be remarkably accurate predictors of slam-dunk outcomes.  We can now say, at least anecdotally, that if an Oscar prediction market gives an outcome at least a 70% chance of occurring, we can rely on the market to pick the correct outcome.

Here are the markets that predicted an outcome with a 70%+ probability of occurring:

  • The King’s Speech wins Best Movie (71.28% on hsx)
  • Colin Firth wins Best Leading Actor (89.36% on hsx)
  • Christian Bale wins Best Supporting Actor (77.92% on hsx)
  • Natalie Portman wins Best Leading Actress (81.04% on hsx)
  • Toy Story 3 wins Best Animated Feature Film (94.82% on Inkling)
  • The Social Network wins Best Film Editing (76.29% on Inkling)
  • The Wolfman wins Best Makeup (70.74% on Inkling)
  • Inception wins Best Sound Editing (76.83% on Inkling)
  • Inception wins Best Sound Mixing (77.53% on Inkling)
  • Inception wins Best Visual Effects (93.51% on Inkling)
  • The Social Network wins Best Adapted Screenplay (74.16% on Inkling)
  • The King’s Speech wins Best Original Screenplay (71.52% on Inkling)
  • The King’s Speech wins the Most Oscars (70.1% on Inkling)

 

The Bad

There were a few “upsets”:

  • Alice in Wonderland won for Best Art Direction (18.04% on Inkling), even though The King’s Speech (favourite at 38.25%) and Inception (26.68%) were more likely to win.
  • True Grit was favoured to win for Best Cinematography (65.19%), but Inception (11.53%) did win.
  • Alice in Wonderland won for Best Costume Design (31.27%), but The King’s Speech was favoured at 46.67%.
  • Inside Job won for Best Documentary Feature (30.78%), but Exit Through The Gift Shop was favoured (51.34%).
  • Biutiful (34.94%) got beat out by In a Better World (24.98%) for Best Foreign Language Film.
  • The Lost Thing (6.95%) pulled off a major upset against The Gruffalo (42.09%) and Day & Night (36.89%) to win Best Animated Short Film.
  • God of Love (12.08%) won Best Live Action Short Film, beating out front runners Wish 143 (39.34%) and Na Wewe (27.13%).

There was another possible upset.  The King’s Speech won the Oscar for Best Directing.  Was it an upset?  On HSX, it was a bit of one:  The Social Network was favoured at 54.44%, but The King’s Speech won with 33.48%.  On Inkling, however, the two films had identical likelihoods of winning, at 43.68% each.

Getting Better All The Time?

In most prediction markets, we expect the forecast to get more and more accurate the closer it gets to the outcome being revealed.  In the Best Directing Oscar markets (HSX), we saw the exact opposite!  Basically, it was a two-horse race between The Social Network and The King’s Speech.  The King’s Speech had been steadily becoming less likely to win over the last three weeks of trading.  In normal markets, this type of trend would require a steady diet of negative information.  Logically, we would expect sudden jumps in likelihoods when (if) significant information comes to light about which way Academy voters are likely to vote.  While I suppose it is possible for there to be a gradual revelation of information (say, one voter per day discloses his vote), it isn’t likely.  The Academy likes to keep these things secret until the show.

At any rate, the market was right, but trending wrong.  Maybe there was some information that came to light, resulting in more uncertainty about the outcome.  Then again, maybe the predictors were really just guessers, and the markets are simply aggregating “garbage information”.  Garbage in, garbage out.

While this may not have been an upset, it does bring up another important issue.  Two prediction markets were trying to predict the same thing; unfortunately, they predicted significantly different likelihoods.  There were many examples; here are but a few:

For the Best Original Screenplay, The King’s Speech had a likelihood of winning of 71.52% on Inkling but only 53.99% on HSX.  That’s a difference of almost 18 percentage points.  Seems quite high to me.  The same thing happened with the Best Adapted Screenplay, where The Social Network won.  This time HSX predicted it with a likelihood of 88.93%, while Inkling gave it a likelihood of only 74.16% (about a 15-point difference).

Suffice it to say, the prediction market “industry” must find out why this happens and how it can be corrected.  Otherwise, these types of markets should be abandoned for serious prediction purposes.  What am I saying?  These aren’t serious prediction markets!  Okay, the industry needs to get to the bottom of this issue, so these types of markets can be used as fair betting markets.

There are several possible reasons for the different likelihoods, and none of them help the case for prediction market accuracy or usefulness (for these types of markets).  I’ve discussed these issues in previous posts (too many to link to), so I won’t do so here.  If you took the time to read The Wisdom of Crowds, surely, you can spend a couple of hours reading this blog to learn the reasons.

Something Doesn’t Add Up

Inkling’s prediction markets consider each award as a separate market, with each nominee being a separate “share” within the market.  Accordingly, the sum of all of the likelihoods of the possible shares always adds up to one (1.0 or 100%).  However, on HSX, each nominee is a separate market.  All of the markets (nominees) for a particular category are aggregated to show the results the same way Inkling does, but the sum of the likelihoods did not always add up to one.  In fact, the sums were often significantly different from 100%.

For example (award, sum of likelihoods):

  • Best Picture, 93%
  • Leading Actor, 109%
  • Supporting Actor, 110%
  • Leading Actress, 111%
  • Best Directing, 106%

Even though this is a phenomenon created by the structure of the markets, it still raises the question – why?  Shouldn’t the markets have been arbitraged back to a total likelihood of around 100%?  Not only did these discrepancies occur, they persisted!  While I didn’t continuously monitor these markets, I did take snapshots at various times, and the sum of the nominee markets rarely added up to 100%.  If I started getting into all of the reasons why this might have happened, this would turn into a book.
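For what it’s worth, the arbitrage arithmetic is trivial.  In an idealized real-money setting (hypothetical prices below; HSX is play money and its trading mechanics differ, which may itself be part of the answer), a book of mutually exclusive binary contracts priced above 100% in total is free money:

    # One binary market per nominee; each contract pays $1 if that nominee wins.
    # Exactly one nominee can win, so selling one contract on every nominee
    # pays out exactly $1, no matter who wins.
    prices = [0.54, 0.33, 0.09, 0.06, 0.08]  # hypothetical; sums to 1.10

    premium_collected = sum(prices)  # $1.10 received for selling the book
    payout_owed = 1.00               # only the winning contract pays out
    print(f"risk-free profit per book: ${premium_collected - payout_owed:.2f}")

That such a profit could sit unclaimed for weeks says something about either the traders or the market structure.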

The Ugly

No one told us the writers had gone on strike, again!  A mere eight minutes in and we had barely cracked a smile.  When we did, it wasn’t for anything either of the hosts said, it was for the wink that Anne Hathaway directed at Colin Firth (as the King) in the opening film vignette.  Other than that, there was a lot of odd (not funny) banter between presenters and little to keep us occupied until the next Anne Hathaway appearance.  Their writers were pathetic, but her makeup person seemed to be on his or her game.  Note to the Academy:  hire Randy Newman to write next year’s script.  Either that or put Ricky Gervais on speed dial.

Final Words

For the second year in a row, my picks (from the prediction markets) were better than my wife’s.  All that’s left to be determined is my prize for this feat.

Posted by: Paul Hewitt | February 1, 2011

Disaster Hits Toronto (Few Saw it Coming)!

It has been a relatively mild winter in Toronto this year.  Even parts of the Southern U.S. have been hit harder than we have.  It’s just as well, too.  While we do know how to drive in snow, we’re a bunch of babies when it comes down after Christmas Eve.  We’re about to get hit with a snow storm that is wreaking havoc across the US midwest.  This reminded me of another snowstorm that hit Toronto.

For a bit of comic relief, I present this video news report.  It pokes fun at Torontonians, who seem to have acquired a reputation for being, well, shall we say, a bit sensitive when confronted by inconveniences (or even Acts of God for that matter).  Enjoy.  Being a prediction market blog, I should note that no prediction market could have seen this storm coming.  We were caught completely by surprise.

Posted by: Paul Hewitt | January 12, 2011

Prediction Market Prospects 2010

INTRODUCTION

Gartner Hype Cycle Social Software 2010

As we can see from the Gartner Hype Cycle Graph for Social Software, Prediction Markets are now on the downside of the dreaded “Trough of Disillusionment” (2010). Last year, it was just entering this phase, and in 2008 it was at the most-hyped “Peak of Inflated Expectations”.  The object of this paper is to examine the current status of the prediction market “industry”, discuss several troubling issues that are holding back enterprise prediction market adoption, and look at the prospects for the future.  Even if you get really sleepy reading this paper, keep going to the very end, where I will reveal a very, very long-term prediction!  Can you guess what it is about?

You’re probably already familiar with the following graph showing the Prediction Market growth trend.  It’s the one that appears in many presentations on prediction markets.  As far as I know, the graph hasn’t been updated since 2006.  It sure did look like the market was going to experience explosive growth!  Did it?


Prediction Market Growth Trend 1997-2006, Source: Newsfutures.

According to a McKinsey Global Survey of Web 2.0 adoption, enterprise prediction market “adoption” grew from less than 1% in 2007 to 8% in 2009.  This is how Consensus Point disclosed the results of the McKinsey report.  I looked at the actual McKinsey interactive graphs and found that prediction market adoption was 9% in 2008.  Does this mean that prediction market adoption had already peaked in 2008?  I thought we were just getting started!   If the survey is correct, prediction markets experienced more than an eight-fold increase in usage over two years.  Based on what we can see, there appears to be something wrong with the definitions of “adoption” and “prediction markets”.  Alternatively, prediction market adoption is taking place behind closed doors, or it isn’t really happening at all.

If the adoption rate is correct, why aren’t we seeing a significant spike in reported success stories?  There has been very little reporting of any prediction market results – good or bad.  I suspect the companies that have “adopted” prediction markets have done so in very limited pilot studies.  Here’s another possibility.  A quick review of several vendor websites indicates that many of the success stories involve idea pageant (or idea market) “prediction markets”.  I’m willing to bet that the companies that implemented these “markets” were included in those “adopting” prediction markets.  While this type of market does involve collective intelligence, it isn’t really a prediction market.

To start the review of the current status of prediction markets, let’s check in with Jed Christiansen, who recently posted his take on the industry.

There was nothing new in Jed Christiansen’s Prediction Market Review for 2010. His comments are correct, but he didn’t provide much commentary about the reasons for the developments over the past year.  Essentially, his summary was as follows:

  1. Real money betting sites are booming
  2. Free public prediction markets cannot survive without monetizing site traffic
  3. Software vendors are providing more consulting services to their clients

He sees the PM industry as “maturing”.  Existing vendors will continue to establish themselves, “as more companies experiment with new management tools and techniques.”  The problem with the industry is that the product is still in its infancy.  I don’t think you can call a market “maturing”, when the majority of the clients are merely experimenting with the concept of prediction markets and the “product” is, basically, still a concept.  As we saw at the beginning of this post, prediction markets appear to be firmly entrenched in the trough of disillusionment.  Furthermore, Gartner estimates that mainstream adoption is 5-10 years away, the same estimate they gave in 2009 and 2008.

Not only is the industry mired in the trough of disillusionment, I think the primary researchers are stuck in one too (with one notable exception)!  Over the last few years, there have been no important new research studies, no significant published prediction market trials, and no major prediction market issues resolved.  It is as if the researchers don’t want to look too closely at the issues for fear that some of them may seriously undermine the usefulness or potential of prediction markets.

I exclude one researcher (and his team) from the list of disillusioned researchers.  During the year, David Pennock and his group at Yahoo! Research launched Predictalot to showcase a fairly complex example of a combinatorial prediction market.  So far, it has been used to predict the winners of the NCAA March Madness basketball tournament and the World Cup.  On a humorous note, Predictalot and its developers received the Best Prediction Market Development of the Year award for 2010.  I’ll have more to say about the significance of this development, below.

Let’s look at the reasons behind Jed’s industry developments, which will lead into a discussion of the issues holding back the adoption of prediction markets and the future prospects for the industry.

 

Betting Markets

Real-money prediction markets are booming and expected to continue to boom, not because they are good predictors, but because betting is booming.  The major players are Betfair and Intrade, neither of which spouts on about the predictive abilities of its markets.

Discrete outcome markets (like horse races) are perfect for betting but not nearly as useful for making predictions and the decisions based upon them.  Most of the markets generate predictions that are too general or too public to be useful.  The value of information depends on having it before someone else and being able to act upon it.  Since these markets are ill-suited for useful predictions, their success will depend almost entirely on the public’s desire for betting opportunities.  Personally, I think these types of markets should be excluded from the definition of prediction markets.  Horse race odds are considered to be pretty good predictors of the race outcome, but we don’t consider horse race betting pools to be prediction markets.

 

Public Prediction Markets

Most public prediction markets are not very useful at all.  Even if they were proven to be accurate, no one would pay for information that is already publicly available.  With few ways to generate revenue, growth prospects are bleak.  Hubdub ceased operations during the last year.  While it was fun to play on their prediction markets, participants lost interest as the novelty of “betting” on trivial outcomes wore off.

No amount of explaining will convince participants that it was a good thing that Susan Boyle lost Britain’s Got Talent, even though she had a 78% chance of winning.  Once we’re done explaining that, we can take a stab at explaining why there was such a wide variance between Hubdub’s (78%) and Intrade’s (49%) likelihoods of her winning.  Personally, I think she should have won!

HSX and IEM run somewhat more useful markets, but neither is very good at accurately forecasting long-term outcomes.  Forecasting short-term outcomes is not particularly useful.  Unless HSX can be turned into a real-money market, the prospects for any commercial success are minimal.  However, this and other public markets are still valuable for research purposes.

Don’t expect any growth in this sector.

 

Vendor Consulting Services

This is a growth area, because vendors’ clients are ill-prepared to create useful prediction markets without guidance.  Failed trials mean the client companies will stop experimenting with prediction markets.  Vendors help their clients achieve reasonable prediction results.  None of the existing vendors can survive on software sales alone.  Vendors should try to run as many trials as possible and investigate the unresolved prediction market issues (see below).

There will be few new vendors, because the prospects for enterprise prediction markets are not very rosy (more about this, below).

 

WHAT IS HOLDING BACK ENTERPRISE PREDICTION MARKETS?

It’s no secret that prediction markets have not taken off in the corporate world.  Don’t corporate decision-makers know a good thing when they see it or is there something wrong with the product?

Since getting involved with prediction markets, I have maintained a list of issues that remain unresolved.  In my opinion, not resolving these issues is the reason enterprise prediction markets have failed to take hold in the marketplace.  Despite several researchers – especially Robin Hanson as the most published adherent – stating that prediction markets are at least as accurate as other forecasting methods, the case has not really been made (at least not to my satisfaction).

As we will see, prediction markets are unable to accurately predict long-term outcomes, and they have poor records for accuracy and reliability, all of which are crucial for enterprise adoption.  I haven’t mentioned the issues of market design, participant training, number of participants, etc…, because these things are easily solvable.  It makes little sense to tackle these issues, unless the important issues are resolved first.

 

“Just in Time” is Not Timely Enough

Prediction markets need to be able to forecast long-term events.  In order to make long-term decisions, we need information about conditions, events and outcomes that will occur far off in the future.  Well, at least longer than a month or two!  While there have been several long-term prediction markets (public ones), not one has provided an accurate prediction of the future outcome until very close to the time the outcome was revealed.  Such predictions, no matter how accurate, are not actionable.  In other words, these markets have been wholly inadequate for management decision-making purposes.  The use of prediction markets to forecast any long-term outcome is questionable, if not downright dangerous.

The following two graphs of historical prices in two long-term (14 year) prediction markets are  from Ideosphere.  In both of these markets, the predictions only became reasonably accurate during the last year before the outcomes were revealed.  Of course, some prediction market advocates will argue that the markets were accurate throughout the trading period.  The market price, at any point in time, accurately reflects all available information in the market at that time.  Consequently, the markets are considered “accurate”.  However, they aren’t accurate, if our purpose is to rely on them to make decisions about outcomes in the long-future.

Unfortunately, even if these long-term markets are “accurate” several years away from the outcome, we have no way of knowing whether they can be relied upon.  It is impossible to verify the calibration of these markets (though it has been claimed that they are calibrated – 30 days before the market close).  It is difficult to imagine that these markets were calibrated back in 1998, when the market prices were approximately 75% – 80%, yet the eventual closing prices were 0%.  It’s possible, but highly unlikely.  It is much more likely that these markets were reflecting a significant amount of uncertainty about the outcome.

Ideosphere 14 year Cancer Cure Market

Ideosphere 14 year Earthquake Market

The longer the trading period of the market, the more sources of uncertainty there will be.  The steady march of time gradually reduced the uncertainty in these markets.  It is as simple as that.  Even if it were possible to acquire enough information to reduce the uncertainty surrounding the outcome, it is highly unlikely that the incentives would be enough to cover the search costs.

I don’t have the answers as to why these markets have not worked, but here are a few possibilities:

  • Traders are not patient enough to bet on long-term events.  They want to make a trade and quickly find out whether they have won.
  • The longer the time period between the prediction and the outcome, the more likely it is that there will be more random, intervening events that affect the outcome, increasing uncertainty.
  • Intervening events that have a complex influence on the outcome will increase uncertainty around the prediction AND increase the likelihood of a wrong prediction.  Such outcomes may not be predictable by any method.

As the markets move closer to the outcome, uncertainty about intervening events decreases.  Generally, about 30 days before the outcome, the markets become reasonably accurate.  In fact, for most of the period these prediction markets were in operation, the predictions were wildly inaccurate!  The question is whether this is enough advance notice for the prediction to be acted upon, making it useful.

Here is an example from IEM, used to show how even fairly heavily traded markets are unable to make actionable predictions until very near the market close.

IEM 2006 US Congressional Control Market

Note that in the Congressional Control Market for 2006, the market prediction was inaccurate until a few days before the election.  For decisions that depended on knowing which way the election would go, the prediction would likely have come too late.  Most long-term markets exhibit this characteristic.

Accuracy

The Hewlett Packard pilot was one of the first studies of enterprise prediction markets (my commentary, here).  Even though it is over 10 years old, it is still the most often cited case!  This pilot study found that 6 of 8 markets outperformed the company’s internal forecasts.  That’s pretty good, except that the “better” predictions were only slightly better, and three of the predictions were really poor (greater than 25% error).  One of the study’s authors, Chen, commented:  “The accuracy improvement was not high enough to be adopted.  You need to be a lot more accurate before it’s worth it to implement a new process.”

We can say that these markets were effective aggregators of participant information.  When you consider that the participants in the prediction market trials were also involved in making the internal forecasts, it is not difficult to understand why the prediction markets were better at predicting the internal forecasts than they were at predicting the actual outcomes!  Unfortunately, prediction markets need to be good at predicting actual future outcomes.

The General Mills trials showed that prediction markets were as good as internal methods, but they were not significantly better and some of the internal forecasters were also participants in the prediction markets.  It should be kept in mind that these were very short-term predictions, such that it would have been almost impossible to act upon the predictions.

Pennock et al showed that prediction markets were accurate (in the cases they studied), but they were not significantly more accurate than alternative prediction methods.  They concluded that in order for prediction markets to be useful, they must be significantly better than alternative forecasting methods.  In the cases they studied, they found prediction markets were only slightly better than other methods.  In previous posts, I introduced the concept of materiality to the analysis of prediction markets.  Essentially, for a prediction market to be useful, it must be more accurate than the next best predictor, such that the more accurate prediction would make a difference to the decision-maker relying on the forecast.  Then, we need to look at the costs and benefits to determine whether the use of prediction markets is a wise course of action.

One of the measures of accuracy is calibration.  We can be fairly sure that horse race odds are well-calibrated with race outcomes, because we can analyse thousands of homogeneous races to prove the claim.  Unfortunately, we are hard pressed to find more than a handful of similar PMs from which we might test the PM’s calibration with the outcomes.  Yet claims are made that PMs are reasonably well-calibrated and “therefore, they are accurate.”

Given the above comments about long-term PMs, we have to ask, when is a PM “well-calibrated”?  Is it when the market closes?  If so, the prediction is useless, because it cannot be acted upon, even though it may be quite accurate.  Is it 30 days before the outcome of a long-term PM?  If so, this is a bit better, but still pretty useless.  Is it near when the market opens and continuously until the market closes?  This would be ideal, but it is highly unlikely to be the case.
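Mechanically, checking calibration is simple; the scarce ingredient is a large set of comparable markets with known outcomes.  A minimal sketch, assuming such a history existed:

    from collections import defaultdict

    def calibration_table(predictions, outcomes, bins=10):
        """Bucket predicted likelihoods; compare each bucket's average
        prediction with the observed frequency of the outcome."""
        buckets = defaultdict(list)
        for p, won in zip(predictions, outcomes):
            buckets[min(int(p * bins), bins - 1)].append((p, won))
        for b in sorted(buckets):
            pairs = buckets[b]
            avg_p = sum(p for p, _ in pairs) / len(pairs)
            freq = sum(w for _, w in pairs) / len(pairs)
            print(f"predicted ~{avg_p:.0%}  observed {freq:.0%}  (n={len(pairs)})")

    # Toy data only.  The deeper question raised above is *when* in the
    # trading period the prediction snapshot should be taken.
    calibration_table([0.9, 0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1, 0.1],
                      [1, 1, 1, 1, 0, 0, 0, 0, 0, 1])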

Galton’s ox and the missing submarine stories are examples of collective intelligence, not prediction markets, yet they are frequently cited as proof that prediction markets are accurate.

Reliability

In order to be useful in an enterprise setting, prediction markets must reliably provide accurate predictions of future outcomes.  Furthermore, they must be at least as accurate and timely as other traditional forecasting methods, and hopefully, make predictions at a lesser cost.  Here, reliability means consistency.  The same type of prediction market must consistently provide more accurate forecasts than other available means.

In the discussion about long-term markets (above), we found that PMs were very unreliable until close to the time the outcome is revealed.  This brings up a couple of crucial questions.  How far in advance can prediction markets make accurate predictions?  How will we know the point in time when a prediction is “accurate”?

Recall the Susan Boyle Britain’s Got Talent markets.  Why are there wildly different predictions of the same outcome in different prediction markets?  How do we know which market is accurate?  Is it a matter of prediction market efficiency?  If so, how do we know whether a market is efficient?  Rajiv Sethi provides us with an approach to determining which market is more efficient, but not whether the market is sufficiently efficient.  Are there differences in participant information in the two markets?  Is there a lack of diversity in one of the markets?  Evidence of cascading?  Herding?  Are there inadequate incentives to acquire and reveal information in the markets?  Does sufficient information exist in one or both of the markets?  If not, both markets may be aggregating guesses rather than informed opinions.

Prediction markets are touted as being excellent information aggregation methods, and by all accounts, they probably are very good at this.  It almost seems too obvious to mention, but I will anyway.  In order for the markets to provide accurate, reliable predictions, there must be a sufficient amount of information available to be aggregated.  No one is really looking at this issue, yet it is crucial to success of prediction markets.  This is the issue of information completeness.

 

THE FUTURE OF PREDICTION MARKETS

Where to from here?  Despite the significant unresolved issues, I still believe prediction markets have potential (though not as much as we all once thought).

 

Can PMs ever replace traditional forecasting processes?

Probably not. As discussed, the HP and General Mills prediction markets used individuals involved in the internal forecasting process.  Accordingly, the HP predictions were closer to the internal forecasts than they were to the actual outcome.  At General Mills, both the predictions and the internal forecasts were very close.

The nagging question is, if the internal forecasting processes had not been in place, would the prediction markets have been as accurate as they were?  We may never know, because I doubt there are any companies willing to test this proposition.  My intuition tells me that stand-alone prediction markets would be less accurate than both internal forecasts and prediction markets run in conjunction with internal forecasts.

I’m not arguing that prediction markets are poor aggregators of information.  The reason for the lesser accuracy of stand-alone prediction markets is that there is much less information to aggregate (without the internal processes to search for information).

 

Is there a place for PMs to supplement traditional forecasting methods?

Yes.

Prediction markets involve a relatively small marginal cost.  So, it is relatively painless to implement key prediction markets to supplement traditional forecasting methods.  Some of the benefits are:  the ability to quickly check the internal forecast for significant deviations from the prediction (which can be investigated), additional information gathered by incentivizing participants to search for it, and a reduction of forecasting bias.

The real benefit, in my opinion, is that prediction markets provide a better measurement of uncertainty around the outcome than do traditional forecasting methods.  They do this in the form of a distribution of predictions, which can be seen visually and measured by the standard deviation.  This information can be used to identify the need for further information, and it can be used in risk management and contingency planning.  In addition, management can measure the reduction of uncertainty over time, as new information is revealed or possible sources of uncertainty are removed.
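As a sketch of what I mean (the numbers are hypothetical; assume a market whose shares are mutually exclusive revenue buckets, along the lines of the spread trading discussed later in this post): the bucket prices form a probability distribution, and the mean and standard deviation fall out directly.

    import math

    # Hypothetical Q3 revenue market: bucket midpoints ($m) and share prices.
    # Prices of mutually exclusive, exhaustive buckets act as probabilities.
    buckets = [(1.1, 0.10), (1.3, 0.25), (1.5, 0.35), (1.7, 0.20), (1.9, 0.10)]

    mean = sum(mid * p for mid, p in buckets)
    var = sum(p * (mid - mean) ** 2 for mid, p in buckets)
    print(f"forecast: ${mean:.2f}m, std dev ${math.sqrt(var):.2f}m")
    # Watching the std dev shrink over time is a direct read on how much
    # uncertainty about the outcome has been resolved.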

One of the most promising applications is in project management.  Task and project completion forecasts involve the most bias, and prediction markets have the potential to significantly decrease this bias.  While long-term predictions are not particularly useful, short-term ones appear to be reasonably accurate and prediction markets have been shown to quickly aggregate known information.  In managing projects, it is important to obtain very short-term forecasts for task completion, so that corrective action may be taken.  Prediction markets appear to be particularly well suited to this task.

Projects can be separated into tasks along the critical path, and PMs can be put in place to predict completion dates for these tasks.  Because completion dates are continuous variables, coming close to the actual outcome will often be good enough, even if the prediction market is not a perfect predictor.

An interesting avenue of research would be to create a combinatorial prediction market in which all of the critical tasks are linked to the total project completion date.  (See additional comments below).

 

IDEA PAGEANTS

While they are not really prediction markets – they’re more like weighted opinion polls or high-tech suggestion boxes – they are usually counted as being “prediction markets”.  Oddly, these types of information markets make up the majority of “prediction markets” in use.  They also have the greatest growth potential.

Idea pageants generate ideas quickly, at a very low cost.  They are relatively easy to understand and implement.  These applications don’t need a high level of accuracy to be useful – companies can investigate the top 10 ideas vs. needing to know the best one.  Management doesn’t have to delegate all authority to the market.  Weak or impractical ideas are quickly filtered out, but decision-makers are free to investigate all ideas, not just those that have high probabilities of success.

We know that the further we are from the outcome, the greater the number of possible intervening events that could affect it.  Consequently, predictions will be inaccurate and/or widely dispersed until near the time the outcome becomes known.  These intervening events are random, but their likelihoods are not (in most cases).  Another possible application is to create markets, similar to idea markets, except that they would identify possible future events that might affect the outcome we are trying to predict.  This information, combined with prediction markets to estimate the likelihoods of these events occurring, would add useful information to the market predicting the outcome of interest.

For example, we could predict the likelihood of a truckers strike during the third quarter, which could be used to make a better prediction of third quarter revenue (the outcome of another prediction market).  Eventually, it might be possible to link the potential intervening events to the outcome in a combinatorial prediction market.

 

COMBINATORIAL PREDICTION MARKETS

Continuing with the previous example, we might apply Robin Hanson’s approach.  Much of his work in the area of combinatorial prediction markets focuses on conditional probabilities.  He might run two prediction markets.  The first would predict 3rd Quarter revenue, given a truckers strike.  The second would predict 3rd Quarter revenue, given no strike.  The difference between the two predictions would be the forecast cost of a truckers strike (in terms of revenue lost).  Robin calls these decision markets, and they form the backbone of his futarchy concept.  Decision markets represent one form of a combinatorial prediction market.
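The arithmetic, as a sketch (all figures hypothetical): add a third market pricing the likelihood of the strike itself, and the two conditional markets also combine into an unconditional forecast.

    # Two conditional (decision) markets plus one event market.
    rev_if_strike = 8.2      # $m: "Q3 revenue, given a truckers strike"
    rev_if_no_strike = 10.0  # $m: "Q3 revenue, given no strike"
    p_strike = 0.30          # likelihood of a strike during Q3

    strike_cost = rev_if_no_strike - rev_if_strike  # revenue at risk: $1.8m
    expected_rev = p_strike * rev_if_strike + (1 - p_strike) * rev_if_no_strike
    print(f"forecast cost of a strike: ${strike_cost:.1f}m")
    print(f"unconditional Q3 forecast: ${expected_rev:.2f}m")  # $9.46m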

With great fanfare, Crowdcast released their innovative trading platform designed to make trading more intuitive.  Essentially, it is a mechanism to allow traders to bet on user-defined spreads. For example, revenues will fall between $1.2m and $1.4m or $1.85m and $2.12m.  It allows traders to make combination bets for any range they choose.  While I think this innovation has potential, there may be a number of tricky issues regarding the effects of assumptions required to make this platform work.  Still, it is a promising development.

Combinatorial prediction markets make an awful lot of sense, if they can be practically implemented.  The above types of combinatorial prediction markets are relatively easy to implement. Perhaps the most difficult to design and implement is the type of combinatorial prediction market developed by David Pennock and his group.  While it is used for sports betting (play market), the concepts may be applied to enterprise prediction markets.

Predictalot provides a working example of a fairly complex combinatorial prediction market, which involves combinatorial betting on the NCAA March Madness tournament and the World Cup.  For example, if Duke is predicted to win the championship, this automatically increases the likelihood of Duke winning in all of the rounds leading up to the final.  Also, if Duke is predicted to win in the first round, this increases the likelihood of Duke winning the championship.  This platform allows bettors to bet only on those things about which they have knowledge.
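To see why a bet on the championship must move the round-by-round prices (and vice versa), here is a toy sketch of a four-team bracket with the joint outcomes enumerated explicitly.  (This is only the concept: Predictalot’s actual outcome space is far too large to enumerate, and taming that explosion is precisely what Pennock’s team had to solve.)

    from itertools import product

    # Four-team bracket: semi 1 is A vs B, semi 2 is C vs D, winners meet.
    # A joint outcome is (semi 1 winner, semi 2 winner, champion).
    outcomes = [(s1, s2, champ)
                for s1, s2 in product("AB", "CD")
                for champ in (s1, s2)]
    prob = {o: 1 / len(outcomes) for o in outcomes}  # start uniform

    def marginal(pred):
        return sum(p for o, p in prob.items() if pred(o))

    print(marginal(lambda o: o[0] == "A"))  # P(A wins its semi) = 0.5

    # A bet pushes P(A is champion) from 0.25 up to 0.60; condition on it.
    target = 0.60
    current = marginal(lambda o: o[2] == "A")
    for o in prob:
        scale = target / current if o[2] == "A" else (1 - target) / (1 - current)
        prob[o] *= scale

    print(marginal(lambda o: o[0] == "A"))  # ~0.733: the semi price moved too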

The same combinatorial prediction market concept could be applied to project management.  It is difficult to predict the completion date of a complex project (Predictalot Champion).  Some participants will have specialized knowledge of the task (Predictalot Team) they are working on, but little knowledge of other tasks along the critical path.  A combinatorial market would allow participants to trade on those outcomes in which they have knowledge.  The market structure will implicitly incorporate the predictions of tasks into the prediction of the overall project completion date.  Similarly, the prediction of the overall project completion date will influence the predictions of the various tasks along the critical path.

This is an important development, because traders may have specific or local knowledge about one or more components of an outcome, though they  have little knowledge about the eventual outcome itself.  A single prediction market for the project outcome may fail, because there is not enough information about the outcome to generate an accurate prediction.

It is true that a project outcome could be split into several prediction markets to predict the required tasks.  The problem is that each prediction market may be too thinly traded to generate an accurate prediction.  Also, there is no automatic inclusion of the task predictions in the project outcome prediction.  A combinatorial prediction market has the potential to solve this problem and generate better predictions of the outcome.

Looking at a more generalized application, many outcomes are dependent (or conditional) on other events, actions or conditions.  In order to better predict an outcome, we would like to know the factors that will have an effect on the outcome (discussed in the Idea Pageant section, above), and we would like to know how likely these factors are to arise.  We could set up a series of separate prediction markets to predict the likely effects of each of the factors that will affect the outcome.  The results of these markets would be available to the traders predicting the outcome of interest. While this is better than existing prediction models, it’s not ideal.   Alternatively, the factors can be combined with the outcome in a combinatorial prediction market, allowing the likely effects of the factors to be automatically incorporated in the outcome prediction.

Certainly food for thought, and it is the reason that I selected Predictalot as the most important development in the area of prediction markets for 2010.

 

YOUR REWARD FOR READING THIS FAR!

No discussion of the future of prediction markets would be complete without commenting on the most comprehensive system of prediction markets ever conceived.  Of course, I’m talking about Futarchy, one of the New York Times buzzwords for 2008.  Sadly for Robin Hanson, its creator, Futarchy has failed to take hold anywhere.  Surely, if the concept had any merit, it would at the very least have been implemented in some small South Pacific island nation by now (it hasn’t happened).  About a year ago, I commented on the Future of Futarchy, where I dismissed the concept.  Despite this, I see that in December 2010 Robin Hanson was still trying to promote the idea!  While I disagree with Futarchy, I do heartily endorse his use of decision markets.

If there were a long-term prediction market on whether Futarchy would be implemented anywhere in the world in Robin Hanson’s lifetime, the price would be flat-lining at $0.00.  Occasionally, the market price would jump up to $0.50 (reflecting Robin’s trades), only to be smacked down by Mencius Moldbug’s trades.  I suspect there will be a smirk on Robin’s face each time the market corrects his attempt to manipulate it.

This market illustrates another key aspect of prediction markets.  The outcome must be clearly defined.  In this market, “Robin Hanson’s lifetime” is defined to mean his lifetime in his current body.  It’s no secret that Robin wishes to have his head lopped off (when he dies, not before) and cryogenically frozen, to be thawed at some time in the future when bodies will be more “durable” or when brains can be downloaded into some robot-like “life” form.  No word, yet, about whether the good professor’s wife will be similarly decapitated.  Without this clear definition of the outcome, we wouldn’t be able to collect our bets, and it is likely that if brain cloning is possible, so is Futarchy!

So, my forecast is that Futarchy will never come to fruition and it should be cryogenically frozen now, too.

 

FINAL THOUGHTS

It has been quite an undertaking putting this paper together.  Undoubtedly, I have missed a few key items, for which I apologize.  As always, your comments are appreciated.  While there have been few new developments, there are still many tasks to be completed, if enterprise prediction markets are to gain traction in the market.  In writing this paper, it became evident how most of the major issues remain unresolved.  I hope that some of the researchers will get over their disillusionment and ascend the slope of enlightenment!  If so, I promise to get out of my own trough of disillusionment with respect to prediction markets!

Today, it was announced that David Pennock and his team of researchers at Yahoo have been given the prestigious Futurology Research & Astrology Foundation award for Best Prediction Market Development of the Year for 2010.  Their work in developing and launching Predictalot is a ground-breaking achievement in the field of collective intelligence.  While Predictalot has only been used to predict sports tournament winners (NCAA basketball and the World Cup) thus far, the combinatorial prediction market and related software will provide an important new platform for more accurate enterprise prediction markets in the future.  Other team members included:  Mani Abrol, Janet George, Tom Gulik, Mridul Muralidharan, Sudar Muthu, Navneet Nair, Abe Othman, Daniel Reeves and Pras Sarkar.

Commenting after the awards ceremony, Mr. Pennock said that, “it is a great honour and I’m proud of the entire team that brought this important concept to fruition.  Frankly, it’s a bit ironic – we just didn’t see this award coming at all!”

Posted by: Paul Hewitt | March 23, 2010

Paul Krugman Makes a Boo Boo

In Paul Krugman’s blog entry, Done, at 4:39pm (EDT) on March 21, 2010, he commented:  “OK, nothing is sure in this world. Intrade is still giving Obamacare a 2.2% chance of failing, …”

He was talking about the InTrade market on Health Care Reform.  In theory, the market price in such a derivative market should equal the expectation of the underlying event coming true.  However, Paul Krugman (and many others) forgot one of the most basic assumptions of the market model!  Transaction costs.

When the market price is over 95, InTrade charges a transaction fee of 3 cents per contract (real money).  While market prices are quoted in percentages, the payoff for a winning ticket is $10 (real money).  Therefore, the transaction fee is 0.3% of the winning payoff.  In addition, InTrade charges 10 cents per contract on expiry (if you “win”).  That’s another 1.0%. 

So, when the market was quoting 97.8% likelihood of the HCR bill passing before June 2010, this didn’t really mean that there was a 2.2% chance of the bill not passing.  A winning ticket would be subject to 1.3% transaction fees.  The real likelihood of failure was 0.9% – approximating the uncertainty that Obama would be “hit by a bus” before signing the bill into law. 
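Spelled out (using the fee schedule described above, with prices in points and a $10 payoff per contract):

    # Fee-adjusted implied probability for an InTrade-style contract.
    payoff = 10.00      # $ per winning contract
    trade_fee = 0.03    # $ per contract when the price is over 95
    expiry_fee = 0.10   # $ per winning contract at expiry

    fee_points = (trade_fee + expiry_fee) / payoff * 100  # 1.3 points
    effective_ceiling = 100 - fee_points                  # 98.7: the market's "100%"

    quoted = 97.8
    print(f"implied chance of failure: {effective_ceiling - quoted:.1f}%")  # 0.9%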

No rational investor would wish to purchase a share for more than 98.7, given the transaction costs.  In a sense, this is the market’s “100%”.  Interestingly, at 1:49pm GMT today (March 23), there are 695 bids at 99.1 and 413 asks at 99.2.  Clearly, some traders are not subject to the full transaction fees at InTrade.  More about that here.

I love Paul Krugman, but this time, he made a silly little mistake.  Of course, all of this assumes the market price is accurate in the first place!

Posted by: Paul Hewitt | March 22, 2010

Health Care Reform Explained

American Health Care Presentation

I watched part of the U.S. Health Care Reform bill passage on Sunday, March 21, 2010.  Combined with the political commentary, it was pretty clear that there is a lot of misinformation out there.  A particularly extreme right-wing viewpoint (along with moronic comments) can be found at the Cafe Hayek blog.  I’ll warn you, before you click on the link (if you really have to), that most of the comments are remarkably irrational.  Don’t even think of trying to debate with these oddballs (I’m being charitable).

For a much more balanced and logical explanation of the American Health Care Reform, click here for an excellent (and easy to understand) PowerPoint presentation.  This is an award winning presentation by Dan Roam and C. Anthony Jones, M.D.  Enjoy!

Posted by: Paul Hewitt | March 14, 2010

Truth in Advertising – Meet Prediction Markets

Most published papers on prediction markets (there aren’t many) paint a wildly rosy picture of their accuracy.  Perhaps it is because many of these papers are written by researchers having affiliations with prediction market vendors.

Robin Hanson is Chief Scientist at Consensus Point.  I like his ideas about combinatorial markets and market scoring rules, but I think he over-sells the accuracy and usefulness of prediction markets.  His concept of Futarchy is an extreme example of this. Robin loves to cite HP’s prediction markets in his presentations.  Emile Servan-Schreiber (Newsfutures) is mostly level-headed but still a big fan of prediction markets. Crowdcast’s Chief Scientist is Leslie Fine; their Board of Advisors includes Justin Wolfers and Andrew McAfee.  Leslie seems to have a more practical understanding than most, as evidenced by this response to the types of questions that Crowdcast’s prediction markets can answer well: “Questions whose outcomes will be knowable in three months to a year and where there is very dispersed knowledge in your organization tend to do well.”  She gets it that prediction markets aren’t all things to all people.

An Honest Paper

To some extent, all of the researchers over-sell the accuracy and the range of useful questions that may be answered by prediction markets.  So, it is refreshing to find an honest article written about the accuracy of prediction markets.  Not too long ago, Sharad Goel, Daniel M. Reeves, Duncan J. Watts, and David M. Pennock published Prediction Without Markets.  They compared prediction markets with alternative forecasting methods for three types of public prediction markets:  football games, baseball games, and movie box office receipts.

They found that prediction markets were just slightly more accurate than alternative methods of forecasting.  As an added bonus, these researchers considered the issue that prediction market accuracy should be judged by its effect on decision-making.  So few researchers have done this!  A very small improvement in accuracy is not considered material (significant), if it doesn’t change the decision that is made with the forecast.  It’s a well-established concept in public auditing, when deciding whether an error is significant and requires correction.  I have discussed this concept before.

While they acknowledge that prediction markets may have a distinct advantage over other forecasting methods, in that they can be updated much more quickly and at little additional cost, they rightly suggest that most business applications have little need for instantaneously updated forecasts.  Overall, they conclude that “simple methods of aggregating individual forecasts often work reasonably well relative to more complex combinations (of methods).”

For Extra Credit

When we compare things, it is usually so that we can select the best option.  In the case of prediction markets it is not a safe assumption that the choices are mutually exclusive.  Especially in enterprise applications, prediction markets are heavily dependent on the alternative information aggregation methods as a primary source of market information.  Of course, there are other sources of information and the markets are expected to minimize bias to generate more accurate predictions.  

In the infamous HP prediction markets, the forecasts were eerily close to the company’s internal forecasts.  It wasn’t difficult to see why.  The same people were involved with both predictions!  The General Mills prediction markets showed similar correlations, even when only some of the participants were common to both methods. The implication of these cases is that you cannot replace the existing forecasting system with a prediction market and expect the results to be as accurate.  The two (or more) methods work together. 

Not only do most researchers (Pennock et al excepted) recommend adoption of prediction markets based on insignificant improvements in accuracy, but they also fail to consider the effect (or lack thereof) on decision-making in their cost/benefit analysis.  Even if some do the cost/benefit math, they don’t do it right.

Where a prediction market is dependent on other forecasting methods, the marginal cost is the total cost of running the market. There is no credit for eliminating the cost of alternative forecasting methods.  The marginal benefit is that expected by choosing a different course of action than the one that would have been taken based on a less accurate prediction.  That is, a slight improvement in prediction accuracy that does not change the course of action has no marginal benefit. 
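As a sketch of that decision rule (all figures hypothetical): the market earns a marginal benefit only when its extra accuracy changes the action taken.

    # Hypothetical: commit to extra capacity only if forecast demand
    # exceeds 100,000 units.  Two forecasts, one decision threshold.
    threshold = 100_000
    internal_forecast = 96_000
    market_forecast = 98_500  # slightly more accurate, but...

    same_decision = (internal_forecast > threshold) == (market_forecast > threshold)

    market_cost = 25_000  # running the market is a pure add-on cost here
    benefit = 0 if same_decision else 400_000  # value of acting differently
    print(f"marginal benefit: ${benefit:,}; marginal cost: ${market_cost:,}")
    # Same decision either way -> zero benefit; the market is a net loss.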

Using this approach, a prediction market that is only “slightly” more accurate than the alternative forecasting approaches is just not good enough.  So far, there is little, if any, evidence that prediction markets are anything more than “slightly” better than existing methods.  Still, most of our respected researchers continue to tout prediction markets.  Even a technology guru like Andrew McAfee doesn’t get it, in this little PR piece he wrote shortly after joining Crowdcast’s Board of Advisors.

Is it a big snow job or just wishful thinking?

Posted by: Paul Hewitt | March 13, 2010

Paralympic Games 2010

No predictions today, just a note about a truly spectacular event that took place last night – the Paralympic Games Opening Ceremony 2010.  The link will take you to the site with the complete replay of the ceremonies.

A few weeks ago, I watched the Olympic Games Ceremonies and was quite unimpressed.  Parts of them were downright embarrassing, and I’m not talking about the torch that wouldn’t rise.  The world probably thinks Canadians are a bunch of tap dancing, tattooed fiddlers or really bad comedians (untrue on both counts).  The Opening Ceremony for the Paralympic Games should change all of that.  I couldn’t be more proud to be a Canadian than I am right now.

In stark contrast to the Olympic ceremonies, this was a high-energy, happy event with a cast of hundreds of smiling children.  There were special tributes to Rick Hansen (Man in Motion) and Terry Fox (Marathon of Hope).  Don’t miss Luca “Lazylegz” Patuelli’s spectacular breakdance performance.  The music was great, the speeches heartfelt and inspiring, and the entire evening was a beautiful welcome to these amazing athletes.

Enjoy the show!

Posted by: Paul Hewitt | March 8, 2010

Oscars Prediction Markets Get it Right

My wife follows movies a lot closer than I do.  She thinks she can pick the Academy Award winners more accurately than I can.  I took up the challenge, knowing that I could visit a few prediction market sites, like hubdub and hsx.  I’m writing this portion of the blog before the Oscars take place.  I also wrote the title before the results were in. 

My picks were the front-runners in each award category, based on the market predictions on Saturday morning.  I should note that this took me all of about five minutes (compared with my wife’s hours and hours of reading and actually watching all of the movies). 

Now for the results…

I’m happy to report that the prediction markets for this year’s Oscars were 100% accurate!  I wasn’t very surprised, really, but my wife is still very skeptical about prediction markets.  How can this be?

The HSX prediction markets were not very good at picking the winners of the Best Screenplay awards.  Inglourious Basterds was “supposed” to win (52.24%), but The Hurt Locker (25.72%) did win.  Oops.  Up in the Air was supposed to win (63.16%), but Precious did win (and it was only expected to win 7.5% of the time)!  Another oops.  At this point, my wife is gloating about how crappy these prediction markets are at picking winners.  While they were handing out a bunch of lesser awards, I tried to explain to her that the prediction markets were still perfectly accurate, even though a long shot actually won.

I explained to her the concept of calibration, and how the markets were really accurate, because they were not picking winners with a 100% certainty.  In fact, the markets’ failures were validation that they were, in fact, accurate.  She thinks I’m an idiot (about prediction markets).

Up won for Best Animated Feature.  It was expected to win 98 out of 100 times (if it were to be nominated 100 times, that is).  Christoph Waltz won with 87% for Best Supporting Actor.  In these cases, the markets were both “accurate” and making accurate, useful predictions.  My wife’s not impressed.  Everyone picked those categories, apparently.  There were no other surprises in the Oscar Awards. 

Essentially, when a prediction market picks Mo’Nique to win the Best Supporting Actress Award with an 86% likelihood, she would be expected to win the award 86 times out of 100 Oscar ceremonies.  Of course, it isn’t possible to nominate her for the same role (along with the other nominees) every year for 100 years, to test the calibration of the market.  However, if the market were well-calibrated, Mo’Nique would lose the Oscar 14 times out of 100.  The market will still be considered “accurate” but fail to predict the winner when she loses.  Expressed another way, when Mo’Nique loses, it helps validate the accuracy of the market (so long as she loses only 14 times in 100). 
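If you’re still not convinced, a quick simulation makes the point.  Only the 86% figure comes from the market; the rest is hypothetical:

    import random

    random.seed(42)
    TRIALS = 100_000   # a stand-in for the impossible "100 Oscar ceremonies"
    p_win = 0.86       # the market's stated likelihood for Mo'Nique

    wins = sum(random.random() < p_win for _ in range(TRIALS))
    # A well-calibrated market loses this bet about 14% of the time,
    # and each loss is evidence for calibration, not against it.
    print(f"win rate: {wins / TRIALS:.3f}")  # prints roughly 0.860

Losing 14 times out of 100 is not a bug; it is exactly what a calibrated 86% claim looks like.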

Unfortunately, we don’t know which 14 of the 100 trials will be losses.  Consequently, we are going to be disappointed when the losses occur.  This is why my wife is skeptical about prediction markets.  In a horse race, like the Oscars, coming close to winning means nothing.  Apparently, coming close means you’re an idiot. 

We tied in our correct picks.  However, I “won”, because I made my picks in five minutes and used the time I saved to work on my golf game.

As a side note, the predictions between hsx and hubdub were consistent.  Virtually all similar prediction markets generated expected probabilities within 5%.  Not bad, I suppose.

Though we can’t prove it, I’ll stand by my title and state that the prediction markets were 100% accurate.  But I’ll qualify this by saying they were not very useful.  If I can’t convince my wife that prediction markets are useful (she’s a corporate executive), I don’t see much of a future for enterprise prediction markets – at least not for the “horse race” types of markets.

Posted by: Paul Hewitt | January 4, 2010

The Future of Futarchy

I’ve been meaning to write this post for quite some time.  While it is an interesting concept, on paper, I’m afraid that the only place you are likely to see futarchy implemented is in a future Star Trek movie (no offense to bona fide Trekkies intended).  And, I’m sure the mythical planet, “Futarchy”, is doomed and Spock will show no mercy towards its inhabitants.  I apologize if I got any Star Trek details wrong – I only watched a couple of episodes when I was a kid.  I only refer to Star Trek to show that this idea of futarchy is “out there” – really out there, actually.

I think Robin Hanson agrees, at least partially, with this assessment.  In his paper, “Shall We Vote on Values, But Bet on Beliefs?”, he explains that rather than use a scientific approach to assessing the viability of futarchy, he uses an “engineering” approach, which merely seeks to determine whether a concept is deserving of further study, prototype development, etc…  Interested readers should probably read Robin’s paper before proceeding.  I will explain the basic idea and assumptions behind futarchy, but many of the details will not be repeated, here.

While this should be a very short read, it isn’t.  Robin Hanson used 20+ pages to explain why futarchy is “plausible” (and continues to be hopeful of its acceptance), and Mencius Moldbug used 7,400+ words to conclude that futarchy is “retarded”.  Many more words were wasted in blog comments.  This started out as a quick post to dispose of futarchy once and for all, but one fault leads to another, and it can be hard to stop.  Anyway, read as far as you like.  The conclusion doesn’t change.

The structure of this post is as follows:

  1. What is Futarchy?
  2. Discussion of the Assumptions that support considering Futarchy
  3. Decision Market Mechanics
  4. Three Scenarios
  5. The National Welfare Measure
  6. Design Issues
  7. Other Considerations
  8. Conclusion

What is Futarchy?

Futarchy is Robin Hanson’s term for a form of government where decision markets are employed to forecast the likely effect of a proposed policy on some measure of overall welfare, such as GDP+.  If a decision market indicates that a proposed policy is likely to generate a positive welfare benefit (relative to the status quo), the policy is automatically implemented.  Actually, Robin uses the word “immediately” to determine when the proposed policy is to be adopted.  A careful reading of the papers indicates that “immediately” really means the adoption of the policy is “hard-wired” to, or directly follows from, the decision market’s forecast.  Citizens vote for elected representatives, who administer the definition and annual calculation of the welfare measure.  Using decision markets, citizens (speculators) place bets on the likely effects of proposed policies.  In this sense, “We Vote on Values (what to do), but Bet on Beliefs (how to do it)”.

The Assumptions

According to Robin, there are three assumptions that support the concept of futarchy.  Here they are, with a brief discussion of each.

1.     Democracies fail by not aggregating enough available information

Basically, Robin states that governments make bad decisions, largely because they have to appease ignorant voters.  In a democracy, every citizen has one vote, but not all citizens are equal, at least not in terms of the validity of their opinions.  He argues that relevant information exists about whether proposed policies will achieve the desired objectives, but that it is not being aggregated accurately, so that politicians might make better choices.  If the politicians knew which policies were unlikely to succeed, fewer of them would be adopted.

While it is true that the majority of the public are poorly informed and lack incentives to become informed, there is a subset of informed “elites” that would be able to make trades in these speculative markets to aggregate accurate, relevant information.  Robin cites a number of studies that lead to the following statement:

“The straightforward interpretation of this data is that experts and those who are better educated actually know more than the general public about which policies are better.”

In making his case, he comes to the conclusion that the general public is not only ignorant, but “fundamentally non-truth seeking” as well.  This presents a problem in developing good public policy, unless the uninformed, irrational “chimps” allow public policy to be determined by the informed, rational elites, “such as perhaps academic advisors.”  He cites a few examples of “contrarian” public opinions, such as “52% of Americans believe astrology has some scientific truth.”  I’m almost convinced that informed traders will have a better chance of aggregating more accurate information for policy decisions, but it doesn’t sound very democratic.  It sounds more like marketing of professors’ services and turning “chimps” into “chumps”. 

On balance, I’m going to give this one to Robin.  Many (perhaps most) of society’s problems can be traced to a lack of accurate, timely information.

2.     Speculative markets are the best known method of aggregating available information

This is where Robin does his usual cut-and-paste job, briefly touching on a variety of prediction (and betting) market “success” stories over the years (none recent, by the way).  There are the following examples that we have all seen before (many, many times):  racetrack odds are better than experts, OJ commodity futures improve government weather forecasts, Oscar markets beat columnist forecasts, gas demand markets beat gas demand experts, US presidential betting markets beat opinion polls about 75% of the time, and the granddaddy of them all, prediction markets beat HP official forecasts 6 times out of 8.

The HP results were “better” by an insignificant amount and were heavily dependent on the official forecasting process (read my analysis, here).  The Oscar markets would not have beaten a poll of the actual Oscar voters.  Isolated “successes” in disparate types of markets do not imply that public policy decision markets will be equally successful.  Yet we see, time and time again, this conclusion being reached on the basis of a very small number of diverse information aggregation field studies.

In a recent op-ed article, Robin explains that speculative markets are the way to go, because they are “an exemplary way to collect and summarize information, at least when we eventually learn the outcome.”  More proof that the more often you state something (anything) the more likely it is to be believed (even if it is beyond belief)!  Note carefully: the actual outcome must become known for the markets to have any chance of aggregating information accurately.  I’ll have more to say about this, below, as it is not as straightforward as Professor Hanson would have us believe.

One criticism of Robin’s approach to speculative markets is that he seems to believe that a small number of well-informed traders will always counteract the irrational trades of the uninformed.  In the area of public policy, it is quite likely that some issues will have a very, very small number of “informed” traders relative to the “chimps”.  To me, it is not clear that the informed traders will overpower the chimps.   I will have more to say about this later, but for now, I just want to make the point that speculative markets can work well, but not always.  Not only that, but no one, not even Robin Hanson, seems to care much why some markets appear to work while others clearly do not. 

Reliance on this assumption is very shaky and threatens the entire institution of futarchy.

3.     It is easy to distinguish rich, happy nations from poor, miserable ones, after the fact.

While agreeing that it may not be the best measure, Robin suggests that GDP may be a sufficient metric for measuring policy recommendations, at least initially.  The measurement (or metric) could be refined to take into account other factors that contribute towards national “welfare” (GDP+).  Policies that are expected to improve national welfare should be implemented.  Subsequently, the measurement of national welfare will identify whether policy decisions have been good.  The logic in favour of futarchy is as follows: 

If a statistical analysis indicates that a policy is likely to have a beneficial effect on national welfare, a speculative market would be expected to indicate the same (unless there were other, valid, reasons for this not to be so).  If it is advisable to consider a policy on the basis of statistical analysis (current practice), it should be equally advisable to consider it on the basis of a speculative market (futarchy).

In a very broad, simplistic sense, this assumption may be true enough to proceed, though it is an open question whether this is the appropriate metric to be used to assess all (or even most) policy proposals.

Decision Market Mechanics

The mechanics of decision markets are not as simple as Robin would have us believe.  Essentially, these markets are attempting to estimate a form of net present value of the expected welfare measure (GDP+) where the policy is adopted and where it is not (status quo).  The difference between the two estimates is considered the expected benefit of adopting the particular policy.  Given the very long time horizon for most policies that might be considered, it is clear that there is tremendous uncertainty attached to the calculations.  It is almost inconceivable that such markets could provide accurate forecasts before any actual policy effects could be identified.

In order to provide the necessary incentives for trading, the market must be capable of settlement.  This is a required characteristic of all prediction markets: the actual outcome must be revealed at some point in the future.  Informed traders generate profits by buying low and selling high while the market is open, or by buying at a lower price than the one at which the market is ultimately settled.  The smartest traders are those that identify and trade on the largest difference between the current expectation and the eventual outcome.  They also know this information before the less-informed traders do.  If the market cannot be settled for 20 years or longer, even using some form of indexed security for the payoff, I would argue that the settlement payoff loses its incentive for all but the most patient traders.  We should note that Robin is a bit vague as to the settlement of these decision markets.  We do know that whichever market contains the condition that turns out to be false will be cancelled (i.e. if the policy is approved, the status quo market is cancelled).  He discusses the possibility of calculating some welfare measure over a 20 year period, using various weights and discounts, and an implied assumption about future values for the infinite period beyond 20 years hence.  So, settlement is a long, long way off in the future.
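For those who like to see the plumbing, here is roughly how I understand the settlement of one such conditional pair to work.  This is a sketch only; the $1 payout scale and the refund-on-cancellation rule are my assumptions about the design, not Robin’s specification:

    # A rough sketch of one futarchy decision-market pair.
    # Two conditional contracts: one pays off only if the policy is adopted,
    # the other only under the status quo; the false-condition market is voided.

    def settle(adopted, welfare_measure, price_paid, market):
        """Return a trader's net cash flow per contract, on a $1 payout scale.

        welfare_measure is the realized GDP+ mapped into [0, 1] -- and it is
        only known ~20 years after the decision, which is the whole problem.
        """
        if market == "adopt" and not adopted:
            return 0.0                       # voided: stake refunded, net zero
        if market == "status_quo" and adopted:
            return 0.0                       # voided: stake refunded, net zero
        return welfare_measure - price_paid  # profit revealed only at settlement

    # The decision rule: adopt if the conditional forecast is higher
    # (Robin actually requires a "clearly" higher price, sustained over time).
    adopt_price, status_quo_price = 0.62, 0.55
    adopted = adopt_price > status_quo_price

    print(settle(adopted, welfare_measure=0.58, price_paid=adopt_price, market="adopt"))
    # about -0.04: this trader waited two decades to learn they overpaid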

Such markets must continue to trade until settlement.  If not, the very long holding period for almost every decision market would limit the number of markets in which active traders could participate.  If they continued to invest in markets before they received any “winnings”, they would, presumably, run out of investment funds.  Most importantly, we would not see the “cream rising to the top”.  That is, the best predictors would not become wealthier, relative to the chimps, until a number of markets settled, 20 years (or more) hence.  That is an awfully long time to identify the “experts” and give their trades more weight in subsequent markets.  It also assumes that they will still be alive and willing to trade.  Traders will tend to be young ones, too, in order to enjoy the benefits of their smart trades.  Perhaps Robin had Associate Professors in mind for his model “elites”.  It assumes, too, that they will be equally adept at forecasting policy effects for the issues that will arise 20 years hence.

In the op-ed article cited above, Robin clarifies the settlement problem by allowing trading to continue in the market that is not cancelled, which would allow some traders to cash out, without waiting for the final outcome (and payout).  He indicates that, through such trading, the market will continue to improve the prediction (or forecast).  But, who cares?  The policy decision will already have been made.  Any continued trading in the market and the very long wait until the market settles merely determine the final rewards for the better forecasters and the penalties extracted from the dolts.  There are two reasons why this would be a necessary feature of futarchy.  First, assuming the informed traders are able to cash out before the market settles, this will return liquidity to the marketplace for all policy decision markets.  Of course, there will have to be a sufficient number of chimp-chumps available to facilitate such trading.  Second, the futarchy process requires informed traders to distinguish themselves from the uninformed.  Allowing them to do so, in fewer than 20 or so years that a typical market may span until settlement, is the only practical method.

Three Scenarios

Realizing that this concept of futarchy is a bit of a stretch, Robin proposes a gradual approach to adoption, starting with corporate governance, moving on to agency decision-making and finally national governance.  I mention these only to see whether we can dismiss the whole concept at an early stage.

Corporate Governance

Robin describes how corporations are like small democratic governments.  He considers a simple speculative market involving conditional “dump-the-CEO” and “keep-the-CEO” stocks.  If the “dump-the-CEO” price was “clearly” higher than the “keep-the-CEO” price for “90% of the last week of a quarter”, the CEO would be dumped for the next quarter.  It is not hard to imagine that once such a guinea pig corporation experienced one CEO dumping, many more would follow.  The success of a corporation is not (and should not be) dependent on quarterly results.  Such an institution would require a steady stream of increasingly able CEO candidates (who would be able to hit the ground running, at a moment’s notice).  A continuous learning curve, constant change and massive severance costs would threaten the very existence of any corporation stupid enough to consider such “decision-making”.  Truly shocking in its naivety.
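For what it’s worth, the trigger rule he describes is mechanical enough to write down.  A sketch, in which the price series and the stand-in for “clearly” are invented:

    # Hanson's dump-the-CEO trigger, as described: fire the CEO if the
    # "dump" contract clearly out-prices the "keep" contract for 90% of
    # the last week of the quarter.  All prices here are invented.
    dump = [0.61, 0.63, 0.60, 0.64, 0.62, 0.65, 0.63]
    keep = [0.52, 0.55, 0.54, 0.51, 0.53, 0.50, 0.54]
    MARGIN = 0.05   # my stand-in for "clearly" higher

    clear_days = sum(d > k + MARGIN for d, k in zip(dump, keep))
    if clear_days / len(dump) >= 0.9:
        print("CEO dumped next quarter")   # and the severance bill arrives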

Thankfully, Robin Hanson appears to be well-ensconced in academia, safeguarding corporate America from the havoc this nonsense would create.

The only reason I note this scenario, at all, is that the next level involves agency governance, which would follow “after some successful examples of using speculative markets in corporate governance”.  We should be able to quit right now, but there are 20 more pages of Robin’s paper to plough through, and so, we press on.

Agency Governance

While this paper was written before the current economic recession took hold, Robin cites monetary policy as a prime candidate for using speculative markets to set policy.  Apparently, most agree on the variables to be manipulated to achieve a good outcome, and they agree on the statistics that may be used to determine whether a quality policy outcome has been achieved after the fact.

To counter this proposed application, one need only consider the current (sad) state of monetary economic intelligence among the “elite”.  If a monetary expert, like Alan Greenspan, can be so wrong for so long, what chance do the “unthinking masses” have?

Somehow, Robin believes that all we would have to do is make economic information available to the public, including speculators, and a speculative market would determine which expert to believe, setting an accurate market price and the most appropriate interest rate policy.  Sheer Madness.  It is the equivalent of handing out hammers and nails to a crowd of chimps and expecting them to build a house.

But we continue on… to national governance. Once enough people are living in these chimp houses and driving around in chimpmobiles, the case will have been made for hard-wiring speculative markets to the policy enactment process.

National Governance (Futarchy)

Elected representatives define a formal measure of “national welfare”, GDP+, and markets would continuously forecast this metric.  As policy proposals arise, new prediction markets would be implemented to forecast GDP+ conditional on the new policy being enacted and another conditional on the status quo.  Once it has been clearly shown that there would be a forecasted improvement in GDP+ (national welfare) under the proposed policy, it would be immediately implemented.

There are so many ways to be scared by this, it is hard to know where to begin. Few, if any, policies are adopted on the basis of a single metric or desired outcome, yet Robin Hanson is proposing that we do just that.  While it is true that he makes provision for the metric to be a composite of a variety of metrics, this doesn’t solve the problem.  Elected officials are in charge of defining the metric and its composition.  One can only imagine the intense lobbying efforts to influence the definition of GDP+ which could hinder the enactment of beneficial policies or promote harmful policies that should not be passed into law.

Invariably, public policies have a variety of objectives.  Selecting one metric (even a composite one) to measure the success of all policy proposals is naïve and simplistic in the extreme.  The effect of any particular policy on the metric will not be observable.  The only way to observe the actual effect of a change in policy on the metric is to hold all other things constant, which is, of course, impossible to do.

Robin counters that it is only the difference between the two markets that matters.  However, once a policy has been approved, based on the difference between the status quo and the policy adoption decision markets, the status quo market is cancelled.  The policy adoption decision market always has been, and always will be, attempting to forecast the total national welfare measure (GDP+) assuming the policy is enacted, which is based on 20 or more years of future statistics.  In those intervening 20 years or so, many new policies will be enacted, and every one of them will be expected to improve the national welfare.  What are the odds of such a prediction market being able to accurately (and consistently) predict the actual national welfare that will be determined over a 20 year period?

I’m sure Robin will counter with the fact that prior to arriving at a policy decision, both decision markets were subject to the same uncertainty about the national welfare measure.  Of course they were.  This only means that both markets must have been equally accurate prior to the policy decision being triggered.  What are the odds?  How could we prove their accuracy?

We don’t have very many long-term prediction markets that can be tested.  David Pennock did look into the issue of calibration of long-term prediction markets on ideosphere.com, here, finding that they were, indeed, calibrated.  However, I commented on Midas Oracle about the problems with his conclusion as it relates to decision-making.  To summarize, David Pennock’s analysis looked at the calibration of long-term markets 30 days prior to settlement.  By that time, almost all of the uncertainty had been eliminated from the prediction.  We would be more surprised if the markets had not been well-calibrated.  Unfortunately, those same prediction markets were consistently inaccurate for the vast majority of the time they were actively traded.  They only became “accurate” as they neared settlement, when the actual outcome was about to be revealed. 

Unless prediction markets can be understood and developed to the extent that they are capable of consistently providing accurate predictions well in advance of the actual outcome, they will not be of any use, at all, for decision-making.  If the ideosphere.com markets are any indication (and they are), it appears that such speculative markets are not very good at predicting outcomes in the face of uncertainty.  Long-term policy benefits are subject to very high levels of uncertainty.  Consequently, the prospect of relying on these markets to guide policy decisions is dangerous, to say the least.  Chimps, even elected ones, might make fewer mistakes.

The National Welfare Measure

“A very simple definition of GDP+ would be a few percent annually discounted average (over the indefinite future) of the square root of GDP each period. A not quite as simple GDP+ definition would substitute a sum over various subgroups of the square root of a GDP assigned to that subgroup. Subgroups might be defined geographically, ethnically, and by age and income. (Varying the group weights might induce various types of affirmative action or discrimination policies.) A more complex GDP+ could include measures of lifespan, leisure, environmental quality, cultural prowess, and happiness.”

This is Robin Hanson’s description of the national welfare measure that would ultimately be used to assess whether the policies adopted were “good”.  In the design issues section of his paper, he discusses the possibility of basing the calculation on a 20 year period of national welfare figures. 
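Written out, the “very simple” version seems to amount to something like the following.  This is my reading only; the discount factor δ is my notation, not Robin’s:

    % One reading of Hanson's "very simple" GDP+ definition; the discount
    % factor \delta and annual periods are assumptions, not his notation.
    \mathrm{GDP}^{+} \;=\; (1-\delta)\sum_{t=0}^{\infty} \delta^{\,t}\,\sqrt{\mathrm{GDP}_{t}},
    \qquad 0 < \delta < 1

    % The "not quite as simple" version swaps the square-root term for a
    % weighted sum over subgroups g (geographic, ethnic, age, income):
    \sqrt{\mathrm{GDP}_{t}} \;\longrightarrow\; \sum_{g} w_{g}\,\sqrt{\mathrm{GDP}_{g,t}}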

This is a lovely intellectual exercise Robin has embarked upon.  The vast majority of the individuals that Robin believes would take part in these speculative markets will not have a clue as to how to forecast GDP+, even in the very simple case.  Many will be perplexed as to how to discount future GDP+ figures.  The vast majority will be unable to calculate a square root of anything.  The intermediate complexity definition involves breaking down parts of the metric into sub-groups and applying weights.  We’re now down to a wee fraction of the public that might be considered “expert” enough to make a considered forecast.  But Robin’s not finished: it could be even more complex, involving environmental quality, lifespan, leisure and a host of other highly subjective factors.  Even the best actuaries will have difficulty here.  Continuing, there is no turning back from globalization, so any definition must take into account the effects of policy changes on foreigners (and other countries’ policy consequences to us).  Finally, no country stands still in time.  Demographic changes will have to be built into the metric definition.  The meek shall inherit the earth, but only if they are fully accredited actuaries!

We can’t be too hard on Professor Hanson; after all, it is a noble cause.  It’s just that, as I noted at the beginning, it belongs more in a Star Trek episode than it does in an academic paper.  It’s just so out there.

Design Issues

In this portion of the paper, Robin Hanson outlines 33 design issues that might prevent the new institution, called futarchy, from operating successfully.  Some appear to be relatively minor concerns, given the discussion points raised so far, so I will focus on those that appear most crucial.  Note that Robin phrases the issues in terms of objections to futarchy.

The Rich Would Get More Influence 

Should the rich be able to undermine the accuracy of the prediction markets, Robin proposes to tax them more (a market distortion) or limit how much each person can trade in a market (another distortion).  Robin thinks that the market forces will see to it that the rich do not have as much influence as they have now, because they will not have proportionately more or better information than the speculators.  Robin’s belief in market forces is unwavering.  As we shall see later, this is a very naïve view.

One Profits Little by Supporting Unlikely Proposals

Here, Robin considers the case where you think you have a strong proposal, but few others agree, holding down the welfare measure such that the policy is never adopted.  It seems unfair that you never get rewarded for your good policy, and the doubters are never penalized for your being right.

In this case, Robin suggests (and he is probably correct) that all political systems suffer from this problem.  Consequently, it may be possible to get the policy implemented on a smaller, local scale and keep trying to convince others that the larger proposal has merit.  One can only wonder as to who might possess the resources to embark on this course of action.  As we will see, later, there is a large cost of proposing a policy initiative. 

OR… could it be that you are wrong and deserve not to have the policy adopted?  OR…  could it be that the uninformed or the manipulators are able to set the market price with their “incorrect” information?  Robin doesn’t believe it is possible for manipulators (or uninformed “noise” traders) to “game” speculative markets, so it can’t be the latter possibility.  In fact, he goes so far as to say that manipulators make the market more accurate.  Maybe the market is working properly after all by preventing you from “being right.”  Maybe you’re not “right”.  OR… maybe manipulators can game these markets.  I think they can, as explained here, here, here and here.  These references apply to several points that follow regarding manipulation of speculative markets.

Some Markets May be too Thin

Robin considers that some markets may be too thinly traded to arrive at accurate estimates, making it possible for a few traders to push the market to favor a bad proposal.  By assuming that pro and con traders are similarly funded, each will try to influence the market, eliminating the thin market condition.  Alternatively, he assumes that the speculators would find out that one side was willing to manipulate the market and make trades to counteract the manipulation.

As I noted in my post, these are highly unlikely assumptions.

One Rich Fool Could Do Great Damage

Here, Robin considers the case where Bill Gates might try to manipulate the market.  If speculators knew which way Bill Gates was trying to move the market, they could easily counteract his trades, as it is assumed that, collectively, they have much more power than he.  Even Robin agrees that it is more likely that the speculators would allow the price to be pushed somewhat by Mr. Gates, because they would assume that Bill knows something that they do not.

People Could Buy Policy Via Trades

Similar to the “Rich Fool” situation, Robin claims that someone could not buy a policy by making the “right” trades, because other traders will only let prices move when they suspect that this new trader has new (accurate) information.  Robin states that if the other traders, with deep pockets, are able to clearly observe a particular person is trying to manipulate the market, they will not allow the price to change.  Failing to possess such oracle-like market knowledge, the other traders need only know the total quantity and direction of the noise trades in order to make their corrective trades.  Even if the other traders do not know the direction and strength of the manipulation and they are unsure as to whether the manipulator has relevant information, the manipulator’s trades will merely add a bit of noise to the market price.  The sheer weight of the other, informed, traders will nullify the effects of the manipulator’s trade. 

I refer to my posts on manipulative trading, above. 

Corrupting the Welfare Measurement Metric

It is possible that the measurement of the metric that is being forecast could be corrupted to influence the policy decision.  This can be counteracted by having multiple estimates of the metric and using the median estimate as the official one.  I agree, except that, just as we have auditors attest corporate financial statements, we will need appropriately trained, independent, “auditors” to ensure the accuracy of the national welfare measure.

Welfare Metric Definition

The welfare metric must be defined independently from the policy process.  It is a simplified summary of the values voted upon by the electorate.  Government representatives could improperly influence the definition of the welfare measure.  Robin raises the issue in terms of manipulation designed to support a specific policy proposal. 

In addition, there are likely to be substantial lobbying efforts directed at components of the welfare measurement that are detrimental to powerful interest groups.  For example, large carbon emitters and polluters would seek to minimize the impact of their negative externalities on the welfare measurement, which would lessen the likelihood of punitive legislation coming into force.  If we think lobbying is a problem now, just wait.

Defining When a Market “Clearly” Estimates

Basically, this means determining when the market becomes accurate.  Essentially, Robin considers the need for taking a conservative approach, which would require a minimum of one year of a consistently “clear” price differential, followed by a one or two week (continued) price difference for policy approval to become effective.  It is a good idea to make sure that the market consistently indicates a policy will be beneficial before implementing it.  One major problem is that long-term prediction markets are notoriously inaccurate until shortly before the outcome is revealed (as discussed above).  Do we really want to take chances in setting public policy, based on long-term prediction markets that are completely unproven and most likely inaccurate at the time the decision is made?

Institutional Costs

It is costly to evaluate proposals, so there must be a framework to limit the flow of new proposals.  Robin suggests a fee to be paid to have a proposal considered (which would be refunded or rewarded if the proposal is adopted).  The fee might be set at $10 million (or $10,000), but could be reduced by a subsequent policy change proposal.

Interesting that Robin wants trading input from the public, but most assuredly wishes to exclude them from the proposal process.  Only the rich, corporations and special interest groups will have deep enough pockets to initiate proposals.  It ignores the fact that at least part of the responsibility of our government is to identify issues, propose solutions and implement policies for the benefit of society.  Granted, there are precious few examples of governments setting policies to prevent or avert future problems, but how might futarchy make policy setting more effective in this regard?

What about emergency policies?  Surely, these must be exempt from the process.  Assuming they are, what is to prevent the government, the rich, the corporations and the special interest groups from adopting a do nothing policy until an issue becomes so acute that an emergency policy is required?  Well-oiled lobbying machines will kick into gear, giving us the same, broken process for setting policy.

Fixing Bad Decisions

Here, Robin addresses the issue of a “bug” in the welfare function, probably due to oversimplification.  The elected government must have the power to amend the welfare function and/or reverse the policy decision.  Unfortunately, the process may be too slow to avoid substantial harm and it may be quite expensive to undo a policy. 

Robin proposes that once a policy proposal has been approved, it could be vetoed within the next year, if another market “clearly” estimates bad welfare consequences, using the welfare metric as defined in one year.  That is, he’s proposing an appeal process for policymaking.  Those with the deepest pockets will be in control of veto powers (or at least substantial delaying powers).  Lobbyists will have immense incentives to influence the welfare metric.  Business as usual.

It Seems “Hard” to make one Measure Encode all of our Values

It’s not just “hard”, Robin, it’s downright impossible.  You propose a simplified measure, initially, that would be incrementally amended over time, by the elected representatives.  Lobbyist heaven!

Even your most complex measure of welfare is, still, a remarkable simplification of “national welfare”.  Values in one part of the country will be different from those in another, on many key issues.  At best, “national welfare” will be an “average” of the values held by the citizens.  Every policy decision involves tradeoffs, and one could argue that every policy is different in this respect.  Yet, the national welfare definition “hard-wires” the same tradeoffs for all decisions.  This is far too simple.  I’ll stop here, as this could be the topic of an entire book (and we may not ever need to know the “answer”).

Other Considerations

Budget Constraints & Policy Adoption Ranking

Under futarchy, as long as it is clearly shown that a proposed policy would improve national welfare compared with the status quo, the policy is to be adopted.  You don’t have to be much smarter (if at all) than a chimp to understand that no nation would be financially able to implement every policy that met this standard for adoption.  Simply put, there are budget constraints, now and in the future.  All but the simplest of policies involve financial commitments in the future.  Accordingly, policies adopted in the current period will have budget implications in future years, which will limit the ability to adopt future policies that may be proposed (and that should be adopted).  Futarchy makes no mention of budget constraints.

Consequently, there must be a method of ranking policies that are slated for adoption, so that the most beneficial policies are adopted ahead of weaker (though beneficial) ones.  Given the multi-year aspect of all policies, there must be some consideration of a policy’s adoption on the budget resources of future years, which may prevent the adoption of future policies (either under futarchy or in emergencies). 

If a policy is slated for adoption, based on the decision markets, but it cannot be adopted under a budget constraint, then both decision markets need to be voided – the policy adoption market and the status quo market.  Futarchy makes no mention of this possibility and the potential effects it may have on the decision markets.  I wouldn’t even hazard a guess at this point.

Complex Trader Forecasting

Futarchy assumes that if all available, relevant information is made available to the public, speculators will be able to discern fact from fiction and forecast the national welfare measure accurately.  This assumes that there are a sufficient number of informed traders who have a very good understanding of the issues and information, and that they have decision models able to make accurate predictions.  I’m reminded of the super-human, computer-brained, all-knowing beings that I met during neoclassical economic theory classes.  I thought they had died off, but apparently, they’re back!

Forecasting national welfare under futarchy is an incredibly complex problem.  I don’t think it is even possible for speculators to make reasonably accurate forecasts of national welfare.  They simply do not possess the knowledge or understanding, let alone a decision model, that would allow them to make accurate predictions.  Even if the institution of futarchy provides speculators with forecasts and asks them to bet on the most likely one, they still do not have the necessary tools to make that decision. 

If the traders don’t have enough information to make an accurate forecast, the market will not create it.  Prediction markets merely aggregate available information held by the participants, they don’t create new information through trading.  Prediction market proponents understand that each trader’s prediction is an “accurate” estimate combined with an “error” factor.  The assumption is that the errors cancel out, leaving only the accurate information reflected in the market price.  I think this is likely to be true, but not in every case.  Where the individual errors are large, relative to the known, accurate information, the predicting algorithm is likely to break down.  If you were to consider a large number of traders, each with a very small amount of information, it is highly unlikely that the market will function like a jigsaw puzzle, putting all the “good” pieces together and cancelling the “errors”.  The large error factors will prevent any algorithm from generating a reasonably accurate prediction.
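To see when the “errors cancel” assumption holds and when it breaks, consider a toy simulation (all parameters invented).  Independent errors do tend to wash out, even large ones; what cannot wash out is an error the traders share:

    import random

    random.seed(1)
    TRUTH = 100.0
    N_TRADERS = 500

    # Case 1: independent, zero-mean errors -- averaging works, even when noisy.
    independent = [TRUTH + random.gauss(0, 30) for _ in range(N_TRADERS)]

    # Case 2: a shared error (say, everyone leaning on the same bad model or
    # the same media story) -- no amount of aggregation can cancel it.
    shared_bias = random.gauss(0, 30)
    correlated = [TRUTH + shared_bias + random.gauss(0, 30) for _ in range(N_TRADERS)]

    print(sum(independent) / N_TRADERS)  # close to 100
    print(sum(correlated) / N_TRADERS)   # off by roughly the shared bias

The second average stays off by roughly the shared bias, no matter how many traders pile in, and the market “looks” just as confident either way.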

For example, if we were to run a decision market for a policy designed to combat global warming, the forecast would be wildly inaccurate.  The participants simply do not have enough information to make a reasonable forecast.  The market will not create any information that is not already possessed by the traders.  Yet, the market will look the same as an “accurate” prediction market.  Even worse, it is not possible to determine whether the market is accurate.

Uncertainty

There will always be random events that influence the actual outcome.  If markets are “efficient”, it is not possible to predict the effects of future random events on the outcome, based on the information held today.  Prediction markets reflect the level of uncertainty about the actual outcome by providing a distribution of outcome predictions.  When uncertainty is high, the distribution will be relatively flat.  As uncertainty is reduced, the distribution will tend to be tighter.  No prediction market can fully eliminate uncertainty surrounding the actual outcome being predicted. 

To some extent, the longer the time between the prediction (forecast) and the actual outcome (national welfare measure), the greater the uncertainty.  Consequently, most decision markets are likely to exhibit a fairly flat distribution of forecasts at the time the decision will be made.  While Robin Hanson disagrees with me, I believe that such markets are much more likely to be gamed by manipulators.  Furthermore, even if these markets are well-calibrated, they will not forecast the actual outcome accurately very often.

Decision Markets vs. Prediction Markets

Back in May, 2009, Mencius Moldbug posted Futarchy Considered Retarded on his blog, Unqualified Reservations.  It was an interesting smack-down of futarchy.  One point he made (among the 7,400 words) was that prediction markets are a fine idea, but decision markets are retarded.  I found this to be an odd comment, because all prediction markets are decision markets.  His distinction didn’t support his argument, and it clearly confused a number of commenters on his blog site and on Robin Hanson’s, Overcoming Bias, when he posted his Reply to Moldbug.

Apart from frivolous applications of prediction markets, they all generate predictions about an outcome, and the prediction is used in some decision model to make a decision.  In this sense, they are decision markets.  Robin Hanson uses “decision markets” to mean a pair of prediction markets that work together to estimate the difference between two conditional predictions.  Typically, the difference is the effect of implementing a particular policy (or decision).  Futarchy goes one step further to hard-wire the decision markets to a hard-coded decision model.

If prediction markets are fine, so are decision markets, but futarchy is still retarded.

Information Asymmetry

Mencius Moldbug made the following point:

“A prediction market, like any other market, functions only in the general absence of asymmetrical information. It is with some pain that I absorb the realization that a member of the George Mason School is unable to correctly apply this concept. … The rational approach to a market in which other players have more information than you is not to play. … This is one of the many reasons why insider trading is illegal.”

Robin replied, correctly, that virtually every market has information asymmetry, to some extent.  Markets still function, albeit not perfectly.  Only in cases where the asymmetry is severe is it possible that the market will cease to exist, and even then, over time, such markets seek to reform their institutions to alleviate the information asymmetries.  Moldbug’s assertion is a bit naive, relying much too heavily on the theoretical effects of information asymmetry in markets.  It is a wonderful, logical theory, but it is about as useful as the neoclassical framework for analysing real world markets.

“After the Fact it is Quite Easy to Test Forecast Accuracy”

Robin Hanson stated this in his reply to Moldbug. 

I find this to be a surprising statement by Robin.  It is not “quite easy to test for forecast accuracy” after the fact.  This involves measuring the degree of calibration between the market distribution and that of the outcome.  In fact, given the uniqueness of the outcomes being forecast, it is nearly impossible to measure calibration.  The best we can hope for is to estimate calibration of specific types of prediction markets with some set of homogeneous (more or less) outcomes.  Without calibration, a necessary condition, it is not possible to pass judgement on the accuracy of a prediction market.  Simply arguing that because one prediction market (pick one) possessed the calibration condition, all prediction markets must have it, is simplistic, without any support, and just plain dangerous.

Consider also that Robin Hanson is looking at a 20+ year measurement of the outcome for most public policy decision markets, under futarchy.  At best, there is a tremendous time lag (20 years or more) before it would be possible to test the calibration of any decision markets.  Remember, David Pennock’s analysis involved the calibration of markets 30 days before settlement.  To argue that these markets will be as well-calibrated (and accurate) as horse race betting markets is a ridiculous assumption.  Race track bettors at least read a racing form before making their bets.  In decision markets, we are merely pointing the chimps toward the dart board.

Conclusion

Robin Hanson doesn’t really give us his conclusion in the paper, but we can infer that he thinks futarchy is “promising”, based on his handling of the 33 design considerations and the list of next steps in the evolution of futarchy.  Further support comes from his op-ed piece in August, 2009 and his upcoming futarchy debate with Mencius Moldbug on January 16, 2010.

My conclusion is that futarchy has no chance of success, whatsoever.  It is a hopelessly flawed concept, even if its aim is true.  Decision-making, especially public policy decision-making, cannot be done properly with such a simplistic process.  Inevitably, important considerations are left out of the decision, leading to bad decisions.

Robin believes that the information necessary to make good decisions exists, but that it has not been aggregated accurately.  I do believe that this is at least partly true.  However, I also think a large portion of information that is needed to make proper decisions does not presently exist.  Perhaps more of our resources should be directed to uncovering the missing information. 

In particular, market prices in the real world do not reflect externalities from economic activity.  Current proposals for a carbon tax or for cap and trade are attempts to include the cost of carbon emissions in economic decision-making.  If successful, either of these policies would have an impact on market prices for all goods throughout the economy, reallocating scarce resources to better economic uses.  Placing values on pollution, fresh water and other critical resources might be a far more important solution to the information problem in public policy decision-making.  That’s my “out there” idea for the decade to come.

Posted by: Paul Hewitt | January 1, 2010

Tories to Pay Dearly for Common Knowledge

“Maybe the Tories are so out of touch they don’t know what’s out there, but they shouldn’t waste £1m of public money reinventing the wheel.” – Jenny Willott, Liberal Democrats’ spokeswoman.

On Wednesday, December 30, the Guardian.co.uk reported that the Tories announced a new offer to pay £1m for the development of an online platform to harness the “wisdom of the British crowd” to solve problems related to British governance.  Recognizing that the collective knowledge of the British people is much greater than that of a “bunch of politicians”, the Tories believe that such a platform will generate solutions to vexing problems.  If it works, the “winning” entry will receive the payout.  However, it isn’t quite clear what needs to be developed in order to win.

Based on the few “starter” problems that might be addressed by the new platform, it appears that they are looking for an idea pageant.  But these are readily available, now.  Hence, the reinventing the wheel comment.  Perhaps the Tories think it needs to be large enough to accommodate all British citizens – it doesn’t.  Once you have a crowd, a bigger one isn’t much better.  The Tories show their lack of understanding about how information aggregation markets work.  They require incentives for participants to reveal their private information.  Maybe most of the funds should be devoted to rewarding those that come up with the winning ideas and those that recognize (and bet on) a good idea when they see one.

While it may sound a bit wacky at first, there is a lot of potential.  It is sure to generate more good ideas than are being developed by the government on its own.  Without having to pay exorbitant consulting fees to generate garbage ideas, it is sure to be cheaper than their current problem solving process.  Over time, the idea market will come to recognize the top idea creators and those who are able to recognize them.  Maybe these people could form a future, wiser government? 

At least they could operate a legal, real-money, betting market if they choose to go that route.  A caution:  even if the idea pageant (or idea market) works, it will be up to some intelligent life form within the government to make sure that good ideas don’t go to waste through bad implementation. 

Two final points.  One, this will not be an example of public prediction markets.  It will be an example of information aggregation, but there is no prediction involved.  Two, it is, perhaps, the best possible use of an information aggregation framework for helping governments improve their decision-making.  In my next post (or two), I will turn my attention to the other information aggregation framework for good governance, the “retarded” futarchy of Robin Hanson.

Posted by: Paul Hewitt | December 21, 2009

The Essential Prerequisite for Adopting Prediction Markets

Prediction markets have been promoted as the best thing since “sliced bread” for forecasting future outcomes and events.  The truth is that the case has not been made to justify this position.  Today’s post will examine the necessary prerequisite for adopting prediction markets, and build a case for the seemingly incongruous conclusion that more prediction markets need to be put into practice now.

I have been very interested in the potential of prediction markets to accurately predict future events and outcomes, but I have been equally frustrated that not only is there very little proof that they work “as advertised”, but very few researchers are even looking at the issue of accuracy.  It is as if the vendors and leading academic proponents simply repeat (over and over) a few past “success” stories, quickly conclude that prediction markets work (and if one works, they all work), and proceed to describe their newest application.

Robin Hanson and others have advocated the use of prediction markets, where they can be shown to be better than alternative methods of forecasting future outcomes.  It is hard to take exception with this statement, other than to question how one might implement it.  That is what prompted today’s paper.

“Better”

In order to be considered “better” than an alternative forecasting method, a prediction market must generate a marginal net benefit by forecasting more accurately than the next best alternative method.  This implies that a more accurate prediction causes the decision-maker to choose a different course of action than the one that would have been chosen had a less accurate prediction (or forecasting) model been relied upon.  Not only that, the better course of action must generate a net benefit.

So, the prediction must be materially more accurate than the alternative forecasts.  That is, the improvement in accuracy must be large enough to cause the decision-maker to change his or her decision.  The decision-maker must be able to choose a more beneficial course of action (it must exist as a possible action).  Finally, there must be sufficient time to implement the better course of action.  Most of the real world prediction markets have been unable to meet these conditions.  The HP markets showed that there was some potential for prediction markets, but none of the pilot markets generated materially more accurate predictions than the official forecasts.  The General Mills prediction markets, using much larger crowds than the HP markets, were no better than the internal forecasts, either.

It is very questionable, at this point, whether it is possible to achieve accurate predictions from markets sufficiently far in advance to implement more beneficial courses of action.  There are very few long-term prediction markets, and even these are wildly inaccurate until very close to the actual outcome revelation.  Operating a long-term prediction market is pointless unless it is possible to take some beneficial action based on an accurate prediction.  One can only imagine the harm that could be caused by basing public policy on an early prediction of a long-term market, only to find that the policy was completely inappropriate.  Until their advocates work out this little problem with long-term prediction market accuracy, these markets should never be used to support any important decisions. 

Most of the real world prediction markets are very short-term in scope.  Even when they provide accurate predictions, in most cases, it is almost impossible for the decision-maker to make any significant changes to the course of action.  We can see this in the General Mills prediction markets (follow link above), where the markets only arrive at accurate predictions during the second month of a two month sales forecasting problem.  Not actionable.  Hardly useful information.

One exception to this general observation is the case of markets to predict project milestone completion dates.  The reason that these markets offer some promise is that decision-makers can use this information profitably on a daily basis.

Calibration

So, how do we determine whether a prediction market is accurate?  David Pennock helps us out by stating, “the truth is that the calibration test is a necessary test of prediction accuracy.”  As he comments, this is a necessary condition for statistically independent events.  The problem with this definition is that calibration is impossible to prove.  The best we can do is empirically estimate the calibration of a large number of similar prediction market predictions with the distributions of similar outcomes.  To date, no one has researched the calibration of specific prediction markets in any useful way.  True, there have been studies of horse race betting markets that have shown a very strong calibration with actual horse race outcomes, but this only proves calibration of these types of pari-mutuel markets.  Such results indicate that it may be possible to obtain well-calibrated prediction markets, but it certainly does not prove that such markets are, in fact, calibrated.
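Concretely, the empirical estimate amounts to bucketing many similar market predictions by their stated probability and comparing each bucket with the observed frequency.  A minimal sketch, where the data are made up (and eight observations is far too few, which is rather the point):

    from collections import defaultdict

    # (market-assigned probability, did the outcome occur?) for many
    # similar markets -- these pairs are invented for illustration.
    history = [(0.91, True), (0.88, True), (0.85, False), (0.52, True),
               (0.48, False), (0.55, False), (0.12, False), (0.08, True)]

    buckets = defaultdict(list)
    for p, occurred in history:
        buckets[round(p, 1)].append(occurred)   # group into ~10% bands

    for band in sorted(buckets):
        outcomes = buckets[band]
        freq = sum(outcomes) / len(outcomes)
        print(f"predicted ~{band:.0%}, observed {freq:.0%} (n={len(outcomes)})")

    # A well-calibrated market shows observed ~= predicted in every band,
    # which is exactly why it takes a large number of similar markets to test.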

For more information about this, please refer to my previous post on calibration, here.

Why does calibration matter? 

As the number of uncertain future outcomes (or events) grows, they form a distribution, which provides us with the likelihood of each outcome occurring.  If we knew the distribution of actual outcomes before one occurred, we could make an optimal decision.  We would choose to base the decision on the most likely outcome.  This does not mean that we would always be right.  In fact, if we were to make this decision a number of times, we would only expect to be “right” about the same number of times as the likelihood of that outcome occurring would suggest.  But this is a hypothetical example where we know the actual distribution of the outcomes.  In order to make an optimal decision in the real world, we would like to find a method of estimating the distribution of actual outcomes.  The better the estimate, the better the decision-making result.

Some situations involve outcomes that are discrete and have no relationship between the alternatives.  Examples might include the selection of a future Olympic host city, the winner of a horse race, or who will win a contest.  Decisions involving these types of problems require a very high percentage of correct predictions, in order to be useful.  Since there is no relationship between the possible outcomes, it is not possible to “just miss” and be “almost right”.  Coming close is no good at all.  We’re still dealing with a distribution of outcomes, and we will still base our decision on the most likely outcome, but unless one of the possible outcomes has a high likelihood of occurrence, we are likely to be wrong more often than we are right, even when the prediction distribution is accurate.  The higher the likelihood of one outcome occurring, the less uncertainty there is about the outcome. 

Such discrete outcome situations are problematic for prediction markets.  The only way to minimize the percentage of incorrect decisions is to predict outcomes that have very little uncertainty associated with them.   If one of the outcomes is a near “sure thing”, we don’t need a prediction market to figure this out!  One potential use of prediction markets for these types of problems is to provide a ranking of the possible outcomes.  The decision-maker would make a decision based on the most likely outcome and develop contingency plans for other reasonably likely possible outcomes. 

Many outcomes are points along a continuous variable, such as dates (on a time line) or sales volumes (part of all possible sales volumes).  In these types of situations, making decisions based on a reasonable range surrounding the most likely outcomes may be quite acceptable.  It depends on the tightness of the distribution and the sensitivity of the decision to the outcome being relied upon.  That is, if the decision would not change when the outcome falls within a certain range, and the outcome can be expected to fall within this range a high percentage of the time, the risk of a “wrong” decision will be minimal.

The closer the distribution of predictions matches that of the actual outcomes, the more often the prediction market will provide an accurate prediction of the actual outcome.  This is not to say that the prediction market will always be correct.  It only says that it has the greatest chance of being correct most often.  Consequently, over a large number of trials, a well-calibrated prediction market will generate the best overall results from decisions that rely on the market predictions.

A prediction market provides a distribution of predictions around a mean market prediction.  Most decisions would be made based upon the mean market prediction.  If the market is calibrated with the distribution of actual outcomes, this will maximize the number of occasions that the decision will be correct, based on the actual outcome.  Furthermore, in non-discrete outcome cases, coming close to the predicted outcome will be the next most likely outcome to occur.  Coming close may be good enough.

Comparing Forecasting Methods

Our original problem was to determine whether a prediction market is better than another method in forecasting an outcome.  Now that we know a bit about distributions and calibration, we can proceed.

Most forecasting methods provide subjective distributions of forecasts, if they provide any at all.  Prediction markets offer a significant improvement over other forecasting methods, by providing an objective distribution of predictions, which can be compared with the distribution of actual outcomes.  This gives us the possibility of measuring the calibration accuracy of a prediction market, if we can obtain enough data points to consider.  At least it is possible.  Most other methods can create a rough distribution of possible outcomes which may be tested for calibration.  A good example is a sales forecast with “worst case”, “most likely” and “best case” scenarios.  Likelihoods would be applied (subjectively) to create a rough distribution of possible outcomes, as in the sketch below.
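
A minimal sketch of that conversion, with invented scenario values and subjective weights, might look like this:

```python
# Hypothetical three-scenario sales forecast; the unit values and the
# subjective likelihoods are illustrative assumptions, not real data.
scenarios = {"worst case": 800, "most likely": 1000, "best case": 1300}
weights   = {"worst case": 0.2, "most likely": 0.6, "best case": 0.2}

# Each (value, likelihood) pair is one bin of the rough distribution,
# which could later be tested for calibration against actual outcomes.
expected = sum(scenarios[s] * weights[s] for s in scenarios)
print(f"expected sales: {expected:.0f} units")   # -> 1020 units
```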

Next, we need a fairly large number of trials.  This is a problem for almost every type of prediction market we may wish to consider.  Technically, each outcome or event is unique.  We can’t obtain a large number of trials for a particular outcome.  However, maybe we can obtain a larger number of trials for a set of homogeneous prediction markets and outcomes.  Ideally, each prediction market should have approximately the same “crowd” of participants and be attempting to predict the same type of variable outcome, such as quarterly sales of a product.  Another crowd could predict project completion dates, etc…

After a reasonable number of trials, we would measure how well the distribution of predictions matched the distribution of actual outcomes.  That is, across all of the prediction markets, prediction ranges that had, say, a 10% probability of occurrence should capture the actual outcome 10% of the time.  If this is true for all (or most) of the prediction probabilities, we can conclude that this type of prediction market is “well-calibrated” and may be used for future predictions of that type, using that “crowd” of participants.  Of course, we would also measure the calibration of the distributions (however crude) from the alternative methods.  Whichever method consistently develops the best-calibrated distribution of predictions should be the primary information model for that particular type of decision-making.  This doesn’t necessarily mean that you can drop all of the other forecasting methods.  These other methods may be generating the information that is being aggregated by the prediction market.  If we were to eliminate the source of critical information, the prediction market may not be as accurate.  In both the HP and the General Mills markets, some or all of the prediction market participants were also part of the internal forecasting process.  At HP, it appears that the markets were better predictors of the internal forecast than they were of the actual outcome.
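
A minimal sketch of such a calibration check might look like the following; the record format and the sample data are assumptions for illustration only.

```python
from collections import defaultdict

def calibration_table(records, n_bins=10):
    """Group (predicted probability, outcome occurred) records into
    probability bins and compare each bin's average predicted
    probability with its observed hit rate."""
    bins = defaultdict(list)
    for prob, hit in records:
        bins[min(int(prob * n_bins), n_bins - 1)].append((prob, hit))
    table = []
    for b in sorted(bins):
        pairs = bins[b]
        avg_prob = sum(p for p, _ in pairs) / len(pairs)
        hit_rate = sum(h for _, h in pairs) / len(pairs)
        table.append((avg_prob, hit_rate, len(pairs)))
    return table

# Illustrative records: ranges given a 10% probability should capture
# the actual outcome about 10% of the time if the markets are calibrated.
sample = [(0.1, False)] * 9 + [(0.1, True)] + [(0.8, True)] * 4 + [(0.8, False)]
for avg_prob, hit_rate, n in calibration_table(sample):
    print(f"predicted {avg_prob:.0%} -> observed {hit_rate:.0%} (n={n})")
```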

Every “crowd” is different, and each type of outcome has unique information required to make a reasonable prediction.  Consequently, it would be ridiculous to assume that, because one prediction market is considered accurate, all prediction markets are accurate.  Yet, this is exactly what we are told on vendor web sites, and worse, by academic researchers.  It can probably be taken as a “fact” that horse race pari-mutuel markets are well-calibrated, so it is not surprising that we find almost everyone assuming that these markets are accurate.  Add a tie-in about how similar pari-mutuel markets are to prediction markets, and we’re halfway home.

A few prediction market successes in political election markets and one “success” in enterprise prediction markets are trumpeted, in just about every academic paper on prediction markets, as evidence that prediction markets are “more accurate” than alternative forecasting methods.  On the basis of a mere handful of prediction market success stories, they conclude that prediction markets are the future of forecasting.  This is simply wishful thinking, and it leads one to question the motives of those who continue to promote a model that they know (or ought to know) is not nearly as accurate or useful as they claim, and for which there is precious little proof that it works for each type of promoted application.  The worst part about this is that the research has slowed to a trickle.  There seems to be no need to prove that prediction markets work.  It has already been done.  Now it is all about getting an application on the market.

By now you may be thinking this guy really has it in for prediction markets.  They’re nothing but high-tech “snake oil”, and the sooner these defective products are removed from the market the better.  Fair enough.  I do think that the vast majority of prediction markets could be categorized as “snake oil”.  Completely unproven.  However, I do think they have some potential to improve decision-making in enterprise applications.

Since the only way to determine the accuracy of a prediction market is to determine its degree of calibration with the distribution of actual outcomes, we need to focus on calibration.  The only way to measure calibration is empirically.  Since this will require as many trials as possible, I am actually going to advocate that their use be promoted, even though there are few benefits right now.  As they are promoted, clients must be told that the markets are not yet proven, but that there is a possibility they will develop into very useful tools in the future.

Since calibration is not a characteristic of prediction markets in general, we need to assess calibration for each type of market and for each “crowd”.  That is an awful lot of work, but without it, prediction markets are nothing more than a crap shoot.

Posted by: Paul Hewitt | December 1, 2009

Measuring Decision Market Accuracy

I came across this post: On Prediction Markets for Climate Change by Rajiv Sethi, an economics professor at Columbia University.  In his post, he makes a very interesting point that I have yet to see in any research paper about prediction markets.  He was commenting on the recent debate between Matt Yglesias and Nate Silver, regarding the use of prediction markets to help guide policy about climate change.  By way of a very brief summary, Matt believes that big business (coal and oil) will manipulate the market to influence the setting (or not) of policies that would be detrimental to their interests.  Nate thinks this is rubbish.  If the markets are broad-based and have sufficient liquidity, attempts to manipulate the market price will not succeed.  Nate thinks the markets would be “efficient”, providing market prices that accurately aggregate available public information.

Compelling Logic?

Here is where it starts to get interesting.  Rajiv comments that the logic of Nate Silver’s position is so compelling, it simply must be true.  That is, broader participation and more liquidity makes for efficient markets that generate more accurate prices.  To his credit (and I might add that he seems to be the only one), Rajiv set out to see whether this holds up in the real world.  He used Intrade and IEM markets about the 2008 election.  The hypothesis was that the IEM markets, with a more limited base and lower trade volumes, should have been less efficient than the Intrade markets.  Instead, he found the opposite!  Compelling, indeed.

How Do You Measure Efficiency?

“First of all, let’s think for a minute about how one might determine which of two markets is aggregating information more efficiently. We can’t just look at events that occurred and examine which of the two markets assigned such events greater probability, because low probability events do indeed sometimes occur.   If we had a very large number of events (as in weather forecasting) then one could construct calibration curves to compare markets, but the number of contracts on IEM is very small and this option is not available. So what do we do?”

This paragraph from Rajiv’s post summarizes the problem of determining whether a market is “accurate”.  We believe that if a market is well-calibrated, the distribution of its market prices will be “accurate”, reflecting all market information about the outcome.  Consequently, it will be described as “efficient”.  He points out the difficulty (in most cases the impossibility) of measuring the calibration of a market and asks “what do we do?”

Essentially, he comes to the conclusion that it is impossible to measure the efficiency of a market.  However, it is possible to say which market is more efficient.  In other words, we can determine relative efficiency of two markets.  He outlines a cross-market arbitrage mechanism that could be used to eliminate price differentials for identical contracts in different markets.  You can read the approach in his post, cited above.  While he did not actually run the arbitrage experiment, he did perform an informal test to determine which of two markets was more efficient. 

The market with the smaller change in price is the more efficient of the two markets.  Effectively, then, the more efficient market’s price will be a better predictor of the future market price in the other market.  This was how he determined that the IEM markets were more efficient than those on Intrade, despite their having a more limited participant pool and less liquidity.
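
A minimal sketch of that comparison, with invented price series, might look like this; the idea is simply that the market whose price subsequently moves less is treated as the more efficient of the two.

```python
def mean_abs_change(prices):
    """Average absolute price change from one observation to the next."""
    return sum(abs(b - a) for a, b in zip(prices, prices[1:])) / (len(prices) - 1)

# Hypothetical contract prices for the same event in two marketplaces.
iem     = [0.62, 0.63, 0.62, 0.63, 0.62]
intrade = [0.55, 0.58, 0.60, 0.62, 0.62]

if mean_abs_change(iem) < mean_abs_change(intrade):
    print("IEM price moved less: Intrade converged toward IEM")
else:
    print("Intrade price moved less: IEM converged toward Intrade")
```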

So far, we have been able to determine which of two markets is the more efficient, but we don’t know how much more efficient.  Also, we don’t know whether either market is  sufficiently “efficient” for the purpose of determining its accuracy.  Both markets may be “inefficient”, yielding inaccurate or misleading market prices. 

How did IEM do it?

Rajiv gives two possible explanations as to why the IEM markets were more efficient than the Intrade ones.  Neither explanation is good news for Nate Silver’s position.

One explanation has manipulative traders moving into the Intrade markets, in order to influence the prices (odds) quoted in the media and in political blogs.  The argument is that Intrade prices were much more widely cited than those of the IEM markets.  The reasoning goes that temporary dips in market prices can be eliminated through manipulative trading.  A political party may wish to see this done, so as not to upset campaign contributions or to minimize the impact of negative information.  The author argues that the benefit of such manipulative trading could be far in excess of the cost.  Since IEM’s markets were not as widely cited in the media or blogosphere, there was a lesser incentive to manipulate prices there.

Even if we believe the (limited) research on manipulation in prediction markets, it is more than likely that a short-term (maybe even a very short-term) manipulation could persist long enough to achieve the intended objective.  For example, the price could be manipulated just prior to when news stories are being finalized for the following day’s paper.  Once the paper hits the streets, the manipulated price may have been corrected, but the damage has already been done.  And this is the “best case” scenario regarding prediction market manipulation.  In the worst case, the manipulation is successful as the market is unable to correct the inaccurate price.

I’m not an expert on US campaign finance, but I wonder whether an Intrade market manipulator would need to declare the amount of funds used to implement the price manipulation scheme (or whether such a person or corporation would be considered a donor at all).  If the answer is no, it would provide an additional incentive for parties or candidates to manipulate the markets for political purposes (without having to account for the funds used).  We all know what happens when incentives are strengthened.

The other explanation is that inefficient markets attract higher participation rates and market liquidity, as traders seek to profit from inaccurate prices.  Efficient markets have fewer profit opportunities and less trading is required to keep prices accurate.  As Rajiv explains, Nate Silver is caught in a paradox.  Nate’s attempt to design a market with high participation and strong liquidity, in order to achieve efficiency (and hence, accurate prices), conflicts with Rajiv’s finding that it is the market inefficiency that generates the high participation and liquidity.

The Road Ahead

Despite all of these arguments, Rajiv Sethi believes that prediction markets on climate change topics should be tried.  He suggests that corresponding markets be offered in other marketplaces, such as the IEM, so that market efficiency comparisons can be performed and studied.  I’m sure useful information could be gleaned from this effort. 

We need to keep in mind that some (or most) prediction markets may not work, however.  The objective of prediction markets is to accurately aggregate information held by the market participants.  If those participants do not have the information (or are unable to get it and profit from it), the market will be unable to generate an accurate prediction or there will be too much uncertainty about the prediction, rendering it useless for decision-making.

Personally, I like the idea of decision markets, but I think we will find that our efforts to use these markets to help guide climate change policy will ultimately fail.  There is simply too much information that is needed to accurately predict the important metrics.  It is hopeless to think not only that there will be “informed” traders, but that they will be able to counteract the trading of the uninformed traders and the manipulators.  Any useful standard of “informed” traders might result in a mere handful of individuals spread throughout the world.  The impact of manipulators would swamp any efforts of the informed to set the “right” price in the market.  That said, there may be metrics that can be predicted (with reasonable accuracy) by a large number of traders.  Such predictions could be used as inputs into public policy decision models.  As with all prediction markets, the predictions must be accurate and consistently so.

Posted by: Paul Hewitt | November 27, 2009

Traders DO Need to Know the Direction of Manipulation

Information Aggregation and Manipulation in an Experimental Market – Robin Hanson, Ryan Oprea, David Porter

This study looks at price accuracy in experimental (laboratory) markets, where there are price manipulators.  The overall finding is that non-manipulative traders compensate for the bias inherent in the offers from manipulators, by setting a different threshold for trading.  The authors acknowledge that the “identification of manipulation in the field is difficult” and empirical evidence is scarce and tenuous.  Hence the need for a controlled, laboratory experiment.  For background on the experiments, please refer to the original paper.

There were two parts to this experiment.  In the Replication Treatment, there were no manipulators present, and in the Manipulation Treatment, one-half of the participants were given an incentive to increase the median price at the close of the market.  All participants knew that half of their number had this incentive to manipulate, and they knew the direction that the manipulation would take (upward). Where the non-manipulative traders knew that the manipulative traders would attempt to bid up the price in the market, they lowered their threshold for accepting offers, effectively counteracting the manipulative influence in the market. This makes intuitive sense, but only in the case where the non-manipulative traders know the direction of the manipulation.

In my previous post, I indicated that it would be necessary for the non-manipulative (“informed”) traders to know which direction the manipulators would try to move the market.  Robin Hanson commented that this is not necessary.  I think he is wrong, now, but he was right when this paper was written!   I think the authors are saying that it is required.  In fact, in the paper, they go a step further and allow all participants to know the strength of the incentive to manipulate.  We should keep in mind that, while this experiment demonstrates the concept of market manipulation and whether it can have a persistent effect on market prices, it is a pretty simple, controlled example.  The real question is whether it can be generalized to more complex, real-world situations.
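
Here is a minimal sketch of why the direction matters; the true value and the size of the manipulation incentive are invented numbers.

```python
# Illustrative assumptions: a true value, and a known upward bias that
# manipulators are expected to add to the quoted price.
true_value = 0.50
expected_bias = 0.10
quoted_price = true_value + expected_bias

# Knowing the *direction* (upward), a non-manipulative trader can simply
# discount every quote by the expected bias before deciding to trade.
adjusted = quoted_price - expected_bias
print(f"quoted {quoted_price:.2f}, adjusted {adjusted:.2f}")  # back to 0.50

# If the direction were unknown, the trader could not tell whether to
# add or subtract the bias, and the correction would fail on average.
```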

Posted by: Paul Hewitt | November 26, 2009

Decision-makers May be Smarter than Manipulators

Can Manipulators Mislead Market Observers? – Ryan Oprea, David Porter, Chris Hibbert, Robin Hanson and Dorina Tila.

This study showed that uninformed third parties (observers) are able to make significantly better forecasts of asset values based on market prices (of those values) in an experimental market.  Even when half of the traders attempted to manipulate the market, the observers’ forecasts were no less accurate.

It appears that the observers are able to adjust the market price to remove most, or all, of the effects of manipulation.  To me, this means the observers were using some other form of decision model to arrive at their forecast.  Such a model used the market price along with other trade data, enabling the observer to alter the forecast from that determined by the market price alone.  The authors note that the observers were able to do this, despite the fact that the non-manipulative traders and the observers did not know which direction the incentives for manipulation ran.

This is quite a remarkable result.  It would have been nice to know how they were able to make these accurate forecasts with market price data that had been manipulated.  One of the findings was that upward price manipulation resulted in about a 7% increase in the market price (though there was no similar effect for downward manipulation).  The authors note that further study is required along with robustness tests.  I agree that it might yield very useful insight into the process of making a forecast based on prediction market prices.

In a sense, the observer should be considered a decision-maker.  If decision-makers are able to filter out the effects of manipulation in a real public policy prediction market and make an accurate forecast of the underlying metric, perhaps there is a role for such markets.  I would feel a lot more comfortable if we knew how the decision-maker (observer) is able to accomplish this feat.  Finally, we need to know if this was only possible because it was a fairly simple experimental model.  Will the same decision-maker’s ability exist in extremely complex public policy markets?

Posted by: Paul Hewitt | November 26, 2009

Is It Enough to Provide Incentives?

In their paper, A Manipulator Can Aid Prediction Market Accuracy, Robin Hanson and Ryan Oprea use a theoretical model to show that a market can become more accurate when manipulators are present, by increasing the returns to informed trading, which provides incentives for traders to become informed.  However, given the number of assumptions made in the model, the authors caution that the “findings may not be robust” and that “since this is not a fully general model, it cannot by itself support strong general claims about the price effects of manipulation.”  So far, so good, we are in complete agreement, at least for some markets!

There are quite a few assumptions in the model, including:  “risk-neutrality, normally distributed values and signal errors, interior choices of information quantity, no transaction costs of trading, no budget constraints, and a single rational manipulator with quadratic manipulation preferences and a commonly known strength of desire to manipulate.”  The authors do examine the potential effects on their conclusions, if some of the assumptions don’t hold true in practice.   Let’s look at some of them.

A Manipulative Conspiracy

For example, if there were a conspiracy among most (maybe “many” would be enough?) traders to pursue a common manipulation objective, the supremacy of the informed over the manipulative traders could be upset.  This isn’t as far-fetched as it may sound.  Large politically affiliated groups, unions, and industry associations could be inspired to conspire, either directly or indirectly (through propaganda).

Uninformed by Choice or Constraint

The authors assume that, by providing an inducement for traders to become better informed, traders will actually become better informed.

What if it is not possible for traders to become sufficiently “informed”?  This could be the result of the issue being too complex or uncertain, or it could be that the cost of becoming sufficiently informed outweighs the benefits of using that information. 

For example, is it even possible for the average bettor to “read up” on climate change research to the extent necessary to determine that the market has been manipulated or that the market reflects too much uninformed, noise trading?  I highly doubt it. 

It could very well be that some issues (like this one) are so complex and so uncertain as to be unpredictable, until very soon before the market closes.  It is only the march of time that whittles away the uncertainty.

Relatively Speaking

The authors state that “when potentially informed traders have deep pockets relative to the volume of noise trading, increases in trading noise do not directly effect price accuracy.”  This assumes that traders can be “informed” and have a sufficient volume relative to noise trade volume (including that of manipulators).  I would argue that a market (such as a climate change PM) does not meet this condition. 

Creating a Prediction out of Nothing

The authors state that “historical, field, and laboratory data, however, have usually failed to find substantial effects of such manipulation on average price accuracy.”  Though this may be true, what happens in a market that has no clear average price (i.e. has a flat distribution)?  Couldn’t a manipulator create a misleadingly “accurate” market price?  The existence of a flat distribution (before manipulation) indicates the market does not have sufficient information to make a prediction.  The traders are uninformed.  Such a market would be ripe for a manipulation, and the market would not have enough informed traders to know what was happening or to do anything about it. 

Conclusion

On balance, I think the authors realize that there is a potential for markets to eliminate the effects of manipulation in some markets, if the necessary conditions and assumptions hold true.  In highly complex, uncertain situations, some of the key assumptions are unlikely to be met, calling into question the conclusion of the paper.  This is what I was trying to get across in my previous post.  Perhaps the single most important condition or assumption in their model is that the informed traders have relatively more trading volume than the noise traders (manipulators).  I explained why I didn’t think this would hold true for all markets.  In the authors’ paper discussed here, they simply state this condition as a fact.   They did warn us, however, that the findings may not be robust or generally applicable to all markets.

One down, three more to go (papers that is).

Posted by: Paul Hewitt | November 25, 2009

Use and Abuse of Public Policy Prediction Markets

Robin Hanson and others have suggested that prediction markets be used to help shape the direction of public policy.  The current hot issue is how to combat global warming and its effects on the environment. 

Matt Yglesias has argued that big money can manipulate markets.  So, we should not use prediction markets for this purpose.  On paper, prediction markets provide monetary or ego-related rewards for truthfully revealing private information by trading.  In this sense, prediction markets are said to “incentivize accuracy”.  When the incentives for manipulating the market price are greater than the incentives for not doing so, it is obvious how traders will act.  Matt argues that prediction markets that are prone to manipulation, such as climate change futures, will make inaccurate predictions, and any policy that is based on these will be inappropriate.  I agree.

Robin Hanson, on the other hand, believes that big money manipulators can only improve the accuracy of prediction markets.  He goes so far as to say that prediction markets are “especially incorruptible”.  I need to read all of his papers on this subject in their entirety; however, based on his own summary of the findings, I will make a few comments now.  [I promise I will read them, fully, and update this post if necessary]

Robin (and others) argue that prediction market accuracy improves “as more big money powers are known to want to manipulate them.”  Manipulators are in essence noise traders.  Markets with more noise traders are more accurate, because informed traders are attracted to the possibility of profiting by trading with the noise traders. 

He qualifies his conclusion by stating that “this isn’t an absolute guarantee.”  Then, he suggests that we try it before we condemn it.  However, before we do so, I suggest we look at the theory more closely.  We may find that it works as well as the neoclassical framework does in economics.  It works fine in a hypothetical, assumption-simplified world, but fails miserably in practice.

Let’s look at some of the simplifying assumptions in Robin Hanson’s application of prediction market theory.  One, the informed traders are more powerful than the manipulators, or noise traders.  In Hanson’s experiments, the manipulators are able to affect the market price, but the informed traders quickly bring prices back to an accurate level. 

What if informed traders aren’t wealthier (than the manipulators)?

In a typical prediction market, greater trader wealth is accumulated by being better informed than other traders and making trades that pay off more frequently.  By virtue of their greater wealth, informed traders have more power to influence the market than uninformed traders.  This is a necessary condition to mitigate against manipulative behaviour.

In a public, real money market, trader wealth may have nothing at all to do with knowledge about that, or any other, outcome.  Manipulative traders can simply bring wealth to the market.  Furthermore, if such wealth is known to other traders, it may send a false signal to all traders about the manipulator’s “expert” status.  That is, rather than being viewed as a manipulator, the trader may be seen as an expert.  This is especially likely in markets where it is difficult (or impossible) for any individual to have enough knowledge to make an “informed” trade.  Even if you place restrictions on wealth that may be traded, so as to prevent a small group of traders from manipulating the market, if the stakes are high enough, the big money manipulator will simply finance a large number of other traders to carry out the manipulation.  I think many big money players would find the incentives large enough in the global warming debate.

What if most (or all) of the traders are uninformed? 

I would argue that as long as the collective information set is sufficiently complete, the market could obtain a reasonably accurate prediction.  If this is not the case, we will likely see a very flat distribution of predictions, reflecting the high degree of uncertainty.  Such a result would be practically useless for policy decision-making, other than to indicate that we need much more information about the subject.  Unfortunately, for an extremely complex issue, like climate change, it is highly unlikely that the market participants will have a “complete” set of information.  It is doubtful whether any of the participants would be able to properly weigh and assess all of the information, in order to make a truly accurate prediction of any climate change metric.  There are simply no known frameworks for making such assessments, which leads us to…

Another possibility is that if traders have very little personal information about the subject, they will instinctively look to the others (the market) for guidance.  The prediction market principle of independence begins to break down.  If the market price has been manipulated, there is a good chance that the non-manipulative traders (notice I didn’t say “informed”) may “read” information in the price that isn’t true and place their trades accordingly. 

Public vs. Enterprise Prediction Market Manipulation

One of the reasons I haven’t looked into the issue of market manipulation is that it isn’t much of a problem in enterprise prediction markets.  Generally, we expect EPMs to have a sufficient number of informed traders, who tend to be “wealthier” than manipulators.  There are some noise traders, but not too many.  I agree with Robin Hanson’s assessment that manipulation will be overcome in enterprise markets.  Consequently, I’ve had little interest in looking at this issue.

However, prediction markets on public policy issues are different.  Apart from the market participants, there are many groups that have vested interests in the implications that might flow from a public policy prediction market outcome, and they will seek to influence the market prediction, by trading or by other means.  For example, big business may try to influence the information available to all traders to achieve the desired prediction.  This may take the form of advertising, public announcements, privately funded research, and all forms of lobbying activity.  Governments issue their own propaganda.  This information may be corroborated with price changes in the prediction market, lending credibility to inaccurate information.  Unless these prediction markets can be insulated from the manipulative influence of non-trading interest groups, they will not be able to prevent or eliminate manipulation of the market predictions.

How Manipulation is Nullified (or not)

Robin Hanson states that the informed traders must know that the noise traders want to manipulate the market.  In order to profit from this knowledge, they also need to know which way they wish to manipulate the market price. 

In a global warming market, big business carbon emitters would likely exert downward pressure on any metric that shows adverse effects from their activities, so that legislators would be less likely to impose costly laws to prevent such activities or to compensate others for the effects.  On the other hand, “tree-hugging” organizations may wish to increase the market price, so that such legislation is more likely to be enacted.  In both cases, the truly informed trader must know who the trader is and the trader’s motive for trading.  Since there is no way to prevent a trader from disguising his identity, it is impossible to properly match the motive with the trader.  It also raises the following question.

How might the informed trader distinguish between a manipulative trader and a misinformed honest trader?  I don’t have that answer, but unless it can be answered,  it may be impossible to ensure that attempts at manipulation will lead to more accurate predictions, at least in complex, public policy prediction markets. 

Conclusion

In theory, it is a nice idea to try and accurately aggregate as much information as possible in order to determine the best course of action in public policy decisions.  Most public policy decisions are remarkably complex with numerous tradeoffs among competing interests.  All decision-making benefits from more information that is more accurate and more timely.  Unfortunately, simply inserting a prediction market framework into the decision-making process does not eliminate the political biases that have been, and will always be, there. 

While it may be possible to operate public policy prediction markets for some issues, their use in the climate change or global warming debate is questionable.  Not only can there be no guarantee of manipulation-free markets, we wouldn’t even know if market predictions had been manipulated.  If actual public policy were to depend on false readings from such markets, the potential for significant misallocation of resources is immense.  It is simply too great a risk to consider at this time, in my opinion.

Posted by: Paul Hewitt | November 24, 2009

Idea Pageants = Prediction Markets?

Recently, McKinsey released an interactive summary of their Global Survey of Enterprise 2.0 applications.  About 2,000 companies took part in the survey in each of the last three years.  One of the categories surveyed is Prediction Markets.  Apparently, in 2007 the level of adoption among the responders was less than 1%.  In 2008, the adoption rate jumped to 9% and slipped slightly to 8% in 2009.  A pretty remarkable achievement, don’t you think?

Let me ask you, based on published reports over the last two or three years, does it seem that prediction market adoption has jumped by this much?  Certainly, I don’t see it.  Maybe there are other reasons for the results. 

  1. The survey sample may not have been representative of the population in any of the years. 
  2. The definition of “adoption” may include very limited trials and pilot projects involving prediction markets. 
  3. The definition of “prediction markets” may include some collective intelligence applications that aren’t really true prediction markets.

I happen to think that the third reason is the most likely culprit.  Most, if not all, of the prediction market software vendors include an idea pageant type of “prediction market” in their offerings.  I’m willing to bet that McKinsey includes these types of markets in the definition of prediction markets.  I’d also be willing to bet that these are the types of markets that are growing in adoption over the last couple of years.

Idea Pageant Growth

Relative to true prediction markets (I’ll get into the distinction below), idea pageant markets are a pretty easy sell to senior management.  There is very little downside, if any, to trying them out.  There are no political “feathers” to ruffle in the process.  There is tremendous upside potential in identifying previously undisclosed new ideas.  Senior management doesn’t have to rely on the crowd to filter out the weaker ideas (but it should).  Essentially, it is a high tech electronic suggestion box with a built-in feasibility filter.  What’s not to like?

But is it a Prediction Market?

Typically, idea pageants are set up to solicit new ideas from the participants (usually employees).  The same individuals place investments on the ideas they think are most likely to be adopted or receive funding.  This investment aspect is carried out in a market that resembles a prediction market.  It also looks a lot like a horse race market (without actually running the race).

In a true prediction market, the participants make investments in shares that represent potential outcomes.  In effect, the shares are derivatives of the actual future outcomes.  When the outcome is revealed, the share that represents the actual outcome is paid off and all other shares are worthless on expiry.  Note that the outcome is determined or occurs independently of the results of the prediction market.  The market attempts to predict the outcome, it does not determine the outcome.  The “horse race” must be run to determine the outcome.
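
As a minimal sketch of that settlement mechanism (the contract names, holdings, and outcome are hypothetical):

```python
# Winner-take-all settlement: each share of the contract matching the
# actual outcome pays $1; all other shares expire worthless.
payout_per_share = 1.00
holdings = {"constitutional": 120, "unconstitutional": 40}  # shares held
actual_outcome = "constitutional"  # revealed independently of the market

for contract, shares in holdings.items():
    value = shares * payout_per_share if contract == actual_outcome else 0.0
    print(f"{contract}: {shares} shares -> ${value:.2f}")
```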

In an idea pageant (think of a beauty pageant for ideas), the mechanism is similar to that of a prediction market.  Participants place investments on binary share contracts.  If a company is trying to find the best idea to pursue, the idea pageant becomes a poll of the opinions of the participants.  It is a weighted poll, because those who are more adept at guessing the ideas with the best potential will have more wealth to invest (vote).  The market determines the idea with the best chance of success, as determined by the weighted “votes” of the participants.  A true prediction market predicts a future outcome, which is determined independently of the operation of the prediction market.  In an idea market, the future outcome is the “prediction” of the market.

Alternatively, some idea pageants are set up to “predict” the idea that will be pursued (or receive venture capital).  This means that someone (or a panel) will make a decision about which potential outcome will be “true”.  In this case, the market is really being asked to predict the idea that the judge will select, not the idea that is “best”.  There is a big difference. We are constantly finding examples of prediction market failures in these types of markets.  Olympic site selection, Nobel Prize in Economics, etc…

Conclusion

Oddly enough, I think there is a place for idea pageants in the corporate world.  I just don’t think there’s a place for them in the definition of a prediction market.  If we were to remove all of the idea pageants in the McKinsey survey, I’m willing to bet that the true prediction market adoption rate is still around 1%.

Posted by: Paul Hewitt | November 6, 2009

The Future of Prediction Markets – Part II

As a followup to my previous post, this one covers Public prediction markets.  Up front, I have to admit that my interest in public prediction markets is minimal, mainly because I see very little potential for these types of markets to improve decision-making (public or private).  If they are unable to do this, what good are they?  I started writing this post in May, just after I completed my post on the future of enterprise prediction markets.  Instead of completing this post, I published posts on noteworthy failures of public prediction markets and about market calibration.

Recently, Chris Masse, on his Midas Oracle site, documented the very public failures of prediction markets to forecast the IOC’s eventual decision to hold the 2016 Olympics in Rio and to forecast the winner of the Nobel prize in Economics (or any of the other Nobel prizes, for that matter).  I made several comments on Midas Oracle about these failed markets, and the process has renewed my interest (ever so slightly) in public prediction markets.  Here is the result.

Is there a future for Public Prediction Markets?

Bet on it.  In fact, you may have to.  Exchanges, such as Betfair and InTrade, may be the only sustainable, profitable applications of prediction markets that are available to the public.  Let’s face it, people love to bet on uncertain outcomes.  Even when the odds are against them, people will try to beat the house.  In casinos, the odds are always against the bettor, yet there is no shortage of gamblers and the casinos become glitzier each year.  It’s no mystery where the money is coming from.

Internet-based prediction markets offer the public the convenience of betting at home.  They have the potential to greatly expand the variety and types of things on which wagers may be placed, from political races to trivial events, such as who might win the latest “star search” or who is the best dancer.  By adding to the variety of betting options, these exchanges expand the potential market for bettors.

Take away the real money component, and these prediction markets become nothing more than trivial pursuits.  Hubdub is a good example of a play money marketplace.  While it appears to be well-run, its use for anything other than “entertainment” is questionable.  Eventually, public prediction markets like these will fade away as newer fads invade the consciousness of the play money, esteem-seeking, public bettors.

There is some potential for real (serious) money prediction markets that might provide investors with a hedging mechanism against future events for which there may not be any form of insurance.  For example, a company could hedge against the risk of a particular piece of legislation becoming law (and having adverse effects on the company).

While there is a glimmer of hope that the U.S. anti-gambling laws may be relaxed in the future to allow real money prediction markets, the amounts that may be wagered are likely to be too small to attract any investors who wish to hedge against an uncertain event.  The betting limits will, however, provide a sufficient opening to allow betting exchanges to reach a vast new market in predictions.

Is there any real value in Public Prediction Markets?

Since public prediction markets operate in the same manner as enterprise markets, we can learn more about how these markets work and what makes them work well, by analysing the much more prevalent public prediction markets.  We can learn which types of markets tend to work well and which do not.  This may be useful in identifying appropriate uses for Enterprise Prediction Markets.  We could test public prediction markets to determine their consistency (or lack thereof).  We could make incremental changes to the markets to assess the effects on accuracy, consistency and the potential length of forecasting ability.

We could learn much about the role of information completeness by monitoring the information sets of market participants and comparing markets with similar participants but having differing information sets.  This may lead to insights about using prediction markets to replace some of the costly components of enterprise forecasting processes.  For example, if a public prediction market is able to more accurately (and consistently) forecast key components of an enterprise’s annual budget than the internal corporate methods, it may be possible to improve the efficiency of the planning process.  There may be additional benefits from engaging the enterprise’s customer base in the decision-making process, too. 

Apart from the knowledge gained from operating public prediction markets, one is hard pressed to find any significant benefit of these markets.  Do they help allocate resources to their best uses?  This may be a possible benefit, if the results of certain prediction markets are used to help shape public policy.  But, prediction markets are unproven in their abilities to consistently and accurately forecast or predict future outcomes and events.  Until they overcome these substantial limitations, their use for anything other than trivial pursuits will be rare.

Posted by: Paul Hewitt | October 20, 2009

More Public Prediction Market Failures

Recently, there have been several very glaring public prediction market failures, including the IOC site selection and the Economics Nobel Prize markets.  Some followers of prediction markets are a bit shocked and concerned, but most, like Chris Masse (Midas Oracle), me, and others are not.  These particular types of prediction markets never had a chance to be accurate.  Had any of these markets actually managed to “pick” the right outcome, it would have been nothing more than a fluke.  Why we continue to waste our time on these types of markets, I’ll never understand.

Jed Christiansen (Mercury’s Blog) is an occasional commenter on Midas Oracle.  I may not always agree with him, but I respect his positions in a number of areas.  However, in response to these very public failures of prediction markets, Jed provided a number of factors that influence the accuracy of prediction markets.  It appears that his comments apply only to outcomes that are determined by a group.  Essentially, he means outcomes that are determined through some form of voting or polling, including elections, IOC site selection, Academy Awards, Nobel Prizes, etc…  While I applaud his efforts to identify the factors affecting prediction market accuracy, I find some of his comments confusing.

For example, Jed mentions that “more members/voters will be better than fewer” (in terms of improving the accuracy of prediction markets).  In these types of markets, the members/voters are determining the actual outcome.  This is entirely independent of a prediction market attempting to predict that same outcome.  Consequently, having more members involved in determining the actual outcome will have no effect, whatsoever, on the accuracy of any related prediction market.  Jed’s comment makes no sense.

Jed is absolutely correct to say that “more objective criteria will be better than less.”  However, all this means is that the more objective the determinants of the outcome, the more likely the market participants will be able to figure them out and predict the outcome.  The fewer the factors and the less uncertainty surrounding their roles in determining the outcome, the easier it will be to predict the actual outcome.  In the extreme, if a condition arises that determines (or causes) the future outcome with a high degree of certainty, the market will be able to predict with uncanny precision.  However,  if the outcome is this easily predicted, perhaps a simple decision model (If… Then…) would have provided the same “prediction”, without the bother of setting up a prediction market.
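
For illustration, such a decision model could be as simple as the following sketch; the condition, threshold, and outcome labels are all hypothetical.

```python
def predict(front_runner_polling_lead: float) -> str:
    """A toy "If ... Then ..." decision model: when one observable
    condition determines the outcome with near certainty, a single
    rule reproduces the market's "prediction"."""
    if front_runner_polling_lead > 20.0:
        return "front-runner wins"
    return "too uncertain to call"

print(predict(25.0))  # -> front-runner wins
print(predict(3.0))   # -> too uncertain to call
```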

Generally, I would agree with Jed that “constrained choices will be better than unconstrained choices.”  In keeping with this statement, the fewer the choices, the more likely it is that the outcome will be predictable (only because there are fewer incorrect options)!  However, the IOC markets showed that, even with only four choices, the markets failed.  The real problem is that these markets did not have the necessary information to choose among even a very small number of alternatives.

Again, I agree with Jed that “voters signalling choices before a vote is better than if they don’t.”  Where the outcome is determined by a vote, any prior information about how some or all of the group intends to vote will be important information to be assessed by the market participants.  This merely supports the information completeness principle.  We see many examples of this type of information being accessed by participants (in the Iowa political prediction markets) where political polling influences the market prices.

Finally, Jed made a curious statement about “secretive and less secretive” committees that make decisions and that “neither will likely be as accurate as traditional open prediction markets.”  I have no idea what he means, here!  The committees (secretive or not) are the ones determining (creating) the actual outcome.  The committee has nothing to do with being “accurate” or predicting the outcome.  Traditional markets are expected to predict actual outcomes.  Jed is simply wrong to try and compare these two concepts!

Panos Ipeirotis asked if there is a more principled method of capturing the determinants of prediction market accuracy.  In response, I would suggest that we look to the first principles of prediction markets.  Perhaps the most important of which is that the market possess a sufficient degree of information completeness.  In the examples noted, the prediction market participants did not have an adequate level of information completeness to be able to arrive at accurate predictions, because the method of determining the outcome was far too complex and subjective, even when the choices were limited to four.

The only way to provide the necessary information to the prediction market, so that it could accurately predict the outcome, would have been to make all (or many) of the outcome voters (committee members) participants in the prediction market.  Of course, this would be a needless redundancy.  Note that in most of the enterprise prediction markets, many of the participants also take part in the internal forecasting process, effectively including the body of corporate information in the prediction markets.  If internal forecasting processes were to be replaced by prediction markets, it is highly doubtful that the markets would be able to provide accurate predictions.  The required information to make those accurate predictions would be missing.

These types of markets suffer from a fatal flaw, as well.  They are trying to predict a discrete (non-continuous variable) outcome.  “Coming close” means being completely wrong.  These types of markets are only suitable for betting purposes, and even then, only if they are proven to be “well-calibrated”.  It is questionable whether these particular markets were well-calibrated.

I have written fairly extensively on the determinants of prediction market usefulness.  I am especially concerned with their accuracy and consistency, for without these, their use in decision-making is not warranted.  I draw your attention to the following posts:

The Forgotten Principle Behind Prediction Markets

Calibration = Prediction Market Accuracy?

To answer Panos, we do have a general, principled model for assessing prediction market accuracy.  Now, we need to fill in the details.

Posted by: Paul Hewitt | September 30, 2009

Corporate Prediction Market Success is Elusive

A new study of prediction markets in the corporate world was released recently.  It’s called Forecasting Consumer Products Using Prediction Markets, by Kai Trepte and Rajaram Narayanaswamy.  Lo and behold, the prediction markets failed to provide any significant improvement in accuracy over that of the traditional corporate forecasting process.  The authors submitted their paper as part of their master’s program requirements.  They don’t appear to have been beholden to any software vendor, though they did use the services of Consensus Point.  Today’s entry will focus on the accuracy and usefulness of the prediction markets that were part of the study.  A subsequent entry will cover other aspects of prediction markets that were discussed by the authors.

The good news is that the authors planned the operation of the markets well, and they used more participants than most studies we have seen.  There appears to have been a conscious effort to maximize the diversity of the participants, but, like most of these studies, many of the prediction market participants also had involvement in the corporate forecasting process.  Consequently, we could pretty much expect that the predictions would be fairly well correlated with the corporate forecasts, and they were.  So, how did they compare? 

The prediction markets weren’t failures, but they weren’t able to do any better than the established corporate forecasting process at General Mills, where 20 prediction markets were put in play.  Despite the efforts of many academics, researchers, vendors and corporations,  the breakthrough success story about enterprise prediction markets remains as elusive as ever. 

FINDINGS & COMMENTARY

Correlation of Predictions and Forecasts

The Mean Absolute Percentage Error (MAPE) of the prediction market and the operations forecast (internal process) were highly correlated.  As mentioned above, this is not surprising, given that many of those involved with the internal forecasting process were also involved with the prediction markets.  Furthermore, the initial probability distributions for the potential outcomes were based on normal distributions around the internally forecasted means.  That is, the starting point for the prediction market was the corporate forecast.  There were good reasons for doing this, but still, it may have introduced some bias toward the internal forecast.

The authors of the study found that the prediction market forecasts were virtually identical to those of the internal operations forecasting process, as evidenced by their means falling within one standard deviation of each other.  Consequently, we could say that both processes/methods were good aggregators of available information, and any information that was generated internally was also available to the market participants. 

Some Predictions are Better Than Others

The authors included three types of markets:  Volume, Product Category and Promotional markets.  The Volume markets were characterized by products that might be considered staples, with fairly stable consumption patterns.  Internal forecasts and market predictions were both able to accurately gauge the future outcome.  Product Category markets were a bit more difficult to predict or forecast, due to the nature of the products and strategies used.  Finally, the Promotional markets, which were characterized by products that had very significant promotions planned, were the most difficult to forecast.  Not even the corporate marketing people were very good at forecasting the effectiveness of the promotional activities.  Again, both the internal forecasts and the market predictions were even less accurate, but still they were basically the same.

It appears that, if it is difficult to analyse data to come up with an accurate forecast, as was the case with the promotional markets, the use of a prediction market will not magically generate the information necessary to make a better prediction.  We have seen this in other studies and examples, where there is a significant amount of uncertainty about the outcome.  This is the information completeness principle that I’ve discussed previously.

Very Short Term Markets

I should note that the prediction markets were in operation for no longer than 10 weeks.  The authors described some of their prediction markets as being “long term”, but in reality, they were anything but.  In our quest for a useful enterprise prediction market, it must be able to generate consistently accurate predictions, sufficiently in advance, so that decision-makers are able to change their tactics, based on the predictions.  In the study’s “longer term” markets, none were able to generate accurate predictions until very near the time when the actual outcome would have been set.  In these cases, management would not have had time to change their tactics or decisions, once the market prediction had become known.  Therefore, even if the prediction had been perfectly accurate, it is completely useless for any decision-making purposes.

Costs vs. Benefits

The authors did not discuss the issue of costs and benefits of prediction markets, but perhaps we should.  Given that both the traditional forecasting process and the prediction markets provided equivalent forecasts, should General Mills’ management scrap their costly forecasting process and adopt these neat new tools?  We can’t know for sure, right now, but if they were to discontinue the internal forecasting process, most of the useful information that needs to be aggregated in the prediction markets would not have been available to the participants.  Accordingly, we would expect the predictions to become very inaccurate. 

It would appear that the accuracy of the prediction markets depends upon the information created by the forecasting process.  If you can’t have prediction markets without the internal forecasting, why would General Mills add prediction markets to the process?  One reason might be to verify the accuracy of the internal forecast, but I’ll bet they already know that, historically, their forecasts are reasonably accurate for their decision-making purposes.  They might consider eliminating the internal aggregation function, while continuing to generate forecasting information.  Prediction markets would be relied upon to perform the aggregation of the information more efficiently.  Finally, prediction markets generate distributions of possible outcomes along with the mean prediction or forecast.  This information can be used to assess the risk and uncertainty surrounding the forecast, enabling management to make better contingency plans.

Filtering Bias

One of the benefits of prediction markets is their ability to filter out bias during the aggregation process.  Consequently, I (and the authors) expected the prediction markets to provide significantly more accurate forecasts than those generated from the internal forecasting process.  The fact that they were not more accurate means, to me, that General Mills’ internal forecasting process performs its function in a reasonably unbiased fashion.  We should be studying why they have been able to minimize bias in their planning!   Another possibility, which I find too scary to contemplate, is that prediction markets aren’t as good at filtering out the bias as we have been led to believe!

Calculating Accuracy

The authors don’t discuss the method of calculating the forecast or prediction error, other than to note that General Mills uses the MAPE (see above) to calculate their own internal forecast errors.  I have a couple of issues with this approach (which was also used in the HP study).  Using the absolute value of the error provides only the magnitude and no information about whether the prediction was an over- or under-estimation.  Accordingly, the actual error could be as much as twice the amount of the absolute error quoted.  Also, the authors (and others) use the actual outcome as the denominator in the calculation of the average.  This is incorrect, because it is the forecast (or prediction) value that is being evaluated, rather than the actual outcome.  Management relies upon the prediction in order to make decisions; it doesn’t rely on the actual outcome, which isn’t known at the time the decisions are made.  Accordingly, the prediction value should be used in the denominator, not the actual outcome.
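To make the distinction concrete, here is a minimal sketch of the two denominator conventions, along with the signed error that taking absolute values discards.  All of the numbers are made up for illustration.

```python
# Two denominator conventions for percentage error, plus the signed
# error that taking absolute values discards. All numbers are made up.
forecasts = [105.0, 98.0, 110.0, 92.0]
actuals   = [100.0, 100.0, 100.0, 100.0]
n = len(actuals)

# Conventional MAPE: absolute error over the ACTUAL outcome.
mape_actual = sum(abs(f - a) / a for f, a in zip(forecasts, actuals)) / n

# The variant argued for above: error over the FORECAST, since the
# forecast is the number management actually relied on when deciding.
mape_forecast = sum(abs(f - a) / f for f, a in zip(forecasts, actuals)) / n

# Signed mean percentage error: reveals over- vs under-estimation,
# which the absolute versions hide.
mpe_signed = sum((f - a) / f for f, a in zip(forecasts, actuals)) / n

print(mape_actual, mape_forecast, mpe_signed)
```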

My next blog entry will cover the authors’ comments about the operation of these prediction markets and how well they appear to aggregate available information.

Adam Smith (1776) detected the “invisible hand” that seemed to be able to efficiently allocate resources in free markets, without the intervention of a central “planner”.  Against the backdrop of the USSR and its planned economy, the neoclassical model provided a substantial improvement in explaining the allocation of scarce resources among competing uses.  And so until very recently, most developed-country governments (and their constituents) embraced this model of economic theory as if it was, in fact, the way economies worked.  Many policies were designed to “free up” markets by eliminating constraints to economic activity.  The objective was always to allow Adam Smith’s “invisible hand” to do its magic for the benefit of all.

Most people who try to understand “economics” do so within the traditional, neoclassical framework that has dominated economic thinking for most of the last century.  This is the theory taught in introductory and intermediate-level economics courses.  Even those who have yet to study economics have come to embrace this model of economic thought.  They have been swayed by political and cultural institutions, the media and the like to believe that the “machine” just needs to be oiled, gassed up, and the speed limits removed for economic prosperity to come to all.

Until very recently, if you needed any proof of this, all you had to do was review media reports and political commentary.  You would get the impression that Adam Smith was the greatest, smartest economist of all time, and that all we needed to do was follow his principles more closely and the economy would flourish.  They were advocating an almost religious belief in the “invisible hand”.  Scary.  Now, with all of the economic problems that have come to light, we are beginning to see a change.  More people are coming to see that the “economy” doesn’t work like the model, and no amount of tinkering will bring back the ideal model world.  Why?

Essentially, the neoclassical framework is a model of the economy that is simple enough to be understood, yet sufficiently robust to be able to describe and explain basic economic phenomena.  The real world is highly complex and likely impossible to model, without making generalizing assumptions about markets and their participants.  Among others, the neoclassical model assumes perfectly competitive markets, firms that always attempt to maximize profits, homogeneous households that always attempt to maximize their utility, and perfect information.  In addition, there are no externalities and all markets always clear.  The introduction of a “shock” to a market results in an immediate jump to a new equilibrium.

Introducing Information Economics

Born out of disillusionment with the ability of neoclassical models to explain real world economic phenomena, a new paradigm emerged: the role of information imperfections in understanding economic conditions.  Joseph E. Stiglitz, Nobel laureate in 2001, identified information imperfections as one of the main reasons why the neoclassical model failed to explain many real economic conditions.  Where neoclassical models were characterized by a single market clearing price at equilibrium (quantity supplied equals quantity demanded), Stiglitz proved that with imperfect information, not only would markets exhibit a distribution of prices, but an equilibrium may not even exist and markets may not clear.  In extreme cases, caused by information problems, markets may be thin or fail to function at all.  I could go on about the achievements in information economics, but for now, these findings have particular relevance to our study of prediction markets.

Implications for Prediction Markets

Much of the theory behind prediction markets rests on the standard neoclassical model of markets.  Buyers and sellers interact, resulting in a market clearing price, which incorporates all of the available information about the market.  But, from information economics, we find that even small information imperfections can have profound effects on market functioning.  If the neoclassical model often fails to explain real world economic conditions, how can we expect prediction markets, based on the same theory, to explain, or describe, the underlying reality of their markets?

In the real world, information imperfections lead to market price distributions, rather than a single market clearing price.  Similarly, if prediction markets function like other product or asset markets, inevitably, the result will be a distribution of prices.

If prediction markets do not reach an equilibrium, and the market is characterized by a distribution of prices, can we still rely on the market “price” to convey all of the information available to the market?  If so, which price should we use?  How will we know when the market has incorporated enough information for the current price (or distribution) to be “accurate”?

I’m afraid I don’t have the answers to these questions.  Perhaps a group of information economists will join the discussion.

Posted by: Paul Hewitt | June 11, 2009

Why Public Prediction Markets Fail

Public prediction markets have the potential to be more accurate than enterprise markets, because they may be able to attract a larger “crowd” of traders, who may be able to aggregate a more complete information set, and the markets may be more efficient.  However, they often fail to predict actual outcomes, and their use in decision-making is dubious, at best.  They have some value as betting venues, if well-calibrated, and some nominal entertainment value (if you enjoy such trivial pursuits).  In contrast, the focus of enterprise prediction markets is on their value for decision-making purposes.

While my focus is on enterprise prediction markets, when public prediction markets fail to work properly, we need to understand why.  My attention has been drawn to a couple of recent cases (among the many) where public prediction markets failed (miserably) to predict the future outcomes.  They concerned the outcomes of American Idol (Betfair) and Britain’s Got Talent (Hubdub, Intrade).  Admittedly, these are very frivolous markets, but if prediction markets do work, shouldn’t they have been better at predicting the outcomes in these cases?  If public markets fail, why would we expect enterprise ones to work?  Both are based on the same theory.

Background – The Failed Markets

Betfair predicted Adam Lambert as the winner of season 8 on the American Idol show.  On May 19, he garnered 76% of the bettors’ money.  Kris Allen, the eventual winner, had 24%.  A few days later, Chris F. Masse blogged about the failure of prediction markets to select the eventual winner of Britain’s Got Talent, where the overwhelming favourite, Susan Boyle, failed to win.  On Hubdub, she closed with 78% of the trade money (none of the other nine competitors had more than 6%), while Intrade sent her off at about 49%.  Both of these prediction markets failed to accurately select the correct outcome.

Prediction Market Theory

Before trying to answer the question as to why these markets failed, we need to review the theory that supports markets having the ability to predict outcomes.  I’ll make this very brief, as I have covered this in my other posts.  I have also put together a companion post “A Lesson in Prediction Markets from the Game of Craps”.

The Efficient Markets Hypothesis holds that market prices accurately reflect all available information.  Since the prediction market shares (or “states”) have binary payoffs ($1 if right, $0 if wrong), the market price should represent the likelihood of that state coming true when the outcome is revealed.  If the market is not efficient, the market prices will not represent an accurate reflection of the information available to the market.  Therefore, market efficiency is an essential condition for prediction markets to “do their thing.”

Let’s proceed under the assumption that prediction markets are efficient.  In a typical winner-take-all market, there are several shares (states) that may be traded.  Each share has a binary payoff.  Therefore, each share price represents the likelihood of that state capturing the true outcome.  Putting all of the states together provides the entire probability distribution of the market predictions, and with perfect, accurate, complete information, this distribution would be an exact match with that of the actual outcomes.  That is, it would be perfectly calibrated.  The dispersion of the predictions reflects the underlying uncertainty of the outcome.  This uncertainty is caused by future random events that might affect the outcome.
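As a rough sketch of how a winner-take-all market’s prices can be read as a probability distribution, consider the following.  The prices are hypothetical, and real prices rarely sum to exactly $1, hence the normalization.

```python
# Reading a winner-take-all market as a probability distribution.
# Prices are hypothetical; each share pays $1 if its state captures
# the outcome and $0 otherwise, so price ~ implied likelihood.
prices = {"state_A": 0.42, "state_B": 0.31, "state_C": 0.18, "state_D": 0.07}

# Real prices rarely sum to exactly $1, so normalize before reading
# them as the market's probability distribution over outcomes.
total = sum(prices.values())
implied = {state: p / total for state, p in prices.items()}

for state, prob in sorted(implied.items(), key=lambda kv: -kv[1]):
    print(f"{state}: {prob:.1%}")
```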

What if the information available to the market participants is incomplete?  By definition, there will be some piece of information that the market participants are unable to consider in making their investment/betting decisions.  Since the market prices are only able to incorporate known information, the market prices will be inaccurate.  Consequently, the market distribution will not match that of the actual outcomes.  As a result, the market must have sufficient information completeness.

If the essential conditions are satisfied, the market distribution will be well-calibrated.  This is the best case scenario for any prediction market.  But can it predict?

Professor Panos Ipeirotis provided an excellent explanation as to why prediction markets must fail to predict actual outcomes some of the time.  He points out that “such failed predictions are absolutely necessary if we want to take the concept of prediction markets seriously. If the frontrunner in a prediction market was always the winner, then the markets would have been a seriously flawed mechanism.” He is entirely correct. A prediction market provides a distribution of predictions that is a proxy for the distribution of actual outcomes.

What he means is that, if the frontrunner in a prediction market always wins, rational traders would always buy the frontrunner shares prior to the market closing, bidding up the price to $1, or just below that.  Any price significantly below $1 would indicate an inefficient market.  Let’s look at the first failed prediction.

American Idol (Betfair)

If Betfair’s market was efficient, Adam Lambert’s winning could not have been a sure thing.  There must have been uncertainty, and so, it is more than reasonable to say there was a 24% chance that he would lose.  If Betfair’s prediction market participants held accurate, complete information (collectively) and the market was efficient, we could say there was an unknowable uncertainty that prevented the market from pushing the Lambert price to $1.

From a decision-making viewpoint, we would be compelled to predict Adam Lambert as the winner (76% likely).  When we rely on a prediction of a discrete outcome, we need it to be correct almost all of the time, which means that the probability of the prediction must approach 100%.  By selecting Lambert, we will be either 100% right or 100% wrong when the contest is over.  There is no “almost right” with discrete outcomes.

We’re presented with a problem.  In order for prediction markets to generate accurate predictions, they must be efficient.  Such markets provide a market price that represents the probability of winning (in this case 76%).  If we needed to make a decision based on the outcome (yeah, right), we would like that probability to be closer to 100%.  However, if that were to occur, we’d hardly need a prediction market to point out the “sure bet”.  We could still make the decision, knowing that about one in every four seasons the “Adam Lambert” prediction will fail to win.  The problem is, we don’t know which season (or trial) this will happen.  This is the problem with discrete outcomes.

In some markets, future random events would introduce uncertainty as to the outcome.  As we all know, random events are unpredictable.  However, in this particular case, the prediction market closed shortly before the outcome was revealed.  The potential for random events to significantly affect the outcome would have been minimized.  Accordingly, I am left to conclude that the market participants did not possess sufficiently complete information about the outcome to make an accurate prediction or the market was not efficient.  In the latter case, the market would not have been good for a betting market, let alone a prediction market.

Public vs. Enterprise Prediction Market Design

Public prediction markets are often designed as winner-take-all markets, with the shares (or states) corresponding to discrete outcomes.  Think of horses in a race or contestants on Britain’s Got Talent.  Many enterprise prediction markets also utilize the winner-take-all format, with the shares corresponding to ranges of a continuous variable (outcome), such as quarterly sales.  This difference in market design is one of the main reasons why public prediction markets fail to predict outcomes accurately.

Prediction markets may provide accurate distributions of possible outcomes.  Even so, the most likely prediction may not be useful for predicting the next actual outcome.  Where the outcome being predicted is a continuous variable (e.g. quarterly sales), if the market fails, but comes close, it may still be useful, whereas a market of discrete outcomes will only be useful, if it is virtually 100% accurate.

Let’s turn to the public markets for Britain’s Got Talent…

Contrast of Public vs. Enterprise Prediction Market

The prediction market for Britain’s Got Talent had 10 “horses”, the frontrunner being Susan Boyle, with 78% of the trades on Hubdub (49% on Intrade).  On Hubdub, none of the other contestants had more than 6% of the trades and two had 0%.  It was a reasonably tight distribution, yet again, the market failed to predict the actual winner.  Here’s a graph of the distribution of trades on Hubdub.

[Figure: Hubdub Prediction Market]

Chris F. Masse (Midas Oracle) gleefully points out these and similar prediction market failures, as he questions the accuracy and usefulness of prediction markets. He is correct in his reasoning that, if someone wants to rely on a prediction market to forecast an outcome, he needs to have a high level of confidence that the prediction will come true.  With discrete outcomes, even the slightest miss is 100% wrong.  Even when the decision-maker selects the most likely option and it fails to be accurate, it is of little consolation to tell him that the distribution of predictions is accurate.

Contrast this with a similar, winner-take-all market to forecast quarterly sales.  Let’s take the identical distribution of trades from Britain’s Got Talent and match them to a series of states corresponding to quarterly sales ranges.  Then, sort them to create a somewhat normal distribution, as shown.  In this example, quarterly sales is a continuous variable.  Accordingly, the prediction market provides us with a best forecast of $15.220M.  Without having to do the calculations, we can see that 90% of the time, actual quarterly sales will fall between $14.5M and $15.999M.  Given the mean prediction of $15.220M, the maximum error (90% of the time) will be 5.1%, or $0.779M.  This is the kind of prediction market accuracy that “can be taken to the bank.” Two markets with identical distributions: One predicts very accurately (continuous), the other is a bust (discrete).

[Figure: Quarterly Sales Prediction Market]
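For illustration, here is a minimal sketch of the continuous-variable calculation.  The sales ranges and probabilities are invented, not taken from the chart, so the computed mean will differ from the $15.220M quoted above.

```python
# Hypothetical winner-take-all sales market: each state is a range of
# quarterly sales ($M) with an implied probability. These numbers are
# invented for illustration only.
bins = [(14.0, 14.5, 0.03), (14.5, 15.0, 0.22), (15.0, 15.5, 0.45),
        (15.5, 16.0, 0.23), (16.0, 16.5, 0.05), (16.5, 17.0, 0.02)]

# Probability-weighted midpoints give the market's mean sales forecast.
mean_forecast = sum((lo + hi) / 2 * p for lo, hi, p in bins)

# Probability mass inside a central interval, here $14.5M to $16.0M.
central_mass = sum(p for lo, hi, p in bins if lo >= 14.5 and hi <= 16.0)

print(f"mean forecast ${mean_forecast:.3f}M; central mass {central_mass:.0%}")
```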

Based on this, we can conclude that for most prediction markets involving discrete outcomes, the predictions will be questionable.

It appears that the uncertainty of the outcome depends on future events that may occur between the prediction time and the actual outcome.  When the outcome is revealed, there is no more uncertainty.  It makes sense, then, to say that uncertainty will be correlated with the time remaining until the outcome is revealed.  Therefore, as long as the future random events are reasonably unlikely to occur, or their effects will not be too significant, the prediction market may still provide a useful distribution of outcome predictions.

We see this in a wide variety of prediction markets: as the market gets closer to the actual outcome becoming known, there are fewer random (unknown) events left that might have a significant effect on the outcome.  Uncertainty shrinks as the market approaches the outcome revelation.

In the Britain’s Got Talent markets, we can see another problem with prediction markets – consistency.  Hubdub had Susan Boyle at 78% and Intrade had her at only 49%.  How could they have a 29-percentage-point difference in the frontrunner likelihood?  Which one is more correct (less wrong)?  Are either of them “accurate”?  These are questions for another paper.

Implications

Prof. Ipeirotis is correct to require prediction markets to be efficient (he’s not convinced they are).  I may be correct to require information accuracy and completeness (at least a sufficient amount) to be contained among the participants.  These are the essential pre-conditions for possibly using prediction markets to accurately predict future outcomes.  Finally, Chris F. Masse is correct to require the prediction markets to have a high degree of confidence in predicting the actual outcome.  This pretty much precludes discrete outcomes from the public prediction market arena (except for “entertainment” or gambling purposes).

Prediction markets should only be used where they are efficient, participants have (collectively) reasonably complete, accurate information and the degree of randomness that is unknown is within an acceptable level at the time the decision is made.  The prediction market must be able to accomplish this feat sufficiently far in advance that the decision-maker is able to formulate an appropriate response to the predicted outcome.  Finally, the prediction market forecast must be more accurate than the alternatives (subject to cost-benefit analysis).

Posted by: Paul Hewitt | June 11, 2009

A Lesson in Prediction Markets from the Game of Craps

This is a background paper on several of the important concepts in prediction markets that may be learned from the game of craps (and other dice games).  While the prediction markets discussed are ridiculous (given that the outcome is completely unpredictable), I believe the concepts are well demonstrated.  The concepts in this paper will have direct relevance to my next paper that concerns prediction market accuracy (actually, failure) in public prediction markets.

The betting game of craps is played by rolling a pair of dice.  Each roll is a random event, with known probabilities.  No one knows which number will come up on the next roll, but everyone knows the distribution of outcomes over a larger number of rolls.  Sound familiar?  It looks like this:

[Figure: Distribution of Dice Rolls]
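For the curious, this distribution can be reproduced in a few lines by enumerating the 36 equally likely rolls:

```python
from collections import Counter
from itertools import product

# Enumerate all 36 equally likely ordered rolls of a pair of fair dice.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

for total in sorted(counts):
    p = counts[total] / 36
    print(f"{total:2d}: {p:.3f}")   # '7' is the mode at 6/36, i.e. 1 in 6
```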

A Perfectly Calibrated, Accurate Prediction Market (that is Useless for Decision-making)

Now, let’s suspend logic for a minute and run a hypothetical prediction market to predict the outcome of one roll of the dice.  We would expect the distribution of bets (or trades) to form a distribution that is very well calibrated with the actual distribution of outcomes.  For now, let’s assume that it is perfectly calibrated.  At closing, the prediction market indicates that ‘7’ is the most likely outcome, which should occur once in every six rolls on average.  As a decision-maker, relying on the prediction market forecast, you would choose the market prediction as your best guess about the outcome.  When the dice are rolled, two sixes come up, making the value ‘12’.

Chris F. Masse (Midas Oracle) is unhappy (maybe not), because the prediction market failed to predict the actual outcome.  Jed Christiansen is somewhat happy, because the market is perfectly calibrated.  He might even be ecstatic if the prediction market involved as few as 12 participants. (I’m joking here, sorry Jed) Professor Ipeirotis says, “What did you expect?”

While the prediction market is perfectly efficient and perfectly accurate (calibrated), it is also perfectly useless for the purpose of predicting a future discrete outcome.  All relevant information is contained among the market participants (information completeness).  The market’s failure to accurately predict the next roll is caused by the randomness of the outcome.  The market is, however, perfectly useful for the purpose of betting (i.e. craps), as the odds are calibrated, perfectly, with the outcomes, making it a fair game.  As in the game of craps, we are dealing with random outcomes, which by definition are unpredictable.

Now, let’s drop one die and continue to explore the properties of these prediction markets…

Calibration Loss Caused by Information Incompleteness

[Figure: Distribution of Single Die Rolls]

Above is the distribution of all potential outcomes for rolls of a single die.  If we were to run another hypothetical prediction market on the outcome of a single die roll, the distribution of trades should perfectly match this distribution, assuming a fair die is used and all participants know this.  That is, the market has perfect, complete information about the die roll.  The resulting distribution is perfectly calibrated with the distribution of actual outcomes over a large number of trials.

Now, let’s add a twist.

All participants are told that the die being used is “loaded”, such that it will turn up one number more often than any of the others.  Everyone has accurate information, but it is incomplete:  no one knows which number is more likely to be rolled.  What will the prediction market distribution look like?  It should be identical to the first case, because the participants would be expected to evenly spread their trades across all possible outcomes.  In this case, we have accurate, but incomplete information, and the resulting market distribution will not be well-calibrated with the distribution of actual rolls.  The market can be said to be efficient, accurately reflecting the information held by the participants, but it is not an “accurate” prediction market, because it is not well-calibrated.

Calibration Restored, with Completeness Overcoming Information Inaccuracy

[Figure: Distribution of Loaded Die Rolls]

Let’s try another variation.  One of the traders is told that the die is loaded to turn up the number ‘4’ twice as often as any of the other numbers.  All other traders are kept in the dark (i.e. their information is inaccurate and incomplete).  Assuming the knowledgeable trader has sufficient wealth to move the market (i.e. the market is “efficient”), the prediction market distribution should be accurately calibrated with the actual outcomes.  In a perfectly efficient market, we would expect to see the distribution shown above.

Even though almost all participants had inaccurate information, the market does contain all of the information necessary to determine the true distribution.  If the prediction market did reveal this distribution, we could say that it operates as an efficient mechanism for revealing the true, complete information held within the group.
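A quick sketch of the two distributions involved: the uniform spread of trades under accurate-but-incomplete information, and the true loaded-die distribution that the informed trader should push the market toward.

```python
from fractions import Fraction

# Case 2 above: everyone knows the die is loaded, no one knows which face.
# With accurate but incomplete information, trades spread out uniformly.
uniform = {face: Fraction(1, 6) for face in range(1, 7)}

# Case 3: the informed trader knows '4' turns up twice as often as each
# other face, so the true weights are 1,1,1,2,1,1 over a total of 7.
weights = {face: (2 if face == 4 else 1) for face in range(1, 7)}
total = sum(weights.values())
loaded = {face: Fraction(w, total) for face, w in weights.items()}

print(uniform[4], loaded[4])   # 1/6 vs 2/7: the calibration gap on '4'
```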

Lessons Learned

  1. Markets must be efficient to accurately reflect the information available within the market.
  2. Not all market participants need to have accurate or complete information, so long as the market is efficient and the market, collectively, holds complete information.
  3. These conditions are necessary for a prediction market to provide a distribution that is well-calibrated with that of the actual outcomes.
  4. Even a perfectly calibrated distribution, based on perfect, complete information may not be useful for predicting an outcome.  This is particularly true when dealing with discrete outcomes.
  5. A prediction market is even less likely to be useful, when there is significant randomness inherent in the process of generating an actual outcome.

Posted by: Paul Hewitt | June 4, 2009

Isn’t it the truth

“Don’t worry about people stealing your ideas.  If your ideas are any good, you’ll have to ram them down people’s throats.”

Howard Aiken

Posted by: Paul Hewitt | May 26, 2009

The Forgotten Principle Behind Prediction Markets

Background

In his book, “The Wisdom of Crowds”, James Surowiecki gave an interesting and insightful account of the conditions for, and methodology of, predicting future outcomes.  To summarize, markets are able to predict outcomes where there is sufficient diversity, independence and decentralization of the market participants.  He explains that, to the extent these conditions hold true, the market will provide accurate predictions.  It works, because the law of large numbers ensures that uncorrelated errors “cancel out”, leaving behind “pure information”, as reflected in market prices.  Not only does this make intuitive sense, there is ample support for it in economic theory.

Economic support for the efficacy of prediction markets ultimately derives from Adam Smith’s “invisible hand”, Hayek’s “The Use of Knowledge in Society”, and Eugene Fama’s Efficient Market Hypothesis.  Taken as a whole, they support the position that market prices fully reflect all available information about the product or asset under consideration.  A prediction market uses this concept to make the same assertion about a future event, condition or action, to produce a “best estimate” of the uncertain outcome.

The Key to Market Efficiency

Kenneth J. Arrow and Gerard Debreu proved that free markets are able to optimally allocate resources, under certain circumstances.  One of the key assumptions behind their general equilibrium theory is that all market participants possess complete information.  Every trader in the market knows the price that each participant is willing to pay or receive for each good.  Surowiecki described Vernon L. Smith’s classroom laboratory experiments that were designed to test the economic efficiency of markets.  While these “markets” were highly simplified, they were able to show that markets can allocate resources efficiently, even when every participant does not have “complete” information.  However, Surowiecki neglected to mention why the experiments continued to work when the information completeness assumption was not met.

They worked, because the market participants, collectively, possessed “complete” information, even though none did, individually.  The market mechanism served to induce each participant to reveal his or her information in the marketplace, ultimately revealing the “complete” set of information through the supply and demand functions and the market clearing price.  While these markets were very simple, with a single product and a relatively small number of participants, they did reveal the power of markets to assemble the information necessary to perform complex, efficient resource allocations.  They operated as if all participants were privy to all of the information.  They worked because all of the information was available within the group.

Out of the Classroom… and Into the Real World

In the real world, with significantly more complex markets, products and human relationships, the ability of markets to perform similar feats of information revelation is heavily dependent on the collective information held by the market participants.  To the extent that the participants’ information (taken as a whole) is incomplete, this will be reflected in uncertainty, or price dispersion, in the market.  Where there are significant pieces of information unknown to any of the market participants, the markets are highly unlikely to provide accurate prices or tight dispersions.  In short, the market will not be able to create information that is not already known to the participants. Which leads us to…

The Forgotten Principle Behind Prediction Markets

Prediction markets must have sufficient information completeness to accurately predict outcomes with a reasonable degree of certainty.

Each of Surowiecki’s prediction market conditions (diversity, independence and decentralization) relates to this overarching principle and serves to improve the pool of information held by the market participants, but it is not enough to simply have them present; they must operate to amass a reasonable level of information “completeness”, too.  Of course, it is difficult to know, in advance, whether a pool of market participants holds enough information to be considered “complete”.  Hence, we tend to rely on Surowiecki’s three conditions, but we have forgotten that they are really a collective proxy for information completeness.

Very simple markets, with few variables influencing the future outcome, are likely to provide accurate predictions from small groups of traders, because most of the information necessary to make the prediction is known within the pool.  The more complex the factors affecting an outcome become, the greater the information required to be known by the pool of traders.  One can easily imagine exponential growth in the information requirement, as the outcome becomes subject to additional causal factors.

Most (if not all) researchers and academics seem to have lost sight of this information completeness principle.  We have seen a significant volume of work directed at solving the problem of market liquidity through the use of automated market maker mechanisms.  These solutions only became necessary, because it was difficult to gather together a sufficient number of participants and provide adequate incentives for them to keep trading and revealing their private information.  Some markets were simply too thinly traded to be “efficient”, without the use of an automated market maker.  Of course, it was cheaper to operate a market with fewer participants, too.
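For readers unfamiliar with these mechanisms, one widely used design is Hanson’s logarithmic market scoring rule (LMSR).  The sketch below is purely illustrative; the liquidity parameter b is chosen arbitrarily, and real implementations differ in their details.

```python
import math

# Hanson's logarithmic market scoring rule (LMSR), a common automated
# market maker. b is the liquidity parameter: larger b means prices
# move less per share traded (the value here is arbitrary).

def lmsr_cost(q, b=100.0):
    """Cost function C(q) = b * ln(sum_i exp(q_i / b)) over outstanding shares."""
    return b * math.log(sum(math.exp(x / b) for x in q))

def lmsr_prices(q, b=100.0):
    """Instantaneous state prices exp(q_i/b) / sum_j exp(q_j/b); they sum to 1."""
    exps = [math.exp(x / b) for x in q]
    s = sum(exps)
    return [e / s for e in exps]

q = [0.0, 0.0, 0.0]                  # three states, no shares sold yet
print(lmsr_prices(q))                # uniform: [0.333..., 0.333..., 0.333...]

before = lmsr_cost(q)
q[0] += 50.0                         # a trader buys 50 shares of state 0
print(lmsr_cost(q) - before)         # the trader pays the cost difference
print(lmsr_prices(q))                # state 0 now prices above 1/3
```

The point of the mechanism is that a trade can always be executed against the market maker, even with no counterparty present, which is exactly why it is used to paper over thin markets.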

While this solved the liquidity problem of some prediction markets, it created other problems.  With fewer traders, there was often less diversity, independence and decentralization.  All of these factors, combined with the fewer traders meant that information completeness was bound to suffer in all but the simplest of markets.  Also, with too few traders, the process of cancelling out the uncorrelated errors of the traders breaks down.  While such markets may appear to operate efficiently, this may not be so, and worse, we will not know whether there was sufficient information completeness within the market.  Consequently, it may not be appropriate to rely on the predictions of such markets.  Their predictions will be unreliable, inconsistent and subject to too much uncertainty.

My point, here, is that you can’t “fake” an efficient market and hope to achieve the level of accuracy and certainty that a truly efficient market might provide.  An automated market maker may be acceptable, but not when it is used in place of a sufficient number of diverse, independent, decentralized traders.  There is no way to replace or create the information that is not brought to the market by the traders themselves.  However, an automated market maker mechanism may be acceptable when, from time to time, there is an insufficient number of active traders.

Where to Now?

If a reasonable degree of information completeness is a necessary precondition for prediction market accuracy, how will we know if it has been satisfied?  As stated above, we don’t really know whether the condition is satisfied for most prediction markets.  We have to rely on optimizing the quantity of traders, while maximizing their diversity, independence and decentralization, under a cost constraint.  To a large extent, this requires trial and error in the field.  Market specifications for one market may not work as well with other markets.  It will be necessary to increase the number of real markets and learn what works, what doesn’t, and why.

Over time, it may be possible to identify trader pools that are particularly strong in predicting certain types of outcomes, because of their combined knowledge, diversity, etc…  We may also be able to identify certain types of outcomes that can be predicted with a reasonable degree of certainty (and others that are not so predictable – e.g. earthquakes).

With a greater number of real world prediction markets, we will learn more about the factors that enhance their calibration to actual outcomes.  Right now, there are too few examples to say anything about individual market calibration levels.  More trials will provide valuable insight into the factors that generate consistency in specific market predictions.  So far, the published trials have not shown any reasonable level of consistency.

Is there an effective method of pre-screening traders that will help ensure that the total pool of information is maximized for a particular market (or class of markets)?  This might be quite costly for an individual market, but if the costs can be spread over a class of markets, run multiple times, it may be worth the effort.

If a greater degree of information completeness helps reduce uncertainty, and it should, the resulting distribution of predictions will tell us whether the outcome is predictable with a reasonable degree of uncertainty.  If a particular class of markets is unable to reduce uncertainty to an acceptable level, we can stop using it for predictive purposes.  We may still be able to use it as a measure of uncertainty for risk management, however.

Posted by: Paul Hewitt | May 26, 2009

Calibration = Prediction Market Accuracy?

In response to a recent paper I wrote on prediction market accuracy (or lack thereof), I received counter arguments claiming that a prediction market may still be “accurate” even though the prediction fails to accurately predict the actual outcome.  The argument was that “accuracy” is found in the calibration of the market predictions to the actual outcomes.  Let’s look at this concept as it applies to two types of markets:  a pari-mutuel horse race and a “winner-take-all” prediction market (using sales as an example).

Pari-mutuel Calibration

Pari-mutuel horse race markets (betting pools) are very well-calibrated (lots of examples and lots of proof of this).  That is, the odds generated from bets placed do, in fact, reflect the actual distribution of outcomes averaged over a large number of trials (races).  We shouldn’t find this particularly remarkable, as long as the market (bettor pools) possesses a reasonable degree of information “completeness”.  This probably holds true, given the fairly large number of diverse track bettors for most races.  Consequently, horses with a 10% chance of winning, based on the bets placed, will win about 10% of the races.  Here, the results are averaged over many, many races, just as they are when a coin is tossed many times and “heads” comes up 50% of the time.  Having a high degree of calibration in these markets ensures that the odds are “fair” to the bettors.
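Calibration of this sort can be checked directly: bucket the market-implied probabilities and compare each bucket’s implied chance of winning with the observed win frequency.  A minimal sketch, with made-up bets:

```python
from collections import defaultdict

# Hypothetical (implied probability, won?) pairs, one per horse per race.
bets = [(0.10, False), (0.12, False), (0.09, True), (0.11, False),
        (0.31, True), (0.28, False), (0.33, False), (0.48, True),
        (0.52, False), (0.51, True)]

# Group bets into 10%-wide buckets and compare implied chance to outcomes.
buckets = defaultdict(list)
for prob, won in bets:
    buckets[round(prob, 1)].append(won)

for p, outcomes in sorted(buckets.items()):
    freq = sum(outcomes) / len(outcomes)
    print(f"implied ~{p:.0%}: won {freq:.0%} of {len(outcomes)} bets")
```

In a well-calibrated market, each bucket’s win frequency converges to its implied probability as the number of races grows.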

If we want to cash the most winning tickets, we would place bets on the favourite in every race.  The favourite has the highest likelihood of winning.  This doesn’t mean that the favourite will win, just that the odds of that horse winning are better than those for any of the other horses.  If there was no track “take” (i.e. no cost to play), it would be a zero-sum game.  You could bet on any (or all) horses and expect to come out “even” in the long-run.  Not much fun in that!

While pari-mutuel horse race markets are set up for the primary purpose of wagering, they do provide a frequency distribution of bets placed on each of the horses, which provides some predictive information about the future outcome (winner).  However, pari-mutuel horse races are different from “winner-take-all” prediction markets that attempt to predict the actual future value of a continuous variable (future sales for example).  In a horse race, the possible outcomes are discrete (horses).  In the horse race, unless the horse with the highest likelihood of winning does win, the market has failed to predict accurately, despite the fact that the pari-mutuel market is “well-calibrated.”  This is fine for a betting pool, but it is of little use in a corporate prediction market.

Enterprise Prediction Market Calibration

Many enterprise prediction markets are formulated as “winner-take-all” bets, which provide distributions of predictions about uncertain outcomes, somewhat similar to those of horse race markets.  Ideally, we would like these distributions to accurately reflect the distribution of actual outcomes that are being predicted.  Seems obvious enough, but how do we know when a prediction market is well-calibrated with the distribution of actual outcomes?  I have yet to see a study that has run similar enterprise prediction markets enough times to obtain an accurate distribution of actual outcomes that could be compared with prediction market distributions.  Aren’t we really assuming that prediction markets are well-calibrated?

In a prediction market, the decision-maker is hoping to derive an accurate prediction of the actual outcome.  Given that we are looking at an uncertain outcome, there will always be an error factor associated with the prediction.  If the most likely state (or share) does not capture the actual outcome, we hope that the next most likely state will.  That is, we want the most likely state to be as close as possible to the actual outcome.  Contrast this with a horse race.  There is no decision-maker other than the bettor.  The bettor selects a horse to win.  If the horse does not win, it doesn’t matter which of the other horses actually won, the bettor loses.  In an enterprise prediction market, the decision-maker does care which of the other “horses” (states) “wins.”

The difference is that an enterprise prediction market usually attempts to predict a continuous variable, such as quarterly sales, whereas a horse race market attempts to predict a discrete outcome.  In such a prediction market, we can derive the average sales forecast figure.  This is the figure representing the best estimate of the future outcome.  This is the figure that must be “accurate” for it to be useful in decision-making.  In a horse race market, such an average is meaningless (i.e. the 2.6th horse?), because the horse numbers (or names, or positions, etc…) are not related in any meaningful way.

The value of calibration is that it verifies the extent to which the distribution of predictions (bets) matches the actual distribution of outcomes.  This tells us the extent to which we may rely on the prediction market distribution as a proxy for the underlying uncertainty of the actual outcome.  It also tells us (if well-calibrated), how much uncertainty exists surrounding the future outcome.  If there is a great deal of uncertainty, the prediction market will not be very useful.

Now let’s look at a prediction market that has near perfect calibration, but a nearly flat distribution of bets (opinions).  Some might argue that the market is “accurate”, but it is useless for decision-making purposes.  The market is telling us that the outcome is too unpredictable.  However, the market could be used for betting, because it is well-calibrated. Think of a betting market on the outcome of rolling a fair die.
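One way to quantify the flatness of a market distribution is its entropy: a fair-die market is maximally unpredictable, while a peaked, decision-useful market scores lower.  A sketch, with the peaked distribution invented for illustration:

```python
import math

def entropy(dist):
    """Shannon entropy (bits) of a discrete distribution {state: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

faces = range(1, 7)
flat   = {f: 1 / 6 for f in faces}                               # fair-die market
peaked = {1: 0.02, 2: 0.08, 3: 0.25, 4: 0.40, 5: 0.20, 6: 0.05}  # hypothetical

print(entropy(flat))    # ~2.58 bits: maximal unpredictability
print(entropy(peaked))  # ~2.11 bits: lower entropy, more decision-useful
```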

Conclusion

The point of this discussion is that prediction markets should be well-calibrated, but this is not a sufficient condition for their usefulness.  They must also provide accurate predictions, with relatively tight distributions.  The maximum allowable dispersion of the distribution will depend on the materiality of the forecast error.  That is, the prediction should be accurate enough, such that the maximum allowable error would not cause the decision-maker to alter his or her decision had the true value been known in advance.

Where the distribution of the prediction market is not tight, the market may still have some use, but not so much for being able to predict the outcome.  Instead, the market will be providing information about the degree of uncertainty surrounding the outcome.  This may indicate the need for greater care in assessing risks and the need for more extensive contingency planning.  A flatter distribution may indicate that the market is not functioning properly (lack of information completeness, perhaps).  Alternatively, a flat distribution may indicate that the variable being predicted is, simply, not predictable.

Posted by: Paul Hewitt | May 5, 2009

The Future of Prediction Markets – Part I

We would like to be able to run a prediction market to predict the future adoption of prediction markets (public or private), but we can’t.  There is no way to verify the outcome to determine which option would pay off.

Based on my research to date, this is where I think prediction markets are heading and where I think they should be heading.  In this paper, I will focus on Enterprise Prediction Markets.  A subsequent paper will cover Public Prediction Markets.

Private (Enterprise) Prediction Markets

In my view, these provide the most promise for future adoption, despite the almost insurmountable problems they have gaining acceptance in the corporate setting.  I am optimistic, however, because I believe prediction markets do have the potential to be better predictors of the future than other forecasting methods, at a lower cost.

My review of the literature and case studies (that have been published) indicates that prediction markets have improved the accuracy of forecasts, but the improvements have not been great enough to encourage widespread (or even minimal) acceptance.  Furthermore, these studies like to average their results over a number of markets, disguising the fact that some markets improve forecasts, while others fail to do so.  Some studies look at average absolute errors, covering up the fact that some predictions were underestimating the true outcome and others overestimating it.  This means the real errors are as much as twice as large as those reported.  Few, if any, explanations for the failures are ever presented. This raises the issue of consistency.  In case studies such as these, where there is no clear under- or over-estimation tendency, for which a correction may be made, the prediction errors are just too great.

Clearly, if similar prediction markets do not provide consistently accurate forecasts, they will not be relied upon for any important business decisions.

Businesses make estimates and forecasts in virtually everything they do.  Every decision model accepts inputs which are estimates, predictions or forecasts of likely scenarios for future conditions, events and actions.  Decisions made are only as good as the model used and the accuracy of the data being used.  “Garbage-in, Garbage-out” doesn’t just apply to computers.  There is a clear profit incentive for companies to improve their decision-making, by improving the quality of the data relied upon.  Traditional forecasting models have a spotty track record for accuracy.  Prediction markets may be a good alternative to, or add value to, traditional forecasting methods.

To be useful in the corporate world, prediction markets must provide forecasts that are more accurate than traditional methods, or be a cheaper means of providing equivalent forecasts.  Only if this pre-condition has been met, can we look at the other potential benefits.  It makes no sense to talk about how quickly or cheaply a prediction market gives a forecast, if the forecast is wrong!  Therefore, the focus must be on accuracy.  Let’s get it right, first.  Then, we can make it better or more efficient.

Once the accuracy and consistency issues have been met, prediction markets can be relied upon to provide a measure of the uncertainty surrounding the forecasts.  It does this with an objective distribution of “votes” around the mean prediction.  It is a particularly useful measure, with applications in risk management and contingency planning.
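A small sketch of how the distribution of votes yields both the point forecast and the uncertainty measure (all numbers are hypothetical):

```python
# A market's distribution of "votes" yields both a point forecast and
# an objective uncertainty measure. Values are hypothetical unit sales
# (in thousands); probabilities are invented for illustration.
votes = {90: 0.05, 100: 0.20, 110: 0.40, 120: 0.25, 130: 0.10}

mean = sum(v * p for v, p in votes.items())                      # point forecast
sd = sum(p * (v - mean) ** 2 for v, p in votes.items()) ** 0.5   # uncertainty

print(f"forecast: {mean:.1f}k units, standard deviation: {sd:.1f}k")
```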

Assessment of Enterprise Prediction Markets (EPMs) to date:

  1. they have some ability to improve the accuracy of forecasts in specific situations;
  2. they can reduce (and measure) the uncertainty of a forecast;
  3. they perform a relatively fast aggregation of traders’ predictions; and
  4. they are a relatively cheap forecasting method.

EPM Deficiencies:

  1. We don’t know why they don’t work in some cases (even with similar markets);
  2. Most forecasts are not significantly better than traditional methods (yet);
  3. They lack consistency.

Future Research (just a few):

  1. Prediction markets require a crowd of people, with as much diversity as possible, holding privately-generated independent information.  Future research must focus on how to achieve these characteristics.  Too often the research has focused on how to get around the need for a “crowd”, seemingly forgetting that reducing participation will also reduce diversity and completeness of the information contained in the crowd.  Mistake.
  2. We need to know the determinants of accuracy and consistency.  Find out what makes some markets work well, while others fail.  Find out why there is a lack of consistency in the predictions obtained from similar markets.  Then correct for these deficiencies.
  3. Find out which types of issues are best suited for prediction markets, and discard those that will never provide accurate, consistent predictions.
  4. Find out what makes a good “crowd”.
  5. Find out how to get a good crowd and keep them motivated to reveal their private information.

Of course there are many other issues related to EPMs, but I believe these are the crucial, must-solve ones.  Without accuracy and consistency, EPMs will be nothing more than a novelty.

Posted by: Paul Hewitt | May 3, 2009

Prediction Market Accuracy and Usefulness

“Consensus and Differences of Opinion in Electronic Prediction Markets”, by Thomas S. Gruca, Joyce E. Berg and Michael Cipriano (2005)

I came across an obscure paper that delivers some interesting findings about the capabilities of prediction markets in the real world. Google Scholar indicates that this paper has only six citations, yet I found it to be very useful, because it involves a real world case study that examines three aspects of prediction markets:

  1. How well do prediction markets capture private information held by traders?
  2. Do prediction market prices reflect the dispersion of trader forecasts in addition to the consensus?
  3. How does the composition of the trader pool affect the disclosure of private information?

The authors conclude that prediction markets are able to aggregate privately held information quite well, they are able to aggregate information about the consensus of private information and its dispersion, and that ‘open’ markets result in better predictions than ‘closed’ markets of homogeneous traders.  Consequently, corporate prediction markets should not be restricted to in-house participants.  In this blog, I critically examine these conclusions and provide additional insight into the issues raised.

Background

The authors start with the premise reached by Plott and Sunder (1982, 1988), who were able to show that markets are able to disseminate information from “informed” traders to the uninformed traders.  Where there is perfect information (no uncertainty), it is effectively communicated from the informed to the uninformed.  Where the information is “complete” (sum of all information reveals the true state), market prices accurately predict the outcome.  Where there is uncertainty or the information set is not complete, prices may deviate from their expected values and lose the power to predict accurately (Sunder 1995). Their conclusions were based on laboratory experiments, involving a simple, hypothetical situation.

The authors of the current paper decided to test these conclusions in the real world.  They chose to run a series of markets, similar to those run by the Hollywood Stock Exchange (HSE), involving predictions of four-week box office receipts for 11 different movie openings (November 1998 – November 2002).  Each market involved 4 – 6 “winner-take-all” securities.  Trading took place on the Iowa Electronic Market (IEM), using its continuous double-auction mechanism with real money trades.  Trading commenced between four and 14 days before each movie opened in the theatres.  A market prediction was obtained immediately before each movie opened, though trading continued during the movie’s run.

In order to test the market’s ability to aggregate private information held by traders, the authors collected forecasts from traders before they started trading.   This provided a measure of the private information held by the traders (as opposed to public information revealed by prices or other means).   Most of the traders were marketing students who completed a project in which they were asked to forecast movie box office receipts, performing their own analyses, using any information they could find.  There were four “closed” markets, in which all of the traders were students who had submitted their private forecasts before trading.  There were also seven “open” markets in which other self-selecting traders were allowed to participate, using their own money.   Here, the term “forecasts” refers to the students’ prior forecasts, and “predictions” refers to the prediction markets’ predictions.  This will make it easier to follow the analyses.

Do Prediction Markets accurately incorporate Private Information?

Yes. The authors compared the means of the students’ forecasts before trading in the market with the mean prediction implied by the market prices just before the movie opened.  They found a correlation of 0.99, indicating that the prediction market prices were accurately reflecting the private information held by the traders.

Do Prediction Markets reflect the Dispersion of Traders’ Forecasts (based on private information)?

The traders’ private information was incorporated and reflected in their forecasts (made prior to trading).  The degree of dispersion of these forecasts is described by the standard deviation.  Similarly, the authors calculated the standard deviation implied by the contract prices obtained from the prediction market.   They found that the market standard deviation was smaller than that for the students’ forecasts in every market, indicating a tighter distribution in the prediction markets and, presumably, a less uncertain prediction.  Some of the reasons put forth to explain the tighter distribution were that:

  • extreme forecasts get changed by some traders, when they see the other traders’ forecasts, as reflected in market prices;
  • the number of contracts in the market may have affected the standard deviation, and
  • the assumption of a normal distribution may affect the true standard deviation.

So, they compared the actual market contract prices with those that would be expected if the entire distribution of students’ point forecasts (private, prior) were used to determine the contract prices.  That is, using the frequency data from the point forecasts, they estimated the probability of each contract paying off.  The expected contract prices should correspond to those observed in the market, if the entire distribution of students’ private information is being reflected in the contract prices.  Here, they found that the correlations were significant in 7 of the 11 markets, with the average being 0.81.  However, the correlations were particularly poor in two markets.  They cite three possible reasons for the poor correlations:

  • Additional information was obtained by traders after their point forecasts were made (and reflected in market prices only);
  • Other, non-student, traders (no prior forecast) were more influential in setting market prices than were the student traders (these markets appear to have been dominated by non-student traders, who had very different information), or
  • There was a market failure.

No conclusion was reached.  We might say that if either of the first two explanations is true, that is a good thing.  We want prediction markets to incorporate new information and the information provided by new participants.   Also, we want the market to determine which traders will be most influential in setting prices, based on their own individual predictions and degrees of certainty.  That is, just because the students did some research doesn’t mean that their forecasts should dominate in the prediction market.  They may not be very good forecasters.
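The comparison described above can be sketched as follows.  All of the numbers are invented stand-ins for the paper’s data; the idea is simply to compute each contract’s expected price as the fraction of prior point forecasts falling in its range, and correlate that with the observed market prices.

```python
# Sketch of the authors' check: contract prices implied by the empirical
# distribution of prior point forecasts vs. observed market prices.
# All numbers are invented stand-ins for the paper's data.
point_forecasts = [28, 35, 41, 44, 47, 52, 58, 65, 90]   # receipts, $M
bins = [(20, 40), (40, 60), (60, 80), (80, 100)]          # contract ranges

n = len(point_forecasts)
expected = [sum(lo <= f < hi for f in point_forecasts) / n for lo, hi in bins]
observed = [0.25, 0.55, 0.12, 0.08]                       # market prices

# Pearson correlation between expected and observed contract prices.
me, mo = sum(expected) / len(bins), sum(observed) / len(bins)
cov = sum((e - me) * (o - mo) for e, o in zip(expected, observed))
corr = cov / (sum((e - me) ** 2 for e in expected) ** 0.5
              * sum((o - mo) ** 2 for o in observed) ** 0.5)

print(expected, round(corr, 3))
```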

Does the Composition of Traders Affect Market Accuracy?

There were two classes of markets – ‘open’ and ‘closed’.  The closed markets included only students who had completed the project of forecasting movie receipts before they began trading.  Open markets included other real money traders, who self-selected into the markets.

In order to estimate the accuracy of the prediction markets, the authors looked at the absolute percentage error of the predictions and forecasts (private, priors).  They found a mean absolute percentage error (MAPE) of 0.29, or 29%, across all markets.  The MAPE for the seven open markets was 17%, but for the four closed markets it was 50%.  The authors conclude that adding additional traders to the mix improves the accuracy of the prediction markets.  They imply that corporate prediction markets should consider opening the markets to traders not normally involved with the forecast, in order to improve the accuracy of the predictions.

There are several problems with this analysis, and the authors’ conclusion is wrong.  Looking at all of the students’ forecasts, we find that the MAPE was 33%.  We also find that it was 57% when they were in ‘closed’ markets, but only 20% when they were in ‘open’ markets.  The students did not know which market they would be in prior to making their forecasts, so the open/closed split cannot explain the difference in their forecast accuracy.  We need to look, solely, at the overall accuracy.

By applying a bit of my own math, I find that the percentage improvement of the market predictions over the initial student forecasts is about 11.7%, and it does not matter much whether the market is open or closed.  Both open and closed markets experienced gains in accuracy (11.5% and 12.0%, respectively).  However, two of the seven open markets actually had a higher error than the initial forecasts made by the students prior to the market opening.  This was not explained by the authors.  I will provide one explanation, below.  We cannot attribute any effect on accuracy to whether the market was ‘open’.  Instead, the average error appears to be more dependent on the particular movie’s receipts being forecasted.  Some movies are harder to predict than others.  Maybe these markets are not appropriate for obtaining useful predictions, given the makeup of the trader pool.
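
For readers who want to replicate this kind of calculation, here is a minimal sketch of the MAPE and improvement computations. The figures below are placeholders, not the paper’s data.

    # Sketch: mean absolute percentage error (MAPE), with the actual outcome
    # in the denominator (the authors' convention), and the relative
    # improvement of the market over the prior forecasts. Placeholder data.

    import numpy as np

    actual   = np.array([100.0, 80.0, 60.0, 120.0])  # realized receipts ($M)
    forecast = np.array([130.0, 60.0, 75.0, 150.0])  # students' prior forecasts
    market   = np.array([120.0, 66.0, 70.0, 140.0])  # market predictions

    def mape(pred, actual):
        return np.mean(np.abs(pred - actual) / actual)

    improvement = 1 - mape(market, actual) / mape(forecast, actual)
    print("forecast MAPE: %.1f%%" % (100 * mape(forecast, actual)))
    print("market MAPE:   %.1f%%" % (100 * mape(market, actual)))
    print("improvement:   %.1f%%" % (100 * improvement))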

UPON FURTHER EXAMINATION…

I took the data disclosed in this paper and ran it through my own analyses (My Analysis).  I segregated the open and closed market data, so that all analyses could be compared between the two groups, if necessary.  I calculated the average percentage error for the student forecasts and for the market predictions, to see how much of an improvement (if any) was obtained by running the prediction markets.  I calculated the decrease in the standard deviation between the student forecasts and the market contract prices, to see whether the prediction market helped to reduce the uncertainty of the prediction, relative to the students’ initial forecasts.

The authors calculated the percentage error with the actual outcome in the denominator.  They also looked only at the absolute error (i.e. it didn’t matter whether the market under- or over-estimated the outcome).  If the Hollywood executives were to use the forecasts or market predictions in their decision-making, the error should be calculated using the forecast figure as the base (denominator), as this is the figure they would be using to make decisions.  I made this adjustment.  I already had the standard deviations for each market, for the students’ forecasts and for the market predictions.  Armed with this, I thought it would be interesting to see whether the prediction markets outperformed the students in their forecasts of the actual movie receipts.
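
Here is a small sketch of the adjustment, using a single hypothetical market; the only change is which figure sits in the denominator.

    # Sketch: the same absolute error measured two ways. The authors put the
    # actual outcome in the denominator; for decision-making I put the
    # forecast there, since that is the figure the executive acts on.

    def error_vs_actual(pred, actual):
        return abs(pred - actual) / actual    # authors' convention

    def error_vs_forecast(pred, actual):
        return abs(pred - actual) / pred      # adjusted convention

    pred, actual = 120.0, 100.0               # hypothetical $M figures
    print(error_vs_actual(pred, actual))      # 0.200 -> 20% of the outcome
    print(error_vs_forecast(pred, actual))    # 0.167 -> 16.7% of the forecast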

Would Hollywood executives rely on these prediction markets?

The answer has to be ‘no’.

As mentioned above, the average absolute error of the market predictions was 29%, which is only an 11.7% improvement over the students’ initial forecasts.  This shows that prediction markets do bring about some improvement in forecasts of the future, but is it good enough to be used in decision-making?  The answer has to be ‘no’ in the case of predicting future movie receipts (at least with these trader pools).

Using the absolute percentage error disguises the fact that the errors go both ways (some outcomes were under- and others were over-estimated).   Further, the prediction markets provide no guidance as to which way the error is likely to fall.  Therefore, the real error band is much larger than the absolute value of the percentage error; it is, perhaps, as much as twice the error calculated by the authors.  Consequently, the real prediction market error might be as high as 58% (2 × 29%) in these markets.

We also saw that the predictions in two of the markets were worse than the initial forecasts (and we don’t know why this happened).  This speaks to the consistency issue.   If prediction markets cannot provide consistently accurate predictions in similar situations, how can they be relied upon for decision-making purposes?

What went wrong?

The authors considered the information that students obtained through their research and analyses as being “private”.  Except to the extent there may have been “collusion” in the development of individual forecasts (i.e. “study groups”), the students’ conclusions were privately held.  However, students would not be privy to industry information that would be available to Hollywood executives, film distributors, theatre owners, film critics, etc.  Instead, the students only had access to publicly available information on which to base their forecasts.  So, I think it is safe to say that the information available to the traders (collectively) was not “complete.”

Since completeness is a pre-condition for market prices to predict the true outcome, it is not surprising that these markets failed to accurately predict movie receipts.  The trader pool was not diverse enough to possess the information needed to predict the outcome accurately.

These markets showed that prediction markets are able to reflect participant information fairly accurately, but if there isn’t enough information from the traders, the prediction may not be very good.  The conclusion has to be that diversity in the trader pool must be sufficient to include most of the relevant information needed to make an accurate prediction.

Perhaps a reduction in uncertainty has value?

In my analysis, I calculated the improvement of the dispersion in the prediction markets, relative to the initial forecasts.  Overall, the standard deviation in the prediction markets was about 35% tighter than that of the student forecasts.  It appears that trading in a prediction market helps to focus the participants’ estimates closer to the mean.   On the face of it, we would say this is a good thing.  The market is less uncertain about the forecast than a flatter distribution would indicate.  But, in these markets, the predictions have very large errors.  In a word, they were inaccurate.

Let’s examine this from a decision-making point of view.  We would expect a range of one standard deviation around the mean to capture the actual outcome 68% of the time, if the distribution is normal.  The actual movie receipts were contained within this range for the students’ mean forecasts in 7 of 11 markets.  That is perhaps about what one might expect, given that the students were not “experts” in forecasting movie receipts.  Here’s the kicker: the market predictions failed to fall within this range in 8 of the 11 prediction markets!  Put another way, had the executives based decisions on a range of potential movie receipts within one standard deviation of the market prediction, they would have expected that range to capture the actual receipts 68% of the time.  This did not happen in these markets.  We aren’t even asking whether this level of accuracy would be adequate for their decision-making purposes (I doubt it would have been).  So, even though the prediction markets had tighter distributions, they were not usefully more accurate than the students’ forecasts.

We find that a tighter distribution around an inaccurate forecast can make for very poor decisions.

It makes no sense to be “more sure” (or less uncertain) of a wrong forecast.
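
A minimal sketch of the coverage test described above, with hypothetical market means, standard deviations and outcomes:

    # Sketch: how often does the actual outcome land within one standard
    # deviation of the market's mean prediction? Under normality we would
    # expect roughly 68%. All figures are hypothetical.

    import numpy as np

    mean_pred = np.array([100.0, 80.0, 60.0, 120.0, 90.0])  # market means ($M)
    sd_pred   = np.array([ 10.0, 12.0,  8.0,  15.0,  9.0])  # market SDs
    actual    = np.array([ 95.0, 79.0, 75.0, 118.0, 70.0])  # outcomes

    within = np.abs(actual - mean_pred) <= sd_pred
    print("coverage: %d of %d markets (%.0f%%)"
          % (within.sum(), len(within), 100 * within.mean()))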

Posted by: Paul Hewitt | April 29, 2009

A Cheaper Alternative to Prediction Markets?

Recently, I came across this article in The Economist that discusses the genius and extraordinary abilities associated with autism.  It was thought provoking.

Genius locus – The Economist, April 16, 2009

In the article, it mentions some of the tasks that autistic people seem to have an uncanny ability to perform.  One of them was cited here:

“It helps them, too, with other tasks savants do famously well—proofreading, for example, and estimating the number of objects in a large group, such as a pile of match sticks.”

The public popularity of prediction markets jumped with the publication of James Surowiecki’s book, The Wisdom of Crowds.  He introduced the topic by describing how the crowd’s wisdom was far superior to that of any individual at the county fair.  Other examples include guessing the number of jelly beans in a jar.  It seems to me that, if an autistic individual is able to estimate things like the number of matchsticks in a pile, perhaps they might be just as good at estimating other things.  In effect, they are predicting the answer.  If so, maybe we don’t need a crowd; we need just a few – but autistic ones.

Next, the article discusses whether similar types of feats can be “learned”, considering London taxi drivers as a possible example.

“There are, however, examples of people who seem very neurotypical indeed achieving savant-like skills through sheer diligence.  Probably the most famous is that of London taxi drivers, who must master the Knowledge—ie, the location of 25,000 streets, and the quickest ways between them—to qualify for a licence.”

“The prodigious geographical knowledge of the average cabbie is, indeed, savant-like.  But Dr Maguire recently found that it comes at a cost.  Cabbies, on average, are worse than random control subjects and—horror—also worse than bus drivers, at memory tests such as word-pairing.  Surprisingly, that is also true of their general spatial memory. Nothing comes for nothing, it seems, and genius has its price.”

I might add another side-effect of their learning process.  There seems to be a very high correlation between back injuries and being a London taxi driver.  I’ve found this to be used as a convenient excuse for not lifting even the lightest of suitcases when picking up a passenger.

Maybe more accurate predictions are only a fare away!

Posted by: Paul Hewitt | April 25, 2009

Judging Accuracy in Prediction Markets

I’ve had a chance to review Emile Servan-Schreiber’s paper, Prediction Markets:  Trading Uncertainty for Collective Wisdom.  The paper indicates that it will be included in a forthcoming book on Collective Wisdom.  It summarizes some of the evidence in support of the accuracy of prediction markets.  I wholeheartedly agree with the author’s contention that diversity is a key determinant of prediction market accuracy.  I agree that characterizing prediction markets as being more like “betting exchanges” is appropriate, too.  However, I disagree that the established research has proven the case for the accuracy contention, as I hope to explain, below.

While the paper is a good summary of many of the key aspects of prediction markets, in arguing that prediction markets are accurate forecasting tools, the author cites the HP prediction market results (6 out of 8 performed better than the “official” forecasts) as one of the proofs.  It has been a decade since these prediction markets were run, and still it is one of the most frequently cited proofs of prediction market accuracy.  However, the author undercuts this finding somewhat, by noting that “beating official company forecasts isn’t always as hard as it sounds, because the goal of an official forecast is often more to motivate employees towards a goal than to predict outcomes.”

It appears that the author is saying that it shouldn’t be too difficult to beat an “official” forecast, because it is biased (in order to motivate).  Depending on the definition of “official forecast”, to some extent I might be able to agree with this assessment.  However, the fact that HP’s prediction markets did not beat the official forecast in every instance speaks to the contrary.  Furthermore, if it isn’t that difficult to beat an “official” forecast, why didn’t the HP markets do so by a significant margin?  Prediction markets are supposed to reduce the bias inherent in other forecasting methods.  If the official forecasts are biased, we should not be comparing them with prediction market forecasts at all.  The true accuracy of prediction markets depends on their ability to accurately and consistently forecast actual outcomes.  If alternative methods are not trying to predict the same thing, we shouldn’t be comparing them.  Here are my comments…

What, exactly, was the “official” forecast that was used in comparison with the prediction market forecast?

Was it an internal sales budget? Such budgets (forecasts) are routinely used to set target quotas for sales teams.  The bar is usually set a bit higher than it should be, to motivate the team to “try harder” to meet the objective and earn a bonus.  The budget cannot be too high (optimistic), otherwise it will have a de-motivating effect.  If we look at the eight HP prediction markets that had official forecasts, we find one that was almost bang-on, four that were significantly below the actual outcome and three that were above.  Of the three that might be considered “motivationally-inflated” official forecasts, two appear to be reasonable, with errors of 13% and 4%, but the third was overstated by a whopping 59%!  Three of the four understated official forecasts were significantly below the actual outcome (28% – 32%).  None of the understated official forecasts could be described as “motivational”. After all, you don’t lower the bar to motivate higher jumping.  We might have expected all of the prediction market forecasts to be 5%-10% lower than the official forecast (if it was a sales budget), but they were not.  Bottom line: Even if the official forecast was really a sales budget, it would never have been lower than the expected (most likely) sales outcome, nor should it have been too much higher.

Was it an “official” forecast provided to market analysts? Obviously, not all product sales forecasts are provided to analysts (though some are), but certainly, sales projections by product line or division would be commonly disclosed.  These figures would be derived by aggregating the sales projections of individual products or lines.  Corporate management are required to disclose all significant, relevant information (public companies).  If management were to issue inflated “official” forecasts to the market, the analysts would clobber the share price when the true sales (outcome) became known.  If management is consistently optimistic in their forecasts, analysts will discount their forecasts and take it out on the share price.  Management is unlikely to be consistently pessimistic as this would serve only to put downward pressure on their share value.  Analysts are able to spot a company consistently “jumping” over a “low bar.”  Bottom line: If the official forecast is the one that is publicly disclosed, it is likely to be close to management’s best estimate of the sales outcome.

Management needs to make a variety of decisions (production, distribution, marketing, sales, HR and finance, etc…) that depend upon the best estimate of future sales.  To make such decisions using a biased forecast would be foolish and potentially very costly.  The important (useful) forecast is the one that will help management make better decisions.  This is the forecast that management needs to predict more accurately, not a “tool” such as a sales budget.

Given that HP used the term “official” to describe the forecasts that were being compared with the prediction market forecasts, it is likely that the official forecasts were the true best-estimates of the future outcomes.  If they were, in fact, merely sales budgets, we would expect the prediction market forecasts to always be lower than the budget, and this was not the case.  Consequently, if a prediction market is able to beat the “official” forecast, consistently, it should be considered a better forecasting tool than that used to generate the official forecast.

I have already written about my objections to the HP study, where I recognized that most of the prediction market forecasts appeared to be better predictions of the official forecasts than they were of the actual outcomes.

Since I’m discussing the “official” forecasts, here, I would add that the HP prediction markets were run before the “official” forecasts and some of the participants were also involved in the setting of the “official” forecasts.  No wonder these forecasts were correlated.  The slight improvement of the prediction market forecasts over the official ones may indicate the slight effect of the small amount of additional diversity in the prediction market group over the “official” forecasting group.  It could also be explained by the internal “political” climate that influenced the official forecast, but not the prediction market forecast.  Either way, it is not a sound comparison for proving prediction market accuracy.

We still have a long way to go in proving the case for enterprise prediction market accuracy.  I believe the academics have given sufficient theoretical support, but the real proof is in the field.

Posted by: Paul Hewitt | April 12, 2009

An Analysis of HP’s Real Prediction Markets

The following article discusses the results of Hewlett-Packard’s trials with prediction markets in the late 90s.  I’m posting my comments as a review and critique of this paper.

Information Aggregation Mechanisms: Concept, Design and Implementation for a Sales Forecasting Problem, by Kay-Yut Chen & Charles R. Plott.

At the outset, I’d like to commend the authors for publishing their data. Even though these markets were run more than a decade ago, there have been virtually no other published results to date. Unless we are able to review actual case studies of real prediction markets, the future of the prediction market “industry” will be bleak (no prediction market is necessary to reach this conclusion). If I appear to be overly critical of some of the authors’ conclusions and methodology, I apologize. My intent is to point out areas in which prediction markets may be improved for use in a corporate setting.

Background

In this paper, the authors report on the results of HP’s internal prediction markets to forecast sales. The 12 prediction markets were run between October 1996 and May 1999. Their goal was to take prediction markets (Information Aggregation Mechanisms) out of the laboratory and into the field, to see how they work in a practical setting. Most markets attempted to forecast monthly sales of particular products, three months in advance.

To be fair, the design and implementation of these markets was constrained by management. Each market was open for one week only and for a limited time period each day. The number of active participants ranged from 12 to 24, with one that had only seven. Even the authors acknowledge that these markets could only be described as being “thin”. While the participants had access to HP data bases, they did not have access to the official HP forecasts (where available).

The markets were not operated continuously up to the start of the outcome month (or even during that month). This was unfortunate, as we might have learned more about how well (or not) prediction markets incorporate new information to revise market predictions.

Most likely a function of the market thinness (and the double auction mechanism), the sum of the market prices for each potential outcome (range) did not add up to the market payoff (as it should), and the market prices were not “stable”. This says a lot about the need for a sufficient number of participants (however many that might be). It also says that maybe we do need some form of market scoring rule or a dynamic parimutuel mechanism, to at least ensure that the probabilities add up to the payoff.
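
As a simple illustration of the coherence problem, here is a sketch that renormalizes a set of raw prices so the implied probabilities sum to one (equivalently, so the prices sum to the payoff). The raw figures are hypothetical, and this is only one possible repair, not what the authors did.

    # Sketch: raw double-auction prices over an exhaustive set of outcome
    # ranges need not sum to the contract payoff. A simple repair is to
    # renormalize them. The raw prices below are hypothetical.

    def normalize(prices, payoff=1.0):
        total = sum(prices)
        return [p * payoff / total for p in prices]

    raw = [0.30, 0.45, 0.35]        # sums to 1.10, not the 1.00 payoff
    print(normalize(raw))           # [0.273, 0.409, 0.318] -> sums to 1.00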

The Results

The authors conclude that the results indicate that the HP prediction market is “a considerable improvement over the HP official forecast.” Basically, they’re saying that, because in 6 out of 8 events the prediction market error was smaller than the error of the official HP forecast, the prediction market outperforms the HP official forecast. It is true, but we need to take a closer look at the data.

In virtually every case, the prediction market forecast is closer to the official HP forecast than it is to the actual outcome. Perhaps these markets are better at forecasting the forecast than they are at forecasting the outcome! Looking further into the results, while most of the predictions have a smaller error than the HP official forecasts, the differences are, in most cases, quite small. For example, in Event 3, the HP forecast error was 59.549% vs. 53.333% for the prediction market. They’re both really poor forecasts. To the decision-maker, the difference between these forecasts is not material.

There were eight markets that had HP official forecasts. In four of these (50%), the forecast error was greater than 25%. Even though only three of the prediction market forecast errors were greater than 25%, this can hardly be a ringing endorsement for the accuracy of prediction markets (at least in this study).

Without doing the math, it appears that there is a stronger correlation between the predictions and the HP official forecasts than there is between the predictions and the actual outcomes. But, to make the case for prediction market accuracy, the correlation has to be significant with respect to the actual outcome. It was noted in the study that, in several cases, there was evidence to suggest that the official forecasts were based, in part, on information gleaned from the prediction market exercise. Perhaps this explains the correlation with the HP official forecasts. It appears that many of the participants were also involved in setting the official forecasts. To the extent that they may have dominated the trading in the prediction markets, it is not surprising that the predictions would be closer to the official estimates than they would be to the actual outcomes.

Interestingly, in using the prediction markets to make forecasts, rather than using all of the trades, the authors chose to determine several forecasts based on the last 40%, 50% and 60% of the trades. They argue that the latest trades are more likely to be at or near the equilibrium. Yet, one of their observations is that there were no significant trends in trading (they looked at each 10% of the trades). They speculate that the market quickly aggregates a prediction, with subsequent trading moving the prediction around the equilibrium. If this is true, it makes little sense to exclude any of the trades from the determination of the prediction. Arguing from first principles, we would never want to exclude any trades, because it would interfere with the offsetting of trading errors. Excluding trades means we are excluding the information attached to those trades, which runs counter to the theory behind prediction markets.
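
To illustrate the point, here is a sketch contrasting a prediction computed from every trade with one computed from only the last k% of trades. The trade prices are hypothetical, and a simple average stands in for whatever aggregation the authors actually used.

    # Sketch: a prediction computed from every trade versus one computed
    # from only the last k% of trades (the authors' approach). Prices are
    # hypothetical, in time order; a simple average stands in for the
    # authors' actual aggregation method.

    import numpy as np

    trade_prices = np.array([0.40, 0.55, 0.50, 0.52, 0.48, 0.51, 0.49, 0.50])

    def prediction(prices, last_fraction=1.0):
        k = max(1, int(round(len(prices) * last_fraction)))
        return prices[-k:].mean()

    print("all trades:", prediction(trade_prices))
    print("last 50%:  ", prediction(trade_prices, 0.5))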

Though the prediction market results were “better” than the HP forecasts, some markets were better than others. It would have been nice to know why this happened. To be useful, prediction markets will have to be consistently better performers than other forecasting methods. From this study, we aren’t able to make this conclusion. Unfortunately, the authors don’t delve into this issue.

Perhaps the sleeper conclusion is result 2: The probability distributions calculated from market prices are consistent with (those for the) actual outcomes. This is truly useful information. It gives us a measure of uncertainty or risk. Traditional forecasting methods do not provide this information (at least not objectively). Decision-makers can use this information to focus their efforts more wisely to reduce the uncertainty or more fully develop contingency plans where the uncertainty is greatest.

When I look at the graphs of the distributions, they appear to be fairly widely dispersed, rather than tightly focused around the mean. I’m guessing that the relatively small number of participants and the short trading period had something to do with this. It would have been nice to experiment with longer trading periods and greater numbers of participants to see whether this would have reduced the variance around the mean. It would also have been useful to keep these markets open, so that we could see how the distributions changed as they got closer to the outcome being revealed. After all, one of the major benefits of prediction markets is that they are able to dynamically update predictions.

Result 3 is valuable as well. They argue that the prediction markets were particularly good at predicting whether the actual outcome would occur above or below the HP official forecast. They looked at the direction the distributions of the prediction outcomes were skewed to predict whether the actual outcome would be higher or lower than the HP official forecast. It worked. In all cases they were able to make the correct prediction. Given that the official forecasts were usually wrong (as is the case with most forecasts), knowing whether the actual outcome is going to be higher or lower than the official forecast reduces the error (uncertainty) by at least 50%. There might be something to this analysis, at least for HP’s forecasting. It would be interesting to see if this holds up with other prediction market results. Too bad, no one seems to be looking at this.

My Conclusions (so far)

Run a lot of prediction markets, using a variety of participant sizes, to determine the effects on liquidity, prediction distributions, accuracy and speed of prediction. We need more than a sample of 12 prediction markets. We need more than 7 – 24 participants in each market. Keep the markets running after the initial prediction is determined, so that we can see how the market incorporates new information and how much more accurate the prediction becomes. Perform more detailed post-mortem analyses. We need to know why the participants made their trading decisions. We need to know when the market has reached an equilibrium.

Run prediction markets on lots of different things. We need to figure out why some markets are more predictable than others.

Posted by: Paul Hewitt | April 10, 2009

Testing Prediction Markets?

Chris Masse (Midas Oracle) commented on this article, (Putting Predictive Market Research to the Test), calling it “truly bizarre research.” He’s right. It’s not a test of prediction markets at all.

I’m hard pressed to figure out where to start in critiquing this “research”. So, let’s begin with the fact that there was no prediction market involved. Instead, the researchers asked the participants what they thought their peers would do and compared the result with what the participants said they would do. Without a prediction market to aggregate the responses, we really have two polls going. Given the low cost of operating a real prediction market, why was one not used?

Next, we have the fact that all of the participants are oncologists. I think it is safe to say that this is a fairly homogeneous “crowd”, highly likely to be deficient in diversity, a pre-condition for prediction markets to operate effectively. The problem with using such a homogeneous group is that many (most?) participants will hold the same “pieces” of the puzzle, whereas a diverse group would contribute many more pieces (however small) to be aggregated into the outcome prediction.

There was no actual outcome in the study. It was a hypothetical treatment. The study’s authors draw conclusions about participant behaviour that are irrelevant; the results seem to depend on the oncologists’ personal treatment approaches and opinions, and the order in which the questions were posed. All the more reason for using a larger, more diverse “crowd”. They argue that in some cases, the predictive market result was “more optimistic” than that from the individual responses. In other cases, this wasn’t so. One result may have been more optimistic than the other, but which was more right? With no actual outcome, we will never know from this study. The authors note that traditional, survey-type responses, about what someone says they will do and what they actually do, are usually heavily discounted (as much as 50%). In short, such responses are unreliable. To compare the “predictive market” responses with these traditional responses, as they did in this study, is kind of ridiculous.

The study indicates that the predictive market results had “tighter” distributions, and concluded that fewer participants could be used to generate predictions (thus would save money in the future). False. Just because the distribution is tighter, does not mean you can use fewer participants. The more homogeneous the group, the tighter the distribution. A very small group may have a very tight distribution (or it may not). Furthermore, you really do need a “crowd” to run a prediction market. Optimally, we don’t want a “manufactured”, “tight” distribution, we want a good estimate of the true distribution.

Next time, they should run a real prediction market on a potential new treatment and compare the prediction with that obtained using a “traditional” forecasting method. Both predictions would be compared with the actual outcome (once known), to determine which provided the better predictive accuracy. That would be a true test of prediction markets.

Posted by: Paul Hewitt | April 5, 2009

Practical Enterprise Prediction Markets

Lately, there has been a lively discussion on-line regarding the slow adoption of prediction markets in the corporate world.  It seems that the major researchers and academics believe that it is just a matter of time until the corporate world wakes up and sees the incredible value of these markets.  Others, like Chris Masse (Midas Oracle), are more than a bit skeptical.

At first, I was very optimistic about the value of prediction markets and their eventual highly esteemed place in the corporate forecasting world.  The logic behind the basic theory of prediction markets makes a lot of sense.  You take a “crowd” (lots) of people, each with his own set of information and opinions, let them make choices (independently), and aggregate those choices.  Each person holds a piece of information with an associated error factor.  The law of large numbers ensures that the aggregated error will be quite small, leaving a combined chunk of “information” that is better than any individual’s piece of information.  Designing sophisticated markets would be able to reveal not only the most likely forecast outcome, but also the expected distribution of outcomes (or uncertainty).  And, all of this could be done very cheaply.  Seemed like a sure winner to me.
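
A quick simulation makes the point. Each hypothetical trader observes the true value plus independent noise, and the error of the average shrinks roughly as 1/sqrt(n); the truth value and noise level below are arbitrary assumptions.

    # Sketch: error cancellation through aggregation. Each hypothetical
    # trader observes the true value plus independent noise; the error of
    # the average shrinks roughly as 1/sqrt(n).

    import numpy as np

    rng = np.random.default_rng(0)
    truth = 100.0

    for n in [1, 10, 100, 1000]:
        signals = truth + rng.normal(0, 20, size=n)  # noisy private signals
        print("n=%4d  aggregate error: %6.2f" % (n, abs(signals.mean() - truth)))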

There were two major stumbling blocks in the corporate arena – anti-gambling and insider-trading laws.  These are still issues, but I won’t get into them here, because I don’t think these were the main reasons holding major corporations back from incorporating prediction markets in their forecasting processes.

Prediction markets have been available for many years, yet the number of publicized, successful implementations is really quite small.  Many have been run as short-term “pilot” projects, which rarely seem to achieve a permanent place in the corporate forecasting process.  When you consider that most of the major international consulting firms (McKinsey, et al.), leading academics/consultants (Hanson, et al.) and several prediction market software providers have been promoting them, it is really quite amazing that there are so few bona fide enterprise prediction markets.

Here are my thoughts as to why they haven’t caught on:

Failure to follow First Principles

Unless firms (and their consultants) fully understand all of the prerequisites (first principles) for proper functioning of a prediction market and make sure the implementation addresses all of these requirements, the market is more likely to fail or provide inaccurate predictions.

For example, prediction markets need a large number of participants (and diverse ones at that).  Several academics have come up with innovative methods of facilitating trades through market maker mechanisms.  These have provided market liquidity that allows prediction markets to function (i.e. facilitate trades) even with a relatively small number of participants.  It is a neat little “trick” to make the market seem larger than it is in reality.  Unfortunately, the market maker mechanism allows the “crowd” prerequisite to be violated.  In addition, a smaller crowd lessens the diversity of the participants, at least partially undermining another key prerequisite.  As a result, a smaller crowd has the distinct potential to compromise the accuracy of the predictions.

The various market maker mechanisms also introduce a market distortion, which influences trading behaviour.  More work needs to be done on this, but it is my belief that market scoring rules create highly lucrative potential trading opportunities.  Combined with a “play money” market (where there is little to lose), I believe this creates disproportionate incentives for traders to undertake very risky investment decisions.  Few companies operate with a high risk profile, which calls into question the use of predictions based on risk-seeking traders.

It is interesting to note that the various software providers promote the ease of getting started in prediction markets.  True, it is easy to set up a market using the software.  The difficult part is making it function properly.  The software is merely a tool for aggregating the traders’ opinions.

Public Nature of Forecasts

Judging by the types of enterprise prediction markets that have published results, it appears that many companies have not been focusing on serious, high value forecasting issues.  Perhaps it is the public nature of the resulting prediction that is holding them back.

For example, in many cases, management has a vested interest in creating a forecast for the “market” that may not bear much resemblance to the “true” forecast.  The (“public”) existence of the “true” forecast would undermine their promotion of the official forecast for public consumption by the markets.  A bad situation, I know, but there is more than ample evidence that this is widespread phenomenon.

Existing forecasting practices utilize senior management and consultants to determine the official forecasts.  This group of strategic planners can be trusted to keep the forecasts confidential.  Prediction market forecasts are much more widely known throughout the company.  Most often, the official forecasts are based on what management needs to show, as opposed to what they might reasonably expect.  Then, of course, the forecast (budget) is pushed down to the lower levels to do whatever is necessary to hit the numbers.  As we see (rather frequently), this often results in many seriously wrong actions taken within companies.

If management is mildly concerned about prediction market results becoming public, it is highly unlikely that they will tackle the most important forecasting issues in this manner.  Perhaps the best way to break into the market is to operate in parallel with existing forecasting methods until prediction markets prove their worth and companies figure out how to minimize the public disclosure of these forecasts.

Practical Usefulness Issues

In order for companies to incorporate prediction markets into their forecasting systems, they need to prove their usefulness.  I think it is obvious that prediction markets have the potential to be extremely useful in this regard, but it is all in the implementation.

As discussed above, software companies make it sound so easy to implement a prediction market, but that is only a small part of the process.  There are a number of major issues that make it difficult to implement effective prediction markets, and the literature has not been particularly useful in resolving them.  While many of these issues have been raised in the literature, the discussions have been very general and sorely lacking in the practical implications.  I guess that’s where the consultants come in, but it also means that a great deal of education is required in order to “sell” the concept.  This needs to change.

Advance predictions & Incentives

In order to be useful, an accurate prediction must be determined well in advance of the actual outcome.  It makes little sense to run a market where you obtain the prediction just before the actual outcome occurs.  This sounds obvious, but it is actually quite difficult to achieve, because traders want to know how their “investment” (bet) turned out, fairly quickly.  This runs counter to the corporation’s need to know the prediction in advance.  So, innovative incentives have to be designed to encourage traders to adopt patient investment strategies and be rewarded for investing in longer-term outcomes.  Not only do they need to make investment decisions well in advance, as new information becomes available, they have to be encouraged to continue trading in the market.  This provides corporations with dynamically updated predictions, which yield valuable information on trends, level of uncertainty, and may indicate the strength of various factors influencing the outcome.

Sufficient, appropriate traders

As discussed above, companies need to have a sufficient number of traders for each market, to ensure that the “crowd” prerequisite is met.  Where necessary, these traders will need to be trained in trading on prediction markets, and the incentive systems need to be explained (and preferably tested), to ensure appropriate trading behaviour is encouraged.

Focus on Valuable Variables

Management needs to determine those conditions, events and actions (variables) that are most valuable to predict, and they must know what to do with the resulting prediction when it is determined.  Again, this sounds obvious, but it isn’t something that can be determined in a few minutes (as suggested by several of the software providers).

Dynamic Analysis

One of the major benefits of prediction markets over other forecasting methods is that they provide a built-in mechanism for continuously updating their predictions.  Assuming the appropriate incentives are in place to promote continuous trading, movements in the prediction over time provide valuable information to management.  Similarly, the distribution of “investments” in the prediction options provides a measure of uncertainty in the outcome, and changes in the distribution will indicate changes in uncertainty, providing management with an early warning system for evaluating forecasting issues.

Bottom Line

I do think that enterprise prediction markets will eventually reach a tipping point, but a lot of work needs to be done.  The academic literature is good, but it is becoming too technical and theoretical.  This has to scare the corporate types.  The focus needs to be on practical implementation issues.  It needs to get away from sweeping generalizations with respect to implementing prediction markets.  Consultants need to step up and focus on rigorous implementation planning that never forgets the first principles that make prediction markets work.  Then, we can be useful in helping forward-thinking executives run their companies better.

Comments?


My response:

Hi Jed…

I agree with your summary. In addition, I think prediction markets offer several additional benefits, including: faster predictions, continuous updating, a measure of uncertainty surrounding the prediction, and reasonable cost. I’m sure there are a few more, but for now, this should do.

I am strongly in favour of using prediction markets to complement existing forecasting methods, especially where they are able to quantify uncertainty. Your example of project milestones is another excellent use for prediction markets.

Although your research indicates that as few as 15 participants can achieve calibrated results, I am skeptical. All of the market maker mechanisms will ensure that there is market liquidity, but I believe they influence trading behaviour (generally making traders quite risk-seeking), which undermines the accuracy of the predictions. Such market maker mechanisms were designed to allow markets to operate with smaller numbers of participants, but doesn’t this degrade the “crowd” precondition for successful market predictions? You aren’t likely to have a very diverse group with a small number of participants.

The basic theory behind prediction markets is that each trader has a piece of information combined with an error factor, and the aggregation method adds the pieces of information together, with the error factors cancelling out (more or less), resulting in more, and more accurate, information. Having a smaller number of traders not only means having fewer pieces of information to aggregate, but also that the error factors will not “cancel out” properly.

As I see it, one of the major stumbling blocks is getting enough people to be interested in each market (and staying interested). The greatest benefit of prediction markets comes from forecasting the outcome as far in advance as possible, but this runs counter to the traders’ need to know the outcome “immediately” (or at least soon). You can see this in most marketplaces, where very few people trade in long-term markets. Most trade in the markets that will close in the next few hours, days or maybe a week. These very short term market predictions have little value to a decision-maker.

If prediction markets are to gain more widespread acceptance in the business community, they will need to focus on ways to ensure their predictions are useful. That is, by providing predictions well in advance of the outcome, accurately (diverse crowd, properly motivated) and with a very reasonable cost.

Much more work needs to be done in the area of motivating traders.

Just my thoughts for now.

Posted by: Paul Hewitt | March 14, 2009

Measuring Market Entropy in Prediction Markets

I came across this blog entry on Inkling Markets’ new support site:

Measuring Market Entropy by James Hilden-Minton, Ph.D.

He proposes using an entropy metric to measure market uncertainty in prediction markets.  While it is an intriguing idea to come up with a measure of uncertainty in these markets, I’m not sure whether this is the one that will do the trick.

Here is my response.

I, too, would like to see a metric to track the uncertainty surrounding market predictions.  The concept of entropy has some appeal, but I’m not sure how it might be applied in a prediction market marketplace.

Theoretically, market entropy will start out high and approach 0 as information is incorporated into the market, but very few markets actually achieve a 100% likely outcome before trading is suspended.  So, there will always be a positive entropy metric for a market.  Even where the market is almost positively certain of an outcome, there will be a fairly high entropy metric (relative to the range of entropies between the minimum and maximum values).  How do we determine how much entropy is too much?

If you want to compare entropy across a variety of markets, you would need to standardize the metric.  I imagine this might require using logarithms with the base equal to the number of possible outcomes (binary market = base 2, American Idol = base 36?!). This will ensure that the maximum entropy possible in every market is 1.000.  Then, we might compare entropies between markets.  But this, too, has problems. 

Consider two markets, one binary, the other has three outcomes.  If the binary market uses log2 and the odds are even, entropy would be 1.000.  If the 3-outcome market uses log3, with even odds, the entropy would also be 1.000.  We should be able to compare the relative entropies of these two markets.  Now, after some trading, the traders sell off all of one of the three outcomes, leaving only 2 outcomes.  Now, the market is, essentially, a binary one.  If the remaining odds are 50% for each outcome in both markets, the measure of uncertainty should be the same, but they aren’t.  The 3-outcome market now has an entropy of 0.631 vs. 1.000 in the binary market.  Part of the problem is that the initial odds are not equal for each possible outcome (and they rarely will be).  You can carry this analysis further by making the odds for the two outcomes the same, in each market, and comparing the resulting entropy metrics.  For example, at 10% / 90% the entropies are:  0.469 (binary market) and 0.296 (3-outcome market).  The level of uncertainty is virtually identical, but the entropies are quite far apart.
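
Here is a minimal sketch of the standardized entropy calculation, with the logarithm base set to the number of listed outcomes; it reproduces the figures above.

    # Sketch: entropy with the logarithm base set to the number of listed
    # outcomes, so the maximum possible entropy is 1.000 in every market.
    # This reproduces the figures discussed above.

    import math

    def entropy(probs):
        n = len(probs)  # base = number of listed outcomes, including dead ones
        return -sum(p * math.log(p, n) for p in probs if p > 0)

    print(entropy([0.5, 0.5]))       # binary, even odds        -> 1.000
    print(entropy([0.5, 0.5, 0.0]))  # 3 outcomes, one sold off -> 0.631
    print(entropy([0.1, 0.9]))       # binary, 10/90            -> 0.469
    print(entropy([0.1, 0.9, 0.0]))  # 3 outcomes, 10/90        -> 0.296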

Perhaps the answer is to track entropy from the initial entropy metric when the market opens with management’s best estimates of the initial probability distribution.  As the market moves the distribution, the entropy would decrease (hopefully).  This may provide a measure of decreasing uncertainty.  It may be able to show that the market is helping to decrease the measurement of uncertainty, relative to management’s best estimates.

There is also a problem with using entropy to determine when the market has incorporated all information (an equilibrium?).  For example, in a binary market, entropy will be at its maximum when the odds are 50:50.  A slight change in odds will produce only a very slight reduction in entropy.  The problem is that this is almost the most uncertain condition, yet the entropy would be “flat”.  The entropy will change the greatest amount as one of the outcomes becomes most highly favoured by the market (and there is less uncertainty surrounding the outcome).  Think of the US election, where there was only a small difference between the Democrat and Republican votes, yet it was a landslide.  The entropy would have been quite high, yet there was a significant “certainty” to the prediction.

It is, however, an intriguing concept that should be explored further.  My preference would be for a statistic that measures the dispersion of the probability distribution of outcomes, which could be tracked and compared between markets.

Just a few thoughts from a non-mathematician.


Posted by: Paul Hewitt | March 12, 2009

Fallacy of Economic Forecasts

Economists are all too well-known for their wild forecasts of future economic conditions.  Just today, I reviewed The Economist’s poll of economic forecasts for many of the world’s economies.  Projections of the percentage change in real GNP ranged from minus 1% to minus 4% (except for Japan at minus 7.6%).  At least the poll is showing negative growth for 2009 and 2010!  A short while ago, these projections were showing modest gains!

Anecdotally, at least in Canada, we are seeing most of the population cutting back on their spending by substantial amounts (not 1 – 4%).  The cut backs are not confined to major expenditures, but include the smaller items in a household budget, like dinners out, entertainment, vacations, etc…  Among those people that I know (rich, poor and in-between), everyone has been cutting back their spending by a minimum of 10%.  I live in an area that used to receive 5 – 10 ad mail pieces per day.  Now, it is down to one or two.  Small businesses are putting a hold on their advertising to local residents.  Could it be that consumers are deferring their purchases no matter how good the product or price?  For the most part, I think this is the case.  I think it is safe to say that the true drop in economic activity is far in excess of 4%.  But this reduction is not reflected in the economic forecasts.  Why is that?

Well, so far, we have not seen the full effects of the downturn in consumption, as businesses are simply depleting their inventories.  The replacement orders will not be coming as quickly or to the same extent as they were a year ago.  When this starts being felt by the manufacturers, watch for another round of layoffs and terminations.  Watch for prices to fall, as businesses try everything to generate cashflow.

Economists use forecasting models to predict the future.  These models have numerous assumptions built into them, based on historical relationships and trends.  We are in a completely new situation in today’s economy.  The old assumptions simply do not apply the way they used to (not that economists’ forecasts were particularly accurate before)!

Take a look at the stock markets.  Their indices have dropped substantially over the last year.  In very simplistic terms, theoretically, stock prices are based on discounted future cash flows and risk.  Certainly, the economy has become riskier, and firms’ prospects of generating cash flows have decreased.  The “market” appears to have estimated that cash flows will be decreasing by as much as 40%.  This, too, doesn’t jibe with the economists’ projections.

Perhaps a more likely reason for “rosy” economic forecasts is that they don’t want to be too pessimistic and they do like to stay within their comfort zone in terms of past predictions and the predictions of other economists!  If they were too pessimistic, perhaps consumers would become collectively depressed, adding fuel to the fire.

My advice: keep your eyes and ears open to gauge for yourself where the economy is headed.  Until you start to see real people becoming optimistic and starting to spend on discretionary things, we will continue to be in a severe recession.

Clearly, to be successful, an EPM must be more accurate than other means of forecasting, given the cost of setting up and running the market (i.e. benefits > costs). Also, it has to provide predictions for conditions, events or actions, sufficiently in advance, such that the corporation may take action to mitigate losses or take advantage of expected opportunities. The corporation must be able to change course, if the prediction market indicates that this would be wise. If the enterprise is unable to act, even with better information, the value of the prediction is minimal.

A major determinant of an EPM’s success should be its ability to measure uncertainty regarding predictions of future conditions, events and actions. This would allow corporations to focus their contingency plans (including hedging, insurance, etc…) on the areas most likely to require them (and avoid wasting resources on unlikely future situations).

Of course, the prediction markets must operate effectively, meaning they must have a “crowd”, that has diversity, independence and is decentralized. Ideally, successful markets should not exhibit risk-seeking behaviour by the participants, as most prediction markets are employed to reduce decision-making risk. I fear that most prediction markets employing market scoring rules provide substantial incentives for participants to become risk-seeking. As a “work around” for a failure to attract a “crowd”, MSRs are good for creating liquidity, but not so good for obtaining accurate predictions.

Just a few comments for now.

Posted by: Paul Hewitt | March 9, 2009

My Current Issues List

Here is a list of prediction market issues that I am currently researching:

  1. How many traders are required to reach a reasonably accurate prediction?
  2. How much research/information search, on the part of traders, is sufficient?  Desirable?
  3. How much information may be supplied to traders, without introducing a bias that would degrade the accuracy of predictions?
  4. How does the price algorithm affect trading behaviour?  View my initial comments
  5. What is the best way to allow short selling?  Is it necessary?
  6. What is the best way to formulate questions to be predicted?
  7. How long does it take for a prediction market to reach an equilibrium?  How do we know?  What does it depend upon?
  8. How do you minimize herding, cascading and trader dependence?
  9. How do you minimize trader guessing (as opposed to informed trading)?
  10. What role does a trader’s risk profile play in arriving at the outcome prediction?
  11. Is it important that the collective trading risk profile match that of the real decision-maker?
  12. What is the best way to measure uncertainty in two outcome markets?  Comments on Entropy Idea
  13. Which types of questions are least well suited for prediction markets?
  14. Why aren’t more corporations looking to prediction markets to measure uncertainty of their predictions of future events, conditions and actions?

Over the next few weeks, months(!), (depending on how busy I am), I hope to resolve many of these issues.  After that, golf season is here and the answers may have to wait until the Fall!

Any help getting me to the course sooner will be much appreciated!

Posted by: Paul Hewitt | March 9, 2009

Market Scoring Rules in Prediction Markets

Robin Hanson has developed a class of market scoring rule market makers that are, perhaps, the most prominently used in public prediction markets.  In particular, it is his Logarithmic Market Scoring Rule (LMSR) that is most widely used.  Having played around with many public prediction markets that utilize the LMSR, it appears to be an excellent mechanism for ensuring adequate trading takes place in thin markets, where there are only two outcomes.   Where there are only two outcomes (e.g. which of two sports teams will win the big game?), the odds/prices will tend to be within a relatively narrow range around 50%, giving rise to reasonable returns for a reasonable risk.  However, in markets where there are more than two outcomes, I believe the LMSR distorts trading behavior, by creating substantial incentives for traders to become risk seekers.

For example, on Hubdub, there are markets to predict the DJIA close, with the possible outcomes expressed in ranges.  The more ranges offered, the lower are the prices (odds) in each range, because the bets are spread out among more options.  The lower the odds, the higher the potential payoff.  Often, it is advantageous to place bets in several of the likely ranges.  You may lose on one or two bets, but you will end up with a tidy profit on the one that does “win”.  Given that these markets run daily, it is a relatively easy way to increase your “wealth”, quickly.  Part of the reason that this strategy works is that you can always sell to limit your losses as the market approaches its close.  The LMSR guarantees that you will be able to sell out of unwanted positions.  I believe the LMSR creates an incentive for traders to become risk seeking, and I don’t think this is a good thing for prediction markets.
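
For reference, here is a minimal sketch of the LMSR mechanics: the market maker quotes prices from a cost function, and a trader pays the change in that cost. The liquidity parameter b and the four-range framing are illustrative assumptions.

    # Sketch: Hanson's logarithmic market scoring rule (LMSR). The market
    # maker's cost function is C(q) = b * ln(sum_i exp(q_i / b)); a trade
    # moving the outstanding shares from q to q' costs C(q') - C(q), and the
    # implied price of outcome i is exp(q_i / b) / sum_j exp(q_j / b). The
    # liquidity parameter b and the four-range framing are illustrative.

    import math

    def cost(q, b=100.0):
        return b * math.log(sum(math.exp(qi / b) for qi in q))

    def prices(q, b=100.0):
        z = sum(math.exp(qi / b) for qi in q)
        return [math.exp(qi / b) / z for qi in q]

    q = [0.0, 0.0, 0.0, 0.0]          # four DJIA ranges, no trades yet
    print(prices(q))                  # even odds: 0.25 each

    q = [40.0, 0.0, 0.0, 0.0]         # buy 40 shares of the first range
    print("cost of trade:", cost(q) - cost([0.0] * 4))
    print(prices(q))                  # first range now priced above 0.25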

Ideally, in the corporate world, prediction markets should be able to more accurately determine the level of uncertainty about future events, conditions and actions.  Corporations tend to be risk averse, or at best, risk-neutral.  In the current economic environment, the last thing you want is a corporation that is risk-seeking.  So, why would you want to base corporate decisions on a prediction market that rewards risky behavior?

I find it interesting that David Pennock’s Dynamic Parimutuel Market Maker (DPM) mechanism has not been put to wider use in prediction markets.  Under this system (similar to horse race betting), market liquidity is automatically created by allowing all traders to purchase any share (horse) at any time.  Under a straight parimutuel system, it pays to wait until just before the market closes, because there is no way to sell a “bad” bet before the start of the race.  Pennock’s DPM adds a mechanism that allows traders to sell their previous investments, via a Continuous Double Auction (CDA).  If there is sufficient liquidity (i.e. a large number of traders), bad investments may be sold to other traders before the market closes.  However, there is no guarantee that a trader would be able to do so, except at a heavily discounted price (which would take into account the current risks).  This is a good thing!  If purchasers were unsure that they would be able to reverse their bad investments, they would exercise more caution before purchasing.  They would tend to act in a risk averse or risk neutral manner.  This is the type of trader behaviour that we would like to see in an internal prediction market.

So, I ask, why hasn’t David Pennock’s DPM market maker achieved more widespread use?

Posted by: Paul Hewitt | March 9, 2009

Comment on Mercury’s Blog about the Economist Article

I agree, public markets are very different from internal ones.  Most of the public markets suffer from having too small a “crowd”.  Though several cite thousands of users, few of the individual markets have more than a few actual traders.  As a result, they have to use an MSR or some other form of automated market maker in order to ensure enough liquidity for the market to function at all.  In my opinion, this is simply a workaround for the problem of not having enough traders (let alone their having sufficient diversity).  While an MSR does allow for trading among a small number of traders, the fact that it automatically allows traders to sell out of positions affects the traders’ risk preferences, skewing their “investment” behaviour.  Internal markets should (and must) avoid this problem, in order to be successful.

The software vendors make it sound very easy to set up internal prediction markets.  We should keep in mind that the software is merely a tool for running the prediction market, which is itself a tool for corporate decision making.  The software side is relatively easy (apart from the choice of mechanism).  The hard part is satisfying the necessary conditions (crowd, diversity, independence) for accurate predictions.

Generally, public markets stay open until just before the outcome is revealed, so even when their final prices are accurate, they provide very little useful predictive value.  Internal markets must reveal their predictions much earlier if they are to be useful and become more widely used.

Finally, I believe the corporate world has missed the opportunity (so far, at least) to use prediction markets to measure the uncertainty surrounding predictions of key events and conditions.  There is very little literature on using prediction markets to measure uncertainty, yet I believe this would be one of their most valuable applications in the corporate world.

Posted by: Paul Hewitt | March 8, 2009

Hello world!

Welcome to the Toronto Prediction Market Blog.  My primary focus is the practical use of prediction markets to solve corporate forecasting and prediction problems.  Personally, I am fascinated by the potential of prediction markets to improve corporate information and decision making, and somewhat dismayed by their current low level of use in the corporate world.  Through open discussion, I hope to help promote the beneficial use of prediction markets in the business world.  From time to time, I may comment on various aspects of public prediction markets, especially where they shed light on the process of predicting events using markets.

I am open to all comments and discussions.  I am particularly interested in hearing about any new applications or pilots of prediction markets being used by corporations anywhere in the world.

Enjoy!

Here is my comment regarding the article on the uncertain future of prediction markets that appeared recently in The Economist:

After having reviewed much of the available literature on the theories supporting prediction markets and the few publicized cases of corporate pilot projects, I still believe that prediction markets can play a very valuable role in business decision-making. In addition to the reasons cited in this article for prediction markets not catching on in the corporate world, here are a few more.

In theory, for prediction markets to work properly, there must be a fairly large number of diverse traders (who trade on their own private information); they must remain as independent as possible in their decision-making; they must be motivated to reveal their true opinions through their trading activities; and there must be a mechanism for aggregating their trades.

In practice, the prerequisites for successful predictions have been relaxed. It is difficult to find (and keep interested) a large enough number of participants to keep markets trading long enough to reach an “equilibrium”. Some markets are designed with an automated market maker, to ensure that all bids and asks can be fulfilled even if there is no willing seller or buyer. Combined with incentives, this skews market behaviour, encourages “herding”, and ultimately distorts the very price mechanism the market is supposed to provide. Even where there is a large number of total participants, if there are too many markets available in which to invest, some individual markets will have insufficient trading activity. Other markets will suffer from a trade-and-forget style of investor behaviour, resulting in illiquid markets shortly after opening. A quick review of many online (public) markets reveals a surprising number with “thin” trading, exhibiting large spreads between the bid and ask prices (see the sketch below). Such markets may never reach their equilibrium prices, and the inability to make trades quickly saps participants’ interest.
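As a rough illustration of that last point (the market names, quotes, and threshold are all invented), a simple screen for “thin” markets might flag any contract whose bid-ask spread is wide relative to its midpoint price:

```python
# Hypothetical order-book snapshots: (market name, best bid, best ask).
markets = [
    ("Outcome A", 0.40, 0.44),
    ("Outcome B", 0.10, 0.35),   # thinly traded: huge spread
    ("Outcome C", 0.22, 0.24),
]

def relative_spread(bid, ask):
    """Bid-ask spread as a fraction of the midpoint price."""
    mid = (bid + ask) / 2
    return (ask - bid) / mid

THIN_THRESHOLD = 0.10  # flag spreads wider than 10% of the midpoint

for name, bid, ask in markets:
    spread = relative_spread(bid, ask)
    flag = "THIN" if spread > THIN_THRESHOLD else "ok"
    print(f"{name}: spread {spread:.1%} [{flag}]")
```

A market quoted at 0.10 bid / 0.35 ask is not really producing a price at all; it is advertising that nobody is trading.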

To be useful in the corporate world, prediction markets must provide valuable predictions of future events, actions or conditions. It is of little use to know the likelihood of an outcome immediately before it occurs. Companies need to know the likelihood of various events at the earliest possible moment, so that contingency plans may be activated in time. The difficulty is in motivating participants to trade in an outcome that will not be revealed for a considerable period of time.

Undoubtedly, as the corporate world experiments further with these tools, valuable prediction markets will be found and exploited. One promising avenue that might be followed is the derivation of probability distributions surrounding the predictions. Management could obtain a clearer picture of the uncertainty surrounding each key forecast metric or event. More contingency planning could be devoted to those areas that exhibit the most uncertainty.
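To make that avenue concrete, here is a minimal sketch (all ranges and prices invented) of how the prices from a range-style market could be turned into a probability distribution over a forecast metric, with the standard deviation serving as a simple uncertainty measure:

```python
import math

# Hypothetical range market over next quarter's unit sales.
# Each entry: (range midpoint, last traded price).
ranges = [
    (9000, 0.05),
    (10000, 0.20),
    (11000, 0.40),
    (12000, 0.25),
    (13000, 0.10),
]

# Normalize prices so they behave as a probability distribution
# (thin markets rarely sum to exactly 1.00).
total = sum(p for _, p in ranges)
dist = [(x, p / total) for x, p in ranges]

mean = sum(x * p for x, p in dist)
variance = sum(p * (x - mean) ** 2 for x, p in dist)
std_dev = math.sqrt(variance)

print(f"forecast mean: {mean:.0f} units")
print(f"uncertainty (std dev): {std_dev:.0f} units")
# A wider std dev flags the forecasts most in need of contingency planning.
```

Run across all of a company’s key forecast metrics, a report like this would tell management not just what the markets expect, but which expectations are the shakiest.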
