Posted by: Paul Hewitt | January 5, 2012

Good Judgment Project Performance

The Good Judgment Team is competing against other teams to see which one is able to “more accurately” predict future events (mainly political, so far).  After the first month of official predictions, the Good Judgment Team released the following statement by email (bold/italics are mine):

“Our forecasters are simply the best!  (That’s not just our opinion:  in the early days of the tournament, the Good Judgment Team’s aggregate forecasts have proven to be more accurate than those of any other research team participating in the IARPA tournament.)”

This got me to thinking.  How is the IARPA determining which Team is more accurate in their predictions?  I’ve posed the question to my team, but haven’t received a response, yet.  So, let’s make a few educated guesses.

Each Team has a large number of participants.  On our Team, there are a number of groups, presumably with some common characteristics, that are each predicting future events.  We took a variety of tests before joining the team, to measure or describe how we make decisions, process information, etc…

Almost all of the questions about future events are binary.  They will either happen or not, by a specific date.  Our Team uses a modified prediction market to generate a likelihood of each event occurring (more information here).  Now, this is where it gets interesting.  I’m guessing that most, if not all, of the Teams predicted the correct outcomes for most of the questions.  If our Team got one or two more correct than the other teams, does that really mean that we are “simply the best”?

Could it be that our collective likelihoods of the events that occurred were higher than those for the other Teams?  In other words, when an event did happen, our Team gave the event a higher likelihood of occurring.  Jeez, I hope not, for a number of reasons.  Remember, these are binary events.  Just because a likelihood is higher doesn’t mean that it is more correct than a lower likelihood prediction!  What we really want to compare is the calibration of the market predictions with market outcomes.  Unfortunately, there isn’t enough data, yet, to determine whether our predictions are better calibrated than any other Team’s.
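To make the distinction concrete, here is a minimal Python sketch.  It is not the tournament's scoring method, which is not known here.  The function names, the bucket count and the sample forecasts are all hypothetical.  The first function compares the average forecast in each probability bucket with how often those events actually happened (calibration).  The second computes the Brier score, a common scoring rule, included only as an assumed example of how a single "accuracy" number might be produced.

```python
from collections import defaultdict


def calibration_table(forecasts, outcomes, n_buckets=10):
    """Return (mean forecast, observed frequency, count) for each probability bucket."""
    buckets = defaultdict(list)
    for p, happened in zip(forecasts, outcomes):
        # Forecasts of exactly 1.0 go into the top bucket.
        index = min(int(p * n_buckets), n_buckets - 1)
        buckets[index].append((p, happened))
    table = []
    for index in sorted(buckets):
        pairs = buckets[index]
        mean_forecast = sum(p for p, _ in pairs) / len(pairs)
        hit_rate = sum(h for _, h in pairs) / len(pairs)
        table.append((mean_forecast, hit_rate, len(pairs)))
    return table


def brier_score(forecasts, outcomes):
    """Mean squared gap between forecast and outcome (0 is perfect, lower is better)."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical data: ten binary questions, seven of which came true.
outcomes = [1] * 7 + [0] * 3
team_a = [0.7] * 10  # says 70% every time
team_b = [0.9] * 10  # says 90% every time

for name, forecasts in (("Team A", team_a), ("Team B", team_b)):
    print(name, calibration_table(forecasts, outcomes), brier_score(forecasts, outcomes))
```

In this made-up example, Team B gave a higher likelihood to every event that occurred, yet Team A is the better calibrated of the two (70% forecasts, 70% hit rate).  With only ten questions, though, neither table would prove much, which is the data problem noted above.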

These markets are kept open for trading until a day or so before the outcome is revealed, unless the outcome is determined prior to the anticipated closing.  Uncertainty surrounding the outcome decreases as time marches toward the market closing.  Consequently, at the market close, all that should remain is the irreducible uncertainty (random events that affect the outcome).  Accordingly, most markets should converge on a likelihood close to 100% for one of the binary outcomes, and there shouldn’t be very much variability among the Teams.

Could it be that accuracy is being determined at various points in time prior to the market close?  It’s a better basis, but again, we can’t prove calibration.  So, this isn’t likely the answer.  Maybe it’s the speed of adjusting predictions, given new information?  I doubt this one, too.  In some cases, information will lead one forecaster to conclude the event is more likely and another to conclude the opposite.  It would be impossible to determine whether the market was incorporating new information in every case.

Maybe our Team won more money.  Nope.  Basically, with an Automated Market Maker, except for the seed capital, it’s a zero sum game.  All teams would do equally well, with the same system.

Conclusion

Let’s forget for a minute that these predictions are pretty useless, if they’re only “accurate” immediately before the outcome being revealed.  How many times have I spouted on about this issue?  Also, they’re predicting binary events.  There’s no such thing as being almost right in a binary market.  So, even though it isn’t theoretically correct, I’m going to guess that the IARPA thinks a higher likelihood prediction is more accurate than a lower likelihood one, when the event does, in fact, come true.  Maybe that’s the best they can do, until they figure out the calibration issue.

Posted by: Paul Hewitt | November 5, 2011

The Good Judgment Project

I have been participating in The Good Judgment Project, one of five teams in a U.S. government-sponsored, four-year forecasting tournament.  Each team develops its own methods for forecasting world events.  Our team is based at the University of Pennsylvania and the University of California, Berkeley.  I gather each team will be using some form of collective intelligence to make predictions.

This may change, but our present aggregation mechanism is an odd variant of a prediction market with an automated market maker.  Let me explain.  During the first two months, just about every question has been binary (either it will happen or it won’t).  Apparently, there may be some questions that have up to five derivative shares in a winner-take-all market.  All markets involve an automated market maker.

Participants can place trades (up to $1,000) in any market, for the event to happen or not, by a given date.  As trades are filled, the market price changes.  So far, so good.  The twist is that trades can be rescinded at any time up until the market closes or the event becomes known.  When you rescind a trade, you get back all of the money that was originally invested.  Huh?  That’s right, there’s almost no risk of selecting the wrong outcome!  But, part of what makes markets “accurate” is that there is a consequence for being wrong.  Not so here.  In a traditional prediction market, selling out of a position would net you the current market price (not your original purchase price).
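The difference between the two exit rules can be sketched in a few lines of Python.  This is a simplified illustration, not the project's actual market maker: it assumes trades fill at a single price with no price impact, and the function names and the 0.60 and 0.30 prices are hypothetical.

```python
def traditional_exit(stake, entry_price, exit_price):
    """Proceeds from selling out at the current price (simplified, no price impact)."""
    shares = stake / entry_price
    return shares * exit_price


def rescind_exit(stake):
    """Proceeds under the rescind rule described above: the original stake comes back."""
    return stake


stake = 1000.0      # the maximum trade size mentioned in the post
entry_price = 0.60  # hypothetical price when the trade was placed
exit_price = 0.30   # hypothetical price after bad news arrives

loss_traditional = stake - traditional_exit(stake, entry_price, exit_price)
loss_rescind = stake - rescind_exit(stake)

print(f"Traditional market loss: ${loss_traditional:,.2f}")
print(f"Rescind-rule loss:       ${loss_rescind:,.2f}")
```

Under these assumptions, the trader who bailed out of a traditional market would give up half the stake, while the trader who rescinds gives up nothing.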

At least you can’t take positions in both sides of a binary market!  The market mechanism encourages you to bet the maximum, usually at the beginning of the market.  This will allow you to double your investment (if you are correct).  In some cases, the likelihood will fall and you can generate a higher profit by investing at that point.  Usually, you will want to maximize your bet when you first enter the market, because if you try to revise your bet later, you will receive the new payoff on your entire investment (if correct).

If the odds for the outcome you selected start to fall, but you still wish to hold that investment, you need to continually revise your investment, to obtain the most favorable odds.

The other quirk is that the maximum bet is $1,000 (previously $500).  That’s a minor point, but it does potentially hinder someone with “perfect” information from placing a bet that would move the market to the appropriate likelihood.  Recall that part of the rationale for prediction markets is that it helps identify the best forecasters (they have the most funds).  When you combine this with the failure to penalize poor guesses (by allowing traders to rescind investments without penalty), I’m wondering whether this particular prediction market mechanism will be as accurate as it might otherwise be.

Posted by: Paul Hewitt | August 14, 2011

In Search of a Better Prediction Model

Among other things, Robin Hanson is famous for advocating the use of prediction markets, where their predictions are “more accurate” than other methods of forecasting.  I won’t argue with that, as long as the benefits of being more accurate exceed the marginal costs.  However, if you’ve been keeping up with my blog, you should come away with the thought that I’m not quite as high on the prediction market fumes as some of the other adherents.  I find prediction markets to be wanting in many significant areas.

The Search for Something Better

A few years ago, this got me to thinking.  If prediction markets might be better than alternative prediction methods, could there be an even better model?  And so, I scoured the literature in search of just such a model.  I thought I had found one a couple of years ago, and set out to prove the case for its replacement of prediction markets.

In making my assessment of “better”, in terms of predictions, I considered the calibration of the predictions with the actual outcomes and how far in advance the calibration was reasonably accurate.  I chose to consider the latter characteristic, because prediction markets are notoriously poor at being able to predict anything but very short-term outcomes.

I am pleased to report that my alternative prediction model appears to be better than prediction markets in most respects!  My model was able to match the calibration of prediction markets in every case, but the real benefit was how far in advance my model was able to predict the outcome, with equal or better calibration than prediction markets!  In all cases, my model was very well-calibrated with the outcomes a full two years prior to the outcome being revealed!   To my knowledge, no prediction market has ever been well-calibrated two years prior to the outcome.

Not only that, but my model was able to achieve this level of accuracy for the most difficult to predict outcomes.  Unfortunately, however, my model was not able to forecast so-called “easier to predict” outcomes with the same level of accuracy.

A Model Prediction Model

Coin toss

I’m sure I have kept you in suspense long enough.  My model involves a hand, a wrist and a coin.  Who knew that a simple coin toss might be as good, or better, a predictor of future events than a prediction market?  Very difficult-to-predict binary events have a likelihood near 50%.  If a prediction market for such an event indicates a 50.1% likelihood of occurrence, the decision-maker would predict that the event was going to occur, and he’d be right about 50% of the time.  Same thing with the coin toss, but we can toss the coin two years before the event and get an equally well-calibrated prediction.  For these really-hard-to-predict events, prediction markets, typically, fluctuate all over the map before settling on the safer 50% likelihood.

Earlier, I noted that the model does not work as well with easier-to-predict events, like for example, an event with a likelihood of 75%.  Rest assured, I’m experimenting with a new version of the model which involves bending the coin with a hammer before the toss.  I’ll let you know how that turns out.
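For anyone who wants the arithmetic behind the joke, here is a small Python sketch.  It computes the expected hit rate of a yes/no call, which is a simpler thing than calibration.  The function name is hypothetical, and the "bent coin" is assumed to land on "yes" with a chosen probability.

```python
def expected_hit_rate(event_probability, call_yes_probability):
    """Chance that a yes/no call is right.

    event_probability:    true chance that the event occurs
    call_yes_probability: chance that the forecaster calls "yes"
                          (1.0 = always yes, 0.5 = fair coin)
    """
    p, q = event_probability, call_yes_probability
    return q * p + (1 - q) * (1 - p)


# A decision-maker who always calls "yes" because the market says 50.1%.
print(expected_hit_rate(0.501, 1.0))

# A fair coin is right half the time, whatever the true likelihood.
print(expected_hit_rate(0.501, 0.5))
print(expected_hit_rate(0.75, 0.5))

# An easier event at 75%: always calling "yes" versus a coin bent to 75% "yes".
print(expected_hit_rate(0.75, 1.0))
print(expected_hit_rate(0.75, 0.75))
```

The coin keeps pace with the market only when the event really is close to 50/50.  At 75%, even a coin bent to land "yes" three times out of four is right less often than simply calling "yes" every time.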

One problem with the new model is that it only works on binary events.  However, I’m working on an even better one that will work on a group of mutually exclusive and exhaustive events (winner-take-all).  It involves darts and a dartboard.

Back to the Drawing Board

Obviously, this was intended to be a humorous post, poking a bit of fun at prediction markets and calibration.  This is the lead-in to a series of upcoming posts, in which I hope to tie together the concepts of uncertainty, price distributions, calibration, accuracy, prediction market design, and market mechanisms.  None of these issues has been adequately researched by the major players in the prediction market arena, and it is one of the major reasons why prediction markets continue to flounder.  I hate to think that it is a fear of uncovering evidence that is not supportive of the use of prediction markets that holds back the researchers.

Posted by: Paul Hewitt | August 10, 2011

The Forgotten Principle Remembered

I suppose I should be flattered when another author makes reference to, and adopts, a concept that I developed.  But surely, half the fun comes from the formal citation showing where the brilliant idea was found!  Alas, such was not the case, when I read the recent Forrester Research Inc. report:  How Prediction Markets Help Forecast Consumers’ Behaviors, by Roxana Strohmenger.

In discussing the principles that help ensure prediction markets provide accurate predictions, the author makes reference to “information completeness”, in the following passage:

At the end of the day, a prediction market must have sufficient “information completeness” even if the individuals interacting in the market do not, to accurately predict outcomes with a reasonable degree of certainty.

Here is the passage where I introduced the concept of “information completeness”:

Prediction markets must have sufficient information completeness to accurately predict outcomes with a reasonable degree of certainty.

I added the bold italic parts to show the exact same words in each paper.  I’m still flattered, just a bit miffed.

Galton’s Ox Revisited

One other interesting point in the paper concerned a reference to a recent test in the Netherlands that tried to replicate Galton’s ox experiment (James Surowiecki, The Wisdom Of Crowds).  Using 1,400 guessers (oops again, I mean participants), the average estimate of a cow’s weight was 552 kg, but the actual weight was 740 kg.  The guessers were off by a full 25%!  How could this happen?

The average guess of Francis Galton’s townspeople was remarkably accurate (1,197 lbs vs. 1,198 lbs).  Clearly, the townspeople were a bit more knowledgeable about the likely weight range of a butchered ox than the Netherlands guessers were about the weight range of a cow.  The author of the Forrester paper calls this “perspective”, which is a good word for it.

I called it having a minimal level of information about the subject in order to make a prediction.  If you think about the problem logically, when the townsfolk made their estimates, there was a fairly narrow range of possible weights from which to choose.  We would expect a normal distribution of guesses that would centre around the true weight, given reasonably small estimation errors (which cancel).

The cow guessers didn’t have a narrow range of possible weights (they actually guessed between 108 and 4,500 kg)!  The errors would have been much more significant, on average, and much less likely to cancel out when aggregated.
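The size of the two misses is easy to check.  The Python sketch below uses only the figures quoted above; the function name is hypothetical, and error is measured relative to the true weight.

```python
def relative_error(estimate, actual):
    """Size of the miss as a fraction of the true value."""
    return abs(estimate - actual) / actual


cow_error = relative_error(552, 740)    # Netherlands cow, in kg
ox_error = relative_error(1197, 1198)   # Galton's ox, in lbs

print(f"Cow crowd missed by {cow_error:.1%}")
print(f"Ox crowd missed by {ox_error:.2%}")
```

The cow crowd's average was a little over 25% too low, while Galton's crowd missed by less than a tenth of one percent.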

Interestingly, there must have been a few knowledgeable cow weight estimators among the 1,400.  Would a prediction market have provided a more accurate number than the simple aggregation of estimates?  That would have been an interesting follow-up experiment.

On a humorous note, this research paper is the first I’ve read on prediction markets that does NOT mention Robin Hanson.  How can this be?

Posted by: Paul Hewitt | August 9, 2011

Fallacy of Economic Estimates

Back in March, 2009, I wrote about the Fallacy of Economic Forecasts, essentially arguing that economic forecasts are bullshit (or for the faint of heart:   most likely wrong).  In an odd sort of way, the “forecast” was really a future estimate of past economic results.  Maybe I should have changed the title to the Fallacy of Economic Estimates.

Well, in this week’s The Economist, Growth figures:  Six years into a lost decade, there is ample proof of my claim.  The U.S. Bureau of Economic Analysis (BEA) has revised its growth numbers for the 4th quarter 2008.  Initially, the economy was estimated to have contracted 3.8%.  This was revised a year later to indicate a much more serious decline of 6.8%.  Now, it has revised the estimate downward further still, to 8.9%.

The inaccuracy is blamed on a piecemeal and slow collection of survey data, which gets fed into a national economic model.  Revisions to past estimates are made but once a year.

Perhaps the BEA needs a better model to estimate economic growth!  Maybe take a walk down Main Street and see how many storefronts are for lease.  Measure the length of unemployment lines.  Actually talk to real people about their spending plans.

In March, 2009, I estimated that growth would be down at least 10% compared with government estimates of -1% to -4%.  Seems that my noggin houses a better economic model than that of the Bureau of Economic Analysis.

Posted by: Paul Hewitt | February 28, 2011

The Oscars 2011 – The Good, The Bad & The Ugly

We already know, or should know, that using prediction markets to forecast who will win what, as determined by a panel, is pointless.  Remember last year’s markets?  The Olympic site markets?  Britain’s Got Talent?  It really is a fool’s pursuit to try and out-guess the people that actually make the choice!

So, knowing that, at best, the Oscar prediction markets are mildly amusing diversions, I present a few interesting observations.

When we use prediction markets to make decisions, we usually make a decision based on the most likely possible outcome in the market.  Consequently, in Oscar prediction markets, when we rely on the markets, we select the actor/movie that the market gives the highest likelihood of winning.  As I have written before, in discrete markets, you will be disappointed using prediction markets.

The Good

Prediction markets at Inkling and HSX had a few amazing successes!  Yes, once again, prediction markets have proven to be remarkably accurate predictors of slam-dunk outcomes.  We can now say, at least anecdotally, that if an Oscar prediction market gives an outcome at least a 70% chance of occurring, we can rely on the market to pick the correct outcome.

Here are the markets that predicted an outcome with a 70%+ probability of occurring:

  • The King’s Speech wins Best Movie (71.28% on hsx)
  • Colin Firth wins Best Leading Actor (89.36% on hsx)
  • Christian Bale wins Best Supporting Actor (77.92% on hsx)
  • Natalie Portman wins Best Leading Actress (81.04% on hsx)
  • Toy Story 3 wins Best Animated Feature Film (94.82% on Inkling)
  • The Social Network wins Best Film Editing (76.29% on Inkling)
  • The Wolfman wins Best Makeup (70.74% on Inkling)
  • Inception wins Best Sound Editing (76.83% on Inkling)
  • Inception wins Best Sound Mixing (77.53% on Inkling)
  • Inception wins Best Visual Effects (93.51% on Inkling)
  • The Social Network wins Best Adapted Screenplay (74.16% on Inkling)
  • The King’s Speech wins Best Original Screenplay (71.52% on Inkling)
  • The King’s Speech wins the Most Oscars (70.1% on Inkling)
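As a rough sanity check on the list above, the Python sketch below asks how many of these thirteen picks should have come true if the quoted likelihoods were well calibrated.  It makes two loud assumptions that the markets themselves do not guarantee: the quoted figures are treated as true probabilities, and the thirteen outcomes are treated as independent (the two Inception sound awards, for one, probably are not).

```python
import math

# Likelihoods quoted in the list above, in percent.
quoted_percentages = [
    71.28, 89.36, 77.92, 81.04, 94.82, 76.29, 70.74,
    76.83, 77.53, 93.51, 74.16, 71.52, 70.1,
]
probabilities = [p / 100 for p in quoted_percentages]

# Expected number of correct picks if each figure were a true probability.
expected_hits = sum(probabilities)

# Chance that every pick comes true, assuming independence.
chance_all_hit = math.prod(probabilities)

print(f"Picks: {len(probabilities)}")
print(f"Expected correct: {expected_hits:.2f}")
print(f"Chance all correct: {chance_all_hit:.1%}")
```

On these assumptions, a well-calibrated set of markets would be expected to get about 10 of the 13 right, and a clean sweep would be unusual.  Thirteen out of thirteen is a nice result, but one evening of awards is too small a sample to say whether it reflects skill, luck or underconfident prices.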

 

The Bad

There were a few “upsets”:

  • Alice in Wonderland won for Best Art Direction (18.04% on Inkling), even though The King’s Speech (favourite at 38.25%) and Inception (26.68%) were more likely to win.
  • True Grit was favoured to win for Best Cinematography (65.19%), but Inception (11.53%) did win.
  • Alice in Wonderland won for Best Costume Design (31.27%), but The King’s Speech was favoured at 46.67%.
  • Inside Job won for Best Documentary Feature (30.78%), but Exit Through The Gift Shop was favoured (51.34%).
  • Biutiful (34.94%) got beat out by In a Better World (24.98%) for Best Foreign Language Film.
  • The Lost Thing (6.95%) pulls off a major upset against The Gruffalo (42.09%) and Day & Night (36.89%) to win Best Animated Short Film.
  • God of Love (12.08%) wins the Best Short Film, beating out front runners, Wish 143 (39.34%) and Na Wewe (27.13%).

There was another possible upset.  The King’s Speech won the Oscar for Best Directing.  Was it an upset?  On the HSX, it was a bit of an upset.  The Social Network was favoured at 54.44%, but The King’s Speech won with 33.48%.  On Inkling, however, the two films each had an identical likelihood of winning, at 43.68%.

Getting Better All The Time?

In most prediction markets, we expect the forecast to get more and more accurate the closer it gets to the outcome being revealed.  In the Best Directing Oscar markets (HSX), we saw the exact opposite!  Basically, it was a two-horse race between The Social Network and The King’s Speech.  The King’s Speech had been steadily becoming less likely to win over the last three weeks of trading.  In normal markets this type of trend would require a steady diet of negative information.  Logically, we would expect sudden jumps in likelihoods, when (if) significant information comes to light about which way Academy voters are likely to vote.  I suppose it is possible for there to be a gradual revelation of information (say one voter/day discloses his vote), but it isn’t likely.  The Academy likes to keep these things secret until the show.

At any rate, the market was right, but trending wrong.  Maybe there was some information that came to light, resulting in more uncertainty about the outcome.  Then again, maybe the predictors were really just guessers, and the markets are simply aggregating “garbage information”.  Garbage in, garbage out.

While this may not have been an upset, it does bring up another important issue.  Two prediction markets were trying to predict the same thing; unfortunately, they predicted significantly different likelihoods.  There were many examples; here are but a few:

For the Best Original Screenplay, The King’s Speech had a likelihood of winning of 71.52% on HSX but only 53.99% on Inkling.  That’s a difference of almost 18 percentage points.  Seems quite high to me.  The same thing happened with the Best Adapted Screenplay, where The Social Network won.  This time Inkling predicted it with a likelihood of 88.93%, while HSX gave it a likelihood of only 74.16% (a difference of about 15 percentage points).
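The gaps quoted above are differences between two percentages, so they can be read either in percentage points or relative to the lower figure.  The short Python sketch below works out both from the numbers in the text; the function names are hypothetical.

```python
def point_gap(a_percent, b_percent):
    """Absolute gap between two likelihoods, in percentage points."""
    return abs(a_percent - b_percent)


def relative_gap(a_percent, b_percent):
    """Gap expressed as a fraction of the lower of the two likelihoods."""
    return abs(a_percent - b_percent) / min(a_percent, b_percent)


original = (71.52, 53.99)  # Best Original Screenplay, the two markets
adapted = (88.93, 74.16)   # Best Adapted Screenplay, the two markets

for label, (a, b) in (("Original", original), ("Adapted", adapted)):
    print(f"{label}: {point_gap(a, b):.2f} points, {relative_gap(a, b):.1%} relative")
```

Either way of measuring it, these are large disagreements between two markets pricing the same event.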

Suffice it to say, the prediction market “industry” must find out why this happens and how it can be corrected.  Otherwise, these types of markets should be abandoned for serious prediction purposes.  What am I saying?  These aren’t serious prediction markets!  Okay, the industry needs to get to the bottom of this issue, so these types of markets can be used as fair betting markets.

There are several possible reasons for the different likelihoods, and none of them help the case for prediction market accuracy or usefulness (for these types of markets).  I’ve discussed these issues in previous posts (too many to link to), so I won’t do so here.  If you took the time to read The Wisdom of Crowds, surely, you can spend a couple of hours reading this blog to learn the reasons.

Something Doesn’t Add Up

Inkling’s prediction markets consider each award as a separate market, with each nominee being a separate “share” within the market.  Accordingly, the sum of all of the likelihoods of the possible shares always adds up to one (1.0 or 100%).  However, on HSX, each nominee is a separate market.  All of the markets (nominees) for a particular category are aggregated to show the results the same way Inkling does, but the sum of the likelihoods did not always add up to one.  In fact, they were often significantly different.

For example (Award, sum of likelihoods),

  • Best Picture, 93%
  • Leading Actor, 109%
  • Supporting Actor, 110%
  • Leading Actress, 111%
  • Best Directing, 106%

Even though this is a phenomenon created by the structure of the markets, it still begs the question – why?  Shouldn’t the markets have been arbitraged back to a total likelihood of around 100%?  Not only did these discrepancies occur, they persisted!  While I didn’t continuously monitor these markets, I did take snapshots at various times and the sum of the nominee markets rarely added up to 100%.  If I start getting into all of the reasons why this might have happened, this would turn into a book. 
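To show why a sum away from 100% looks like free money, here is a hedged Python sketch.  It assumes an idealised market in which each nominee is a contract that pays 1 if that nominee wins and 0 otherwise, with no fees, no trade limits and enough liquidity to trade every nominee at the quoted price.  HSX does not necessarily work this way, and the function names and the three-nominee prices are hypothetical.

```python
def arbitrage_per_set(total_price):
    """Risk-free profit from trading one contract on every nominee in a category.

    total_price is the sum of all nominee prices (1.0 means 100%).
    Exactly one nominee wins, so a full set always pays out exactly 1.
    """
    if total_price > 1:
        return "sell one of every nominee", total_price - 1
    if total_price < 1:
        return "buy one of every nominee", 1 - total_price
    return "no arbitrage", 0.0


def normalise(prices):
    """Rescale prices so that they sum to 1, as in a single winner-take-all market."""
    total = sum(prices)
    return [p / total for p in prices]


# Sums quoted in the list above.
quoted_sums = {
    "Best Picture": 0.93,
    "Leading Actor": 1.09,
    "Supporting Actor": 1.10,
    "Leading Actress": 1.11,
    "Best Directing": 1.06,
}
for award, total in quoted_sums.items():
    action, profit = arbitrage_per_set(total)
    print(f"{award}: {action}, profit {profit:.2f} per set")

# Hypothetical three-nominee category whose prices sum to 111%.
print(normalise([0.60, 0.30, 0.21]))
```

If trades like these were actually possible, someone should have made them until the sums drifted back toward 100%.  That they did not suggests that limits, costs or plain inattention stood in the way.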

The Ugly

No one told us the writers had gone on strike, again!  A mere eight minutes in and we had barely cracked a smile.  When we did, it wasn’t for anything either of the hosts said, it was for the wink that Anne Hathaway directed at Colin Firth (as the King) in the opening film vignette.  Other than that, there was a lot of odd (not funny) banter between presenters and little to keep us occupied until the next Anne Hathaway appearance.  Their writers were pathetic, but her makeup person seemed to be on his or her game.  Note to the Academy:  hire Randy Newman to write next year’s script.  Either that or put Ricky Gervais on speed dial.

Final Words

For the second year in a row, my picks (from the prediction markets) were better than my wife’s.  All that’s left to be determined is my prize for this feat.

Posted by: Paul Hewitt | February 1, 2011

Disaster Hits Toronto (Few Saw it Coming)!

It has been a relatively mild winter in Toronto this year.  Even parts of the Southern U.S. have been hit harder than we have.  It’s just as well, too.  While we do know how to drive in snow, we’re a bunch of babies when it does come down after Christmas Eve.  We’re about to get hit with a snow storm that is wreaking havoc across the U.S. Midwest.  This reminded me of another snowstorm that hit Toronto.

For a bit of comic relief, I present this video news report.  It pokes fun at Torontonians, who seem to have acquired a reputation for being, well, shall we say, a bit sensitive when confronted by inconveniences (or even Acts of God for that matter).  Enjoy.  Being a prediction market blog, I should note that no prediction market could have seen this storm coming.  We were caught completely by surprise.

Posted by: Paul Hewitt | January 12, 2011

Prediction Market Prospects 2010

INTRODUCTION

Gartner Hype Cycle Social Software 2010

As we can see from the Gartner Hype Cycle Graph for Social Software, Prediction Markets are now on the downside of the dreaded “Trough of Disillusionment” (2010). Last year, it was just entering this phase, and in 2008 it was at the most-hyped “Peak of Inflated Expectations”.  The object of this paper is to examine the current status of the prediction market “industry”, discuss several troubling issues that are holding back enterprise prediction market adoption, and look at the prospects for the future.  Even if you get really sleepy reading this paper, keep going to the very end, where I will reveal a very, very long-term prediction!  Can you guess what it is about?

You’re probably already familiar with the following graph showing the Prediction Market growth trend.  It’s the one that appears in many presentations on prediction markets.  As far as I know, the graph hasn’t been updated since 2006.  It sure did look like the market was going to experience explosive growth!  Did it?


Prediction Market Growth Trend 1997-2006, Source: Newsfutures.

According to a McKinsey Global Survey of Web 2.0 adoption, enterprise prediction market “adoption” grew from less than 1% in 2007 to 8% in 2009.  This is how Consensus Point disclosed the results of the McKinsey report.  I looked at the actual McKinsey interactive graphs and found that prediction market adoption was 9% in 2008.  Does this mean that prediction market adoption had already peaked in 2008?  I thought we were just getting started!  If the survey is correct, prediction markets experienced more than an eight-fold increase in usage in the last two years.  Based on what we can see, there appears to be something wrong with the definitions of “adoptions” and “prediction markets”.  Alternatively, prediction market adoption is taking place behind closed doors or it isn’t really happening at all.

If the adoption rate is correct, why aren’t we seeing a significant spike in reported success stories?  There has been very little reporting of any prediction market results – good or bad.  I suspect the companies that have “adopted” prediction markets have done so in very limited pilot studies.  Here’s another possibility.  A quick review of several vendor websites indicates that many of the success stories involve idea pageant (or idea market) “prediction markets”.  I’m willing to bet that the companies that implemented these “markets” were included in those “adopting” prediction markets.  While this type of market does involve collective intelligence, it isn’t really a prediction market.

To start the review of the current status of prediction markets, let’s check in with Jed Christiansen, who recently posted his take on the industry.

There was nothing new in Jed Christiansen’s Prediction Market Review for 2010. His comments are correct, but he didn’t provide much commentary about the reasons for the developments over the past year.  Essentially, his summary was as follows:

  1. Real money betting sites are booming
  2. Free public prediction markets cannot survive without monetizing site traffic
  3. Software vendors are providing more consulting services to their clients

He sees the PM industry as “maturing”.  Existing vendors will continue to establish themselves, “as more companies experiment with new management tools and techniques.”  The problem with the industry is that the product is still in its infancy.  I don’t think you can call a market “maturing”, when the majority of the clients are merely experimenting with the concept of prediction markets and the “product” is, basically, still a concept.  As we saw at the beginning of this post, prediction markets appear to be firmly entrenched in the trough of disillusionment.  Furthermore, Gartner estimates that mainstream adoption is 5-10 years away, the same estimate they gave in 2009 and 2008.

Not only is the industry mired in the trough of disillusionment, I think the primary researchers are stuck in one too (with one notable exception)!  Over the last few years, there have been no important new research studies, no significant published prediction market trials, and no major prediction market issues resolved.  It is as if the researchers don’t want to look too closely at the issues for fear that some of them may seriously undermine the usefulness or potential of prediction markets.

I exclude one researcher (and his team) from the list of disillusioned researchers.  During the year, David Pennock and his group at Yahoo! Research, launched Predictalot to showcase a fairly complex example of a combinatorial prediction market.  So far, it has been used to predict the winners of the NCAA March Madness basketball tournament and the World Cup.  On a humorous note, Predictalot and its developers received the Best Prediction Market Development of the Year for 2010. I’ll have more to say about the significance of this development, below.

Let’s look at the reasons behind Jed’s industry developments, which will lead into a discussion of the issues holding back the adoption of prediction markets and the future prospects for the industry.

 

Betting Markets

Real-money prediction markets are booming and expected to continue to boom, not because they are good predictors, but because betting is booming.  The major players are Betfair and Intrade, neither of which spouts on about the predictive abilities of its markets.

Discrete outcome markets (like horse races) are perfect for betting but not nearly as useful for making predictions and the decisions based upon them.  Most of the markets generate predictions that are too general or too public to be useful.  The value of information depends on having it before someone else and being able to act upon it.  Since these markets are ill-suited for useful predictions, their success will depend almost entirely on the public’s desire for betting opportunities.  Personally, I think these types of markets should be excluded from the definition of prediction markets.  Horse race odds are considered to be pretty good predictors of the race outcome, but we don’t consider horse race betting pools to be prediction markets.

 

Public Prediction Markets

Most public prediction markets are not very useful, at all.  Even if they were proven to be accurate, no one would pay for information that is already publicly available.  With few ways to generate revenue, growth prospects are bleak.  Hubdub ceased operations during the last year.  While it was fun to play on their prediction markets, participants lost interest as the novelty of “betting” on trivial outcomes wore off.

No amount of explaining will convince participants that it was a good thing that Susan Boyle lost Britain’s Got Talent, even though she had a 78% chance of winning.  Once we’re done explaining that, we can take a stab at explaining why there was such a wide variance between Hubdub’s (78%) and Intrade’s (49%) likelihoods of her winning.  Personally, I think she should have won!

HSX and IEM run somewhat more useful markets, but neither is very good at accurately forecasting long-term outcomes.  Forecasting short-term outcomes is not particularly useful.  Unless HSX can be turned into a real-money market, the prospects for any commercial success are minimal.  However, this and other public markets are still valuable for research purposes.

Don’t expect any growth in this sector.

 

Vendor Consulting Services

This is a growth area, because their clients are ill prepared to create useful prediction markets without guidance.  Failed trials mean the client companies will stop experimenting with prediction markets.  Vendors help their clients achieve reasonable prediction results.  None of the existing vendors can survive on software sales, alone.  Vendors should try to get as many trials as possible and investigate the unresolved prediction market issues (see below).

There will be few new vendors, because the prospects for enterprise prediction markets are not very rosy (more about this, below).

 

WHAT IS HOLDING BACK ENTERPRISE PREDICTION MARKETS?

It’s no secret that prediction markets have not taken off in the corporate world.  Don’t corporate decision-makers know a good thing when they see it or is there something wrong with the product?

Since getting involved with prediction markets, I have maintained a list of issues that remain unresolved.  In my opinion, not resolving these issues is the reason enterprise prediction markets have failed to take hold in the marketplace.  Despite several researchers – especially Robin Hanson as the most published adherent – stating that prediction markets are at least as accurate as other forecasting methods, the case has not really been made (at least not to my satisfaction).

As we will see, prediction markets are unable to accurately predict long-term outcomes, and they have poor records for accuracy and reliability, all of which are crucial for enterprise adoption.  I haven’t mentioned the issues of market design, participant training, number of participants, etc…, because these things are easily solvable.  It makes little sense to tackle these issues, unless the important issues are resolved first.

 

“Just in Time” is Not Timely Enough

Prediction markets need to be able to forecast long-term events.  In order to make long-term decisions, we need information about conditions, events and outcomes that will occur far off in the future.  Well, at least longer than a month or two!  While there have been several long-term prediction markets (public ones), not one has provided an accurate prediction of the future outcome, until very close to the time when the outcome was revealed.  Such predictions, no matter how accurate, are not actionable.  In other words, these markets have been wholly inadequate for management decision-making purposes.  The use of prediction markets to forecast any long-term outcome is questionable, if not downright dangerous.

The following two graphs of historical prices in two long-term (14 year) prediction markets are from Ideosphere.  In both of these markets, the predictions only became reasonably accurate during the last year before the outcomes were revealed.  Of course, some prediction market advocates will argue that the markets were accurate throughout the trading period.  The market price, at any point in time, accurately reflects all available information in the market at that time.  Consequently, the markets are considered “accurate”.  However, they aren’t accurate, if our purpose is to rely on them to make decisions about outcomes in the distant future.

Unfortunately, even if these long-term markets are “accurate” several years away from the outcome, we have no way of knowing whether they can be relied upon.  It is impossible to verify the calibration of these markets (though it has been claimed that they are – 30 days before the market close).  It is difficult to imagine that these markets were calibrated back in 1998, where the market prices were approximately 75% – 80%, yet the eventual closing prices were 0%.  It’s possible, but highly unlikely.  It is much more likely that these markets were reflecting a significant amount of uncertainty about the outcome.

Ideosphere 14 year Cancer Cure Market

Ideosphere 14 year Earthquake Market

The longer the trading period of the market, the more sources of uncertainty there will be.  The steady march of time gradually reduced the uncertainty in these markets.  It is as simple as that.  Even if it were possible to acquire enough information to reduce the uncertainty surrounding the outcome, it is highly unlikely that the incentives would be enough to cover the search costs.

I don’t have the answers as to why these markets have not worked, but here are a few possibilities:

  • Traders are not patient enough to bet on long-term events.  They want to make a trade and quickly find out whether they have won.
  • The longer the time period between the prediction and the outcome, the more likely it is that there will be more random, intervening events that affect the outcome, increasing uncertainty.
  • Intervening events that have a complex influence on the outcome will increase uncertainty around the prediction AND increase the likelihood of a wrong prediction.  Such outcomes may not be predictable by any method.

As the markets move closer to the outcome, uncertainty about intervening events decreases.  Generally, about 30 days before the outcome, the markets become reasonably accurate.  In fact, for most of the period the prediction markets were in operation, the predictions were wildly inaccurate!  The question is whether this is enough advance notice for the predictions to be acted upon, making them useful.

Here is an example from IEM, used to show how even fairly heavily traded markets are unable to make actionable predictions until very near the market close.

IEM 2006 US Congressional Control Market

Note in the Congressional Control Market for 2006, the market prediction was inaccurate until a few days before the election.  For decisions that need to know which way the election would go, the prediction would likely be too late.  Most long-term markets exhibit this characteristic.

Accuracy

The Hewlett Packard pilot was one of the first studies of enterprise prediction markets (my commentary, here).  Even though it is over 10 years old, it is still the most often cited case!  This pilot study found that 6 of 8 markets outperformed the company’s internal forecasts.  That’s pretty good, except that the “better” predictions were only slightly better and three of the predictions were really poor (greater than 25% error).  One of the study’s authors commented:   “The accuracy improvement was not high enough to be adopted,” says Chen. “You need to be a lot more accurate before it’s worth it to implement a new process.”

We can say that these markets were effective aggregators of participant information.  When you consider that the participants in the prediction market trials were also involved in making the internal forecasts, it is not difficult to understand why the prediction markets were better at predicting the internal forecasts than they were at predicting the actual outcomes!  Unfortunately, prediction markets need to be good at predicting the future outcomes.

The General Mills trials showed that prediction markets were as good as internal methods, but they were not significantly better and some of the internal forecasters were also participants in the prediction markets.  It should be kept in mind that these were very short-term predictions, such that it would have been almost impossible to act upon the predictions.

Pennock et al showed that prediction markets were accurate (in the cases they studied), but they were not significantly more accurate than alternative prediction methods.  They concluded that in order for prediction markets to be useful, they must be significantly better than alternative forecasting methods.  In the cases they studied, they found prediction markets were only slightly better than other methods.  In previous posts, I introduced the concept of materiality to the analysis of prediction markets.  Essentially, for a prediction market to be useful, it must be more accurate than the next best predictor, such that the more accurate prediction would make a difference to the decision-maker relying on the forecast.  Then, we need to look at the costs and benefits to determine whether the use of prediction markets is a wise course of action.

One of the measures of accuracy is calibration.  We can be fairly sure that horse race odds are well-calibrated with race outcomes, because we can analyse thousands of homogeneous races to prove the claim.  Unfortunately, we are hard pressed to find more than a handful of similar PMs from which we might test the PM’s calibration with the outcomes.  Yet claims are made that PMs are reasonably well-calibrated and “therefore, they are accurate.”
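To make the calibration test concrete, here is a minimal sketch of how one might check it: group the market forecasts into probability bins and compare the average forecast in each bin with how often the events actually occurred.  The function name, the bin count and the ten forecasts below are all made up for illustration; they are not data from any real market.

```python
from collections import defaultdict


def calibration_table(forecasts, outcomes, n_bins=5):
    """Compare mean forecast with observed frequency in each probability bin.

    forecasts: market probabilities in [0, 1]
    outcomes:  1 if the event occurred, 0 if it did not
    Returns rows of (bin_low, bin_high, count, mean_forecast, observed_frequency).
    """
    bins = defaultdict(list)
    for p, y in zip(forecasts, outcomes):
        # A forecast of exactly 1.0 goes into the top bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    rows = []
    for idx in sorted(bins):
        pairs = bins[idx]
        mean_forecast = sum(p for p, _ in pairs) / len(pairs)
        observed = sum(y for _, y in pairs) / len(pairs)
        rows.append((idx / n_bins, (idx + 1) / n_bins, len(pairs), mean_forecast, observed))
    return rows


# Hypothetical forecasts and outcomes, for illustration only.
forecasts = [0.1, 0.1, 0.3, 0.3, 0.7, 0.7, 0.9, 0.9, 0.9, 0.9]
outcomes = [0, 0, 0, 1, 1, 0, 1, 1, 1, 1]

rows = calibration_table(forecasts, outcomes)
for low, high, count, mean_forecast, observed in rows:
    print(f"{low:.1f}-{high:.1f}  n={count}  forecast={mean_forecast:.2f}  observed={observed:.2f}")
```

With only two to four markets in a bin, the observed frequencies can only take a few coarse values, which is exactly the problem: a handful of similar PMs cannot establish calibration the way thousands of horse races can.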

Given the above comments about long-term PMs, we have to ask, when is a PM “well-calibrated”?  Is it when the market closes?  If so, the prediction is useless, because it cannot be acted upon, even though it may be quite accurate.  Is it 30 days before the outcome of a long-term PM?  If so, this is a bit better, but still pretty useless.  Is it near when the market opens and continuously until the market closes?  This would be ideal, but it is highly unlikely to be the case.

Galton’s ox and the missing submarine stories are examples of collective intelligence, not prediction markets, yet they are frequently cited as proof that prediction markets are accurate.

Reliability

In order to be useful in an enterprise setting, prediction markets must reliably provide accurate predictions of future outcomes.  Furthermore, they must be at least as accurate and timely as other traditional forecasting methods, and hopefully, make predictions at a lesser cost.  Here, reliability means consistency.  The same type of prediction market must consistently provide more accurate forecasts than other available means.

In the discussion about long-term markets (above), we found that PMs were very unreliable until close to the time the outcome is revealed.  This brings up a couple of crucial questions.  How far in advance can prediction markets make accurate predictions?  How will we know the point in time when a prediction is “accurate”?

Recall the Susan Boyle Britain’s Got Talent markets.  Why are there wildly different predictions of the same outcome in different prediction markets?  How do we know which market is accurate?  Is it a matter of prediction market efficiency?  If so, how do we know whether a market is efficient?  Rajiv Sethi provides us with an approach to determining which market is more efficient, but not whether the market is sufficiently efficient.  Are there differences in participant information in the two markets?  Is there a lack of diversity in one of the markets?  Evidence of cascading?  Herding?  Are there inadequate incentives to acquire and reveal information in the markets?  Does sufficient information exist in one or both of the markets?  If not, both markets may be aggregating guesses rather than informed opinions.

Prediction markets are touted as being excellent information aggregation methods, and by all accounts, they probably are very good at this.  It almost seems too obvious to mention, but I will anyway.  In order for the markets to provide accurate, reliable predictions, there must be a sufficient amount of information available to be aggregated.  No one is really looking at this issue, yet it is crucial to the success of prediction markets.  This is the issue of information completeness.

 

THE FUTURE OF PREDICTION MARKETS

Where to from here?  Despite the significant unresolved issues, I still believe prediction markets have potential (though not as much as we all once thought).

 

Can PMs ever replace traditional forecasting processes?

Probably not. As discussed, the HP and General Mills prediction markets used individuals involved in the internal forecasting process.  Accordingly, the HP predictions were closer to the internal forecasts than they were to the actual outcome.  At General Mills, both the predictions and the internal forecasts were very close.

The nagging question is, if the internal forecasting processes had not been in place, would the prediction markets have been as accurate as they were?  We may never know, because I doubt there are any companies willing to test this proposition.  My intuition tells me that stand alone prediction markets would be less accurate than internal forecasts as well as PMs in conjunction with internal forecasts.

I’m not arguing that prediction markets are poor aggregators of information.  The reason for the lesser accuracy of stand-alone prediction markets is that there is much less information to aggregate (without the internal processes to search for information).

 

Is there a place for PMs to supplement traditional forecasting methods?

Yes.

Prediction markets involve a relatively small marginal cost.  So, it is relatively painless to implement key prediction markets to supplement traditional forecasting methods.  Some of the benefits are:  the ability to quickly check the internal forecast for significant deviations from the prediction (which can be investigated), more information by incentivizing participants to search for more information, and a reduction of forecasting bias.

The real benefit, in my opinion, is that prediction markets provide a better measurement of uncertainty around the outcome than do traditional forecasting methods.  It does this in the form of a distribution of predictions, which can be seen visually and measured by the standard deviation.  The information can be used to identify the need for further information and can be used in risk management and contingency planning.  In addition, management can measure the reduction of uncertainty over time as new information is revealed or possible sources of uncertainty are removed.
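As a rough illustration of reading uncertainty off a market, the sketch below takes a distribution of predictions (outcome buckets and the weight the market places on each) and computes the mean and standard deviation.  The bucket values and weights are hypothetical, and the function name is my own; a real platform would supply its own price data.

```python
import math


def distribution_summary(weights_by_outcome):
    """Return (mean, standard deviation) of a market's prediction distribution.

    weights_by_outcome maps an outcome value (e.g. a bucket midpoint)
    to the weight or probability the market assigns to it.
    """
    total = sum(weights_by_outcome.values())
    mean = sum(value * w for value, w in weights_by_outcome.items()) / total
    variance = sum(w * (value - mean) ** 2 for value, w in weights_by_outcome.items()) / total
    return mean, math.sqrt(variance)


# Hypothetical distributions (say, units sold in thousands) early and late in trading.
early = {10: 0.25, 12: 0.50, 14: 0.25}
late = {11: 0.10, 12: 0.80, 13: 0.10}

early_mean, early_sd = distribution_summary(early)
late_mean, late_sd = distribution_summary(late)
print(f"early: mean={early_mean:.2f} sd={early_sd:.2f}")
print(f"late:  mean={late_mean:.2f} sd={late_sd:.2f}")
```

In this made-up case the mean does not move, but the standard deviation shrinks, which is the kind of reduction in uncertainty that management could track over the life of the market.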

One of the most promising applications is in project management.  Task and project completion forecasts involve the most bias, and prediction markets have the potential to significantly decrease this bias.  While long-term predictions are not particularly useful, short-term ones appear to be reasonably accurate and prediction markets have been shown to quickly aggregate known information.  In managing projects, it is important to obtain very short-term forecasts for task completion, so that corrective action may be taken.  Prediction markets appear to be particularly well suited to this task.

Projects can be separated into tasks along the critical path, and PMs can be put in place to predict completion dates for these tasks.  Because completion dates are continuous variables, coming close to the actual outcome will often be good enough, even if the prediction market is not a perfect predictor.

An interesting avenue of research would be to create a combinatorial prediction market in which all of the critical tasks are linked to the total project completion date.  (See additional comments below).

 

IDEA PAGEANTS

While they are not really prediction markets – they’re more like weighted opinion polls or high-tech suggestion boxes – they are usually counted as being “prediction markets”.  Oddly, these types of information markets make up the majority of “prediction markets” in use.  They also have the greatest growth potential.

Idea pageants generate ideas quickly, at a very low cost.  They are relatively easy to understand and implement.  These applications don’t need a high level of accuracy to be useful – companies can investigate the top 10 ideas vs. needing to know the best one.  Management doesn’t have to delegate all authority to the market.  Weak or impractical ideas are quickly filtered out, but decision-makers are free to investigate all ideas, not just those that have high probabilities of success.

We know that the further we are from the outcome, the greater the number of intervening events that could affect it, so predictions will be inaccurate and/or widely dispersed until near the time the outcome becomes known.  These intervening events are random, but the likelihoods are not (in most cases).  Another possible application is to create markets, similar to idea markets, except that they would identify possible future events that might affect the outcome that we are trying to predict.  This information, combined with prediction markets to estimate the likelihoods of these events occurring, would add useful information to the market predicting the outcome of interest.

For example, we could predict the likelihood of a truckers strike during the third quarter, which could be used to make a better prediction of third quarter revenue (the outcome of another prediction market).  Eventually, it might be possible to link the potential intervening events to the outcome in a combinatorial prediction market.

 

COMBINATORIAL PREDICTION MARKETS

Continuing with the previous example, we might apply Robin Hanson’s approach.  Much of his work in the area of combinatorial prediction markets focuses on conditional probabilities.  He might run two prediction markets.  The first would predict 3rd Quarter revenue given a truckers strike.  The second market would predict 3rd Quarter revenue, given no strike.  The difference between the two predictions would be the forecast cost of a trucker strike (in terms of revenue lost).  Robin calls these decision markets, and they form the backbone of his futarchy concept.  Decision markets represent one form of a combinatorial prediction market.
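The arithmetic behind the two conditional markets is simple enough to sketch.  The revenue figures and strike probability below are hypothetical, and the function names are mine; the only assumption beyond the text is that a separate market supplies the likelihood of a strike, so that the two conditional forecasts can be blended into a single expected revenue figure.

```python
def strike_cost(revenue_if_strike, revenue_if_no_strike):
    """Forecast revenue lost to a strike: the gap between the two conditional markets."""
    return revenue_if_no_strike - revenue_if_strike


def blended_revenue_forecast(p_strike, revenue_if_strike, revenue_if_no_strike):
    """Expected revenue, weighting each conditional forecast by its likelihood."""
    return p_strike * revenue_if_strike + (1.0 - p_strike) * revenue_if_no_strike


# Hypothetical market prices (revenue in $ millions), for illustration only.
revenue_if_strike = 1.2
revenue_if_no_strike = 1.5
p_strike = 0.2  # assumed output of a separate "will there be a strike?" market

cost = strike_cost(revenue_if_strike, revenue_if_no_strike)
expected = blended_revenue_forecast(p_strike, revenue_if_strike, revenue_if_no_strike)
print(f"forecast cost of a strike: ${cost:.2f}m")
print(f"expected Q3 revenue:       ${expected:.2f}m")
```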

With great fanfare, Crowdcast released their innovative trading platform designed to make trading more intuitive.  Essentially, it is a mechanism to allow traders to bet on user-defined spreads. For example, revenues will fall between $1.2m and $1.4m or $1.85m and $2.12m.  It allows traders to make combination bets for any range they choose.  While I think this innovation has potential, there may be a number of tricky issues regarding the effects of assumptions required to make this platform work.  Still, it is a promising development.

Combinatorial prediction markets make an awful lot of sense, if they can be practically implemented.  The above types of combinatorial prediction markets are relatively easy to implement. Perhaps the most difficult to design and implement is the type of combinatorial prediction market developed by David Pennock and his group.  While it is used for sports betting (play market), the concepts may be applied to enterprise prediction markets.

Predictalot provides a working example of a fairly complex combinatorial prediction market, which involves combinatorial betting on the NCAA March Madness and the World Cup.  For example, if Duke is predicted to win the championship, this automatically increases the likelihood of Duke winning in all of the rounds leading up to the final.  Also, if Duke is predicted to win in the first round, this increases the likelihood of Duke winning the championship.  This platform allows bettors to bet only on those things about which they have knowledge.

The same combinatorial prediction market concept could be applied to project management.  It is difficult to predict the completion date of a complex project (Predictalot Champion).  Some participants will have specialized knowledge of the task (Predictalot Team) they are working on, but little knowledge of other tasks along the critical path.  A combinatorial market would allow participants to trade on those outcomes in which they have knowledge.  The market structure will implicitly incorporate the predictions of tasks into the prediction of the overall project completion date.  Similarly, the prediction of the overall project completion date will influence the predictions of the various tasks along the critical path.

This is an important development, because traders may have specific or local knowledge about one or more components of an outcome, though they have little knowledge about the eventual outcome itself.  A single prediction market for the project outcome may fail, because there is not enough information about the outcome to generate an accurate prediction.

While it is true that a project outcome could be split into several prediction markets to predict the required tasks, the problem is that each prediction market may be too thinly traded to generate an accurate prediction.  Also, there is no automatic inclusion of the task predictions in the project outcome prediction.  A combinatorial prediction market has the potential to solve this problem and generate better predictions of the outcome.

Looking at a more generalized application, many outcomes are dependent (or conditional) on other events, actions or conditions.  In order to better predict an outcome, we would like to know the factors that will have an effect on the outcome (discussed in the Idea Pageant section, above), and we would like to know how likely these factors are to arise.  We could set up a series of separate prediction markets to predict the likely effects of each of the factors that will affect the outcome.  The results of these markets would be available to the traders predicting the outcome of interest. While this is better than existing prediction models, it’s not ideal.   Alternatively, the factors can be combined with the outcome in a combinatorial prediction market, allowing the likely effects of the factors to be automatically incorporated in the outcome prediction.

Certainly food for thought, and it is the reason that I selected Predictalot as the most important development in the area of prediction markets for 2010.

 

YOUR REWARD FOR READING THIS FAR!

No discussion of the future of prediction markets would be complete without commenting on the most comprehensive system of prediction markets ever conceived.  Of course, I’m talking about Futarchy, one of the New York Times buzzwords for 2008.  Sadly, for Robin Hanson, its creator, Futarchy has failed to take hold, anywhere.  If the concept had any merit, surely, at the very least, it would have been implemented in some small, South Pacific island nation by now (it hasn’t happened).  About a year ago, I commented on the Future of Futarchy, where I dismissed the concept.  Despite this, I see that in December 2010 Robin Hanson is still trying to promote the idea!  While I disagree with Futarchy, I do heartily endorse his use of decision markets.

If there were a long-term prediction market on whether Futarchy would be implemented anywhere in the world in Robin Hanson’s lifetime, the price would be flat-lining on $0.00.  Occasionally, the market price would jump up to $0.50 (reflecting Robin’s trades), only to be smacked down by Mencius Moldbug’s trades.  I suspect there will be a smirk on Robin’s face each time the market corrects his attempt to manipulate the market.

This market illustrates another key aspect of prediction markets.  The outcome must be clearly defined.  In this market, “Robin Hanson’s lifetime” is defined to mean his lifetime in his current body.  It’s no secret that Robin wishes to have his head lopped off (when he dies, not before) and cryogenically frozen, to be thawed at some time in the future when bodies will be more “durable” or when brains can be downloaded into some robot-like “life” form.  No word, yet, about whether the good professor’s wife will be similarly decapitated.  Without this clear definition of the outcome, we wouldn’t be able to collect our bets, and it is likely that if brain cloning is possible, so is Futarchy!

So, my forecast is that Futarchy will never come to fruition and it should be cryogenically frozen now, too.

 

FINAL THOUGHTS

It has been quite an undertaking putting this paper together.  Undoubtedly, I have missed a few key items, for which I apologize.  As always, your comments are appreciated.  While there have been few new developments, there are still many tasks to be completed, if enterprise prediction markets are to gain traction in the market.  In writing this paper, it became evident how most of the major issues remain unresolved.  I hope that some of the researchers will get over their disillusionment and ascend the slope of enlightenment!  If so, I promise to get out of my own trough of disillusionment with respect to prediction markets!

Today, it was announced that David Pennock, and his team of researchers at Yahoo, has been given the prestigious Futurology Research & Astrology Foundation award for Best Prediction Market Development of the Year for 2010.  Their work in developing and launching Predictalot is a ground-breaking achievement in the field of collective intelligence.  While Predictalot has only been used to predict sports tournament winners (NCAA basketball and the World Cup) thus far, the combinatorial prediction market and related software will provide an important new platform for more accurate enterprise prediction markets in the future.  Other team members included:  Mani Abrol, Janet George, Tom Gulik, Mridul Muralidharan, Sudar Muthu, Navneet Nair, Abe Othman, Daniel Reeves and Pras Sarkar.

Commenting after the awards ceremony, Mr. Pennock said that, “it is a great honour and I’m proud of the entire team that brought this important concept to fruition.  Frankly, it’s a bit ironic – we just didn’t see this award coming at all!”

Posted by: Paul Hewitt | March 23, 2010

Paul Krugman Makes a Boo Boo

In Paul Krugman’s blog entry, Done, at 4:39pm (EDT) on March 21, 2010, he commented:  “OK, nothing is sure in this world. Intrade is still giving Obamacare a 2.2% chance of failing, …”

He was talking about the InTrade market on Health Care Reform.  In theory, the market price in such a derivative market should equal the expectation of the underlying event coming true.  However, Paul Krugman (and many others) forgot one of the most basic assumptions of the market model!  Transaction costs.

When the market price is over 95, InTrade charges a transaction fee of 3 cents per contract (real money).  While market prices are quoted in percentages, the payoff for a winning ticket is $10 (real money).  Therefore, the transaction fee is 0.3% of the winning payoff.  In addition, InTrade charges 10 cents per contract on expiry (if you “win”).  That’s another 1.0%. 

So, when the market was quoting 97.8% likelihood of the HCR bill passing before June 2010, this didn’t really mean that there was a 2.2% chance of the bill not passing.  A winning ticket would be subject to 1.3% transaction fees.  The real likelihood of failure was 0.9% – approximating the uncertainty that Obama would be “hit by a bus” before signing the bill into law. 

No rational investor would wish to purchase a share for more than 98.7, given the transaction costs.  In a sense, this is the market’s “100%”.  Interestingly, at 1:49pm GMT today (March 23), there are 695 bids at 99.1 and 413 asks at 99.2.  Clearly, some traders are not subject to the full transaction fees at InTrade.  More about that here.
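For anyone who wants to check the arithmetic, here is a minimal sketch of the fee adjustment, using the fee schedule described above (3 cents per contract on the trade, 10 cents on expiry, $10 payoff).  The function name is my own, and the simple subtraction is just the back-of-the-envelope method used in this post, not an official InTrade formula.

```python
def fee_adjusted_view(quoted_price, trade_fee, expiry_fee, payoff=10.0):
    """Restate a quoted price (in percent) after fees on a winning contract.

    Returns (fees as % of payoff, highest price a fee-paying buyer should pay,
    implied chance of failure in %).
    """
    fee_pct = (trade_fee + expiry_fee) / payoff * 100.0
    price_ceiling = 100.0 - fee_pct  # the market's effective "100%"
    failure_pct = price_ceiling - quoted_price
    return fee_pct, price_ceiling, failure_pct


fee_pct, price_ceiling, failure_pct = fee_adjusted_view(
    quoted_price=97.8, trade_fee=0.03, expiry_fee=0.10
)
print(f"fees: {fee_pct:.1f}%  ceiling: {price_ceiling:.1f}  implied failure: {failure_pct:.1f}%")
# prints "fees: 1.3%  ceiling: 98.7  implied failure: 0.9%"
```

Feeding in today’s 99.1 bid gives a negative “failure” figure, which is another way of seeing that those bidders cannot be paying the full fees.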

I love Paul Krugman, but this time, he made a silly little mistake.  Of course, all of this assumes the market price is accurate in the first place!

Posted by: Paul Hewitt | March 22, 2010

Health Care Reform Explained

American Health Care Presentation

I watched part of the U.S. Health Care Reform bill passage on Sunday, March 21, 2010.  Combined with the political commentary, it was pretty clear that there is a lot of misinformation.  A particularly extreme right-wing viewpoint (along with moronic comments), can be found at the Cafe Hayek blog.  I’ll warn you, before you click on the link (if you really have to), that most of the comments are remarkably irrational.  Don’t even think of trying to debate with these oddballs (I’m being charitable).

For a much more balanced and logical explanation of the American Health Care Reform, click here for an excellent (and easy to understand) PowerPoint presentation.  This is an award winning presentation by Dan Roam and C. Anthony Jones, M.D.  Enjoy!

Posted by: Paul Hewitt | March 14, 2010

Truth in Advertising – Meet Prediction Markets

Most published papers on prediction markets (there aren’t many) paint a wildly rosy picture of their accuracy.  Perhaps it is because many of these papers are written by researchers having affiliations with prediction market vendors.

Robin Hanson is Chief Scientist at Consensus Point.  I like his ideas about combinatorial markets and market scoring rules, but I think he over-sells the accuracy and usefulness of prediction markets.  His concept of Futarchy is an extreme example of this. Robin loves to cite HP’s prediction markets in his presentations.  Emile Servan-Schreiber (Newsfutures) is mostly level-headed but still a big fan of prediction markets. Crowdcast’s Chief Scientist is Leslie Fine; their Board of Advisors includes Justin Wolfers and Andrew McAfee.  Leslie seems to have a more practical understanding than most, as evidenced by this response to the types of questions that Crowdcast’s prediction markets can answer well: “Questions whose outcomes will be knowable in three months to a year and where there is very dispersed knowledge in your organization tend to do well.”  She gets it that prediction markets aren’t all things to all people.

An Honest Paper

To some extent, all of the researchers over-sell the accuracy and the range of useful questions that may be answered by prediction markets.  So, it is refreshing to find an honest article written about the accuracy of prediction markets.  Not too long ago, Sharad Goel, Daniel M. Reeves, Duncan J. Watts and David M. Pennock published Prediction Without Markets.  They compared prediction markets with alternative forecasting methods for three types of public prediction markets: football and baseball games and movie box office receipts.

They found that prediction markets were just slightly more accurate than alternative methods of forecasting.  As an added bonus, these researchers considered the issue that prediction market accuracy should be judged by its effect on decision-making.  So few researchers have done this!  A very small improvement in accuracy is not considered material (significant), if it doesn’t change the decision that is made with the forecast.  It’s a well-established concept in public auditing, when deciding whether an error is significant and requires correction.  I have discussed this concept before.

While they acknowledge that prediction markets may have a distinct advantage over other forecasting methods, in that they can be updated much more quickly and at little additional cost, they rightly suggest that most business applications have little need for instantaneously updated forecasts.  Overall, they conclude that “simple methods of aggregating individual forecasts often work reasonably well relative to more complex combinations (of methods).”

For Extra Credit

When we compare things, it is usually so that we can select the best option.  In the case of prediction markets it is not a safe assumption that the choices are mutually exclusive.  Especially in enterprise applications, prediction markets are heavily dependent on the alternative information aggregation methods as a primary source of market information.  Of course, there are other sources of information and the markets are expected to minimize bias to generate more accurate predictions.  

In the infamous HP prediction markets, the forecasts were eerily close to the company’s internal forecasts.  It wasn’t difficult to see why.  The same people were involved with both predictions!  The General Mills prediction markets showed similar correlations, even when only some of the participants were common to both methods. The implication of these cases is that you cannot replace the existing forecasting system with a prediction market and expect the results to be as accurate.  The two (or more) methods work together. 

Not only do most researchers (Pennock et al, excepted) recommend adoption of prediction markets, based on insignificant improvements in accuracy, they fail to consider the effect (or lack thereof) on decision-making in their cost/benefit analysis.  Even if some do the cost/benefit math, they don’t do it right.   

Where a prediction market is dependent on other forecasting methods, the marginal cost is the total cost of running the market. There is no credit for eliminating the cost of alternative forecasting methods.  The marginal benefit is that expected by choosing a different course of action than the one that would have been taken based on a less accurate prediction.  That is, a slight improvement in prediction accuracy that does not change the course of action has no marginal benefit. 
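To make the arithmetic concrete, here is a rough Python sketch of that marginal test.  Everything in it is made up for illustration: the function names, the 50% decision threshold, the payoffs and the cost of the market are my own assumptions, not figures from any study.

```python
def choose_action(forecast, threshold):
    """Hypothetical decision rule: act on the forecast only if it clears a threshold."""
    return "expand" if forecast >= threshold else "hold"


def marginal_net_benefit(existing_forecast, market_forecast, threshold,
                         expected_payoff, market_cost):
    """Marginal benefit of the market minus its full running cost.

    The benefit is the payoff difference between the action chosen with the
    market forecast and the action chosen with the existing forecast.  No
    credit is given for retiring the existing method, because the market
    depends on it.
    """
    old_action = choose_action(existing_forecast, threshold)
    new_action = choose_action(market_forecast, threshold)
    marginal_benefit = expected_payoff[new_action] - expected_payoff[old_action]
    return marginal_benefit - market_cost


# Illustrative numbers only.
payoffs = {"expand": 500_000.0, "hold": 350_000.0}

# Slightly better forecast, same decision: the market is a pure cost.
same_decision = marginal_net_benefit(0.62, 0.65, 0.50, payoffs, 40_000.0)

# Slightly better forecast that flips the decision: now there is a benefit.
changed_decision = marginal_net_benefit(0.48, 0.53, 0.50, payoffs, 40_000.0)
```

In the first case the two forecasts lead to the same action, so the marginal benefit is zero and the net result is simply the cost of running the market.  Only in the second case, where the decision actually changes, does the market have a chance of paying for itself.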

Using this approach, a prediction market whose forecasts are only “slightly” more accurate than those from alternative forecasting approaches is just not good enough.  So far, there is little, if any, evidence that prediction markets are anything more than “slightly” better than existing methods.  Still, most of our respected researchers continue to tout prediction markets.  Even a technology guru like Andrew McAfee doesn’t get it, in this little PR piece he wrote, shortly after joining Crowdcast’s Board of Advisors.

Is it a big snow job or just wishful thinking?

Posted by: Paul Hewitt | March 13, 2010

Paralympic Games 2010

No predictions today, just a note about a truly spectacular event that took place last night – the Paralympic Games Opening Ceremony 2010.  The link will take you to the site with the complete replay of the ceremonies.

A few weeks ago, I watched the Olympic Games Ceremonies and was quite unimpressed.  Parts of them were downright embarrassing, and I’m not talking about the torch that wouldn’t rise.  The world probably thinks Canadians are a bunch of tap dancing, tattooed fiddlers or really bad comedians (not true on both counts).  The Opening Ceremony for the Paralympic Games should change all of that.  I couldn’t be more proud to be a Canadian than I am right now.

In stark contrast to the Olympic ceremonies, this was a high-energy, happy event with a cast of hundreds of smiling children.  There were special tributes to Rick Hansen (Man in Motion) and Terry Fox (Marathon of Hope).  Don’t miss Luca “Lazylegz” Patuelli’s spectacular breakdance performance.  The music was great, the speeches heartfelt and inspiring, and the entire evening was a beautiful welcome to these amazing athletes.

Enjoy the show!

Posted by: Paul Hewitt | March 8, 2010

Oscars Prediction Markets Get it Right

My wife follows movies a lot more closely than I do.  She thinks she can pick the Academy Award winners more accurately than I can.  I took up the challenge, knowing that I could visit a few prediction market sites, like hubdub and hsx.  I’m writing this portion of the blog before the Oscars take place.  I also wrote the title before the results were in.

My picks were the front-runners in each award category, based on the market predictions on Saturday morning.  I should note that this took me all of about five minutes (compared with my wife’s hours and hours of reading and actually watching all of the movies). 

Now for the results…

I’m happy to report that the prediction markets for this year’s Oscars were 100% accurate!  I wasn’t very surprised, really, but my wife is still very skeptical about prediction markets.  How can this be?

The HSX prediction markets were not very good at picking the winners of the Best Screenplay awards.  Inglourious Basterds was “supposed” to win (52.24%), but The Hurt Locker (25.72%) did win.  Another oops.  Up in the Air was supposed to win (63.16%), but Precious did win (and it was only expected to win 7.5% of the time)!  At this point, my wife is gloating about how crappy these prediction markets are at picking winners.  While they were handing out a bunch of lesser awards, I tried to explain to her that the prediction markets were still perfectly accurate, even though a long shot actually won. 

I explained to her the concept of calibration, and how the markets were really accurate, because they were not picking winners with a 100% certainty.  In fact, the markets’ failures were validation that they were, in fact, accurate.  She thinks I’m an idiot (about prediction markets).

Up won for Best Animated Feature.  It was expected to win 98 out of 100 times (if it were to be nominated 100 times, that is).  Christoph Waltz won with 87% for Best Supporting Actor.  In these cases, the markets were both “accurate” and making accurate, useful predictions.  My wife’s not impressed.  Everyone picked those categories, apparently.  There were no other surprises in the Oscar Awards. 

Essentially, when a prediction market picks Mo’Nique to win the Best Supporting Actress Award with an 86% likelihood, she would be expected to win the award 86 times out of 100 Oscar ceremonies.  Of course, it isn’t possible to nominate her for the same role (along with the other nominees) every year for 100 years, to test the calibration of the market.  However, if the market were well-calibrated, Mo’Nique would lose the Oscar 14 times out of 100.  The market will still be considered “accurate” but fail to predict the winner when she loses.  Expressed another way, when Mo’Nique loses, it helps validate the accuracy of the market (so long as she loses only 14 times in 100). 
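Since we can’t rerun one Oscar race 100 times, the usual workaround is to pool many different markets that closed at similar prices and check how often those events happened.  Here is a minimal Python sketch of that check.  The function name, the bin width and the data are my own assumptions for illustration; the data are invented, not actual hsx or hubdub results.

```python
from collections import defaultdict


def calibration_table(forecasts, outcomes, bin_width=0.1):
    """Group forecasts into probability bins and compare, for each bin,
    the average forecast with the fraction of events that occurred.

    Returns a list of (mean_forecast, hit_rate, count) tuples, one per
    non-empty bin, ordered from the lowest bin to the highest.
    """
    n_bins = int(round(1 / bin_width))
    bins = defaultdict(list)
    for p, happened in zip(forecasts, outcomes):
        index = min(int(p / bin_width), n_bins - 1)  # keep p == 1.0 in the top bin
        bins[index].append((p, happened))

    table = []
    for index in sorted(bins):
        pairs = bins[index]
        mean_forecast = sum(p for p, _ in pairs) / len(pairs)
        hit_rate = sum(1 for _, h in pairs if h) / len(pairs)
        table.append((round(mean_forecast, 3), round(hit_rate, 3), len(pairs)))
    return table


# Invented data: 100 different favourites, each priced at 86%, of which 86 won.
forecasts = [0.86] * 100
outcomes = [True] * 86 + [False] * 14
table = calibration_table(forecasts, outcomes)
```

In this made-up case the hit rate matches the average forecast, so the 14 “wrong” picks are exactly what a well-calibrated market should produce.  A market that picked every one of those 100 favourites correctly at 86% would actually be poorly calibrated.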

Unfortunately, we don’t know which 14 of the 100 trials will be losses.  Consequently, we are going to be disappointed when the losses occur.  This is why my wife is skeptical about prediction markets.  In a horse race, like the Oscars, coming close to winning means nothing.  Apparently, coming close means you’re an idiot. 

We tied in our correct picks.  However, I “won”, because I made my picks in five minutes and used the time I saved to work on my golf game.

As a side note, the predictions between hsx and hubdub were consistent.  Virtually all similar prediction markets generated expected probabilities within 5%.  Not bad, I suppose.

Though we can’t prove it, I’ll stand by my title and state that the prediction markets were 100% accurate.  But I’ll qualify this by saying they were not very useful.  If I can’t convince my wife that prediction markets are useful (she’s a corporate executive), I don’t see much of a future for enterprise prediction markets – at least not for the “horse race” types of markets.

Posted by: Paul Hewitt | January 4, 2010

The Future of Futarchy

I’ve been meaning to write this post for quite some time.  While it is an interesting concept, on paper, I’m afraid that the only place you are likely to see futarchy implemented is in a future Star Trek movie (no offense to bona fide Trekkies intended).  And, I’m sure the mythical planet, “Futarchy”, is doomed and Spock will show no mercy towards its inhabitants.  I apologize if I got any Star Trek details wrong – I only watched a couple of episodes when I was a kid.  I only refer to Star Trek to show that this idea of futarchy is “out there” – really out there, actually.

I think Robin Hanson agrees, at least partially, with this assessment.  In his paper, “Shall We Vote on Values, But Bet on Beliefs?”, he explains that rather than use a scientific approach to assessing the viability of futarchy, he uses an “engineering” approach, which merely seeks to determine whether a concept is deserving of further study, prototype development, etc…  Interested readers should probably read Robin’s paper before proceeding.  I will explain the basic idea and assumptions behind futarchy, but many of the details will not be repeated, here.

While this should be a very short read, it isn’t.  Robin Hanson used 20+ pages to explain why futarchy is “plausible” (and continues to be hopeful of its acceptance), and Mencius Moldbug used 7,400+ words to conclude that futarchy is “retarded”.  Many more words were wasted in blog comments.  This started out as a quick post to dispose of futarchy once and for all, but one fault leads to another, and it can be hard to stop.  Anyway, read as far as you like.  The conclusion doesn’t change.

The structure of this post is as follows:

  1. What is Futarchy?
  2. Discussion of the Assumptions that support considering Futarchy
  3. Decision Market Mechanics
  4. Three Scenarios
  5. The National Welfare Measure
  6. Design Issues
  7. Other Considerations
  8. Conclusion

What is Futarchy?

Futarchy is Robin Hanson’s term for a form of government where decision markets are employed to forecast the likely effect of a proposed policy on some measure of overall welfare, such as GDP+.  If a decision market indicates that a proposed policy is likely to generate a positive welfare benefit (relative to the status quo), the policy is automatically implemented.  Actually, Robin uses the word “immediately” to determine when the proposed policy is to be adopted.  A careful reading of the papers indicates that “immediately” really means the adoption of the policy is “hard-wired” to, or directly follows from, the decision market’s forecast.  Citizens vote for elected representatives, who administer the definition and annual calculation of the welfare measure.  Using decision markets, citizens (speculators) place bets on the likely effects of proposed policies.  In this sense, “We Vote on Values (what to do), but Bet on Beliefs (how to do it)”.

The Assumptions

According to Robin, there are three assumptions that support the concept of futarchy.  Here they are, with a brief discussion of each.

1.     Democracies fail by not aggregating enough available information

Basically, Robin states that governments make bad decisions, largely because they have to appease ignorant voters.  In a democracy, every citizen has one vote, but not all citizens are equal, at least not in terms of the validity of their opinions.  He argues that relevant information exists about whether proposed policies will achieve the desired objectives, but that it is not being aggregated accurately, so that politicians may make more correct choices.  If the politicians knew which policies were unlikely to succeed, fewer of them would be adopted. 

While it is true that the majority of the public are poorly informed and lack incentives to become informed, there is a subset of informed “elites” that would be able to make trades in these speculative markets to aggregate accurate, relevant information.  Robin cites a number of studies that lead to the following statement:

“The straightforward interpretation of this data is that experts and those who are better educated actually know more than the general public about which policies are better.”

In making his case, he comes to the conclusion that the general public is not only ignorant, but “fundamentally non-truth seeking” as well.  This presents a problem in developing good public policy, unless the uninformed, irrational “chimps” allow public policy to be determined by the informed, rational elites, “such as perhaps academic advisors.”  He cites a few examples of “contrarian” public opinions, such as “52% of Americans believe astrology has some scientific truth.”  I’m almost convinced that informed traders will have a better chance of aggregating more accurate information for policy decisions, but it doesn’t sound very democratic.  It sounds more like marketing of professors’ services and turning “chimps” into “chumps”. 

On balance, I’m going to give this one to Robin.  Many (perhaps most) of society’s problems can be traced to a lack of accurate, timely information.

2.     Speculative markets are the best known method of aggregating available information

This is where Robin does his usual cut-and-paste job, briefly touching on a variety of prediction (and betting) market “success” stories over the years (none recent, by the way).  There are the following examples that we have all seen before (many, many times):  racetrack odds are better than experts, OJ commodity futures improve government weather forecasts, Oscar markets beat columnist forecasts, gas demand markets beat gas demand experts, US presidential betting markets beat opinion polls about 75% of the time, and the granddaddy of them all, prediction markets beat HP official forecasts 6 times out of 8.

The HP results were “better” by an insignificant amount and were heavily dependent on the official forecasting process (read my analysis, here).  The Oscar markets would not have beaten a poll of the actual Oscar voters.  Isolated “successes” in disparate types of markets do not imply that public policy decision markets will be equally successful.  Yet we see, time and time again, this conclusion being reached on the basis of a very small number of diverse information aggregation field studies.

In a recent op-ed article, Robin explains that speculative markets are the way to go, because they are “an exemplary way to collect and summarize information, at least when we eventually learn the outcome.”  More proof that the more often you state something (anything) the more likely it is to be believed (even if it is beyond belief)!  Note carefully, the actual outcome must become known, for the markets to have any chance of aggregating information accurately.  I’ll have more to say about this, below, as it is not as straightforward as Professor Hanson would have us believe.

One criticism of Robin’s approach to speculative markets is that he seems to believe that a small number of well-informed traders will always counteract the irrational trades of the uninformed.  In the area of public policy, it is quite likely that some issues will have a very, very small number of “informed” traders relative to the “chimps”.  To me, it is not clear that the informed traders will overpower the chimps.   I will have more to say about this later, but for now, I just want to make the point that speculative markets can work well, but not always.  Not only that, but no one, not even Robin Hanson, seems to care much why some markets appear to work while others clearly do not. 

Reliance on this assumption is very shaky and threatens the entire institution of futarchy.

3.     It is easy to identify rich, happy nations from poor, miserable ones, after the fact.

While agreeing that it may not be the best measure, Robin suggests that GDP may be a sufficient metric for measuring policy recommendations, at least initially.  The measurement (or metric) could be refined to take into account other factors that contribute towards national “welfare” (GDP+).  Policies that are expected to improve national welfare should be implemented.  Subsequently, the measurement of national welfare will identify whether policy decisions have been good.  The logic in favour of futarchy is as follows: 

If a statistical analysis indicates that a policy is likely to have a beneficial effect on national welfare, a speculative market would be expected to indicate the same (unless there were other, valid, reasons for this not to be so).  If it is advisable to consider a policy on the basis of statistical analysis (current practice), it should be equally advisable to consider it on the basis of a speculative market (futarchy).

In a very broad, simplistic sense, this assumption may be true-enough to proceed, though it is an open question whether this is the appropriate metric to be used to assess all (or even most) policy proposals.

Decision Market Mechanics

The mechanics of decision markets are not as simple as Robin would have us believe.  Essentially, these markets are attempting to estimate a form of net present value of the expected welfare measure (GDP+) where the policy is adopted and where it is not (status quo).  The difference between the two estimates is considered the expected benefit of adopting the particular policy.  Given the very long time horizon for most policies that might be considered, it is clear that there is  tremendous uncertainty attached to the calculations. It is almost inconceivable that such markets could provide accurate forecasts before any actual policy effects could be identified.

In order to provide the necessary incentives for trading, the market must be capable of settlement.  This is a required characteristic of all prediction markets; i.e., the actual outcome must be revealed at some point in the future.  Informed traders generate profits by buying low and selling high while the market is open or by buying at a lower price than that at which the market is ultimately settled.  The smartest traders are those that identify and trade on the largest difference between the current expectation and the eventual outcome.  They also know this information before the less-informed traders.  If the market cannot be settled for 20 years or longer, even using some form of indexed security for the payoff, I would argue that the settlement payoff loses its incentive for all but the most patient traders.  We should note that Robin is a bit vague as to the settlement of these decision markets.  We do know that whichever market contains the condition that is not true will be cancelled (i.e. if the policy is approved, the status quo market is cancelled).  He discusses the possibility of calculating some welfare measure over a 20 year period, using various weights and discounts, and an implied assumption about future values for the infinite time period after 20 years hence.  So, settlement is a long, long way off in the future.

Such markets must continue to trade until settlement.  If not, the very long holding period for almost every decision market would mean that active traders would be limited in how many markets they could participate in.  If they continued to invest in markets before they received any “winnings”, they would, presumably, run out of investment funds.  Most importantly, we would not see the “cream rising to the top”.  That is, we would not see the best predictors becoming wealthier, relative to the chimps, until a number of markets were to settle, 20 years (or more) hence.  That is an awfully long time to identify the “experts” and give their trades more weight (in subsequent markets).  It also assumes that they will still be alive and willing to trade.  Traders will tend to be young ones, too, in order to enjoy the benefits of their smart trades.  Perhaps Robin had Associate Professors in mind for his model “elites”.  It assumes, too, that they will be equally adept at forecasting policy effects for the issues that will arise 20 years hence.

In the op-ed article cited above, Robin clarifies the settlement problem by allowing trading to continue in the market that is not cancelled, which would allow some traders to cash out, without waiting for the final outcome (and payout).  He indicates that, through such trading, the market will continue to improve the prediction (or forecast).  But, who cares?  The policy decision will already have been made.  Any continued trading in the market and the very long wait until the market settles merely determine the final rewards for the better forecasters and the penalties extracted from the dolts.  There are two reasons why this would be a necessary feature of futarchy.  First, assuming the informed traders are able to cash out before the market settles, this will return liquidity to the marketplace for all policy decision markets.  Of course, there will have to be a sufficient number of chimp-chumps available to facilitate such trading.  Second, the futarchy process requires informed traders to distinguish themselves from the uninformed.  Allowing them to do so in fewer than the 20 or so years that a typical market may span until settlement is the only practical method.

Three Scenarios

Realizing that this concept of futarchy is a bit of a stretch, Robin proposes a gradual approach to adoption, starting with corporate governance, moving on to agency decision-making and finally national governance.  I only make mention of these, to see whether we can dismiss this whole concept at an early stage.

Corporate Governance

Robin describes how corporations are like small democratic governments.  He considers a simple speculative market involving conditional “dump-the-CEO” and “keep-the-CEO” stocks.  If the “dump-the-CEO” price was “clearly” higher than the “keep-the-CEO” price for “90% of the last week of a quarter”, the CEO would be dumped for the next quarter.  It is not hard to imagine that once such a guinea pig corporation experienced one CEO dumping, many more would follow.  The success of a corporation is not (and should not be) dependent on quarterly results.  Such an institution would require a steady stream of increasingly able CEO candidates (who would be able to hit the ground running, on a moment’s notice).  A continuous learning curve, constant change and massive severance costs would threaten the very existence of any corporation stupid enough to consider such “decision-making”.  Truly shocking in its naivety.
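For what it’s worth, the trigger itself is trivial to write down, which is part of what makes it so unnerving.  Here is a rough Python sketch of how I read the rule.  The function name, the idea of sampling prices at regular intervals and the `margin` parameter (my stand-in for “clearly” higher, which the paper does not pin down) are all my own assumptions.

```python
def should_dump_ceo(dump_prices, keep_prices, margin=0.0, required_share=0.9):
    """Return True if the dump-the-CEO price exceeded the keep-the-CEO price
    by more than `margin` in at least `required_share` of the samples.

    Both lists are assumed to hold prices sampled at the same regular
    intervals over the last week of the quarter.
    """
    if not dump_prices or len(dump_prices) != len(keep_prices):
        raise ValueError("price series must be non-empty and the same length")
    clearly_higher = sum(
        1 for dump, keep in zip(dump_prices, keep_prices) if dump - keep > margin
    )
    return clearly_higher / len(dump_prices) >= required_share


# Invented prices: dump is higher in 9 of 10 samples, so the CEO is out.
dump = [52, 53, 55, 54, 56, 57, 55, 58, 49, 60]
keep = [50, 50, 50, 50, 50, 50, 50, 50, 50, 50]
decision = should_dump_ceo(dump, keep)
```

Notice how much rides on the choice of `margin` and on one week of prices.  A single quarter’s market mood decides who runs the company.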

Thankfully, Robin Hanson appears to be well-ensconced in academia, safeguarding corporate America from the havoc this nonsense would create.

The only reason I note this scenario, at all, is that the next level involves agency governance, which would follow “after some successful examples of using speculative markets in corporate governance”.  We should be able to quit right now, but there are 20 more pages of Robin’s paper to plough through, and so, we press on.

Agency Governance

While this paper was written before the current economic recession took hold, Robin cites monetary policy as a prime candidate for using speculative markets to set policy.  Apparently, most agree on the variables to be manipulated to achieve a good outcome, and they agree on the statistics that may be used to determine whether a quality policy outcome has been achieved after the fact.

To counter this proposed application, one need only consider the current (sad) state of monetary economic intelligence among the “elite”.  If a monetary expert, like Alan Greenspan, can be so wrong for so long, what chance do the “unthinking masses” have?

Somehow, Robin believes that all we would have to do is make economic information available to the public, including speculators, and a speculative market would determine which expert to believe, setting an accurate market price and the most appropriate interest rate policy.  Sheer Madness.  It is the equivalent of handing out hammers and nails to a crowd of chimps and expecting them to build a house.

But we continue on… to national governance. Once enough people are living in these chimp houses and driving around in chimpmobiles, the case will have been made for hard-wiring speculative markets to the policy enactment process.

National Governance (Futarchy)

Elected representatives define a formal measure of “national welfare”, GDP+, and markets would continuously forecast this metric.  As policy proposals arise, new prediction markets would be implemented to forecast GDP+ conditional on the new policy being enacted and another conditional on the status quo.  Once it has been clearly shown that there would be a forecasted improvement in GDP+ (national welfare) under the proposed policy, it would be immediately implemented.
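Stripped to its essentials, the hard-wired rule compares two conditional forecasts and cancels one market.  The Python sketch below is only my reading of it.  The function name, the return format and the `clear_margin` parameter (standing in for “clearly shown”) are assumptions of mine, not anything specified in Robin’s paper.

```python
def resolve_decision_markets(gdp_plus_if_adopted, gdp_plus_if_status_quo,
                             clear_margin=0.0):
    """Apply the hard-wired futarchy rule to a pair of conditional forecasts.

    The policy is adopted only if the market conditional on adoption forecasts
    a GDP+ that beats the status quo forecast by more than `clear_margin`.
    The market whose condition turns out to be false is cancelled.
    """
    adopt = (gdp_plus_if_adopted - gdp_plus_if_status_quo) > clear_margin
    return {
        "adopt_policy": adopt,
        "cancelled_market": "status_quo" if adopt else "policy",
    }


# Invented forecasts of GDP+ under each condition.
result = resolve_decision_markets(103.5, 102.0, clear_margin=1.0)
```

Once the decision is made, only one of the two markets survives, so the forecast difference that justified the policy can never be checked against an actual outcome.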

There are so many ways to be scared by this, it is hard to know where to begin. Few, if any, policies are adopted on the basis of a single metric or desired outcome, yet Robin Hanson is proposing that we do just that.  While it is true that he makes provision for the metric to be a composite of a variety of metrics, this doesn’t solve the problem.  Elected officials are in charge of defining the metric and its composition.  One can only imagine the intense lobbying efforts to influence the definition of GDP+ which could hinder the enactment of beneficial policies or promote harmful policies that should not be passed into law.

Invariably public policies have a variety of objectives.  Selecting one metric (even a composite one) to measure the success of all policy proposals is naïve and simplistic in the extreme.  The effect of any particular policy on the metric will not be observable.  The only way to observe the actual effect of a change in policy on the metric, is to hold all other things constant, which is, of course, impossible to do. 

Robin counters that it is only the difference between the two markets that matters.  However, once a policy has been approved, based on the difference between the status quo and the policy adoption decision markets, the status quo market is cancelled.  The policy adoption decision market always has been, and always will be, attempting to forecast the total national welfare measure (GDP+) assuming the policy is enacted, which is based on 20 or more years of future statistics.  In those intervening 20 years or so, many new policies will be enacted, and every one of them will be expected to improve the national welfare.  What are the odds of such a prediction market being able to accurately (and consistently) predict the actual national welfare that will be determined over a 20 year period?

I’m sure Robin will counter with the fact that prior to arriving at a policy decision, both decision markets were subject to the same uncertainty about the national welfare measure.  Of course they were.  This only means that both markets must have been equally accurate prior to the policy decision being triggered.  What are the odds?  How could we prove their accuracy?

We don’t have very many long-term prediction markets that can be tested.  David Pennock did look into the issue of calibration of long-term prediction markets on ideosphere.com, here, finding that they were, indeed, calibrated.  However, I commented on Midas Oracle about the problems with his conclusion as it relates to decision-making.  To summarize, David Pennock’s analysis looked at the calibration of long-term markets 30 days prior to settlement.  By that time, almost all of the uncertainty had been eliminated from the prediction.  We would be more surprised if the markets had not been well-calibrated.  Unfortunately, those same prediction markets were consistently inaccurate for the vast majority of the time they were actively traded.  They only became “accurate” as they neared settlement, when the actual outcome was about to be revealed. 

Unless prediction markets can be understood and developed to the extent that they are capable of consistently providing accurate predictions well in advance of the actual outcome, they will not be of any use, at all, for decision-making.  If the ideosphere.com markets are any indication (and they are), it appears that such speculative markets are not very good at predicting outcomes in the face of uncertainty.  Long-term policy benefits are subject to very high levels of uncertainty.  Consequently, the prospect of relying on these markets to guide policy decisions is dangerous, to say the least.  Chimps, even elected ones, might make fewer mistakes.

The National Welfare Measure

“A very simple definition of GDP+ would be a few percent annually discounted average (over the indefinite future) of the square root of GDP each period. A not quite as simple GDP+ definition would substitute a sum over various subgroups of the square root of a GDP assigned to that subgroup. Subgroups might be defined geographically, ethnically, and by age and income. (Varying the group weights might induce various types of affirmative action or discrimination policies.) A more complex GDP+ could include measures of lifespan, leisure, environmental quality, cultural prowess, and happiness.”

This is Robin Hanson’s description of the national welfare measure that would ultimately be used to assess whether the policies adopted were “good”.  In the design issues section of his paper, he discusses the possibility of basing the calculation on a 20 year period of national welfare figures. 
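To see what a trader would actually have to forecast, here is a rough Python sketch of the two simpler definitions.  It is my interpretation only.  I truncate the “indefinite future” to a finite list of years (say 20), I pick a 3% discount rate as “a few percent”, and I treat the sub-group version as a weighted sum of square roots.  The function names and all of the numbers are made up.

```python
import math


def discounted_average(values, discount_rate):
    """Average of yearly values, with year t weighted by (1 + rate) ** -t."""
    weights = [(1.0 + discount_rate) ** -t for t in range(len(values))]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def simple_gdp_plus(gdp_by_year, discount_rate=0.03):
    """Discounted average of the square root of GDP in each year."""
    return discounted_average([math.sqrt(g) for g in gdp_by_year], discount_rate)


def subgroup_gdp_plus(subgroup_gdp_by_year, group_weights, discount_rate=0.03):
    """Same idea, but each year's value is a weighted sum over sub-groups
    of the square root of the GDP assigned to that sub-group."""
    yearly = [
        sum(group_weights[group] * math.sqrt(gdp) for group, gdp in year.items())
        for year in subgroup_gdp_by_year
    ]
    return discounted_average(yearly, discount_rate)


# Invented figures: 20 years of flat GDP, then two equally weighted sub-groups.
flat = simple_gdp_plus([100.0] * 20)
split = subgroup_gdp_plus([{"north": 64.0, "south": 36.0}] * 20,
                          {"north": 1.0, "south": 1.0})
```

Even this toy version needs a 20 year GDP forecast for every sub-group, under the policy and under the status quo, before a trader can decide which way to bet.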

This is a lovely intellectual exercise Robin has embarked upon.  The vast majority of the individuals that Robin believes would take part in these speculative markets will not have a clue as to how to forecast GDP+, even in the very simple case.  Many will be perplexed as to how to discount future GDP+ figures.  The vast majority will be unable to calculate a square root of anything.  The intermediate complexity definition involves breaking down parts of the metric into sub-groups and applying weights.  We’re now down to a wee fraction of the public that might be considered “expert”-enough to make a considered forecast.  But Robin’s not finished: it could be even more complex, involving environmental quality, lifespan, leisure and a host of other highly subjective factors.  Even the best actuaries will have difficulty here.  Continuing, there is no turning back from globalization, so any definition must take into account the effects of policy changes on foreigners (and other countries’ policy consequences to us).  Finally, no country stands still in time.  Demographic changes will have to be built into the metric definition.  The meek shall inherit the earth, but only if they are fully accredited actuaries!

We can’t be too hard on Professor Hanson; after all, it is a noble cause.  It’s just that, as I noted at the beginning, it belongs more in a Star Trek episode than it does in an academic paper.  It’s just so out there.

Design Issues

In this portion of the paper, Robin Hanson outlines 33 design issues that might prevent the new institution, called futarchy, from operating successfully.  Some appear to be relatively minor concerns, given the discussion points raised so far, so I will focus on those that appear most crucial.  Note that Robin phrases the issues in terms of objections to futarchy.

The Rich Would Get More Influence 

Should the rich be able to undermine the accuracy of the prediction markets, Robin proposes to tax them more (a market distortion) or limit how much each person can trade in a market (another distortion).  Robin thinks that the market forces will see to it that the rich do not have as much influence as they have now, because they will not have proportionately more or better information than the speculators.  Robin’s belief in market forces is unwavering.  As we shall see later, this is a very naïve view.

One Profits Little by Supporting Unlikely Proposals

Here, Robin considers the case where you think you have a strong proposal, but few others agree, holding down the welfare measure such that the policy is never adopted.  It seems unfair that you never get rewarded for your good policy, and they are never penalized for “your being right.” 

In this case, Robin suggests (and he is probably correct) that all political systems suffer from this problem.  Consequently, it may be possible to get the policy implemented on a smaller, local scale and keep trying to convince others that the larger proposal has merit.  One can only wonder as to who might possess the resources to embark on this course of action.  As we will see, later, there is a large cost of proposing a policy initiative. 

OR… could it be that you are wrong and deserve not to have the policy adopted?  OR…  could it be that the uninformed or the manipulators are able to set the market price with their “incorrect” information?  Robin doesn’t believe it is possible for manipulators (or uninformed “noise” traders) to “game” speculative markets, so it can’t be the latter possibility.  In fact, he goes so far as to say that manipulators make the market more accurate.  Maybe the market is working properly after all by preventing you from “being right.”  Maybe you’re not “right”.  OR… maybe manipulators can game these markets.  I think they can, as explained here, here, here and here.  These references apply to several points that follow regarding manipulation of speculative markets.

Some Markets May be too Thin

Robin considers that some markets may be too thinly traded to arrive at accurate estimates, making it possible for a few traders to push the market to favor a bad proposal.  His answer is to assume that pro and con traders are similarly funded, so that each side will try to influence the market, eliminating the thin market condition.  Alternatively, he assumes that the speculators would find out that one side was willing to manipulate the market and make trades to counteract the manipulation.

As I noted in my post, these are highly unlikely assumptions.

One Rich Fool Could Do Great Damage

Here, Robin considers the case where Bill Gates might try to manipulate the market.  If speculators knew which way Bill Gates was trying to move the market, they could easily counteract his trades, as it is assumed that, collectively, they have much more power than he.  Even Robin agrees that it is more likely that the speculators would allow the price to be pushed somewhat by Mr. Gates, because they would assume that Bill knows something that they do not.

People Could Buy Policy Via Trades

Similar to the “Rich Fool” situation, Robin claims that someone could not buy a policy by making the “right” trades, because other traders will only let prices move when they suspect that this new trader has new (accurate) information.  Robin states that if the other traders, with deep pockets, are able to clearly observe a particular person is trying to manipulate the market, they will not allow the price to change.  Failing to possess such oracle-like market knowledge, the other traders need only know the total quantity and direction of the noise trades in order to make their corrective trades.  Even if the other traders do not know the direction and strength of the manipulation and they are unsure as to whether the manipulator has relevant information, the manipulator’s trades will merely add a bit of noise to the market price.  The sheer weight of the other, informed, traders will nullify the effects of the manipulator’s trade. 

I refer to my posts on manipulative trading, above. 

Corrupting the Welfare Measurement Metric

It is possible that the measurement of the metric that is being forecast could be corrupted to influence the policy decision.  This can be counteracted by having multiple estimates of the metric and using the median estimate as the official one.  I agree, except that, just as we have auditors attest to corporate financial statements, we will need appropriately trained, independent “auditors” to ensure the accuracy of the national welfare measure.
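
To see why a median is suggested for this, here is a minimal sketch in Python.  The function name and every figure in it are hypothetical, chosen only for illustration.  It shows that one corrupted estimate barely moves the median while it drags the mean noticeably.

```python
from statistics import median


def official_metric(estimates):
    """Return the median of several independent estimates of the welfare metric."""
    if not estimates:
        raise ValueError("need at least one estimate")
    return median(estimates)


# Hypothetical figures: four honest estimates, then one corrupted outlier added.
honest = [101.2, 99.8, 100.5, 100.1]
corrupted = honest + [140.0]

print(official_metric(honest))           # median of the honest estimates
print(official_metric(corrupted))        # median with the outlier included
print(sum(corrupted) / len(corrupted))   # the mean is pulled much further
```

Note that the median only protects the measure while fewer than half of the estimates are corrupted, which is one more reason the estimators would need to be independent.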

Welfare Metric Definition

The welfare metric must be defined independently from the policy process.  It is a simplified summary of the values voted upon by the electorate.  Government representatives could improperly influence the definition of the welfare measure.  Robin raises the issue in terms of manipulation designed to support a specific policy proposal. 

In addition, there are likely to be substantial lobbying efforts directed at components of the welfare measurement that are detrimental to powerful interest groups.  For example, large carbon emitters and polluters would seek to minimize the impact of their negative externalities on the welfare measurement, which would lessen the likelihood of punitive legislation coming into force.  If we think lobbying is a problem now, just wait.

Defining When a Market “Clearly” Estimates

Basically, this means determining when the market becomes accurate.  Essentially, Robin considers the need for taking a conservative approach, which would require a minimum of one year of a consistently “clear” price differential, followed by a one or two week (continued) price difference for policy approval to become effective.  It is a good idea to make sure that the market consistently indicates a policy will be beneficial before implementing it.  One major problem is that long-term prediction markets are notoriously inaccurate until shortly before the outcome is revealed (as discussed above).  Do we really want to take chances in setting public policy, based on long-term prediction markets that are completely unproven and most likely inaccurate at the time the decision is made?

Institutional Costs

It is costly to evaluate proposals, so there must be a framework to limit the flow of new proposals.  Robin suggests a fee to be paid to have a proposal considered (which would be refunded or rewarded if the proposal is adopted).  The fee might be set at $10 million (or $10,000), but could be reduced by a subsequent policy change proposal.

Interesting that Robin wants trading input from the public, but most assuredly wishes to exclude them from the proposal process.  Only the rich, corporations and special interest groups will have deep enough pockets to initiate proposals.  It ignores the fact that at least part of the responsibility of our government is to identify issues, propose solutions and implement policies for the benefit of society.  Granted, there are precious few examples of governments setting policies to prevent or avert future problems, but how might futarchy make policy setting more effective in this regard?

What about emergency policies?  Surely, these must be exempt from the process.  Assuming they are, what is to prevent the government, the rich, the corporations and the special interest groups from adopting a do-nothing policy until an issue becomes so acute that an emergency policy is required?  Well-oiled lobbying machines will kick into gear, giving us the same, broken process for setting policy.

Fixing Bad Decisions

Here, Robin addresses the issue of a “bug” in the welfare function, probably due to oversimplification.  The elected government must have the power to amend the welfare function and/or reverse the policy decision.  Unfortunately, the process may be too slow to avoid substantial harm and it may be quite expensive to undo a policy. 

Robin proposes that once a policy proposal has been approved, it could be vetoed within the next year, if another market “clearly” estimates bad welfare consequences, using the welfare metric as defined in one year.  That is, he’s proposing an appeal process for policymaking.  Those with the deepest pockets will be in control of veto powers (or at least substantial delaying powers).  Lobbyists will have immense incentives to influence the welfare metric.  Business as usual.

It Seems “Hard” to make one Measure Encode all of our Values

It’s not just “hard”, Robin, it’s downright impossible.  You propose a simplified measure, initially, that would be incrementally amended over time, by the elected representatives.  Lobbyist heaven!

Even your most complex measure of welfare is, still, a remarkable simplification of “national welfare”.  Values in one part of the country will be different from those in another, on many key issues.  At best, “national welfare” will be an “average” of the values held by the citizens.  Every policy decision involves tradeoffs, and one could argue that every policy is different in this respect.  Yet, the national welfare definition “hard-wires” the same tradeoffs for all decisions.  This is far too simple.  I’ll stop here, as this could be the topic of an entire book (and we may not ever need to know the “answer”).

Other Considerations

Budget Constraints & Policy Adoption Ranking

Under futarchy, as long as it is clearly shown that a proposed policy would improve national welfare compared with the status quo, the policy is to be adopted.  You don’t have to be much smarter (if at all) than a chimp to understand that no nation would be financially able to implement every policy that met this standard for adoption.  Simply put, there are budget constraints, now and in the future.  All but the simplest of policies involve financial commitments in the future.  Accordingly, policies adopted in the current period will have budget implications in future years, which will limit the ability to adopt future policies that may be proposed (and that should be adopted).  Futarchy makes no mention of budget constraints.

Consequently, there must be a method of ranking policies that are slated for adoption, so that the most beneficial policies are adopted ahead of weaker (though beneficial) ones.  Given the multi-year aspect of all policies, there must be some consideration of the effect of a policy’s adoption on the budget resources of future years, which may prevent the adoption of future policies (either under futarchy or in emergencies).

If a policy is slated for adoption, based on the decision markets, but it cannot be adopted under a budget constraint, then both decision markets need to be voided - the policy adoption market and the status quo market.  Futarchy makes no mention of this possibility and the potential effects it may have on the decision markets.  I wouldn’t even hazard a guess at this point.

Complex Trader Forecasting

Futarchy assumes that if all available, relevant, information is made available to the public, speculators will be able to discern fact from fiction and forecast the national welfare measure accurately.  This assumes that there are a sufficient number of informed traders that have a very good understanding of the issues and information and that they have decision models able to make accurate predictions.  I’m reminded of the super-human, computer-brained, all-knowing beings that I met during neoclassical economic theory classes.  I thought they had died off, but apparently, they’re back!

Forecasting national welfare under futarchy is an incredibly complex problem.  I don’t think it is even possible for speculators to make reasonably accurate forecasts of national welfare.  They simply do not possess the knowledge or understanding, let alone a decision model, that would allow them to make accurate predictions.  Even if the institution of futarchy provides speculators with forecasts and asks them to bet on the most likely one, they still do not have the necessary tools to make that decision. 

If the traders don’t have enough information to make an accurate forecast, the market will not create it.  Prediction markets merely aggregate available information held by the participants; they don’t create new information through trading.  Prediction market proponents understand that each trader’s prediction is an “accurate” estimate combined with an “error” factor.  The assumption is that the errors cancel out, leaving only the accurate information reflected in the market price.  I think this is likely to be true, but not in every case.  Where the individual errors are large, relative to the known, accurate information, the predicting algorithm is likely to break down.  If you were to consider a large number of traders, each with a very small amount of information, it is highly unlikely that the market will function like a jigsaw puzzle, putting all the “good” pieces together and cancelling the “errors”.  The large error factors will prevent any algorithm from generating a reasonably accurate prediction.

For example, if we were to run a decision market for a policy designed to combat global warming, the forecast would be wildly inaccurate.  The participants simply do not have enough information to make a reasonable forecast.  The market will not create any information that is not already possessed by the traders.  Yet, the market will look the same as an “accurate” prediction market.  Even worse, it is not possible to determine whether the market is accurate.
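
The “errors cancel out” assumption discussed above can be probed with a small simulation.  This is only a sketch under stated assumptions: a plain average stands in for the market mechanism, errors are drawn from normal distributions, and the function name, the `shared_sd` parameter (an error component common to all traders) and all of the numbers are hypothetical.  It is not a model of any real prediction market.

```python
import random
from math import sqrt


def crowd_rms_error(n_traders, noise_sd, shared_sd=0.0, trials=200, seed=0):
    """Root-mean-square error of a simple average of trader estimates.

    Each trader reports truth + shared error + private error.  The private
    errors are independent across traders; the shared error is the same for
    every trader within a trial (an assumption made for illustration).
    """
    rng = random.Random(seed)
    truth = 100.0
    squared_errors = []
    for _ in range(trials):
        shared = rng.gauss(0.0, shared_sd) if shared_sd > 0 else 0.0
        estimates = [truth + shared + rng.gauss(0.0, noise_sd)
                     for _ in range(n_traders)]
        crowd = sum(estimates) / n_traders
        squared_errors.append((crowd - truth) ** 2)
    return sqrt(sum(squared_errors) / trials)


# Independent private errors shrink roughly with the square root of the crowd size.
print(crowd_rms_error(n_traders=500, noise_sd=10.0))
# A small crowd with very noisy private information stays far from the truth.
print(crowd_rms_error(n_traders=25, noise_sd=50.0))
# An error shared by everyone does not cancel, however large the crowd.
print(crowd_rms_error(n_traders=500, noise_sd=10.0, shared_sd=5.0))
```

Under these assumptions, averaging removes only the independent part of the error.  Whatever the traders collectively do not know, or collectively get wrong, remains in the aggregate.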

Uncertainty

There will always be random events that influence the actual outcome.  If markets are “efficient”, it is not possible to predict the effects of future random events on the outcome, based on the information held today.  Prediction markets reflect the level of uncertainty about the actual outcome by providing a distribution of outcome predictions.  When uncertainty is high, the distribution will be relatively flat.  As uncertainty is reduced, the distribution will tend to be tighter.  No prediction market can fully eliminate uncertainty surrounding the actual outcome being predicted. 

To some extent, the longer the time between the prediction (forecast) and the actual outcome (national welfare measure), the greater the uncertainty.  Consequently, most decision markets are likely to exhibit a fairly flat distribution of forecasts at the time the decision will be made.  While Robin Hanson disagrees with me, I believe that such markets are much more likely to be gamed by manipulators.  Furthermore, even if these markets are well-calibrated, they will not forecast the actual outcome, accurately, very often.

Decision Markets vs. Prediction Markets

Back in May, 2009, Mencius Moldbug posted Futarchy Considered Retarded on his blog, Unqualified Reservations.  It was an interesting smack-down of futarchy.  One point he made (among the 7,400 words) was that prediction markets are a fine idea, but decision markets are retarded.  I found this to be an odd comment, because all prediction markets are decision markets.  His distinction didn’t support his argument, and it clearly confused a number of commenters on his blog site and Robin Hanson’s, Overcoming Bias, when he posted his Reply to Moldbug.

Apart from frivolous applications of prediction markets, they all generate predictions about an outcome, and the prediction is used in some decision model to make a decision.  In this sense, they are decision markets.  Robin Hanson uses decision markets to mean a pair of prediction markets that work together to predict the difference between two predictions.  Typically, the difference is the effect of implementing a particular policy (or decision).  Futarchy goes one step further to hard-wire the decision markets to a hard-coded decision model. 

If prediction markets are fine, so are decision markets, but futarchy is still retarded.
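
The “pair of prediction markets” idea described above can be written down in a few lines.  This is a sketch only: the function name, the `margin` parameter (standing in for whatever counts as a “clear” difference) and the prices are all hypothetical, not taken from Robin Hanson’s paper.

```python
def decision_from_markets(welfare_if_adopted, welfare_if_status_quo, margin=0.0):
    """Compare two conditional market forecasts of the welfare measure.

    Returns the hard-wired decision and the estimated policy effect.
    The policy is adopted only if the estimated effect exceeds `margin`.
    """
    effect = welfare_if_adopted - welfare_if_status_quo
    decision = "adopt" if effect > margin else "keep status quo"
    return decision, effect


# Hypothetical market prices for the welfare measure under each branch.
print(decision_from_markets(105.0, 100.0, margin=2.0))
print(decision_from_markets(101.0, 100.0, margin=2.0))
```

The sketch makes the hard-wiring visible: nothing but the two prices and a threshold enters the decision, so any error or manipulation in either price passes straight through to policy.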

Information Asymmetry

Mencius Moldbug made the following point:

“A prediction market, like any other market, functions only in the general absence of asymmetrical information. It is with some pain that I absorb the realization that a member of the George Mason School is unable to correctly apply this concept. … The rational approach to a market in which other players have more information than you is not to play. … This is one of the many reasons why insider trading is illegal.”

Robin replied, correctly, that virtually every market has information asymmetry, to some extent.  Markets still function, albeit not perfectly.  Only in cases where the asymmetry is severe is it possible that the market will cease to exist, and even then, over time, such markets seek to reform their institutions to alleviate the information asymmetries.  Moldbug’s assertion is a bit naive, relying much too heavily on the theoretical effects of information asymmetry in markets.  It is a wonderful, logical theory, but it is about as useful as the neoclassical framework for analysing real world markets.

“After the Fact it is Quite Easy to Test Forecast Accuracy”

Robin Hanson stated this in his reply to Moldbug. 

I find this to be a surprising statement by Robin.  It is not “quite easy to test for forecast accuracy” after the fact.  This involves measuring the degree of calibration between the market distribution and that of the outcome.  In fact, given the uniqueness of the outcomes being forecast, it is nearly impossible to measure calibration.  The best we can hope for is to estimate calibration of specific types of prediction markets with some set of homogeneous (more or less) outcomes.  Without calibration, a necessary condition, it is not possible to pass judgement on the accuracy of a prediction market.  Simply arguing that because one prediction market (pick one) possessed the calibration condition, all prediction markets must have it, is simplistic, without any support, and just plain dangerous.

Consider also that Robin Hanson is looking at a 20+ year measurement of the outcome for most public policy decision markets, under futarchy.  At best, there is a tremendous time lag (20 years or more) before it would be possible to test the calibration of any decision markets.  Remember, David Pennock’s analysis involved the calibration of markets 30 days before settlement.  To argue that these markets will be as well-calibrated (and accurate) as horse race betting markets is a ridiculous assumption.  Race track bettors at least read a racing form before making their bets.  In decision markets, we are merely pointing the chimps toward the dart board.

Conclusion

Robin Hanson doesn’t really give us his conclusion in the paper, but we can infer that he thinks futarchy is “promising”, based on his handling of the 33 design considerations and the list of next steps in the evolution of futarchy.  Further support comes from his op-ed piece in August, 2009 and his upcoming futarchy debate with Mencius Moldbug on January 16, 2010.

My conclusion is that futarchy has no chance of success, whatsoever.  It is a hopelessly flawed concept, even if its aim is true.  Decision-making, especially public policy decision-making, cannot be done properly with such a simplistic process.  Inevitably, important considerations are left out of the decision, leading to bad decisions.

Robin believes that the information necessary to make good decisions exists, but that it has not been aggregated accurately.  I do believe that this is at least partly true.  However, I also think a large portion of information that is needed to make proper decisions does not presently exist.  Perhaps more of our resources should be directed to uncovering the missing information. 

In particular, market prices in the real world do not reflect externalities from economic activity.  Current proposals for a carbon tax or for cap and trade are attempts to include the cost of carbon emissions in economic decision-making.  If successful, either of these policies would have an impact on market prices for all goods throughout the economy, reallocating scarce resources to better economic uses.  Placing values on pollution, fresh water and other critical resources might be a far more important solution to the information problem in public policy decision-making.  That’s my “out there” idea for the decade to come.

Posted by: Paul Hewitt | January 1, 2010

Tories to Pay Dearly for Common Knowledge

“Maybe the Tories are so out of touch they don’t know what’s out there, but they shouldn’t waste £1m of public money reinventing the wheel.” – Jenny Willott, Liberal Democrats’ spokeswoman.

On Wednesday, December 30, the Guardian.co.uk reported that the Tories announced a new offer to pay £1m for the development of an online platform to harness the “wisdom of the British crowd” to solve problems related to British governance.  Recognizing that the collective knowledge of the British people is much greater than that of a “bunch of politicians”, the Tories believe that such a platform will generate solutions to vexing problems.  If it works, the “winning” entry will receive the payout.  However, it isn’t quite clear what needs to be developed in order to win.

Based on the few “starter” problems that might be addressed by the new platform, it appears that they are looking for an idea pageant.  But these are readily available, now.  Hence, the reinventing the wheel comment.  Perhaps the Tories think it needs to be large enough to accommodate all British citizens – it doesn’t.  Once you have a crowd, a bigger one isn’t much better.  The Tories show their lack of understanding about how information aggregation markets work.  They require incentives for participants to reveal their private information.  Maybe most of the funds should be devoted to rewarding those that come up with the winning ideas and those that recognize (and bet on) a good idea when they see one.

While it may sound a bit wacky at first, there is a lot of potential.  It is sure to generate more good ideas than are being developed by the government on its own.  Without having to pay exorbitant consulting fees to generate garbage ideas, it is sure to be cheaper than their current problem solving process.  Over time, the idea market will come to recognize the top idea creators and those who are able to recognize them.  Maybe these people could form a future, wiser government? 

At least they could operate a legal, real-money, betting market if they choose to go that route.  A caution:  even if the idea pageant (or idea market) works, it will be up to some intelligent life form within the government to make sure that good ideas don’t go to waste through bad implementation. 

Two final points.  One, this will not be an example of public prediction markets.  It will be an example of information aggregation, but there is no prediction involved.  Two, it is, perhaps, the best possible use of an information aggregation framework for helping governments improve their decision-making.  In my next post (or two), I will turn my attention to the other information aggregation framework for good governance, the “retarded” futarchy of Robin Hanson.

Posted by: Paul Hewitt | December 21, 2009

The Essential Prerequisite for Adopting Prediction Markets

Prediction markets have been promoted as the best thing since “sliced bread” for forecasting future outcomes and events.  The truth is that the case has not been made to justify this position.  Today’s post will examine the necessary prerequisite for adopting prediction markets, and build a case for the seemingly incongruous conclusion that more prediction markets need to be put into practice now.

I have been very interested in the potential of prediction markets to accurately predict future events and outcomes, but I have been equally frustrated that not only is there very little proof that they work “as advertised”, very few researchers are even looking at the issue of accuracy.  It is as if the vendors and leading academic proponents simply repeat (over and over) a few past “success” stories, quickly conclude that prediction markets work (and if one works, they all work), and proceed to describe their newest application.

Robin Hanson and others have advocated the use of prediction markets, where they can be shown to be better than alternative methods of forecasting future outcomes.  It is hard to take exception with this statement, other than to question how one might implement it.  That is what prompted today’s paper.

“Better”

In order to be considered “better” than an alternative forecasting method, a prediction market must generate a marginal net benefit by forecasting more accurately than the next best alternative method.  This implies that a more accurate prediction causes the decision-maker to choose a different course of action than the one that would have been chosen had a less accurate prediction (or forecasting) model been relied upon.  Not only that, the better course of action must generate a net benefit.

So, the prediction must be materially more accurate than the alternative forecasts.  That is, the improvement in accuracy must be large enough to cause the decision-maker to change his or her decision.  The decision-maker must be able to choose a more beneficial course of action (it must exist as a possible action).  Finally, there must be sufficient time to implement the better course of action.  Most of the real world prediction markets have been unable to meet these conditions.  The HP markets showed that there was some potential for prediction markets, but none of the pilot markets generated materially more accurate predictions than the official forecasts.  The General Mills prediction markets, using much larger crowds than the HP markets, were no better than the internal forecasts, either.

It is very questionable, at this point, whether it is possible to achieve accurate predictions from markets sufficiently far in advance to implement more beneficial courses of action.  There are very few long-term prediction markets, and even these are wildly inaccurate until very close to the actual outcome revelation.  Operating a long-term prediction market is pointless unless it is possible to take some beneficial action based on an accurate prediction.  One can only imagine the harm that could be caused by basing public policy on an early prediction of a long-term market, only to find that the policy was completely inappropriate.  Until their advocates work out this little problem with long-term prediction market accuracy, these markets should never be used to support any important decisions. 

Most of the real world prediction markets are very short-term in scope.  Even when they provide accurate predictions, in most cases, it is almost impossible for the decision-maker to make any significant changes to the course of action.  We can see this in the General Mills prediction markets (follow link above), where the markets only arrive at accurate predictions during the second month of a two month sales forecasting problem.  Not actionable.  Hardly useful information.

One exception to this general observation is the case of markets to predict project milestone completion dates.  The reason that these markets offer some promise is that decision-makers can use this information profitably on a daily basis.

Calibration

So, how do we determine whether a prediction market is accurate?  David Pennock helps us out by stating, “the truth is that the calibration test is a necessary test of prediction accuracy.”  As he comments, this is a necessary condition for statistically independent events.  The problem with this definition is that calibration is impossible to prove.  The best we can do is empirically estimate the calibration of a large number of similar prediction market predictions with the distributions of similar outcomes.  To date, no one has researched the calibration of specific prediction markets in any useful way.  True, there have been studies of horse race betting markets that have shown a very strong calibration with actual horse race outcomes, but this only proves calibration of these types of pari-mutuel markets.  Such results indicate that it may be possible to obtain well-calibrated prediction markets, but they certainly do not prove that such markets are, in fact, calibrated.

For more information about this, please refer to my previous post on calibration, here.

Why does calibration matter? 

As the number of uncertain future outcomes (or events) grows, they form a distribution, which provides us with the likelihood of each outcome occurring.  If we knew the distribution of actual outcomes before one occurred, we could make an optimal decision.  We would choose to base the decision on the most likely outcome.  This does not mean that we would always be right.  In fact, if we were to make this decision a number of times, we would only expect to be “right” about the same number of times as the likelihood of that outcome occurring would suggest.  But this is a hypothetical example where we know the actual distribution of the outcomes.  In order to make an optimal decision in the real world, we would like to find a method of estimating the distribution of actual outcomes.  The better the estimate, the better the decision-making result.

Some situations involve outcomes that are discrete and have no relationship between the alternatives.  Examples might include the selection of a future Olympic host city, the winner of a horse race, or who will win a contest.  Decisions involving these types of problems require a very high percentage of correct predictions, in order to be useful.  Since there is no relationship between the possible outcomes, it is not possible to “just miss” and be “almost right”.  Coming close is no good at all.  We’re still dealing with a distribution of outcomes, and we will still base our decision on the most likely outcome, but unless one of the possible outcomes has a high likelihood of occurrence, we are likely to be wrong more often than we are right, even when the prediction distribution is accurate.  The higher the likelihood of one outcome occurring, the less uncertainty there is about the outcome. 

Such discrete outcome situations are problematic for prediction markets.  The only way to minimize the percentage of incorrect decisions is to predict outcomes that have very little uncertainty associated with them.   If one of the outcomes is a near “sure thing”, we don’t need a prediction market to figure this out!  One potential use of prediction markets for these types of problems is to provide a ranking of the possible outcomes.  The decision-maker would make a decision based on the most likely outcome and develop contingency plans for other reasonably likely possible outcomes. 
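
The arithmetic behind the discrete-outcome problem is short enough to sketch.  The function names and the probabilities below are hypothetical, and the market distribution is assumed to be perfectly calibrated, which is the most favourable case.

```python
def rank_outcomes(distribution):
    """Sort (outcome, probability) pairs from most to least likely."""
    return sorted(distribution.items(), key=lambda item: item[1], reverse=True)


def hit_rate_of_best_guess(distribution):
    """Long-run share of correct calls when always betting on the most likely
    outcome, assuming the distribution is perfectly calibrated."""
    total = sum(distribution.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return max(distribution.values())


# Hypothetical market distribution over four unrelated outcomes.
host_city = {"City A": 0.40, "City B": 0.30, "City C": 0.20, "City D": 0.10}

print(rank_outcomes(host_city))           # ranking for contingency planning
print(hit_rate_of_best_guess(host_city))  # share of decisions that turn out right
```

Even with a perfectly calibrated distribution, the decision based on the front-runner here would be wrong more often than right, which is why the ranking (and contingency plans for the runners-up) may be the more useful output.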

Many outcomes are points along a continuous variable, such as dates (on a time line) or sales volumes (part of all possible sales volumes).  In these types of situations, making decisions based on a reasonable range surrounding the most likely outcomes may be quite acceptable.  It depends on the tightness of the distribution and the sensitivity of the decision to the outcome being relied upon.  That is, if the decision would not change when the outcome falls within a certain range, and the outcome can be expected to fall within this range a high percentage of the time, the risk of a “wrong” decision will be minimal.

The more closely the distribution of predictions matches that of the actual outcomes, the more often the prediction market will provide an accurate prediction of the actual outcome.  This is not to say that the prediction market will always be correct.  It only says that it has the greatest chance of being correct most often.  Consequently, over a large number of trials, a well-calibrated prediction market will generate the best overall results from decisions that rely on the market predictions.

A prediction market provides a distribution of predictions around a mean market prediction.  Most decisions would be made based upon the mean market prediction.  If the market is calibrated with the distribution of actual outcomes, this will maximize the number of occasions that the decision will be correct, based on the actual outcome.  Furthermore, in non-discrete outcome cases, coming close to the predicted outcome will be the next most likely outcome to occur.  Coming close may be good enough.

Comparing Forecasting Methods

Our original problem was to determine whether a prediction market is better than another method in forecasting an outcome.  Now that we know a bit about distributions and calibration, we can proceed.

Most forecasting methods provide subjective distributions of forecasts, if they provide any at all.  Prediction markets offer a significant improvement over other forecasting methods, by providing an objective distribution of predictions, which can be compared with the distribution of actual outcomes.  This gives us the possibility of measuring the calibration accuracy of a prediction market, if we can obtain enough data points to consider.  At least it is possible.  Most other methods can create a rough distribution of possible outcomes which may be tested for calibration.  A good example is a sales forecast with “worst case”, “most likely” and “best case” scenarios.  Likelihoods would be applied (subjectively) to create a rough distribution of possible outcomes.

Next, we need a fairly large number of trials.  This is a problem for almost every type of prediction market we may wish to consider.  Technically, each outcome or event is unique.  We can’t obtain a large number of trials for a particular outcome.  However, maybe we can obtain a larger number of trials for a set of homogeneous prediction markets and outcomes.  Ideally, each prediction market should have approximately the same “crowd” of participants and be attempting to predict the same type of variable outcome, such as quarterly sales of a product.  Another crowd could predict project completion dates, etc…

After a reasonable number of trials, we would measure how well the distribution of predictions matched the distribution of actual outcomes.  That is, across all of the prediction markets, prediction ranges that had, say, a 10% probability of occurrence should capture the actual outcome 10% of the time.  If this is true for all (or most) of the prediction probabilities, we can conclude that type of prediction market is “well-calibrated” and may be used for future predictions of that type, using that “crowd” of participants.  Of course, we would also measure the calibration of the distributions (however crude) from the alternative methods.  Whichever method consistently develops the best-calibrated distribution of predictions should be the primary information model for that particular type of decision-making.  This doesn’t necessarily mean that you can drop all of the other forecasting methods.  These other methods may be generating the information that is being aggregated by the prediction market.  If we were to eliminate the source of critical information, the prediction market may not be as accurate.  In both the HP and the General Mills markets, some or all of the prediction market participants were also part of the internal forecasting process.  At HP, it appears that the markets were better predictors of the internal forecast than they were of the actual outcome.
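
The calibration check described above can be sketched as a simple table.  This is a minimal illustration, not a method taken from any of the studies mentioned: the function name, the bin count and the sample data are hypothetical.  Forecast probabilities are grouped into bins, and the average forecast in each bin is compared with how often the forecast event actually occurred.

```python
from collections import defaultdict


def calibration_table(forecasts, outcomes, n_bins=10):
    """Group forecasts into probability bins and compare with observed frequency.

    forecasts: probabilities in [0, 1]; outcomes: 1 if the event occurred, else 0.
    Returns rows of (bin_low, bin_high, count, mean_forecast, observed_frequency).
    """
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    bins = defaultdict(list)
    for p, hit in zip(forecasts, outcomes):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[index].append((p, hit))
    table = []
    for index in sorted(bins):
        pairs = bins[index]
        mean_forecast = sum(p for p, _ in pairs) / len(pairs)
        observed = sum(hit for _, hit in pairs) / len(pairs)
        table.append((index / n_bins, (index + 1) / n_bins,
                      len(pairs), mean_forecast, observed))
    return table


# Hypothetical data: ten forecasts of 15% and ten forecasts of 75%.
forecasts = [0.15] * 10 + [0.75] * 10
outcomes = [1] + [0] * 9 + [1] * 7 + [0] * 3

for row in calibration_table(forecasts, outcomes):
    print(row)
```

With only ten trials per bin, as here, the observed frequencies are far too noisy to support any conclusion, which is exactly the data problem noted above: a meaningful table needs many comparable markets from the same “crowd”.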

Every “crowd” is different, and each type of outcome has unique information required to make a reasonable prediction.  Consequently, it would be ridiculous to assume that, because one prediction market is considered accurate, all prediction markets are accurate.  Yet, this is exactly what we are told on vendor web sites, and worse, by academic researchers.  It can probably be taken as a “fact” that horse race pari-mutuel markets are well-calibrated, so it is not surprising that we find almost everyone assuming that these markets are accurate.  Add a tie-in about how similar pari-mutuel markets are to prediction markets, and we’re halfway home.

A few prediction market successes in political election markets and one “success” in enterprise prediction markets are trumpeted, in just about every academic paper on prediction markets, as evidence that prediction markets are “more accurate” than alternative forecasting methods.  On the basis of a mere handful of prediction market success stories, they conclude that prediction markets are the future of forecasting.  This is simply wishful thinking and leads one to question the motives of those who continue to promote a model that they know (or ought to know) is not nearly as accurate or useful as they claim and has precious little proof that it works for each type of promoted application.  The worst part about this is that the research has slowed to a trickle.  There seems to be no need to prove that prediction markets work.  It has already been done.  Now it is all about getting an application on the market.

By now you may be thinking this guy really has it in for prediction markets.  They’re nothing but high-tech “snake oil” and the sooner these defective products are removed from the market the better.  Fair enough.  I do think that the vast majority of prediction markets could be categorized as “snake oil”.  Completely unproven.  However, I do think they have some potential to improve decision-making in enterprise applications.

Since the only way to determine the accuracy of a prediction market is to measure how well its predictions are calibrated against the distribution of actual outcomes, we need to focus on calibration.  The only way to measure calibration is empirically.  Since this will require as many trials as possible, I am actually going to advocate that the use of prediction markets be promoted, even though there are few benefits right now.  As they are promoted, clients must be told that the markets aren’t proven, yet, but that there is a possibility that they will develop into very useful tools in the future.

Since calibration is not a characteristic of prediction markets in general, we need to assess calibration for each type of market and for each “crowd”.  That is an awful lot of work, but without it, prediction markets are nothing more than a crap shoot.
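To give a rough sense of why so many trials are needed for each type of market and each crowd, here is a small sketch of the sampling error in an observed hit rate.  It assumes the trials are independent and uses the ordinary binomial standard error; real markets run by the same crowd are unlikely to be fully independent, so treat the numbers as optimistic.  The function name is mine.

```python
import math

def hit_rate_standard_error(p, n):
    """Standard error of the observed frequency when an event with true
    probability p is checked over n independent trials (binomial model,
    an assumption made for illustration)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(p * (1.0 - p) / n)

# How noisy is the observed frequency for ranges priced at 10%?
for n in (20, 100, 900):
    se = hit_rate_standard_error(0.10, n)
    print(f"n={n:4d}  standard error={se:.3f}")
```

With 20 trials the standard error on a 10% bucket is about 0.067, so an observed frequency anywhere from near 0% to over 20% would not be surprising.  It takes about 900 trials to bring the standard error down to 0.01, and that is for a single probability bucket in a single type of market.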

Posted by: Paul Hewitt | December 1, 2009

Measuring Decision Market Accuracy

I came across this post: On Prediction Markets for Climate Change by Rajiv Sethi, an economics professor at Columbia University.  In his post, he makes a very interesting point that I have yet to see in any research paper about prediction markets.  He was commenting on the recent debate between Matt Yglesias and Nate Silver, regarding the use of prediction markets to help guide policy about climate change.  By way of a very brief summary, Matt believes that big business (coal and oil) will manipulate the market to influence the setting (or not) of policies that would be detrimental to their interests.  Nate thinks this is rubbish.  If the markets are broad-based and have sufficient liquidity, attempts to manipulate the market price will not succeed.  Nate thinks the markets would be “efficient”, providing market prices that accurately aggregate available public information.

Compelling Logic?

Here is where it starts to get interesting.  Rajiv comments that the logic of Nate Silver’s position is so compelling, it simply must be true.  That is, broader participation and more liquidity make for efficient markets that generate more accurate prices.  To his credit (and I might add that he seems to be the only one), Rajiv set out to see whether this holds up in the real world.  He used Intrade and IEM markets about the 2008 election.  The hypothesis was that the IEM markets, with a more limited base and lower trade volumes, should have been less efficient than the Intrade markets.  Instead, he found the opposite!  Compelling, indeed.

How Do You Measure Efficiency?

“First of all, let’s think for a minute about how one might determine which of two markets is aggregating information more efficiently. We can’t just look at events that occurred and examine which of the two markets assigned such events greater probability, because low probability events do indeed sometimes occur.   If we had a very large number of events (as in weather forecasting) then one could construct calibration curves to compare markets, but the number of contracts on IEM is very small and this option is not available. So what do we do?”

This paragraph from Rajiv’s post summarizes the problem of determining whether a market is “accurate”.  We believe that if a market is well-calibrated, the distribution of its market prices will be “accurate”, reflecting all market information about the outcome.  Consequently, it will be described as “efficient”.  He points out the difficulty (in most cases the impossibility) of measuring the calibration of a market and asks “what do we do?”

Essentially, he comes to the conclusion that it is impossible to measure the efficiency of a market.  However, it is possible to say which market is more efficient.  In other words, we can determine relative efficiency of two markets.  He outlines a cross-market arbitrage mechanism that could be used to eliminate price differentials for identical contracts in different markets.  You can read the approach in his post, cited above.  While he did not actually run the arbitrage experiment, he did perform an informal test to determine which of two markets was more efficient. 

When the prices of identical contracts in two markets diverge and later converge, the market with the smaller change in price is the more efficient of the two.  Effectively, then, the more efficient market’s price will be a better predictor of the future market price in the other market.  This was how he determined that the IEM markets were more efficient than those in Intrade, despite their having a more limited participant pool and less liquidity.
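One way to read the informal test described above: whenever the two markets disagree on an identical contract and later agree, note which price moved less.  The sketch below tallies this over several such episodes.  It is my own interpretation, not Rajiv’s actual procedure, and the prices are made up for illustration; they are not real IEM or Intrade data.

```python
def smaller_mover(episode):
    """episode maps each of two market names to a pair:
    (price when the markets disagreed, price after they converged).
    Returns the name of the market whose price moved less,
    or None if both moved by the same amount."""
    moves = {name: abs(end - start) for name, (start, end) in episode.items()}
    # Unpacking fails loudly unless exactly two markets are given.
    (name_a, move_a), (name_b, move_b) = moves.items()
    if move_a == move_b:
        return None
    return name_a if move_a < move_b else name_b

def tally(episodes):
    """Count how often each market was the smaller mover."""
    counts = {}
    for episode in episodes:
        winner = smaller_mover(episode)
        key = winner if winner is not None else "tie"
        counts[key] = counts.get(key, 0) + 1
    return counts

# Hypothetical episodes (invented prices, for illustration only).
EPISODES = [
    {"IEM": (0.60, 0.59), "Intrade": (0.52, 0.58)},
    {"IEM": (0.40, 0.41), "Intrade": (0.47, 0.42)},
    {"IEM": (0.30, 0.35), "Intrade": (0.36, 0.35)},
]
print(tally(EPISODES))
```

Note that a tally like this only ranks the two markets against each other.  It says nothing about how far either one is from being well-calibrated.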

So far, we have been able to determine which of two markets is the more efficient, but we don’t know how much more efficient.  Also, we don’t know whether either market is sufficiently “efficient” for the purpose of determining its accuracy.  Both markets may be “inefficient”, yielding inaccurate or misleading market prices.

How did IEM do it?

Rajiv gives two possible explanations as to why the IEM markets were more efficient than the Intrade ones.  Neither explanation is good news for Nate Silver’s position.

One explanation has manipulative traders moving into the Intrade markets, in order to influence the prices (odds) quoted in the media and in political blogs.  The argument is that Intrade prices were much more widely cited than those of the IEM markets.  The reasoning goes that temporary dips in market prices can be eliminated through manipulative trading.  A political party may wish to see this done, so as not to upset campaign contributions or to minimize the impact of negative information.  The author argues that the benefit of such manipulative trading could be far in excess of the cost.  Since IEM’s markets were not as widely cited in the media or blogosphere, there was a lesser incentive to manipulate prices there.

Even if we believe the (limited) research on manipulation in prediction markets, it is more than likely that a short term (maybe even a very short term) manipulation could persist long enough to achieve the intended objective.  For example, the price could be manipulated just prior to when news stories are being finalized for the following day’s paper.  By the time the paper hits the streets, the manipulated price may have been corrected, but the damage has already been done.  And this is the “best case” scenario regarding prediction market manipulation.  In the worst case, the manipulation is successful as the market is unable to correct the inaccurate price.

I’m not an expert on US campaign finance, but I wonder whether an Intrade market manipulator would need to declare the amount of funds used to implement the price manipulation scheme (or whether such a person or corporation would be considered a donor at all).  If the answer is no, it would provide an additional incentive for parties or candidates to manipulate the markets for political purposes (without having to account for the funds used).  We all know what happens when incentives are strengthened.

The other explanation is that inefficient markets attract higher participation rates and market liquidity, as traders seek to profit from inaccurate prices.  Efficient markets have fewer profit opportunities and less trading is required to keep prices accurate.  As Rajiv explains, Nate Silver is caught in a paradox.  Nate’s attempt to design a market with high participation and strong liquidity, in order to achieve efficiency (and hence, accurate prices), conflicts with Rajiv’s finding that it is the market inefficiency that generates the high participation and liquidity.

The Road Ahead

Despite all of these arguments, Rajiv Sethi believes that prediction markets on climate change topics should be tried.  He suggests that corresponding markets be offered in other marketplaces, such as the IEM, so that market efficiency comparisons can be performed and studied.  I’m sure useful information could be gleaned from this effort. 

We need to keep in mind that some (or most) prediction markets may not work, however.  The objective of prediction markets is to accurately aggregate information held by the market participants.  If those participants do not have the information (or are unable to get it and profit from it), the market will be unable to generate an accurate prediction or there will be too much uncertainty about the prediction, rendering it useless for decision-making.

Personally, I like the idea of decision markets, but I think we will find that our efforts to use these markets to help guide climate change policy will ultimately fail.  There is simply too much information that is needed to accurately predict the important metrics.  It is hopeless to think not only that there will be “informed” traders, but also that they will be able to counteract the trading of the uninformed traders and the manipulators.  Any useful standard of “informed” traders might result in a mere handful of individuals spread throughout the world.  The impact of manipulators would swamp any efforts of the informed to set the “right” price in the market.  That said, there may be metrics that can be predicted (with reasonable accuracy) by a large number of traders.  Such predictions could be used as inputs into public policy decision models.  As with all prediction markets, the predictions must be accurate and consistently so.

Posted by: Paul Hewitt | November 27, 2009

Traders DO Need to Know the Direction of Manipulation

Information Aggregation and Manipulation in an Experimental Market – Robin Hanson, Ryan Oprea, David Porter

This study looks at price accuracy in experimental (laboratory) markets, where there are price manipulators.  The overall finding is that non-manipulative traders compensate for the bias inherent in the offers from manipulators, by setting a different threshold for trading.  The authors acknowledge that the “identification of manipulation in the field is difficult” and empirical evidence is scarce and tenuous.  Hence the need for a controlled, laboratory experiment.  For background on the experiments, please refer to the original paper.

There were two parts to this experiment.  In the Replication Treatment, there were no manipulators present, and in the Manipulation Treatment, one-half of the participants were given an incentive to increase the median price at the close of the market.  All participants knew that half of their number had this incentive to manipulate, and they knew the direction that the manipulation would take (upward). Where the non-manipulative traders knew that the manipulative traders would attempt to bid up the price in the market, they lowered their threshold for accepting offers, effectively counteracting the manipulative influence in the market. This makes intuitive sense, but only in the case where the non-manipulative traders know the direction of the manipulation.

In my previous post, I indicated that it would be necessary for the non-manipulative (“informed”) traders to know which direction the manipulators would try to move the market.  Robin Hanson commented that this is not necessary.  I think he is wrong, now, but he was right when this paper was written!   I think the authors are saying that it is required.  In fact, in the paper, they go a step further and allow all participants to know the strength of the incentive to manipulate.  We should keep in mind that, while this experiment demonstrates the concept of market manipulation and whether it can have a persistent effect on market prices, it is a pretty simple, controlled example.  The real question is whether it can be generalized to more complex, real-world situations.

Posted by: Paul Hewitt | November 26, 2009

Decision-makers May be Smarter than Manipulators

Can Manipulators Mislead Market Observers? – Ryan Oprea, David Porter, Chris Hibbert, Robin Hanson and Dorina Tila.

This study showed that uninformed third parties (observers) are able to make significantly better forecasts of asset values based on market prices (of those values) in an experimental market.  Even when half of the traders attempted to manipulate the market, the observers’ forecasts were no less accurate.

It appears that the observers are able to adjust the market price to remove most, or all, of the effects of manipulation.  To me, this means the observers were using some other form of decision model to arrive at their forecast.  Such a model used the market price along with other trade data, enabling the observer to alter the forecast from that determined by the market price alone.  The authors note that the observers were able to do this, despite the fact that the non-manipulative traders and the observers did not know which direction the incentives for manipulation ran.

This is quite a remarkable result.  It would have been nice to know how they were able to make these accurate forecasts with market price data that had been manipulated.  One of the findings was that upward price manipulation resulted in about a 7% increase in the market price (though there was no similar effect for downward manipulation).  The authors note that further study is required along with robustness tests.  I agree that it might yield very useful insight into the process of making a forecast based on prediction market prices.

In a sense, the observer should be considered a decision-maker.  If decision-makers are able to filter out the effects of manipulation in a real public policy prediction market and make an accurate forecast of the underlying metric, perhaps there is a role for such markets.  I would feel a lot more comfortable if we knew how the decision-maker (observer) is able to accomplish this feat.  Finally, we need to know if this was only possible because it was a fairly simple experimental model.  Will the same decision-maker’s ability exist in extremely complex public policy markets?
