The Art & Science of Prediction
The Signal & The Noise
By Nate Silver
The world has come a long way since the days of the printing press. Information is no longer a scarce commodity; we have more of it than we know what to do with. But relatively little of it is useful. We perceive it selectively, subjectively, and without much self-regard for the distortions that this causes. We think we want information when we really want knowledge. The signal is the truth. The noise is what distracts us from the truth.
The Author Himself
Nate Silver is an American statistician and writer. He founded the blog FiveThirtyEight and rose to fame after correctly predicting the outcome in 49 of 50 states in the 2008 presidential election. He was not, however, able to predict the winner of the 2016 election.
The Promise and Pitfalls of Big Data
Silver writes that the sheer amount of data is sometimes seen as a cure-all, much as computers were in the 1970s. Chris Anderson, the editor of Wired magazine, wrote in 2008 that the sheer volume of data “would obviate the need for theory, and even the scientific method.” Anderson argues: “Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves.” In other words, we would no longer need to form a hypothesis first and then test it.
However, Silver writes that he does not view it this way: people are still essential to the forecasting process.
“This book is an investigation of data-driven predictions in fields ranging from baseball to finance to national security.”
Humans vs Data
Humans are not the fastest, strongest, or most agile of animals. However, humans have a special capacity for detecting patterns: a newborn baby can recognize basic patterns like a face, an ability shaped by evolution. But people also find patterns in the noise that are not really there, and we are prone to biases.
Silver writes that information overload can lead to negative consequences. For example, a study in Nature found that the more information strong political partisans were given about global warming, the less they agreed with each other — the additional information did not bring them closer to the truth.
As the sheer volume of data increases, so too does the noise, with “noise increasing faster than the signal…There are so many hypotheses to test, so many data sets to mine-but a relatively constant amount of objective truth”.
A Catastrophic Failure of Prediction
“I am convinced, however, that the best way to view the financial crisis is as a failure of judgment-a catastrophic failure of prediction.”
The credit rating agencies were a prime example of this. S&P’s AAA rating on CDOs was meant to imply a 0.12% probability of default — in reality, 28% of AAA-rated CDOs defaulted, a rate more than 200 times higher than predicted.
What is a CDO? A CDO is a pool of mortgage debt that is sliced into tranches based on their risk profile. In Silver's simplified five-mortgage example, the safest tranche pays out unless all five mortgages default, while the riskiest loses money if any one of the five defaults. An investor is compensated with a cheaper price for the additional risk.
How do you calculate the likelihood of all five mortgages defaulting? For a single mortgage it would be simple: on its own, one mortgage may have, say, a 5% chance of defaulting. But the safest tranche only loses if all five default, so that joint probability is the real question.
One extreme assumption is that the mortgages are independent, so your risk is diversified. The opposite extreme is that they are perfectly correlated: either all five default or none do. Under perfect correlation, the safest tranche is 160,000 times riskier than the independence assumption implies. If the housing market was booming, maybe you could assume rough independence, with the occasional default happening by chance. But a single black swan event that hits all the owners at once makes the independence vanish.
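The arithmetic behind the two extremes can be sketched in a few lines, using the illustrative 5% per-mortgage default probability from above:

```python
# Two extreme assumptions for the five-mortgage pool, using the
# illustrative 5% per-mortgage default probability from the text.
p = 0.05  # chance that a single mortgage defaults

# If defaults are independent, all five failing is vanishingly rare.
p_independent = p ** 5            # 0.05^5, roughly 1 in 3.2 million

# If defaults are perfectly correlated, all five fail together 1 time in 20.
p_correlated = p

print(f"independent: 1 in {1 / p_independent:,.0f}")
print(f"correlated:  1 in {1 / p_correlated:,.0f}")
print(f"ratio: {p_correlated / p_independent:,.0f}x riskier")
```

The ratio works out to 160,000, which is where the figure above comes from.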
The ratings agencies' problem, in a broad sense, was their misunderstanding of risk vs uncertainty.
- Risk — something you can put a price on.
- Uncertainty — risk that is unknown and hard to manage.
This was all compounded by leverage. Goldman Sachs had a leverage ratio of 33 to 1, meaning it held $1 in capital for every $33 in financial positions. Therefore, a 3–4% decline in the value of its portfolio would put it into negative equity.
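The leverage arithmetic is easy to verify in one line:

```python
# With $33 of positions per $1 of capital, the decline that wipes out
# all capital is simply 1/33 of the portfolio's value.
leverage = 33
wipeout_decline = 1 / leverage
print(f"{wipeout_decline:.1%}")  # about 3.0%, inside the 3-4% range above
```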
The errors of prediction surrounding the financial crisis largely arose because the events being forecast were out of sample. For example:
- Confidence in the ratings agencies was based on the fact that their ratings had generally performed well in the past. However, the agencies had never before rated such complex and new instruments.
- Confidence that house prices would continue to increase was based on the fact that there had not been a decline in the recent past. However, there had never before been such a boom in prices.
- Confidence that a housing crisis would not have a major impact on the financial system was based on past data in which the financial system was not critically hurt. However, the system had never been so highly leveraged.
“We forget that our models are oversimplifications of the world….One of the pervasive risks that we face in the information age, is that even if the amount of knowledge in the world is increasing, the gap between what we know and what we think we know may be widening. This syndrome is often associated with very precise-seeming predictions that are not at all accurate. Moody’s carried out their calculations to the second decimal place-but they were utterly divorced from reality.”
Are you smarter than a television pundit?
The McLaughlin Group is a political round-table show with a segment in which panelists make predictions on various political topics — e.g., who will win the election?
Silver looked into the accuracy of these “expert” predictions. He took 1,000 examples (of which 750 were analyzable) and rated each answer on a scale from completely false to completely true.
“The panel may as well have flipped coins. I determined 338 of their predictions to be either mostly or completely false. The exact same number-338- were either mostly or completely true.”
Tetlock
Philip Tetlock, a professor of psychology and political science, wanted to understand why no professional forecaster saw the demise of the USSR coming. He concluded that a prediction that not only forecast the regime's demise but also understood the reasons for it “required different strands of argument to be woven together…they tend to emanate from people on different sides of the political spectrum and scholars firmly entrenched in one ideological camp were unlikely to have embraced them both.”
Inspired by his discoveries, Tetlock began to survey expert opinion in other areas (the Japanese real-estate bubble, the Gulf War). His study, spanning 15 years and 28,000 predictions and published as “Expert Political Judgment”, was damning. The experts in his survey had done barely better than chance and were grossly overconfident.
- 15% of events that they said had no chance of happening did occur.
- 25% of events that they said would definitely occur ended up not happening.
However, some did better than others. Those who were on TV the most did worst. Those classed as Hedgehogs did worse than those classed as Foxes. Hedgehogs believe in Big Ideas that underpin the world; Foxes believe in many small ideas and in looking at multiple points of view.
Principles to be fox-like
- Think probabilistically: consider a range of outcomes rather than a single outcome.
- Change your mind: predictions should change with new information.
“Take the best forecast possible today -regardless of what you said last week, last month or last year.”
All I Care About Is W’s and L’s
Silver describes his early days developing the PECOTA (Player Empirical Comparison and Optimisation Test Algorithm) system, which he used to forecast baseball players' performance. Silver originally started the project while working as a transfer pricing consultant at KPMG.
The Ageing Curve
Silver describes how when predicting a player’s performance one must look at the ageing curve.
It has been discovered that a “typical player continues to improve until he is in his late twenties, at which point his skills usually begin to atrophy, especially once he reaches his mid thirties.”
“Olympic gymnasts peak in their teens, poets in their twenties; chess players in their thirties; applied economists in their forties, and the average age of a Fortune 500 CEO is 55.”
Can’t We All Just Get Along?
Silver describes how in baseball the best approach comes from a combination of “statheads” and “scouts”. Despite huge advances in data, scouts are still used extensively by every team.
“Organisations that would have been classified as “scouting” organisations in 2003, like the St. Louis Cardinals, have since adopted a more analytic approach and are now among the most innovative in the sport. “Stathead” teams like the Oakland A’s have expanded rather than contracted their scouting budgets.”
Silver’s PECOTA performed a little worse than Baseball America’s scout driven forecasts. As Silver describes it…
“I don’t expect the PECOTA rankings to be as accurate as … the rankings you might get from Baseball America. The fuel of any ranking system is information — and being able to look at both scouting and statistical information means that you have more fuel.”
For Years You’ve Been Telling Us That Rain Is Green
Silver describes the formation of Hurricane Katrina in August 2005. Amazingly, the National Hurricane Center had nailed its forecast of Katrina; it anticipated a potential hit on the city almost five days before the levees were breached.
A Brief History of Weather Forecasting
During the Enlightenment, the idea took hold that weather systems could be predicted through data.
This idea was taken to the extreme by Pierre-Simon Laplace, a French astronomer and mathematician, in what is now known as Laplace's Demon.
“We may regard the present state of the universe as the effect of its past and the cause of its future. An intellect which at a certain moment would know all forces that set nature in motion, and all positions of all items of which nature is composed, if this intellect were also vast enough to submit these data to analysis, it would embrace in a single formula the movements of the greatest bodies of the universe and those of the tiniest atom; for such an intellect nothing would be uncertain and the future just like the past would be present before its eyes.”
If we had perfect knowledge of the present state of the universe and of the laws that govern it, we could make perfect predictions.
In 1950, John von Neumann used a machine that could make 5,000 calculations per second to predict the weather; however, the forecasts were not very good. Today machines such as Bluefire have immense speed (15 billion times greater than von Neumann's computer), yet progress in forecast accuracy has been steady but slow.
The Main Challenge is Chaos Theory
Chaos Theory applies to systems which have two properties:
- The system is dynamic — current behaviour influences the future.
- Nonlinear — exponential rather than linear relationships.
With nonlinear systems, a small inaccuracy can have large effects. If we should compute 5 + 5 but instead compute 5 + 6, the difference is small. But if we compute 5⁶ instead of 5⁵, the difference is huge. If the system is also dynamic, the output of the first mistaken step feeds into the nonlinear second step, and an initially small mistake compounds into huge errors!
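This compounding can be demonstrated with the logistic map, a textbook chaotic system (my illustration, not an example from the book): two starting points differing by one part in a million end up on completely different trajectories.

```python
# Logistic map x -> r*x*(1-x): dynamic (each step feeds the next) and
# nonlinear (the x*(1-x) term). Run it from two nearly identical starts.
r = 4.0                        # parameter value in the chaotic regime
x, y = 0.400000, 0.400001      # initial conditions a millionth apart

max_gap = 0.0
for _ in range(30):
    x = r * x * (1 - x)
    y = r * y * (1 - y)
    max_gap = max(max_gap, abs(x - y))

# After 30 steps the millionth-sized gap has grown to order 1:
print(f"largest gap between the two runs: {max_gap:.3f}")
```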
Computers x Man
It would be wrong to think that the National Weather Service was purely a computationally driven service. In fact, it is a mix of man and machine.
“The forecasters know the flaws in the computer models. These inevitably arise because even the most trivial bug in the model can have potentially profound effects (chaos theory again)….in the way that skilled pool players can adjust to the dead spots on the table.”
The NWS keeps two different sets of books: one that shows how well the computers are doing by themselves and another that accounts for how much value the humans are contributing. According to the agency’s statistics, humans improve the accuracy of precipitation forecasts by about 25% over the computer’s guidance alone and temperature forecasts by about 10%.
Desperately Seeking Signal
Overfitting
Silver describes Overfitting as “The most important scientific problem you’ve never heard of”.
He imagines that your mafia boss asks you to find a method for picking combination locks. He gives you three locks to practise on — a red one, a black one and a blue one. After experimenting you come back and say that you have found a solution: if the lock is red the combination is 271, if it's blue it's 103, and if it's black it's 419. This is overfitting: you have given an overly specific solution to a general problem.
John von Neumann described how “With four parameters I can fit an elephant”.
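A minimal numeric sketch of the same idea (my example, not Silver's): fit a straight line and a high-degree polynomial to a handful of noisy points drawn from a line. The flexible model nails the points it has seen and does far worse on points it hasn't.

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.arange(8, dtype=float)
y_train = 2 * x_train + rng.normal(0, 1, 8)   # the "signal" is a line
x_test = x_train + 0.5                         # fresh, unseen points
y_test = 2 * x_test                            # their true (noise-free) values

for degree in (1, 7):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```

The degree-7 polynomial passes through all eight training points (train error near zero) while memorising the noise, so it fares worse on the fresh points than the simple line does.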
How To Drown In Three Feet Of Water
The Importance of Communicating Uncertainty
Silver describes the flood of the Red River in North Dakota. The residents had been aware of the flood threat for months. The levees had been built to handle fifty-one feet of water, and the forecasts predicted forty-nine feet. The river reached fifty-four feet, causing a flood. Historically, the predictions had a margin of error of plus or minus nine feet, putting fifty-four feet within the bounds of possibility.
But this uncertainty was not communicated; only the forty-nine-foot prediction was stated!
“An oft-told joke: a statistician drowned crossing a river that was only three feet deep on average.”
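A rough back-of-the-envelope on these numbers (my assumptions, not the forecasters' method): if the historical plus-or-minus nine-foot error is treated as roughly a 95% band of a normal distribution around the forty-nine-foot forecast, the chance of topping the fifty-one-foot levees was sizeable.

```python
from statistics import NormalDist

forecast_ft = 49.0
levee_ft = 51.0
sigma_ft = 9.0 / 1.96          # assumption: +/- 9 ft is an approximate 95% band

p_overtop = 1 - NormalDist(mu=forecast_ft, sigma=sigma_ft).cdf(levee_ft)
print(f"chance of topping the levees: {p_overtop:.0%}")  # roughly a 1-in-3 risk
```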
Survey of Professional Forecasters
The Survey of Professional Forecasters asks economists to indicate a range of outcomes for where they see the economy heading: for instance, the probability that GDP growth comes in between 2% and 3%. A 90% prediction interval is supposed to cover 90% of the possible real-world outcomes, so we would expect actual GDP to fall outside it roughly twice in eighteen years. In fact, it fell outside the economists' prediction intervals six times in eighteen years.
“Another study which ran these numbers back to the beginnings of the Survey in 1968, found even worse results: the actual figure for GDP fell outside the prediction interval almost half the time…they fundamentally overstate the reliability of their predictions.”
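A quick stdlib check of how unlikely that record is if the intervals were honest: with a genuine 90% interval, each year has a 10% chance of a miss, so six or more misses in eighteen years is a binomial long shot.

```python
from math import comb

n, p = 18, 0.10   # 18 years; a calibrated 90% interval misses 10% of the time
p_six_plus = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(6, n + 1))
print(f"P(6 or more misses in 18 years) = {p_six_plus:.4f}")  # under 1%
```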
Economic Data
There are 45,000 economic indicators produced each year by the US government, and private data providers track 4,000,000 statistics. With this much data, overfitting is bound to occur.
Superbowl
For example, a famous spurious indicator of economic performance was the winner of the Super Bowl. From 1967 to 1997, the stock market gained an average of 14% for the rest of the year when a team from the NFL won the game, but fell by almost 10% when a team from the AFL won instead. The indicator predicted the direction of the stock market in 28 of 31 years. Statistically, there is only a 1 in 4,700,000 probability that this relationship was due to chance. But it was!
With so many variables to pick from you are bound to get lucky!
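The chapter's own numbers make the point. Treating the 4,000,000 tracked statistics as independent candidates (a simplifying assumption of mine), even a 1-in-4,700,000 coincidence is more likely than not to show up somewhere:

```python
n_indicators = 4_000_000      # statistics tracked by private data providers
p_fluke = 1 / 4_700_000       # the Super Bowl indicator's chance probability

# Assumes indicators are independent, which is a simplification.
p_some_fluke = 1 - (1 - p_fluke) ** n_indicators
print(f"chance at least one indicator looks this good: {p_some_fluke:.0%}")
```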
False positives are everywhere: Ioannidis found that two-thirds of medical-journal discoveries cannot be replicated!
Rage Against The Machines
Kasparov vs Deep Blue
“There are more possible chess games than atoms in the Universe”
In 1997, Kasparov played a six-game match against IBM’s Deep Blue.
- In the first moves of the first game, Kasparov aimed to take Deep Blue out of the historical games it had been trained on. Within three moves, he had steered Deep Blue into a position that had occurred just once in master-level competition out of the hundreds of thousands of games in Deep Blue’s database.
- The heuristics that Deep Blue had learned also cannot be applied in all cases. For example, the heuristic to “accept a trade when your opponent gives up the more powerful piece” is usually a good one, but not necessarily against a player like Kasparov, who knows that such a tactical loss can be outweighed by a strategic gain.
- At the end of the first game, Deep Blue made an error and then strangely resigned just one turn later. “How can a computer commit suicide like that?”, Kasparov asked. Unable to work it out, he credited Deep Blue with some deep wisdom unknown to him. This shook his confidence, and he never beat Deep Blue again.
ChessBase.com
In 2005 the website ChessBase held a freestyle chess tournament in which players were free to supplement their own insight with any computer program or programs they liked and to solicit advice over the internet. The winners were a pair of twentysomething amateurs who consulted a combination of three computer programs to determine their moves.