The Art & Science of Prediction

Super-Forecasting

Harry Cheslaw
7 min read · Aug 4, 2020

By Philip Tetlock and Dan Gardner

Who is Philip Tetlock

Philip Tetlock is currently the Annenberg University Professor at the University of Pennsylvania, where he is cross-appointed at the Wharton School and the School of Arts and Sciences.

Tetlock is most widely known for his work “Expert Political Judgment: How Good Is It? How Can We Know?”, which tries to answer whether experts' predictions are consistently more accurate than non-experts' and the degree to which they should be trusted. Super-Forecasting uses this previous work as a foundation to examine our ability (or lack thereof) to predict future events.

On the Expert Political Judgment Project

The Expert Political Judgment study ran for 20 years. Tetlock asked a group of pundits to consider political and economic events and, for each event, to rate the likelihood of three possible outcomes on a scale of 0 to 10.

Early on, Tetlock makes the point that the results of this initial study of experts' abilities are commonly misinterpreted, with headlines such as “experts are as accurate as monkeys throwing darts”. Tetlock writes that his research had shown that “the average expert had done little better than guessing on many of the political and economic questions” that they were given. However, many does not equal all, and averages can mislead: short-horizon questions were answered more accurately than those with medium- or long-term horizons. Tetlock places the pundits in two groups, one no better than apes and the other showing a small amount of foresight.

Tetlock writes that he is an “optimistic skeptic”: he believes it is possible to see into the future, at least to some extent, and he cautions that his previous work should not be used to justify nihilism.

Phase 2 of his research

Following his earlier work (Phase 1), Phase 2 started in the summer of 2011 with the launch of the Good Judgment Project (GJP), which invited volunteers to sign up and forecast the future. Over the life of the project more than 20,000 people tried to figure out “if protests in Russia would spread, the price of gold would plummet, the Nikkei would close above 9,500”.

The GJP was part of a larger research programme sponsored by the Intelligence Advanced Research Projects Activity (IARPA), whose job is to sponsor daring research that makes American intelligence better at what it does.

IARPA created a forecasting tournament in which five scientific teams, led by top researchers in the field, competed to generate accurate forecasts. GJP was one of those five teams. It beat the official control group by 60% in year 1 and by 78% in year 2, and it beat all other teams by a significant margin (GJP even beat analysts with access to confidential information). These results suggest that some people are able to predict the future with real accuracy.

Paul Meehl and Statistical Judgment

In 1954, the psychologist Paul Meehl wrote a small book titled “Clinical versus statistical prediction: A theoretical analysis and review of the evidence”. It reviewed twenty studies showing that well-informed experts' predictions were not as good as those of simple algorithms, and further research has supported this claim.

Tetlock notes that Meehl's conclusions are hard to put into practice because most problems cannot easily be turned into algorithms.

Given that algorithms are quick and cheap, unlike subjective judgement, a tie supports using the algorithm. The point is now indisputable: when you have a well-validated statistical algorithm, use it. This insight was never a threat to the reign of subjective judgment because we so rarely have well-validated algorithms for the problem at hand.

System 1 and System 2

The common model for how we think is now made up of two systems — one primal and the other executive.

System 2 is the familiar realm of conscious thought. It consists of everything we choose to focus on. By contrast, System 1 is largely a stranger to us. It is the realm of automatic perceptual and cognitive operations…We have no awareness of these rapid-fire processes but we could not function without them. We would shut down.

Because System 2 thinking takes up resources, we normally let System 1 make a decision, which System 2 may then review. Whether System 2 actually gets involved is another matter. The well-known bat-and-ball riddle shows that most people's System 2 does not routinely review the decisions made by System 1.
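The riddle itself is not quoted in this summary. In its usual wording, a bat and a ball cost $1.10 in total, the bat costs $1.00 more than the ball, and you are asked what the ball costs. A minimal sketch of the check System 2 would run if it got involved (the variable names are illustrative only):

```python
# Prices in cents, so the arithmetic stays in exact integers.
total_cents = 110       # bat + ball = $1.10
difference_cents = 100  # bat - ball = $1.00

# System 1's snap answer: the ball costs 10 cents.
intuitive_ball = 10
intuitive_bat = intuitive_ball + difference_cents
intuitive_total = intuitive_ball + intuitive_bat  # 120 cents, which is too much

# System 2's check: ball + (ball + difference) = total
correct_ball = (total_cents - difference_cents) // 2
correct_bat = correct_ball + difference_cents

print(f"Snap answer gives a total of {intuitive_total} cents")
print(f"Ball: {correct_ball} cents, bat: {correct_bat} cents")
```

The snap answer fails a one-line check, which is the point of the riddle: the review is cheap, but System 2 often does not run it.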

A defining feature of System 1 judgment is its insensitivity to the quality of the evidence on which judgment is based. To save time, “it must treat the available evidence as reliable and sufficient. These tacit assumptions are so vital to System 1 that Kahneman gave them an ungainly but oddly memorable label: WYSIATI (What You See Is All There Is)”.

Measuring Forecast Accuracy

The GJP used Brier scores to measure the accuracy of forecasts. They are like golf scores in that lower is better (0 is perfect and 2 is the worst possible score). Brier scores are based around two central ideas:

Calibration: if the events you predict with 70% confidence happen 70% of the time, you are perfectly calibrated.

Resolution: how decisive your predictions are, e.g. saying “this has a 10% chance of happening” or “this has a 95% chance of happening” rather than placing everything within a 40–60% range of likelihood. The closer your predictions sit to the ends of the scale, the higher your resolution. It is possible to be perfectly calibrated with poor resolution (a fence sitter); the ideal forecaster is well calibrated with high resolution, using the extreme values and being right when doing so.
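To make the 0-to-2 scale concrete, here is a minimal sketch of a Brier score for yes/no questions. It assumes the original two-sided formulation, in which the squared error is summed over both outcomes ("happens" and "does not happen") and then averaged across questions. The function name and the sample forecasts are illustrative, not taken from the GJP.

```python
def brier_score(forecasts, outcomes):
    """Two-sided Brier score for yes/no questions.

    forecasts: probabilities (0.0 to 1.0) assigned to "the event happens".
    outcomes:  booleans, True if the event happened.
    Returns the mean score: 0 is perfect, 2 is the worst possible.
    """
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need one forecast per outcome, and at least one")
    total = 0.0
    for p, happened in zip(forecasts, outcomes):
        o = 1.0 if happened else 0.0
        # Squared error on "happens" plus squared error on "does not happen".
        total += (p - o) ** 2 + ((1 - p) - (1 - o)) ** 2
    return total / len(forecasts)


outcomes = [True, False, True, True]

perfect = brier_score([1.0, 0.0, 1.0, 1.0], outcomes)     # always right, fully confident
fence_sitter = brier_score([0.5, 0.5, 0.5, 0.5], outcomes)  # 50/50 on everything
decisive = brier_score([0.9, 0.1, 0.9, 0.9], outcomes)    # confident and right
worst = brier_score([0.0, 1.0, 0.0, 0.0], outcomes)       # always wrong, fully confident

print(perfect, fence_sitter, decisive, worst)
```

On this toy data the fence sitter scores 0.5 on every question, while the decisive forecaster who is right scores far lower (better). This is how resolution is rewarded: moving towards the ends of the scale pays off only when the forecasts turn out to be correct.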

Expert Political Judgment: The Special Ones

Although generally damning of experts, the EPJ study did highlight a small number of professionals who were able to outperform monkeys.

So what made these professionals different?

‘The fox knows many things but the hedgehog knows one big thing’…I dubbed the Big Idea experts “hedgehogs” and the more eclectic experts “foxes”.

It wasn’t whether they had PhDs or access to classified information. Nor was it what they thought — whether they were liberals or conservatives, optimists or pessimists. The critical factor was how they thought.

One group tended to organise their thinking around Big Ideas, although they didn’t agree on which Big Ideas were true or false. Some were environmental doomsters; others were cornucopian boomsters…As ideologically diverse as they were, they were united by the fact that their thinking was so ideological. They sought to squeeze complex problems into the preferred cause-effect templates and treated what did not fit as irrelevant distractions…Committed to their conclusions, they were reluctant to change their minds even when their predictions clearly failed. They would tell us, ‘just wait’.

The other group consisted of more pragmatic experts who drew on many analytical tools. These experts gathered as much information from as many sources as they could…And while no one likes to say ‘I was wrong,’ these experts more readily admitted it and changed their minds.

Growth vs Fixed Mindset

The psychologist Carol Dweck developed a mindset model to try to understand what motivates people.

In a fixed mindset students believe their basic abilities, their intelligence, their talents, are just fixed traits. They have a certain amount and that’s that, and then their goal becomes to look smart all the time and never look dumb. In a growth mindset students understand that their talents and abilities can be developed through effort, good teaching and persistence. They don’t necessarily think everyone’s the same or anyone can be Einstein, but they believe everyone can get smarter if they work at it — Carol Dweck

Those with a growth mindset respond better to failure because they believe they can grow from it. In one of her experiments, Dweck gave fifth graders easy puzzles and then harder ones. Some kids loved the harder puzzles while others steered away from them. According to Dweck, the kids with a fixed mindset did not want to try harder puzzles that they could get wrong, and those with a growth mindset did. In another experiment, Dweck scanned the brains of volunteers as they answered hard questions and were then told whether their answers were right or wrong. The scans revealed that volunteers with a fixed mindset were fully engaged when they were told their results but cared for little else, while those with a growth mindset were interested in understanding why they were wrong.

The Leader’s Dilemma

Leaders must be confident and decisive and have a vision, yet a good leader should also be a good forecaster. To be a good forecaster, however, one must be self-critical and comfortable with uncertainty. How can you be decisive while changing your thinking based on new information?

Leaders must be forecasters and leaders but it seems that what is required to succeed at one role may undermine the other.

Tetlock argues that the key to approaching leadership and organisation in the face of such a dilemma is to model it on a method first articulated by a 19th-century Prussian general.

The Prussian general Moltke was esteemed for having won many hard-fought wars. Moltke emphasised that in war it is impossible to lay down binding rules: soldiers need to be able to pivot quickly and improvise based on information on the ground.

“All this may sound like a recipe for a fractious organisation that can’t get anything done, but that danger was avoided by balancing those elements that promoted independent thinking against those that demanded action.”

The Wehrmacht also drew a sharp line between deliberation and implementation: once a decision has been made, the mindset changes. Forget uncertainty and complexity. Act!

“Once a course of action has been initiated it must not be abandoned without overriding reason,” the Wehrmacht manual stated. “In the changing situations of combat, however, inflexibly clinging to a course of action can lead to failure. The art of leadership consists of the timely recognition of circumstances and of the moment when a new decision is required.”

What ties all of this together, from “nothing is certain” to “unwavering determination”, is the command principle of Auftragstaktik. Usually translated today as “mission command,” the basic idea is simple. “War cannot be conducted from the green table,” Moltke wrote. “Frequent and rapid decisions can be shaped on the spot according to estimates of local conditions.”…Auftragstaktik blended strategic coherence and decentralised decision making with a simple principle: commanders were to tell subordinates what their goal is but not how to achieve it.
