Soon Morey noticed something else: A scout watching a player tended to form a near-instant impression, around which all other data tended to organize itself. “Confirmation bias,” he’d heard this called. The human mind was just bad at seeing things it did not expect to see, and a bit too eager to see what it expected to see. “Confirmation bias is the most insidious because you don’t even realize it is happening,” he said. A scout would settle on an opinion about a player and then arrange the evidence to support that opinion. “The classic thing,” said Morey, “and this happens all the time with guys: If you don’t like a prospect, you say he has no position. If you like him, you say he’s multipositional.”
The objective measurement of Jeremy Lin didn’t square with what the experts saw when they watched him play: a not terribly athletic Asian kid. Morey hadn’t completely trusted his model—and so had chickened out and not drafted Lin. A year after the Houston Rockets failed to draft Jeremy Lin, they began to measure the speed of a player’s first two steps: Jeremy Lin had the quickest first move of any player measured. He was explosive and was able to change direction far more quickly than most NBA players. “He’s incredibly athletic,” said Morey. “But the reality is that every fucking person, including me, thought he was unathletic. And I can’t think of any reason for it other than he was Asian.”
They stopped and analyzed the situation more closely: The expected value of the draft pick exceeded, by a large margin, the value they placed on the player they’d be giving up for it. The mere fact that they owned Kyle Lowry appeared to have distorted their judgment about him. Looking back over the previous five years, they now saw that they’d systematically overvalued their own players whenever another team tried to trade for them. Especially when offered the chance to trade one of their NBA players for another team’s draft picks, they’d refused deals they should have done. Why? They hadn’t done it consciously. Morey thus became aware of what behavioral economists had labeled “the endowment effect.” To combat the endowment effect, he forced his scouts and his model to establish, going into the draft, the draft pick value of each of their own players. The next season, before the trade deadline, Morey got up before his staff and listed on a whiteboard all the biases he feared might distort their judgment: the endowment effect, confirmation bias, and others. There was what people called “present bias”—the tendency, when making a decision, to undervalue the future in relation to the present. There was “hindsight bias”—which he thought of as the tendency for people to look at some outcome and assume it was predictable all along.
“Amos thought people paid an enormous price to avoid mild embarrassment,” said his friend Avishai Margalit, “and he himself decided very early on it was not worth it.”
Economists assumed that people were “rational.” What did they mean by that? At the very least, they meant that people could figure out what they wanted.
Amos turned to the person seated next to him and said, “Forever and forever, farewell, John Milholland / If we do meet again, why, we shall smile / If not, why then, this parting was well made”: lines spoken by Brutus to Cassius in act 5, scene 1, of Julius Caesar. He aced the test.
He’d noticed that the instructors believed that, in teaching men to fly jets, criticism was more useful than praise. They’d explained to Danny that he only needed to see what happened after they praised a pilot for having performed especially well, or criticized him for performing especially badly. The pilot who was praised always performed worse the next time out, and the pilot who was criticized always performed better. Danny watched for a bit and then explained to them what was actually going on: The pilot who was praised because he had flown exceptionally well, like the pilot who was chastised after he had flown exceptionally badly, was simply regressing to the mean. Either would have tended to perform better (or worse) the next time out even if the teacher had said nothing at all. An illusion of the mind tricked teachers—and probably many others—into thinking that their words were less effective when they gave pleasure than when they gave pain. Statistics wasn’t just boring numbers; it contained ideas that allowed you to glimpse deep truths about human life.
Suspicious of psychoanalysis (“I always thought it was a lot of mumbo jumbo”), he nevertheless in later years accepted an invitation from the American psychoanalyst David Rapaport to spend a summer at the Austen Riggs Center in Stockbridge, Massachusetts. Each Friday morning the Austen Riggs psychoanalysts—some of the biggest names in the field—would gather to discuss a patient whom they had spent a month observing. All these experts would have by then written up their reports on the patient. After delivering their diagnoses, they would bring in the patient for an interview. One week Danny watched the psychoanalysts discuss a patient, a young woman. The night before they were meant to interview her, she committed suicide. None of the psychoanalysts—world experts who had spent a month studying the woman’s mental state—had worried that she might kill herself. None of their reports so much as hinted at the risk of suicide. “Now they all agreed, how could we have missed it?” Danny recalled. “The signs were all there! It made so much sense to them after the fact. And so little sense before the fact.” Any faint interest Danny might have had in psychoanalysis vanished. “I was aware at the time that this was very instructive,” he said. Not about the troubled patients but about the psychoanalysts—or anyone else who was in a position to revise his forecast about the outcome of some uncertain event once he had knowledge of that outcome.
Blum was busy testing how powerful emotional states changed the way people handled various mental tasks. To do this he needed to induce in his subjects powerful emotional states. He did so with hypnosis. He’d first ask people to describe in detail some horrible life experience. He’d then give them a trigger to associate with the event—say, a card that read “A100.” Then he’d hypnotize them, show them the card—and, sure enough, they’d instantly start to relive their horrible experience. Then he’d see how they performed some taxing mental task: say, repeating a string of numbers. “It was weird, and I did not take to it,” said Danny—though he did learn how to hypnotize people. “I ran some sessions with our best subject—a tall, thin guy whose eyes would bulge and his face redden as he was shown the A100 card that instructed him to have the worst emotional experience of his life for a few seconds.” Once again, it wasn’t long before Danny found himself undermining the validity of the entire enterprise. “One day I asked, ‘How about we give them a choice between that and a mild electric shock?’” he recalled. He figured that anyone given a choice between reliving the worst experience of his life and mild electric shock would choose the shock. But what if none of the patients wanted the shock—what if they all said they’d much rather relive the worst experience of their lives? “Blum was horrified, because he wouldn’t hurt a fly,” said Danny. “And that’s when I realized that it was a stupid game. That it cannot be the worst experience of their lives. Somebody is faking. And so I got out of that field.”
“Danny says no. He tells us about ‘The Magical Number Seven.’” “The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information” was a paper, written by Harvard psychologist George Miller, which showed that people had the ability to hold in their short-term memory seven items, more or less. Any attempt to get them to hold more was futile. Miller half-jokingly suggested that the seven deadly sins, the seven seas, the seven days of the week, the seven primary colors, the seven wonders of the world, and several other famous sevens had their origins in this mental truth.
In the case of two bags, one known to contain 75 percent red chips and the other 75 percent white chips, the odds that you are holding the bag containing mostly red chips triple every time you draw a red chip, and are divided by three every time you draw a white chip. If the first chip you draw is red, there is a 3:1 (or 75 percent) chance that the bag you are holding is majority red. If the second chip you draw is also red, the odds rise to 9:1, or 90 percent. If the third chip you draw is white, they fall back to 3:1. And so on. The more lopsided the known ratio of red to white chips, the faster the odds shift around. If the first three chips you draw are red, there is a 27:1, or slightly greater than 96 percent, chance you are holding the bag filled with mostly red chips.
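The arithmetic here is mechanical enough to script. A minimal sketch in Python, assuming even (1:1) prior odds over the two bags; the function names are mine, not from the studies described:

```python
from fractions import Fraction

def majority_red_odds(draws: str) -> Fraction:
    """Posterior odds that the bag is the 75%-red one, starting from 1:1
    prior odds. Each red chip ('r') multiplies the odds by 3 (= 0.75/0.25);
    each white chip ('w') divides them by 3."""
    odds = Fraction(1)
    for chip in draws:
        odds = odds * 3 if chip == "r" else odds * Fraction(1, 3)
    return odds

def odds_to_probability(odds: Fraction) -> Fraction:
    return odds / (odds + 1)

print(majority_red_odds("r"))    # 3, i.e. 3:1, or 75 percent
print(majority_red_odds("rr"))   # 9, i.e. 9:1, or 90 percent
print(majority_red_odds("rrw"))  # back to 3, i.e. 3:1
print(float(odds_to_probability(majority_red_odds("rrr"))))  # slightly above 0.96
```

Exact fractions are used so the 3:1, 9:1, and 27:1 odds in the text come out without rounding noise.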
Amos presented research done in Ward Edwards’s lab that showed that when people draw a red chip from the bag, they do indeed judge the bag to be more likely to contain mostly red chips. If the first three chips they withdrew from a bag were red, for instance, they put the odds at 3:1 that the bag contained a majority of red chips. The true, Bayesian odds were 27:1. People shifted the odds in the right direction, in other words; they just didn’t shift them dramatically enough. Ward Edwards had coined a phrase to describe how human beings responded to new information. They were “conservative Bayesians.” That is, they behaved more or less as if they knew Bayes’s rule. Of course, no one actually thought that Bayes’s formula was grinding away in people’s heads.
In Danny’s view, people were not conservative Bayesians. They were not statisticians of any kind. They often leapt from little information to big conclusions. The theory of the mind as some kind of statistician was of course just a metaphor. But the metaphor, to Danny, felt wrong. “I knew I was a lousy intuitive statistician,” he said. “And I really didn’t think I was stupider than anyone else.”
People who proved to be expert book bag pickers might still stumble when faced with judgments in which the probabilities were far more difficult to know—say, whether some foreign dictator did, or did not, possess weapons of mass destruction. Danny thought, this is what happens when people become attached to a theory. They fit the evidence to the theory rather than the theory to the evidence. They cease to see what’s right under their nose.
The best working theory in social science just then was that people were rational—or, at the very least, decent intuitive statisticians. They were good at interpreting new information, and at judging probabilities. They of course made mistakes, but their mistakes were a product of emotions, and the emotions were random, and so could be safely ignored. But that day something shifted inside Amos. He left Danny’s seminar in a state of mind unusual for him: doubt. After the seminar, he treated theories that he had more or less accepted as sound and plausible as objects of suspicion.
If, since his return to Israel, there had indeed been a growing pressure along some fault line inside Amos’s mind, the encounter with Danny had triggered the earthquake. Not long afterward, he bumped into Avishai Margalit. “I’m waiting in this corridor,” said Margalit. “And Amos comes to me, agitated, really. He started by dragging me into a room. He said, You won’t believe what happened to me. He tells me that he had given this talk and Danny had said, Brilliant talk, but I don’t believe a word of it. Something was really bothering him, and so I pressed him. He said, ‘It cannot be that judgment does not connect with perception. Thinking is not a separate act.’” The new studies being made about how people’s minds worked when rendering dispassionate judgments had ignored what was known about how the mind worked when it was doing other things. “What happened to Amos was serious,” said Danny. “He had a commitment to a view of the world in which Ward Edwards’s research made sense, and that afternoon he saw the appeal of another worldview in which that research looked silly.”
The test they administered to psychologists confirmed that suspicion. When seeking to determine if the bag they held contained mostly red chips, psychologists were inclined to draw, from very few chips, broad conclusions. In their search for scientific truth, they were relying far more than they knew on chance. What’s more, because they had so much faith in the power of small samples, they tended to rationalize whatever they found in them. The test Amos and Danny had created asked the psychologists how they would advise a student who was testing a psychological theory—say, that people with long noses are more likely to lie. What should the student do if his theory tests as true on one sample of humanity but as false on another? The question Danny and Amos put to the professional psychologists was multiple-choice. Three of the choices involved telling the student either to increase his sample size or, at the very least, to be more circumspect about his theory. Overwhelmingly, the psychologists had plunked for the fourth option, which read: “He should try to find an explanation for the differences between the two groups.” That is, he should seek to rationalize why in one group people with long noses are more likely to lie, while in the other they are not. The psychologists had so much faith in small samples that they assumed that whatever had been learned from either group must be generally true, even if one lesson seemed to contradict the other. The experimental psychologist “rarely attributes a deviation of results from expectations to sampling variability because he finds a causal ‘explanation’ for any discrepancy,” wrote Danny and Amos. “Thus, he has little opportunity to recognize sampling variation in action. His belief in the law of small numbers, therefore, will forever remain intact.”
The Oregon researchers began by creating, as a starting point, a very simple algorithm, in which the likelihood that an ulcer was malignant depended on the seven factors the doctors had mentioned, equally weighted. The researchers then asked the doctors to judge the probability of cancer in ninety-six different individual stomach ulcers, on a seven-point scale from “definitely malignant” to “definitely benign.” Without telling the doctors what they were up to, they showed them each ulcer twice, mixing up the duplicates randomly in the pile so the doctors wouldn’t notice they were being asked to diagnose the exact same ulcer they had already diagnosed. The researchers didn’t have a computer. They transferred all of their data onto punch cards, which they mailed to UCLA, where the data was analyzed by the university’s big computer. The researchers’ goal was to see if they could create an algorithm that would mimic the decision making of doctors.
But then UCLA sent back the analyzed data, and the story became unsettling. (Goldberg described the results as “generally terrifying.”) In the first place, the simple model that the researchers had created as their starting point for understanding how doctors rendered their diagnoses proved to be extremely good at predicting the doctors’ diagnoses. The doctors might want to believe that their thought processes were subtle and complicated, but a simple model captured these perfectly well. That did not mean that their thinking was necessarily simple, only that it could be captured by a simple model. More surprisingly, the doctors’ diagnoses were all over the map: The experts didn’t agree with each other. Even more surprisingly, when presented with duplicates of the same ulcer, every doctor had contradicted himself and rendered more than one diagnosis: These doctors apparently could not even agree with themselves. “These findings suggest that diagnostic agreement in clinical medicine may not be much greater than that found in clinical psychology—some food for thought during your next visit to the family doctor,” wrote Goldberg. If the doctors disagreed among themselves, they of course couldn’t all be right—and they weren’t. The researchers then repeated the experiment with clinical psychologists and psychiatrists, who gave them the list of factors they considered when deciding whether it was safe to release a patient from a psychiatric hospital. Once again, the experts were all over the map. Even more bizarrely, those with the least training (graduate students) were just as accurate as the fully trained ones (paid pros) in their predictions about what any given psychiatric patient would get up to if you let him out the door. Experience appeared to be of little value in judging, say, whether a person was at risk of committing suicide.
At which point one of Goldberg’s fellow Oregon researchers—Goldberg doesn’t recall which one—made a radical suggestion. “Someone said, ‘One of these models you built [to predict what the doctors were doing] might actually be better than the doctor,’” recalled Goldberg. “I thought, Oh, Christ, you idiot, how could that possibly be true?” How could their simple model be better at, say, diagnosing cancer than a doctor? The model had been created, in effect, by the doctors. The doctors had given the researchers all the information in it. The Oregon researchers went and tested the hypothesis anyway. It turned out to be true. If you wanted to know whether you had cancer or not, you were better off using the algorithm that the researchers had created than you were asking the radiologist to study the X-ray. The simple algorithm had outperformed not merely the group of doctors; it had outperformed even the single best doctor. You could beat the doctor by replacing him with an equation created by people who knew nothing about medicine and had simply asked a few questions of doctors.
The model captured their theory of how to best diagnose an ulcer. But in practice they did not abide by their own ideas of how to best diagnose an ulcer. As a result, they were beaten by their own model. The implications were vast. “If these findings can be generalized to other sorts of judgmental problems,” Goldberg wrote, “it would appear that only rarely—if at all—will the utilities favor the continued employment of man over a model of man.” But how could that be? Why would the judgment of an expert—a medical doctor, no less—be inferior to a model crafted from that very expert’s own knowledge? At that point, Goldberg more or less threw up his hands and said, Well, even experts are human. “The clinician is not a machine,” he wrote. “While he possesses his full share of human learning and hypothesis-generating skills, he lacks the machine’s reliability.”
When they sat down to write they nearly merged, physically, into a single form, in a way that the few people who happened to catch a glimpse of them found odd. “They wrote together sitting right next to each other at the typewriter,” recalls Michigan psychologist Richard Nisbett. “I cannot imagine. It would be like having someone else brush my teeth for me.” The way Danny put it was, “We were sharing a mind.”
Their first paper—which they still half-thought of as a joke played on the academic world—had shown that people faced with a problem that had a statistically correct answer did not think like statisticians. Even statisticians did not think like statisticians. “Belief in the Law of Small Numbers” had raised an obvious next question: If people did not use statistical reasoning, even when faced with a problem that could be solved with statistical reasoning, what kind of reasoning did they use? If they did not think, in life’s many chancy situations, like a card counter at a blackjack table, how did they think?
“The decisions we make, the conclusions we reach, and the explanations we offer are usually based on our judgments of the likelihood of uncertain events such as success in a new job, the outcome of an election, or the state of a market.” In these and many other uncertain situations, the mind did not naturally calculate the correct odds. So what did it do? The answer they now offered: It replaced the laws of chance with rules of thumb. These rules of thumb Danny and Amos called “heuristics.” And the first heuristic they wanted to explore they called “representativeness.” When people make judgments, they argued, they compare whatever they are judging to some model in their minds. How much do those clouds resemble my mental model of an approaching storm? How closely does this ulcer resemble my mental model of a malignant cancer? Does Jeremy Lin match my mental picture of a future NBA player? Does that belligerent German political leader resemble my idea of a man capable of orchestrating genocide? The world’s not just a stage. It’s a casino, and our lives are games of chance.
For instance, in families with six children, the birth order B G B B B B was about as likely as G B G B B G. But Israeli kids—like pretty much everyone else on the planet, it would emerge—naturally seemed to believe that G B G B B G was a more likely birth sequence. Why? “The sequence with five boys and one girl fails to reflect the proportion of boys and girls in the population,” they explained. It was less representative. What is more, if you asked the same Israeli kids to choose the more likely birth order in families with six children—B B B G G G or G B B G B G—they overwhelmingly opted for the latter. But the two birth orders are equally likely. So why did people almost universally believe that one was far more likely than the other? Because, said Danny and Amos, people thought of birth order as a random process, and the second sequence looks more “random” than the first.
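The point about specific sequences can be checked by brute force. A small Python sketch, assuming each birth is an independent 50/50 draw:

```python
from itertools import product

# All 2^6 = 64 possible birth orders for six children.
sequences = list(product("BG", repeat=6))
assert len(sequences) == 64

# Any one *specific* sequence — "B G B B B B" or "G B G B B G" alike —
# has exactly the same probability:
p = 0.5 ** 6
print(p)  # 0.015625, i.e. 1 in 64, for either sequence

# What differs is the size of the *category* each sequence belongs to:
three_and_three = sum(1 for s in sequences if s.count("B") == 3)
five_boys = sum(1 for s in sequences if s.count("B") == 5)
print(three_and_three, five_boys)  # 20 "mixed" sequences vs. only 6 with five boys
```

A mixed-looking sequence feels likelier because mixed *categories* really are more common; the error is transferring that feeling to one particular sequence.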
Londoners in the Second World War thought that German bombs were targeted, because some parts of the city were hit repeatedly while others were not hit at all. (Statisticians later showed that the distribution was exactly what you would expect from random bombing.) People find it a remarkable coincidence when two students in the same classroom share a birthday, when in fact there is a better than even chance, in any group of twenty-three people, that two of its members will have been born on the same day. We have a kind of stereotype of “randomness” that differs from true randomness.
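The birthday figure can be verified directly. A minimal Python sketch, assuming 365 equally likely birthdays and ignoring leap years:

```python
from math import prod

def p_shared_birthday(n: int) -> float:
    """Probability that at least two of n people share a birthday."""
    # Complement of "all n birthdays distinct": 365/365 * 364/365 * ...
    return 1 - prod((365 - i) / 365 for i in range(n))

print(round(p_shared_birthday(23), 3))  # just over 0.5 — better than even
print(round(p_shared_birthday(22), 3))  # just under 0.5
```

Twenty-three is exactly where the odds cross 50 percent, which is why the coincidence feels so much more remarkable than it is.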
A suggestion arose from Danny and Amos’s paper: If our minds can be misled by our false stereotype of something as measurable as randomness, how much might they be misled by other, vaguer stereotypes?

The average heights of adult males and females in the U.S. are, respectively, 5 ft. 10 in. and 5 ft. 4 in. Both distributions are approximately normal with a standard deviation of about 2.5 in. An investigator has selected one population by chance and has drawn from it a random sample. What do you think the odds are that he has selected the male population if
1. The sample consists of a single person whose height is 5 ft. 10 in.?
2. The sample consists of 6 persons whose average height is 5 ft. 8 in.?

The odds most commonly assigned by their subjects were, in the first case, 8:1 in favor and, in the second case, 2.5:1 in favor. The correct odds were 16:1 in favor in the first case, and 29:1 in favor in the second case. The sample of six people gave you a lot more information than the sample of one person. And yet people believed, incorrectly, that if they picked a single person who was five foot ten, they were more likely to have picked from the population of men than had they picked six people with an average height of five foot eight. People didn’t just miscalculate the true odds of a situation: They treated the less likely proposition as if it were the more likely one. And they did this, Amos and Danny surmised, because they saw “5 ft. 10 in.” and thought: That’s the typical guy! The stereotype of the man blinded them to the likelihood that they were in the presence of a tall woman.
A certain town is served by two hospitals. In the larger hospital about 45 babies are born each day, and in the smaller hospital about 15 babies are born each day. As you know, about 50 percent of all babies are boys. The exact percentage of baby boys, however, varies from day to day. Sometimes it may be higher than 50 percent, sometimes lower. For a period of 1 year, each hospital recorded the days on which more than 60 percent of the babies born were boys. Which hospital do you think recorded more such days? Check one:
— The larger hospital
— The smaller hospital
— About the same (that is, within 5 percent of each other)

People got that one wrong, too. Their…
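The smaller hospital is the statistically correct answer: small samples swing further from 50 percent. A sketch computing the exact probabilities, assuming each birth is an independent 50/50 draw (the function name is mine):

```python
from math import comb

def p_more_than_60_percent_boys(n: int) -> float:
    """Exact probability that strictly more than 60% of n births are boys."""
    cutoff = int(0.6 * n)  # need strictly more boys than this
    return sum(comb(n, k) for k in range(cutoff + 1, n + 1)) / 2 ** n

small = p_more_than_60_percent_boys(15)  # roughly 0.15 of days
large = p_more_than_60_percent_boys(45)  # roughly 0.07 of days
print(small, large)
assert small > large  # the smaller hospital records more such days
```

Over a year, the 15-birth hospital sees such lopsided days roughly twice as often as the 45-birth one.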
They confessed that they had confined their questions to situations in which the odds could be objectively calculated. But they felt fairly certain that people made the same mistakes when the odds were harder, or even impossible, to know. When, say, they guessed what a little boy would do for a living when he grew up, they thought in stereotypes. If he matched their mental picture of a scientist, they guessed he’d be a scientist—and neglected the prior odds of any kid becoming a scientist.
Danny and Amos had their first big general idea—the mind had these mechanisms for making judgments and decisions that were usually useful but also capable of generating serious error. The next paper they produced inside the Oregon Research Institute described a second mechanism, an idea that had come to them just a couple of weeks after the first. “It wasn’t all representativeness,” said Danny. “There was something else going on. It wasn’t just similarity.” The new paper’s title was once again more mystifying than helpful: “Availability: A Heuristic for Judging Frequency and Probability.”
Consider the letter K.

Is K more likely to appear in
____ the first position?
____ the third position? (check one)
My estimate for the ratio of these two values is: ________ : 1

If you thought that K was, say, twice as likely to appear as the first letter of an English word than as the third letter, you checked the first box and wrote your estimate as 2:1. This was what the typical person did, as it happens. Danny and Amos replicated the demonstration with other letters—R, L, N, and V. Those letters all appeared more frequently as the third letter in an English word than as the first letter—by a ratio of two to one. Once again, people’s judgment was, systematically, very wrong. And it was wrong, Danny and Amos now proposed, because it was distorted by memory. It was simply easier to recall words that start with K than to recall words with K as their third letter. The more easily people can call some scenario to mind—the more available it is to them—the more probable they find it to be.
Danny and Amos had noticed how oddly, and often unreliably, their own minds recalculated the odds, in light of some recent or memorable experience. For instance, after they drove past a gruesome car crash on the highway, they slowed down: Their sense of the odds of being in a crash had changed. After seeing a movie that dramatizes nuclear war, they worried more about nuclear war; indeed, they felt that it was more likely to happen. The sheer volatility of people’s judgment of the odds—their sense of the odds could be changed by two hours in a movie theater—told you something about the reliability of the mechanism that judged those odds.
They read lists of people’s names to Oregon students, for instance. Thirty-nine names, read at a rate of two seconds per name. The names were all easily identifiable as male or female. A few were the names of famous people—Elizabeth Taylor, Richard Nixon. A few were names of slightly less famous people—Lana Turner, William Fulbright. One list consisted of nineteen male names and twenty female names, the other of twenty male names and nineteen female names. The list that had more female names on it had more names of famous men, and the list that had more male names on it contained the names of more famous women. The unsuspecting Oregon students, having listened to a list, were then asked to judge whether it contained the names of more men or more women. They almost always got it backward: If the list had more male names on it, but the women’s names were famous, they thought the list contained more female names, and vice versa. “Each of the problems had an objectively correct answer,” Amos and Danny wrote, after they were done with their strange mini-experiments. “This is not the case in many real-life situations where probabilities are judged.”
But if you presented people with situations in which the evidence they needed to judge them accurately was hard for them to retrieve from their memories, and misleading evidence came easily to mind, they made mistakes. “Consequently,” Amos and Danny wrote, “the use of the availability heuristic leads to systematic biases.” Human judgment was distorted by . . . the memorable.
“The conditionality heuristic,” they called one of these. In judging the degree of uncertainty in any situation, they noted, people made “unstated assumptions.” “In assessing the profit of a given company, for example, people tend to assume normal operating conditions and make their estimates contingent upon that assumption,” they wrote in their notes. “They do not incorporate into their estimates the possibility that these conditions may be drastically changed because of a war, sabotage, depressions, or a major competitor being forced out of business.” Here, clearly, was another source of error: not just that people don’t know what they don’t know, but that they don’t bother to factor their ignorance into their judgments.
Another possible heuristic they called “anchoring and adjustment.” They first dramatized its effects by giving a bunch of high school students five seconds to guess the answer to a math question. The first group was asked to estimate this product:

8 × 7 × 6 × 5 × 4 × 3 × 2 × 1

The second group was asked to estimate this product:

1 × 2 × 3 × 4 × 5 × 6 × 7 × 8

Five seconds wasn’t long enough to actually do the math: The kids had to guess. The two groups’ answers should have been at least roughly the same, but they weren’t, even roughly. The first group’s median answer was 2,250. The second group’s median answer was 512. (The right answer is 40,320.) The reason the kids in the first group guessed a higher number was that they had used 8 as a starting point, while the kids in the second group had used 1.
People could be anchored with information that was totally irrelevant to the problem they were being asked to solve. For instance, Danny and Amos asked their subjects to spin a wheel of fortune with slots on it that were numbered 0 through 100. Then they asked the subjects to estimate the percentage of African countries in the United Nations. The people who spun a higher number on the wheel tended to guess that a higher percentage of the United Nations consisted of African countries than did those for whom the needle landed on a lower number. What was going on here? Was anchoring a heuristic, the way that representativeness and availability were heuristics? Was it a shortcut that people used, in effect, to answer to their own satisfaction a question to which they could not divine the true answer? Amos thought it was; Danny thought it wasn’t.
The imagination appeared to be governed by rules. The rules confined people’s thinking. It’s far easier for a Jew living in Paris in 1939 to construct a story about how the German army will behave much as it had in 1919, for instance, than to invent a story in which it behaves as it did in 1941, no matter how persuasive the evidence might be that, this time, things are different.
Here was a useful way of thinking about base rates: They were what you would predict if you had no information at all.
In the end, Danny created another character. This one he named “Dick”:

Dick is a 30-year-old man. He is married with no children. A man of high ability and high motivation, he promises to be quite successful in his field. He is well liked by his colleagues.

Then they ran another experiment. It was a version of the book bag and poker chips experiment that Amos and Danny had argued about in Danny’s seminar at Hebrew University. They told their subjects that they had picked a person from a pool of 100 people, 70 of whom were engineers and 30 of whom were lawyers. Then they asked them: What is the likelihood that the selected person is a lawyer? The subjects correctly judged it to be 30 percent. And if you told them that you were doing the same thing, but from a pool that had 70 lawyers in it and 30 engineers, they said, correctly, that there was a 70 percent chance the person you’d plucked from it was a lawyer. But if you told them you had picked not just some nameless person but a guy named Dick, and read them Danny’s description of Dick—which contained no information whatsoever to help you guess what Dick did for a living—they guessed there was an equal chance that Dick was a lawyer or an engineer, no matter which pool he had emerged from. “Evidently, people respond differently when given no specific evidence and when given worthless evidence,” wrote Danny and Amos. “When no specific evidence is given, the prior probabilities are properly utilized; when worthless specific evidence is given, prior probabilities are ignored.”
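In Bayesian terms, evidence that fits both hypotheses equally well has a likelihood ratio of 1, and should leave the prior untouched. A minimal sketch (the function name is mine):

```python
def posterior(prior: float, likelihood_ratio: float) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds x likelihood ratio."""
    prior_odds = prior / (1 - prior)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

# The description of Dick is equally consistent with "lawyer" and "engineer",
# so its likelihood ratio is 1 — the 30 percent base rate should survive:
print(round(posterior(0.30, 1.0), 10))  # 0.3
# Subjects instead answered as if the prior had been washed out to 50/50.
```

Worthless evidence should be a no-op; the experiment showed that, psychologically, it is anything but.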
And in the end it returned to the problem that had interested Danny since he had first signed on to help the Israeli army rethink how it selected and trained incoming recruits: The instructors in a flight school adopted a policy of consistent positive reinforcement recommended by psychologists. They verbally reinforced each successful execution of a flight maneuver. After some experience with this training approach, the instructors claimed that contrary to psychological doctrine, high praise for good execution of complex maneuvers typically results in a decrement of performance on the next try. What should the psychologist say in response? The subjects to whom they posed this question offered all sorts of advice. They surmised that the instructors’ praise didn’t work because it led the pilots to become overconfident. They suggested that the instructors didn’t know what they were talking about. No one saw what Danny saw: that the pilots would have tended to do better after an especially poor maneuver, or worse after an especially great one, if no one had said anything at all. Man’s inability to see the power of regression to the mean leaves him blind to the nature of the world around him. We are exposed to a lifetime schedule in which we are most often rewarded for punishing others, and punished for rewarding.
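What Danny saw can be reproduced with a toy simulation — a sketch under assumed numbers (a fixed skill level plus random noise, no feedback of any kind). Scores after an unusually bad maneuver improve, and scores after an unusually good one decline, purely through regression to the mean:

```python
import random

random.seed(0)
skill = 70.0  # each maneuver scores skill plus noise, nothing else
scores = [skill + random.gauss(0, 10) for _ in range(100_000)]

# Look at the very next score after an unusually bad or good one.
after_bad = [scores[i + 1] for i in range(len(scores) - 1) if scores[i] < 55]
after_good = [scores[i + 1] for i in range(len(scores) - 1) if scores[i] > 85]

mean_after_bad = sum(after_bad) / len(after_bad)
mean_after_good = sum(after_good) / len(after_good)

print(mean_after_bad)   # close to 70: apparent "improvement" after a scream
print(mean_after_good)  # close to 70: apparent "decline" after praise
```

An instructor who screams after the bad maneuvers and praises the good ones would see exactly the pattern the flight instructors reported, even though the feedback did nothing at all.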
When Richard Nixon announced his surprising intention to visit China and Russia, Fischhoff asked people to assign odds to a list of possible outcomes—say, that Nixon would meet Chairman Mao at least once, that the United States and the Soviet Union would create a joint space program, that a group of Soviet Jews would be arrested for attempting to speak with Nixon, and so on. After the trip, Fischhoff went back and asked the same people to recall the odds they had assigned to each outcome. Their memories were badly distorted: they all believed that they had assigned higher probabilities to the outcomes that actually occurred than they in fact had. That is, once they knew the outcome, they thought it had been far more predictable than they had found it to be before, when they had tried to predict it. A few years after Amos described the work to his Buffalo audience, Fischhoff named the phenomenon “hindsight bias.”† In his talk to the historians, Amos described their occupational hazard: the tendency to take whatever facts they had observed (neglecting the many facts that they did not or could not observe) and make them fit neatly into a confident-sounding story: All too often, we find ourselves unable to predict what will happen; yet after the fact we explain what did happen with a great deal of confidence. This “ability” to explain that which we cannot predict, even in the absence of any additional information, represents an important, though subtle, flaw in our reasoning. It leads us to believe that there is a less uncertain world than there actually is, and that we are less bright than we actually might be. For if we can explain tomorrow what we cannot predict today, without any added information except the knowledge of the actual outcome, then this outcome must have been determined in advance and we should have been able to predict it.
The fact that we couldn’t is taken as an indication of our limited intelligence rather than of the uncertainty that is in the world. All too often, we feel like kicking ourselves for failing to foresee that which later appears inevitable. For all we know, the handwriting might have been on the wall all along. The question is: was the ink visible? It wasn’t just sports announcers and political pundits who radically revised their narratives, or shifted focus, so that their stories seemed to fit whatever had just happened in a game or an election. Historians imposed false order upon random events, too, probably without even realizing what they were doing. Amos had a phrase for this. “Creeping determinism,” he called it—and jotted in his notes one of its many costs: “He who sees the past as surprise-free is bound to have a future full of surprises.”
But of all the bad things that happened to people in hospitals, the one that most preoccupied Redelmeier was clinical misjudgment. Doctors and nurses were human, too. They sometimes failed to see that the information patients offered them was unreliable—for instance, patients often said that they were feeling better, and might indeed believe themselves to be improving, when they had experienced no real change in their condition. Doctors tended to pay attention mainly to what they were asked to pay attention to, and to miss some bigger picture. They sometimes failed to notice what they were not directly assigned to notice. “One of the things Don taught me was the value of observing the room when the patient isn’t there,” says Jon Zipursky, chief of residents at Sunnybrook. “Look at their meal tray. Did they eat? Did they pack for a long stay or a short one? Is the room messy or neat? Once we walked into the room and the patient was sleeping. I was about to wake him up and Don stops me and says, There is a lot you can learn about people from just watching.” Doctors tended to see only what they were trained to see: That was another big reason bad things might happen to a patient inside a hospital. A patient received treatment for something that was obviously wrong with him, from a specialist oblivious to the possibility that some less obvious thing might also be wrong with him. The less obvious thing, on occasion, could kill a person.
But the dazed young woman who arrived in the Sunnybrook emergency room directly from her head-on car crash, with her many broken bones, presented her surgeons, as they treated her, with a disturbing problem. The rhythm of her heartbeat had become wildly irregular. It was either skipping beats or adding extra beats; in any case, she had more than one thing seriously wrong with her. Immediately after the trauma center staff called Redelmeier to come to the operating room, they diagnosed the heart problem on their own—or thought they had. The young woman remained alert enough to tell them that she had a past history of an overactive thyroid. An overactive thyroid can cause an irregular heartbeat. And so, when Redelmeier arrived, the staff no longer needed him to investigate the source of the irregular heartbeat but to treat it. No one in the operating room would have batted an eye if Redelmeier had simply administered the drugs for hyperthyroidism. Instead, Redelmeier asked everyone to slow down. To wait. Just a moment. Just to check their thinking—and to make sure they were not trying to force the facts into an easy, coherent, but ultimately false story. Something bothered him. As he said later, “Hyperthyroidism is a classic cause of an irregular heart rhythm, but hyperthyroidism is an infrequent cause of an irregular heart rhythm.” Hearing that the young woman had a history of excess thyroid hormone production, the emergency room medical staff had leaped, with seeming reason, to the assumption that her overactive thyroid had caused the dangerous beating of her heart. They hadn’t bothered to consider statistically far more likely causes of an irregular heartbeat. In Redelmeier’s experience, doctors did not think statistically. “Eighty percent of doctors don’t think probabilities apply to their patients,” he said. 
“Just like 95 percent of married couples don’t believe the 50 percent divorce rate applies to them, and 95 percent of drunk drivers don’t think the statistics that show that you are more likely to be killed if you are driving drunk than if you are driving sober apply to them.” Redelmeier asked the emergency room staff to search for other, more statistically likely causes of the woman’s irregular heartbeat. That’s when they found her collapsed lung. Like her fractured ribs, her collapsed lung had failed to turn up on the X-ray. Unlike the fractured ribs, it could kill her. Redelmeier ignored the thyroid and treated the collapsed lung. The young woman’s heartbeat returned to normal. The next day, her formal thyroid tests came back: Her thyroid hormone production was perfectly normal. Her thyroid never had been the issue.
It wasn’t that what first came to mind was always wrong; it was that its existence in your mind led you to feel more certain than you should be that it was correct. “Beware of the delirious guy in the emergency unit with the long history of alcoholism,” said Redelmeier, “because you will say, ‘He’s just drunk,’ and you’ll miss the subdural hematoma.” The woman’s surgeons had leapt from her medical history to a diagnosis without considering the base rates. As Kahneman and Tversky long ago had pointed out, a person who is making a prediction—or a diagnosis—is allowed to ignore base rates only if he is completely certain he is…
Whenever a patient recovered, for instance, the doctor typically attributed the recovery to the treatment he had prescribed, without any solid evidence that the treatment was responsible. Just because the patient is better after I treated him doesn’t mean he got better because I treated him, Redelmeier thought. “So many diseases are self-limiting,” he said. “They will cure themselves. People who are in distress seek care. When they seek care, physicians feel the need to do something. You put leeches on; the condition improves. And that can propel a lifetime of leeches. A lifetime of overprescribing antibiotics. A lifetime of giving tonsillectomies to people with ear infections. You try it and they get better the next day and it is so compelling. You go to see a psychiatrist and your depression improves—you are convinced of the efficacy of psychiatry.”
Hal Sox happened to have coauthored the first article Amos ever wrote about medicine. Their paper had sprung from a question Amos had put to Sox: How did a tendency people exhibited when faced with financial gambles play itself out in the minds of doctors and patients? Specifically, given a choice between a sure gain and a bet with the same expected value (say, $100 for sure or a 50-50 shot at winning $200), Amos had explained to Hal Sox, people tended to take the sure thing. A bird in the hand. But, given the choice between a sure loss of $100 and a 50-50 shot of losing $200, they took the risk. With Amos’s help, Sox and two other medical researchers designed experiments to show how differently both doctors and patients made choices when those choices were framed in terms of losses rather than gains.
Lung cancer proved to be a handy example. Lung cancer doctors and patients in the early 1980s faced two unequally unpleasant options: surgery or radiation. Surgery was more likely to extend your life, but, unlike radiation, it came with the small risk of instant death. When you told people that they had a 90 percent chance of surviving surgery, 82 percent of patients opted for surgery. But when you told them that they had a 10 percent chance of dying from the surgery—which was of course just a different way of putting the same odds—only 54 percent chose the surgery. People facing a life-and-death decision responded not to the odds but to the way the odds were described to them.
The Samuelson bet, for instance. The Samuelson bet was named for Paul Samuelson, the economist who had cooked it up. As Amos explained it, people offered a single bet in which they have a 50-50 chance either to win $150 or lose $100 usually decline it. But if you offer those same people the chance to make the same bet one hundred times over, most of them accept it. Why did they make the expected value calculation—and respond to the odds being in their favor—when they were allowed to make the bet a hundred times, but not when they were offered a single bet? The answer was not entirely obvious. Yes, the more times you play a game with the odds in your favor, the less likely you are to lose; but the more times you play, the greater the total sum of money you stand to lose.
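The tension is easy to make concrete. In this sketch (the function is ours), a single play of the Samuelson bet has a positive expected value of $25 but a 50 percent chance of losing; over a hundred plays the edge per play is unchanged, the chance of finishing behind falls below 2 percent, and yet the worst case grows to a $10,000 loss:

```python
from math import comb

def prob_net_loss(n):
    """Probability of finishing below zero after n plays of the
    50-50 bet that wins $150 or loses $100."""
    total = 0.0
    for wins in range(n + 1):
        if wins * 150 - (n - wins) * 100 < 0:
            total += comb(n, wins) * 0.5 ** n
    return total

print(prob_net_loss(1))    # 0.5
print(prob_net_loss(100))  # about 0.018 -- under 2 percent
```

So declining one bet while accepting a hundred is not obviously irrational on the odds alone; what wanted explaining was why a single 50 percent chance of losing $100 loomed so large.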
The secret to doing good research is always to be a little underemployed. You waste years by not being able to waste hours.
People had an incredible ability to see meaning in patterns where none existed. Watch any NBA game, Amos explained to Redelmeier, and you saw that the announcers, the fans, and maybe even the coaches seemed to believe that basketball shooters had the “hot hand.” Simply because some player had made his last few shots, he was thought to be more likely to make his next shot. Amos had collected data on NBA shooting streaks to see if the so-called hot hand was statistically significant—and he could already persuade you that it was not. A better shooter was of course more likely to make his next shot than a less able shooter, but the streaks observed by fans and announcers and the players themselves were illusions.
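The statistical point admits a simple sketch (the numbers here are assumed, not Amos’s data): a shooter who makes every shot independently at a fixed percentage still produces long runs of makes, and the hit rate immediately after a streak of three is no better than the overall rate:

```python
import random

random.seed(1)
p = 0.5  # assumed shooting percentage; every shot is independent
shots = [random.random() < p for _ in range(100_000)]

overall = sum(shots) / len(shots)

# Shots taken immediately after three straight makes.
after_streak = [shots[i] for i in range(3, len(shots))
                if shots[i - 3] and shots[i - 2] and shots[i - 1]]
streak_rate = sum(after_streak) / len(after_streak)

print(overall)      # ~0.50
print(streak_rate)  # ~0.50 as well: the streak carries no information
```

The streaks are real; the inference from them is not.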
Basketball experts seized on random streaks as patterns in players’ shooting that didn’t exist. Arthritis sufferers found patterns in suffering that didn’t exist. “We attribute this phenomenon to selective matching,” Tversky and Redelmeier wrote.† “. . . For arthritis, selective matching leads people to look for changes in the weather when they experience increased pain, and pay little attention to the weather when their pain is stable. . . . [A] single day of severe pain and extreme weather might sustain a lifetime of belief in a relation between them.”
Danny had always been curious about people’s ability, or inability, to predict their feelings about their own experiences. Now he wanted to study it. Specifically, he wanted to explore the gap—he had sensed it in himself—between a person’s intuitions about what made him happy and what actually made him happy. He thought he might start by having people guess how happy it would make them to come into the lab every day for a week and do something that they said they enjoyed—eat a bowl of ice cream, say, or listen to their favorite song. He might then compare the pleasure they anticipated to the pleasure they experienced, and further compare the pleasure they experienced to the pleasure they remembered. There was clearly a difference to be explored, he argued. At the moment your favorite soccer team wins the World Cup, you are beyond elated; six months later, it means next to nothing to you, really.
What did it mean if people’s prediction of the misery that might be caused by some event was different from the misery they actually experienced when the event occurred, or if people’s memory of an experience turned out to be meaningfully different from the experience as it had actually played out? A lot, thought Danny. People had a miserable time for most of their vacation and then returned home and remembered it fondly; people enjoyed a wonderful romance but, because it ended badly, looked back on it mainly with bitterness. They didn’t simply experience fixed levels of happiness or unhappiness. They experienced one thing and remembered something else.
Funny things happened when you did this with people. Their memory of pain was different from their experience of it. They remembered moments of maximum pain, and they remembered, especially, how they felt the moment the pain ended. But they didn’t particularly remember the length of the painful experience. If you stuck people’s arms in ice buckets for three minutes but warmed the water just a bit for another minute or so before allowing them to flee the lab, they remembered the experience more fondly than if you stuck their arms in the bucket for three minutes and removed them at a moment of maximum misery. If you asked them to choose one experiment to repeat, they’d take the first session. That is, people preferred to endure more total pain so long as the experience ended on a more pleasant note.
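The finding can be sketched numerically. Assuming, as the ice-bucket result suggests, that remembered pain tracks the average of the worst moment and the final moment (a common summary in the later literature, not a claim from this passage), the longer trial with the milder ending is remembered as better despite containing strictly more pain:

```python
# Hypothetical per-minute pain ratings for the two ice-bucket sessions.
short_trial = [6, 7, 8]      # three minutes, ends at maximum misery
long_trial = [6, 7, 8, 4]    # same three minutes plus a warmer final minute

def remembered(trial):
    # Peak-end summary of the experience -- an assumption for illustration.
    return (max(trial) + trial[-1]) / 2

print(sum(short_trial), sum(long_trial))  # 21 vs 25: more total pain in the long trial
print(remembered(short_trial), remembered(long_trial))  # 8.0 vs 6.0: kinder memory
```

On these assumed numbers, anyone choosing by memory picks the session with more total pain — exactly the preference the experiment elicited.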
Those who had been given the less unhappy ending remembered less pain than did the patients who had not. More interestingly, they proved more likely to return for another colonoscopy when the time came. Human beings who had never imagined that they might prefer more pain to less could nearly all be fooled into doing so. As Redelmeier put it, “Last impressions can be lasting impressions.”
Amos’s textbook defined risk aversion this way: “The more money one has, the less he values each additional increment, or, equivalently, that the utility of any additional dollar diminishes with an increase in capital.” You value the second thousand dollars you get your hands on a bit less than you do the first thousand, just as you value the third thousand a bit less than the second thousand. The marginal value of the dollars you give up to buy fire insurance on your house is less than the marginal value of the dollars you lose if your house burns down—which is why even though the insurance is, strictly speaking, a stupid bet, you buy it.
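The textbook’s logic can be made concrete with one hedged example (the wealth, loss, probability, and the choice of a logarithmic utility are all ours, for illustration): with diminishing marginal utility, buying fire insurance at twice its actuarially fair price can still raise expected utility, even though it lowers expected wealth:

```python
from math import log

wealth, house, p_fire = 350_000, 300_000, 0.001
premium = 600  # twice the actuarially fair price of p_fire * house = $300

u = log  # any concave utility has diminishing marginal value per dollar

eu_insured = u(wealth - premium)
eu_uninsured = p_fire * u(wealth - house) + (1 - p_fire) * u(wealth)

print(premium > p_fire * house)      # True: in money terms, a "stupid bet"
print(eu_insured > eu_uninsured)     # True: in utility terms, the smart one
```

The dollars paid in premiums are cheap dollars from the flat top of the utility curve; the dollars lost in a fire would come from its steep bottom.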
A rational person making a decision between risky propositions, for instance, shouldn’t violate the von Neumann and Morgenstern transitivity axiom: If he preferred A to B and B to C, then he should prefer A to C. Anyone who preferred A to B and B to C but then turned around and preferred C to A violated expected utility theory. Among the remaining rules, maybe the most critical—given what would come—was what von Neumann and Morgenstern called the “independence axiom.” This rule said that a choice between two gambles shouldn’t be changed by the introduction of some irrelevant alternative. For example: You walk into a deli to get a sandwich and the man behind the counter says he has only roast beef and turkey. You choose turkey. As he makes your sandwich he looks up and says, “Oh, yeah, I forgot I have ham.” And you say, “Oh, then I’ll take the roast beef.” Von Neumann and Morgenstern’s axiom said, in effect, that you can’t be considered rational if you switch from turkey to roast beef just because they found some ham in the back. And, really, who would switch? Like the other rules of rationality, the independence axiom seemed reasonable, and not obviously contradicted by the way human beings generally behaved.
Amos had always had an almost jungle instinct for the vulnerability of other people’s ideas. He of course knew that people made decisions that the theory would not have predicted. Amos himself had explored how people could be—as the theory assumed they were not—reliably “intransitive.” As a graduate student in Michigan, he had induced both Harvard undergraduates and convicted murderers in Michigan prisons, over and over again, to choose gamble A over gamble B, then choose gamble B over gamble C—and then turn around and choose C instead of A. That violated a rule of expected utility theory.
He asked his audience to imagine their choices in the following two situations (the dollar amounts used by Allais are here multiplied by ten to account for inflation and capture the feel of his original problem):

Situation 1. You must choose between having:

1) $5 million for sure

or this gamble

2) An 89 percent chance of winning $5 million
A 10 percent chance of winning $25 million
A 1 percent chance to win zero

Most people who looked at that, apparently including many of the American economists in Allais’s audience, said, “Obviously, I’ll take door number 1, the $5 million for sure.” They preferred the certainty of being rich to the slim possibility of being even richer. To which Allais replied, “Okay, now consider this second situation.”

Situation 2. You must choose between having:

3) An 11 percent chance of winning $5 million, with an 89 percent chance to win zero

or

4) A 10 percent chance of winning $25 million, with a 90 percent chance to win zero

Most everyone, including American economists, looked at this choice and said, “I’ll take number 4.” They preferred the slightly lower chance of winning a lot more money. There was nothing wrong with this; on the face of it, both choices felt perfectly sensible. The trouble, as Amos’s textbook explained, was that “this seemingly innocent pair of preferences is incompatible with utility theory.”
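The incompatibility is pure arithmetic, and a short sketch (function names and the random search are ours) shows it. Normalizing the utility of zero to 0, preferring option 1 over 2 requires 0.11·u($5M) > 0.10·u($25M), while preferring 4 over 3 requires the exact opposite inequality; no assignment of utilities satisfies both:

```python
import random

def prefers_1_over_2(u5, u25):
    # u(0) normalized to 0: certainty of $5M beats the mixed gamble
    # exactly when 0.11 * u5 > 0.10 * u25.
    return u5 > 0.89 * u5 + 0.10 * u25

def prefers_4_over_3(u5, u25):
    # The 10% shot at $25M beats the 11% shot at $5M exactly when
    # 0.10 * u25 > 0.11 * u5 -- the opposite inequality.
    return 0.10 * u25 > 0.11 * u5

random.seed(0)
conflicts = [(u5, u25)
             for u5, u25 in ((random.uniform(0, 100), random.uniform(0, 100))
                             for _ in range(10_000))
             if prefers_1_over_2(u5, u25) and prefers_4_over_3(u5, u25)]
print(len(conflicts))  # 0: no utility values reproduce both popular choices
```

The random search is only a demonstration; the two strict inequalities are mutually exclusive by inspection.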
Danny wasn’t inclined to see the paradox as a problem of logic. It looked to him more like a quirk in human behavior. “I wanted to understand the psychology of what was going on,” he said. He sensed that Allais himself hadn’t given much thought to why people might choose in a way that violated the major theory of decision making. But to Danny the reason seemed obvious: regret. In the first situation people sensed that they would look back on their decision, if it turned out badly, and feel they had screwed up; in the second situation, not so much. Anyone who turned down a certain gift of $5 million would experience far more regret, if he wound up with nothing, than a person who turned down a gamble in which he stood a slight chance of winning $5 million. If people mostly chose option 1, it was because they sensed the special pain they would experience if they chose option 2 and won nothing. Avoiding that pain became a line item on the inner calculation of their expected utility. Regret was the ham in the back of the deli that caused people to switch from turkey to roast beef.
Decision theory had approached the seeming contradiction at the heart of the Allais paradox as a technical problem. Danny found that silly: There was no contradiction. There was just psychology. The understanding of any decision had to account not just for the financial consequences but for the emotional ones, too. “Obviously it is not regret itself that determines decisions—no more than the actual emotional response to consequences ever determines the prior choice of a course of action,” Danny wrote to Amos, in one of a series of memos on the subject. “It is the anticipation of regret that affects decisions, along with the anticipation of other consequences.”
When they made decisions, people did not seek to maximize utility. They sought to minimize regret. As the starting point for a new theory, it sounded promising.
People regretted what they had done, and what they wished they hadn’t done, far more than what they had not done and perhaps should have. “The pain that is experienced when the loss is caused by an act that modified the status quo is significantly greater than the pain that is experienced when the decision led to the retention of the status quo,” Danny wrote in a memo to Amos. “When one fails to take action that could have avoided a disaster, one does not accept responsibility for the occurrence of the disaster.”
They were uncovering, or thought they were uncovering, what amounted to the rules of regret. One rule was that the emotion was closely linked to the feeling of “coming close” and failing. The nearer you came to achieving a thing, the greater the regret you experienced if you failed to achieve it.† A second rule: Regret was closely linked to feelings of responsibility.
That was another rule of regret. It skewed any decision in which a person faced a choice between a sure thing and a gamble. This tendency was not merely of academic interest. Danny and Amos agreed that there was a real-world equivalent of a “sure thing”: the status quo. The status quo was what people assumed they would get if they failed to take action.
By testing how people choose between various sure gains and gains that were merely probable, they traced the contours of regret.

Which of the following two gifts do you prefer?

Gift A: A lottery ticket that offers a 50 percent chance of winning $1,000
Gift B: A certain $400

or

Which of the following gifts do you prefer?

Gift A: A lottery ticket that offers a 50 percent chance of winning $1 million
Gift B: A certain $400,000

They collected great heaps of data: choices people had actually made. “Always keep one hand firmly on data,” Amos liked to say. Data was what set psychology apart from philosophy, and physics from metaphysics.
People felt greater pleasure going from 0 to $1 million than they felt going from $1 million to $2 million. Of course, expected utility theory also predicted that people would take a sure gain over a bet that offered an expected value of an even bigger gain. They were “risk averse.” But what was this thing that everyone had been calling “risk aversion”? It amounted to a fee that people paid, willingly, to avoid regret: a regret premium.
What puzzled Danny was what the theory had left out. “The smartest people in the world are measuring utility,” he recalled. “As I’m reading about it, something strikes me as really, really peculiar.” The theorists seemed to take “utility” to mean the utility of having money. In their minds, it was linked to levels of wealth. More, because it was more, was always better. Less, because it was less, was always worse. This struck Danny as false. He created many scenarios to show just how false it was:

Today Jack and Jill each have a wealth of 5 million.

Yesterday, Jack had 1 million and Jill had 9 million.

Are they equally happy? (Do they have the same utility?)

Of course they weren’t equally happy. Jill was distraught and Jack was elated. Even if you took a million away from Jack and left him with less than Jill, he’d still be happier than she was. In people’s perceptions of money, as surely as in their perception of light and sound and the weather and everything else under the sun, what mattered was not the absolute levels but changes.
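Danny’s scenario fits in a few lines. In this sketch (the function is ours, a deliberately crude stand-in for his point), satisfaction depends on the change from a reference point rather than on the level of wealth, so Jack and Jill hold identical fortunes yet feel opposite things:

```python
def change_based_feeling(today, yesterday):
    # Reference-dependent: the sign and size of the change matter,
    # not the level of wealth itself.
    return today - yesterday

jack = change_based_feeling(5_000_000, 1_000_000)  # +4,000,000: elated
jill = change_based_feeling(5_000_000, 9_000_000)  # -4,000,000: distraught
print(jack, jill)
```

A level-based utility function would return the same number for both of them; that is exactly what struck Danny as false.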