Wednesday 2 August 2017

Behavioural Economics: a review

Most of us have read Kahneman's Thinking, Fast and Slow or Thaler's Misbehaving or Nudge. These books all discuss the birth of behavioural economics, a discipline that marries economics with psychology, and which its adherents claim has supplanted neoclassical economics.

Yet contrary to the strong assertions made in these books, or by some of the discipline's fans, behavioural economics has not definitively dethroned traditional economics. Indeed, in spite of the discipline's popularity, it remains a small part of economics curricula. In this post, I review three main criticisms of the discipline that help explain why this is so: first, the criticisms leveled by behavioural economists against classical economics are often unfair; second, many of the experiments that gave birth to the discipline have failed replication attempts, or cannot be generalised from the lab to society at large; and third, neoclassical economics makes for a better foundation for policy.

A. Behavioural economics vs Neoclassical Economics
Neoclassical economics refers to the attempt to model an economy based on three principles:

a) that people have rational preferences between outcomes (this basically means that any two alternative choices can be compared to each other, and that preferences are transitive, i.e. if a person prefers apples to bananas, and bananas to pears, then he also prefers apples to pears);

b) that individuals maximise utility; and that

c) people act independently on the basis of full information.

Neoclassical economics relies on these assumptions to model the allocation of resources, market behaviour &c, often making use of game theory. This latter field, popularised by the film A Beautiful Mind, is concerned with predicting how agents will behave in a particular situation. Briefly, game theory suggests that a possible interaction among a number of agents will result in an equilibrium, a state where no agent has an incentive to change their behaviour.

The classic game theory example is the prisoner's dilemma: two criminals are arrested, placed in separate cells, and offered a bargain: each prisoner can testify that their partner committed the crime, or they can stay silent. If both prisoners betray each other, they both get two years in prison; if one prisoner betrays his partner, but his partner stays silent, the snitch goes free but their loyal partner gets three years; and if both stay silent, they both get one year in prison (due to some lesser charge the prosecutor can concoct).

This scenario can be visualised in the following table, where each cell shows A's sentence followed by B's sentence, in years:

                     B stays silent    B betrays
    A stays silent        1, 1            3, 0
    A betrays             0, 3            2, 2

According to game theory, both prisoners betraying each other is the game's only Nash Equilibrium: you can see that in any other cell, one or both of the prisoners has an incentive to change their strategy, whereas in the bottom right cell, a prisoner will only be worse off if they change. So what this game tells us is that even though mutual cooperation would leave both players better off, rational decision making will lead to mutual betrayal.
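The equilibrium logic can be checked mechanically. Here is a minimal sketch (my own, not from any of the books or papers discussed here) that finds the Nash equilibria of the prisoner's dilemma by brute force:

```python
# Checking that mutual betrayal is the only Nash equilibrium of the
# prisoner's dilemma. Payoffs are years in prison, so each prisoner
# wants to MINIMISE their own number.

from itertools import product

ACTIONS = ["silent", "betray"]

# sentences[(a_action, b_action)] = (A's years, B's years)
SENTENCES = {
    ("silent", "silent"): (1, 1),
    ("silent", "betray"): (3, 0),
    ("betray", "silent"): (0, 3),
    ("betray", "betray"): (2, 2),
}

def is_nash(a, b):
    """A profile is a Nash equilibrium if neither player can reduce
    their own sentence by unilaterally switching action."""
    a_years, b_years = SENTENCES[(a, b)]
    for alt in ACTIONS:
        if SENTENCES[(alt, b)][0] < a_years:   # A would deviate
            return False
        if SENTENCES[(a, alt)][1] < b_years:   # B would deviate
            return False
    return True

equilibria = [(a, b) for a, b in product(ACTIONS, ACTIONS) if is_nash(a, b)]
print(equilibria)  # [('betray', 'betray')]
```

Swapping in any other payoff table lets you check equilibria for other two-player games the same way.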

Behavioural economics challenges the three hypotheses that underpin neoclassical economics. The discipline suggests that not only are people irrational, but they are predictably so, to the point that the same approaches used by neoclassical economics (such as game theory) would lead to different conclusions, were the predictably irrational behaviour of humans taken into account.

There are two responses to this challenge. The first is that behavioural economics does not so much supplant neoclassical economics, as it augments it. Prospect theory, one of the discipline's foundations, proposed by Amos Tversky and Daniel Kahneman, slightly modifies utility theory, so that according to it, people make choices between alternatives based on potential gains and losses, not end-states; it also suggests that people use heuristics to make decisions. But at its core, it's not all that different to classical economics.

The second is that neoclassical theory is actually pretty good at predicting behaviour; the experimental results from behavioural economics that seem to suggest otherwise misunderstand neoclassical theory. There is a good paper on this by David Levine and Jie Zheng. This paper uses the Ultimatum game as an example: this is a game that many behavioural economics proponents claim undermines neoclassical economics.

In the Ultimatum game, person A is given $10, and can then suggest a division of this money between himself and player B. Player B can then accept A's suggestion, or reject it, in which case neither player gets any money. In various lab experiments, it has been observed that few people, if any, offer less than $2 to player B, with most people offering $5; and, when player A makes an "unfair" offer, player B often rejects it. Some behavioural economists consider this an excellent refutation of neoclassical economics: surely, traditional, neoclassical theory, with its selfish, buck-maximising agents, would predict minimal offers from player A, which would always be accepted by B.

(This way of reasoning is called sub-game perfection: the idea is that you break the game into two stages, and reason backwards: player A thinks, as long as I offer anything to player B, he is better off accepting rather than rejecting my offer; therefore, I can offer anything, no matter how little, and still have him accept it.)
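The backward-induction argument can be sketched in code. This is a toy model under two assumptions of my own: offers are in whole dollars, and the responder is purely money-maximising:

```python
# A toy backward-induction (sub-game perfection) sketch of the Ultimatum
# game, assuming whole-dollar offers and a buck-maximising responder.

POT = 10  # dollars given to the proposer

def responder_accepts(offer):
    # Stage 2: a money-maximising responder accepts any positive offer,
    # since rejecting yields $0.
    return offer > 0

def best_proposal():
    # Stage 1: the proposer anticipates stage 2 and keeps as much as
    # possible while still having the offer accepted.
    best_offer, best_payoff = None, -1
    for offer in range(POT + 1):
        payoff = POT - offer if responder_accepts(offer) else 0
        if payoff > best_payoff:
            best_offer, best_payoff = offer, payoff
    return best_offer, best_payoff

print(best_proposal())  # (1, 9): offer the minimum, keep $9
```

This is exactly the prediction that lab subjects, offering and demanding $5 splits, appear to contradict.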

However, neoclassical economics does not have selfishness or lack of altruism as a fundamental axiom; in fact, Adam Smith explicitly stated that people's utility functions most likely have a moral dimension to them. More importantly though, game theory says that, perhaps counter-intuitively, the Ultimatum game has many Nash equilibria. As Levine and Zheng write, the right way of thinking about the problem is to check whether people's losses (as a result of their strategy) are small relative to what they could have gained, had they played optimally.

To do this, one would compare how much money a player with access to past experimental data could have made to how much they actually made. Using this approach, it is found that players in the Ultimatum game lose about $1. Furthermore, only 1/3 of this $1 represents known losses, i.e. money that the players know they will lose (clearly, only player B has known losses in this game, when he rejects A's offer, knowing he is choosing to forego money). The remaining 2/3 is basically due to players who assume the role of A not having had enough experience to judge what kinds of offers are typically rejected.

In summary then, many argue that behavioural economics is nothing but tinkering with the neoclassical model; any claims that it's a fundamentally new paradigm show a misunderstanding of neoclassical theory.

B. Humans: not that irrational or uniform
Behavioural economists, drawing on work from psychology, make some pretty astonishing claims: if you "prime" people by having them read words that remind them of old people, they will subsequently walk slower; if you give them more products to choose from, they are less likely to make a purchase; if you make exam questions harder to read, they will perform better. Some of these have been as influential as they are hard to believe - for example, consumer goods companies have reduced the number of products they sell to reduce "choice overload", and leaders such as Obama and Zuckerberg have simple wardrobes on purpose to avoid ego depletion. It turns out, however, that some of these effects are not as robust as pop books would have us think.

In this section, I will discuss some experiments that behavioural economists use as examples of human irrationality; but first, there is another matter to be addressed. Many of the criticisms leveled against neoclassical economics are based on lab experiments that purport to show people are far more altruistic, selfless or irrational than standard theory predicts. However, many of these results cannot be generalised to society at large; furthermore, human behaviour varies significantly across the world, and we should be wary of drawing conclusions about humanity from lab experiments performed at Ivy League colleges.

Stephen Levitt and John List expand on what lab experiments say about the real world in this paper. They start by suggesting that people's utility function takes the form

U(action, stakes, norms, scrutiny) = Morality(action, stakes, norms, scrutiny) + Wealth(action, stakes)

In other words, the utility, how happy a person will be by taking an action, depends on the moral cost of this action, as well as on its effect on the person's wealth. Whereas the effect on wealth depends on the action and the stakes involved, the moral cost also depends on social norms and the scrutiny of an individual's action. Levitt and List argue that behaviour in the lab is not a reliable predictor of behaviour in society because scrutiny in the lab is far higher than in real life and the stakes are often lower.
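As a rough illustration, here is how that decomposition might look in code. The functional forms and numbers are entirely my own toy assumptions, not Levitt and List's; they are chosen only to show the moral term responding to norms and scrutiny while the wealth term does not:

```python
# A toy rendering of U(action, stakes, norms, scrutiny) =
#   Morality(action, stakes, norms, scrutiny) + Wealth(action, stakes).
# All functional forms and numbers are illustrative assumptions.

def wealth_term(action_gain, stakes):
    # Wealth depends only on the action taken and the stakes involved.
    return action_gain * stakes

def morality_term(action_gain, stakes, norm_pressure, scrutiny):
    # A selfish action carries a moral cost that grows with the stakes,
    # with how strongly norms condemn it, and with how closely the
    # individual is being watched.
    return -action_gain * stakes * norm_pressure * scrutiny

def utility(action_gain, stakes, norm_pressure, scrutiny):
    return (morality_term(action_gain, stakes, norm_pressure, scrutiny)
            + wealth_term(action_gain, stakes))

# The same selfish action is less attractive under lab-level scrutiny
# than in the field, where scrutiny is low:
print(utility(1.0, stakes=10, norm_pressure=0.5, scrutiny=1.0))  # 5.0 (lab)
print(utility(1.0, stakes=10, norm_pressure=0.5, scrutiny=0.2))  # 9.0 (field)
```

The point of the sketch: holding the action and stakes fixed, cranking up scrutiny makes selfish behaviour less attractive, which is exactly why lab behaviour overstates real-world pro-sociality.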


This is not just a hypothesis, but an observed fact. List ran an experiment in which sellers could choose the quality of the products to offer to buyers in response to the buyers' bids. He used experienced sports card traders as subjects, and found that in the lab, they exhibited strong social preferences: when buyers offered high prices, sellers responded by offering high-quality cards, even though they were not obligated to do so. But he then ran a field test on these same traders, sending confederates to pose as buyers at sports card shows. It turns out that outside the lab, there was little relationship between price offered and quality. Similarly, other experiments have found that people are more likely to behave selfishly if their anonymity is guaranteed.

Also, here's an interesting factoid found in Levitt and List's paper: in another experiment, List and a collaborator examined whether professionals behave the same way as students in trust games. It turns out that CEOs in Costa Rica are considerably more trusting and trustworthy than students. Maybe the people who become CEOs in Costa Rica are particularly nice; maybe CEOs care more about their reputation and so behave extra-trustingly. But either way, this shows that it's hard to generalise from experiments run on students.

Which leads me to the WEIRDest people in the world - members of Western, Educated, Industrialised and Democratic societies. The authors of this paper make the same argument as the previous paragraph - behavioural and cognitive studies tend to generalise their experimental results to the entire human species, when their effects are local. They back this claim with a number of case studies.

Consider, for example, the Müller-Lyer, aka the two lines, illusion. Which of the two lines below is longer?

You can probably guess the answer, even if you haven't read any books on pop psychology: the two lines have the same length. If you have read pop psychology (or a Buzzfeed article on 27 Illusions that will BLOW your mind (you won't believe number 4!)) you have probably read something like "viewers invariably perceive line b as being longer". But there is nothing invariable about this phenomenon:


The chart above shows by how much line a must be increased in length, before subjects perceive the two lines as being of equal length, by country. As you can see, in some societies, viewers can tell the two lines are the same length with hardly any manipulation; also, children and adults respond quite differently to the illusion.

Whether a society is industrial or not also affects its members' behaviour in the Ultimatum Game. I mentioned earlier that most people who play the Ultimatum game in a lab setting offer about 50% of their wealth; but this is only the behaviour of American adult subjects; in fact, Americans seem to be far more generous than other societies...

% of wealth offered in Ultimatum Game, by country
... and more willing to reject an offer they deem unfair:
Income-maximising offer, by country

(The second chart shows the % the proposer should offer, to maximise their wealth on average. In the US, the optimal strategy for a proposer is to offer 50% of his wealth, otherwise he runs the risk of the receiver rejecting the offer; in other countries, receivers are content with 10% instead.)
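To make the "income-maximising offer" concrete, here is a sketch of the underlying computation, using made-up rejection rates rather than the paper's data:

```python
# The proposer maximises expected income = P(accept) x (pot - offer).
# Rejection probabilities below are illustrative, not real data.

POT = 10

# Hypothetical probability that an offer (as a fraction of the pot)
# gets rejected, for a US-like and a more tolerant population.
REJECTION = {
    "us_like":  {0.1: 0.8,  0.3: 0.4, 0.5: 0.0},
    "tolerant": {0.1: 0.05, 0.3: 0.0, 0.5: 0.0},
}

def income_maximising_offer(rejection):
    def expected_income(frac):
        return (1 - rejection[frac]) * (POT - frac * POT)
    return max(rejection, key=expected_income)

print(income_maximising_offer(REJECTION["us_like"]))   # 0.5
print(income_maximising_offer(REJECTION["tolerant"]))  # 0.1
```

With US-like rejection rates, offering half dominates; where receivers rarely reject low offers, the proposer is better off keeping 90%.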

Even more shockingly, experiments run in Russia, China, Sweden, the Netherlands and Germany show that some subjects even reject so-called hyper-fair offers (>60% of the proposer's wealth). I mean... you can kind of understand this behaviour in formerly communist countries like Russia or China, or in socialists' poster-boy Sweden, but Germany??

And for my favourite example of different behaviour across countries, consider Herrmann &al's paper on anti-social punishment. This paper focuses on a so-called public goods game. This game is played with four players over ten rounds. Players are given 20 tokens, and in each round, they need to decide how many of their tokens to contribute to a common pool. The tokens in this common pool are then increased by 40%, and divided over all four players, regardless of whether they contributed or not. So, as in many real-life situations, players are better off if they all contribute, but each one has an incentive to free-ride on the other players' contributions. For example, if all four players contribute 10 tokens, each keeps 10 and receives 14 from the pool (= 4 x 10 x 1.4 / 4), ending up with 24; but if one player contributes nothing, he keeps all 20 of his tokens and still receives 10.5 (= 3 x 10 x 1.4 / 4) from the other players' contributions, thus ending up with 30.5. Herrmann & al ran this experiment in a number of different countries, using university undergraduates as subjects.
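The round payoffs can be computed directly; this small sketch counts a player's payoff as the tokens they kept plus their share of the grown pool:

```python
# One round of the public goods game: 4 players, 20 tokens each,
# contributions pooled, increased by 40%, split equally regardless
# of who contributed.

def round_payoffs(contributions, endowment=20, growth=1.4):
    # A player's payoff = tokens kept + equal share of the grown pool.
    # round() just tidies floating-point noise.
    pool_share = round(sum(contributions) * growth / len(contributions), 10)
    return [endowment - c + pool_share for c in contributions]

print(round_payoffs([10, 10, 10, 10]))  # [24.0, 24.0, 24.0, 24.0]
print(round_payoffs([0, 10, 10, 10]))   # free-rider first: [30.5, 20.5, 20.5, 20.5]
```

The free-rider's 30.5 vs the cooperators' 20.5 is the whole dilemma in two numbers.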

There are a few interesting results from this experiment. First, the level of cooperation, as measured by the average contribution by each player, varied significantly across different countries. Second, as a pessimist (or a classical economist) would expect, cooperation quickly declined as the game progressed (and people realised others started free-riding):

But that's not the best part yet. The researchers also ran the same experiment introducing the ability to punish other players. After learning other players' contribution choices, each player could assign every other player between one and ten deduction points. Each deduction point would reduce the punished player's tokens by three, but would cost the punisher one token.
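The punishment arithmetic can be sketched as follows; note that punishing is individually costly, which is what makes its use (and misuse) interesting:

```python
# The punishment stage: each deduction point costs the punisher 1 token
# and removes 3 from its target. Token counts below are illustrative.

def apply_punishment(tokens, deductions):
    """tokens[i]: player i's tokens after the contribution stage.
    deductions[i][j]: deduction points player i assigns to player j."""
    out = list(tokens)
    for i, row in enumerate(deductions):
        for j, points in enumerate(row):
            out[i] -= points      # cost to the punisher
            out[j] -= 3 * points  # penalty for the punished

    return out

# Player 0 assigns 2 deduction points to free-riding player 3:
print(apply_punishment([20.5, 20.5, 20.5, 30.5],
                       [[0, 0, 0, 2], [0] * 4, [0] * 4, [0] * 4]))
# [18.5, 20.5, 20.5, 24.5]
```

The punisher pays 2 tokens to take 6 off the free-rider, which is exactly why a purely selfish player should never punish, and why observed punishment patterns are so informative.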

In this variant of the game, the cooperation level increased, or at least remained stable in most countries:

But this is still not the best part. If you were playing this game, whom would you punish? Odds are, you would choose to punish those players who contributed less than you. That's only fair, right? Well, that's only fair if you come from an Anglo-Germanic country. It turns out people from a number of countries, most notably Oman and Greece, choose to punish overly generous players!
It's anybody's guess why anyone would punish other generous players. The researchers suggest it's a form of revenge: though players cannot see who punished them, they probably assume that they were punished by the more generous ones. Indeed, it seems that this "anti-social punishment" correlates with the amount of punishment a player received in the previous round.

Needless to say, anti-social punishment has an extremely strong negative correlation with mean contribution:
(I grant that this whole section on the public goods game is only tangentially related to the core matter at hand, in that it shows how differently people behave by country, and how irresponsible it is to make universal claims re human behaviour based on American studies; the main reason I am including it here is that it confirms my long-held belief that at the core of Greece's problems lies the classic Hellenic quip - "what am I, a sucker?/no way I'm going to be the sucker in this story".)

In short, people do not behave the same way across the world. More importantly, people do not behave the same way outside the lab. Behavioural economics is predicated on the assumption that people behave irrationally in a predictable, uniform way. Evidence seems to suggest otherwise.

Now, I realise that what makes pop economics and psychology books exciting are the factoids they offer - the trivial pieces of knowledge that we all like to repeat at parties and seem clever. The rest of this section adopts this strategy (though admittedly, too late: I suspect that the readers who have followed me this far are those who would persevere regardless of factoids): I list below a few "classic" experiments that are referenced by behavioural economists to show that they are not as robust as some books make them seem.

The paradox of choice
A 2000 study by Iyengar and Lepper found that giving consumers more choice results in fewer purchases. In their experiment, they set up two tasting booths in an upscale grocery store, on different days. One of the booths had six varieties of jam displayed. The other had 24. What they found was that though more consumers stopped at the large-assortment booth (60% vs 40% for the small one), only 3% of consumers exposed to the large assortment made a purchase, vs 30% of those exposed to the small one.

They also ran two more experiments as part of the same study. In the second experiment, psychology students were given the option to write an essay for extra credit. Some students were given six topics to choose from, others 30. Not only did more students who were given six topics actually write the essay (74% vs 60%), but their essays were actually better! In the third experiment, participants were asked to choose a chocolate. Again, some participants were given a limited assortment to choose from, and some a larger one. This experiment found that people who were given a larger assortment to choose from took longer to make a choice, felt they were given too many options, did not feel any more confident that they made the right choice, and enjoyed the chocolates they chose less than those given a smaller range to choose from (though they reported enjoying the selection process more). Not only that, but when participants were asked whether they wanted to be paid in cash or in chocolates for their time, 48% of those given a small assortment chose to be paid in chocolates, vs 12% of those given a wider range.

It's hard to overstate the effect of this study - not just in academia, but also in business. I have actually heard people reference the choice paradox in meetings, to argue for reducing the number of products we offer.

Now, as I've said many times before, I totally agree that society does not really need 20 different shampoo variants within one brand. But to make a decision based on one study that you haven't read and understood is pretty irresponsible.

A meta-analysis of all studies that have looked into the choice paradox found the mean choice-overload effect to be virtually zero. Several studies tried to directly replicate the original experiments and failed - for example, Scheibehenne tried to replicate the jam study in Germany, and Greifeneder tried to replicate the chocolate study, both without any meaningful results.

Of course, many of the studies analysed by the meta-study did also find evidence of choice overload. There are a number of factors that may explain the variance in these studies - some have to do with publication bias, but some other interesting ones are:
  • Measurement choices: it seems that more choice is better when what is being measured is consumption, instead of binary buy/not buy choices.
  • Strong preferences: people with strong preferences prefer more choice.
  • Ease of comparison: if the products in an assortment are difficult to compare, e.g. by having complementary features, consumers may experience regret after making a choice, hence leading to choice overload.
  • Perception of quality distribution: people may be more likely to prefer small assortments if all products on offer are of high quality. But if average quality is low, with some products being of high quality, then a larger assortment increases the odds of being able to buy a satisfactory product.

Basically, the jury's still out on this one. It's certainly not the case that more choice invariably leads to fewer purchases though.

Priming
You must have heard of this one: subtle cues subconsciously "prime" you in ways that visibly alter your behaviour. In the original study on the matter, volunteers had to create a sentence from scrambled words. When these words related to old people, subjects walked slower when leaving the lab after the experiment.

Whereas I readily bought into all the other effects I discuss here, I must say I always viewed this one with suspicion: apparently, one of the words used to prime subjects was "Florida". This seems very strange to me. Whereas I could grant that some people may associate Florida with old people, to an extent that they then alter their behaviour, I find it crazy that subjects only associate Florida with old people. What about Disney World? Alligators? Miami? Spring break? Why would these words not prime people to walk like a princess, run for their lives, swagger about or stumble drunkenly?

It turns out my suspicion was justified: another group of scientists tried to replicate the study, with a few modifications: a) they timed subjects using infrared sensors, not stopwatches as in the original experiment; b) they used more volunteers and c) they used experimenters who did not know what the study was about. They found zero priming impact.

But they went further: they repeated the experiment, only this time, they told the experimenters that the subjects had been primed. They told half of them (the experimenters) to expect faster walks, and half of them to expect slower walks. The subjects were found to walk slower only by those experimenters who were expecting that!

Of course, the author of the original paper responded that a) his experimenters were also blind to the study hypotheses (which is true, but the experimenters were the ones who prepared all materials, which they had plenty of time to study; and being smart people, many of them probably guessed the hypothesis); b) subjects in the replication experiment were told to "go straight down the hall when leaving", which draws attention to the process, and arguably implies speed, thus eliminating the effect (but there is no evidence they were told this - plus, if the effects of priming are so weak, what's the point of it?); c) the replication experiment used too many old-related words, which meant subjects may have noticed the connection, cancelling priming (but his own original paper said that more primes would yield stronger results) and d) the experiment would only work if subjects associated old age with infirmity, an association the replication did not test (but then, neither did the original paper).

I am not saying we are not susceptible to subliminal messages; but we would be a pretty ridiculous species indeed if we walked slower every time someone said "Florida".

System 2 Activation
Try answering the following three questions:
  1. A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost?
  2. If it takes 5 machines 5 minutes to make 5 widgets, how long would it take 100 machines to make 100 widgets?
  3. In a lake, there is a patch of lily pads. Every day, the patch doubles in size. If it takes 48 days for the patch to cover the entire lake, how long would it take for the patch to cover half of the lake?
The answers are $0.05, 5 minutes and 47 days. Yet many people answer $0.10, 100 minutes and 24 days - not because these questions are difficult, but because it is very easy for our minds to make these mistakes when going on autopilot.
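Each answer can be verified with a line or two of arithmetic, spelling out the steps that the autopilot skips:

```python
# Working through the three "trick" questions explicitly.

# 1. bat = ball + 1.00 and bat + ball = 1.10  =>  2*ball + 1.00 = 1.10
ball = (1.10 - 1.00) / 2
print(round(ball, 2))  # 0.05

# 2. 5 machines make 5 widgets in 5 minutes => 1 machine makes 1 widget
#    in 5 minutes, so 100 machines make 100 widgets in the same 5 minutes.
minutes = 5
print(minutes)  # 5

# 3. The patch doubles daily, so it covers half the lake exactly one
#    day before it covers all of it.
days = 48 - 1
print(days)  # 47
```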

In Thinking, Fast and Slow, Kahneman talks about how humans reason using two different "systems": system 1 is quick, effortless and intuitive, whereas system 2 is slow, deliberate and analytical. Because using system 2 takes a lot of effort, we tend to rely more on system 1, the autopilot that causes many of us to answer one or more of these questions incorrectly.

But, says Kahneman: if you disrupt people's autopilot, they will switch to system 2, and perform better. One way to do this is to explicitly say, "careful, these are trick questions"; but more astonishingly, according to Kahneman, you can disrupt system 1 just by making the questions harder to read - e.g. by using a font that's harder on the eyes, or a pale colour.

Kahneman bases this claim on this paper, in which experimenters asked 40 Princeton students to take the three-question test above. Half the students took the test in a normal font, the other half in a difficult, 10% grey, italicised font. The students given the hard-to-read version got 2.45 questions right, on average, whereas those given the normal version only got 1.9.

But a number of replication attempts have failed to discover any such effect.

I think all we can take out of this series of experiments is that Ivy League students are slightly smarter than non-Ivy League ones.

Ego Depletion
Here's another effect that has had real-life impacts. A study put students in a room with freshly baked cookies and radishes. Some were told they could only eat the former, some that they could only eat the latter. All students were then given an unsolvable test, and the researchers measured how long the students would keep trying to solve it. It turned out that those who were allowed to eat the cookies persevered for far longer (19 mins) than those who weren't (8 mins). This was taken to show that humans have a fixed amount of willpower that can get depleted; furthermore, that willpower is like a muscle that can be trained. Hundreds of studies have been run since then, all apparently confirming this hypothesis.

And people have taken heed - including Obama and Zuckerberg, who have both claimed to opt for dull, standardised wardrobes so as to avoid wasting decision energy on useless tasks.

However, a more recent, massive attempt to reproduce the main effect outlined above, using 2,000 subjects, has found zero effect.

Cracks in the theory had appeared before. Evan Carter, a graduate student at the University of Miami, tried to replicate a previous experiment, only to find that he could not reproduce its results. So he looked into a 2010 meta-analysis, and discovered that a) the meta-analysis had only included published studies, increasing the risk of publication bias (unexciting results don't get published all that much) and b) some studies had bizarre or contradictory measures of willpower - e.g. one study suggested that depleted subjects would be less willing to help a stranger, whereas another study said that depleted subjects would give more to charity. Re-evaluating the studies in the meta-analysis, adjusting for such errors, he also found no effect.

Again, I am not disputing that people get tired, and that if they are asked to do too many things, they will have less energy. But the original formulation of the hypothesis, and some of the lessons that people have taken from it, such as that taking an extra minute each morning to decide what tie to wear can deplete one's willpower, seem exaggerated and unfounded.

To conclude this section: I am not claiming that humans are perfectly rational. Indeed, I think Kahneman, Tversky and other economists/psychologists have done a brilliant job demonstrating many ways in which humans are irrational. I think their work on the heuristics humans use instead of reason, and how these lead to mistakes such as overconfidence, ignoring base rates and other fallacies such as the Linda problem, is brilliant (some people have suggested these are all framing issues that disappear if questions are asked differently, but I find that criticism pretty weak; see Kahneman and Tversky's response here.)

But we are not as stupid, easy to manipulate, or homogeneous as behavioural economists often suggest. Nor have behavioural economists conclusively proven that their models are better at predicting human behaviour in real life. And this brings us to...

C. Behavioural economics and policy making
This will be a short section. Behavioural economics has been so influential that the US and British governments have set up whole departments to carry out policy based on the discipline's lessons. David Cameron himself referred to a behavioural economics insight: "The best way to get someone to cut their electricity bill is to show them their own spending, to show them what their neighbours are spending, and then show what an energy-conscious neighbour is spending".

But as Tim Harford (the Undercover Economist) points out, this is plain wrong. The best way to make people cut their energy consumption is to increase prices. There may be all sorts of reasons to oppose a policy (such as a tax) that aims to make energy more expensive; indeed, as someone who identifies as more or less a libertarian, I would rather keep taxes at a minimum. But this is neither here nor there: the fact remains that classical economics offers better policy solutions than behavioural economics. Standard tools such as taxes, subsidies and interest rates are way more powerful, and have far stronger impacts, than "nudges".

Again, claims of neoclassical economics' death at the hands of the 2008 crisis are greatly exaggerated (another factoid: Mark Twain never used this exact phrase; what he wrote was "the report of my death was an exaggeration"). Neoclassical economics is still taught at schools and universities, not because academics are die-hard traditionalists, but because it still has lots of valuable things to say about how the world works.