A 2010 paper published in a top-tier psychology journal advised that “a person can, by assuming two simple 1-min poses, embody power and instantly become more powerful.” The researchers had 42 people assume two positions for one minute each — either high-power poses (sitting in a chair with their feet on a desk and standing with their hands spread on a desk) or low-power poses (sitting in a chair with hands clasped between their legs and standing with their arms and legs crossed).
Saliva samples were used to measure the dominance hormone testosterone and the stress hormone cortisol. Risk-taking was gauged by a willingness to take a bet with a 50 percent chance of winning $2 and a 50 percent chance of losing $2. Feelings of power were measured by having participants state, on a scale of 1 to 4, how “powerful” and “in charge” they felt.
Compared to those who took the low-power poses, those who took high-power poses had an increase in testosterone, a reduction in cortisol, an increased willingness to take risks, and increased feelings of power. The statistical significance of each result was gauged by its p-value: the probability of observing an effect at least as large as the one seen if, in truth, there is no effect at all. All the p-values were less than 0.05, and the authors concluded that
a simple 2-min power-pose manipulation was enough to significantly alter the physiological, mental, and feeling states of our participants. The implications of these results for everyday life are substantial.
The implications were certainly substantial for one of the co-authors, Amy Cuddy. She wrote a best-selling book, gave one of the most-watched TED talks of all time, and became a celebrity speaker with fees of $50,000 to $100,000.
In her TED talk, Cuddy is confident, emotional, and charismatic while she tells the audience that her advice is backed by solid science. One of her memorable lines is don’t just “fake it ‘til you make it,” but “fake it ‘til you become it.” While a giant screen projects a picture of Wonder Woman standing with her legs spread and hands on her hips, Cuddy tells her audience:
Before you go into the next evaluative situation, for two minutes, try doing this in an elevator, in a bathroom stall, at your desk behind closed doors…. Get your testosterone up. Get your cortisol down.
In 2015 a group of researchers who had hoped to explore gender differences in the benefits of power poses published their results. Using a much larger sample (200 subjects), they found only a small increase in self-reported feelings of power and no effects on hormone levels or behavior.
In 2017 the journal Comprehensive Results in Social Psychology (CRSP) published the results of several large studies, each using a peer-reviewed pre-registration that specified, in advance, how the studies would be conducted in order to limit the possibilities for p-hacking — fiddling with the data to lower the p-values. These studies did not find evidence of hormonal or behavioral changes, but did find modest effects on feelings of power.
Dana Carney was the lead author of the original power-posing paper and she provided detailed feedback on the pre-registration plans for the CRSP papers. In September 2016, she posted a statement on her faculty website at the University of California, Berkeley:
As evidence has come in over these past 2+ years, my views have updated to reflect the evidence. As such, I do not believe that “power pose” effects are real.
Carney also revealed that there had been some p-hacking in the original study. Here are a few excerpts from her statement:
The sample size is tiny.
The data are flimsy. The effects are small and barely there in many cases.
We ran subjects in chunks and checked the effect along the way. It was something like 25 subjects run, then 10, then 7, then 5. Back then this did not seem like p-hacking. It seemed like saving money.
Some subjects were excluded on bases such as “didn’t follow directions.” The total number of exclusions was 5.
Subjects with outliers were held out of the hormone analyses but not all analyses.
Many different power questions were asked and those chosen were the ones that “worked.”
The p-hacks may well explain why the initial results were statistically significant but did not replicate.
If the sample size is fixed ahead of time and a 0.05 cutoff is used, the probability of a false-positive is 5 percent, regardless of how large the sample is. If, however, the sample is enlarged after the results turn out not to be statistically significant, the probability of a false-positive goes up. Here, the original study started with 25 subjects, then added 10 more, then 7, then 5. To calculate the true chances of a false-positive, we need to make an assumption about what the researchers would have done if 47 subjects had not been sufficient. The figure shows the false-positive probabilities if they had continued adding batches of 5 subjects for maximum sample sizes up to 102.
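The inflation from this kind of peeking is easy to see in a small Monte Carlo simulation. This is a sketch, not the study’s actual analysis: it assumes a one-sample z-test on pure-noise data, with the interim looks at 25, 35, 42, and 47 subjects taken from Carney’s account and batches of 5 added thereafter, up to 102.

```python
import math
import random

def two_sided_p(xs):
    """Two-sided p-value for H0: mean = 0, using a normal (z) approximation."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    z = abs(mean) / math.sqrt(var / n)
    return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

def peeking_trial(rng, looks):
    """True if p < 0.05 at any interim look -- even though the data are noise."""
    xs = []
    for n in looks:
        while len(xs) < n:
            xs.append(rng.gauss(0, 1))  # null is true: no real effect
        if two_sided_p(xs) < 0.05:
            return True                 # stop and declare "significance"
    return False

rng = random.Random(1)
# 25 subjects, then +10, +7, +5 (as in the original study), then +5 up to 102
looks = [25, 35, 42, 47] + list(range(52, 103, 5))
trials = 5000
rate = sum(peeking_trial(rng, looks) for _ in range(trials)) / trials
print(f"false-positive rate with peeking: {rate:.3f} (nominal: 0.050)")
```

Each individual look has only a 5 percent false-positive chance, but taking the *minimum* p-value across a dozen looks at overlapping data gives noise many opportunities to cross the threshold, which is why the simulated rate lands well above 5 percent.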
A flexible sample size is not the only trick for achieving statistical significance. Data sometimes contain outliers that are very different from the rest of the data. For example, a study of middle-class lifestyles might include a billionaire who lives, well, very differently. If the researcher is truly focused on the middle class, the billionaire can be rightfully removed from the sample. There is no mathematical rule for identifying data as outliers, so it is ultimately a subjective decision to be made by researchers — which provides welcome flexibility for those seeking p-values below 0.05.
The figure shows how the false-positive probability is affected by discarding an outlier in order to reduce the p-value. With a 102-subject stopping point, the flexible sample size pushes the probability of a false-positive from 5 percent to 17 percent. Discarding one outlier as needed further pushes the false-positive probability to 37 percent. A willingness to discard more than one outlier would increase the probability of a false-positive even more.
I don’t know the extent to which the discarded observations in the original power-pose study reduced the p-values. But it is troubling that there were actually 47 subjects, not the 42 reported, with 5 people excluded for vague reasons like “didn’t follow directions,” and that an additional observation was omitted from the hormone results but not from the other results.
In addition, reporting only the subset of power questions that “worked” was almost surely a substantial p-hack. Altogether, the false-positive probability was most likely well over 50 percent.
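The arithmetic behind that last hack is straightforward. If k independent outcome measures are each tested at the 0.05 level and only the ones that “worked” are reported, the chance that at least one comes up falsely significant is 1 - 0.95^k, not 5 percent. (The original study’s power questions were surely correlated rather than independent, so this is only a rough upper-bound sketch.)

```python
# Chance of at least one false positive among k independent tests at 0.05 each:
# P(at least one) = 1 - P(none) = 1 - 0.95**k
for k in (1, 2, 4, 8):
    print(f"{k} measures: {1 - 0.95 ** k:.1%} chance of a spurious result")
```

Even a handful of alternative measures, each innocently tested, multiplies the opportunities for noise to look like a discovery.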
Continued in Part II:
Fake It ’til You Make It – The Power Pose Parable Part II. Where do p-hacking and the replication crisis leave the state of scientific studies? A professor at the University of Toronto wrote that “the only way we can really change is if we reckon with our past, coming clean that we erred; and erred badly.”