Patrick Flynn

Why the polls may be overestimating Kamala Harris' lead

September 23, 2024

In both the 2016 and 2020 presidential elections, pollsters underestimated Donald Trump's performance. Compared to 538's election day polling averages, the Democratic candidate's national lead over Trump was 3.9 points lower than the polls predicted in 2020 and 1.8 points lower in 2016.

At Focaldata, we are concerned this pattern may continue in 2024, and part of the problem stems from how pollsters estimate turnout.

Desire is not destiny

Social-desirability bias is a well-studied phenomenon in survey research. In simple terms, it is the tendency of people to give survey responses that may be viewed more favourably by others, for example by saying that they voted in an election or that they give money to charity.

Because voting is widely seen as socially desirable, voters are (generally speaking) not particularly good at assessing their own likelihood of voting. Surveys which rely solely on self-reported turnout may therefore carry an added degree of bias in their results.

To combat this problem, we have devised a turnout model to estimate the likelihood of a respondent in our US election surveys actually voting, rather than simply relying on their own estimation. To create it, we used the 2020 Cooperative Election Study (CES, formerly known as the CCES) panel of 60,000 respondents, who were matched to the voter file to determine whether each respondent actually voted in the election.

The CCES allows us to determine whether a person’s self-declared likelihood to vote reflects their subsequent turnout. On the surface, a reasonable estimate for a pollster might be to assign ‘certain’ voters a 100% likelihood of voting, ‘probable’ voters somewhere around 75%, undecided voters a 50-50 chance, and ‘would not vote’ 0%. In reality, these figures do not correspond with actual voting behaviour.

In 2020, over a quarter (27%) of people who said they were certain to vote, were going to vote early, or had already voted did not actually vote in the presidential election. Even more strikingly, those who said they would ‘probably’ vote, i.e. they were more likely than not to head to the polls, only turned out 23% of the time. In addition, a respondent saying they will not vote does not entirely preclude them from voting – 5% of those who said they wouldn’t vote actually did.
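To make the size of the gap concrete, here is a minimal Python sketch comparing the naive assumptions described above with the validated turnout rates quoted in this article. The 73% figure for 'certain' voters is derived from the 27% who did not vote; undecided respondents are left out because no validated figure is given for them. The dictionary and function names are illustrative only.

```python
# Naive probabilities a pollster might assume versus the validated 2020
# turnout rates quoted in the article (certain = 100% - 27% who did not vote).
NAIVE = {"certain": 1.00, "probable": 0.75, "would not vote": 0.00}
OBSERVED = {"certain": 0.73, "probable": 0.23, "would not vote": 0.05}

def overstatement(naive, observed):
    """Return naive minus observed turnout, in percentage points, per answer."""
    return {k: round((naive[k] - observed[k]) * 100) for k in naive}

gaps = overstatement(NAIVE, OBSERVED)
for answer, gap in gaps.items():
    # A positive gap means the naive assumption overstates turnout.
    print(f"{answer}: naive assumption is off by {gap:+d} points")
```

The largest gap is among 'probable' voters, where the naive 75% assumption overstates validated turnout by more than 50 points.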

A respondent’s self-declared likelihood is important, but it should not be the sole factor in a turnout prediction in an opinion poll. Some pollsters do not even assess likelihood of voting, instead relying solely on registered voters to generate their headline results. Implicitly, a registered voter poll assumes every registered voter has the same probability of voting, which we know empirically is not the case.

Definitely indefinite

If rates of overstating turnout were similar across different demographic groups, the turnout weighting problem for pollsters would be quite small, because its effects would mostly cancel each other out. However, there are large differences in reported-versus-actual behaviour by age group and education level, which makes the problem considerably bigger.

Let’s take voters under 35. In 2020, young voters who said they were ‘definitely’ going to vote only voted around half the time. Among those aged 65+, the figure shoots up to 85%.

Similarly, those with high levels of education are much more likely to correctly assess their probability of voting. 80% of ‘definite’ voters with postgraduate degrees turned out, and just 1% of those who said they wouldn’t vote ended up voting. In contrast, only 63% of self-declared 'definites' who didn’t graduate high school voted, and 5% of those who said they wouldn’t vote did.

If we were to simply assume 'definitely' means the same thing across different groups, we would end up with poll results too heavily skewed towards the views of younger, non-white and lower-education voters. Two of these three groups lean heavily towards the Democrats, partially explaining why the party's candidate has been overestimated in the polls in the last two presidential elections.
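The mechanism can be illustrated with a small Python sketch. The turnout rates are the ones quoted above for self-declared 'definite' voters (with "around half" taken as 50%), but the sample sizes and candidate shares are invented purely for illustration and are not Focaldata data.

```python
# Validated turnout among self-declared 'definite' voters, from the article
# ("around half" for under-35s is approximated as 0.50).
TURNOUT = {"under_35": 0.50, "65_plus": 0.85}

# HYPOTHETICAL toy sample: (age group, number of 'definite' respondents,
# share backing candidate A, share backing candidate B).
SAMPLE = [
    ("under_35", 100, 0.60, 0.40),
    ("65_plus", 100, 0.45, 0.55),
]

def lead(sample, turnout=None):
    """Candidate A's lead over B in points, optionally turnout-weighted."""
    a = b = 0.0
    for group, n, share_a, share_b in sample:
        # Without a turnout table, every 'definite' respondent counts equally.
        weight = n * (turnout[group] if turnout else 1.0)
        a += weight * share_a
        b += weight * share_b
    return 100 * (a - b) / (a + b)

unweighted = lead(SAMPLE)         # treats every 'definite' as a sure voter
weighted = lead(SAMPLE, TURNOUT)  # discounts each group by validated turnout
print(f"unweighted lead: {unweighted:.1f}, turnout-weighted lead: {weighted:.1f}")
```

In this toy example, candidate A's lead shrinks once each group is discounted by its validated turnout, because A's support is concentrated in the group that votes less often than it says it will.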

Solving the problem

Using a sophisticated turnout model which takes into account the effects of self-reported likelihood to vote, alongside other factors like age, race, education and political interest, reduces Kamala Harris’ lead over Donald Trump by an average of 2.4 percentage points in our latest wave of swing state polls. In an election which could be decided by just 60,000 voters in November, this margin could easily be the difference between a right and wrong call on the election winner. Pollsters who simply rely on self-reporting may be subject to another polling miss that underestimates Trump.
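This article does not describe the form of the Focaldata model, so the following Python sketch is not that model. It shows only the simplest version of the general idea: estimating turnout probabilities from voter-file-validated records by taking the share who voted within each combination of stated likelihood and age band. The records, names and two-variable cells are hypothetical; a real model would presumably use many more respondents and predictors, and some form of regression rather than raw cell averages.

```python
from collections import defaultdict

# HYPOTHETICAL validated records: (stated likelihood, age band, actually voted).
# In practice these would come from a voter-file-matched survey.
VALIDATED = [
    ("definite", "under_35", True), ("definite", "under_35", False),
    ("definite", "65_plus", True), ("definite", "65_plus", True),
    ("definite", "65_plus", True), ("definite", "65_plus", False),
    ("probable", "under_35", False), ("probable", "under_35", False),
    ("probable", "65_plus", True), ("probable", "65_plus", False),
]

def fit_cell_rates(records):
    """Estimate turnout as the share who voted in each (likelihood, age) cell."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [voted, total]
    for likelihood, age, voted in records:
        counts[(likelihood, age)][0] += int(voted)
        counts[(likelihood, age)][1] += 1
    return {cell: v / n for cell, (v, n) in counts.items()}

rates = fit_cell_rates(VALIDATED)

def turnout_probability(likelihood, age):
    """Look up the estimated turnout probability for a new respondent."""
    return rates[(likelihood, age)]

print(turnout_probability("definite", "under_35"))
print(turnout_probability("definite", "65_plus"))
```

Each new survey respondent would then be weighted by their estimated probability rather than by their own stated likelihood of voting.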

The latest wave of Focaldata swing state polling will be released this week, alongside updated MRP estimates.

Image of Kamala Harris by Gage Skidmore
