(See updates below for more.)
It is true that I’m not a fan of aggregated poll results, but there are things that you can learn about pollsters and how their differing methodologies color their results.
The current Real Clear Politics average of nine recent national polls shows a Romney lead of 0.6%. However, the individual polls themselves are all over the place, ranging from Romney +4 to Obama +3. Nor are the results clustered about the mean as we would expect. In fact, there are more polls in the tails of the distribution of results than there are about the mean.
I’ve discovered a possible relationship that might explain the disparity in those results. Of the six polls for which I could find internal numbers, there appears to be a negative correlation between the percentage of turnout implied by the poll’s sample and the spread between Barack Obama’s and Mitt Romney’s level of support. (Gallup’s poll is counted twice since it gives separate results for likely and registered voters.)
Nate Silver addressed this issue back in July, but he didn’t explore it in depth. Silver implicitly assumed that a likely voter is a likely voter, but each polling company screens for them differently. Earlier today I referenced the most recent ABC/Washington Post poll. Of the likely voters surveyed, 82% said that they are “absolutely certain to vote.” Another 5% said that they would “probably vote,” 8% said that there was a 50-50 chance or less that they would vote, and 4% said that they have already voted. At least one out of every seven respondents who said that they are absolutely certain to vote is lying. In fact, the 1,382 “likely” voters they identified out of a poll of 1,764 adults would indicate a turnout of 78% of the voting age population. Never in my lifetime has that percentage reached even 65%.
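The implied-turnout figure above is simple division, and it is easy to check. Here is a minimal sketch in Python using only the two sample counts quoted for the ABC/Washington Post poll; the function name implied_turnout is my own label for illustration, not anything the pollster publishes.

```python
# Sample counts quoted above for the ABC/Washington Post poll.
adults_surveyed = 1764
likely_voters = 1382


def implied_turnout(likely: int, base: int) -> float:
    """Share of the base population that the likely-voter sample implies will vote."""
    return likely / base


abc_wapo = implied_turnout(likely_voters, adults_surveyed)

# Prints the share of surveyed adults who passed the likely-voter screen.
print(f"Implied turnout among adults: {abc_wapo:.1%}")
```

The same function works for any poll that reports both its likely-voter count and the size of the pool it was drawn from.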
Below I’ve plotted the spread between Romney and Obama as a function of the turnout percentage implied by the poll. If we fit a line to the results, it looks as though for every 10-point increase in implied voter turnout, Romney’s lead falls by 1.7 points.
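To get a feel for what that slope means, here is a rough sketch in Python that applies it to a change in implied turnout. It uses only the slope quoted above. The fitted line's intercept is not given here, so this can only estimate differences between polls, not the spread itself, and the 20-point gap in the example is a hypothetical input chosen for illustration.

```python
# Slope from the fitted line described above:
# Romney's lead falls 1.7 points per 10-point rise in implied turnout.
SLOPE_PER_POINT = -1.7 / 10


def spread_change(turnout_change_points: float) -> float:
    """Estimated change in Romney's lead (in points) for a given change in implied turnout."""
    return SLOPE_PER_POINT * turnout_change_points


# Hypothetical comparison: two polls whose implied turnout differs by 20 points.
print(f"Estimated shift in Romney's lead: {spread_change(20):+.1f} points")
```

With so few polls behind the fit, treat any number this produces as a rough guide rather than a prediction.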
Admittedly, this is too small a sample from which to derive any statistically significant results. Except for this: Of the recent polls for which I have internal numbers, only Rasmussen’s turnout percentage assumptions are realistic. The other four polls use a sample that assumes an election turnout of between 86 and 93 percent. That simply is not going to be the case. From this small sample it appears that Rasmussen is not the outlier it is often accused of being. Instead, other polling organizations in the current RCP Average employ a likely voter screen that removes only 7% to 14% of registered voters from the sample pool, when we know that about 30% of registered voters are not going to show up to vote.
When a poll oversamples unlikely voters, it gives disproportionate weight to the unenthusiastic. In most years that pushes Republican poll numbers lower, at least until the only poll that counts: the one on election day. The presidential election of 2008 might have been an exception to that rule. But even if that were so, in 2012 it is Barack Obama who now has lukewarm support. Turnout simply is not going to match the expectations contained in most of today’s polls. And a lot of Democrats are going to be surprised by the results.
UPDATE: The graph above now includes last night’s poll released by AP-GfK. It showed a 2-point Romney lead and was from a sample of 839 “Likely Voters” out of 1,041 Registered Voters and 1,186 adults. In other words, it assumes that 80.6% of registered voters and 70.7% of the adult population will turn out to vote. This conforms to the earlier trend.
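For anyone who wants to check the AP-GfK arithmetic, here is a short Python sketch. It computes both implied turnout shares from the three sample counts quoted above, then compares the share of registered voters the screen removed against my own estimate that about 30% of registered voters will not show up. That 30% is my assumption, not a figure from the poll.

```python
# Sample counts quoted above for the AP-GfK poll.
likely_voters = 839
registered_voters = 1041
adults = 1186

share_of_registered = likely_voters / registered_voters
share_of_adults = likely_voters / adults

# Fraction of registered voters that the likely-voter screen filtered out.
removed_by_screen = 1 - share_of_registered

# Assumption from this post: roughly 30% of registered voters will not vote.
expected_no_show = 0.30

print(f"Implied turnout, registered voters: {share_of_registered:.1%}")
print(f"Implied turnout, adults:            {share_of_adults:.1%}")
print(f"Removed by screen:                  {removed_by_screen:.1%}")
print(f"Screen looser than assumed no-show rate: {removed_by_screen < expected_no_show}")
```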
ALSO: Josh Jordan notes that polls in Ohio seem to show a much higher rate of early voting than official early voting records show.
Enter Ohio, where the current estimates from compiling early in-person and absentee voting shows early turnout to be about 15 percent of voters. But responses in the current polls claim that 23 percent of registered voters have already voted. That means that polls are overstating early voting by eight percentage points on average.
Jordan quotes Gregory House to justify the apparent error of one-third of those who claim to be early voters: “Everybody lies.” I think there is something else at work too–especially in Ohio: Voter Fatigue. Ohio is the ultimate bellwether state. It has been fought over relentlessly now for four straight presidential election cycles. I have family who live there. They report that a dozen political phone calls a day is the norm this time of year. Whether true or not, “I already voted” is one phrase that gets some of those calls to stop.
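The “one-third” figure follows from Jordan’s two Ohio numbers. Here is a quick Python sketch of the arithmetic. It treats the 15 percent and the 23 percent as if they were measured against the same base, which is a simplifying assumption on my part, since Jordan describes one as a share of voters and the other as a share of registered voters.

```python
# Figures from the Josh Jordan passage quoted above.
polled_early_share = 0.23    # respondents who told pollsters they already voted
official_early_share = 0.15  # early turnout estimated from official records

# Gap between what respondents claim and what the records show, in points.
overstatement_points = (polled_early_share - official_early_share) * 100

# Share of self-described early voters not accounted for in the records
# (assumes both percentages use the same base).
unmatched_share = (polled_early_share - official_early_share) / polled_early_share

print(f"Overstatement: {overstatement_points:.0f} points")
print(f"Unmatched share of claimed early voters: {unmatched_share:.1%}")
```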
Meanwhile, Zombie looks at the national polls in the RCP average and notices another curious coincidence (emphasis in original):
All polls with 1000 or more respondents favor Romney; all polls with smaller than 1000 respondents favor Obama (or are tied).
Zombie wonders why. I think I have the answer, and it’s the same trend that I noticed four years ago. Notice that the polls with the three largest sample sizes are not media-affiliated polls. Gallup, Rasmussen, and SurveyUSA are polling for the sake of polling. All of the other polls are conducted at the behest of a media organization, and in 2012 that means a media organization that is on a very tight budget.
I contend that it is not media bias that drives these differing results, but cheapness. Good polling is very expensive. To get a sample that is representative of the voting population means that you have to make many times more calls than you need actual respondents. The best polling organizations ask about your voting history and from that they determine your likelihood of voting. If you voted in the primary and the mid-term election in 2010, you are more likely to vote again this year. If you didn’t vote then and you don’t know the location of your polling precinct, you are unlikely to vote now. Polls conducted on the cheap ask people to self-identify whether or not they are likely to vote and take people at their word. Culling those unlikely to vote means turning away a potential respondent that you finally got on the phone and who is willing to answer your questions. It is a lot easier and cheaper to use a looser likely voter screen and a smaller sample size than it is to do polling right.
(This, BTW, explains why most polling organizations don’t switch to even their porous version of a likely voter screen until after Labor Day.)
* NOTE: (Updated to change “likely” to “registered” in the last sentence of the second to last paragraph.)
Doug Mataconis produced this chart from Federal Election Commission data.
There are a few dates of note that explain significant shifts. I don’t know about 1964 and 1968, but in 1960 there was at least one state (Alabama) that allowed individuals to vote for each of the state’s electors. That meant that one could hypothetically cast eleven ballots for president. This probably explains why turnout exceeded 100% of those registered to vote. The next date is 1972, when you see a ten-point drop in the percentage of registered voters who voted. That is attributable to the addition of college-age registrants who voted at a much lower rate. 1996 was the first election after motor-voter laws that essentially gave you “free” voter registration whenever you got a driver’s license. Registration went up, but overall turnout between 1988 and 1996 was essentially unchanged. I say all this to say that you can pretty much disregard turnout percentages from before 1996. Finally, 2008 turnout as a percentage of registrations is hard data to come by. If we use Mataconis’ number of 62.2% of the voting age population (Larry Sabato has it about a point lower), we can project that the percentage of registered voters who voted was no more than 75%.
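The 2008 projection above comes from dividing turnout among the voting age population by the share of that population that is registered. Here is a minimal Python sketch of that relationship. It back-solves the registration share that a 75% ceiling implies. That share is derived from the two figures above, not taken from registration records, and the function name is my own.

```python
def registered_turnout(vap_turnout: float, registration_share: float) -> float:
    """Turnout among registered voters, given turnout among the voting age
    population and the share of that population that is registered."""
    return vap_turnout / registration_share


# Figures from the paragraph above.
vap_turnout_2008 = 0.622          # Mataconis' share of the voting age population
registered_turnout_ceiling = 0.75  # the "no more than 75%" projection

# Registration share implied by those two numbers (derived, not sourced data).
implied_registration_share = vap_turnout_2008 / registered_turnout_ceiling

print(f"Implied registration share: {implied_registration_share:.1%}")
print(f"Check: {registered_turnout(vap_turnout_2008, implied_registration_share):.1%}")
```

Put another way, the 75% ceiling holds as long as at least that share of the voting age population was registered in 2008.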