Garbage in = Garbage out

Byline: | Category: 2012 | Posted at: Wednesday, 24 October 2012

(See updates below for more.) 

It is true that I’m not a fan of aggregated poll results, but there are things that you can learn about pollsters and how their differing methodologies color their results.

The current Real Clear Politics average of nine recent national polls shows a Romney lead of 0.6%.  However, the individual polls themselves are all over the place, ranging from Romney +4 to Obama +3.  Nor are the results clustered about the mean as we would expect.  In fact, there are more polls in the tails of the distribution of results than there are about the mean.

I’ve discovered a possible relationship that might explain the disparity in those results.  Of the six polls for which I could find internal numbers, there appears to be a negative correlation between the percentage of turnout implied by the poll’s sample and the spread between Barack Obama’s and Mitt Romney’s level of support.  (Gallup’s poll is counted twice since it gives separate results for likely and registered voters.)

Nate Silver addressed this issue back in July, but he didn’t explore it in depth.  Silver made the implicit assumption that a likely voter is a likely voter.  In practice, each polling company identifies likely voters differently.  Earlier today I referenced the most recent ABC/Washington Post poll.  Of the likely voters surveyed, 82% said that they are “absolutely certain to vote.”  Another 5% said that they would “probably vote,” 8% said that there was a 50-50 chance or less that they would vote, and 4% said that they have already voted.  At least one out of every seven respondents who said that they are absolutely certain to vote is lying.  In fact, the 1,382 “likely” voters they identified out of a poll of 1,764 adults would indicate a turnout of 78% of the voting age population.  Never in my lifetime has that percentage reached even 65%.

Below I’ve plotted the spread between Romney and Obama as a function of the turnout percentage implied by the poll.  If we fit a line to the results, it looks like for every 10-point increase in implied voter turnout, Romney’s lead falls by 1.7 points.

turnout_2.jpg
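
For readers who want to repeat the line fit on their own numbers, here is a minimal ordinary least-squares sketch in Python.  The post does not list the individual data points behind the graph, so the values below are placeholders invented purely so that they fall on a line with a slope of -0.17.  They are not the polls' actual figures, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# PLACEHOLDER data, invented for illustration only: NOT the polls' real numbers.
implied_turnout_pct = [60.0, 70.0, 80.0, 90.0]   # turnout implied by each sample
romney_minus_obama = [4.8, 3.1, 1.4, -0.3]       # Romney lead in points

slope, intercept = fit_line(implied_turnout_pct, romney_minus_obama)
print(f"Change in Romney's lead per 10 points of turnout: {10 * slope:+.1f}")
```

With only a handful of polls, a fit like this describes the points on the graph and nothing more; it says little about statistical significance.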

Admittedly, this is too small a sample from which to derive any statistically significant results.  Except for this:  Of the recent polls for which I have internal numbers, only Rasmussen’s turnout percentage assumptions are realistic.  The other four polls use a sample that assumes an election turnout of between 86 and 93 percent.  That simply is not going to be the case.  From this small sample it appears that Rasmussen is not the outlier it is often accused of being.  Instead, other polling organizations in the current RCP Average employ a likely voter screen that removes only 7% to 14% of registered voters from the sample pool, when we know that about 30% of registered voters are not going to show up to vote.

When a poll oversamples unlikely voters, it gives disproportionate weight to the unenthusiastic.  In most years that pushes Republican poll numbers lower, at least until the only poll that counts:  the one on election day.  The presidential election of 2008 might have been an exception to that rule.  But even if that were so, in 2012 it is Barack Obama who now has lukewarm support.  Turnout simply is not going to match the expectations contained in most of today’s polls.  And a lot of Democrats are going to be surprised by the results.

UPDATE:  The graph above now includes last night’s poll released by AP-GFK.  It showed a 2-point Romney lead and was from a sample of 839 “Likely Voters” out of 1,041 Registered Voters and 1,186 adults.  In other words, it assumes that 80.6% of registered voters and 70.7% of the adult population will turn out to vote.  This conforms to the earlier trend.
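
The implied-turnout figures in this post are simple ratios of the sample counts that each pollster reports.  A small Python sketch, using only the ABC/Washington Post and AP-GFK counts quoted above, shows the arithmetic.  The function name `implied_turnout` is my own label for illustration, not a pollster's term.

```python
def implied_turnout(likely_voters: int, base: int) -> float:
    """Likely voters as a percentage of a larger base (registered voters or adults)."""
    if base <= 0:
        raise ValueError("base must be positive")
    return 100.0 * likely_voters / base


# ABC/Washington Post: 1,382 likely voters out of 1,764 adults
abc_wp_adults = implied_turnout(1382, 1764)

# AP-GFK: 839 likely voters out of 1,041 registered voters and 1,186 adults
ap_gfk_registered = implied_turnout(839, 1041)
ap_gfk_adults = implied_turnout(839, 1186)

print(f"ABC/WP, share of adults:      {abc_wp_adults:.1f}%")
print(f"AP-GFK, share of registered:  {ap_gfk_registered:.1f}%")
print(f"AP-GFK, share of adults:      {ap_gfk_adults:.1f}%")
```

This treats the share of respondents who pass the likely voter screen as the turnout the poll implies, which is the assumption the argument in this post rests on.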

ALSO:   Josh Jordan notes that polls in Ohio seem to show a much higher rate of early voting than official early voting records show. 

Enter Ohio, where the current estimates from compiling early in-person and absentee voting shows early turnout to be about 15 percent of voters. But responses in the current polls claim that 23 percent of registered voters have already voted. That means that polls are overstating early voting by eight percentage points on average.

Jordan quotes Gregory House to justify the apparent error of one-third of those who claim to be early voters: “Everybody lies.”  I think there is something else at work too–especially in Ohio:  Voter Fatigue.  Ohio is the penultimate bellwether state.  It has been fought over relentlessly now for four straight presidential election cycles.  I have family who live there.  They report that a dozen political phone calls a day is the norm this time of year.  Whether true or not, “I already voted” is one phrase that gets some of those calls to stop.
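
The "one-third" figure follows directly from the two percentages Jordan cites.  A quick Python check, assuming both numbers are measured against the same base of voters (the quoted passage is not explicit about that):

```python
# Figures from the Josh Jordan passage quoted above
polled_early_vote_pct = 23.0     # respondents who tell pollsters they already voted
official_early_vote_pct = 15.0   # approximate early turnout from official records

# Gap between what polls report and what records show, in percentage points
overstatement_points = polled_early_vote_pct - official_early_vote_pct

# Share of self-described early voters not backed by the official count
share_unsupported = overstatement_points / polled_early_vote_pct

print(f"Overstatement: {overstatement_points:.0f} points")
print(f"Share of claimed early voters unaccounted for: {share_unsupported:.0%}")
```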

Meanwhile, Zombie looks at the national polls in the RCP average and notices another curious coincidence (emphasis in original): 

All polls with 1000 or more respondents favor Romney; all polls with smaller than 1000 respondents favor Obama (or are tied).

Zombie wonders why.  I think I have the answer, and it’s the same trend that I noticed four years ago.  Notice that the polls with the three largest sample sizes are not media-affiliated polls.  Gallup, Rasmussen, and SurveyUSA are polling for the sake of polling.  All of the other polls are conducted at the behest of a media organization, and in 2012 that means a media organization that is on a very tight budget.

I contend that it is not media bias that drives these differing results, but cheapness.  Good polling is very expensive.  To get a sample that is representative of the voting population means that you have to make many times more calls than you need actual respondents.  The best polling organizations ask about your voting history and from that they determine your likelihood of voting.  If you voted in the primary and the mid-term election in 2010, you are more likely to vote again this year.  If you didn’t vote then and you don’t know the location of your polling precinct, you are unlikely to vote now.  Polls conducted on the cheap ask people to self-identify whether or not they are likely to vote and take people at their word.  Culling those unlikely to vote means turning away a potential respondent that you finally got on the phone and who is willing to answer your questions.  It is a lot easier and cheaper to use a looser likely voter screen and a smaller sample size than it is to do polling right.

(This, BTW, explains why most polling organizations don’t change to even their porous version of a likely voter screen until after Labor Day.)

* NOTE:  (Updated to change “likely” to “registered” in the last sentence of the second to last paragraph.)

Doug Mataconis produced this chart from Federal Election Commission data.

voters.jpg

There are a few dates of note that explain significant shifts.  I don’t know about 1964 and 1968, but in 1960 there was at least one state (Alabama) that allowed individuals to vote for each of the state’s electors.  That meant that one could hypothetically cast eleven ballots for president.  This probably explains why turnout exceeded 100% of those registered to vote.  The next date is 1972, when you see a ten-point drop in the percentage of registered voters who voted.  That is attributable to the addition of college-age registrants who voted at a much lower rate.  1996 was the first election after motor-voter laws that essentially gave you “free” voter registration whenever you got a driver’s license.  Registration went up, but overall turnout between 1988 and 1996 was essentially unchanged.  I say all this to say that you can pretty much disregard turnout percentages from before 1996.  Finally, 2008 turnout as a percentage of registrations is hard data to come by.  If we use Mataconis’ number of 62.2% of the voting age population (Larry Sabato has it about a point lower), we can project that the percentage of registered voters who voted was no more than 75%.


11 Responses to “Garbage in = Garbage out”

  1. tex Says:

    “A lot of Democrats are going to be surprised by the results.”
    - In which direction?

    “Ohio is the penultimate bellwether state.”
    - If Ohio is the penultimate, which state is the ultimate in your view?

    Ed: Iowa is the ultimate bellwether. Since 1992 it has always gone with the popular vote winner. However, with only 7 electoral votes, it just doesn’t get the attention that Ohio gets.

    As for your first question, an Obama win, if it occurs will be by the slimmest of margins. It is only Mitt Romney who has the potential to surprise to the upside.

  2. Dan Says:

    The people likely to answer these calls are not representative of the population at large.

  3. Johnny Says:

    You may have already been accounting for this, but after a quick read, I was wondering whether you accounted for the bias in the data that given the fact that you were willing to sit through the phone survey, you already are more likely to vote than your run-of-the-mill registered voter. What are your thoughts on how that plays into what percent of this subset of the registered voter population is likely to vote (or has already voted)?

  4. Why Most Polls Are Likely Wrong This Year Says:

    [...] Bob Krumm explains: Of the recent polls for which I have internal numbers, only Rasmussen’s turnout percentage assumptions are realistic.  The other four polls use a sample that assumes an election turnout of between 86 and 93 percent.  That simply is not going to be the case.  From this small sample it appears that Rasmussen is not the outlier it is often accused of being.  Instead, other polling organizations in the current RCP Average employ a likely voter screen that removes only 7% to 14% of registered voters from the sample pool, when we know that about 30% of registered voters are not going to show up to vote. [...]

  5. JF Isher Says:

    I was going to say this but I refreshed and saw that I was beaten to it:

    The people likely to answer these calls are not representative of the population at large.

    I mean, you say that the voters in Ohio are inundated, do you think they’re going to sit through all the crap just to try to get out of listening to all the crap? Pollsters only get a 10 percent response rate. And Rasmussen is often accused of being the outlier because it has been in the past. Also, one word, 2010.

    If anything pollsters are putting too much emphasis on enthusiasm. I’m not going to sit around and take phone calls all day, but I always vote.

    However, your article is well-written and thoughtful :)

  6. JustKarl Says:

    A nitpick:

    Your statement that “the 1,382 ‘likely’ voters they identified out of a poll of 1,764 adults would indicate a voting age population percentage of 78%” is incorrect, inasmuch as the VAP is considerably larger than the pool of registered voters. That’s why, for your purposes, the chart of FEC data is more relevant.

    Ed: I agree that the VAP is a much larger population, but I’m not sure of your point. BTW, I’ve been engaged in an email exchange with a left-leaning mathematics grad student who isn’t comfortable with the registration numbers being the basis for comparison. I think there’s merit in that argument. Thus comparing likely voter poll respondents as a percentage of adults polled with the percentage of the VAP that votes would be more accurate. Unfortunately, only the ABC/WP (78%) and the AP-GFK (71%) polls give us those numbers. That is still much higher than even 2008, when turnout as a percentage of the VAP reached its record high in the post-26th Amendment era of about 62%.

  7. PolisciGuy Says:

    Sorry, but is this the same Bob Krumm who predicted a McCain EC victory just days before the 2008 election? If so, as much as I’d like to agree with your analysis, I think it has to be taken with a massive grain of salt.

    Ed: Then you should stick around for Part II when I talk about what delusion looks like from the inside.

  8. Mark Says:

    The percentage of likely voters among registered voters in a pollster’s results is not meant to be a projection of the national turnout. In order for it to be so, you’d have to assume that the probability of an individual answering a pollster is independent of their probability to vote, which is definitely false. Pollsters have ways of estimating and correcting for this non-response bias, but the end result is that you should not assume that a survey is projecting 80% turnout just because their likely voter sample is 80% of their registered voter sample.

    Without that key assumption, I don’t think there’s much of an argument left to be had in this article.

    Ed: The problem with your counter-argument is that there is ample historical evidence of a social acceptability bias when respondents are asked whether or not they intend to vote. Corroborating evidence for that is the high likelihood of passing through the likely voter screens employed by most polling agencies. The best polling considers not just your answers, but your level of knowledge (i.e., can you name the candidates running for office), your age, income level, education level, etc. Those are factors that in most years are relatively stable among the voting population. (2008 and 1992 may have been recent exceptions to those rules, but 2012 is likely to revert more to historical norms.)

    As for correcting for non-response rates . . . the right way to do it is to query non-responders and then test the hypothesis that their results differ from the sample population you polled. The cheap way of doing it is to simply weight a non-representative sample, which is the method you implied above. Most open-source polls use the cheap method. Only the campaigns are looking at real polls that factor non-responders, not simply weight to account for them. Garbage in = Garbage out.

  9. Carl W. Edwards Says:

    I’m not the ultimate polling wonk, but I’m learning quickly. Please add me to your mailing list. …………………CWE

  10. Dennis boznango Says:

    If any of those other polls show results for Registered voters, it’d be nice to see them on the chart like you did for Gallup.

    Informative work here.

    Ed: I wish that they did. That would remove more subjectivity. Yesterday’s poll from National Journal gave us the count of adults, but it didn’t give us the results of how they would vote.

  11. Garbage In Is….Garbage | Daily Pundit Says:

    [...] This guy, Bob Krumm, explains it well. [...]