Before I tell you that, I’m going to let you peer inside the gonculator that spat out the results. This isn’t a fancy machine of myriad weights and measures. Instead, it’s just a little bit of applied statistics to a problem of high incertitude.
About two weeks ago I noticed that there appeared to be a correlation between the strength of a poll’s likely voter screen and Mitt Romney’s level of support. Put another way, the greater the percentage of adults included in the likely voter portion of a poll, the worse Romney’s result. At the time, I had only a few data points and only one pairwise comparison. That is, only one of the polls told me the results of its sample of likely voters and the results of its larger sample of registered voters.
Since then I have gathered a few more data points, a few more pairwise comparisons, and a corroboration of my thesis that higher turnouts significantly aid President Obama’s poll numbers. This evidence came in the form of the Pew poll released Sunday, which gave the presidential preference for unlikely voters. By a margin of 65-23, the small portion of their sample that they identified as unlikely to vote wanted Barack Obama to win. Pew’s overall sample of likely voters gave Obama only a three-point edge. Additionally, the poll indicated that “Romney’s supporters continue to be more engaged in the election and interested in election news than Obama supporters, and are more committed to voting.” In other words, the lower the turnout, the better the result for the Republican. No, that’s not an earth-shaking revelation, but I think that I’ve figured out a way to estimate by how much turnout affects Romney’s support.
Let me start by saying that politics is not baseball. I admire Nate Silver’s analytical skills, but models that work well in one forum will not necessarily translate to another. The biggest difference between the ballpark and the political arena is the human element. There are never more than 13 players on a field at any time. If you add in the 4 umpires (2 more if it’s the post-season), the managers, and the first and third base coaches, plus the official scorer, there are a total of no more than two dozen people involved in determining what happens at every plate appearance. In presidential politics, that number is in the tens of millions.
Furthermore, in baseball there is near certitude about what actually did happen. Sure, occasionally an umpire blows a call, or an official scorer counts an error as a hit. But it’s actually quite rare. In politics there is far more uncertainty, not just about what is about to happen, but even about what did happen. The last polls in 2000, for example, were agnostic as to the effect of the late revelations about Bush’s decades-old DUI arrest, so extrapolating conclusions from those last poll results was and is dangerous.
Polling, which is the entry point that goes into Nate Silver’s model, is not like a box score–which is a near exact representation of an historical event. Instead, it is an educated guess about what the current landscape is. Not what it is going to look like. And definitely not what it is certain to be. That is very different from the data that we have with which to evaluate baseball.
Let me make an analogy that may help to explain the logic by which I have arrived at my prediction. Imagine that, instead of having a database full of actual baseball records, your historical data was a collection of estimates pooled from baseball experts who were somehow able to consider the running speed of the batter and his teammates on base, the positioning and fielding ability of the fielders, and, in that split second after the batter made contact, predict what the outcome of the plate appearance was going to be. Obviously, if we could gather enough data (speed and trajectory of the ball when it left the bat, for instance), we could probably guess more often than not if the hit ball was an easy out or a home run. The model that Nate Silver and many other sabermetricians use brilliantly considers exactly these variables.
But there’s one more variable: imagine that those making their predictions don’t know the ballpark in which the game is being played. They have each assumed a different ballpark when making their predictions, but usually they don’t tell you which one it is. The expert who has assumed that the game is played at Wrigley with the wind blowing out, is likely to record a sharply hit ball to left field as a probable home run. On the other hand, if another expert assumes that the batter is at Fenway, he may conclude that the ball will bounce off the Green Monster and yield only a long single.
Every one of the polling companies is assuming a different ballpark. And when you have a 39-point gap between honest unlikely voters and the members of your sample, the inclusion of dishonest unlikely voters is likely to significantly skew your result. While all the polls in the RCP average are currently using likely voters in their models, they have each defined likely voters differently.
So what I’ve done is to try to estimate the effect of an unlikely voter in their sample. To do that I’ve taken the number of likely voters in each poll and listed it as a percentage of either registered voters or adults in their sample, if either or both numbers are given.
If we were able to know the entire random sample of adults from which poll respondents were chosen, we could estimate implied turnout percentages in the polls. In other words, we can estimate the size of the political ballpark that each of the polling companies is using to get their results. Four polls gave us enough data to calculate the implied turnout in their samples. Three of them also told us the results of their poll for all registered voters. Across the four different polls for which we can gather data, turnout ranges from 68.6 to 73.2 percent of the voting age population. As a percentage of registered voters, it ranges from 75.5 to 88.5 percent. That gave us two different data points for each of these polls, and from there we could estimate the straight line effect of increasing turnout percentages on presidential preference within individual polls.
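For anyone who wants to check the arithmetic, implied turnout is just a ratio. Here is a minimal sketch in Python. The function name `implied_turnout` and the sample counts are made up for illustration and are not taken from any of the polls discussed here.

```python
def implied_turnout(likely_voters, base_sample):
    """Likely voters as a percentage of a larger base sample.

    The base can be either the registered voters or all the adults
    contacted in the poll, depending on which scale you want.
    """
    if base_sample <= 0:
        raise ValueError("base_sample must be positive")
    return 100.0 * likely_voters / base_sample


# Hypothetical counts, NOT from any real poll.
adults, registered, likely = 1500, 1250, 1050

pct_of_vap = implied_turnout(likely, adults)     # share of voting age population
pct_of_rv = implied_turnout(likely, registered)  # share of registered voters
print(f"{pct_of_vap:.1f}% of adults, {pct_of_rv:.1f}% of registered voters")
```

The same likely voter screen looks much looser on the registered voter scale than on the voting age population scale, which is why the two ranges quoted above differ.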
Below are two charts, showing the results of these polls. The first chart is expressed as a percentage of all registered voters. The x-axis is the percentage of registered voters in their sample. The left-most point on each line is the poll result. The right-most point shows what happened to the result when they polled all registered voters from their sample. The second chart is as a percentage of the voting age population. Where the number of registered voters in the sample was given by the polling company, it is shown as a percentage of all adults contacted in the sample.
I wanted to see if the slope of each line was similar enough to use for projection. Admittedly, this is a small sample, but all slopes were negative and ranged between -0.09 and -0.35, which was similar enough for me to use for the purposes of this estimate. The average of the five slopes was -0.220, meaning that for every one percentage point of unlikely voters included in a sample, Mitt Romney’s lead decreased by 0.22 points.
Knowing the slope of our estimate of the influence of oversampling unlikely voters, all I needed to do was to figure out where to place the line and project what turnout is likely to be. I chose to calculate turnout as a percentage of VAP. That is a less subjective measure, as the number of registrations, and thus the size of the denominator, often fluctuates a great deal. (In Cleveland’s Cuyahoga County, for example, the purging of old data scrubbed 200,000 names from voter rolls in the last four years, even though Cleveland’s population didn’t fall nearly that much.) Turnout is usually in the mid 50s, and since 1972, has never exceeded 62.6% of the VAP. I have chosen to use an estimate of 60 percent.
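Each slope is nothing more than rise over run between a poll’s likely voter result and its registered voter result. A rough sketch of that calculation follows. The helper names and the two example polls are hypothetical, since the underlying poll-by-poll numbers are not listed in this post.

```python
def turnout_slope(lv_pct, lv_romney_lead, rv_pct, rv_romney_lead):
    """Change in Romney's lead per point of added turnout, within one poll.

    lv_pct / rv_pct: implied turnout for the likely voter and the wider
    sample, both on the same scale. Leads are in percentage points,
    negative when Obama is ahead.
    """
    if rv_pct == lv_pct:
        raise ValueError("the two samples need different turnout levels")
    return (rv_romney_lead - lv_romney_lead) / (rv_pct - lv_pct)


def average_slope(polls):
    """Simple unweighted mean of the per-poll slopes."""
    slopes = [turnout_slope(*p) for p in polls]
    return sum(slopes) / len(slopes)


# Hypothetical polls, NOT real data:
# (LV turnout %, Romney lead among LV, RV turnout %, Romney lead among RV)
example_polls = [
    (80.0, 1.0, 100.0, -3.0),
    (90.0, 0.0, 100.0, -3.0),
]
example_avg = average_slope(example_polls)
print(round(example_avg, 3))
```

An unweighted mean is the simplest choice here; weighting by sample size would be a reasonable alternative, but that is not what was done above.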
As for where to place the line, I began with the RCP average* of Obama over Romney by 48.3 to 47.8 percent. And I decided to place that at 70% on the VAP scale, since our three known data points on that scale are between 68.6 and 73.2 percent turnout. That means that an oversample of 10% unlikely voters in our samples gives Barack Obama an advantage of 2.2 points above where he would sit if turnout were 60% instead of 70%. Taking 1.1% away from Obama and giving 1.1% to Romney gives us a new estimate of Romney over Obama by 48.9% to 47.2%.
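Here is that step as a short Python sketch, using only the numbers in the paragraph above. The function name `shift_for_turnout` is mine, and splitting the margin change evenly between the two candidates is the same simplifying assumption made in the text.

```python
SLOPE = -0.22             # change in Romney's lead per point of turnout
ANCHOR_TURNOUT = 70.0     # where the RCP average is placed on the VAP scale
PROJECTED_TURNOUT = 60.0  # assumed actual turnout


def shift_for_turnout(obama, romney, slope, anchor, projected):
    """Move the anchored shares along the line to the projected turnout."""
    lead_change = slope * (projected - anchor)  # change in Romney's lead
    half = lead_change / 2.0                    # split evenly between the two
    return obama - half, romney + half


obama_adj, romney_adj = shift_for_turnout(
    48.3, 47.8, SLOPE, ANCHOR_TURNOUT, PROJECTED_TURNOUT
)
print(f"Obama {obama_adj:.1f}, Romney {romney_adj:.1f}")
```

With the inputs above this gives Obama 47.2 and Romney 48.9.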
Finally, to account for 3.9% that are undecideds or others, I just split them proportionately while leaving 1% for other candidates. That gives us a final prediction of Romney over Obama by 50.4% to 48.6%. In other words, I am projecting a result similar to Scenario 3, giving Mitt Romney approximately 295 electoral votes.
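The proportional split can be sketched the same way. Again, `allocate_undecided` is just an illustrative name; the inputs are the 47.2 and 48.9 from the previous step and the 1% reserved for other candidates.

```python
def allocate_undecided(obama, romney, other_reserved=1.0):
    """Split the leftover vote between the two candidates in proportion
    to their current shares, after reserving a slice for other candidates."""
    remaining = 100.0 - obama - romney - other_reserved
    two_party = obama + romney
    obama_final = obama + remaining * obama / two_party
    romney_final = romney + remaining * romney / two_party
    return obama_final, romney_final


obama_final, romney_final = allocate_undecided(47.2, 48.9)
print(f"Obama {obama_final:.1f}, Romney {romney_final:.1f}")
```

That leaves 2.9 points to split, and the totals round to Obama 48.6 and Romney 50.4, with the remaining 1% going to other candidates.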
Let me make some caveats:
- I’m not sure that there is going to be any effect on the election from Hurricane Sandy. If there is, it is likely to be contained to two states that I don’t expect to be in play (NJ and NY). I do expect that there has been an effect on polling, but having no means of predicting what it is, I’ve ignored it.
- I’m not really certain that there is a last-minute surge to Obama that some are seeing. It would be an anomaly if there was a move in the incumbent’s direction. Instead, I think that we’re seeing people automatically included in the likely voter pool because they said that they have already voted, when in fact they did not. Routinely I’m seeing polls reporting early voters above what actual numbers from secretaries of state are indicating. Part of this is the social acceptability bias that causes people to say that they are going to vote, but then they don’t. Part of this is a respondent saying that he has already voted in the hope that the guy on the other end of the line with yet another political call will hang up. Even before the existence of large numbers of early voters, predicting turnout was always difficult. Predicting it now in the midst of early voting is even more difficult.
- If there is a last-minute break to the challenger beyond the proportionate breakout of undecideds that I’ve assumed, then expect Scenario 4. We’ll know that when we see Ohio’s returns tomorrow. I hope to do a detailed analysis of the Buckeye State describing what to look for in order to get a sense of how things are breaking. (UPDATE: posted here) Bottom line: if Ohio breaks hard one way or another, expect somewhere between Scenario 1 and Scenario 2 if it goes hard for Obama early, or Scenario 4 (or even Scenario 5) if it goes Romney’s way.
- If turnout falls to the recent historical average of about 55%, this model would give Mitt Romney another one point advantage.
- As for the individual states: I fully expect that, because of his investment in the Buckeye State, Barack Obama will do better in Ohio than he does nationally. The same happened with John McCain when the entire nation shifted about ten points from four years before, but Ohio only moved about 7 points in the Democratic direction. With a popular vote win of slightly under 2 points, it is not inconceivable that Mitt Romney could still lose Ohio. However, by winning by that much nationally, he will have put away Colorado, Florida, and Virginia, while Iowa, New Hampshire, Pennsylvania, Wisconsin, and even Michigan will be teetering on the edge of tilting red. That’s too much territory for Obama to defend and to expect them all to go his way. (Thanks to alert reader Trent Telenko for bringing this to my attention; it may explain why polling in Michigan indicates that the Wolverine State appears more red than its PVI would lead us to believe.)
- Finally, if I had to give you my margin of error on the spread of 1.8 points, I’d swag it at +/- 2 points. In other words, somewhere between a modified Scenario 2 (an Obama squeaker) and a version of Scenario 4 (a solid Romney win). And yes, if you’re keeping score, that means that I’m already calling these states: Florida and North Carolina for Romney (minimum 235 EV), and Connecticut, Maine (less the 2nd district), New Jersey, New Mexico, and Oregon for Obama (minimum 190 EV). Only 113 electoral votes are still in play (bluest to reddest: Nevada, Minnesota, Michigan, Maine 2nd, Pennsylvania, Wisconsin, Iowa, Ohio, New Hampshire, Virginia, and Colorado).
- If you’re wondering why I’m so confident about Florida: Early voting is not going well there for Obama. Not at all. By the same token I could probably chance a call on Colorado, but not quite.
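To see how sensitive the estimate is to the turnout assumption in the caveats above, the straight line can be evaluated at any turnout level. This is a rough sketch using the -0.22 slope and the RCP average anchored at 70% of VAP; the function name `romney_lead` is hypothetical, and the leads shown are before undecideds are allocated.

```python
def romney_lead(turnout, slope=-0.22, anchor_turnout=70.0, anchor_lead=-0.5):
    """Romney's lead in points at a given turnout (percent of VAP).

    anchor_lead is 47.8 - 48.3 = -0.5, the RCP average placed at 70%.
    """
    return anchor_lead + slope * (turnout - anchor_turnout)


# Lead at the anchor, at my 60% estimate, and at the ~55% historical average.
leads = {t: round(romney_lead(t), 1) for t in (70.0, 60.0, 55.0)}
for turnout, lead in leads.items():
    print(f"turnout {turnout:.0f}%: Romney lead {lead:+.1f}")
```

Going from 60% to 55% turnout moves the line by about 1.1 points, which is where the "another one point" figure comes from.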
* The top line on the Pew poll was Obama over Romney 50-47. However, the poll result was 47-45; they then allocated undecideds to arrive at their projected result. I used their raw numbers without undecideds allocated. All numbers are calculated from the RCP average at approximately 1800 Central Monday 5 November. With the change in the Pew poll, the RCP average at that time was Obama over Romney 48.3 to 47.8 percent.