Friday, September 14, 2012

Catch of the Day

One for Nate Silver, for this passage about some new swing-state polls showing Barack Obama in the lead:

We have seen a shift toward Mr. Obama in the polls since the Democratic convention. It appears that if an election were held today, he’d win it by somewhere in the neighborhood of four or perhaps five percentage points.

If Mr. Obama is ahead by four to five points nationally, we’d certainly also expect him to post his share of leads by about that margin in swing states. Because of statistical variance and differences in methodology, some of the numbers are going to be a little bit better for him than others. But the consensus of the data ought to be quite strong for him.

Remember, the first assumption should be uniform swing. If Ohio and Florida figure to be dead even if the national race is dead even, then they figure to show a five-point lead for Obama if he's up five nationally. Indeed: if that's the situation, then a new state poll that matches uniform swing isn't new information about that state.
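To make the uniform-swing baseline concrete, here is a minimal Python sketch. It assumes each state has a fixed "lean" (its margin when the national race is tied) and simply adds the national margin to it. The leans are made-up numbers, with Ohio and Florida set to dead even as in the example above, and the helper names are hypothetical. The `residual` helper shows why a state poll that matches the projection tells us nothing new about that state: its residual is zero.

```python
# Minimal sketch of a uniform-swing projection (an illustration, not any
# aggregator's actual model). Margins are in points; positive favors Obama.
# All leans below are hypothetical.


def uniform_swing(state_lean: float, national_margin: float) -> float:
    """Projected state margin if every state moves exactly with the nation."""
    return state_lean + national_margin


def residual(poll_margin: float, state_lean: float, national_margin: float) -> float:
    """How far a state poll departs from the uniform-swing projection.

    A residual of zero means the poll adds no new information about the state.
    """
    return poll_margin - uniform_swing(state_lean, national_margin)


# Hypothetical leans: Ohio and Florida dead even when the nation is even.
LEANS = {"Ohio": 0.0, "Florida": 0.0, "Utah": -30.0}
NATIONAL_MARGIN = 5.0  # Obama +5 nationally

for state, lean in LEANS.items():
    print(f"{state}: Obama {uniform_swing(lean, NATIONAL_MARGIN):+.1f}")

# A new Ohio poll at Obama +5 sits right on the projection.
print(f"Ohio +5 poll residual: {residual(5.0, LEANS['Ohio'], NATIONAL_MARGIN):+.1f}")
```

Real aggregators layer state-specific adjustments on top of this; the sketch is only the baseline assumption.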

Now, we know that we don't really have uniform swing, and in a very close race it can matter a lot if Ohio, say, swings a bit more and Utah swings a bit less.

If it's not a close race, however, that stuff just doesn't matter. More than that: we're still more likely to be misled by a state poll that's a bit off, especially a single state poll, than we are by looking at the national polling averages and just assuming uniform swing. That's going to change some over the next couple of weeks, as we get a lot more state polls (meaning that we'll start having meaningful averages of state polling independent of the national polls), but we're not there yet.

Perhaps, at the risk of redundancy, I should explain a bit more. Say we get an Ohio poll today showing Obama +5. Well, what we want is an Ohio polling average. But if we're only getting one Ohio poll a week, then the previous one might be from right after the Democratic convention (maximum Obama bounce!) and the one before that from before either convention. So you can take a simple average of the three, but that's pretty useless: what we want to know is opinion in Ohio now, and the other two would be dated. So our choice is just one poll (lots of uncertainty) or an average that leans on outdated polls (meaningless for assessing now).
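The "lots of uncertainty" in a single poll is easy to understate, because the reported margin of error usually applies to each candidate's share, not to the lead. Here's a rough Python sketch, assuming simple random sampling and a hypothetical Ohio poll of 600 respondents at Obama 50, Romney 45. It uses the standard variance of a difference of two shares drawn from the same sample.

```python
import math

Z_95 = 1.96  # normal critical value for a 95% interval


def lead_margin_of_error(p1: float, p2: float, n: int) -> float:
    """95% margin of error on the lead p1 - p2 from one poll of n respondents.

    Assumes simple random sampling, so both shares come from one multinomial
    sample: Var(p1_hat - p2_hat) = (p1 + p2 - (p1 - p2)**2) / n.
    """
    variance = (p1 + p2 - (p1 - p2) ** 2) / n
    return Z_95 * math.sqrt(variance)


def share_margin_of_error(p: float, n: int) -> float:
    """The usual headline figure: 95% margin of error on a single share."""
    return Z_95 * math.sqrt(p * (1 - p) / n)


# Hypothetical Ohio poll: Obama 50, Romney 45, 600 respondents.
lead = 0.50 - 0.45
lead_moe = lead_margin_of_error(0.50, 0.45, 600)
print(f"Obama {lead * 100:+.0f} +/- {lead_moe * 100:.1f} points on the lead")
print(f"headline MoE: {share_margin_of_error(0.50, 600) * 100:.1f} points")
```

Under those assumptions the 95% margin of error on a five-point lead comes out near eight points, roughly double the four-point headline figure, which is why one state poll by itself says so little.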

The best of the polling aggregation sites (Silver very much included) can do a bit more -- they can take each state poll and adjust for how it compared to the national average at that point, thus deriving an estimate for a state that should be a bit more precise than uniform swing. And yet, I'm pretty skeptical that we can extrapolate forward from June to November when it comes to that. Given that once we get beyond a three-point national lead the state swings become totally irrelevant, and even with a one-point national lead the state swings are probably irrelevant, I'm really just fine with holding off until just after this point in the calendar before I pay any attention to the state estimates at all.
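Here's a minimal Python sketch of the kind of adjustment described above. It is a simplified illustration, not Silver's actual model: each state poll is converted to a margin relative to the national average at the time it was taken, those relative margins are averaged, and the result is re-anchored to today's national average. The class and function names, the three Ohio polls, and the national figures are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class StatePoll:
    label: str
    state_margin: float      # Obama minus Romney in the state poll, in points
    national_at_time: float  # national polling average on the poll's dates


def trend_adjusted_estimate(polls: list[StatePoll], national_now: float) -> float:
    """Estimate today's state margin from state polls taken at different times.

    Assumes the state's margin relative to the nation is stable, so each poll
    is re-expressed relative to the national average at its own date, the
    relative margins are averaged, and the result is shifted to today's
    national average.
    """
    relative = [p.state_margin - p.national_at_time for p in polls]
    return mean(relative) + national_now


# Hypothetical Ohio polls, oldest first.
OHIO_POLLS = [
    StatePoll("before the conventions", 1.0, 1.0),
    StatePoll("just after the DNC", 6.0, 5.0),
    StatePoll("today", 5.0, 4.5),
]
NATIONAL_NOW = 4.5

raw_average = mean(p.state_margin for p in OHIO_POLLS)
adjusted = trend_adjusted_estimate(OHIO_POLLS, NATIONAL_NOW)
print(f"raw average: Obama {raw_average:+.1f}; adjusted: Obama {adjusted:+.1f}")
```

With these made-up numbers the adjusted estimate is Obama +5.0, while a naive average of the raw polls gives +4.0. The whole approach rests on the state's position relative to the nation holding steady over time, which is exactly the assumption in question when extrapolating from June to November.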

And: nice catch!


  1. Warning: Pedantic attack coming!

    "more likely to be mislead" should be "more likely to be misled"

    The past tense (and past participle) of "lead" is "led." Many, many, many people confuse it with the base metal "lead," probably by false analogy with "read."

    1. omg I just got that. good one, Comrade Anonymous.

  2. I'll try not to re-emphasize this more than once or twice a year, but I have a nice article on my site about why NO American telephone poll should be trusted to be accurate to the claimed margin of error. The problem is that in order to get 1000 completed telephone interviews, the polling organizations are probably having to make 4000 to 5000 calls or more, and the randomness required to achieve the stated margin of error is lost.

    A long quote to illustrate the argument: "Some people are concerned that survey calls are only being made to landline phones, not cell phones, and that there would be significant differences in the two populations. Many survey companies, however, are trying to reach cell phones too (using computer-generated random phone numbers, in the hope a certain percentage will reach valid cell numbers). What I’m trying to say is that there are probably subtle yet not-insignificant differences among all the types of situations you reach in trying to use telephones as a survey tool. On a given day, the answering machines, busy tones and not-at-homes may not be evenly distributed among Republicans and Democrats, and the care that the survey company’s interviewers take to be patient with the marginally coherent seniors will have a huge effect on the results obtained from that 5 or 10% of the population. When 60 to 70% of American adults are just not available to even hear you say “would you like to participate in our survey today?,” you just can’t say that you are actually getting a random sample of the public."

    The efforts of Jonathan B. here, and of Nate Silver and others elsewhere, to average polls are an implicit admission that no single poll is reaching the claimed margin of error -- the results of the individual polls vary far more than would be expected if each were actually reaching a random sample. Averaging many polls effectively increases the sample size enormously, and averages out factors such as whether conservatives or liberals are refusing to participate in a particular poll on a particular night.

    The individual polls are still valuable as "snapshots" for which the claimed margin of error should be doubled or tripled; however, the degree to which averages of polls are more accurate than individual polls serves, in my opinion, as proof of the argument I'm making.
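The second commenter's core point, that unequal response rates break the randomness the margin of error depends on, can be put in a few lines of Python. The sketch assumes, purely hypothetically, an electorate split 50-50 in which Democrats complete an interview on 22% of calls and Republicans on 20%, and it treats each call as a single attempt. The function names are illustrative. It shows that the resulting tilt in the sample does not shrink as the sample grows, while the nominal margin of error does.

```python
import math

Z_95 = 1.96  # normal critical value for a 95% interval


def observed_share(true_share: float, rate_a: float, rate_b: float) -> float:
    """Group A's share among respondents when groups A and B respond at different rates."""
    responders_a = true_share * rate_a
    responders_b = (1 - true_share) * rate_b
    return responders_a / (responders_a + responders_b)


def share_margin_of_error(p: float, n: int) -> float:
    """Nominal 95% margin of error on a share, assuming simple random sampling."""
    return Z_95 * math.sqrt(p * (1 - p) / n)


# Hypothetical: an even electorate, Democrats slightly easier to reach.
TRUE_SHARE = 0.50
RATE_DEM, RATE_REP = 0.22, 0.20

seen = observed_share(TRUE_SHARE, RATE_DEM, RATE_REP)
bias = seen - TRUE_SHARE
overall_rate = TRUE_SHARE * RATE_DEM + (1 - TRUE_SHARE) * RATE_REP
print(f"calls per 1,000 completes: {1000 / overall_rate:.0f}")
for n in (500, 1000, 4000):
    # The nominal margin shrinks with n; the response-rate bias does not.
    moe = share_margin_of_error(seen, n)
    print(f"n={n}: nominal MoE {moe * 100:.1f} points, bias {bias * 100:+.1f} points")
```

Under these assumptions the sample overstates the Democratic share by about 2.4 points at every sample size, and 1,000 completed interviews take roughly 4,800 calls, in line with the commenter's estimate.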
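The commenter's claim that individual polls "vary far more than would be expected" can be checked directly: compare the observed spread of the leads across polls with the spread that sampling alone would produce. Here's a minimal Python sketch, assuming the polls measure the same race at about the same time with independent errors; the five polls and the function names are hypothetical. A variance ratio near 1 means the spread is about what sampling predicts. If the margin of error really should be doubled or tripled, the ratio would sit around 4 to 9, since the margin of error scales with the square root of the variance.

```python
import math
from statistics import mean, variance

# Each poll: (Obama share, Romney share, sample size). Hypothetical numbers.
POLLS = [
    (0.50, 0.45, 800),
    (0.47, 0.46, 1000),
    (0.52, 0.43, 600),
    (0.48, 0.47, 900),
    (0.51, 0.44, 700),
]


def lead_sampling_variance(p1: float, p2: float, n: int) -> float:
    """Variance of the lead p1 - p2 under simple random sampling."""
    return (p1 + p2 - (p1 - p2) ** 2) / n


def overdispersion_ratio(polls) -> float:
    """Observed variance of the leads divided by the variance sampling predicts."""
    leads = [p1 - p2 for p1, p2, _ in polls]
    observed = variance(leads)  # sample variance across polls
    expected = mean(lead_sampling_variance(p1, p2, n) for p1, p2, n in polls)
    return observed / expected


def average_lead(polls) -> tuple[float, float]:
    """Simple average of the leads and its sampling-only standard error."""
    leads = [p1 - p2 for p1, p2, _ in polls]
    k = len(polls)
    total_var = sum(lead_sampling_variance(p1, p2, n) for p1, p2, n in polls)
    return mean(leads), math.sqrt(total_var) / k


ratio = overdispersion_ratio(POLLS)
avg, se = average_lead(POLLS)
print(f"variance ratio: {ratio:.2f} (implied MoE multiplier {math.sqrt(ratio):.2f})")
print(f"average lead: Obama {avg * 100:+.1f} +/- {1.96 * se * 100:.1f} points")
```

With only a handful of polls the ratio is itself noisy, so it takes many polls to say much either way. The standard error of the average reflects sampling error only; a bias shared by every poll does not average away.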

