A plain blog about politics: Notes on a Nerdfight

Friday, March 30, 2012

Notes on a Nerdfight

Nate Silver posted a very long takedown of election predictions, some of which are by economists and some by political scientists. He proves, without a doubt, that some people, or at least some book publicists, have not been at all careful about their claims (I don't mean to be snide, there; being careful about one's claims is actually about the most important thing an academic can do when taking their work public). His conclusion:

The “fundamentals” models, in fact, have had almost no predictive power at all. Over this 16-year period, there has been no relationship between the vote they forecast for the incumbent candidate and how well he actually did — even though some of them claimed to explain as much as 90 percent of voting results.

It's an interesting post, and I certainly agree that calling people on their predictions is a useful thing to do. I started taking notes...and then John Sides beat me to the response. Which is just as well; John knows this part of the world quite a bit better than I do, anyway, and I agree with almost everything he said. So rather than composing a proper post, I'm just going to splatter my notes out here, starting with three general points, all of which John covered better than I do:

1. This can't be said enough times, and I wish Silver had said it up front. Election prediction models are a tiny sliver of what political scientists do. To the extent that they're looked on as anything but a parlor game, it's as a guide to explanation, not prediction.

2. And with that: it's not my field, and I don't keep up with it nearly as closely as I perhaps should, but really political scientists know quite a lot about voter behavior and elections. Almost all of that has nothing to do -- nor, really, should it -- with making the best election prediction models.

3. What's more, the near-consensus among political scientists is pretty simple: the economy plays a major role in elections, but campaign and candidate level effects can also be real.

4. On prediction systems. There are two reasons prediction systems can fail: because the thing they're trying to predict is not predictable, or because they're not very good models. The first would be true if, for example, candidate and campaign effects were very large -- but it also would be true if perfectly predictable effects of economic performance depended on data that were not available until after the event (or even before the event but after the prediction).

5. If the problem was that the models stink, then what we might find are some predictors that do much better than others, even if it just means they stink less.

6. That appears to be the case. Three of the predictors -- Abramowitz, Wlezien & Erikson, and Hibbs -- do quite well. Their average errors (not RMSE, just the simple average of the errors Silver reports) for the two-party margin of difference are 3.3, 3.7, and 4.6, respectively. Out of fifteen predictions among them, only a couple are stinkers -- Hibbs on Gore/Bush missed by 8.7 points, and W & E missed that one by 9.5. And they get the winner right every time.

7. Then again, Hibbs is using two variables and no polling. With that, you can get an average miss of under five points? I'll take it!

8. Major warning: it's certainly possible that those three have just been lucky. However, what's reassuring is that they are consistently among the best. So that even in 2000, when none of them do particularly well, they rank 3rd, 4th, and 6th out of 9 predictions.

9. This makes me strongly suspect -- but doesn't prove -- that what we have are good and bad predictors, not an overall failure of prediction-systems-in-general, or something that is impossible to predict. Again, it doesn't prove it!

10. However, treating these three systems as similar to, say, the Lockerbie predictor -- which has missed by 19.3, 12.6, and 8.9 points in its three trials -- doesn't make a whole lot of sense to me. If someone publishes poorly constructed Senate election projections that perform poorly, does it make Nate Silver's Senate projections worse? Which suggests it's not good enough to just look at and average all predictions; you need to look at and critique which ones are well constructed and consistent with what we know generally about elections.

11. Not that I'm doing that in this post!

12. The one disagreement I have with John's response is that he gives the models a pass because they almost invariably have at least picked the right winner, and after all that's what we really care about. I'm not sure that's right. For one thing, it depends a lot on what the point is of doing the prediction. If it's to test what we think we know by projecting out into the future, then our demands are very different than if the reason is to satisfy our curiosity. Both of those are legitimate things to do, but they imply different standards to use in evaluating a predictor.

13. Silver makes much of the distinction between pure fundamentals predictors which include only non-campaign indicators, and those which incorporate polling information. That's reasonable, but I believe (and I haven't looked at all of them) that there's a wide range in how these predictors use polling. Generally, if someone uses horse race numbers from September (and I don't know that any of the models Silver uses do that) then it's not going to be nearly as useful as one that looks at presidential approval many months out.

14. Basically, what we want for an explanatory-type predictor, it seems to me, is something that excludes the influences of the campaign and of non-incumbent candidates. A pre-campaign presidential approval number essentially incorporates the effects of whatever events have happened during the campaign along with any residual popularity of the incumbent. I can see both advantages and disadvantages compared to either ad-hoc dummy variables (for, say, incumbent party while a city was flooded) or ignoring all the events that can't be systematically accounted for (such as the economy or wars).

15. By the way: if the question is whether the models in general work, then I think Silver makes the wrong choice by including multiple versions by the same author(s). If the models were updates, then only the last one should count; if they were released together...I'd probably just dismiss those altogether. Silver is of course correct that anyone who releases multiple versions and then only touts the winners is misbehaving. But that doesn't really speak to the question about whether the models as a whole are capturing anything.

16. That said, as I eyeball it, I'm not seeing that it matters much; again, just from a very quick glance, it doesn't appear that the highest-number version does (much? any?) better.

17. Silver mentions the issue of revised data. This is, again, a fairly big deal, although how it affects any of these predictions I have no idea. My general memory of these things is that there were significant post-election revisions in three of the five cycles. In 1992 the revisions were improvements, which would have made Bush a more likely winner and, again just eyeballing, would have tended to hurt the models' performance. In 2000 the revisions were downward, which would have helped W. and made most of the models' performance better; in 2008, the revisions were again downward, which would have helped Obama, helping some predictors and hurting others. Caveat: that's again based on both my memory and on eyeballing -- and of course different predictors use different numbers, so the revisions could easily affect different models differently.

18. Of course, if the purpose is to successfully predict the future, then it's a fairly big deal if it turns out that the economic data just aren't good enough quickly enough to be able to do so. If it's explanatory, then it's no big deal at all to go back and plug in the right numbers after the fact.

19. I'd be interested to see if the separation between "good" and "bad" models I noted above holds up if final economic numbers are plugged in. Not enough interested to do the work myself.

20. Silver makes much of the spread between the models -- not only are their errors large, but they don't vary together. That's a good point -- but again, if some of these models are better than others, then it's not all that interesting to know that the "bad" predictors are all over the place. The three "good" models are relatively tightly bunched in all five cycles, with no more than a six point spread.

21. I think I'm repeating this for the third time, but it's important: I don't know that the best-performing predictors are actually good, or that the worst-performing are bad. Could be luck.

22. And last point. I can't find it, but if I recall correctly someone (Nyhan? [See update below]) showed that a weighted average of the predictors does an excellent job. Silver says the predictors are still mostly useless averaged; is that true with a weighted average, in which the "bad" models would count for much less than the "good" ones?

UPDATE: Make that this post, reporting on this article.
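
For anyone who wants to check the arithmetic behind points 6, 10, and 22, here is a minimal Python sketch. It contrasts a simple average miss with RMSE, using the three Lockerbie misses quoted in point 10 (the only per-cycle errors given in this post), and then shows one possible way to build a weighted average in which worse-performing models count for less. The inverse-error weighting is purely an assumption for illustration; it is not necessarily the scheme used in the article mentioned in the update, and the function names are hypothetical.

```python
import math

# Lockerbie's three misses, in points, as quoted in point 10.
lockerbie_misses = [19.3, 12.6, 8.9]


def mean_abs_error(misses):
    """Simple average of the absolute misses (the measure used in point 6)."""
    return sum(abs(m) for m in misses) / len(misses)


def rmse(misses):
    """Root mean squared error; penalizes large misses more heavily."""
    return math.sqrt(sum(m * m for m in misses) / len(misses))


# Average misses quoted in point 6, plus Lockerbie's computed from above.
avg_miss = {
    "Abramowitz": 3.3,
    "W & E": 3.7,
    "Hibbs": 4.6,
    "Lockerbie": mean_abs_error(lockerbie_misses),
}


def inverse_error_weights(avg_errors):
    """ASSUMED scheme: weight each model by 1 / (average miss), normalized to 1."""
    raw = {name: 1.0 / err for name, err in avg_errors.items()}
    total = sum(raw.values())
    return {name: w / total for name, w in raw.items()}


def weighted_forecast(forecasts, weights):
    """Combine per-model forecasts using the given weights (same keys in both)."""
    return sum(forecasts[name] * weights[name] for name in forecasts)


if __name__ == "__main__":
    print("Lockerbie average miss:", round(mean_abs_error(lockerbie_misses), 1))  # about 13.6
    print("Lockerbie RMSE:", round(rmse(lockerbie_misses), 1))  # about 14.3
    for name, w in inverse_error_weights(avg_miss).items():
        print(f"{name}: weight {w:.2f}")
```

Under this assumed weighting, the Lockerbie model ends up with well under a tenth of the total weight, which is the sense in which the "bad" models would count for much less. Note that weights fitted on past performance and then scored against that same past performance will flatter the result, so a fair test would apply them to cycles not used to set them.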

7 comments:

Matt Jarvis, March 30, 2012 at 5:24 PM
Good points.
The point that I would make to Nate is that you have to actually get under the hood of the models. Just looking at predictions and RMSE or R² or whatever your metric is misses the point. What would be much more interesting is a comparison of the beta weights in models that include both polling and fundamentals. And, methinks that the fundamentals are going to really swamp the polls.

It's not that polls don't add useful details for the prediction. It's that the basic nature of the race partially determines them, too. I liken it to golf. Yes, the short game is very important. But, if you face 10 degrees in the wrong direction off the tee, you're hosed. In golf, knowing which general direction to face off the tee is easy...but it's also fantastically important. Drive to the east on a northerly fairway and you're not going to do very well.

Don Coffin, March 30, 2012 at 10:10 PM
I don't do election forecasts for a couple of reasons: (a) it's not my field and (b) I don't want to look stupid in public. I will, however, say a couple of other things, based mostly on having a little knowledge of models and of statistics. (1) You probably want to look at different levels of elections differently (House of Representatives separately from Senate separately from Presidential separately from state legislatures...). (2) If you're trying to create a model, in most cases you have relatively few observations. Take Presidential elections. This year is, what, number 56? 57? If you want to use any sizable number of explanatory variables, you're going to lose all of the 18th century elections and probably all of the 19th century elections and at least some of the 20th century elections. Good luck developing a useful model from what's left.

But I should really let people who create these models defend themselves...

Anastasios, March 30, 2012 at 10:26 PM
I have to say, if these predictors are parlor games, and viewed so by the poli sci community, then there is an enormous amount of gassage going on in the press and blogosphere about things that don't amount to a hill of beans (not that such should be a surprise to anyone).

To pick up on doc's point, it would be interesting to know the status of Congressional predictors vs Presidential predictors. There has been a lot of talk the past couple of years, including on this blog in the last couple of weeks, about the inherent illogic of the GOP strategy in the House. Taking tough votes on things that can't pass, backing wildly unpopular policies, following a delusional strategy, etc. Yet, if Abramowitz is right with his Congressional model, this has hurt them ... not at all. He is predicting holding their losses in the House to three and a 6-7 seat pickup in the Senate. That certainly doesn't sound like a party that has been hurt. Is his model flawed (I believe he himself admits that it has a very large error on the Senate side)? Or do all these votes in the House we have discussed on this blog ... actually matter not at all? Does the debt ceiling debacle that we talked about damaging Congress ... actually matter not at all? Is the delusional attachment to unpopular policies actually a delusion on our part (about the impact of that) and not theirs?

Maybe Paul Ryan is right, and there simply isn't a penalty to be paid for going off the deep end. Or maybe the penalty is only paid once the policies actually get passed? If that is so, we are in for a world of hurt, since the Republicans will pay no penalty for their policies until they have actually passed regressive tax cuts and damaged Medicare. Bummer.

Anonymous, March 31, 2012 at 7:06 PM
When I read Nate Silver, I often get the impression that he has made significant money on predictive models of how certain sports events turned out.... especially with his emphasis on the importance of point spreads. He expresses no opinion on policy or partisanship. Politics has interest to him because it is a new arena for his tradecraft, and apparently pays well. He's a breath of fresh air.

Jonathan Bernstein
