IAmA blogger for FiveThirtyEight at The New York Times. Ask me anything.

What software do you use to analyze your data?

I use Stata for anything hardcore and Excel for the rest.

Please also include information about your presentation tools (e.g. how do you create the graphics you use on your site, the charts and tables, etc.)

Most of the one-off charts are just done in Excel. It isn't that hard to make Excel charts look unExcellish if you take a few minutes and get away from the awful default settings. For anything more advanced, like the stuff that appears in the right-hand column at 538, I'm relying on the help of the NYT's awesome team of interactive journalists.

And if you write your own, how do you feel about making it open source, like Princeton Election Consortium does?

I'd certainly like to aim to increase the level of disclosure at 538 going forward. Sometimes what happens is that I have the best intentions to write a super detailed, 5000-word methodology post, and then some senate candidate does or says something stupid, and I get caught up in the news cycle and it gets forgotten about. Which is a pretty lame excuse, I know. At the same time, 538 is a commercial business and the ability to license proprietary intellectual property is a fairly big part of how I make my living, so the disclosure would probably stop short of outright releasing source code or my database in most cases.

What's the biggest abuse of statistics that people aren't aware of?

Overfitting, which I discuss quite extensively in my book, is a way more pernicious problem than most people realize.
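A minimal sketch of what overfitting looks like in practice, not taken from the book or the answer above. It assumes simulated data from a hypothetical straight-line relationship with noise, and compares an ordinary least-squares line with a "memorizer" (a 1-nearest-neighbor lookup) that reproduces its training data perfectly. All names, the seed, and the data-generating numbers are assumptions chosen for illustration.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns a prediction function."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_memorizer(xs, ys):
    """Predict the y value of the nearest training x (1-nearest-neighbor)."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda pair: abs(pair[0] - x))[1]


def mse(model, xs, ys):
    """Mean squared error of a prediction function on a data set."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def make_data(rng, n):
    """Hypothetical truth: y = 2x + 1 plus Gaussian noise with sd 1."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0, 1.0) for x in xs]
    return xs, ys


rng = random.Random(538)  # fixed seed so the run is repeatable
train_x, train_y = make_data(rng, 200)
test_x, test_y = make_data(rng, 1000)

line = fit_line(train_x, train_y)
memo = fit_memorizer(train_x, train_y)

# Each entry holds (training error, error on fresh data).
results = {
    "line": (mse(line, train_x, train_y), mse(line, test_x, test_y)),
    "memorizer": (mse(memo, train_x, train_y), mse(memo, test_x, test_y)),
}
for name, (train_err, test_err) in results.items():
    print(f"{name:10s} train MSE = {train_err:.2f}   test MSE = {test_err:.2f}")
```

The memorizer looks flawless on the data it has already seen (its training error is zero) and does worse than the plain line on fresh data, because it has fit the noise along with the signal.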

Could you please address some of the biggest misconceptions of what it is you do and can do? A lot of "Silver is a wizard who can calculate everything" jokes have emerged, as you have grown in popularity, but often so at the cost of understanding what statistics are actually about.

More often than not, people overrate the reliability of predictions in systems with a lot of complexity. There are certainly exceptions, and presidential elections are almost certainly one of them, but it's a bit weird/ironic that I'm known for one of the exceptional cases.

Hey Nate- Been a big fan for a long time. You had a couple great pieces on your site back before it was picked up by NYT about the futures of different domestic issues (i.e. same-sex marriage, drug legalization, etc.) and I found them to be really insightful. Your analysis of same-sex marriage in particular stuck with me--you highlighted, if I'm remembering correctly, a linear path and an "accelerated" path that has been crazy accurate over the last few years. Given your overwhelming success with the electoral side of things, are there any plans to have you continue coverage of specific policy issues or are you going to stick exclusively with the horserace? Also, are you a redditor and/or do you own a cat?

One of the things I'm trying to figure out is what range of topics to cover at 538. After the 2008 election, it became sort of a quantitatively-flavored politics blog, and I think that was something of a mistake. Some things, like cabinet nominations, really do require careful reporting, and statistical analysis will provide a dollop of color commentary at best. On other days, the lead political story is just gossipy and stupid and isn't really newsworthy at all. So on a day like today, when the Chuck Hagel nomination is the major political story and that doesn't really play into our strengths, I'd rather write about something like baseball instead. The ambition is to expand 538 "horizontally" across topics, based on HOW we cover the news, rather than into the politics vertical, if that makes sense.

We're definitely overdue to do a couple of posts on same-sex marriage, however.

I don't own (or rent) a cat.

Last month, the quant-blogger mathbabe took your book to task for confusing cause and effect. She said, "We didn’t have a financial crisis because of a bad model or a few bad models. We had bad models because of a corrupt and criminally fraudulent financial system ... this is not just wrong, it’s maliciously wrong." She then claimed you were "a man who deeply believes in experts," which is where your book went wrong. Could you address this criticism and defend your conclusions? (full post: (link))

I'd encourage you to read my book and ask whether she fairly interprets my hypothesis. I don't think she does. The financial crisis chapter is quite explicit about asserting that the credit ratings agencies were not just stupid, but also a bunch of dirty rotten scoundrels, so to speak. And the book is generally quite skeptical about the role played by "experts".

For aspiring applied statisticians, what do you think are the best and hottest new skills to learn and add to one's resume?

Maybe this is too vague, but I think the most important thing is just to lessen the amount of book-learnin' that you do and start to play around with some data sets instead.

What's been the strangest experience you've had due to your sudden fame?

When I was in Mexico last week, I got recognized at the top of the Sun Pyramid at Teotihuacan, which I'm pretty sure really is a sign of the Apocalypse.

Were the Romney campaign predictions a result of bad polling, analysis, or just group think?

Groupthink and perverse incentives were the causes; to the extent their polling or analysis was bad, it flowed from that.

Can you prove whether gun control would make America safer?

It's a tricky problem, statistically. The issue is that while gun ownership rates could plausibly be a cause of fatal crimes and accidents, they can also be a reaction to them, i.e. people purchase guns because they feel unsafe.

I'm not saying that the issue is intrinsically inscrutable. But it's something that more requires a PhD-thesis-level treatment than a blog post to really add much insight, I think.

How would you fix baseball Hall of Fame voting?

I'd probably lower the threshold for players getting dropped from the ballot, from 5 percent to 2 percent or so, or have some sort of a sliding scale where the threshold depends on how many times a player's name has appeared. It now seems plausible that Alan Trammell will eventually get in, for example, and it's a little weird that Lou Whitaker got dropped from the ballot years ago when he might otherwise be gathering some support along with Trammell right now.

Are you ever going to finish your Burrito Bracket Project?

Perhaps I can convince Penguin that my next book should be a 256-taqueria burrito bracket with entries from all across the country.

Be honest. How much did you enjoy getting the ire of pundits (not the few who actually critiqued your method, models, or assumptions, but those who just dismissed your work wholesale)? Was there a part of you that wrung your hands together, laughed a tad manically, and egged them on to continue, since all they were doing was bringing more attention to your work and the lack of rigor in their approach?

At some point in the last few weeks of the election, I guess I decided to lean into the upside outcome a little bit in terms of pushing back at the pundits in my public appearances -- as opposed to emphasizing the uncertainty in the model, as I had for most of the year. (Nothing about the model design itself changed -- just how I tended to talk about it.)

Stupid poker analogy: part of playing well is in maximizing the amount of value you get from a hand in the event that things go well, in addition to mitigating your losses if they don't.

Nate, do you think most of the popular news sources (cable, network, newspapers) intentionally overlooked the data analysis from you and those like you in order to hype up the 2012 election?

News organizations tend to have incentives to "root for the story". Part of what we were saying for much of the campaign -- both at different stages of the general election and perhaps even more emphatically in the end-stage of the primary when Romney pretty much had things wrapped up -- is that the outcome had become fairly certain. So that creates a bit of a culture clash.

Is sabermetrics useful in soccer?

Traditionally, soccer leagues just kept track of goals and bookings, and there's only so much value you can mine from that data. But I know that the EPL and MLS are starting to track all other sorts of statistics as well: tackles, passes, time of possession, etc. Would be interesting to explore that at some point. I suspect there is some low-hanging fruit since the soccer culture (even more than in most American sports) tends not to be very data-friendly.

At the end of the day, what would it take for a 3rd party candidate to seriously challenge for, or even win, the presidency? Was Perot a once in a lifetime phenomenon, or is there a possibility of something outside the 2 party system?

Historically, periods of greater polarization are associated with better performance for third-party candidates, so the chances of a successful independent campaign are probably higher than average. However, that still might mean there's a 3 or 5 percent chance of an independent candidate winning the 2016 election as opposed to a 1 or 2 percent chance. You might need a perfect storm where (i) Obama is perceived as really having screwed up and (ii) the Republicans nominate someone terrible and (iii) someone VERY talented runs and takes his campaign very seriously and (iv) then gets a few breaks in the Electoral College, etc. None of those individual steps are impossible, but the odds against the parlay are pretty long.
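The parlay arithmetic in the answer above can be sketched in a few lines. The per-step probabilities below are hypothetical, picked only to illustrate the point, and the calculation assumes the four steps are independent, which the answer does not claim.

```python
from math import prod

# Hypothetical probabilities for each leg of the parlay (illustration only;
# none of these numbers come from the answer above).
legs = {
    "(i) president perceived as having screwed up": 0.30,
    "(ii) Republicans nominate someone terrible": 0.30,
    "(iii) a very talented independent runs seriously": 0.40,
    "(iv) Electoral College breaks go the right way": 0.50,
}


def parlay_probability(probabilities):
    """Chance that every leg hits, assuming the legs are independent."""
    return prod(probabilities)


p = parlay_probability(legs.values())
print(f"Chance all {len(legs)} legs hit: {p:.1%}")
print(f"Odds against: about {(1 - p) / p:.0f} to 1")
```

No single leg here is below 30 percent, yet the combined chance lands in the low single digits, which is the sense in which the odds against the parlay are long.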

As an Econ major, how did you gain your statistics background?

Mostly from trying to win my fantasy baseball league and my NCAA tournament pool.

Which do you find more frustrating to analyze, politics or sports?

Politics. I don't think it's close. Between the pundits and the partisans, you're dealing with a lot of very delusional people. And sports provides for much more frequent reality checks. If you were touting how awesome Notre Dame was, for example*, you got very much slapped back into reality last night. In politics, you can go on being delusional for years at a time.

* Full disclosure: I said in a NYT video yesterday that I'd bet Notre Dame against the spread.

In a recent profile, you stated you wished not to be known as a "gay statistician" but as a statistician who happens to be gay. Isn't that a bit naive in today's political and social climate? Don't you think that whether you like it or not, people will treat you differently because you are gay and that your identity as a gay man cannot be limited to your private sexuality? As someone so ubiquitous now in the public sphere, should you be addressing issues in your writing that are related to gay rights as much as baseball?

It's a complicated issue that maybe doesn't lend itself so well to the reddit treatment.

My quick-and-dirty view is that people are too quick to affiliate themselves with identity groups of all kinds, as opposed to carving out their own path in life.

Obviously, there is also the issue of how one is perceived by others. Living in New York in 2013 provides one with a much greater ability to exercise his independence than living in Uganda -- or for that matter living in New York forty years ago. So perhaps there's a bit of a "you didn't build that" quality in terms of taking for granted some of the freedoms that I have now.

And/but/also, one of the broader lessons in the history of how gay people have been treated is that perhaps we should empower people to make their own choices and live their own lives, and that we should be somewhat distrustful about the whims and tastes and legal constraints imposed by society.

Very simple: Do you prefer Chicago or New York, and why?

In terms of quality of life, it's very close. But New York is a lot better for someone working in "the media", and probably also more broadly for most people who are super ambitious about their careers. One of the big cultural differences here -- very much for better and worse -- is that people are often very career-driven well into their 40s, 50s, 60s.

Your ability to predict election outcomes has led to your work moving election betting markets... have you ever been tempted to profit via these markets?

Tempted, yes, but sometimes resisting temptation is a good thing.

As a baseball nerd (Go Royals!) and a politics/elections nerd, thank you for doing an AMA. One of the manifestations of my political nerddom involves me finding and entering election results on a website. Really. Seeing as you're someone who has had some success with the idea of using actual election-related data to shape an idea of what could happen in the future, I've got a question to ask you: "Is there a reasonable opening for a baseball-reference equivalent for the world of campaigns and elections and such? Are there enough solid facts that can be cataloged for such an effort?" To add extra notes if this helps you answer or understand the question: I say solid facts since a lot of the ratings that get associated with politics have their flaws. Some are blatantly cherry picked (Interest Group scores), some could be a bit inadequate (National Journal) and on the other extreme, some could just be too much like Earnshaw Cook to really sink in with people. The basic information about who got so many votes in such and such election is out there, although it's a bit dispersed in a sense. Some of it is in books, some is in databases (ICPSR). Some states with interesting electoral histories have a lot of results out there (West Virginia, Louisiana), and some with interesting histories don't have a lot of their results online (Michigan, Mississippi). I could probably write way way too long on the topic of what's out there state-by-state but your post isn't titled "I am Nate Silver, I predicted the freaking election, what's YOUR line?" I just find it quirky (in a way) that we have box scores online for every major league baseball game from 1918. But the same really can't be said for elections held in 1918. That difference might reflect a difference between people who do political science and people who research baseball.
I'm sure you've had your fun going out to find the electoral data that is necessary to figure out the future and such, so if you could add some insight about if the data can be disseminated to more people, that'd be really cool. And go Royals too.

Sorry for a brief answer to a very long question, but I've long been surprised that there isn't an (link). Sean Forman had better get on that or I might steal the idea.

Do you believe the theory that Anonymous stopped Karl Rove from stealing the election via hacking electronic voting machines?


Given that Barry Bonds will likely be declined a first-ballot visit to the Hall of Fame tomorrow, is there any way to look at numbers from the steroid era (both for those implicated, and those that just happened to play in the era) such that they show actual performance? Essentially, can we actually make any assessments of numbers from the steroid era?

If we had a list of exactly who used steroids and when, you could do a lot of clever things. But we don't, and the sample of alleged and actual steroids users is liable to be nonrandom and biased in various ways.

Your prediction of the 2012 presidential election gave Romney a ~20% chance. That's lower than Obama's ~80%, but it was still somewhat possible. If Romney had won you would not have been proven wrong (things much less likely than 20% happen all the time), but how would you have handled it? What would you say to people who would say you were wrong? How would you defend math? Edit: Also, do you have any opinion on the way students are graded (not an American, so the overall stuff)? Is there anything wrong with it? Something you would change? I know it doesn't seem a very statistics related issue, but statistics play an important part in it.

Intellectually, the defense is pretty simple, which is that 20 percent outcomes happen 20 percent of the time. In fact, the 20 percent outcomes are supposed to happen 20 percent of the time (not substantially more OR substantially less) or you've calibrated your model incorrectly.

OK, not quite that simple: any time a low-probability event occurs (although I'm not sure that I'd describe a 20 percent outcome as a "low-probability event") you ought to be asking whether your model of the universe was correct, particularly in cases where there is a considerable amount of structural uncertainty. The answer may well be "yes" -- you shouldn't necessarily be in a rush to change your model and there can be harm in doing so -- but you should be posing the question.

But I have no illusion: this defense would have been less than persuasive to many people. If you watch a poker hand, and a guy gets all-in before the flop with aces against kings (an 80/20 bet), our animal instinct is very much to tag him as a LOSER if a king comes up on the flop, even though he probably played his hand perfectly. So I'd just have had to take my lumps and acknowledge that I'd been very fortunate in many respects in life (i.e. often getting much more credit than I deserved) up through 11/6/12.
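The calibration point in the answer above ("20 percent outcomes are supposed to happen 20 percent of the time") can be checked mechanically. What follows is a minimal sketch using simulated forecasts, not the 538 model or its data: it assumes a hypothetical forecaster whose stated probabilities are exactly right, then groups forecasts by stated probability and compares each group with how often the event actually happened. The function name, seed, and probability levels are all assumptions.

```python
import random
from collections import defaultdict


def calibration_table(forecasts):
    """Group (stated_probability, happened) pairs by stated probability.

    Returns {stated_probability: (observed_frequency, count)}.
    """
    buckets = defaultdict(list)
    for stated, happened in forecasts:
        buckets[stated].append(happened)
    return {
        stated: (sum(outcomes) / len(outcomes), len(outcomes))
        for stated, outcomes in sorted(buckets.items())
    }


rng = random.Random(2012)  # fixed seed so the run is repeatable
stated_levels = [0.2, 0.5, 0.8]

# Simulated, perfectly calibrated forecaster: an event given probability p
# really does occur with probability p.
forecasts = []
for _ in range(30000):
    p = rng.choice(stated_levels)
    forecasts.append((p, rng.random() < p))

table = calibration_table(forecasts)
for stated, (observed, n) in table.items():
    print(f"stated {stated:.0%}   observed {observed:.1%}   (n = {n})")
```

With real forecasts, an observed frequency well above or well below the stated one in a large enough bucket is the signal that the model is miscalibrated. A single upset, like one 20 percent outcome coming in, cannot show that either way.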

Would you vote for Barry Bonds or Roger Clemens to get into the Hall of Fame? Edit: Spelling

Yes, I think, in large part because the split-the-baby solutions to steroids use are hard to apply in practice. I might use steroids use as a tiebreaker for otherwise very close cases (and I think McGwire, Sosa and Palmeiro all fall into that category). But I don't think people should pretend that we can put each player's stats through some kind of algorithm and come up with "steroid-neutral" statistics. We just don't know all that much about who did and didn't use steroids, and when.

What are your thoughts on data-driven metrics for teacher evaluation? Do you think a system that accurately reflects teacher value could ever be created, or will it always be plagued by perverse incentives (teaching to the test, neglecting certain types of students, etc)?

There are certainly cases where applying objective measures badly is worse than not applying them at all, and education may well be one of those.

In my job out of college as a consultant, one of my projects involved visiting public school classrooms in Ohio and talking to teachers, and their view was very much that teaching-to-the-test was constraining them in some unhelpful ways.

But this is another topic that requires a book- or thesis-length treatment to really evaluate properly. Maybe I'll write a book on it someday.

All the karma should go to Mike Bostock and D3 (link)

Yes, definitely. The New York Times guys really are the very best in the world at this. Part of that is because they really are journalists in addition to being programmers and/or graphic artists: the goal is to communicate complex information clearly and accurately, and not just to make something cool or pretty. There should be a Pulitzer category for this stuff.

Nate, do you think you can come up with a system for college football that is better than the BCS?

Yes, it's called a playoff. Ideally an 8- or 12- or 16-team playoff, I think.

The irony is that of all college and professional sports, NCAA football is the one that might most necessitate a playoff because 12 games just isn't enough to tell you very much -- especially when many/most are played against mediocre competition. If instead a team needs to win 3 or 4 games against top-flight opponents to win the national championship, you can say with a bit more confidence that they're deserving.
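To put a rough number on "12 games just isn't enough", here is a small sketch under loudly stated assumptions: two hypothetical teams with fixed per-game win probabilities of 0.75 and 0.65, independent games, and no account of schedule strength. It computes exactly how often the weaker team finishes with a record at least as good as the stronger team's, for a 12-game season and, purely for contrast, a 162-game one.

```python
from math import comb


def binomial_pmf(n_games, p_win):
    """Probability of each possible win total, 0 through n_games."""
    return [
        comb(n_games, k) * p_win**k * (1 - p_win) ** (n_games - k)
        for k in range(n_games + 1)
    ]


def chance_weaker_ties_or_leads(n_games, p_strong, p_weak):
    """P(weaker team's wins >= stronger team's wins), independent seasons."""
    strong = binomial_pmf(n_games, p_strong)
    weak = binomial_pmf(n_games, p_weak)
    total = 0.0
    strong_at_most_k = 0.0  # running P(stronger team wins <= k)
    for k in range(n_games + 1):
        strong_at_most_k += strong[k]
        total += weak[k] * strong_at_most_k
    return total


# Hypothetical true strengths, chosen only for illustration.
P_STRONG, P_WEAK = 0.75, 0.65

short_season = chance_weaker_ties_or_leads(12, P_STRONG, P_WEAK)
long_season = chance_weaker_ties_or_leads(162, P_STRONG, P_WEAK)
print(f"12 games:  weaker team ties or leads {short_season:.1%} of the time")
print(f"162 games: weaker team ties or leads {long_season:.1%} of the time")
```

Under these assumed numbers the 12-game figure is several times larger than the 162-game one, which is the sense in which a short regular season says relatively little about which team is actually better.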

Are you concerned that during future elections, the accuracy of your predictions will lull readers into a mindset of "it has been foretold, therefore I needn't bother to vote"?

It worries me a bit. There is probably a danger zone in which a candidate's supporters take for granted that he'll win the election and so don't turn out to vote, but the election is nevertheless close enough for him to lose. That may have happened in the Democratic primary in New Hampshire in 2008, for example. There were a lot of reasons why Hillary beat her polls, but one contributing factor may have been that a lot of independent voters who would otherwise have voted for Barack chose to vote in the GOP primary instead since it seemed more competitive.

At what point did you feel the 2012 Presidential Election ceased being a 'close race'? And do you think other media entities who maintained it was until the end were simply not in agreement with you, or kept toeing that line to keep ratings up? Also, what did you view as the biggest missteps during the election?

2012 was a reasonably close election. Not 2000 close, obviously, but closer than average.

The distinction that got lost a bit was between closeness and uncertainty. If a baseball game is 3-2 in the bottom of the 9th inning and you've got Papelbon on the mound or whatever, it has definitely been a "close" game but not one in which the outcome is in all that much doubt.

Less abstractly: when it became clear (i) Romney's "momentum" from Denver had begun to recede and (ii) that the final major news event of the campaign (Hurricane Sandy) was working to Obama's benefit, some of the uncertainty was removed.

Is it correct to assume that sabermetrics will never work in football and basketball like they do in baseball? And if so, is that because baseball is much more of an individual sport, or are there other reasons as well? (Edit: By an individual sport, I mean that for the most part it's pitcher vs. batter, with anything happening after that only a result of the initial matchup. This is not like football, where even a simple five yard run only happens because of many moving parts, i.e. blocking, and thus makes it much harder to grade anyone on a completely individual level.)

Well, I guess I'd put it like this: statistical analysis may not get you as far in basketball* or (especially) football as it does in baseball. But it still probably gets you much further than in most industries.

* A lot of NBA teams (especially the ones that win a lot) have become VERY sophisticated about their decision making. Basketball may be closer to the baseball than the football end of the spectrum, both in theory and practice.

