Post-Piketty Lessons

The latest crisis in data analysis comes to us (once again) from the field of Economics. Thomas Piketty, a French economist, recently published a book titled Capital in the Twenty-First Century that has been a best-seller. I have not read the book, but based on media reports, it appears to make the claim that inequality has increased in recent years and will likely increase into the future. The book argues that this increase in inequality is driven by capitalism’s tendency to reward capital more than labor. This is my non-economist’s understanding of the book, but the specific claims of the book are not what I want to discuss here (there is much discussion elsewhere).

An interesting aspect of Piketty’s work, from my perspective, is that he has made all of his data and analysis available on the web. From what I can tell, his analysis was not trivial—data were collected and merged from multiple disparate sources and adjustments were made to different data series to account for various incompatibilities. To me, this sounds like a standard data analysis, in the sense that all meaningful data analyses are complicated. As noted by Nate Silver, data do not arise from a “virgin birth”, and in any example worth discussing, much work has to be done to get the data into a state in which statistical models can be fit, or even more simply, plots can be made.

Chris Giles, a journalist for the Financial Times, recently published a column (unfortunately blocked by paywall) in which he claimed that much of the analysis that Piketty had done was flawed or incorrect. In particular, he claimed that based on his (Giles’) analysis, inequality was not growing as much over time as Piketty claimed. Among other points, Giles claims that numerous errors were made in assembling the data and in Piketty’s original analysis.

This episode smacked of the recent Reinhart-Rogoff kerfuffle in which some fairly basic errors were discovered in those economists' Excel spreadsheets. Some of those errors only made small differences to the results, but a critical methodological component, in which the data were weighted in a special way, appeared to have a significant impact on the results if alternate approaches were taken.
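
To see why a weighting choice alone can move a result, here is a minimal sketch in Python. The numbers are made up for illustration and are not taken from Reinhart and Rogoff's data, and the two schemes shown (averaging each country first versus pooling every country-year) are only an assumed stand-in for the kind of alternatives at issue.

```python
from statistics import mean

# Hypothetical growth figures (percent) per country-year; NOT real data.
growth_by_country = {
    "A": [2.0, 2.5, 3.0, 2.5],  # four country-years
    "B": [-4.0],                # a single country-year
}

# Scheme 1: average within each country, then average the country means,
# so every country counts equally no matter how many years it contributes.
country_means = [mean(years) for years in growth_by_country.values()]
country_weighted = mean(country_means)

# Scheme 2: pool all country-years, so every observation counts equally.
all_years = [x for years in growth_by_country.values() for x in years]
pooled = mean(all_years)

print(f"country-weighted mean: {country_weighted:.2f}")
print(f"pooled mean:           {pooled:.2f}")
```

With these toy numbers the two schemes disagree even on the sign of the average, because the single bad year for country B carries half the weight under the first scheme and one fifth under the second. Neither scheme is automatically the right one; the point is only that the choice has to be visible for anyone to debate it.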

Piketty has since responded forcefully to the FT's column, defending all of the work he has done and addressing the criticisms one by one. To me, the most important result of the FT analysis is that Piketty’s work appears to be largely reproducible. Piketty made his data available, with reasonable documentation (in addition to his book), and Giles was able to come up with the same numbers Piketty came up with. This is a good thing. Piketty’s work was complex, and the only way to communicate the entirety of it was to make the data and code available.

The other aspects of Giles’ analysis are, from an academic standpoint, largely irrelevant to me, particularly because I am not an economist. I find them irrelevant because the objections are largely over whether Piketty is correct or not. This is obviously an important question, but in any field, no single study or even synthesis can be determined to be "correct" at that instant. Time will tell, and if his work is "correct", his predictions will be borne out by nature. It's not so satisfying to have to wait many years to know if you are correct, but that's how science works.

In the meantime, economists will have a debate over the science and the appropriate methods and data used for analysis. This is also how science works, and it is only (really) possible because Piketty made his work reproducible. Otherwise, the debate would be largely uninformed.

  • Robert Link

    Interesting. My understanding is that Piketty's critics were claiming that he, like Reinhart and Rogoff, had errors in his Excel spreadsheets. Taken together those two incidents make a strong case that maybe it's time we admitted that spreadsheets are not a reasonable tool for serious data analysis (let alone modeling and simulation, which, believe it or not, I have seen). They are clunky and error-prone, not to mention slow. I get that people don't like having to learn to use new tools, but honestly the gains in terms of both productivity and lack of screwing things up make this a no-brainer.

  • Keith O’Rourke


    It does reflect well that Piketty made his data available, with reasonable (if not fully convincing to others) documentation.

    It is naive to think you can take non-randomized observational data "as is" and on an as equal basis (i.e. weight it _equally_ by population size.)

    An attempt to make that clear is attempted here in the first reference given at

    But as you point out that requires one to be an economist with an understanding of how to process multiple sources of evidence (those others capable of being rationally convinced) and these are few and far between.

    But I don't think "Time will tell" until we have enough of those few to reach a credible rough agreement (at least within a time frame that is not just of historical interest of a long ago past).

  • James MacDonald

    The funny thing to me is that Economists use Excel for their analyses, as apparently do most other finance types. So debunking someone's results involves clicking about in an Excel workbook, peering at the various functions as you try to chase the thread of their analysis.

    This is the antithesis of reproducible research, and it's inexplicable to me why evidently intelligent people would use such an ill-suited tool for their work.

  • Travis

    Great coverage! Thanks

  • peter2108

    I have a subscription (for the moment) to the FT and I think that you will be able to read the article at The criticism of Giles given in the FT is very far from reproducible in your (correct!) sense. A reason for making work reproducible that you do not seem to emphasise is that it exposes it to more severe criticism, and such exposure is according to Karl Popper the very essence of science.