14
Oct

A general audience friendly explanation for why Lars Peter Hansen won the Nobel Prize


Lars Peter Hansen won the Nobel Prize in economics for creating the generalized method of moments (GMM). A rather technical explanation of the idea appears on Wikipedia. Here is a good set of lecture notes on GMM if you like math. I went over to Marginal Revolution to see what was being written about the Nobel Prize winners. Clearly a bunch of other people were doing the same thing, as the site was pretty slow to load. Here is what Tyler C. says about Hansen. In describing Hansen's work he says:

For years now journalists have asked me if Hansen might win, and if so, how they might explain his work to the general reading public.  Good luck with that one.

Alex T. does a good job of explaining the idea, but it still seems a bit technical for my tastes. Guan Y. gives another good, slightly less technical explanation here, but it is still a little rough if you aren't an economist. So I took a shot at an even more "general audience friendly" version below.

A very common practice in economics (and most other scientific disciplines) is to collect experimental data on two (or more) variables and to try to figure out if the variables are related to each other. A huge amount of statistical research is dedicated to this relatively simple-sounding problem. Lars Hansen won the Nobel Prize for his research on this problem because:

1. Economists (and scientists) hate assumptions they can't justify with data and want to use the fewest number possible. The recent Rogoff and Reinhart controversy illustrates this idea. They wrote a paper that suggested public debt was bad for growth. But when they estimated the relationship between variables they made assumptions (chose weights) that have been questioned widely - suggesting that public debt might not be so bad after all. But not before a bunch of politicians used this result to justify austerity measures which had a huge impact on the global economy.
2. Economists (and mathematicians) love to figure out the "one true idea" that encompasses many ideas. When you show something about the really general solution, you get all the particular cases for free. This means that the work you do to show that a general statistical procedure is good pays off not just for the general procedure, but for every specific method that is a special case of it.

I'm going to use a really silly example to illustrate the idea. Suppose that you collect information on the weight of animals' bodies and the weight of their brains. You want to find out how body weight and brain weight are related to each other. When you collect the data, they might look something like this:

So it looks like if you have a bigger body you have a bigger brain (except for poor old Triceratops who is big but has a small brain). Now you want to say something quantitative about this. For example:

Animals whose brains are 1 kilogram heavier have bodies that are on average k kilograms heavier.

How do you figure that out? Well one problem is that you don't have infinite money so you only collected information on a few animals. But you don't want to say something just about the animals you measured - you want to change the course of science forever and say something about the relationship between the two variables for all animals.

The best way to do this is to make some assumptions about what the measurements of brain and body weight look like if you could collect all of the measurements. It turns out if you assume that you know the complete shape of the distribution in this way, it becomes pretty straightforward (with a little math) to estimate the relationship between brain and body weight using something called maximum likelihood estimation. This is probably the most common way that economists or scientists relate one variable to another (the inventor of this approach is still waiting for his Nobel).
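To make the "assume the whole shape" step concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the numbers are made up (they are not real animals), the model says body weight equals k times brain weight plus bell-curve (normal) noise, and the noise spread `sigma` is fixed by hand. The code tries many candidate values of k and keeps the one that makes the made-up data most likely under that assumed shape.

```python
import math

# Made-up measurements in kilograms, for illustration only.
brain_kg = [1.0, 2.0, 3.0, 4.0]
body_kg = [12.0, 18.0, 33.0, 37.0]


def normal_log_likelihood(k, sigma, body, brain):
    """Log-likelihood of the data if body = k * brain + normal noise."""
    total = 0.0
    for x, y in zip(brain, body):
        resid = y - k * x
        total += -0.5 * math.log(2 * math.pi * sigma ** 2)
        total -= resid ** 2 / (2 * sigma ** 2)
    return total


# Candidate values of k from 5.00 to 15.00 in steps of 0.01.
candidates = [i / 100 for i in range(500, 1501)]

# Keep the candidate with the highest likelihood (sigma = 3.0 is assumed).
k_mle = max(
    candidates,
    key=lambda k: normal_log_likelihood(k, 3.0, body_kg, brain_kg),
)
print(f"maximum likelihood estimate of k: {k_mle:.2f}")
```

The answer you get depends on the shape you assumed. If the noise is not really bell-shaped, the estimate may no longer be the best one.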

The problem is you assumed a lot to get your answer. For example, here are the data from just the brains that we have collected. It is pretty hard to guess exactly what shape the data from the whole world would look like.

This presents the next problem: how do we know that we have the "right one"?

We don't.

One way to get around this problem is to use a very old idea called the  method of moments. Suppose we believe the equation:

Average in World Body Weight = k * Average in World Brain Weight

In other words, if animals around the world have brains that weigh 5 kilos on average, then their bodies will weigh (k * 5) kilos on average. The relationship is only "on average" because there are a bunch of variables we didn't measure and they may affect the relationship between brain and body weight. You can see it in the scatterplot because the points don't all lie on a single line.

One way to estimate k is to just replace the numbers you wish you knew with the numbers you have in your sample:

Average in Data you Have Body Weight = k * Average in Data you Have Brain Weight

Since you have the data the only thing you don't know in the equation is k, so you can solve the equation and get an estimate. The nice thing here is we don't have to say much about the shape of the data we expect for body weight or brain weight. We just have to believe this one equation.  The key insight here is that you don't have to know the whole shape of the data, just one part of it (the average).  An important point to remember is that you are still making some assumptions here (that the average is a good thing to estimate, for example) but they are definitely fewer assumptions than you make if you go all the way and specify the whole shape, or distribution, of the data.
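As a rough sketch of that calculation in Python (the measurements are made up for illustration, and the names `mean` and `method_of_moments_k` are mine, not from any library):

```python
# Made-up measurements in kilograms, for illustration only.
brain_kg = [1.0, 2.0, 3.0, 4.0]
body_kg = [12.0, 18.0, 33.0, 37.0]


def mean(values):
    """Plain arithmetic average."""
    return sum(values) / len(values)


def method_of_moments_k(body, brain):
    """Solve  mean(body) = k * mean(brain)  for k."""
    return mean(body) / mean(brain)


k_hat = method_of_moments_k(body_kg, brain_kg)
print(f"method of moments estimate of k: {k_hat:.2f}")
```

Notice that nothing in the code says what shape the data should have. The only ingredients are the two averages.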

This is a pretty oversimplified version of the problem that Hansen solved. In reality when you make assumptions about the way the world works you often get more equations like the one above than variables you want to estimate. Solving all of those equations is now complicated because the answers from different equations  might contradict each other (the technical word is overdetermined).

Hansen showed that in this case you can take the equations and multiply them by a set of weights. You put more weight on equations you are more sure about, then add them up. If you choose the weights well, you avoid the problem of having too many equations for too few variables. This is the thing he won the prize for - the generalized method of moments.
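Here is a minimal Python sketch of the weighting idea, under several assumptions of mine. The data are made up. The second equation (the average of body weight times brain weight equals k times the average of brain weight squared) is invented just to have one more equation than unknowns. The weights are picked by hand rather than chosen in the careful way Hansen worked out. The sketch finds the k that makes the weighted sum of squared "misses" of the two equations as small as possible.

```python
# Made-up measurements in kilograms, for illustration only.
brain_kg = [1.0, 2.0, 3.0, 4.0]
body_kg = [12.0, 18.0, 33.0, 37.0]


def mean(values):
    """Plain arithmetic average."""
    return sum(values) / len(values)


def weighted_moments_k(body, brain, weights):
    """Estimate k from two equations of the form  a - k * b = 0.

    Equation 1: mean(body)         - k * mean(brain)    = 0
    Equation 2: mean(body * brain) - k * mean(brain**2) = 0  (assumed extra equation)

    Minimizes  sum_j weights[j] * (a_j - k * b_j)**2, which has the
    closed-form solution  k = sum(w * a * b) / sum(w * b * b).
    """
    a = [mean(body), mean([y * x for x, y in zip(brain, body)])]
    b = [mean(brain), mean([x * x for x in brain])]
    numerator = sum(w * aj * bj for w, aj, bj in zip(weights, a, b))
    denominator = sum(w * bj * bj for w, bj in zip(weights, b))
    return numerator / denominator


k_first_only = weighted_moments_k(body_kg, brain_kg, (1.0, 0.0))
k_second_only = weighted_moments_k(body_kg, brain_kg, (0.0, 1.0))
k_both = weighted_moments_k(body_kg, brain_kg, (1.0, 1.0))

print(f"equation 1 only: k = {k_first_only:.3f}")
print(f"equation 2 only: k = {k_second_only:.3f}")
print(f"both, equal weights: k = {k_both:.3f}")
```

The two equations on their own give slightly different answers for k. The weighted version gives a single compromise answer that sits between them, and moving the weights moves the compromise toward whichever equation you trust more.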

This is all a big deal because the variables that economists measure frequently aren't very pretty. One common way they aren't pretty is that they are often measured over time, with complex relationships between values at different time points. That means it is hard to come up with realistic assumptions about what the data may look like.

By proposing an approach that doesn't require as many assumptions, Hansen satisfied criterion (1) for things economists like. And, if you squint just right at the equations he proposed, you can see they actually are a general form of a bunch of other estimation techniques like maximum likelihood estimation and instrumental variables, which made it easier to prove theoretical results and satisfied criterion (2) for things economists like.

----

Disclaimer: This post was written for a general audience and may cause nerd-rage in those who see (important) details I may have skimmed over.

Disclaimer #2: I'm not an economist. So I can't talk about economics. There are reasons GMM is useful economically that I didn't even talk about here.

Can you explain the difference between this and the principle of Maximum Entropy, and why one would want to use one vs. the other? They seem similar-ish?

• channelclemente says:

Your example also explains why most persons making that comparison use body surface area.

• John M. Switlik says:

That is due to several things, one of which is the intuitive match (learned through eons of practice, (carried forward memely?) - and, even admitted to by Hawkins). To look at the other factors, we would consider a long list, perhaps starting with idempotency's appeal and usefulness.

• John M. Switlik says:

This is not a bad start. What would an economic flavor bring beyond the specious views that can't seem to get away from their inherent leaning (hence, dismal to the core)? Actually, these techniques, no doubt, are getting some attention from those who want to correlate, and then control, desires via ads from that great quantity of data derived from the web activity.

Work like Hansen's does need to be visible to the general populace. The Prizes, perhaps, help in bringing things to the fore.

http://fed-aerated.blogspot.com/2013/10/nobel-prize.html

• DavidRHenderson says:

Thanks. I wrote the piece on the Nobel winners that will be published in the Wall Street Journal tomorrow. I wish I had seen this before closing the piece.

• hartsa says:

No, the technical term for having more moment conditions than parameters is overidentification.

• John M. Switlik says:

The problem with the world? Under-identification, thus, aerial extractions.

• CJ says:

But they are one and the same (especially as he had a general audience in mind).

• Kevin Denny says:

This is ok but it might leave the reader with the idea that GMM is just a clever statistical trick. What's special about Hansen's work, I think, is how it links the economic theory and the statistics. So the theory, for example models of asset prices, implies very particular restrictions. These are moment restrictions. You can use these to estimate and test the model. Awesome. Some of Hansen's work, incidentally, generalizes ideas by the British econometrician Denis Sargan.

• Gray says:

This is an important point. Some of the early papers *using* GMM do a decent job of explaining why GMM would go on to be so widely used, e.g. from Hansen and Singleton's 1982 paper (link below):

"The basic idea underlying our estimation strategy is as follows. The dynamic optimization problems of economic agents typically imply a set of stochastic Euler equations that must be satisfied in equilibrium. These Euler equations in turn imply a set of population orthogonality conditions that depend in a nonlinear way on variables observed by an econometrician and on unknown parameters characterizing preferences, profit functions, etc. We construct nonlinear instrumental variables estimators for these parameters in the manner suggested by Amemiya [1, 2], Jorgenson and Laffont [18], and Hansen [10] by making sample versions of the orthogonality conditions close to zero according to a certain metric. An important feature of these estimators is that they are consistent and have a limiting normal distribution under fairly weak assumptions about the stochastic processes generating the observable time series. Also, more orthogonality conditions are typically available for use in estimation than there are parameters to be estimated and, in this sense, the models are "overidentified." The overidentifying restrictions can be tested using a procedure, justified in Hansen [10], that examines how close sample versions of population orthogonality conditions are to zero."