The scientific reasons it is not helpful to study the Newtown shooter's DNA


The Connecticut Medical Examiner has asked to sequence and study the DNA of the recent Newtown shooter. I've been seeing this pop up over the last few days on a lot of popular media sites, where they mention some objections scientists (or geneticists) may have to this "scientific" study. But I haven't seen the objections explicitly laid out anywhere. So here are mine.

Ignoring the fundamentals of the genetics of complex disease: If the violent behavior of the shooter has any genetic underpinning, it is complex. If you only look at one person's DNA, without a clear definition of the behavior (violent? mental disorder? etc.?), it is impossible to assess important complications such as penetrance, epistasis, and gene-environment interactions, to name a few. These make statistical analysis incredibly complicated even in huge, well-designed studies.

Small Sample Size: One person hit on the issue that is maybe the biggest reason this is a waste of time/likely to lead to incorrect results. You can't draw a reasonable conclusion about any population by looking at only one individual. This is actually a fundamental component of statistical inference. The goal of statistical inference is to take a small, representative sample and use data from that sample to say something about the bigger population. In this case, there are two reasons that the usual practice of statistical inference can't be applied: (1) only one individual is being considered, so we can't measure anything about how variable (or accurate) the data are, and (2) we've picked one, incredibly high-profile, and almost certainly not representative, individual to study.
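To make point (1) concrete, here is a minimal sketch in Python. The function name `standard_error` and the numbers are made up for illustration and have nothing to do with any real genetic measurement. It computes the usual standard error of a sample mean and gives up when there is only one observation, because there is no spread to estimate.

```python
import math
import statistics


def standard_error(sample):
    """Return the standard error of the mean, or None if it cannot be estimated."""
    if len(sample) < 2:
        # With a single observation there is no variability to measure.
        return None
    # Sample standard deviation (n - 1 denominator) divided by sqrt(n).
    return statistics.stdev(sample) / math.sqrt(len(sample))


# Hypothetical measurements, purely for illustration.
print(standard_error([3.2]))            # one individual: no estimate possible
print(standard_error([3.2, 2.9, 3.8]))  # several individuals: a number comes back
```

With one individual, the function has nothing to return. The same thing happens in any analysis that needs a measure of variability.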

Multiple testing/data dredging: The small sample size problem is compounded by the fact that we aren't looking at just one or two of the shooter's genes, but rather the whole genome. To see why making statements about violent individuals based on only one person's DNA is a bad idea, think about the roughly 20,000 genes in the human genome. Let's suppose that only one of the genes causes violent behavior (it is definitely more complicated than that) and that there is no environmental cause of the violent behavior (clearly false). Furthermore, suppose that if you have the bad version of the violent gene you will do something violent in your life (almost definitely not a sure thing).

Now, even with all these simplifying (and incorrect) assumptions, imagine that for each gene you flip a coin, with each gene's coin having a different chance of coming up heads. For the shooter, the violent gene turned up tails, but so did a large number of other genes. If we compare the shooter's set of genes that came up tails to another individual's, the two sets will have a huge number of genes in common in addition to the violent gene. So based on this information, you would have no idea which gene causes violence, even in this hugely simplified scenario.
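The coin-flipping thought experiment is easy to simulate. The sketch below is only an illustration under assumptions that are mine, not data: 20,000 genes, a per-gene tails probability drawn uniformly between 0.05 and 0.5, and a comparison individual who also carries the violent gene. The names `simulate_two_genomes` and `CAUSAL_GENE` are hypothetical.

```python
import random

N_GENES = 20_000
CAUSAL_GENE = 0  # hypothetical index of the single "violent gene"


def simulate_two_genomes(seed=0):
    """Flip a biased coin per gene for two people; return their sets of 'tails' genes."""
    rng = random.Random(seed)
    # Assumption: every gene has its own chance of coming up tails.
    tail_prob = [rng.uniform(0.05, 0.5) for _ in range(N_GENES)]

    def draw_person():
        tails = {g for g in range(N_GENES) if rng.random() < tail_prob[g]}
        tails.add(CAUSAL_GENE)  # both people carry the violent gene by assumption
        return tails

    shooter = draw_person()
    other = draw_person()
    return shooter, other, shooter & other


shooter, other, shared = simulate_two_genomes()
print("tails genes in the shooter:", len(shooter))
print("tails genes shared with the other person:", len(shared))
print("violent gene among the shared genes:", CAUSAL_GENE in shared)
```

Under these assumptions the shared set contains the violent gene along with a very large number of unrelated genes, and nothing in the output singles out which one matters.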

Heavy reliance on prior information/intuition: This is a supposedly scientific study, but the small sample size/multiple testing problems mean any conclusions from the data will be very weak. The only thing you could do is take the set of genes you found and then rely on previous studies to try to determine which one is the "violence gene". But now you are being guided by intuition, guesswork, and a bunch of studies that may or may not be relevant. More likely than not, you'd settle on the wrong gene.

In short, it is highly likely that no solid statistical information will be derived from this experiment. Just because the technology exists to run an experiment doesn't mean that the experiment will teach us anything.

  • Link Tran

    I agree that very little inference can be made from just one person's DNA sequence. However, I don't think that should stop one from proceeding. To apply an old saying, a journey of 1000 miles begins with the first step.

    • Andrew Jaffe

      Every person has millions of sites that differ from the "reference" genome. It's impossible to determine which of those millions of SNVs is linked to violence from N=1.

      • Link Tran

        I think you missed my point. By beginning with one person and gradually adding more and more sequences over time, we could potentially have a large enough sample worth analyzing. While not much can be done now with N=1, it should not stop us from progressing. Comparably, if I were to save $1 every day until the day I retire, I'd be rich. However, if I gave up and didn't bother, reasoning that $1 is too little and of no value, then I'd never build up a retirement fund.

    • Sam_Sonite

      Also, the sample size is not actually one. If sequenced, the shooter's genome will be the nth genome where n-1 genomes have already been sequenced.

  • http://twitter.com/hspter Hilary Parker

    I assumed they were keeping a DNA sample so that if something new develops in 50 years, they'd have it--not unlike old murder or rape cases. Sequencing now makes no sense.

  • http://www.facebook.com/joseph.feeney Joseph Feeney

    I've been thinking about this too. The problem with blanket DNA screening is that your genome is simply a map. While having a map is important, and is always the first step, you need to know how that map is being used (you can think of gene expression as traffic). In fact, different cells use that map differently. How would you know which cells, and which gene expression in those cells, are the culprits?

    It seems like a gross simplification to say that such complex behavior could be bound to the expression of a single gene.

  • http://twitter.com/Adrian_H Adrian Heilbut

    These criticisms are valid, especially with respect to trying to draw statistical associations from SNPs. However, it is also possible that one could observe a very rare event, such as a de novo translocation or deletion disrupting a region previously associated with, say, schizophrenia, which would be quite suggestive and interesting even if not really of much scientific value.

  • disqus_z3ELGcSGJp

    However, despite these problems, at least one gene (MAO-A) has been found to be possibly associated with violence or antisocial behavior, when in combination with environment (trauma, abuse, etc.), and there are theories as to why. But this result came from the other end- studying the protein and variations, rather than big data.
