Simply Statistics


Science is a calling and a career, here is a career planning guide for students and postdocs


Editor’s note: This post was inspired by a really awesome career planning guide that Ben Langmead wrote up for his postdocs which you should go check out right now. You can also find the slightly adapted Leek group career planning guide here.

The most common reason that people go into science is altruistic. They loved dinosaurs and spaceships when they were kids and that never wore off. On some level this is one of the reasons I love this field so much: it is an area that, if you can get past all the hard parts, can really keep introducing wonder into what you work on every day.

Sometimes I feel like this altruism has negative consequences. For example, I think that there is less emphasis on the career planning and development side in the academic community. I don’t think this is malicious, but I do think that sometimes people think of the career part of science as unseemly. But if you have any job that you want people to pay you to do, then there will be parts of that job that will be career oriented. So if you want to be a professional scientist, being brilliant and good at science is not enough. You also need to pay attention to your career trajectory and plan it carefully.

A colleague of mine, Ben Langmead, created a really nice guide to help his postdocs think about and plan the career side of a postdoc, which he has over on GitHub. I thought it was such a good idea that I immediately modified it and asked all of my graduate students and postdocs to fill it out. It is kind of long so there was no penalty if they didn’t finish it, but I think it is an incredibly useful tool for thinking about how to strategize a career in the sciences. I think that the more we are concrete about the career side of graduate school and postdocs, including being honest about all the realistic options available, the better prepared our students will be to succeed on the market.

You can find the Leek Group Guide to Career Planning here and make sure you also go check out Ben’s since it was his idea and his is great.



Is it species or is it batch? They are confounded, so we can't know


In a 2005 OMICS paper, an analysis of human and mouse gene expression microarray measurements from several tissues led the authors to conclude that "any tissue is more similar to any other human tissue examined than to its corresponding mouse tissue". Note that this was a rather surprising result given how similar tissues are between species. For example, both mice and humans see with their eyes, breathe with their lungs, pump blood with their hearts, etc. Two follow-up papers (here and here) demonstrated that platform-specific technical variability was the cause of this apparent dissimilarity. The arrays used for the two species were different and thus measurement platform and species were completely confounded. In a 2010 paper, we confirmed that once this technical variability was accounted for, the number of genes expressed in common between the same tissue across the two species was much higher than the number expressed in common between the two species across different tissues (see Figure 2 here).

So what is confounding and why is it a problem? This topic has been discussed broadly. We wrote a review some time ago. But based on recent discussions I've participated in, it seems that there is still some confusion. Here I explain, aided by some math, how confounding leads to problems in the context of estimating species effects in genomics. We will use

  • X_i to represent the gene expression measurements for human tissue i,
  • a_X to represent the level of expression that is specific to humans and
  • b_X to represent the batch effect introduced by the use of the human microarray platform.
  • Therefore X_i = a_X + b_X + e_i, with e_i the tissue i effect and other uninteresting sources of variability.

Similarly, we will use:

  • Y_i to represent the measurements for mouse tissue i,
  • a_Y to represent the mouse specific level and
  • b_Y to represent the batch effect introduced by the use of the mouse microarray platform.
  • Therefore Y_i = a_Y + b_Y + f_i, with f_i the tissue i effect and other uninteresting sources of variability.

If we are interested in estimating a species effect that is general across tissues, then we are interested in the following quantity:

 a_Y - a_X

Naively, we would think that we can estimate this quantity using the observed differences between the species that cancel out the tissue effect. We observe a difference for each tissue: Y_1 - X_1, Y_2 - X_2, etc. The problem is that a_X and b_X are always together, as are a_Y and b_Y. We say that the batch effect b_X is confounded with the species effect a_X. Therefore, on average, the observed differences include both the species and the batch effects. To estimate the difference above we would write a model like this:

Y_i - X_i = (a_Y - a_X) + (b_Y - b_X) + other sources of variability

and then estimate the unknown quantities of interest, (a_Y - a_X) and (b_Y - b_X), from the observed data Y_1 - X_1, Y_2 - X_2, etc. The problem is that we can estimate the aggregate effect (a_Y - a_X) + (b_Y - b_X) but, mathematically, we can't tease apart the two differences. To see this, note that if we are using least squares, the estimates (a_Y - a_X) = 7, (b_Y - b_X) = 3 will fit the data exactly as well as (a_Y - a_X) = 3, (b_Y - b_X) = 7, since

{(Y_i - X_i) - (7 + 3)}^2 = {(Y_i - X_i) - (3 + 7)}^2.

In fact, under these circumstances, there are an infinite number of solutions to the standard statistical estimation approaches. A simple analogy is to try to find a unique solution to the single equation m + n = 0. If batch and species are not confounded, then we are able to tease apart the differences, just as if we were given a second equation: m + n = 0; m - n = 2. You can learn more about this in this linear models course.
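As a minimal sketch of the argument above, the Python snippet below simulates observed differences Y_i - X_i and checks that two different splits of the same total give an identical least squares fit. The number of tissues, the noise level, and the values 3 and 7 are illustrative assumptions, not numbers from any of the papers discussed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical values, chosen only for illustration.
n_tissues = 6
species_diff = 3.0  # a_Y - a_X
batch_diff = 7.0    # b_Y - b_X

# Observed differences Y_i - X_i: species + batch + noise.
d = species_diff + batch_diff + rng.normal(0.0, 1.0, n_tissues)

def rss(species, batch):
    """Residual sum of squares for one candidate (species, batch) split."""
    return float(np.sum((d - (species + batch)) ** 2))

# Under complete confounding both effects enter every observation the
# same way, so the design matrix has two identical columns of ones.
design = np.ones((n_tissues, 2))
rank = np.linalg.matrix_rank(design)

print("RSS with (7, 3):", rss(7.0, 3.0))
print("RSS with (3, 7):", rss(3.0, 7.0))
print("rank of design matrix:", rank, "for 2 unknowns")

# The analogy from the text: m + n = 0 alone has infinitely many
# solutions, but adding m - n = 2 pins down a unique one.
solution = np.linalg.solve(np.array([[1.0, 1.0], [1.0, -1.0]]),
                           np.array([0.0, 2.0]))
print("m, n =", solution)
```

The rank of the design matrix is smaller than the number of unknowns, which is the linear algebra version of saying that only the sum of the two differences can be estimated.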

Note that the above derivation applies to each gene affected by the batch effect. In practice we commonly see hundreds of genes affected. As a consequence, when we compute distances between two samples from different species we may see large differences even where there is no species effect. This is because the b_Y - b_X differences for each gene are squared and added up.
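To illustrate how per-gene batch differences accumulate in a distance, here is a small simulation sketch. Every number in it (1000 genes, 300 affected, the noise level) is an assumption made for illustration. The simulated samples share one expression profile, so there is no species effect at all, and only the cross-platform sample receives a batch shift.

```python
import numpy as np

rng = np.random.default_rng(1)

n_genes = 1000     # genes measured per sample (assumed)
n_affected = 300   # genes with a platform-specific shift (assumed)
noise_sd = 0.1     # measurement noise (assumed)

# One shared expression profile: same tissue, NO species effect.
tissue_profile = rng.normal(0.0, 1.0, n_genes)

# b_Y - b_X for each gene: nonzero only for the affected genes.
batch_shift = np.zeros(n_genes)
batch_shift[:n_affected] = rng.normal(0.0, 1.0, n_affected)

def measure(profile):
    """Return a noisy measurement of an expression profile."""
    return profile + rng.normal(0.0, noise_sd, n_genes)

human_a = measure(tissue_profile)
human_b = measure(tissue_profile)              # same platform
mouse = measure(tissue_profile + batch_shift)  # other platform

# Euclidean distance squares and sums the per-gene differences.
dist_same_platform = np.linalg.norm(human_a - human_b)
dist_cross_platform = np.linalg.norm(human_a - mouse)

print("same platform distance: ", dist_same_platform)
print("cross platform distance:", dist_cross_platform)
```

In this sketch the cross-platform distance is inflated purely by the batch shift, which is why a large between-species distance cannot be read as a species effect when platform and species are confounded.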

In summary, if you completely confound your variable of interest, in this case species, with a batch effect, you will not be able to estimate the effect of either. In fact, in a 2010 Nature Reviews Genetics paper about batch effects we warned about "cases in which batch effects are confounded with an outcome of interest and result in misleading biological or clinical conclusions". We also warned that none of the existing solutions for batch effects (ComBat, SVA, RUV, etc.) can save you from a situation with perfect confounding. Because we can't always predict what will introduce unwanted variability, we recommend randomization as an experimental design approach.

Almost a decade after the OMICS paper was published, the same surprising conclusion was reached in this PNAS paper: "tissues appear more similar to one another within the same species than to the comparable organs of other species". This time RNAseq was used for both species and therefore the different platform issue was not considered*. Therefore, the authors implicitly assumed that (b_Y - b_X) = 0. However, in a recent F1000 Research publication Gilad and Mizrahi-Man describe an exercise in forensic bioinformatics that led them to discover that mouse and human samples were run in different lanes or on different instruments. The confounding was near perfect (see Figure 1). As pointed out by these authors, with this experimental design we can't simply accept that (b_Y - b_X) = 0, which implies that we can't estimate a species effect. Gilad and Mizrahi-Man then apply a linear model (ComBat) to account for the batch/species effect and find that samples cluster almost perfectly by tissue. However, Gilad and Mizrahi-Man correctly note that, due to the confounding, if there is in fact a species effect, this approach will remove it along with the batch effect. Unfortunately, due to the experimental design it will be hard or impossible to determine if it's batch or if it's species. More data and more analyses are needed.

Confounded designs ruin experiments. Current batch effect removal methods will not save you. If you are designing a large genomics experiment, learn about randomization.

 * The fact that RNAseq was used does not necessarily mean there is no platform effect. The species have different genomes, with different sequences and thus can lead to different biases during experimental protocols.

Update: Shin Lin has repeated a small version of the experiment described in the PNAS paper. The new experimental design does not confound lane/instrument with species. The new data confirms their original results pointing to the fact that lane/instrument do not explain the clustering by species. You can see his response in the comments here.


Residual expertise - or why scientists are amateurs at most of science


Editor's note: I have been unsuccessfully attempting to finish a book I started 3 years ago about how and why everyone should get pumped about reading and understanding scientific papers. I've adapted part of one of the chapters into this blogpost. It is pretty raw but hopefully gets the idea across. 

An episode of The Daily Show with Jon Stewart featured physicist Lisa Randall, an incredible physicist and noted scientific communicator, as the invited guest.

Near the end of the interview, Stewart asked Randall why, with all the scientific progress we have made, we have been unable to move away from fossil fuel-based engines. The question led to the exchange:

Randall: “So this is part of the problem, because I’m a scientist doesn’t mean I know the answer to that question.”

Stewart: “Oh is that true? Here’s the thing, here’s what’s part of the answer. You could say anything and I would have no idea what you are talking about.”

Professor Randall is a world leading physicist, the first woman to achieve tenure in physics at Princeton, Harvard, and MIT, and a member of the National Academy of Sciences. But when it comes to the science of fossil fuels, she is just an amateur. Her response to this question is just perfect - it shows that even brilliant scientists can just be interested amateurs on topics outside of their expertise. Despite Professor Randall’s over-the-top qualifications, she is an amateur on a whole range of scientific topics from medicine, to computer science, to nuclear engineering. Being an amateur isn’t a bad thing, and recognizing where you are an amateur may be the truest indicator of genius. That doesn’t mean Professor Randall can’t know a little bit about fossil fuels or be curious about why we don’t all have nuclear-powered hovercrafts yet. It just means she isn’t the authority.

Stewart’s response is particularly telling and indicative of what a lot of people think about scientists. It takes years of experience to become an expert in a scientific field - some have suggested as many as 10,000 hours of dedicated time. Professor Randall is a scientist - so she must have more information about any scientific problem than an informed amateur like Jon Stewart. But of course this isn’t true: Jon Stewart (and you) could quickly learn as much about fossil fuels as a scientist who isn't already an expert in the area. Sure, a background in physics would help, but there are a lot of moving parts in our dependence on fossil fuels, including social, political, and economic problems in addition to the physics involved.

This is an example of "residual expertise" - when people without deep scientific training are willing to attribute expertise to scientists even if it is outside their primary area of focus. It is closely related to the logical fallacy behind the argument from authority:

A is an authority on a particular topic

A says something about that topic

A is probably correct

The difference is that with residual expertise you assume that since A is an authority on a particular topic, if they say something about another, potentially related topic, they will probably be correct. This idea is critically important: it is how quacks make their living. The logical leap of faith from "that person is a doctor" to "that person is a doctor so of course they understand epidemiology, or vaccination, or risk communication" is exactly the leap empowered by the idea of residual expertise. It is also how you can line up scientific experts against any well established doctrine like evolution or climate change. Experts in the field will know all of the relevant information that supports key ideas in the field and what it would take to overturn those ideas. But experts outside of the field can be lined up and their residual expertise used to call into question even the most supported ideas.

What does this have to do with you?

Most people aren't necessarily experts in scientific disciplines they care about. Becoming a successful amateur requires a much smaller time commitment than becoming an expert, but it can still be incredibly satisfying, fun, and useful. This book is designed to help you become a fired-up amateur in the science of your choice. Think of it like a hobby, but one where you get to learn about some of the coolest new technologies and ideas coming out in the scientific literature. If you can ignore the way residual expertise makes you feel silly for reading scientific papers you don't fully understand, you can still learn a ton and have a pretty fun time doing it.




The tyranny of the idea in science


There are a lot of analogies between startups and academic science labs. One thing that is definitely very different is the relative value of ideas in the startup world and in the academic world. For example, Paul Graham has said:

Actually, startup ideas are not million dollar ideas, and here's an experiment you can try to prove it: just try to sell one. Nothing evolves faster than markets. The fact that there's no market for startup ideas suggests there's no demand. Which means, in the narrow sense of the word, that startup ideas are worthless.

In academics, almost the opposite is true. There is huge value to being first with an idea, even if you haven't gotten all the details worked out or stable software in place. Here are a couple of extreme examples illustrated with Nobel prizes:

  1. Higgs Boson - Peter Higgs postulated the boson in 1964 and won the Nobel Prize in 2013 for that prediction. In between, tons of people did follow-on work, someone convinced Europe to build one of the most expensive pieces of scientific equipment ever built, and conservatively thousands of scientists and engineers had to do a ton of work to get the equipment to (a) work and (b) confirm the prediction.
  2. Human genome - Watson and Crick postulated the structure of DNA in 1953 and won the Nobel Prize in medicine in 1962 for this work. But the real value of the human genome was realized when the largest biological collaboration in history sequenced the human genome, along with all of the subsequent work in the genomics revolution.

These are two large scale examples where the academic scientific community (as represented by the Nobel committee, mostly because it is a concrete example) rewards the original idea and not the hard work to achieve that idea. I call this "the tyranny of the idea." I notice a similar issue on a much smaller scale, for example when people don't recognize software as a primary product of science. I feel like these decisions devalue the real work it takes to make any scientific idea a reality. Sure the ideas are good, but it isn't clear that some ideas wouldn't be discovered by someone else - but surely we aren't going to build another Large Hadron Collider. I'd like to see the scales correct back the other way a little bit so we put at least as much emphasis on the science it takes to follow through on an idea as on discovering it in the first place.


Mendelian randomization inspires a randomized trial design for multiple drugs simultaneously


Joe Pickrell has an interesting new paper out about Mendelian randomization. He discusses some of the interesting issues that come up with these studies and performs a mini-review of previously published studies using the technique.

The basic idea behind Mendelian randomization is the following. In a simple, randomly mating population Mendel's laws tell us that at any genomic locus (a measured spot in the genome) the allele you get (the genetic material you inherit) is assigned at random. At the chromosome level this is very close to true due to properties of meiosis (here is an example of how this looks in very cartoonish form in yeast). A very famous example of this was an experiment performed by Leonid Kruglyak's group where they took two strains of yeast and repeatedly mated them, then measured genetics and gene expression data. The experimental design looked like this:



If you look at the allele inherited from the two parental strains (BY, RM) at two separate genes on different chromosomes in each of the 112 segregants (yeast offspring), they do appear to be random and independent:

[Figure: screenshot of the alleles inherited at the two genes across the segregants]



So this is a randomized trial in yeast where the yeast were each randomized to a great many genetic "treatments" simultaneously. Now this isn't strictly true, since genes near each other on the same chromosome aren't exactly random, and in humans it is definitely not true since there is population structure, non-random mating and a host of other issues. But you can still do cool things to try to infer causality from the genetic "treatments" to downstream things like gene expression (and even do a reasonable job in the model organism case).

In my mind this raises a potentially interesting study design for clinical trials. Suppose that there are 10 treatments for a disease that we know about. We design a study where each of the patients in the trial is randomized to receive treatment or placebo for each of the 10 treatments. So on average each person would get 5 treatments. Then you could try to tease apart the effects using methods developed for the Mendelian randomization case. Of course, this is ignoring potential interactions, side effects of taking multiple drugs simultaneously, etc. But I'm seeing lots of interesting proposals for new trial designs (which may or may not work), so I thought I'd contribute my own interesting idea.
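A rough simulation sketch of this design is below. It assumes purely additive treatment effects with no interactions (exactly the simplification flagged above), and it uses ordinary least squares rather than any specific Mendelian randomization method. The sample size, the effect sizes, and the noise level are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

n_patients = 5000   # assumed trial size
n_treatments = 10

# Each patient independently gets treatment (1) or placebo (0)
# for each of the 10 treatments, with probability 1/2.
assignment = rng.integers(0, 2, size=(n_patients, n_treatments))

# Hypothetical additive effects, with no interactions between drugs.
true_effects = np.linspace(-1.0, 1.0, n_treatments)
outcome = assignment @ true_effects + rng.normal(0.0, 1.0, n_patients)

# Regress the outcome on all 10 assignment indicators at once.
design = np.column_stack([np.ones(n_patients), assignment])
coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
estimated_effects = coef[1:]

mean_treatments_per_patient = assignment.sum(axis=1).mean()

print("average treatments per patient:", mean_treatments_per_patient)
print("true effects:     ", np.round(true_effects, 2))
print("estimated effects:", np.round(estimated_effects, 2))
```

Because the assignments are independent across treatments, the indicator columns are nearly uncorrelated and each effect can be estimated separately. Interactions between drugs would break the additive model used here.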


Rafa's citations above replacement in statistics journals is crazy high.


Editor's note:  I thought it would be fun to do some bibliometrics on a Friday. This is super hacky and the CAR/Y stat should not be taken seriously. 

I downloaded data on the 400 most cited papers between 2000-2010 in some statistical journals from Web of Science. Here is a boxplot of the average number of citations per year (from publication date to 2015) to these papers in the journals Annals of Statistics, Biometrics, Biometrika, Biostatistics, JASA, Journal of Computational and Graphical Statistics, Journal of Machine Learning Research, and Journal of the Royal Statistical Society Series B.




There are several interesting things about this graph right away. One is that JASA has the highest median number of citations, but has fewer "big hits" (papers with 100+ citations/year) than Annals of Statistics, JMLR, or JRSS-B. Another thing is how much of a lottery developing statistical methods seems to be. Most papers, even among the 400 most cited, have around 3 citations/year on average. But a few lucky winners have 100+ citations per year. One interesting thing for me is the papers that get 10 or more citations per year but aren't huge hits. I suspect these are the papers that solve one problem well but don't solve the most general problem ever.

Something that jumps out from that plot is the outlier for the journal Biostatistics. One of their papers is cited 367.85 times per year, which is 19 standard deviations above the mean! The next nearest competitor is at 67.75. The paper in question is "Exploration, normalization, and summaries of high density oligonucleotide array probe level data", which is the paper that introduced RMA, one of the most popular methods for pre-processing microarrays ever created. It was written by Rafa and colleagues. It made me think of the statistic "wins above replacement", which quantifies how many extra wins a baseball team gets by playing a specific player in place of a league average replacement.

What about a "citations/year above replacement" (CAR/Y) statistic where you calculate for each journal:

CAR/Y = (median citations/year to papers by Author X in that journal) - (median citations/year to papers in that journal)

Then average this number across journals. This attempts to quantify how many extra citations/year a person's papers generate compared to the "average" paper in that journal. For Rafa the numbers look like this:

  • Biostatistics: Rafa = 15.475, Journal = 1.855, CAR/Y = 13.62
  • JASA: Rafa = 74.5, Journal = 5.2, CAR/Y = 69.3
  • Biometrics: Rafa = 4.33, Journal = 3.38, CAR/Y = 0.95

So Rafa's citations above replacement is (13.62 + 69.3 + 0.95)/3 = 27.96! There are a couple of reasons why this isn't a completely accurate picture. One is the low sample size; the second is the fact that I only took the 400 most cited papers in each journal. Rafa has a few papers that didn't make the top 400 for journals like JASA - which would bring down his CAR/Y.
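The arithmetic above can be reproduced with a few lines of Python. The medians are the ones listed in the bullets; the variable names are just illustrative, and this is a sketch of the hacky statistic described here rather than an established bibliometric method.

```python
from statistics import mean

# (author median citations/year, journal median citations/year),
# taken from the bullets above.
medians = {
    "Biostatistics": (15.475, 1.855),
    "JASA": (74.5, 5.2),
    "Biometrics": (4.33, 3.38),
}

# CAR/Y per journal: author median minus journal median.
car_by_journal = {
    journal: author - baseline
    for journal, (author, baseline) in medians.items()
}

# Overall CAR/Y: unweighted average across journals.
car_overall = mean(list(car_by_journal.values()))

for journal, car in car_by_journal.items():
    print(f"{journal}: CAR/Y = {car:.2f}")
print(f"Overall CAR/Y = {car_overall:.2f}")
```

Note that the average is unweighted, so a journal with a single paper by the author counts as much as one with many.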



Figuring Out Learning Objectives the Hard Way


When building the Genomic Data Science Specialization (which starts in June!) we had to figure out the learning objectives for each course. We initially set our ambitions high, but as you can see in this video below, Steven Salzberg brought us back to Earth.


Data analysis subcultures


Roger and I responded to the controversy around the journal that banned p-values today in Nature. A piece like this requires a lot of information packed into very little space but I thought one idea that deserved to be talked about more was the idea of data analysis subcultures. From the paper:

Data analysis is taught through an apprenticeship model, and different disciplines develop their own analysis subcultures. Decisions are based on cultural conventions in specific communities rather than on empirical evidence. For example, economists call data measured over time 'panel data', to which they frequently apply mixed-effects models. Biomedical scientists refer to the same type of data structure as 'longitudinal data', and often go at it with generalized estimating equations.

I think this is one of the least appreciated components of modern data analysis. Data analysis is almost entirely taught through an apprenticeship culture with completely different behaviors taught in different disciplines. All of these disciplines agree about the mathematical optimality of specific methods under very specific conditions. That is why you see methods like randomized trials re-discovered across multiple disciplines.

But any real data analysis is always a multi-step process involving data cleaning and tidying, exploratory analysis, model fitting and checking, summarization and communication. If you gave someone from economics, biostatistics, statistics, and applied math an identical data set they'd give you back very different reports on what they did, why they did it, and what it all meant. Here are a few examples I can think of off the top of my head:

  • Economics calls longitudinal data panel data and uses mostly linear mixed effects models, while generalized estimating equations are more common in biostatistics (this is the example from Roger's and my paper).
  • In genome wide association studies the family wise error rate is the most common error rate to control. In gene expression studies people frequently use the false discovery rate.
  • This is changing a bit, but if you learned statistics at Duke you are probably a Bayesian and if you learned at Berkeley you are probably a frequentist.
  • Psychology has a history of using parametric statistics, genomics is big into empirical Bayes, and you see a lot of Bayesian statistics in climate studies.
  • You see homoskedasticity tests used a lot in econometrics, but that is hardly ever done through formal hypothesis testing in biostatistics.
  • Training sets and test sets are used in machine learning for prediction, but rarely used for inference.

This is just a partial list I thought of off the top of my head; there are a ton more. These decisions matter a lot in a data analysis. The problem is that the behavioral component of a data analysis is incredibly strong, no matter how much we'd like to think of the process as mathematico-theoretical. Until we acknowledge that the most common reason a method is chosen is "I saw it in a widely-cited paper in journal XX from my field", it is likely that little progress will be made on resolving the statistical problems in science.


Why is there so much university administration? We kind of asked for it.


The latest commentary on the rising cost of college tuition is by Paul F. Campos and is titled The Real Reason College Tuition Costs So Much. There has been much debate about this article and whether Campos is right or wrong...and I don't plan to add to that. However, I wanted to pick up on a major point of the article that I felt got left hanging out there: The rising levels of administrative personnel at universities.

Campos argues that the reason college tuition is on the rise is not that colleges get less and less money from the government (mostly state government for state schools), but rather that there is an increasing number of administrators at universities that need to be paid in dollars and cents. He cites a study that shows that for the California State University system, in a 34 year period, the number of faculty rose by about 3% whereas the number of administrators rose by 221%.

My initial thinking when I saw the 221% number was "only that much?" I've been a faculty member at Johns Hopkins now for about 10 years, and just in that short period I've seen the amount of administrative work I need to do go up what feels like at least 221%. Partially, of course, that is a result of climbing up the ranks. As you get more qualified to do administrative work, you get asked to do it! But even adjusting for that, there are quite a few things that faculty need to do now that they weren't required to do before.  Frankly, I'm grateful for the few administrators that we do have around here to help me out with various things.

Campos seems to imply (but doesn't come out and say) that the bulk of administrators are not necessary, and that if we were to cut these people from the payrolls, we could reduce tuition down to what it was in the old days. Or at least, it would be cheaper. This argument reminds me of debates over the federal budget: Everyone thinks the budget is too big, but no one wants to suggest something to cut.

My point here is that the reason there are so many administrators is that there's actually quite a bit of administration to do. And the amount of administration that needs to be done has increased over the past 30 years.

Just for fun, I decided to go to the Johns Hopkins University Administration web site to see who all these administrators were.  This site shows the President's Cabinet and the Deans of the individual schools, which isn't everybody, but it represents a large chunk. I don't know all of these people, but I have met and worked with a few of them.

For the moment I'm going to skip over individual people because, as much as you might think they are overpaid, no individual's salary is large enough to move the needle on college tuition. So I'll stick with people who actually represent large offices with staff. Here's a sample.

  • University President. Call me crazy, but I think the university needs a President. In the U.S. the university President tends to focus on outward facing activities like raising money from various sources, liaising with the government(s), and pushing university initiatives around the world. This is not something I want to do (but I think it's necessary), so I'd rather have the President take care of it for me.
  • University Provost. At most universities in the U.S. the Provost is the "senior academic officer", which means that he/she runs the university. This is a big job, especially at big universities, and requires coordinating across a variety of constituencies. Also, at JHU, the Provost's office deals with a number of compliance related issues like Title IX, accreditation, Americans with Disabilities Act, and many others. I suppose we could save some money by violating federal law, but that seems short-sighted.

    The people in this office do tough work involving a ton of paper. One example involves online education. Most states in the U.S. say that if you're going to run an education program in their state, it needs to be approved by some regulatory body. Some states have essentially a reciprocal agreement, so if it's okay in your state, then it's okay in their state. But many states require an entire approval process for a program to run in that state. And by "a program" I mean something like an M.S. in Mathematics. If you want to run an M.S. in English that's another approval, etc. So someone has to go to all the 50 states and D.C. and get approval for every online program that JHU runs in order to enroll students into that program from that state. I think Arkansas actually requires that someone come to Arkansas and testify in person about a program asking for approval.

    I support online education programs, and I'm glad the Provost's office is getting all those approvals for us.

  • Corporate Security. This may be a difficult one for some people to understand, but bear in mind that much of Johns Hopkins is located in East Baltimore. If you've ever seen the TV show The Wire, then you know why we need corporate security.
  • Facilities and Real Estate. Johns Hopkins owns and deals with a lot of real estate; it's a big organization. Who is supposed to take care of all that? For example, we just installed a brand new supercomputer jointly with the University of Maryland, called MARCC. I'm really excited to use this supercomputer for research, but systems like this require a bit of space. A lot of space actually. So we needed to get some land to put it on. If you've ever bought a house, you know how much paperwork is involved.
  • Development and Alumni Relations. I have a new appreciation for this office now that I co-direct a program that has enrolled over 1.5 million people in just over a year. It's critically important that we keep track of our students for many reasons: tracking student careers and success, tapping them to mentor current students, developing relationships with organizations that they're connected to are just a few.
  • General Counsel. I'm not the lawbreaking type, so I need lawyers to help me out.
  • Enterprise Development. This office involves, among other things, technology transfer, which I have recently been involved with quite a bit for my role in the Data Science Specialization offered through Coursera. This is just to say that I personally benefit from this office. I've heard people say that universities shouldn't be involved in tech transfer, but Bayh-Dole is what it is and I think Johns Hopkins should play by the same rules as everyone else. I'm not interested in filing patents, trademarks, and copyrights, so it's good to have people doing that for me.

Okay, that's just a few offices, but you get the point. These administrators seem to be doing a real job (imagine that!) and actually helping out the university. Many of these people are actually helping me out. Some of these jobs are essentially required by the existence of federal laws, and so we need people like this.

So, just to recap, I think there are in fact more administrators in universities than there used to be. Is this causing an increase in tuition? It's possible, but it's probably not the only cause. If you believe the CSU study, the number of administrators increased by about 3.5% per year from 1975 to 2008. College tuition during that time period went up around 4% per year (inflation adjusted). But even so, much of this administration needs to be done (because faculty don't want to do it), so this is a difficult path to go down if you're looking for ways to lower tuition.
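As a quick check on the 3.5% figure, the sketch below converts the total increases quoted from the CSU study into compound annual growth rates. It assumes the 34 year period mentioned earlier and steady compounding, which is a simplification; the variable names are illustrative.

```python
def annual_rate(total_increase, years):
    """Compound annual growth rate implied by a total fractional increase."""
    return (1.0 + total_increase) ** (1.0 / years) - 1.0

years = 34  # length of the period quoted from the CSU study

admin_rate = annual_rate(2.21, years)    # administrators rose by 221%
faculty_rate = annual_rate(0.03, years)  # faculty rose by about 3%

# For comparison: total growth implied by 4% per year over the same period.
tuition_total = 1.04 ** years - 1.0

print(f"administrators: {admin_rate:.2%} per year")
print(f"faculty:        {faculty_rate:.2%} per year")
print(f"4% per year for {years} years is a total increase of {tuition_total:.0%}")
```

Under these assumptions the administrator growth rate comes out close to the 3.5% per year cited above, while faculty growth is well under 0.1% per year.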

Even if we've found the smoking gun, the question is what do we do about it?


Genomics Case Studies Online Courses Start in Two Weeks (4/27)


The last month of the HarvardX Data Analysis for Genomics series starts on 4/27. We will cover case studies on RNA-seq, variant calling, ChIP-seq and DNA methylation. Faculty includes Shirley Liu, Mike Love, Oliver Hofmann and the HSPH Bioinformatics Core. Although taking the previous courses in the series will help, the four case study courses were developed as stand alone and you can obtain a certificate for each one without taking any other course.

Each course is presented over two weeks but will remain open until June 13 to give students an opportunity to take them all if they wish. For more information follow the links listed below.

  1. RNA-seq data analysis will be led by Mike Love
  2. Variant Discovery and Genotyping will be taught by Shannan Ho Sui, Oliver Hofmann, Radhika Khetani and Meeta Mistry (from the HSPH Bioinformatics Core)
  3. ChIP-seq data analysis will be led by Shirley Liu
  4. DNA methylation data analysis will be led by Rafael Irizarry