

Sunday data/statistics link roundup (11/25/2012)

  1. My wife used to teach at Grinnell College, so we were psyched to see that a Grinnell player set the NCAA record for most points in a game. When we lived in Iowa we used to go to the games, and the system the coach has in place there is a ton of fun to watch and is based on statistics!
  2. Someone has to vet the science writers at the Huffpo. This piece is out of control, basically claiming that open access publishing is harming science. I mean, I'm all for being a curmudgeon, but the internet exists now, so we might as well get used to it.
  3. This one is probably better for Steven's blog, but this is a pretty powerful graph about the life-saving potential of vaccines.  
  4. Roger posted yesterday about the NY Times piece on deep learning. It is one of our most shared posts of all time; you should also check out the comments, which are exceedingly good. Two things I thought I'd point out in response to a lot of the reaction: (1) I think part of Roger's post was suggesting that the statistics community should adopt some of CS's culture of solving problems with already existing, really good methods, and (2) I tried searching for a really clear example of "deep learning" yesterday so we could try some statistics on it and didn't find any really clear explanations. Does anyone have a really simple example of deep learning (ideally with code) so we can see how it relates to statistical concepts?

Computational biologist blogger saves computer science department

People who read the news should be aware by now that we are in the midst of a big data era. The New York Times, for example, has been writing about this frequently. One of their most recent articles describes how UC Berkeley is getting $60 million for a new computer science center. Meanwhile, at the University of Florida the administration seems to be oblivious to all this: about a month ago it announced it was dropping its computer science department to save money. Blogger Steven Salzberg, a computational biologist known for his work in genomics, wrote a post titled “University of Florida eliminates Computer Science Department. At least they still have football” ridiculing UF for its decision. Here are my favorite quotes:

 in the midst of a technology revolution, with a shortage of engineers and computer scientists, UF decides to cut computer science completely? 

Computer scientist Carl de Boor, a member of the National Academy of Sciences and winner of the 2003 National Medal of Science, asked the UF president “What were you thinking?”

Well, his post went viral and days later UF reversed its decision! So my point is this: statistics departments, be nice to bloggers who work in genomics… one of them might save your butt some day.

Disclaimer: Steven Salzberg has a joint appointment in my department and we have joint lab meetings.


An essay on why programmers need to learn statistics

This is awesome. There are a few places with some strong language, but overall I think the message is pretty powerful. Via Tariq K. I agree with Tariq; one of the gems is:

If you want to measure something, then don’t measure other sh**. 


Interview with Héctor Corrada Bravo


Héctor Corrada Bravo is an assistant professor in the Department of Computer Science and the Center for Bioinformatics and Computational Biology at the University of Maryland, College Park. He moved to College Park after finishing his Ph.D. in computer science at the University of Wisconsin and a postdoc in biostatistics at the Johns Hopkins Bloomberg School of Public Health. He has done outstanding work at the intersection of molecular biology, computer science, and statistics. For more info check out his webpage.

Which term applies to you: statistician/data scientist/computer
scientist/machine learner?

I want to understand interesting phenomena (in my case mostly in
biology and medicine) and I believe that our ability to collect a large number of relevant
measurements and infer characteristics of these phenomena can drive
scientific discovery and commercial innovation in the near future.
Perhaps that makes me a data scientist and means that depending on the
task at hand one or more of the other terms apply.

A lot of the distinctions many people make between these terms are
vacuous and unnecessary, but some are nonetheless useful to think
about. For example, both statisticians and machine learners [sic] know
how to create statistical algorithms that compute interesting and informative objects using measurements (perhaps) obtained through some stochastic or partially observed
process. These objects could be genomic tools for cancer screening, or
statistics that better reflect the relative impact of baseball players
on team success.

Both fields also give us ways to evaluate and characterize these objects.
However, there are times when these objects are tools that fulfill an
immediately utilitarian purpose and thinking like an engineer might
(as many people in Machine Learning do) is the right approach.
Other times, these objects are there to help us get insights about our
world and thinking in ways that many statisticians do is the right
approach.  You need both of these ways of thinking to do interesting
science and dogmatically avoiding either of them is a terrible idea.

How did you get into statistics/data science (i.e. your history)?

I got interested in Artificial Intelligence at one point, and found
that my mathematics background was nicely suited to work on this. Once
I got into it, thinking about statistics and how to analyze and
interpret data was natural and necessary. I started working with two
wonderful advisors at Wisconsin, Raghu Ramakrishnan (CS) and Grace Wahba (Statistics)
that helped shape the way I approach problems from different angles
and with different goals. The last piece was discovering that
computational biology is a fantastic setting in which to apply and
devise these methods to answer really interesting questions.

What is the problem currently driving you?

I’ve been working on cancer epigenetics to find specific genomic
measurements for which increased stochasticity appears to be general
across multiple cancer types. Right now, I’m really wondering how far
into the clinic can these discoveries be taken, if at all. For
example, can we build tools that use these genomic measurements to
improve cancer screening?

How do you see CS/statistics merging in the future?

I think that future got here some time ago, but is about to get much
more interesting.

Here is one example: Computer Science is about creating and analyzing
algorithms and building the systems that can implement them. Some of
what many computer scientists have done looks at problems concerning how to
keep, find and ship around information (Operating Systems, Networks,
Databases, etc.). Many times these have been driven by very specific
needs, e.g., commercial transactions in databases. In some ways,
companies have moved from asking how they can use data to keep track
of their activities to asking how they can use data to decide which
activities to pursue and how to pursue them. Statistical tools should be used to answer these
questions, and systems built by computer scientists have statistical
algorithms at their core.

Beyond R, what are some really useful computational tools for
statisticians to know about?

I think a computational tool that everyone can benefit a lot from
understanding better is algorithm design and analysis. This doesn’t
have to be at a particularly deep level, but just getting a sense of
how long a particular process might take, and how to devise a
different way of doing it that might be more efficient, is really
useful. I’ve been toying with the idea of creating a CS course called
(something like) “Highlights of continuous
mathematics for computer science” that reminds everyone of the cool
stuff one learns in math, now that we can appreciate its usefulness. Similarly, I think
statistics students could benefit from “Highlights of discrete
mathematics for statisticians”.
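A toy sketch of the kind of algorithmic thinking he describes (my own illustration in Python, not an example from the interview): computing all running means of a data vector can be done by recomputing each prefix sum from scratch, or by carrying a running total forward. Both give the same answer, but the first does quadratically many additions while the second does linearly many.

```python
def running_means_quadratic(xs):
    """O(n^2): recompute each prefix sum from scratch."""
    return [sum(xs[:i + 1]) / (i + 1) for i in range(len(xs))]

def running_means_linear(xs):
    """O(n): carry a running total forward instead."""
    means, total = [], 0.0
    for i, x in enumerate(xs):
        total += x
        means.append(total / (i + 1))
    return means

data = [2.0, 4.0, 6.0, 8.0]
print(running_means_linear(data))   # [2.0, 3.0, 4.0, 5.0]
assert running_means_quadratic(data) == running_means_linear(data)
```

On a vector with millions of entries the difference between these two is the difference between an instant answer and a long wait, even though the statistical content is identical — which is exactly the sense of algorithm analysis that is useful without being deep.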

Now a request for comments below from you and readers: (5a) Beyond R,
what are some really useful statistical tools for computer scientists
to know about?

Review times in statistics journals are long, should statisticians
move to conference papers?

I don’t think so. Long review times (anything more than 3 weeks) are
really not necessary. We tend to publish in journals with fairly quick
review times that produce (for the most part) really useful and
insightful reviews.

I was recently talking to senior members in my field who were telling
me stories about the “old times” when CS was moving from mainly
publishing in journals to now mainly publishing in conferences. But
now, people working in collaborative projects (like computational biology) work in fields
that primarily publish in journals, so the field needs to be able to
properly evaluate their impact and productivity. There is no perfect system.

For instance, review requests in fields where conferences are the main
publication venue come in waves (dictated by conference schedule).
Reviewers have a lot of papers to go over in a relatively short time
which makes their job of providing really helpful and fair reviews not
so easy. So, in that respect, the journal system can be better. The one thing that is universally true is that you don’t need long review times.

Previous Interviews: Daniela Witten, Chris Barr, Victoria Stodden