<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Simply Statistics</title>
	<atom:link href="http://simplystatistics.org/feed/" rel="self" type="application/rss+xml" />
	<link>http://simplystatistics.org</link>
	<description></description>
	<lastBuildDate>Sun, 19 May 2013 16:01:52 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5</generator>
		<item>
		<title>Sunday data/statistics link roundup (5/19/2013)</title>
		<link>http://simplystatistics.org/2013/05/19/sunday-datastatistics-link-roundup-5192013/</link>
		<comments>http://simplystatistics.org/2013/05/19/sunday-datastatistics-link-roundup-5192013/#comments</comments>
		<pubDate>Sun, 19 May 2013 16:01:52 +0000</pubDate>
		<dc:creator>Jeff Leek</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://simplystatistics.org/?p=1356</guid>
		<description><![CDATA[This is a ridiculously good post on 20th versus 21st century problems and the rise of the importance of empirical science. I particularly like the discussion of what it means to be a "solved" problem and how that has changed. &#8230; <a href="http://simplystatistics.org/2013/05/19/sunday-datastatistics-link-roundup-5192013/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<ol>
<li>This is a <a href="http://camdp.com/blogs/21st-century-problems">ridiculously good post</a> on 20th versus 21st century problems and the rise of the importance of empirical science. I particularly like the discussion of what it means for a problem to be "solved" and how that has changed.</li>
<li><a href="http://www.sciencemag.org/content/340/6134/787.full">A discussion</a> in Science about (arguably) the most important statistics among academics: the impact factor and the h-index. This comes on the heels of the <a href="http://am.ascb.org/dora/">San Francisco Declaration on Research Assessment</a>. I like the idea that we should focus on evaluating science on its own merits rather than on summaries like the impact factor. But I worry that the "gaming" people are worried about with quantitative measures like IF will be replaced with "politicking" if evaluation becomes too qualitative. (via Rafa)</li>
<li>A <a href="http://blogs.telegraph.co.uk/news/tomchiversscience/100217094/depressing-just-nine-per-cent-of-britons-trust-stats-over-our-own-experience-though-most-of-us-wont-believe-that/">write-up</a> about a survey in Britain that suggests people don't believe statistics (surprise!). I think this is symptomatic of a bigger issue that is being raised over and over. In an era when scientific problems don't have deterministic solutions, how do we determine if a problem has been solved? There is no good answer for this yet, and it threatens to undermine a major fraction of the scientific enterprise going forward.</li>
<li>Businesses are confusing <a href="http://qz.com/81661/most-data-isnt-big-and-businesses-are-wasting-money-pretending-it-is/">data analysis and big data</a>. This is so important and true. Big data infrastructure is often critical for creating/running data products. But discovering new ideas from data often happens on much smaller data sets with good intuition and interactive data analysis.</li>
<li><a href="http://www.nytimes.com/2013/05/19/sports/topps-changes-baseball-card-numbering-to-criticism.html?_r=1&amp;">Really interesting article</a> about how the baseball card numbering system matters and how changing it can upset collectors (via Chris V.).</li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://simplystatistics.org/2013/05/19/sunday-datastatistics-link-roundup-5192013/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>When does replication reveal fraud?</title>
		<link>http://simplystatistics.org/2013/05/17/when-does-replication-reveal-fraud/</link>
		<comments>http://simplystatistics.org/2013/05/17/when-does-replication-reveal-fraud/#comments</comments>
		<pubDate>Fri, 17 May 2013 13:32:01 +0000</pubDate>
		<dc:creator>Roger Peng</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://simplystatistics.org/?p=1354</guid>
		<description><![CDATA[Here's a little thought experiment for your weekend pleasure. Consider the following: Joe Scientist decides to conduct a study (call it Study A) to test the hypothesis that a parameter D &#62; 0 vs. the null hypothesis that D = 0. He &#8230; <a href="http://simplystatistics.org/2013/05/17/when-does-replication-reveal-fraud/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Here's a little thought experiment for your weekend pleasure. Consider the following:</p>
<p>Joe Scientist decides to conduct a study (call it Study A) to test the hypothesis that a parameter <em>D</em> &gt; 0 vs. the null hypothesis that <em>D</em> = 0. He designs a study, collects some data, conducts an appropriate statistical analysis and concludes that <em>D</em> &gt; 0. This result is published in the Journal of Awesome Results along with all the details of how the study was done.</p>
<p>Jane Scientist finds Joe's study very interesting and tries to replicate his findings. She conducts a study (call it Study B) that is similar to Study A but completely independent of it (and does not communicate with Joe). In her analysis she does not find strong evidence that <em>D</em> &gt; 0 and concludes that she cannot rule out the possibility that <em>D</em> = 0. She publishes her findings in the Journal of Null Results along with all the details.</p>
<p>From these two studies, which of the following conclusions can we make?</p>
<ol>
<li>Study A is obviously a fraud. If the truth were that <em>D</em> &gt; 0, then Jane should have concluded that <em>D</em> &gt; 0 in her independent replication.</li>
<li>Study B is obviously a fraud. If Study A were conducted properly, then Jane should have reached the same conclusion.</li>
<li>Neither Study A nor Study B was a fraud, but the result for Study A was a Type I error, i.e. a false positive.</li>
<li>Neither Study A nor Study B was a fraud, but the result for Study B was a Type II error, i.e. a false negative.</li>
</ol>
<p>I realize that there are a number of subtle details concerning why things might happen but I've purposely left them out. My question is, based on the information that you <em>actually have</em> about the two studies, what would you consider to be the most likely case? What further information would you like to know beyond what was given here?</p>
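<p>One way to frame options 3 and 4 is to ask how likely the observed pattern (Study A significant, Study B not) is under each possible truth. The Python sketch below does that arithmetic with Bayes' rule. It is only an illustration: the function name <code>pattern_posterior</code>, the significance level, the power, and the prior probability that <em>D</em> &gt; 0 are all assumed values that do not come from either study, and the sketch assumes no fraud, independent studies, and the same design for both.</p>

```python
def pattern_posterior(alpha=0.05, power=0.8, prior_effect=0.5):
    """Posterior probabilities of 'D is zero' and 'D is positive' given that
    Study A rejected the null and Study B did not.

    All three arguments are illustrative assumptions, not values from the post.
    Both studies are assumed honest, independent, and identically designed.
    """
    # If D is zero: A rejects by chance (Type I error), B correctly fails to reject.
    p_pattern_if_null = alpha * (1 - alpha)
    # If D is positive: A correctly rejects, B misses the effect (Type II error).
    p_pattern_if_effect = power * (1 - power)

    # Weight each scenario by its assumed prior probability.
    joint_null = (1 - prior_effect) * p_pattern_if_null
    joint_effect = prior_effect * p_pattern_if_effect
    total = joint_null + joint_effect

    return joint_null / total, joint_effect / total


if __name__ == "__main__":
    for prior in (0.1, 0.5, 0.9):
        p_null, p_effect = pattern_posterior(prior_effect=prior)
        print(f"prior P(D positive) = {prior}: "
              f"P(Type I in A) = {p_null:.3f}, P(Type II in B) = {p_effect:.3f}")
```

<p>Under these assumptions the answer depends heavily on the prior and on the power of the two studies, which is one reason that information beyond the two published conclusions matters.</p>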
]]></content:encoded>
			<wfw:commentRss>http://simplystatistics.org/2013/05/17/when-does-replication-reveal-fraud/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
		<item>
		<title>The bright future of applied statistics</title>
		<link>http://simplystatistics.org/2013/05/15/the-bright-future-of-applied-statistics/</link>
		<comments>http://simplystatistics.org/2013/05/15/the-bright-future-of-applied-statistics/#comments</comments>
		<pubDate>Wed, 15 May 2013 14:00:33 +0000</pubDate>
		<dc:creator>Rafael Irizarry</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[applied statistics]]></category>
		<category><![CDATA[future of statistics]]></category>

		<guid isPermaLink="false">http://simplystatistics.org/?p=1328</guid>
		<description><![CDATA[In 2013, the Committee of Presidents of Statistical Societies (COPSS) celebrates its 50th Anniversary. As part of its celebration, COPSS will publish a book, with contributions from past recipients of its awards, titled “Past, Present and Future of Statistical Science”. Below is &#8230; <a href="http://simplystatistics.org/2013/05/15/the-bright-future-of-applied-statistics/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p style="text-align: left;">In 2013, the Committee of Presidents of Statistical Societies (COPSS) celebrates its 50th Anniversary. As part of its celebration, COPSS will publish a book, with contributions from past recipients of its awards, titled “Past, Present and Future of Statistical Science”. Below is my contribution titled <em style="font-size: 16px;">The bright future of applied statistics</em>.</p>
<p class="noindent" style="text-align: left;">When I was asked to contribute to this issue, titled Past, Present, and Future of Statistical Science, I contemplated my career while deciding what to write about. One aspect that stood out was how much I benefited from the right circumstances. I came to one clear conclusion: it is a great time to be an applied statistician. I decided to describe the aspects of my career that I have thoroughly enjoyed in the <em style="font-size: 16px;">past</em> and <em style="font-size: 16px;">present</em> and explain why this has led me to believe that the <em style="font-size: 16px;">future</em> is bright for applied statisticians.</p>
<p class="indent" style="text-align: left;">I became an applied statistician while working with David Brillinger on my PhD thesis. When searching for an advisor I visited several professors and asked them about their interests. David asked me what I liked and all I came up with was "<em style="font-size: 16px;">I don't know. Music?</em>", to which he responded "<em style="font-size: 16px;">That's what we will work on</em>". Apart from the necessary theorems to get a PhD from the Statistics Department at Berkeley, my thesis summarized my collaborative work with researchers at the Center for New Music and Audio Technology. The work involved separating and parameterizing the harmonic and non-harmonic components of musical sound signals [<a href="#Xirizarry2001local">5</a>]. The sounds had been digitized into data. The work was indeed fun, but I also had my first glimpse into the incredible potential of statistics in a world becoming more and more data-driven.</p>
<p class="indent" style="text-align: left;">Despite having expertise only in music, and a thesis that required a CD player to hear the data, fitted models and residuals (<a class="url" style="font-size: 16px;" href="http://www.biostat.jhsph.edu/~ririzarr/Demo/index.html">http://www.biostat.jhsph.edu/~ririzarr/Demo/index.html</a>), I was hired by the Department of Biostatistics at Johns Hopkins School of Public Health. Later I realized what was probably obvious to the School’s leadership: that regardless of the subject matter of my thesis, my time series expertise could be applied to several public health applications [<a href="#Xirizarry2001assessing">8</a>, <a href="#Xdipietro2001cross">2</a>, <a href="#Xcrone2001electrocorticographic">1</a>]. The public health and biomedical challenges surrounding me were simply too hard to resist and my new department knew this. It was inevitable that I would quickly turn into an applied biostatistician.</p>
<p class="indent" style="text-align: left;">Since the day that I arrived at Hopkins 15 years ago, Scott Zeger, the department chair, fostered and encouraged faculty to leverage their statistical expertise to make a difference and to have an immediate impact in science. At that time, we were in the midst of a measurement revolution that was transforming several scientific fields into data-driven ones. By being located in a School of Public Health and next to a medical school, we were surrounded by collaborators working in such fields. These included environmental science, neuroscience, cancer biology, genetics, and molecular biology. Much of my work was motivated by collaborations with biologists who, for the first time, were collecting large amounts of data. Biology was changing from a data-poor discipline to a data-intensive one.</p>
<p class="indent" style="text-align: left;">A specific example came from the measurement of gene expression. Gene expression is the process by which DNA, the blueprint for life, is copied into RNA, the template for the synthesis of proteins, the building blocks for life. Before microarrays were invented in the 1990s, the analysis of gene expression data amounted to spotting black dots on a piece of paper (see Figure <a style="font-size: 16px;" href="#x1-10011">1</a>A below). With microarrays, this suddenly changed to sifting through tens of thousands of numbers (see Figure <a style="font-size: 16px;" href="#x1-10011">1</a>B). Biologists went from using their eyes to categorize results to having thousands (and now millions) of measurements per sample to analyze. Furthermore, unlike genomic DNA, which is static, gene expression is a dynamic quantity: different tissues express different genes at different levels and at different times. The complexity was exacerbated by unpolished technologies that made measurements much noisier than anticipated. This complexity and level of variability made statistical thinking an important aspect of the analysis. Biologists who used to say, "If I need statistics, the experiment went wrong," were now seeking out our help. The results of these collaborations have led to, among other things, the development of breast cancer recurrence gene expression assays, making it possible to identify patients at risk of distant recurrence following surgery [<a href="#Xvan2002gene">9</a>].</p>
<div class="figure" style="text-align: left;">
<p class="noindent"><a href="http://simplystatistics.org/2013/05/15/the-bright-future-of-applied-statistics/expression/" rel="attachment wp-att-1329"><img class="alignnone size-full wp-image-1329" alt="expression" src="http://simplystatistics.org/wp-content/uploads/2013/05/expression.jpg" width="5834" height="2569" /></a></p>
<div class="caption">Figure 1: Illustration of gene expression data before and after microarrays.</div>
</div>
<p style="text-align: left;">When biologists at Hopkins first came to our department for help with their microarray data, Scott put them in touch with me because I had experience with (what was then) large datasets (digitized music signals are represented by 44,100 points per second). The more I learned about the scientific problems and the more data I explored, the more motivated I became. The potential for statisticians having an impact in this nascent field was clear and my department was encouraging me to take the plunge. This institutional encouragement and support was crucial, as successfully working in this field made it harder to publish in the mainstream statistical journals, an accomplishment that had traditionally been heavily weighted in the promotion process. The message was clear: having an immediate impact on specific scientific fields would be rewarded as much as mathematically rigorous methods with general applicability.</p>
<p style="text-align: left;">As with my thesis applications, it was clear that to solve some of the challenges posed by microarray data I would have to learn all about the technology. For this I organized a sabbatical with Terry Speed's group in Melbourne where they helped me accomplish this goal. During this visit I reaffirmed my preference for attacking applied problems with simple statistical methods, as opposed to overcomplicated ones or developing new techniques. Learning that deciphering clever ways of putting the existing statistical toolbox to work was good enough for an accomplished statistician like Terry gave me the necessary confidence to continue working this way. More than a decade later this continues to be my approach to applied statistics. This approach has been instrumental for some of my current collaborative work. In particular, it led to important new biological discoveries made together with Andy Feinberg’s lab [<a href="#Xirizarry2009human">7</a>].</p>
<p class="indent" style="text-align: left;">During my sabbatical we developed preliminary solutions that improved precision and aided in the removal of systematic biases for microarray data [<a href="#Xirizarry2003exploration">6</a>]. I was aware that hundreds, if not thousands, of other scientists were facing the same problematic data and were searching for solutions. Therefore I was also thinking hard about ways in which I could share whatever solutions I developed with others. During this time I received an email from Robert Gentleman asking if I was interested in joining a new software project for the delivery of statistical methods for genomics data. This new collaboration eventually became the Bioconductor project (<a href="http://www.bioconductor.org">http://www.bioconductor.org</a>), which to this day continues to grow its user and developer base [<a href="#Xgentleman2004bioconductor">4</a>]. Bioconductor was the perfect vehicle for having the impact that my department had encouraged me to seek. With Ben Bolstad and others we wrote an R package that has been downloaded tens of thousands of times [<a href="#Xgautier2004affy">3</a>]. Without the availability of software, the statistical method would not have received nearly as much attention. This lesson served me well throughout my career, as developing software packages has greatly helped disseminate my statistical ideas. The fact that my department and school rewarded software publications provided important support.</p>
<p class="indent" style="text-align: left;">The impact statisticians have had in genomics is just one example of our field's accomplishments in the 21st century. In academia, the number of statisticians becoming leaders in fields like environmental sciences, human genetics, genomics, and social sciences continues to grow. Outside of academia, Sabermetrics has become a standard approach in several sports (not just baseball) and inspired the Hollywood movie Moneyball. A PhD statistician led the team that won the Netflix million dollar prize (<a class="url" style="font-size: 16px;" href="http://www.netflixprize.com/">http://www.netflixprize.com/</a>). Nate Silver (<a class="url" style="font-size: 16px;" href="http://mashable.com/2012/11/07/nate-silver-wins/">http://mashable.com/2012/11/07/nate-silver-wins/</a>) proved the pundits wrong by once again using statistical models to predict election results almost perfectly. R has become a widely used programming language. It is no surprise that the number of Statistics majors at Harvard has more than quadrupled since 2000 (<a class="url" style="font-size: 16px;" href="http://nesterko.com/visuals/statconcpred2012-with-dm/">http://nesterko.com/visuals/statconcpred2012-with-dm/</a>) and that statistics MOOCs are among the most popular (<a class="url" style="font-size: 16px;" href="http://edudemic.com/2012/12/the-11-most-popular-open-online-courses/">http://edudemic.com/2012/12/the-11-most-popular-open-online-courses/</a>).</p>
<p class="indent" style="text-align: left;">The unprecedented advance in digital technology during the second half of the 20th century has produced a measurement revolution that is transforming science. Scientific fields that have traditionally relied upon simple data analysis techniques have been turned on their heads by these technologies. Furthermore, advances such as these have brought about a shift from hypothesis-driven to discovery-driven research. However, interpreting information extracted from these massive and complex datasets requires sophisticated statistical skills, as one can easily be fooled by patterns that arise by chance. This has greatly elevated the importance of our discipline in biomedical research.</p>
<p class="indent" style="text-align: left;">I think that the data revolution is just getting started. Datasets are currently being, or have already been, collected that contain, hidden in their complexity, important truths waiting to be discovered. These discoveries will increase the scientific understanding of our world. Statisticians should be excited and ready to play an important role in the new scientific renaissance driven by the measurement revolution.</p>
<h2 class="likechapterHead" style="text-align: left;"><a id="x1-20001"></a>Bibliography</h2>
<div class="thebibliography">
<p class="bibitem" style="text-align: left;">[1]   <a id="Xcrone2001electrocorticographic"></a>NE Crone, L Hao, J Hart, D Boatman, RP Lesser, R Irizarry, and B Gordon. Electrocorticographic gamma activity during word production in spoken and sign language. Neurology, 57(11):2045–2053, 2001.</p>
<p class="bibitem" style="text-align: left;">[2]   <a id="Xdipietro2001cross"></a>Janet A DiPietro, Rafael A Irizarry, Melissa Hawkins, Kathleen A Costigan, and Eva K Pressman. Cross-correlation of fetal cardiac and somatic activity as an indicator of antenatal neural development. American Journal of Obstetrics and Gynecology, 185(6):1421–1428, 2001.</p>
<p class="bibitem" style="text-align: left;">[3]   <a id="Xgautier2004affy"></a>Laurent Gautier, Leslie Cope, Benjamin M Bolstad, and Rafael A Irizarry. affy: analysis of Affymetrix GeneChip data at the probe level. Bioinformatics, 20(3):307–315, 2004.</p>
<p class="bibitem" style="text-align: left;">[4]   <a id="Xgentleman2004bioconductor"></a>Robert C Gentleman, Vincent J Carey, Douglas M Bates, Ben Bolstad, Marcel Dettling, Sandrine Dudoit, Byron Ellis, Laurent Gautier, Yongchao Ge, Jeff Gentry, et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biology, 5(10):R80, 2004.</p>
<p class="bibitem" style="text-align: left;">[5]   <a id="Xirizarry2001local"></a>Rafael A Irizarry. Local harmonic estimation in musical sound signals. Journal of the American Statistical Association, 96(454):357–367, 2001.</p>
<p class="bibitem" style="text-align: left;">[6]   <a id="Xirizarry2003exploration"></a>Rafael A Irizarry, Bridget Hobbs, Francois Collin, Yasmin D Beazer-Barclay, Kristen J Antonellis, Uwe Scherf, and Terence P Speed. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics, 4(2):249–264, 2003.</p>
<p class="bibitem" style="text-align: left;">[7]   <a id="Xirizarry2009human"></a>Rafael A Irizarry, Christine Ladd-Acosta, Bo Wen, Zhijin Wu, Carolina Montano, Patrick Onyango, Hengmi Cui, Kevin Gabo, Michael Rongione, Maree Webster, et al. The human colon cancer methylome shows similar hypo- and hypermethylation at conserved tissue-specific CpG island shores. Nature Genetics, 41(2):178–186, 2009.</p>
<p class="bibitem" style="text-align: left;">[8]   <a id="Xirizarry2001assessing"></a>Rafael A Irizarry, Clarke Tankersley, Robert Frank, and Susan Flanders. Assessing homeostasis through circadian patterns. Biometrics, 57(4):1228–1237, 2001.</p>
<p class="bibitem" style="text-align: left;">[9]   <a id="Xvan2002gene"></a>Laura J van’t Veer, Hongyue Dai, Marc J Van De Vijver, Yudong D He, Augustinus AM Hart, Mao Mao, Hans L Peterse, Karin van der Kooy, Matthew J Marton, Anke T Witteveen, et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature, 415(6871):530–536, 2002.</p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://simplystatistics.org/2013/05/15/the-bright-future-of-applied-statistics/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Sunday data/statistics link roundup (5/12/2013, Mother&#039;s Day!)</title>
		<link>http://simplystatistics.org/2013/05/12/sunday-datastatistics-link-roundup-5122013-mothers-day/</link>
		<comments>http://simplystatistics.org/2013/05/12/sunday-datastatistics-link-roundup-5122013-mothers-day/#comments</comments>
		<pubDate>Mon, 13 May 2013 02:29:17 +0000</pubDate>
		<dc:creator>Jeff Leek</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://simplystatistics.org/?p=1324</guid>
		<description><![CDATA[A tutorial on deep-learning, I really enjoyed reading it, but I'm still trying to figure out how this is different than non-linear logistic regression to estimate features then supervised prediction using those features? Or maybe I'm just naive.... Rafa on &#8230; <a href="http://simplystatistics.org/2013/05/12/sunday-datastatistics-link-roundup-5122013-mothers-day/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<ol>
<li><span style="line-height: 16px;">A tutorial on <a href="http://deeplearning.net/tutorial/">deep learning</a>. I really enjoyed reading it, but I'm still trying to figure out how this is different from using non-linear logistic regression to estimate features and then doing supervised prediction with those features. Or maybe I'm just naive....</span></li>
<li>Rafa on <a href="http://www.80grados.net/la-importancia-de-la-autonomia-politica-para-las-ciencias/">political autonomy for science</a> for a blog in PR called <a href="http://www.80grados.net/">80 grados</a>. He writes about Rep. Lamar Smith and then focuses more closely on issues related to the University of Puerto Rico. A very nice read. (via Rafa)</li>
<li><a href="http://deadspin.com/infographic-is-your-states-highest-paid-employee-a-co-489635228">Highest paid employees by state</a>. I should have coached football...</li>
<li><a href="http://www.motherjones.com/kevin-drum/2013/05/groundbreaking-isaac-newton-invention-youve-never-heard">Newton took the mean.</a> It warms my empirical heart to hear about how the theoretical result was backed up by averaging (via David S.)</li>
<li>Reinhart and Rogoff <a href="http://www.cnbc.com/id/100721630">publish a correction but stand by their original claims</a>. I'm not sure whether this is a good or a bad thing. But it definitely is an overall win for reproducibility.</li>
<li>Statsy folks are getting some much-deserved attention. Terry Speed is a <a href="http://royalsociety.org/people/terence-speed/">Fellow of the Royal Society</a>, Peter Hall is a <a href="http://www.nasonline.org/news-and-multimedia/news/2013_04_30_NAS_Election.html">foreign associate of the NAS</a>, and Gareth Roberts is <a href="http://royalsociety.org/people/gareth-roberts/">also a Fellow of the Royal Society</a> (via Peter H.)</li>
<li><a href="http://www.nytimes.com/2013/05/06/business/media/solving-equation-of-a-hit-film-script-with-data.html?src=rechp&amp;_r=1&amp;">Statisticians go to the movies</a> and <a href="http://well.blogs.nytimes.com/2013/05/08/are-hot-hands-in-sports-for-real/">the hot hand analysis makes the NY Times</a> (via Dan S.)</li>
</ol>
<p><strong>Bonus Link!</strong> Karl B.'s GitHub <a href="http://www.statsblogs.com/2013/05/10/tutorials-on-gitgithub-and-gnu-make/">tutorial is awesome</a> and every statistician should be required to read it. I only ask why he gives all the love to Nacho's admittedly awesome <a href="https://github.com/nachocab/clickme">Clickme package</a> and no love to <a href="http://healthvis.org/">healthvis</a>. We are on <a href="https://github.com/hcorrada/healthvis">GitHub too</a>!</p>
]]></content:encoded>
			<wfw:commentRss>http://simplystatistics.org/2013/05/12/sunday-datastatistics-link-roundup-5122013-mothers-day/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>A Shiny web app to find out how much medical procedures cost in your state.</title>
		<link>http://simplystatistics.org/2013/05/08/a-shiny-web-app-to-find-out-how-much-medical-procedures-cost-in-your-state/</link>
		<comments>http://simplystatistics.org/2013/05/08/a-shiny-web-app-to-find-out-how-much-medical-procedures-cost-in-your-state/#comments</comments>
		<pubDate>Wed, 08 May 2013 21:09:08 +0000</pubDate>
		<dc:creator>Jeff Leek</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[CMS]]></category>
		<category><![CDATA[Huffingtonpost]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Rstudio]]></category>
		<category><![CDATA[shiny]]></category>

		<guid isPermaLink="false">http://simplystatistics.org/?p=1316</guid>
		<description><![CDATA[Today the front page of the Huffington Post featured the new data available from the CMS that shows the cost of many popular procedures broken down by hospital. We here at Simply Statistics think you should be able to explore &#8230; <a href="http://simplystatistics.org/2013/05/08/a-shiny-web-app-to-find-out-how-much-medical-procedures-cost-in-your-state/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Today the <a href="http://www.huffingtonpost.com/2013/05/08/hospital-prices-cost-differences_n_3232678.html">front page of the Huffington Post featured</a> the <a href="https://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Provider-Charge-Data/index.html">new data available from the CMS</a> that shows the cost of many popular procedures broken down by hospital. We here at Simply Statistics think you should be able to explore these data more easily. So we asked <a href="http://biostat.jhsph.edu/~jmuschel/">John Muschelli</a> to help us build a Shiny App that allows you to interact with these data. You can choose your state and your procedure and see how much the procedure costs at hospitals in your state. It takes a second to load because it is a lot of data....</p>
<p><a href="http://glimmer.rstudio.com/muschellij2/Shiny_Health_Data/">Here is the link to the app.</a></p>
<p>Here are some screenshots for intracranial hemorrhage for the US and for Idaho.</p>
<p><a href="http://simplystatistics.org/2013/05/08/a-shiny-web-app-to-find-out-how-much-medical-procedures-cost-in-your-state/screen-shot-2013-05-08-at-4-57-56-pm/" rel="attachment wp-att-1317"><img class="alignnone size-full wp-image-1317" alt="Screen Shot 2013-05-08 at 4.57.56 PM" src="http://simplystatistics.org/wp-content/uploads/2013/05/Screen-Shot-2013-05-08-at-4.57.56-PM.png" width="516" height="439" /></a><a href="http://simplystatistics.org/2013/05/08/a-shiny-web-app-to-find-out-how-much-medical-procedures-cost-in-your-state/screen-shot-2013-05-08-at-4-58-09-pm/" rel="attachment wp-att-1318"><img class="alignnone size-full wp-image-1318" alt="Screen Shot 2013-05-08 at 4.58.09 PM" src="http://simplystatistics.org/wp-content/uploads/2013/05/Screen-Shot-2013-05-08-at-4.58.09-PM.png" width="549" height="460" /></a></p>
<p><a href="https://github.com/muschellij2/Shiny_Health_Data">The R code is here</a> if you want to tweak/modify.</p>
]]></content:encoded>
			<wfw:commentRss>http://simplystatistics.org/2013/05/08/a-shiny-web-app-to-find-out-how-much-medical-procedures-cost-in-your-state/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Why the current over-pessimism about science is the perfect confirmation bias vehicle and we should proceed rationally</title>
		<link>http://simplystatistics.org/2013/05/06/why-the-current-over-pessimism-about-science-is-the-perfect-confirmation-bias-vehicle-and-we-should-proceed-rationally/</link>
		<comments>http://simplystatistics.org/2013/05/06/why-the-current-over-pessimism-about-science-is-the-perfect-confirmation-bias-vehicle-and-we-should-proceed-rationally/#comments</comments>
		<pubDate>Mon, 06 May 2013 18:30:41 +0000</pubDate>
		<dc:creator>Jeff Leek</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://simplystatistics.org/?p=1261</guid>
		<description><![CDATA[Recently there have been some high profile flameouts in scientific research. A couple examples include the Duke saga, the replication issues in social sciences, p-value hacking, fabricated data, not enough open-access publication, and on and on. Some of these results &#8230; <a href="http://simplystatistics.org/2013/05/06/why-the-current-over-pessimism-about-science-is-the-perfect-confirmation-bias-vehicle-and-we-should-proceed-rationally/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Recently there have been some high-profile flameouts in scientific research. A few examples include <a href="http://simplystatistics.org/2012/02/27/the-duke-saga-starter-set/">the Duke saga</a>, <a href="http://simplystatistics.org/2012/07/03/replication-and-validation-in-omics-studies-just-as/">the replication issues in social sciences</a>, <a href="http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1850704">p-value hacking</a>, <a href="http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2114571&amp;http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2114571">fabricated data</a>, <a href="http://www.michaeleisen.org/blog/?p=1312">not enough open-access publication</a>, and on and on.</p>
<p>Some of these results have had major non-scientific consequences, which is the reason they have drawn so much attention both inside and outside of the academic community. For example, the Duke saga <a href="http://www.nytimes.com/2011/07/08/health/research/08genes.html?_r=0">led to the end of in-progress clinical trials</a>, the lack of replication has led to high-profile arguments between scientists in <a href="http://blogs.discovermagazine.com/notrocketscience/?p=7765#.UYfhJitKnKo">Discover</a> and <a href="http://www.nature.com/news/replication-studies-bad-copy-1.10634">Nature</a> among other outlets, and the <a href="http://www.businessinsider.com/why-the-reinhart-rogoff-excel-debacle-could-be-devastating-for-the-austerity-movement-2013-4">whole of austerity is under question</a> (sometimes <a href="http://www.colbertnation.com/the-colbert-report-videos/425748/april-23-2013/austerity-s-spreadsheet-error">comically</a>) <a href="http://simplystatistics.org/2013/04/19/podcast-7-reinhart-rogoff-reproducibility/">because of a lack of reproducibility</a>.</p>
<p>The result of this high-profile attention is that there is a movement under way to "<a href="http://www.newyorker.com/online/blogs/newsdesk/2012/12/cleaning-up-science.html">clean up science</a>". As has been pointed out, there is a group of scientists who are making names for themselves primarily as critics of what is wrong with the scientific process. The good news is that these key players are calling attention to issues (reproducibility, replicability, and open access, among others) that are critically important for the scientific enterprise.</p>
<p>I too am concerned about these issues and have altered my own research process to try to address them for my own research group. I also think that the solutions others have proposed on a larger scale, like <a href="http://www.alltrials.net/">alltrials.net</a> or <a href="http://www.plos.org/">PLoS</a>, are great advances for the scientific community.</p>
<p>I am also very worried that people are using a few high-profile cases to hyperventilate about the real, solvable, and recognized problems in the scientific process. These people get credit and a lot of attention for pointing out how science is "failing". But they aren't giving proportional time to all of the incredible success stories we have had, both in performing research and in reforming research with reproducibility, open access, and replication initiatives.</p>
<p>We should recognize that science is hard and even dedicated, diligent, and honest scientists will make mistakes, perform irreproducible or irreplicable studies, or publish in closed-access journals. Sometimes this is because of ignorance of good research principles, sometimes it is because people are new to working in a world where data/computation are a major player, and sometimes it is because it is legitimately, really hard to make real advances in science. I think people who participate in real science recognize these problems and are eager to solve them. I also have noticed that real scientists generally try to propose a solution when they complain about these issues.</p>
<p>But it seems like sometimes people use these high-profile mistakes out of context to push their own scientific pet peeves. For example:</p>
<ol>
<li><strong>I don't like p-values and there are lots of results that fail to replicate so it must be the fault of p-values.</strong>  Many studies fail to replicate not because the researchers used p-values, but because they performed studies that were either weak or had poorly understood scientific mechanisms.</li>
<li><strong>I don't like not being able to access people's code so lack of reproducibility is causing science to fail. </strong>Even in the two most infamous cases (Potti and Reinhart-Rogoff) the problem with the science wasn't reproducibility - it was that the analysis was incorrect/flawed. The lack of reproducibility compounded the problem but wasn't the root cause of the problem.</li>
<li><strong>I don't like not being able to access scientific papers so closed-access journals are evil. </strong>For whatever reason (I don't know if I understand why) it is expensive to publish journals. That much is clear, because <a href="http://simplystatistics.org/2011/11/03/free-access-publishing-is-awesome-but-expensive-how/">publishing open access is expensive</a> and closed-access journals are expensive too. If I'm a junior researcher, I'll definitely post my preprints online, but I also want papers in "good" journals and don't have a ton of grant money, so sometimes I'll choose closed access.</li>
<li><strong>I don't like these crazy headlines from social psychology (substitute other field here) and there have been some that haven't replicated, so none must replicate. </strong>Of course some papers won't replicate, including even high profile papers. If you are doing statistics, then by definition some papers won't replicate since you have to make a decision on noisy data.</li>
</ol>
<p>These are just a few examples where I feel like a basic, fixable flaw in science has been used to justify a hugely pessimistic view of science in general. I'm not saying it is all rainbows and unicorns. Of course we want to improve the process. But I'm worried that the rational, reasonable problems we have, with enough hyperbole, will make it look like the scientific "sky is falling" and will leave the door open for individuals like Rep. Lamar Smith to come in and <a href="http://www.huffingtonpost.com/2013/04/30/lamar-smith-science-peer-review_n_3189107.html?utm_hp_ref=politics">turn the scientific process into a political one</a>.</p>
<p>P.S. <a href="http://andrewgelman.com/2013/05/06/against-optimism-about-social-science/#more-18943">Andrew Gelman</a> posted on a similar topic yesterday as well. He argues the case for less optimism and for making sure we don't stay complacent. He added a P.S. and mentioned two points on which we can agree: (1) science is hard and is a human system and we are working to fix the flaws inherent in such systems and (2) that it is still easier to publish a splashy claim than to publish a correction. I do definitely agree with both. I think Gelman would also likely agree that we need to be careful about <a href="http://simplystatistics.org/2013/04/30/reproducibility-and-reciprocity/">reciprocity</a> with these issues. If earnest scientists work hard to address reproducibility, replicability, open access, etc. then people who criticize them should have to work just as hard to justify their critiques. Just because it is a critique doesn't mean it should automatically get the same treatment as the original paper.</p>
]]></content:encoded>
			<wfw:commentRss>http://simplystatistics.org/2013/05/06/why-the-current-over-pessimism-about-science-is-the-perfect-confirmation-bias-vehicle-and-we-should-proceed-rationally/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Talking about MOOCs on MPT Direct Connection</title>
		<link>http://simplystatistics.org/2013/05/06/talking-about-moocs-on-mpt-direct-connection/</link>
		<comments>http://simplystatistics.org/2013/05/06/talking-about-moocs-on-mpt-direct-connection/#comments</comments>
		<pubDate>Mon, 06 May 2013 13:01:06 +0000</pubDate>
		<dc:creator>Roger Peng</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[MOOC]]></category>

		<guid isPermaLink="false">http://simplystatistics.org/?p=1257</guid>
		<description><![CDATA[Watch Monday, April 29, 2013 on PBS. See more from Direct Connection. I appeared on Maryland Public Television's Direct Connection with Jeff Salkin last Monday to talk about MOOCs (along with our Dean Mike Klag).]]></description>
				<content:encoded><![CDATA[<p><object width="512" height="328" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0" bgcolor="#000000"><param name="flashvars" value="video=http://video.mpt.tv/videoPlayerInfo/2365006588&amp;player=viral&amp;end=0" /><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="wmode" value="transparent" /><param name="src" value="http://dgjigvacl6ipj.cloudfront.net/media/swf/PBSPlayer.swf" /><param name="allowfullscreen" value="true" /><embed width="512" height="328" type="application/x-shockwave-flash" src="http://dgjigvacl6ipj.cloudfront.net/media/swf/PBSPlayer.swf" flashvars="video=http://video.mpt.tv/videoPlayerInfo/2365006588&amp;player=viral&amp;end=0" allowFullScreen="true" allowscriptaccess="always" wmode="transparent" allowfullscreen="true" bgcolor="#000000" /></object></p>
<p style="font-size: 11px; font-family: Arial, Helvetica, sans-serif; color: #808080; margin-top: 5px; background: transparent; text-align: center; width: 512px;">Watch <a style="text-decoration: none !important; font-weight: normal !important; height: 13px; color: #4eb2fe !important;" href="http://video.mpt.tv/video/2365006588" target="_blank">Monday, April 29, 2013</a> on PBS. See more from <a style="text-decoration: none !important; font-weight: normal !important; height: 13px; color: #4eb2fe !important;" href="http://www.mpt.org/dc" target="_blank">Direct Connection.</a></p>
<p>I appeared on Maryland Public Television's Direct Connection with Jeff Salkin last Monday to talk about MOOCs (along with our Dean Mike Klag).</p>
]]></content:encoded>
			<wfw:commentRss>http://simplystatistics.org/2013/05/06/talking-about-moocs-on-mpt-direct-connection/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Reproducibility at Nature</title>
		<link>http://simplystatistics.org/2013/05/02/reproducibility-at-nature/</link>
		<comments>http://simplystatistics.org/2013/05/02/reproducibility-at-nature/#comments</comments>
		<pubDate>Thu, 02 May 2013 21:22:32 +0000</pubDate>
		<dc:creator>Roger Peng</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://simplystatistics.org/?p=1255</guid>
		<description><![CDATA[Nature has jumped on to the reproducibility bandwagon and has announced a new approach to improving reproducibility of submitted papers. The new effort is focused primarily on methodology, including statistics, and on making sure that it is clear what an &#8230; <a href="http://simplystatistics.org/2013/05/02/reproducibility-at-nature/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Nature has jumped on to the reproducibility bandwagon and has <a href="http://www.nature.com/news/announcement-reducing-our-irreproducibility-1.12852">announced</a> a new approach to improving reproducibility of submitted papers. The new effort is focused primarily on methodology, including statistics, and on making sure that it is clear what an author has done.</p>
<blockquote><p>To ease the interpretation and improve the reliability of published results we will more systematically ensure that key methodological details are reported, and we will give more space to methods sections. We will examine statistics more closely and encourage authors to be transparent, for example by including their raw data.</p></blockquote>
<p>To this end they have created a <a href="http://www.nature.com/authors/policies/checklist.pdf">checklist</a> for highlighting key aspects that need to be clear in the manuscript. A number of these points are statistical, and two specifically highlight data deposition and computer code availability. I think an important change is the following:</p>
<blockquote><p>To allow authors to describe their experimental design and methods in as much detail as necessary, the participating journals, including <i>Nature</i>, will abolish space restrictions on the methods section.</p></blockquote>
<p>I think this is particularly important because of the message it sends. Most journals have overall space limitations and some journals even have specific limits on the Methods section. This sends a clear message that "methods aren't important, results are". Removing space limits on the Methods section will allow people to just say what they actually did, rather than figure out some tortured way to summarize years of work into a smattering of key words.</p>
<p>I think this is a great step forward by a leading journal. The next step will be for Nature to stick to it and make sure that authors live up to their end of the bargain.</p>
]]></content:encoded>
			<wfw:commentRss>http://simplystatistics.org/2013/05/02/reproducibility-at-nature/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Reproducibility and reciprocity</title>
		<link>http://simplystatistics.org/2013/04/30/reproducibility-and-reciprocity/</link>
		<comments>http://simplystatistics.org/2013/04/30/reproducibility-and-reciprocity/#comments</comments>
		<pubDate>Tue, 30 Apr 2013 13:58:47 +0000</pubDate>
		<dc:creator>Roger Peng</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://simplystatistics.org/?p=1246</guid>
		<description><![CDATA[One element of the entire discussion about reproducible research that I haven't seen talked about very much is the potential for the lack of reciprocity. I think even if scientists were not concerned about the possibility of getting scooped by &#8230; <a href="http://simplystatistics.org/2013/04/30/reproducibility-and-reciprocity/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>One element of the entire discussion about reproducible research that I haven't seen talked about very much is the potential for the lack of reciprocity. I think even if scientists were not concerned about the possibility of getting scooped by others by making their data/code available, this issue would be sufficient to give people pause about making their work reproducible.</p>
<p>What do I mean by reciprocity? Consider the following (made up) scenario:</p>
<ol>
<li>I conduct a study (say, a randomized controlled trial, for concreteness) that I register at clinicaltrials.gov beforehand and specify details about the study like the design, purpose, and primary and secondary outcomes.</li>
<li>I rigorously conduct the study, ensuring safety and privacy of subjects, collect the data, and analyze the data.</li>
<li>I publish the results for the primary and secondary outcomes in the peer-reviewed literature where I describe how the study was conducted and the statistical methods that were used. For the sake of concreteness, let's say the results were "significant" by whatever definition of significant you care to use and that the paper was highly influential.</li>
<li>Along with publishing the paper I make the analytic dataset and computer code available so that others can look at what I did and, if they want, reproduce the result.</li>
</ol>
<p>So far so good, right? It seems this would be a great result for any study. Now consider the following possible scenarios:</p>
<ol>
<li>Someone obtains the data and the code from the web site where it is hosted, analyzes it, and then publishes a note claiming that the intervention negatively affected a different outcome not described in the original study (i.e. not one of the primary or secondary outcomes).</li>
<li>A second person obtains the data, analyzes it, and then publishes a note on the web claiming that the intervention was ineffective for the primary outcome in the subset of participants that were male.</li>
<li>A third person obtains the data, analyzes the data, and then publishes a note on the web saying that the study is flawed and that the original results of the paper are incorrect. No code, data, or details of their methods are given.</li>
</ol>
<p>Now, how should one react to the follow-up note claiming the study was flawed? It's easy to imagine a spectrum of possible responses ranging from accusations of fraud to staunch defenses of the original study. Because the original study was influential, there is likely to be a kerfuffle either way.</p>
<p>But what's the problem with the three follow-up scenarios described? The one thing that they have in common is that none of the three responding people were subjected to the same standards to which the original investigator (me) was subjected. I was required to register my trial and state the outcomes in advance. In an ideal world you might argue I should have stated my hypotheses in advance too. That's fine, but the point is that the people analyzing the data subsequently were not required to do any of this. Why should they be held to a lower standard of scrutiny?</p>
<p>The first person analyzed a different outcome that was not a primary or secondary outcome. How many outcomes did they test before they came to that one negatively significant one? The second person examined a subset of the participants. Was the study designed (or powered) to look at this subset? Probably not. The third person claims the study is flawed, but does not provide any details of what they did.</p>
<p>I think it's easy to take care of the third person--just require that they make their work reproducible too. That way we can all see what they did and verify that the study was in fact flawed. But the first two people are a little more difficult. If there are no barriers to obtaining the data, then they can just get the data and run a bunch of analyses. If the results don't go their way, they can just move on and no one would be the wiser. If the results do go their way, they can try to publish something.</p>
<p>What I think a good reproducibility policy should have is a type of "viral" clause. For example, the GNU General Public License (GPL) is an open source software license that requires, among other things, that anyone who writes their own software, but links to or integrates software covered under the GPL, must publish their software under the GPL too. This "viral" requirement ensures that people cannot make use of the efforts of the open source community without also giving back to that community. There have been numerous heated discussions in the software community regarding the pros and cons of such a clause, with (large) commercial software developers often coming down against it. Open source developers have largely been skeptical of the arguments of large commercial developers, claiming that those companies simply want to "steal" open source software and/or maintain their dominance.</p>
<p>I think it is important that, if we are going to make reproducibility the norm in science, we have analogous "viral" clauses to ensure that everyone is held to the same standard. This is particularly important in policy-relevant or in politically sensitive subject areas where there are often parties involved who have essentially no interest (and are in fact paid to have no interest) in holding themselves to the same standard of scientific conduct.</p>
<p>Richard Stallman was right to assume that without the <a href="http://en.wikipedia.org/wiki/Copyleft">copyleft clause</a> in the GPL, large commercial interests would simply usurp the work of the free software community and essentially crush it before it got started. Reproducibility needs its own version of copyleft or else scientists will be left to defend themselves against unscrupulous individuals who are not held to the same standard.</p>
]]></content:encoded>
			<wfw:commentRss>http://simplystatistics.org/2013/04/30/reproducibility-and-reciprocity/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Sunday data/statistics link roundup (4/28/2013)</title>
		<link>http://simplystatistics.org/2013/04/28/sunday-datastatistics-link-roundup-4282013/</link>
		<comments>http://simplystatistics.org/2013/04/28/sunday-datastatistics-link-roundup-4282013/#comments</comments>
		<pubDate>Mon, 29 Apr 2013 02:31:21 +0000</pubDate>
		<dc:creator>Jeff Leek</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://simplystatistics.org/?p=1242</guid>
		<description><![CDATA[What it feels like to be bad at math. My personal experience like this culminated in some difficulties with Green's functions back in my early days at USU. I think almost everybody who does enough math eventually runs into a &#8230; <a href="http://simplystatistics.org/2013/04/28/sunday-datastatistics-link-roundup-4282013/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<ol>
<li><a href="http://mathwithbaddrawings.com/2013/04/25/were-all-bad-at-math-1-i-feel-stupid-too/">What it feels like to be bad at math</a>. My personal experience like this culminated in some difficulties <a href="http://en.wikipedia.org/wiki/Green's_function">with Green's functions</a> back in my early days at USU. I think almost everybody who does enough math eventually runs into a situation where they don't understand what is going on and it stresses them out.</li>
<li><a href="http://www.nytimes.com/2013/04/28/technology/how-big-data-is-playing-recruiter-for-specialized-workers.html?_r=0">An article</a> about companies that are using data to try to identify people for jobs (via Rafa).</li>
<li><a href="http://www.forbes.com/sites/davidleinweber/2013/04/26/big-data-gets-bigger-now-google-trends-can-predict-the-market/">Google trends for predicting the market</a>. I'm not sure that "predicting" is the right word here. I think a better word might be "explaining/associating". I also wonder if <a href="http://www.nature.com/news/when-google-got-flu-wrong-1.12413">this could go off the rails</a>.</li>
<li>This article <a href="http://www.r-bloggers.com/faster-higher-stonger-a-guide-to-speeding-up-r-code-for-busy-people/?utm_source=feedly&amp;utm_medium=feed&amp;utm_campaign=Feed:+RBloggers+(R+bloggers)">is ridiculously useful</a> in terms of describing the ways that you can speed up R code. My favorite part of it is that it starts with the "why". Exactly. <a href="http://en.wikiquote.org/wiki/Donald_Knuth">Premature optimization is the root of all evil</a>.</li>
<li><a href="http://blog.mortardata.com/post/47549853491/data-science-at-tumblr">A discussion of data science at Tumblr</a>. The author/speaker <a href="http://www.adamlaiacano.com/">also has a great blog</a>.</li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://simplystatistics.org/2013/04/28/sunday-datastatistics-link-roundup-4282013/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>
