[Insight-developers] Fwd: Reproducibility: When the Reproduction Fails!

Luis Ibanez luis.ibanez at kitware.com
Sun Mar 27 06:58:58 EDT 2011


---------- Forwarded message ----------
From:   Kitware Blog at http://www.kitware.com/blog/home/post/105

Reproducibility: When the Reproduction Fails! by Luis Ibanez (Open
Source, Data Publishing)

Just try to repeat it...!

In a recent blog post:

http://blog.stodden.net/2011/03/19/a-case-study-in-the-need-for-open-data-and-code/

Victoria Stodden elaborates on a clear example of why reproducibility
must be a practical requirement for scientific publications, and of how
open data and open code are essential to make reproducibility verification
a practical reality.


This is the case of a team at MD Anderson, led by Keith Baggerly, that
set out to replicate the results of a set of papers published
by a research group at Duke University. To their surprise, the
Baggerly team was unable to replicate the results reported in the
original papers. Because most of the data and software were publicly
available, they were able to perform "forensics" on the papers and
discovered that they were plagued with poor practices, inconsistent
management of data, and inexplicable results.


This forensics work led to the termination of clinical trials at Duke
last November and the resignation of Anil Potti.


Among the interesting aspects of this case is the fact that the
forensics team had to use the Freedom of Information Act (FOIA) to
obtain an investigation report that Duke had sent to the National
Cancer Institute. Duke refused to share that report with the forensics
team, arguing that it was "confidential" information.
However, because NCI is a federal agency, it is subject to the
Freedom of Information Act.

________________________________
"The Importance of Reproducible Research in High Throughput Biology:
  Case Studies in Forensic Bioinformatics"
    Keith A. Baggerly

A full video of the presentation is available here:

http://videolectures.net/cancerbioinformatics2010_baggerly_irrh/

This lecture is mandatory material for anyone who honestly cares about
scientific research.


As good practitioners, Baggerly's team made ALL THE MATERIALS needed
to replicate their work publicly available online. You can find them here:

http://bioinformatics.mdanderson.org/Supplements/ReproRsch-Chemo/

A good example of a group that practices what it preaches and teaches by example.

Reports on bioinformatics work at MD Anderson are now required to be
written in Sweave (a combination of R and LaTeX) to ensure that the
reports are REPRODUCIBLE from the original data to the final results.


Memorable quotes from this talk:

"We wrote a paper on this [the fact that the results of the original
experiment couldn't be reproduced]. We first circulated it to a
biological journal, and we got back the comment: 'We are sorry,
this story seems TOO NEGATIVE, can you FIX that?'"

"Duke administrators accomplished something monumental:
they triggered a public expression of outrage from biostatisticians."

"The most common mistakes are simple ones":

    Off-by-one errors in a data table or in software
    Mixing up the sample labels
    Mixing up the gene labels...

The fun thing about simple mistakes is that if you see them, they are
easy to fix. But if the documentation (the paper's report) is poor, you
will not see them.

We suspect that simple mistakes are more common than we would like to admit.

Please "label the columns" and "provide code".

When I'm reviewing papers, I look and see:

    Do they tell me where the data is?
    Do they give me code?
    Do they have URLs, and are the links live?

"Reconstructing this takes a lot of time. We estimate that it took
us between 1500 and 2000 hours to figure out how all this worked" (the
effort of replicating a published paper, given the lack of detail about
the processes used).

"At some stage, not only us, but also others should be able to
precisely reproduce what we did. In other words: if they start with
the same numbers, they should get the same results."
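The simple mistakes listed in the talk (an off-by-one shift in a data table, mixed-up sample labels) are easy to make and easy to miss. The following is a minimal Python sketch, not taken from Baggerly's materials, contrasting two ways of pairing measurements with clinical labels: by row position, and by an explicit sample ID ("label the columns"). The function names, sample IDs, and values are hypothetical and invented for illustration.

```python
# Minimal sketch with hypothetical data: expression values and response
# labels for four made-up samples. None of this comes from the Duke or
# MD Anderson data sets.


def align_by_position(values, labels):
    """Pair values with labels purely by row order.

    zip() silently stops at the shorter list, so a table that lost one
    row yields shifted (off-by-one) pairs and raises no error.
    """
    return list(zip(values, labels))


def align_by_id(values_by_id, labels_by_id):
    """Pair values with labels by sample ID, failing loudly on any mismatch."""
    only_one_side = set(values_by_id) ^ set(labels_by_id)
    if only_one_side:
        raise ValueError(
            f"sample IDs missing from one table: {sorted(only_one_side)}"
        )
    return {
        sid: (values_by_id[sid], labels_by_id[sid])
        for sid in sorted(values_by_id)
    }


if __name__ == "__main__":
    ids = ["S1", "S2", "S3", "S4"]
    values = [2.1, 0.4, 1.8, 0.3]
    true_labels = ["responder", "non-responder", "responder", "non-responder"]

    # Simulate a parsing bug that drops the first data row of the label table.
    shifted_labels = true_labels[1:]

    # Positional pairing: no error, but every surviving pair is mislabeled.
    print(align_by_position(values, shifted_labels))

    # ID-based pairing: the same dropped row is reported, not absorbed.
    labels_by_id = dict(list(zip(ids, true_labels))[1:])
    values_by_id = dict(zip(ids, values))
    try:
        align_by_id(values_by_id, labels_by_id)
    except ValueError as err:
        print("caught:", err)
```

Run as a script, the positional version pairs each of the three surviving values with the wrong label and reports nothing, while the ID-based version stops with an error naming `S1` as missing from the label table. The design choice is simply to make a mismatch an error instead of something the code quietly works around.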

________________________________
Useful Links

Google group on Reproducible Research
http://groups.google.com/group/reproducible-research
http://reproducibleresearch.net/index.php/Main_Page

________________________________
http://www.nature.com/nm/journal/v17/n1/full/nm0111-135.html

"We wish to retract this article because we have been unable to
reproduce certain crucial experiments showing validation of signatures
for predicting response to chemotherapies, including docetaxel and
topotecan. Although we believe that the underlying approach to
developing predictive signatures is valid, a corruption of several
validation data sets precludes conclusions regarding these signatures.
As these results are fundamental to the conclusions of the paper, we
formally retract the paper. We deeply regret the impact of this action
on the work of other investigators."

http://www.nature.com/nm/journal/v12/n11/abs/nm1491.html

________________________________

