Science and Celera

From: George Grills (grills@aecom.yu.edu)
Date: Sat Dec 16 2000 - 00:39:50 EST


>To: mol-evol@net.bio.net
>Newsgroups: bionet.molbio.evolution
>Date: 14 Dec 2000 15:57:12 -0000
>From: James McInerney <james.o.mcinerney@may.ie>
>Subject: Science and Celera
>Sender: owner-mol-evol@hgmp.mrc.ac.uk
>
>Dear all,
>
>I have had this message passed to me. I hope the authors don't mind me
>passing it on to you.
>
>James
>
>--
>Dr. James O. McInerney,
>Department of Biology,
>National University of Ireland,
>Maynooth,
>Co. Kildare,
>Ireland.
>+353 1 708 3860
>+353 1 708 3845
>http://www.may.ie/academic/biology/jmbioinformatics.shtml
>http://www.bioinf.org/
>
>
>
>====snip=====
>
>Dear fellow bioinformatics developers:
>
>By now you have probably heard that Celera Genomics has submitted
>their human genome paper to the journal Science. Science and Celera
>have agreed to special terms for the release of the human genome
>sequence data. It will be made available through the Celera website,
>and will not be submitted to the international DNA database consortium
>(GenBank, EMBL and DDBJ). Science's statement regarding the agreement
>is at:
>
>http://www.sciencemag.org/feature/data/announcement/genomesequenceplan.shl
>
>All major journals, including Science, have a policy of deposition of
>sequence data with the "appropriate data bank". The accepted community
>standard is submission to GenBank/EMBL/DDBJ. The reason for this
>deposition is to make the results of the work openly available for
>future research. This principle was specifically mentioned in the
>Clinton/Blair statement on human genome sequencing -
> http://www.usinfo.state.gov/topical/global/biotech/00031401.htm
>- who strongly upheld the view that "unencumbered access" to genome
>data was critical.
>
>The terms of the Celera/Science agreement will give us access to the
>genome sequence, but not unencumbered access. Celera is suggesting
>publishing their data under an MTA (Material Transfer Agreement) which
>would prevent large scale downloads and incorporation of this data
>into GenBank/EMBL/DDBJ. In order to download the data, you and your
>institution will have to sign a contract guaranteeing that you will
>not "redistribute" the Celera data.
>
>Science believes that the deal is an adequate compromise because it
>provides us the right to download the data and publish our results.
>We believe Science is thinking in terms of single gene biology, not
>large scale bioinformatics. It is probably not hard for you to imagine
>scenarios in bioinformatics in which "publication" and
>"redistribution" are virtually the same thing; we cannot imagine
>Celera allowing us to incorporate data into Pfam, for example,
>or into Ensembl.
>
>We are asking for your support in writing to Science to politely
>insist that genome sequence papers should be accompanied by
>unencumbered deposition to GenBank/EMBL/DDBJ. Please note that we have
>no issue with Celera either keeping this data unpublished for
>commercial reasons or combining their data with freely
>available data from the public genome projects. We would defend their
>right to do either. Our view is simply that the genome community has
>established a clear principle that published genome data must be
>deposited in the international databases, that bioinformatics is
>fueled by this principle, and that Science therefore threatens to set
>a precedent that undermines our research.
>
>We encourage you to express your views on this matter to Donald
>Kennedy (kennedyd@kennedyd.pobox.stanford.edu), the Editor-in-Chief of
>Science, and/or to Barbara Jasny (bjasny@aaas.org), the managing
>editor in charge of genomics papers at Science.
>
>Here is a Q/A about some points.
>
>* Why does this matter?
>
>A classic example of how our field began to have an impact on
>molecular biology was Russ Doolittle's discovery of a significant
>sequence similarity between a viral oncogene and a cellular growth
>factor receptor. Russ could not have found that result if he did not
>have an aggregate database of previously published sequences. We have
>come a long way from Russ and his son typing data into the NEWAT
>protein sequence database by hand.
>
>Throughout the 80's the international database community fought hard
>to insist that DNA sequence data be deposited into the public domain
>databases. Journals now generally require deposition as a condition of
>accepting a paper. The forming of these databases and the
>international agreements on data sharing between the European,
>American and Japanese databases fostered the rapid development of
>bioinformatics research. We now all take for granted the fact that
>large DNA databases are accessible from a single point of contact, and
>the identifiers are coordinated worldwide.
>
>Bioinformatics research relies on open data with minimal legal
>encumbrances submitted to public databases. Without these databases
>there is no real substrate for bioinformatics research.
>
>* What would happen if this precedent were set?
>
>There are a number of consequences if Science set a precedent that
>allowed people to publish DNA data under a variety of MTAs.
>
>- One would not be able to form a single DNA database on which to
> do bioinformatics research, and the derivative databases (Swissprot,
> PIR, Pfam, PROSITE, etc.) would not be legal.
>
>- Bench biologists would have to visit a number of websites and
> possibly enter into a number of different contracts for access to DNA
> data. Unexpected informative homologies could become prohibitively
> difficult to find.
>
>- You may need to get a legal review before you can publish
> the results of an analysis, if your analysis is large-scale and
> detailed enough that it could be reasonably interpreted as a
> "redistribution" of the primary sequence data. You could
> be sued for breach of contract for a Web Supplement page
> that discloses extensive sequence data supporting your results.
>
>- Scientific openness will be undermined. Efforts to engage the
> community in cooperative annotation of large genomes, for instance,
> would be blocked -- we can't usefully annotate a genome we can't
> freely redistribute.
>
>* Celera paid for it. Can't they set their own access terms?
>
>Absolutely. We have no issue with Celera's commercial data gathering,
>and their right to set their own access terms to their data. We do
>feel, though, that scientific publications carry a certain ethical
>responsibility. The purpose of a paper is to enable the community to
>efficiently build on your work. There is always a tension between
>disclosing your work to your competitors (this is not unique to
>private companies!) and receiving scientific credit for your work via
>publication. This tension is natural, and maintaining a consistent
>and acceptable balance is the reason that scientists and journals
>establish community standards that dictate how data are required to be
>disclosed. In this case, the clearly accepted community standard is
>that DNA sequence data are deposited in GenBank/EMBL/DDBJ upon
>publication.
>
>We certainly do not blame Celera (much) for seeking a special deal
>that lets them have their cake and eat it too -- they would
>understandably like scientific credit for their terrific and important
>work in human sequencing, and they would also like a profitable
>business model.
>
>We do blame Science for failing to take a strong stand in upholding
>accepted scientific publication practices. We cannot accept that it is
>necessary to sacrifice ethics for expediency.
>
>* Science claims they are honouring their own policy. What gives?
>
>Science now claims that all their policy really requires is that
>archival data be available via a publicly accessible database. We
>think this is a conveniently revisionist view of their own policy,
>which states (in Instructions to Authors):
>
>"archival data sets (such as sequence and structural data) must be
>deposited with the appropriate data bank and the identifier code
>should be sent to Science for inclusion in the published manuscript
>(coordinates must be released at the time of publication)"
>
>Notice the use of the definite article "THE appropriate data bank",
>the notion of "deposition", and the additional rider that the
>identifier code should be sent.
>
>The spirit of this statement seems clear to us. Science's statement
>anticipates that there is an appropriate, single, aggregate community
>database for each sort of archival data, whether DNA sequence, protein
>structure coordinates, or something else. Sensibly, they don't name
>every possible database for every possible archival data set. They
>expect that recognized community standards exist. In no way does
>Science's statement seem consistent with the view that an individual
>lab could start its own "public" DNA sequence database and send a
>meaningless internal database identifier; to try to read it that way
>is a post hoc rationalisation.
>
>* What can Science do? This is a done deal.
>
>It's true that this is a done deal. Science and Celera have mutually
>agreed to the general terms of data release. But there are two ways
>that we can minimize the damage.
>
>First, the details of the agreement are not set. In particular, there
>is no definition of allowed "publication" versus prohibited
>"redistribution". Science could specify definitions that did not
>interfere with noncommercial uses of the data in bioinformatics,
>allowing us redistribution rights if it made sense in the context of
>our project (for example, a genome annotation project like Ensembl).
>
>Second, and preferably, Science -- or even the peer reviewers -- can
>uphold Science's own data access policy, and reject the paper.
>
>Incidentally, they might also choose to enforce Science's policy on
>prior publication, which states "...the main findings of a paper
>should not have been reported in the mass media. Authors are, however,
>permitted to present their data at open meetings but should not
>overtly seek media attention." If I issued a press release upon
>submission of a manuscript to Science, like Celera did, Science would
>rightly fire it back to me without review.
>
>* What can I do?
>
>Agitate. Let Science know that you care. They consider this deal to be
>a trial balloon for future genome papers. Even if we can't change the
>deal with Celera, we can try to make sure it's a one-time-only deal
>that's viewed as a Big Mistake. Write a letter to Science and tell
>them how their actions would impact your research, both in the long
>term and in the short term. Also, you can pass on this open letter to
>other bioinformatics researchers you know.
>
>Dr Sean Eddy,
>Alvin Goldfarb Professor of Computational Biology,
>Howard Hughes Medical Institute, Washington University in St. Louis,
>USA
>
>Dr Ewan Birney
>Team Leader, Genomic Annotation
>European Bioinformatics Institute, UK
>
>


