Addressing the problems with life-science databases for traditional uses and systems biology

Nature


ABSTRACT

A prerequisite to systems biology is the integration of heterogeneous experimental data, which are stored in numerous life-science databases. However, a wide range of obstacles that relate to access, handling and integration impede the efficient use of the contents of these databases. Addressing these issues will be essential not only for progress in systems biology but also for sustaining the more traditional uses of life-science databases.


ACKNOWLEDGEMENTS

The authors would like to thank C. Rawlings and P. Verrier for commenting on an earlier version of this article. Furthermore, we would like to thank the following individuals for exploring with us the pitfalls of life-science databases over the past years: J. Baumbach, J. Butz, E. Kirchem, F. Klingert, S. Knop, B. Kormeier, I. Kupp, A. Neu, A. Rüegg, A. Skusa, B. Steuernagel, J. Taubert, P. Verrier and R. Winnenburg. S.P. gratefully acknowledges funding by the European Science Foundation. Rothamsted Research receives grant-aided support from the UK Biotechnology and Biological Sciences Research Council.

AUTHOR INFORMATION

AUTHORS AND AFFILIATIONS

* Stephan Philippi is at the Department of Computer Science, University of Koblenz, PO Box 201602, Koblenz, 56016, Germany.
* Jacob Köhler is at the Biomathematics and Bioinformatics Division, Rothamsted Research, Harpenden, AL5 2JQ, Hertfordshire, UK.

CORRESPONDING AUTHOR

Correspondence to Stephan Philippi.

ETHICS DECLARATIONS

COMPETING INTERESTS

The authors declare no competing financial interests.

RELATED LINKS

FURTHER INFORMATION

* BioJava
* BioPAX (Biological Pathways Exchange)
* BioPerl
* BioRuby
* CellML
* DiscoveryLink
* DNA Data Bank of Japan
* EC (enzyme class) numbers of the enzyme nomenclature
* Ensembl Trace Server
* European Bioinformatics Institute SRS server
* European Bioinformatics Institute
* Extensible Markup Language (XML)
* Gene Ontology homepage
* Kyoto Encyclopedia of Genes and Genomes
* Microarray Gene Expression Data Society
* MySQL
* NCBI taxonomy
* Nucleic Acids Research Database Categories List
* ONDEX
* Open Biomedical Ontologies
* Open Source Initiative License Index
* PostgreSQL
* Proteomics Standards Initiative: molecular interaction
* Systems biology markup language
* Universal Protein Resource
* Web Services Activity

GLOSSARY

* Controlled vocabulary: A standardized set of terms that can be used in a given application domain. A prominent example is the enzyme class nomenclature, which describes classes of biochemical reaction.
* Database management system: A system that provides a means of storing, modifying and extracting data from a database.
* Evidence code: A controlled vocabulary that is used to track the types of evidence that support a gene annotation.
* Flat file: A human-readable, non-standardized file that can be used to exchange the contents of life-science databases.
* Ontology: A commonly agreed definition of real-world concepts, such as 'protein' and 'enzyme', and their particular relationships; for example, an enzyme 'is a' protein.
* Parser: Software that reads a given input, such as a flat file, for further processing.
* Web service: A standardized way to allow for interoperable machine-to-machine interaction over a network.
* XML: The Extensible Markup Language (XML) is a standard for creating application-specific, self-descriptive markup languages, which can be used, for example, to define data-exchange formats.

ABOUT THIS ARTICLE

CITE THIS ARTICLE

Philippi, S., Köhler, J. Addressing the problems with life-science databases for traditional uses and systems biology. _Nat Rev Genet_ 7, 482–488 (2006). https://doi.org/10.1038/nrg1872

* Published: 09 May 2006
* Issue Date: 01 June 2006
* DOI: https://doi.org/10.1038/nrg1872
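The glossary entries for a flat file, a parser and XML together describe a routine integration step: reading a line-oriented flat-file record and re-emitting it in a self-descriptive exchange format. The following is a minimal sketch of that step, not a method taken from the article. It assumes a simplified, hypothetical record layout loosely modelled on EMBL/UniProt-style flat files (a two-letter line code such as `ID`, `DE` or `OS`, then the value, with `//` ending each record); it does not reproduce any real database's specification. The identifiers `EX0001`/`EX0002`, the function names and the XML element names (`entries`, `entry`, `description`, `organism`) are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified flat-file content. Each line is a two-letter
# code followed by a value; "//" terminates a record. This layout is an
# illustrative assumption, not a real database format specification.
SAMPLE = """\
ID   EX0001
DE   Hypothetical enzyme A
DE   (fragment)
OS   Escherichia coli
//
ID   EX0002
DE   Hypothetical transporter B
OS   Homo sapiens
//
"""


def parse_flat_file(text):
    """Parse records into dicts mapping each line code to a list of values."""
    records, current = [], {}
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line:
            continue  # ignore blank lines
        if line == "//":  # end-of-record marker
            if current:
                records.append(current)
            current = {}
            continue
        code, value = line[:2], line[2:].strip()
        # Repeated codes (continuation lines) are kept in order.
        current.setdefault(code, []).append(value)
    if current:  # tolerate a final record that lacks a terminator
        records.append(current)
    return records


def records_to_xml(records):
    """Serialize parsed records as a small, self-descriptive XML document."""
    root = ET.Element("entries")
    for rec in records:
        entry_id = rec.get("ID", ["unknown"])[0]
        entry = ET.SubElement(root, "entry", id=entry_id)
        # Continuation lines are joined with single spaces on output.
        ET.SubElement(entry, "description").text = " ".join(rec.get("DE", []))
        ET.SubElement(entry, "organism").text = " ".join(rec.get("OS", []))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(records_to_xml(parse_flat_file(SAMPLE)))
```

Running the script prints a single-line XML document with one `entry` element per record. The parser keeps every value of a repeated line code in a list and joins them only at serialization time, so continuation lines survive parsing without committing to a particular downstream schema.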
