Data integration and mining

Bioinformatics analysis data are pre-computed so that results are readily available when needed. This applies to proteins investigated in the high-throughput genomics assays as well as to clones used in cDNA microarray experiments. External, publicly available databases provide additional data on the genes and proteins investigated. Integrating these diverse data sources is a prerequisite for efficient mining of information.

Computational analysis

  • cDNAs that enter the cloning pipeline automatically undergo a bioinformatic analysis of their encoded amino acid sequence (del Val et al., 2004; del Val et al., 2006). This comprises, e.g., similarity searches against major protein databases, determination of the protein domain architecture, and secondary structure predictions.
  • A system has been set up to annotate ~48,000 IMAGE clones, covering most entries from UniGene. The annotation includes, e.g., the assignment of several IDs, gene symbols and names, GO annotation, genome position, etc.
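The clone annotation step can be pictured as a series of lookups that attach identifiers, GO terms and a genome position to each clone. The following is only a minimal sketch under stated assumptions: the class `CloneAnnotation`, the function `annotate_clone`, the three lookup tables and all identifiers are hypothetical placeholders, not the names or data used in the actual system.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CloneAnnotation:
    """Annotation record for one clone (hypothetical field layout)."""
    clone_id: str
    gene_symbol: Optional[str] = None
    gene_name: Optional[str] = None
    go_terms: List[str] = field(default_factory=list)
    genome_position: Optional[str] = None


def annotate_clone(
    clone_id: str,
    clone_to_gene: Dict[str, str],
    gene_info: Dict[str, Dict[str, str]],
    gene_to_go: Dict[str, List[str]],
) -> CloneAnnotation:
    """Collect the available annotation for a clone from lookup tables."""
    gene_id = clone_to_gene.get(clone_id)
    if gene_id is None:
        # Clone is not covered by the mapping: return an empty record.
        return CloneAnnotation(clone_id)
    info = gene_info.get(gene_id, {})
    return CloneAnnotation(
        clone_id=clone_id,
        gene_symbol=info.get("symbol"),
        gene_name=info.get("name"),
        go_terms=list(gene_to_go.get(gene_id, [])),
        genome_position=info.get("position"),
    )


# Placeholder data, invented purely for illustration.
clone_to_gene = {"CLONE_0001": "GENE_A"}
gene_info = {
    "GENE_A": {
        "symbol": "SYM_A",
        "name": "example gene A",
        "position": "chrN:100-200",
    }
}
gene_to_go = {"GENE_A": ["GO:EXAMPLE1", "GO:EXAMPLE2"]}

annotated = annotate_clone("CLONE_0001", clone_to_gene, gene_info, gene_to_go)
unmapped = annotate_clone("CLONE_9999", clone_to_gene, gene_info, gene_to_go)
```

Clones without a gene mapping keep an empty record rather than being dropped, so coverage gaps stay visible in the annotation table.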

External resources

Data from several sources are integrated and stored on a central database server, facilitating data mining and analysis. Green: experimental data, orange: data from bioinformatics protein analysis, yellow: data from external publicly available databases.
© dkfz.de

We load the content of several external databases using specialized applications and run updates when new datasets become available. The accessed external sources comprise:


  • Gene and cDNA information (NCBI UniGene and Entrez Gene)
  • Protein data (Swiss-Prot, IPI)
  • Ontologies (GO, eVOC)
  • Protein interactions (we are currently developing a meta-database that covers data from six major protein interaction databases)
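One way to think about a meta-database for protein interactions is as a merge of interaction pairs from several sources, in which each pair is stored once together with the list of databases that report it. The sketch below illustrates this idea only; the function `merge_interactions`, the source names and the protein identifiers are hypothetical, and the actual meta-database may be organized quite differently.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]


def merge_interactions(
    sources: Dict[str, Iterable[Pair]]
) -> Dict[Pair, Set[str]]:
    """Merge interaction pairs from several sources.

    Interactions are treated as undirected, so (A, B) and (B, A) are
    stored under the same sorted key. The value is the set of source
    names that report the interaction.
    """
    merged: Dict[Pair, Set[str]] = defaultdict(set)
    for source_name, pairs in sources.items():
        for protein_a, protein_b in pairs:
            first, second = sorted((protein_a, protein_b))
            merged[(first, second)].add(source_name)
    return dict(merged)


# Placeholder sources and identifiers, invented for illustration.
sources = {
    "source_db_1": [("P1", "P2")],
    "source_db_2": [("P2", "P1"), ("P3", "P1")],
}
merged = merge_interactions(sources)
```

Keeping the reporting sources per pair makes it possible to filter later for interactions supported by more than one database.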

Computational and external data are stored in dedicated databases on the central database server, where they are integrated with functional assay data and selected results from the expression profiling experiments.
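The integration described here can be illustrated with a small relational example in which assay results are joined to external gene information and to computed protein features through a shared gene identifier. This is a minimal sketch using an in-memory SQLite database; the table names, column names and values are assumptions made for illustration and do not reflect the actual schema on the central database server.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with three hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE external_gene (gene_id TEXT PRIMARY KEY, symbol TEXT);
        CREATE TABLE protein_analysis (gene_id TEXT, domain TEXT);
        CREATE TABLE assay_result (gene_id TEXT, assay TEXT, score REAL);
        """
    )
    # Placeholder rows, invented for illustration.
    conn.executemany(
        "INSERT INTO external_gene VALUES (?, ?)",
        [("G1", "SYM1"), ("G2", "SYM2")],
    )
    conn.executemany(
        "INSERT INTO protein_analysis VALUES (?, ?)",
        [("G1", "domain_x")],
    )
    conn.executemany(
        "INSERT INTO assay_result VALUES (?, ?, ?)",
        [("G1", "assay_a", 0.9), ("G2", "assay_a", 0.1)],
    )
    return conn


def integrated_view(conn: sqlite3.Connection) -> list:
    """Join assay results with external and computed annotation."""
    return conn.execute(
        """
        SELECT a.gene_id, g.symbol, p.domain, a.assay, a.score
        FROM assay_result AS a
        LEFT JOIN external_gene AS g ON g.gene_id = a.gene_id
        LEFT JOIN protein_analysis AS p ON p.gene_id = a.gene_id
        ORDER BY a.gene_id
        """
    ).fetchall()


conn = build_demo_db()
rows = integrated_view(conn)
```

The left joins keep every assay result even when one of the annotation sources has no entry for the gene, so missing annotation shows up as an empty field instead of a missing row.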
