ORFeome, RNAi Tools & more

The International ORFeome Collaboration

The long-term goal of the human genome project is to establish a comprehensive gene catalogue that contains all human genes as well as physical clones for every single gene, and to functionally analyse these genes and their products. The International ORFeome Collaboration joins scientists from around the world who aim to generate and make widely accessible a comprehensive resource of cloned ORFs that covers the entire protein-coding part of the genome/transcriptome. The Division of Molecular Genome Analysis has been a contributor to this project and has been deeply involved in its further development (Wiemann et al., 2016).

The German cDNA Consortium

Coverage of genes in the International ORFeome Collaboration resource (part of the human Xq28 genomic region is shown). RefSeq genes are indicated in blue, clones of the International ORFeome Collaboration resource are represented in green. Individual clones are named with their respective accession numbers. "AM... and EU..." indicate ORF-clones that have been generated in the Division of Molecular Genome Analysis (image copied from the UCSC genome browser).
© dkfz.de

S. Wiemann (DKFZ); H. Blöcker (GBF Braunschweig); A. Bahr (Qiagen GmbH Hilden); K. Köhrer (BMFZ Düsseldorf); W. Ansorge (EMBL Heidelberg); H.W. Mewes (GSF München); B. Ottenwälder (Medigenomix GmbH München); D. Heubner (AGOWA GmbH Berlin)

The German cDNA Consortium (see below) was formed in 1996 as the world's second large-scale cDNA analysis project, and aimed at systematically generating (Wellenreuther 2004), sequencing and annotating full-length cDNAs of human genes. The resources produced in this project have been our contribution to the ORFeome Collaboration and have substantially helped to reach comprehensive coverage of the human genes in that resource.

While genes were initially identified through EST sequencing of cDNA libraries (we generated ESTs from ~250,000 cDNAs and completely sequenced 15,000 full-length cDNAs), we later shifted towards the directed modelling of gene structures and the cloning of the respective ORFs. The project was part of the National Genome Research Network (NGFN). We have adapted and routinely apply the Gateway cloning system (Invitrogen) for the cloning of protein-coding regions (ORFs) (Simpson 2000, Bechtel 2007). This system is based on recombination and allows for the base-specific and directional cloning of DNA. With a high level of automation, we amplify the ORFs in a two-step PCR process that adds Gateway-compatible sites to the 5' and 3' ends. Universal 'entry clones' are then generated via recombination; they are compatible with any Gateway expression vector. As an important quality-control step, the entry clones are completely sequence-verified.

Inserts of verified entry clones are recombined into a range of expression vectors. These constructs are utilised in functional profiling to determine the sub-cellular localisation of the encoded proteins (see LIFEdb) and to identify cellular effects of protein overexpression in cell-based functional assays (Wiemann 2004, Starkuviene 2004, Arlt 2005, Laketa 2007, Sauermann 2007). Based on these screens, a number of new functions and disease associations have been assigned to the respective hits (Neubrand 2005, Fleischer 2006, Sauermann 2008). Furthermore, the clone resource has been exploited in a number of collaborative projects (e.g., Will 2010, Bai 2011, Lisauskas 2012, Simpson 2012, Walde 2012).

The sequence resources and expertise of the German cDNA Consortium have been the basis of the systematic functional annotation of the human transcriptome, which has been carried out by the international H-invitational consortium (Imanishi 2004, Yamasaki 2008). The complete annotation is publicly available in the H-invitational database (http://www.h-invitational.jp/).

The 3of5 web application for complex and comprehensive pattern matching in protein sequences

The identification of patterns in biological sequences is a key challenge in genome analysis and in proteomics. Such patterns are often complex and highly variable, especially in protein sequences. They are frequently described with regular expressions (RegEx) because of the user-friendly terminology. However, as patterns become more complex, regular-expression queries reach their limits and enhanced capabilities are required. This is especially true for patterns containing ambiguous characters, ambiguous positions and/or length ambiguities.

We have implemented the 3of5 web application in order to enable complex pattern matching in protein sequences. 3of5 is named after a special use of its main feature, the novel n-of-m pattern type. This feature allows for an extensive specification of variable patterns where the individual elements may vary in their position, order, and content within a defined stretch of sequence. The number of distinct elements can be constrained by operators, and individual characters may be excluded. The n-of-m pattern type can be combined with common regular expression terms and thus also allows for a comprehensive description of complex patterns. 3of5 increases the fidelity of pattern matching: for length-ambiguous patterns it finds all possible solutions in protein sequences instead of simply reporting the longest or shortest hits. Grouping and combined search for patterns provide a hierarchical arrangement of larger pattern sets. The algorithm is implemented as an internet application and is freely accessible at http://dkfz.de/mga2/3of5/3of5.html.
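The idea behind an n-of-m pattern can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the 3of5 algorithm or its query syntax: the function name `n_of_m_hits`, its parameters, and the example sequence are hypothetical, and the "elements" are simplified to single residues. A window of m residues counts as a hit if at least n of them come from an allowed set and none come from an excluded set.

```python
def n_of_m_hits(seq, allowed, n, m, excluded=""):
    """Return (start, window) for every window of length m in which
    at least n residues are in `allowed` and none are in `excluded`.

    Hypothetical helper for illustration; not part of 3of5.
    """
    allowed, excluded = set(allowed), set(excluded)
    hits = []
    for start in range(len(seq) - m + 1):
        window = seq[start:start + m]
        # Skip windows that contain an explicitly excluded residue.
        if any(res in excluded for res in window):
            continue
        # Position and order inside the window do not matter, only the count.
        if sum(res in allowed for res in window) >= n:
            hits.append((start, window))
    return hits


# Made-up peptide: at least 3 basic residues (K or R) in any
# 5-residue stretch, with proline (P) excluded.
hits = n_of_m_hits("MAKRKLPRRKA", allowed="KR", n=3, m=5, excluded="P")
print(hits)  # [(0, 'MAKRK'), (1, 'AKRKL')]
```

Counting inside a sliding window is what lets the matching residues sit at any position and in any order, which a plain regular expression can only express by listing every arrangement explicitly.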

The 3of5 application offers an extended vocabulary for the definition of search patterns and thus allows the user to comprehensively specify and identify peptide patterns with variable elements. The n-of-m pattern type offers improved accuracy for pattern matching in combination with the ability to find all solutions, without compromising the user-friendliness of regular expression terms (Seiler 2006).
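The difference between reporting one hit and finding all solutions for a length-ambiguous pattern can be shown with Python's standard `re` module. This is a sketch under stated assumptions, not how 3of5 is implemented: the helper `all_matches`, the pattern, and the sequence are made up for illustration. A greedy search returns the longest hit from a start position and a lazy search the shortest, whereas testing every (start, end) pair enumerates every solution.

```python
import re


def all_matches(pattern, seq):
    """Return (start, matched_text) for every substring of seq that
    matches the pattern completely. Brute force over all (start, end)
    pairs; hypothetical helper, fine for short peptides only."""
    rx = re.compile(pattern)
    hits = []
    for start in range(len(seq)):
        for end in range(start + 1, len(seq) + 1):
            # fullmatch with pos/endpos tests exactly seq[start:end].
            if rx.fullmatch(seq, start, end):
                hits.append((start, seq[start:end]))
    return hits


seq = "AKAARARA"                                # made-up sequence
greedy = re.search(r"K.{1,4}R", seq).group()    # longest hit only
lazy = re.search(r"K.{1,4}?R", seq).group()     # shortest hit only
every = all_matches(r"K.{1,4}R", seq)           # all solutions

print(greedy)  # KAARAR
print(lazy)    # KAAR
print(every)   # [(1, 'KAAR'), (1, 'KAARAR')]
```

The exhaustive loop makes a quadratic number of match attempts, so it is meant only to make the concept concrete; a dedicated tool would use a more efficient search.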

RNAi resources

Transient cell-growth modulators identified by the RTCA screen and the interaction network discovered among hits (from Zhang 2011)
© dkfz.de

RNA interference (RNAi) is a common method for loss-of-function studies. Its mechanism is based on abrogating the message of a target gene: a specific short RNA (siRNA, shRNA) anneals to the respective mRNA and induces its degradation. To this end, we have a whole-genome siRNA library available (Dharmacon Thermo-Fisher) and are members of the RNAi Global consortium. We apply RNAi to study the effects of protein downregulation in phenotypic cellular assays (e.g., Zhang 2011). Furthermore, we have performed genome-wide miRNA screens for modulators of EGFR signaling in cell cycle control (Uhlmann 2012) as well as for regulators of NF-κB signaling (Keklikoglou 2012).
