Automated localization of proteins encoded by full-length cDNAs

A collaboration with Rainer Pepperkok (EMBL Heidelberg)

Among the greatest challenges facing biology today is the exploitation of the vast amounts of genomic data now available, and their conversion into functional information about the proteins they encode. The ORFeome Collaboration is providing large numbers of full-length cDNAs encoding novel proteins of completely unknown function. These cDNAs are used to generate an ORF resource that makes the encoded proteins amenable to functional analysis.

Image courtesy of Rainer Pepperkok (EMBL)

As a first step towards their characterization, we tag these proteins with the green fluorescent protein (GFP) and examine the subcellular localization of the resulting fusion proteins in living cells (Simpson 2000, Simpson 2001, Pepperkok 2001). These data allow us to classify the proteins into subcellular groups, and this classification guides the next step towards a detailed functional characterization (e.g., Simpson 2012).

So far we have localized over 1,000 different proteins in living cells, using 4,000 different expression constructs. In many cases the localization data are the first functional information available for these proteins, and they help to identify targets to be funnelled into functional cell-based assays and into the division's proteomics projects. Localization data are made public through the LIFEdb database (Bannasch 2004, Mehrle 2006).

