GOSim by Holger Fröhlich
h.froehlich@dkfz-heidelberg.de

Information Theoretic GO Similarities Between Terms and Gene Products


Description:

Background: With the increased availability of high-throughput technologies such as DNA microarrays, researchers can produce large amounts of biological data. When analyzing such data, there is often a need to explore the similarity of genes not only with respect to their expression, but also with respect to their functional annotation, which can be obtained from the Gene Ontology (GO).
GOSim Package: We present the freely available software package GOSim, which calculates the functional similarity of genes based on various information theoretic similarity concepts for GO terms. GOSim extends existing tools by providing additional, recently developed functional similarity measures for genes. These can be used, for example, to cluster genes according to their biological function. Conversely, they can also be used to evaluate the homogeneity of a given grouping of genes with respect to their GO annotation. GOSim hence provides the researcher with a flexible and powerful tool to combine knowledge stored in GO with experimental data.
Since version 1.1, GOSim additionally offers GO enrichment analysis using the topGO package. Moreover, since version 1.1.5.0, GOSim offers some recently developed diffusion kernel techniques to compute similarities between GO terms (see references in the vignette). Hence, GOSim now acts as an umbrella for different analysis methods employing the GO structure.
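To give a feeling for what an information theoretic similarity between GO terms looks like, here is a minimal base R sketch of one well-known measure, Resnik's similarity, on a toy ontology. It does not use GOSim and is not GOSim's implementation: the term names, the parent structure and the annotation counts are all made up for illustration.

```r
# Toy ontology: each term lists its parents (hypothetical IDs, not real GO terms).
parents <- list(
  root = character(0),
  A    = "root",
  B    = "root",
  C    = c("A", "B"),
  D    = "A"
)

# All ancestors of a term, including the term itself.
ancestors <- function(term) {
  unique(c(term, unlist(lapply(parents[[term]], ancestors))))
}

# Made-up number of gene products annotated directly to each term.
direct <- c(root = 0, A = 2, B = 4, C = 1, D = 1)

# A gene annotated to a term is implicitly annotated to all of its ancestors,
# so each term's total count includes the counts of its descendants.
total <- setNames(numeric(length(direct)), names(direct))
for (term in names(direct)) {
  anc <- ancestors(term)
  total[anc] <- total[anc] + direct[term]
}

# Information content: IC(t) = -log p(t), with p(t) the relative frequency of t.
ic <- -log(total / total["root"])

# Resnik similarity: IC of the most informative common ancestor.
resnik <- function(t1, t2) {
  common <- intersect(ancestors(t1), ancestors(t2))
  max(ic[common])
}

resnik("C", "D")  # common ancestors are A and root; the result is IC(A)
```

Rare terms carry a high information content, so two terms that share a specific common ancestor come out as more similar than two terms that only share the root.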
Implementation: GOSim is implemented as a package for the statistical computing environment R and is distributed under GPL within the CRAN project. It includes documentation and examples on how to use the package.

Download R package version 1.1.5.4 (tar.gz for Linux users)
Download R package version 1.1.5.4 (zip for Windows users)
Download R package version 1.0.2 (tgz for Mac users)

Documentation only (PDF)


In case of problems contact: h.froehlich@dkfz-heidelberg.de.


References (please cite):

H. Froehlich, N. Speer, A. Poustka, T. Beissbarth, GOSim - An R-Package for Computation of Information Theoretic GO Similarities Between Terms and Gene Products, BMC Bioinformatics, 8:166, 2007.

H. Froehlich, Kernel Methods in Chemo- and Bioinformatics, PhD Thesis, Logos Verlag, 2006.

H. Froehlich, N. Speer, C. Spieth, A. Zell, Kernel Based Functional Gene Grouping. In: Proc. Int. Joint Conf. Neural Networks (IJCNN), 6886 - 6891, 2006.

N. Speer, H. Froehlich, C. Spieth, A. Zell, Functional Grouping of Genes Using Spectral Clustering and Gene Ontology. In: Proc. Int. Joint Conf. Neural Networks (IJCNN), 298 - 303, 2005.


FAQs
  • Q: How do I switch to another organism?
    A: Simply use setEvidenceLevel(organism=myorganism).

  • Q: Why is the dimension of the similarity matrix sometimes less than the length of my list of genes?
    A: Only genes that can be mapped to the currently set ontology are used.

  • Q: How can I obtain the corrected maximum pairwise gene similarity measure introduced by Couto et al. in the paper "Implementation of a Functional Semantic Similarity Measure between Gene-Products", which is used in FuSSiMeg?

    A: Simply select term similarity "CoutoEnriched" and use gene similarity "max":
    Example:
    > getGeneSim(c("850457","856356"),similarity="max",similarityTerm="CoutoEnriched",verbose=FALSE)
    1.0000000 0.6089551
    0.6089551 1.0000000


    Please be aware that you may want to change the enrichment factors prior to the calculation (function setEnrichmentFactors).

  • Q: How do I pre-calculate IC-tables for certain evidence levels and use them afterwards?

    A: Example:
    > setEvidenceLevel("IMP", organism="mouse") # set evidence level IMP for mouse
    > setOntology("CC", loadIC=FALSE) # This tells GOSim that it should not load the IC file; necessary to prevent an error.
    > calcICs() # You have to move the IC file to the data-directory of GOSim after calling this.
    > setwd("~/GOSim") # switch to GOSim directory
    > setOntology("CC") # load the IC file into GOSim
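Regarding the question above about similarity matrices that are smaller than the gene list: a quick way to see which genes were dropped is to compare the input IDs with the row names of the result. The sketch below uses a hand-made matrix standing in for the output of getGeneSim, and it assumes that the result carries the gene IDs as row names. The ID "999999" is made up to play the role of a gene that could not be mapped.

```r
# Input gene list; "999999" is a made-up ID standing in for an unmapped gene.
genes <- c("850457", "856356", "999999")

# Hand-made stand-in for a getGeneSim() result (values taken from the example above).
sim <- matrix(c(1.0000000, 0.6089551,
                0.6089551, 1.0000000),
              nrow = 2,
              dimnames = list(c("850457", "856356"), c("850457", "856356")))

# Genes from the input list that do not appear in the similarity matrix.
dropped <- setdiff(genes, rownames(sim))
dropped
```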

Known bugs/problems in old versions:
  • Version 1.1.4.1 had a bug in the calculation of IC values. Please switch to version 1.1.4.2!
  • The man page for function getGeneSim does not list the option "mean" for parameter "similarity", which computes the average GO term similarity.
  • Some people have reported the error "NA/NaN/Inf in foreign function call (arg 1)" when calling function getGeneSim with option similarity="OA". It seems that this error occurs when there are genes without annotation. To circumvent this problem, remove all genes without GO annotation prior to the similarity analysis. Later versions of GOSim will explicitly catch this situation and throw a warning instead.
  • The same cause can also produce the error "Error in if (term1 == term2): argument is of length zero".
  • There can be a warning like this: "In gomap[[as.character(geneList[i])]] : partial match of '36506' to '365063'". It indicates that gene 36506 is NOT in the database, but 365063 is. Because of the partial match, R uses the entry for 365063 instead, which is obviously wrong. In other words, the result of the similarity computation is wrong. Since version 1.1.3.2 this problem should not occur any more.
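The partial match warning in the last item comes from how R can look up names in a list. The following base R sketch reproduces the effect without GOSim. The mapping and its GO IDs are placeholders, and how GOSim stores its mapping internally is not shown here.

```r
# Hypothetical gene-to-GO mapping that contains gene 365063 only.
gomap <- list("365063" = c("GO:0000001", "GO:0000002"))

# Exact matching (the default for `[[`): the missing gene yields NULL.
is.null(gomap[["36506"]])            # TRUE

# Partial matching: the prefix "36506" silently resolves to the entry "365063".
gomap[["36506", exact = FALSE]]

# A defensive lookup checks the names explicitly before indexing.
lookup <- function(map, id) if (id %in% names(map)) map[[id]] else NULL
is.null(lookup(gomap, "36506"))      # TRUE
```

With exact = NA, R performs the same partial match but also issues a warning like the one quoted above.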

Holger Fröhlich's homepage