SDMinP FAQs
Frequently asked questions
Which platform is best for running SDMinP?
SDMinP is designed to run on any platform that supports Python, e.g. Windows (XP, 2000), Unix, Linux or Macintosh. No platform-specific functionality is used. We tested the program on Windows XP and on a Unix system.
Are there problems with Unix versus Windows file formats?
We did not have any problems with Unix or Windows file formats.
How do I format my input files?
You can use a scripting or programming language, such as Perl, Java or Python, to create the input file. The step-by-step example shows how to create the input file with the statistical program R (see the link below).
How can I visualize the results?
To visualize the results, the statistical program R (see the link below) can be used. SDMinP can generate an R-readable result file, which can be imported into R with the command 'my.results <- read.table("%Path to R-readable resultfile%")' in the R console. See section 11.3 in the Documentation for a routine to plot the results.
Which version of Python does SDMinP require?
SDMinP was developed in Python 2.3.5. You can download it from the Python homepage (see the link below). Python is also installed by default in most Linux and Unix environments.
What is the difference between the formulas for the raw p-value calculation?
There are two formulas used for calculating the empirical p-value of the test statistics. One is provided by Ge et al. [2003], which is used by SDMinP to calculate the raw p-values of the permutation test statistics, when the raw p-value of the observed test statistic is provided in the input file (data input format I). The other formula is provided by Becker and Knapp [2004] and is used for calculating the raw p-value of the observed and permutation test statistics, when the data file does not contain the raw p-value of the observed test statistic (data input format II). For more detailed information about this subject refer to Becker and Knapp [2004].
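Neither formula is reproduced here. As a generic illustration of what an empirical p-value is, the following minimal sketch computes, for each statistic in a list, the fraction of all statistics in that list that are at least as large. It is not the formula of Ge et al. [2003] or of Becker and Knapp [2004], and it is not taken from the SDMinP source. The function name is hypothetical, and the sketch assumes that larger statistics are more extreme.

```python
def empirical_p_values(statistics):
    """Return, for each statistic, the fraction of all statistics >= it.

    Generic illustration only (assumption: larger means more extreme).
    """
    n = len(statistics)
    p_values = []
    for s in statistics:
        # Count the statistics that are at least as extreme as s (s itself included).
        at_least_as_extreme = sum(1 for other in statistics if other >= s)
        p_values.append(at_least_as_extreme / float(n))
    return p_values


# Example: four statistics (for instance, one observed and three permuted values).
print(empirical_p_values([1.0, 3.0, 2.0, 3.0]))
```

The two references differ in how exactly the count and the denominator are defined. Consult them for the formulas that SDMinP actually uses.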
Is the program limited in the number of hypotheses or in the number of permutations?
The program does not limit the number of hypotheses or the number of permutations. The only limit is the maximum size of the input file, which is determined by the operating system. All common operating systems should be able to handle files of up to 2 GB. If you want to process larger files, see the documentation of your system.
Why should I use split files?
Split files make the computation faster. They hold the content of the input data file, split into smaller portions, so the program does not have to parse the whole original data file from the top down to the required position. Instead, it jumps directly to the shorter split file that contains the required information. Using split files is always an advantage.
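The idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not SDMinP's actual implementation: the function names, the split-file naming scheme, and the one-record-per-line layout are all hypothetical, and the code targets a current Python version rather than 2.3.5.

```python
import os


def write_split_files(input_path, out_dir, max_lines):
    """Split input_path into numbered files of at most max_lines lines each."""
    paths = []
    chunk = []
    with open(input_path) as src:
        for line in src:
            chunk.append(line)
            if len(chunk) == max_lines:
                paths.append(_write_chunk(chunk, out_dir, len(paths)))
                chunk = []
    if chunk:  # remaining lines that did not fill a whole split file
        paths.append(_write_chunk(chunk, out_dir, len(paths)))
    return paths


def _write_chunk(chunk, out_dir, index):
    """Write one split file and return its path (hypothetical naming scheme)."""
    path = os.path.join(out_dir, "split_%05d.txt" % index)
    with open(path, "w") as dst:
        dst.writelines(chunk)
    return path


def read_line(paths, max_lines, line_index):
    """Return line line_index (0-based) of the original file via the split files."""
    # The split file and the offset inside it follow directly from the index,
    # so at most max_lines lines are scanned instead of the whole original file.
    file_index, offset = divmod(line_index, max_lines)
    with open(paths[file_index]) as src:
        for i, line in enumerate(src):
            if i == offset:
                return line
    raise IndexError(line_index)
```

With a maximum of 10 lines per split file, line 17 of the original file is found as line 7 of the second split file, so the lookup scans at most 10 lines however large the original file is.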
What is the best maximum number of lines for split files?
Generally, decreasing the split file size speeds up the program's retrieval of specific hypotheses. We recommend setting the maximum number of lines to a value between 10 and 100.
Further Help
References
Becker, T., Knapp, M. (2004) A Powerful Strategy to Account for Multiple Testing in the Context of Haplotype Analysis, Am. J. Hum. Genet., 75, 561-570.
Ge, Y., Dudoit, S., Speed, T.P. (2003) Resampling-based Multiple Testing for Microarray Data Analysis, Test, 12, 1-77.
R Development Core Team (2004). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org.
Westfall, P.H., Young, S.S. (1993) Resampling-based multiple testing: examples and methods for P-value adjustment, John Wiley & Sons, New York.