Biostatistics

Our Research

The mission of the Division of Biostatistics is to support DKFZ scientists in performing and publishing excellent, reproducible research. Biostatistics is an interdisciplinary science that aims to provide efficient designs for experiments and trials and to devise sound statistical analyses and interpretations of biomedical data. Adequate experimental designs and analysis strategies are rarely available "off the shelf"; they must be developed and tailored to the specific problem in collaboration with the biomedical researcher. The Division of Biostatistics can therefore only provide state-of-the-art support if it actively performs methodological research and implements newly developed analysis strategies. As a consequence, it acts as a research division with a service function.

Our methodological research activities cover a wide range of biostatistical topics, often motivated by and interlinked with long-standing collaborations within and outside the DKFZ, including a large number of clinical trials. The close collaboration with biomedical researchers and clinicians allows us to link statistical methodological research and clinical practice, thus contributing to the advancement of translational and precision oncology. Major areas of current research interest include: design and analysis of clinical trials in both the frequentist and the Bayesian framework; identification of prognostic and, in particular, predictive factors from clinical and molecular data; optimal design and analysis of dose-response relationships, with a focus on combinations of substances; and measuring dependence between sets of random variables for various data types. We are keen to take on novel methodological challenges, and in our collaborations with biomedical scientists we address a variety of additional research topics. More detailed information about our research activities is given in the Research Topics section below.

Biostatistical Service and Support

We provide statistical support for all scientific activities at the DKFZ, from in vitro and animal studies to human subject studies. Our support covers experimental design, sample size/power estimation, data analysis, software guidance, visualization and interpretation of statistical results, and preparation of results for publication. It ranges from brief statistical consultations to long-term collaborations and covers standard statistical analysis approaches as well as the development of complex statistical methods tailored to specific questions. We discuss the advantages and disadvantages of different statistical methods and give guidance on the method of choice in specific cases. We also provide assistance with the statistical aspects and requirements of funding applications, ethics committee applications, clinical trial protocols and animal studies.

For standard experiments (no high-throughput measurements) recorded in spreadsheet files, samples/observations/replicates should be entered in rows and features/characteristics in columns. If multiple measurements per sample have been made (e.g. a time series), each measurement should go into a separate row, and an identifier variable for the samples should be included. Column names should not contain any special characters. If values are coded (e.g. 0 = control, 1 = treated), a legend must be provided. All dates should be in the same format. If your data need to be updated or corrected during the analysis, please provide an updated file without changing column names, formats, etc. Information conveyed by highlighting, coloring or any other type of formatting cannot be imported and used for the analysis.
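The layout rules above (one row per measurement, a sample identifier, plain column names) can be checked and produced with a few lines of code. The following is a minimal sketch in Python, not a tool provided by the division: it assumes that "no special characters" means ASCII letters, digits and underscores with a letter first, and the function names and the output columns "time" and "value" are hypothetical choices for illustration.

```python
import csv
import io
import re

# Assumption: "no special characters" is read as ASCII letters, digits and
# underscores only, starting with a letter (a common, R-friendly convention).
VALID_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def invalid_column_names(header):
    """Return the column names that do not follow the naming convention above."""
    return [name for name in header if not VALID_NAME.fullmatch(name)]


def wide_to_long(rows, id_col, measurement_cols):
    """Reshape one-row-per-sample data into one row per measurement.

    Each output row keeps the sample identifier, so repeated measurements of
    the same sample can still be linked after reshaping.
    """
    long_rows = []
    for row in rows:
        for col in measurement_cols:
            long_rows.append({id_col: row[id_col], "time": col, "value": row[col]})
    return long_rows


if __name__ == "__main__":
    # Hypothetical time-series data: one row per sample, one column per day.
    wide_csv = io.StringIO("sample_id,day0,day7\nS1,5.1,6.3\nS2,4.8,5.9\n")
    reader = csv.DictReader(wide_csv)
    print(invalid_column_names(reader.fieldnames))  # []
    for long_row in wide_to_long(reader, "sample_id", ["day0", "day7"]):
        print(long_row)
```

Each sample then appears once per time point, and the identifier column links its measurements, which is the structure described above.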

The DKFZ provides SPSS and SigmaPlot for standard analyses in a user-friendly environment. GraphPad Prism is another user-friendly statistical software package frequently used at the DKFZ, but without a campus-wide license. The Genomics and Proteomics Core Facility provides bioinformatics tools for conducting standard microarray/sequencing analyses, such as Chipster and IPA. Our division generally uses R/Bioconductor and SAS for power/sample-size estimation.

We consider reproducible research to be essential for scientific work. For this reason, we prepare our analyses in R/Bioconductor in combination with Sweave/knitr, so that results, figures and tables can be reproduced. On request, we can also provide stand-alone analysis scripts that reproduce the results and can be submitted along with your manuscript.

We encourage PhD candidates and their supervisors to contact us whenever they need statistical advice on their experimental design, the methods to use, the correct application of statistical software, or the proper interpretation of results. We normally expect PhD candidates to perform the statistical analyses for their theses themselves. Of course, in the case of a more complex analysis requiring advanced statistical knowledge and/or software expertise, we will provide the necessary support.

Please email the division of Biostatistics at biostatistics-consulting(at)dkfz.de and briefly describe your experiment/question and your aim.

Statistics Courses

The division of Biostatistics offers three consecutive statistics lecture series starting every summer semester. The aim of the courses is to enable participants to perform simple analyses themselves, to recognize when professional statistical advice is needed, and to facilitate cooperation between researchers and the division of Biostatistics. The topics covered are chosen according to the needs of researchers at the DKFZ. For details about dates and locations, please visit the Training Portal (for DKFZ employees on the intranet) or the Heidelberg University Lecture Index, or contact the division of Biostatistics.

Basic Principles of Biostatistics

Lecture series for researchers and PhD students in the biological or clinical sciences without prior knowledge of statistics.

Topics:

Descriptive statistics: plots, measures of location and spread
Confidence intervals
Statistical hypothesis testing, p-value, etc.
Statistical tests for quantitative data, e.g., t-test
Statistical tests for qualitative data, e.g., chi-square test
Correlation and regression
Study design
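As a small worked illustration of the "tests for qualitative data" topic, the sketch below computes the Pearson chi-square test for a 2x2 table in Python (the course itself uses R). It is a minimal example under stated assumptions: the counts are hypothetical, expected counts should be reasonably large (a common rule of thumb is at least 5 per cell), and no continuity correction is applied. Note that R's chisq.test applies the Yates continuity correction to 2x2 tables by default, so its p-value will differ unless correct = FALSE is set. The p-value uses the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable.

```python
from math import erfc, sqrt


def chi2_sf_1df(stat):
    """Upper-tail probability P(X > stat) for a chi-square variable with 1 df."""
    # A chi-square(1) variable is Z**2 for a standard normal Z, hence
    # P(Z**2 > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2)).
    return erfc(sqrt(stat / 2.0))


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table (no continuity correction)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column total must be positive")
    stat = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            # Expected count under independence: row total * column total / n.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, chi2_sf_1df(stat)


if __name__ == "__main__":
    # Hypothetical counts: rows = treatment/control, columns = response/no response.
    statistic, p_value = chi_square_2x2([[10, 20], [30, 40]])
    print(f"X^2 = {statistic:.3f}, p = {p_value:.3f}")
```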

This lecture series accompanies "Basic Principles of Biostatistics" and shows how the methods introduced there are coded in R. Participants should have a working R installation on their computers and be familiar with the statistical concepts covered in the "Basic Principles of Biostatistics" lecture series.

Advanced Topics in Biostatistics

Team-taught lecture series by members of the division of Biostatistics for researchers and PhD students in the biological or clinical sciences with basic knowledge of statistics.

Topics:

Analysis of Variance
Non-parametric methods
Multiple linear regression
Logistic regression
Linear mixed models
Dose-response modeling
Diagnostic tests
Measuring agreement
Survival analysis: Kaplan-Meier curves, log-rank tests, Cox proportional hazards regression
Variable selection in regression
Design of clinical trials
Multiple testing
Introduction to Bayesian thinking
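To illustrate the survival analysis topic, the sketch below computes the Kaplan-Meier (product-limit) estimate S(t) = product over event times t_i <= t of (1 - d_i / n_i), where d_i is the number of events and n_i the number at risk at t_i. It is a minimal Python example, not a substitute for established R packages such as survival; the data are hypothetical, and it follows the usual convention that subjects censored at an event time are still counted as at risk at that time.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function.

    times  -- follow-up time for each subject
    events -- 1 if the event was observed at that time, 0 if censored
    Returns a list of (time, number at risk, number of events, survival estimate)
    at each distinct event time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # all subjects (events and censorings) leave the risk set
    n_at_risk = len(times)
    survival = 1.0
    table = []
    for t in sorted(leaving):
        d = deaths[t]
        if d > 0:
            # Product-limit step: multiply by the conditional survival 1 - d/n.
            survival *= 1.0 - d / n_at_risk
            table.append((t, n_at_risk, d, survival))
        # Subjects censored at t were counted as at risk at t; remove them afterwards.
        n_at_risk -= leaving[t]
    return table


if __name__ == "__main__":
    # Hypothetical follow-up times (months) and event indicators.
    for row in kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1]):
        print(row)
```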

This lecture series accompanies "Advanced Topics in Biostatistics" and shows how the methods introduced there are coded in R. Participants should have basic R programming skills, including the ability to apply the basic statistical methods shown in the "Basic Principles of Biostatistics" course.

In addition to the courses organized by the division of Biostatistics, the Advanced Training department of the DKFZ offers programming courses in R and SAS, and the Genomics and Proteomics Core Facility at the DKFZ offers courses on specific data analysis tools for high-throughput genomics data. DKFZ employees can find further information on the Training Portal.

Research Topics

The Division of Biostatistics currently focuses on several research topics:

This research area deals with innovative methods for clinical trial design and evaluation strategies for clinical data. Motivated by our involvement in a multitude of clinical trials in all phases, we develop methods for the design and analysis of clinical trials in both the frequentist and the Bayesian framework.


Software

Bayesian design for phase II trials
The WebApp BDP2 provides a workflow to determine design parameters for a multi-stage, single-arm phase II trial with a binary endpoint. Efficacy and futility are declared based on the Bayesian posterior distribution of the response rate. The WebApp is built on the R package BDP2, available from CRAN.
For details see:
Kopp‐Schneider, A., Wiesenfarth, M., Witt, R., Edelmann, D., Witt, O., & Abel, U. (2019). Monitoring futility and efficacy in phase II trials with Bayesian posterior distributions - A calibration approach. Biometrical Journal, 61(3), 488-502.
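The core quantity behind such posterior-based monitoring can be sketched briefly. With a Beta(a, b) prior on the response probability p, the posterior after x responses in n patients is Beta(a + x, b + n - x), and for integer parameters the posterior tail probability P(p > p0) equals a binomial cumulative probability, so no special functions are needed. The Python sketch below shows this together with a generic interim rule (declare efficacy if P(p > p0) is large, futility if P(p > p1) is small). The function names, thresholds and rule are hypothetical illustrations; this is not the BDP2 package or its calibration procedure described in the reference above.

```python
from math import comb


def posterior_prob_exceeds(x, n, p0, a=1, b=1):
    """P(p > p0 | x responses in n patients) under a Beta(a, b) prior.

    Assumes integer a, b >= 1, so that the Beta tail probability can be written
    as a binomial CDF: P(Beta(A, B) > p0) = P(Binomial(A + B - 1, p0) <= A - 1).
    """
    if not 0 <= x <= n:
        raise ValueError("need 0 <= x <= n")
    if int(a) != a or int(b) != b or a < 1 or b < 1:
        raise ValueError("this sketch needs integer prior parameters a, b >= 1")
    a_post, b_post = a + x, b + n - x
    m = a_post + b_post - 1
    return sum(comb(m, k) * p0**k * (1 - p0) ** (m - k) for k in range(a_post))


def interim_decision(x, n, p0, p1, eff_threshold, fut_threshold, a=1, b=1):
    """Generic posterior-threshold rule (hypothetical thresholds, for illustration)."""
    # Efficacy: the response rate very probably exceeds the uninteresting rate p0.
    if posterior_prob_exceeds(x, n, p0, a, b) >= eff_threshold:
        return "efficacy"
    # Futility: the response rate is unlikely to exceed the target rate p1.
    if posterior_prob_exceeds(x, n, p1, a, b) < fut_threshold:
        return "futility"
    return "continue"


if __name__ == "__main__":
    # Hypothetical interim look: 5 responses among 10 patients, uniform prior.
    print(posterior_prob_exceeds(5, 10, 0.2))
    print(interim_decision(5, 10, p0=0.2, p1=0.4, eff_threshold=0.9, fut_threshold=0.1))
```

In a real design, thresholds like these would have to be calibrated so that the overall operating characteristics (e.g. type I error and power) are acceptable, which is the purpose of the calibration approach cited above.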

Sample size calculation for modifications of Simon's two-stage design
The R package hctrial can be used to calculate the sample size for modifications of Simon's two-stage design that allow for stratification and the incorporation of historical controls.
For details see:
Edelmann, D., Habermehl, C., Schlenk, R. F., & Benner, A. (2020). Adjusting Simon's optimal two‐stage design for heterogeneous populations based on stratification or using historical controls. Biometrical Journal, 62(2), 311-329.
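For orientation, the operating characteristics of Simon's original (unmodified) two-stage design can be computed directly from binomial probabilities: the probability of early termination (PET), the probability of declaring the treatment promising (the type I error at the null response rate, the power at the target rate), and the expected sample size E[N] = n1 + (1 - PET)(n - n1). The Python sketch below does this; it does not implement the stratified or historical-control modifications provided by hctrial, and the design values in the example (r1 = 1, n1 = 10, r = 5, n = 29) are for illustration only. At a response rate of 0.1 they give PET of about 0.74 and E[N] of about 15.

```python
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability of exactly k responses among n patients."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def simon_operating_characteristics(r1, n1, r, n, p):
    """Operating characteristics of Simon's two-stage design at response rate p.

    Stage 1 enrolls n1 patients; the trial stops early if at most r1 respond.
    Otherwise n - n1 further patients are enrolled, and the treatment is
    declared promising if more than r responses are observed in total.
    Returns (probability of early termination, probability of declaring the
    treatment promising, expected sample size).
    """
    n2 = n - n1
    pet = sum(binom_pmf(x1, n1, p) for x1 in range(r1 + 1))
    promising = 0.0
    for x1 in range(r1 + 1, n1 + 1):
        # Need x1 + x2 > r, i.e. at least r - x1 + 1 responses in stage 2.
        need = max(0, r - x1 + 1)
        tail = sum(binom_pmf(x2, n2, p) for x2 in range(need, n2 + 1))
        promising += binom_pmf(x1, n1, p) * tail
    expected_n = n1 + (1 - pet) * n2
    return pet, promising, expected_n


if __name__ == "__main__":
    # Illustrative design: r1 = 1, n1 = 10, r = 5, n = 29.
    for p in (0.1, 0.3):
        pet, prob, en = simon_operating_characteristics(1, 10, 5, 29, p)
        print(f"p = {p}: PET = {pet:.3f}, P(promising) = {prob:.3f}, E[N] = {en:.1f}")
```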