Biostatistics

Our Research

The mission of the Division of Biostatistics is to support DKFZ scientists in performing and publishing excellent reproducible research. Biostatistics is an interdisciplinary science that aims to provide efficient designs for experiments and trials and to devise sound statistical analysis and interpretation of biomedical data. Adequate experimental design and analysis strategies are rarely available ‘off the shelf’ but must be developed and tailored to the specific problem in collaboration with the biomedical researcher. Therefore, the Division of Biostatistics can only provide state-of-the-art support if it actively performs methodological research and implements newly developed analysis strategies. As a consequence, it acts as a research division with a service function.

Our methodological research activities cover a wide range of biostatistical topics, often motivated by and interlinked with long-standing collaborations within and outside the DKFZ, including a large number of clinical trials. The close collaboration with biomedical researchers and clinicians allows us to link statistical methodological research and clinical practice, thus contributing to the advancement of translational oncology and precision oncology. Major areas of current research interest include: design and analysis of clinical trials, in both the frequentist and the Bayesian framework; identification of prognostic and particularly predictive factors from clinical and molecular data; optimal design and analysis for dose-response relationships, with a focus on combinations of substances; measuring dependence between sets of random variables for various data types. We are keen to approach novel methodological challenges, and in our collaborations with biomedical scientists we address a variety of additional research topics. More detailed information about our research activities is given under Research Topics.

The working group “Statistics in translational research” within the Division of Biostatistics supports clinical trial groups as a biometric center and bridges research on molecular patient characteristics to new therapeutic options in oncology.

Biostatistical Service and Support

We provide statistical support for all scientific activities at the DKFZ, from in vitro and animal to human subject studies. Our support covers experimental design, sample size/power estimation, data analysis, software guidance, visualization and interpretation of statistical results, and preparation of results for publication. It ranges from brief statistical consultations to long-term collaborations and covers standard statistical analysis approaches as well as the development of complex statistical methods tailored to specific questions. We discuss the advantages and disadvantages of different statistical methods and give guidance on the method of choice in specific cases. We provide assistance on statistical aspects and requirements of funding applications, ethical vote applications, clinical trial protocols and animal studies.

For standard experiments (no high-throughput measurements) recorded in spreadsheet files, samples/observations/replicates should be entered in rows, features/characteristics in columns. If multiple measurements per sample have been made (e.g. time series), each measurement should go into a separate row and an identifier variable for samples should be included. Column names should not contain any special characters. If measurements are coded, a legend must be provided. Dates should all be in the same format. If your data must be updated or corrected during the analysis, please provide an updated file without changing column names, formats etc. Information supplied by highlighting, coloring or any other type of formatting cannot be imported and used for the analysis.
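The layout rules above can be checked before a file is sent to us. The following is a minimal sketch in Python, assuming the spreadsheet has been exported as CSV. The column names, the example values, the date format and the function name `check_layout` are hypothetical and only illustrate the idea; they are not a required template.

```python
import csv
import io
import re
from datetime import datetime

# Hypothetical example in "long" layout: one row per measurement, with a
# sample identifier column so repeated measurements stay linked.
EXAMPLE_CSV = """sample_id,treatment,time_h,viability,measured_on
S01,control,0,0.98,2023-05-02
S01,control,24,0.95,2023-05-03
S02,drug_A,0,0.97,2023-05-02
S02,drug_A,24,0.61,2023-05-03
"""

# Assumed convention: a name starts with a letter and contains only
# letters, digits and underscores.
NAME_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def check_layout(text, id_column, date_column, date_format="%Y-%m-%d"):
    """Return a list of human-readable problems found in a CSV export."""
    problems = []
    reader = csv.DictReader(io.StringIO(text))
    columns = reader.fieldnames or []
    for name in columns:
        if not NAME_PATTERN.match(name):
            problems.append(f"column name with special characters: {name!r}")
    if id_column not in columns:
        problems.append(f"missing sample identifier column: {id_column!r}")
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        value = row.get(date_column) or ""
        try:
            datetime.strptime(value, date_format)
        except ValueError:
            problems.append(f"line {line_no}: date {value!r} is not {date_format}")
    return problems


print(check_layout(EXAMPLE_CSV, "sample_id", "measured_on"))
```

An empty list means that none of the three checks (column names, identifier column, uniform date format) found a problem. Coding legends and formatting-based information still have to be reviewed by hand.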

The DKFZ provides SPSS and SigmaPlot for standard analysis in a user-friendly environment. GraphPad Prism is another user-friendly statistical software package frequently used at the DKFZ, but without a campus-wide license. The Genomics and Proteomics Core Facility provides bioinformatics tools for conducting standard microarray/sequencing analysis, such as Chipster and IPA. Our division generally uses R/Bioconductor and SAS for power/sample-size estimations.

We consider reproducible research to be essential for scientific work. For this reason, we prepare our analyses in R/Bioconductor in combination with Sweave/knitr so that results, figures and tables can be reproduced. If requested, we can also provide stand-alone analysis scripts that can be used to reproduce results and can be submitted along with your manuscript.

We encourage PhD candidates and their supervisors to contact us whenever they need statistical advice on their experimental design, the methods to use, the correct application of statistical software, or the proper interpretation of results. We normally expect PhD candidates to perform the statistical analyses for their theses themselves. Of course, in case of a more complex analysis requiring advanced statistical knowledge and/or software expertise we will provide the necessary support.

Please email the division of Biostatistics at biostatistics-consulting(at)dkfz.de and briefly describe your experiment/question and your aim.

Statistics Courses

The division of Biostatistics offers three consecutive statistics lecture series starting every summer semester. The aim of the courses is to enable the participants to perform simple analyses by themselves, to recognize when professional statistical advice is needed and to facilitate cooperation between researchers and the division of Biostatistics. The topics covered are chosen according to the needs of researchers at the DKFZ. For details about dates and location please visit the Training Portal (for DKFZ employees on the intranet), the Heidelberg University Lecture Index, or contact the division of Biostatistics.

"Basic Principles of Biostatistics": lecture series for researchers and PhD students in the biological or clinical sciences without prior knowledge of statistics.

Topics:

Descriptive statistics: plots, measures of location and spread
Confidence intervals
Statistical hypothesis testing, p-value, etc.
Statistical tests for quantitative data, e.g., t-test
Statistical tests for qualitative data, e.g., chi-square test
Correlation and regression
Study design

This lecture series accompanies "Basic Principles of Biostatistics" and shows how the methods introduced there are coded in R. Participants should have a working R installation on their computers and should be familiar with the statistical concepts covered in the "Basic Principles of Biostatistics" lecture.

"Advanced Topics in Biostatistics": team-taught lecture series by members of the division of Biostatistics for researchers and PhD students in the biological or clinical sciences with basic knowledge of statistics.

Topics:

Analysis of Variance
Non-parametric methods
Multiple linear regression
Logistic regression
Linear mixed models
Dose-response modeling
Diagnostic tests
Measuring agreement
Survival analysis: Kaplan-Meier curves, logrank tests, Cox PH regression
Variable selection in regression
Design of clinical trials
Multiple Testing
Introduction to Bayesian thinking

This lecture series accompanies "Advanced Topics in Biostatistics" and shows how the methods introduced there are coded in R. Participants should have some basic R programming skills, including the ability to use the basic statistical methods shown in the "Basic Principles of Biostatistics" course.

In addition to the courses organized by the division of Biostatistics, the Advanced Training department of the DKFZ also offers programming courses in R and SAS, and the Genomics and Proteomics Core Facility at DKFZ offers courses on specific data analysis tools for high-throughput genomics data. DKFZ employees, please visit the Training Portal for further information.

Research Topics

The Division of Biostatistics currently focuses on several research topics:

This research area deals with innovative methods for clinical trial designs and evaluation strategies for clinical data. Motivated by our involvement in a multitude of clinical trials in all phases, we develop methods for the design and analysis of clinical trials, in both the frequentist and the Bayesian framework.

Further information

Software

Bayesian design for phase II trials
The WebApp BDP2 provides a workflow to determine design parameters for a multi-stage single-arm phase II trial with a binary endpoint. Declaration of efficacy and futility is based on the Bayesian posterior distribution. The app is based on the R package BDP2, which is available from CRAN.
For details see:
Kopp‐Schneider, A., Wiesenfarth, M., Witt, R., Edelmann, D., Witt, O., & Abel, U. (2019). Monitoring futility and efficacy in phase II trials with Bayesian posterior distributions - A calibration approach. Biometrical Journal, 61(3), 488-502.
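To illustrate the general idea of posterior-based monitoring with a binary endpoint, here is a minimal sketch in Python. It is not the BDP2 implementation or its API. The uniform Beta(1, 1) prior, the reference response rates, the cut-offs and the function names are all assumptions chosen for illustration.

```python
from math import comb


def posterior_prob_greater(successes, n, threshold):
    """P(p > threshold | data) under a uniform Beta(1, 1) prior.

    The posterior is Beta(successes + 1, n - successes + 1). For integer
    parameters its upper tail equals a binomial CDF:
    P(p > t) = P(Binomial(n + 1, t) <= successes).
    """
    m = n + 1
    return sum(
        comb(m, j) * threshold**j * (1 - threshold) ** (m - j)
        for j in range(successes + 1)
    )


def interim_decision(successes, n, p_uninteresting=0.1, p_target=0.3,
                     c_futility=0.05, c_efficacy=0.95):
    """Hypothetical decision rule for one interim look (all values assumed).

    Stop for futility if the posterior probability of reaching the target
    rate is small; declare efficacy if the posterior probability of
    exceeding the uninteresting rate is large; otherwise continue.
    """
    if posterior_prob_greater(successes, n, p_target) < c_futility:
        return "stop for futility"
    if posterior_prob_greater(successes, n, p_uninteresting) >= c_efficacy:
        return "declare efficacy"
    return "continue"


for responders in (0, 2, 8):
    print(responders, "of 10 responders:", interim_decision(responders, 10))
```

In a real design the cut-offs would be calibrated so that the trial has acceptable operating characteristics, which is the subject of the publication cited above.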

Sample size calculation for modifications of Simon's two-stage design
The R package hctrial can be used to calculate the sample size for modifications of Simon's two-stage design that allow for stratification and the incorporation of historical controls.
For details see:
Edelmann, D., Habermehl, C., Schlenk, R. F., & Benner, A. (2020). Adjusting Simon's optimal two‐stage design for heterogeneous populations based on stratification or using historical controls. Biometrical Journal, 62(2), 311-329.
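For readers unfamiliar with the unmodified design, the following Python sketch computes the operating characteristics of a standard Simon two-stage design. It does not use hctrial and covers neither stratification nor historical controls. The function names are our own for this illustration, and the design parameters in the usage example (r1/n1 = 1/10, r/n = 5/29 for response rates 0.1 versus 0.3) are an assumed textbook-style example.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); zero if k is negative."""
    return sum(binom_pmf(j, n, p) for j in range(0, k + 1))


def simon_operating_characteristics(r1, n1, r, n, p):
    """Operating characteristics at true response rate p.

    Stage 1: enrol n1 patients and stop if at most r1 respond.
    Stage 2: enrol up to n patients in total and call the treatment
    promising if more than r respond overall.
    Returns (probability of early termination, probability of calling
    the treatment promising, expected sample size).
    """
    n2 = n - n1
    pet = binom_cdf(r1, n1, p)
    prob_promising = sum(
        binom_pmf(x1, n1, p) * (1 - binom_cdf(r - x1, n2, p))
        for x1 in range(r1 + 1, n1 + 1)
    )
    expected_n = n1 + (1 - pet) * n2
    return pet, prob_promising, expected_n


# Under the uninteresting rate, prob_promising is the type I error;
# under the target rate, it is the power.
print(simon_operating_characteristics(1, 10, 5, 29, p=0.1))
print(simon_operating_characteristics(1, 10, 5, 29, p=0.3))
```

A sample size search would loop over candidate values of (r1, n1, r, n) and keep the designs that satisfy the error constraints; the sketch only evaluates one given design.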

Sample size determination for diagnostic studies
The WebApp SampleSizeDiagnosticTest can be used to estimate the sample size for a study that aims to test whether the performance of a diagnostic test is sufficient in terms of the false positive fraction (1 minus specificity) and the true positive fraction (sensitivity).
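As a rough illustration of this kind of calculation, the Python sketch below finds the number of diseased subjects needed to show that sensitivity exceeds a minimally acceptable value, using a one-sided exact binomial test. This is one possible approach under our own assumptions and not necessarily the method implemented in the WebApp; the function names and the example values (0.7 versus 0.9) are hypothetical. The same calculation can be applied to non-diseased subjects for specificity.

```python
from math import comb


def upper_tail(c, n, p):
    """P(X >= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c, n + 1))


def exact_test_power(n, p_null, p_alt, alpha=0.05):
    """Power and attained level of the one-sided exact test of H0: p <= p_null.

    The critical value is the smallest c with P(X >= c | p_null) <= alpha.
    """
    for c in range(n + 2):  # c = n + 1 gives level 0, so the loop always returns
        level = upper_tail(c, n, p_null)
        if level <= alpha:
            return upper_tail(c, n, p_alt), level


def smallest_n(p_null, p_alt, alpha=0.05, power=0.8, n_max=1000):
    """First n at which the exact test reaches the requested power.

    Power of exact tests is not monotone in n, so slightly larger n can
    fall below the target again; this sketch ignores that.
    """
    for n in range(1, n_max + 1):
        achieved, _ = exact_test_power(n, p_null, p_alt, alpha)
        if achieved >= power:
            return n
    return None


n_diseased = smallest_n(p_null=0.7, p_alt=0.9)
print(n_diseased, exact_test_power(n_diseased, 0.7, 0.9))
```

The total number of subjects to recruit then depends on the expected prevalence of the disease in the study population.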

Working group "Statistics for Translational Oncology"

The working group helps to bridge research on molecular data and new therapeutic options for cancer patients ("The Bridge", painted by Deborah Kunz, age 7).

One main focus is the exploitation of high-dimensional molecular data to improve the understanding of carcinogenesis and the prediction of disease progression and treatment outcome. In the era of precision medicine, another area of focus is the search for prognostic biomarkers associated with disease progression and treatment outcome and for predictive genetic and genomic factors, i.e. the identification of biologically defined patient subgroups that benefit from a specific treatment or that are susceptible to serious adverse events due to their genomic profile. Another research topic is the development and validation of statistical methods for classification, prognosis and prediction using high-dimensional data. Further, we develop data-driven model selection strategies in the framework of more complex multi-state models that incorporate molecular data to capture pathogenic disease processes and underlying etiologies more precisely.

In addition to our methodological research, we also contribute to transferring research results from experimental and observational data into clinical practice. We collaborate on clinical trials and other forms of clinical research to convert the knowledge gained in basic research into effective clinical applications.

For example, we support several trials of the NCT Precision Medicine in Oncology (PMO) program, which has been established at the NCT Heidelberg. One of them is the NCT-PMO-1602 phase II study CRAFT - Continuous ReAssessment With Flexible ExTension in Rare Malignancies.

We participate in the high-dimensional data topic group of the STRATOS (STRengthening Analytical Thinking for Observational Studies) initiative [Sauerbrei et al. 2014]. The main goal of STRATOS is to provide guidance for the design and analysis of studies with observational data.

Furthermore, we are involved in the HARMONY alliance, a European public-private partnership in hematology including hospitals, research institutes, patient organizations, pharmaceutical and IT companies. The primary aim of the alliance is to use big data to improve outcomes for patients with blood cancers.

For a broader overview of the projects we are or have been involved in, have a look at some of our long-term collaborations.

Collaborations

We collaborate with many researchers within and outside of DKFZ. We provide support for experimental design and perform statistical analyses tailored to the specific scientific question. Examples of major collaborations are:

The German-speaking Myeloma-Multicenter Group (GMMG) conducts active research to improve treatment methods for multiple myeloma. Over the course of our long-term collaboration with the GMMG, several treatment modifications have been or will be implemented in the German health system. The current standard of care for patients with newly diagnosed multiple myeloma includes chemo-combination therapy followed by autologous stem cell transplantation (ASCT). The analysis of the GMMG-MM5 trial showed that patients aged 65 to 70 years benefit from stem cell transplantation in the same way as patients aged 65 years or younger, without additional safety risks [Mai EK, Miah K et al. 2021]. As a consequence, statutory health insurance providers may now cover the cost of ASCT for multiple myeloma patients up to the age of 70 years (cf. https://gmmg.info/atp/). In addition, part 1 of the randomized phase III study GMMG-HD7 showed that the addition of a novel immunotherapy agent significantly reduces the risk of detecting residual disease in the bone marrow, which is a surrogate for prolonged progression-free survival. Based on the results of this investigator-initiated trial (IIT), regulatory approval will be sought for the monoclonal antibody [Goldschmidt et al. 2022]. Furthermore, the test for free light chains in the blood has so far been an easy-to-use diagnostic tool for predicting tumor activity. It has now been shown that the normalization of free light chains has a prognostic impact on progression-free survival, allowing an individualized therapy for this subgroup of responding patients [Klein EM, Tichy D et al. 2021].

The German-Austrian AML Study Group (AMLSG) is one of the world's largest study groups for the research and treatment of AML. It initiates a number of innovative national and interventional clinical trials and runs the AMLSG BiO Registry Study, which recruits around 1,500 newly diagnosed AML patients annually. All patients included in the AMLSG BiO Registry Study agree to systematic central biobanking and undergo in-depth molecular and genetic diagnostics, which allow for prestigious translational research projects that are published in high-impact journals. Members of the working group have been responsible statisticians in the clinical trials since the study group was founded in 2003 and support many of the accompanying research projects.

In cooperation with the Section of Allogeneic Stem Cell Transplantation at Heidelberg University Hospital we investigate the usefulness of EASIX (Endothelial Activation and Stress Index) as a prognostic and predictive biomarker for several diseases and endpoints. For instance, we illustrate the prognostic and predictive value of EASIX for time-to-sepsis, the effectiveness of statin-based prophylaxis for non-relapse mortality in different EASIX subgroups and the prognostic value of EASIX for severe complications after CAR-T cell therapy.

Goldschmidt et al. Addition of isatuximab to lenalidomide, bortezomib, and dexamethasone as induction therapy for newly diagnosed, transplantation-eligible patients with multiple myeloma (GMMG-HD7): part 1 of an open-label, multicentre, randomised, active-controlled, phase 3 trial. Lancet Haematology 9(11): e810-e821 (2022). doi: 10.1016/S2352-3026(22)00263-0
Klein EM, Tichy D et al. Prognostic Impact of Serum Free Light Chain Ratio Normalization in Patients with Multiple Myeloma Treated within the GMMG-MM5 Trial. Cancers 13(19): 4856 (2021). doi: 10.3390/cancers13194856
Mai EK, Miah K et al. Bortezomib-based induction, high-dose melphalan and lenalidomide maintenance in myeloma up to 70 years of age. Leukemia 35(12): 3636 (2021). doi: 10.1038/s41375-021-01357-4
Sauerbrei W et al. STRengthening analytical thinking for observational studies: the STRATOS initiative. Stat Med 33(30): 5413-5432 (2014).
