Bayesian modeling

Team: Axel Benner, Annette Kopp-Schneider, Maral Saadati, Manuel Wiesenfarth (former members: Ana Corberán-Vallet, Manuela Zucknick)

Bayesian methods combine the likelihood of the observed data with prior beliefs about parameters to obtain posterior inferences. This allows us to formalize learning from external knowledge and from previous experiments by explicitly including this information in the model via prior distributions. Although these methods are often computationally intensive, high-performance parallel computing makes the estimation of very flexible and complex models feasible.
Examples of our areas of interest include Bayesian survival models, spatiotemporal models, methods for integration of multiple sources of molecular data and Bayesian adaptive designs for clinical trials in personalized medicine. 

 

Bayesian variable selection for integrative genomics

Risk prediction based on clinical and molecular information is fundamental in cancer research. In the context of oncological clinical trials, researchers routinely collect genome-wide data from patients for multiple molecular data types. The combined analysis of these data can lead to new insights into the disease biology and improve the performance of prediction models. We propose a Bayesian variable selection model setup for the integration of copy number variation (CNV) information into a gene expression-based logistic regression model. The setup reflects the assumption that genes which show consistent copy number differences between classes are more likely to also show differential gene expression. CNV information is used to weight the prior inclusion probabilities of gene expression variables, giving larger weights to genes located in CNV regions associated with the clinical endpoint. The model is fully probabilistic and all variability observed in the CNV data is propagated through the model, thus avoiding underestimation of the variance of the predictor. In simulations and in applications to cancer data, models that use CNV data show improved performance over models based on gene expression data alone, both in prediction and in the selection of relevant variables.
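As a rough illustration of how external information can shift prior inclusion probabilities, the following Python sketch raises a common baseline probability on the logit scale in proportion to a per-gene CNV association score. The function name, the score scale (0 to 1), the baseline probability and the strength parameter are all hypothetical and chosen for illustration. This is a plug-in simplification, not the fully probabilistic model described above, which propagates the variability of the CNV data.

```python
from math import exp, log


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return log(p / (1 - p))


def inv_logit(z):
    """Inverse of the logit transform."""
    return 1 / (1 + exp(-z))


def weighted_inclusion_probs(cnv_scores, base_prob=0.05, strength=2.0):
    """Prior inclusion probability per gene (hypothetical weighting scheme).

    cnv_scores: assumed association scores in [0, 1], one per gene, where
                larger values mean stronger CNV association with the endpoint.
    base_prob:  assumed prior inclusion probability for a gene with score 0.
    strength:   assumed log-odds increase for a gene with score 1.
    """
    base = logit(base_prob)
    return [inv_logit(base + strength * s) for s in cnv_scores]


if __name__ == "__main__":
    # Three illustrative genes with no, moderate and strong CNV association.
    for score, prob in zip([0.0, 0.5, 1.0], weighted_inclusion_probs([0.0, 0.5, 1.0])):
        print(f"CNV score {score:.1f} -> prior inclusion probability {prob:.3f}")
```

Working on the logit scale keeps every probability strictly between 0 and 1, whatever the strength parameter.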

© dkfz.de

Bayesian models for the development and evaluation of spatiotemporal predictive health surveillance


Over the past decade there has been growing interest in statistical methodology for the early detection of disease outbreaks, which is fundamental to a timely public health response. Unlike testing methods, modeling approaches to spatiotemporal disease surveillance are relatively recent. Yet statistical models allow covariate effects to be estimated and provide better insight into the etiology, spread, prediction and control of diseases. In the first project, Bayesian hierarchical Poisson models were used to describe the space-time behavior of disease under 'normal' conditions. We then introduced the surveillance conditional predictive ordinate as a general Bayesian model-based surveillance technique that allows us to detect small areas of increased disease incidence when spatial data are available. In the second project, we extended this prospective surveillance technique to the multivariate case, since surveillance systems often focus on more than one disease within a predefined study region. Multivariate surveillance techniques that integrate information from multiple diseases improve the sensitivity and timeliness of outbreak detection when changes in disease incidence occur simultaneously for two or more diseases.
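The idea of flagging an area when its newest count is poorly predicted by a model of 'normal' behavior can be sketched for a single area with a conjugate Gamma-Poisson model. This is a deliberately simplified stand-in, not the hierarchical spatiotemporal model or the published surveillance technique: the Gamma prior on the relative risk, its parameters, and the function name are assumptions made for illustration. The function returns the one-step-ahead predictive probability of the new count given earlier data; an unusually small value for a count above expectation would suggest a closer look.

```python
from math import exp, lgamma, log


def predictive_pmf(y_new, e_new, past_counts, past_expected, shape=1.0, rate=1.0):
    """Predictive probability of a new count given past counts in one area.

    Assumed model: y_t ~ Poisson(e_t * theta), theta ~ Gamma(shape, rate),
    where e_t > 0 is the expected count and theta the relative risk.
    The posterior of theta is Gamma(A, B), and the predictive distribution
    of the new count is negative binomial.
    """
    A = shape + sum(past_counts)    # posterior shape
    B = rate + sum(past_expected)   # posterior rate
    log_p = (
        lgamma(A + y_new) - lgamma(A) - lgamma(y_new + 1)
        + A * log(B / (B + e_new))
        + y_new * log(e_new / (B + e_new))
    )
    return exp(log_p)


if __name__ == "__main__":
    past_counts = [4, 6, 5]          # illustrative counts under normal conditions
    past_expected = [5.0, 5.0, 5.0]  # illustrative expected counts
    for y in (5, 12, 20):
        p = predictive_pmf(y, 5.0, past_counts, past_expected)
        print(f"new count {y:2d}: predictive probability {p:.6f}")
```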

Adaptive trial designs for personalized medicine

Bayesian methods can be useful for adaptive clinical trial designs with interim stopping for futility and efficacy. Interim monitoring can be based on Bayesian posterior probabilities (the probability that a treatment response is larger than some threshold given the data so far) or on Bayesian predictive probabilities (the probability of obtaining statistical significance at the future target sample size given the data at the interim analysis). In contrast to frequentist methods, Bayesian methods allow continuous monitoring of results, since posterior inference is not affected by the number and timing of interim analyses. Further, experience from earlier studies and external data can be formally incorporated. In personalized medicine with many small treatment arms, hierarchical borrowing across subgroups can improve efficiency. However, careful elicitation of prior distributions is essential to obtain robust results in all cases.
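The two monitoring quantities can be made concrete for a single-arm trial with a binary response. The sketch below assumes a Beta(a, b) prior with integer parameters on the response rate, which allows an exact calculation with the standard library only. One substitution should be noted: success at the target sample size is defined here as the final posterior probability exceeding a threshold, rather than as frequentist statistical significance. All function names, priors and thresholds are illustrative assumptions.

```python
from math import comb, exp, lgamma


def posterior_prob_exceeds(x, n, p0, a=1, b=1):
    """P(response rate > p0 | x responders among n patients).

    Assumes a Beta(a, b) prior with integer a, b. For integer parameters,
    P(Beta(a', b') > p0) equals P(Binomial(a' + b' - 1, p0) <= a' - 1).
    """
    a_post, b_post = a + x, b + n - x
    m = a_post + b_post - 1
    return sum(comb(m, k) * p0**k * (1 - p0) ** (m - k) for k in range(a_post))


def log_beta(a, b):
    """Logarithm of the beta function."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def predictive_prob_success(x, n, n_max, p0, threshold, a=1, b=1):
    """Probability, given interim data, that the final analysis is a success.

    Success (an assumption of this sketch) means that the posterior
    probability at n_max patients exceeds `threshold`. Future responders
    follow a beta-binomial distribution under the current posterior.
    """
    a_post, b_post = a + x, b + n - x
    remaining = n_max - n
    total = 0.0
    for y in range(remaining + 1):
        pmf = comb(remaining, y) * exp(
            log_beta(a_post + y, b_post + remaining - y) - log_beta(a_post, b_post)
        )
        if posterior_prob_exceeds(x + y, n_max, p0, a, b) > threshold:
            total += pmf
    return total


if __name__ == "__main__":
    # Illustrative interim look: 7 responders among 15 of 30 planned patients.
    print("posterior probability:", posterior_prob_exceeds(7, 15, p0=0.3))
    print("predictive probability:", predictive_prob_success(7, 15, 30, p0=0.3, threshold=0.95))
```

A design might stop for efficacy when the posterior probability is high and for futility when the predictive probability is low; the cut-offs would need to be calibrated by simulation for the trial at hand.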

Software

  • BVS
    MATLAB toolbox for binary Bayesian variable selection (logistic and probit regression) suitable for high-dimensional data, including block MCMC samplers and a parallel tempering algorithm to speed up the sampling process. (http://www.bgx.org.uk/software.html)
  • BVSflex
    R package implementing efficient integrative Bayesian variable selection models for high-dimensional input data from multiple sources. (http://bvsflex.r-forge.r-project.org)

Selected publications

 

  • Corberán-Vallet A and Lawson AB (2011). Conditional predictive inference for online surveillance of spatial disease incidence. Statistics in Medicine, 30(26):3095-3116.
  • Corberán-Vallet A (2012). Prospective surveillance of multivariate spatial disease data. Statistical Methods in Medical Research, 21(5):457-477.
  • Zucknick M, Richardson S (2014). MCMC algorithms for Bayesian variable selection in the logistic regression model for large-scale genomic applications. Technical Report. http://arxiv.org/abs/1402.2713
  • Zucknick M, Saadati M, Benner A (2015). Nonidentical twins: Comparison of frequentist and Bayesian lasso for Cox models. Biometrical Journal, 57(6):959-981. doi: 10.1002/bimj.201400160  
