Computer Assisted Medical Interventions

Good scientific practice initiative: Critical review of current practices related to grand challenges


The importance of data science techniques in almost all fields of medicine is increasing at an enormous pace. This holds particularly true for radiology and image-guided interventions, where the automatic analysis of medical images (e.g. for tumor detection, classification, staging and progression modeling) plays a crucial role. While clinical trials are the state-of-the-art method for comparatively assessing the effect of new medication, benchmarking in the field of image analysis is governed by so-called challenges. Challenges are international competitions, hosted for example by individual researchers, institutes or societies, that aim to assess the performance of competing algorithms on identical datasets. They are often published in prestigious journals, such as Nature Methods, and receive a huge amount of attention, with hundreds of citations and thousands of views. Moreover, on platforms such as Kaggle, awarding the winner a significant amount of prize money (up to €1 million) is becoming increasingly common.

Given that validation of algorithms has traditionally been performed on individual researchers' own data sets, this development was a great step forward. On the other hand, the increasing scientific impact of challenges now places a huge responsibility on the challenge hosts who organize and design such competitions. The performance of an algorithm on challenge data is essential not only for the acceptance of a paper and its impact on the community, but also for the individual researchers' scientific careers (e.g. through awards or paper (non-)acceptance) and for the potential translation of algorithms into clinical practice.

In the scope of our research, we developed the hypothesis that there is a huge discrepancy between the importance of biomedical image analysis challenges and their quality (control). In response to this, we formed an international multidisciplinary initiative with partners from about 30 institutes worldwide to bring international challenges to the next level. In an article in Nature Communications (Maier-Hein et al., 2018), we present the first comprehensive evaluation of biomedical image analysis challenges. Our analysis of more than 500 sub-competitions (tasks) demonstrates the high importance of challenges in the field of biomedical image analysis, but also reveals major issues:

  1. Challenge reporting: Common practice in challenge reporting is poor and does not allow for adequate interpretation and reproducibility of results.
  2. Challenge design: Challenge design is very heterogeneous and lacks common standards, although the community calls for them.
  3. Robustness of rankings: Rankings are sensitive to a range of challenge design parameters, such as the metric variant applied, the type of test case aggregation performed and the observer annotating the data (see figure below). The choice of metric and aggregation scheme has a significant influence on the stability of the ranking.
  4. Exploitation of common practice: Security holes in challenge design can potentially be exploited by both challenge organizers and participants to tune rankings, e.g. through selective test case submission by participants or retrospective tuning of the ranking scheme by organizers.
  5. Best practice recommendations: Based on the findings of our analysis and an international survey, we present a list of best practice recommendations and open research challenges.
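
To illustrate why the metric variant matters (point 3), the following minimal Python sketch compares the classic Hausdorff distance (HD) with its 95% variant (HD95) on two toy 2D boundary point sets that differ only by a single stray point. The point sets, the helper names and the HD95 convention used here (95th percentile of the pooled nearest-neighbour distances in both directions, nearest-rank percentile) are assumptions for illustration; real tools compute these metrics on segmentation surfaces with voxel spacing and differ in such details.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def directed_distances(a: Sequence[Point], b: Sequence[Point]) -> List[float]:
    """For each point in a, the Euclidean distance to its nearest point in b."""
    return [min(math.dist(p, q) for q in b) for p in a]


def hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Classic Hausdorff distance: the largest nearest-neighbour distance in either direction."""
    return max(max(directed_distances(a, b)), max(directed_distances(b, a)))


def percentile_nearest_rank(values: Sequence[float], q: int) -> float:
    """q-th percentile (0 < q <= 100) using the nearest-rank method."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]


def hd95(a: Sequence[Point], b: Sequence[Point]) -> float:
    """HD95 (assumed convention): 95th percentile of the pooled distances in both directions."""
    return percentile_nearest_rank(directed_distances(a, b) + directed_distances(b, a), 95)


# Hypothetical toy boundaries: a straight reference contour and a prediction with one outlier.
reference = [(float(x), 0.0) for x in range(20)]
prediction = reference + [(0.0, 50.0)]
print(hausdorff(prediction, reference))  # 50.0
print(hd95(prediction, reference))       # 0.0
```

Because HD takes the maximum, the single outlier dominates it, whereas HD95 discards the largest 5% of distances. A ranking based on one variant rather than the other can therefore reorder algorithms whose errors consist of a few isolated outliers.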

Effect of different ranking schemes (RS) applied to one example MICCAI 2015 segmentation task. Design choices are indicated in the header: RS xy defines the different ranking schemes. The following three rows indicate the metric used (Dice similarity coefficient (DSC), Hausdorff distance (HD) or the 95% variant of the HD (HD95)), the aggregation method (metric-based (aggregate, then rank) or case-based (rank, then aggregate)) and the aggregation operator (mean or median). RS 00 (single-metric ranking with DSC; aggregate with mean, then rank) is considered the default ranking scheme. For each RS, the resulting ranking is shown for algorithms A1 to A13. To illustrate the effect of different RS on single algorithms, A1, A6 and A11 are highlighted.
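
The two aggregation methods named in the caption can be sketched in a few lines of Python. The DSC values below are hypothetical toy numbers for three algorithms on four test cases, not data from the MICCAI 2015 task, and the function names are illustrative. Tied scores within a case share the average rank, which is only one of several possible tie-handling choices.

```python
import statistics
from typing import Callable, Dict, List

Results = Dict[str, List[float]]  # algorithm name -> DSC per test case
Aggregator = Callable[[List[float]], float]  # e.g. statistics.mean or statistics.median

# Hypothetical toy DSC values (higher is better); not data from any real challenge.
DSC: Results = {
    "A": [0.90, 0.90, 0.90, 0.10],  # excellent except for one failed case
    "B": [0.85, 0.85, 0.85, 0.80],
    "C": [0.80, 0.80, 0.80, 0.85],
}


def rank_within_case(scores: Dict[str, float]) -> Dict[str, float]:
    """Rank algorithms on a single test case (1 = best DSC); tied scores share the average rank."""
    ordered = sorted(scores.values(), reverse=True)
    return {
        name: statistics.mean([i + 1 for i, v in enumerate(ordered) if v == s])
        for name, s in scores.items()
    }


def aggregate_then_rank(results: Results, agg: Aggregator) -> List[str]:
    """Metric-based scheme: aggregate each algorithm's DSC over all cases, then rank (descending)."""
    summary = {name: agg(values) for name, values in results.items()}
    return sorted(summary, key=lambda name: summary[name], reverse=True)


def rank_then_aggregate(results: Results, agg: Aggregator) -> List[str]:
    """Case-based scheme: rank algorithms per case, aggregate their ranks, then sort (ascending)."""
    n_cases = len(next(iter(results.values())))
    ranks: Dict[str, List[float]] = {name: [] for name in results}
    for case in range(n_cases):
        case_ranks = rank_within_case({name: vals[case] for name, vals in results.items()})
        for name, r in case_ranks.items():
            ranks[name].append(r)
    summary = {name: agg(r) for name, r in ranks.items()}
    return sorted(summary, key=lambda name: summary[name])


print(aggregate_then_rank(DSC, statistics.mean))    # ['B', 'C', 'A']
print(rank_then_aggregate(DSC, statistics.mean))    # ['A', 'B', 'C']
print(aggregate_then_rank(DSC, statistics.median))  # ['A', 'B', 'C']
```

With these toy values, aggregate-then-rank with the mean (the analogue of RS 00) places A last, whereas rank-then-aggregate with the mean and aggregate-then-rank with the median both place A first: its single failed case (DSC 0.10) pulls down its mean DSC but costs it only one poor per-case rank.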

To address the discrepancy between the impact of challenges and their quality (control), the Biomedical Image Analysis ChallengeS (BIAS) initiative developed a set of recommendations for the reporting of challenges. The BIAS statement aims to improve the transparency of the reporting of a biomedical image analysis challenge regardless of field of application, image modality or task category assessed. A first step toward this goal was the submission of a guideline paper on how to report biomedical challenges, which is currently under review. The document also includes checklists for challenge organizers and journal reviewers covering all relevant items to be reported. The guideline itself was registered with the EQUATOR (Enhancing the QUAlity and Transparency Of health Research) network.
Development of the guideline was initiated by the MICCAI board challenge working group, led by Prof. Dr. Lena Maier-Hein. For further information, please refer to the Biomedical Image Analysis ChallengeS (BIAS) Initiative.


  • Keno März (Developer)
  • Patrick Scholz (Graduate assistant)
  • Marko Stankovic (Master's student)
  • Sebastian Pirmann (Graduate assistant)

Key collaborators


  • Maier-Hein, L., Reinke, A., Kozubek, M., Martel, A.L., Arbel, T., Eisenmann, M., Hanbury, A., Jannin, P., Müller, H., Onogur, S., Saez-Rodriguez, J., van Ginneken, B., Kopp-Schneider, A., Landman, B.A. BIAS: Transparent reporting of biomedical image analysis challenges. Medical Image Analysis (currently in review).
  • Wiesenfarth, M., Reinke, A., Landman, B.A., Maier-Hein, L., Kopp-Schneider, A. Methods and open-source toolkit for analyzing and visualizing challenge results. Medical Image Analysis (currently in review).
  • Maier-Hein, L., Eisenmann, M., Reinke, A., Onogur, S., Stankovic, M., Scholz, P., Arbel, T., Bogunovic, H., Bradley, A.P., Carass, A., Feldmann, C., Frangi, A.F., Full, P.M., van Ginneken, B., Hanbury, A., Honauer, K., Kozubek, M., Landman, B.A., März, K., Maier, O., Maier-Hein, K., Menze, B.H., Müller, H., Neher, P.F., Niessen, W., Rajpoot, N., Sharp, G.C., Sirinukunwattana, K., Speidel, S., Stock, C., Stoyanov, D., Taha, A.A., van der Sommen, F., Wang, C.-W., Weber, M.-A., Zheng, G., Jannin, P., Kopp-Schneider, A. Why rankings of biomedical image analysis competitions should be interpreted with care. Nature Communications (2018).
  • Reinke, A., Eisenmann, M., Onogur, S., Stankovic, M., Scholz, P., Full, P.M., Bogunovic, H., Landman, B.A., Maier, O., Menze, B., Sharp, G.C., Sirinukunwattana, K., Speidel, S., van der Sommen, F., Zheng, G., Müller, H., Kozubek, M., Arbel, T., Bradley, A.P., Jannin, P., Kopp-Schneider, A., Maier-Hein, L. How to exploit weaknesses in biomedical challenge design and organization. In: International Conference on Medical Image Computing and Computer-Assisted Intervention (2018).

Keynote and Invited Talks

03/2020: European Congress of Radiology (ECR) - Session: Artificial intelligence and translations to clinical practice, Title: “Challenges to objectively compare performance of AI applications”

02/2020: Workshop on data curation, standardisation and algorithm benchmarking, Title: “Biomedical image analysis challenges: Statistics, new developments and surprising insights”

10/2019: MICCAI - Workshop on Large-Scale Annotation of Biomedical data and Expert Label Synthesis

09/2018: MICCAI - Main conference oral presentation, Title: “How to Exploit Weaknesses in Biomedical Challenge Design and Organization”

09/2017: MICCAI - Tutorial on designing benchmarks and challenges for measuring algorithm performance in biomedical image analysis, Title: “Review of Biomedical Image Analysis Challenges”

09/2017: MICCAI - Endoscopic vision challenge, Title: “10 years of MICCAI grand-challenges - successes and failures”

