Data Jamboree - Machine Learning

Machine learning is a method that generates artificial knowledge from experience. IT systems recognize patterns and regularities in datasets and use them to make predictions for previously unseen data. Machine learning is already used in many areas of daily life, such as internet search engines and speech recognition, and it is becoming an increasingly important tool in health care, where large amounts of data are generated.

During clinical routine, data are collected for each patient, such as a description of the course of disease, X-ray images and examination results. These are discussed by a panel of experts from several disciplines, the so-called tumor board, to find a suitable therapy. The participation of physicians from different areas of expertise ensures a high quality of the subsequent treatment. If it were possible to predict a promising therapy from patient data alone by means of machine learning, this would not only save physicians a lot of time, but would also greatly support the decision-making process for a particular treatment method.

This is the vision Sophia Stahl-Toyota and her colleagues are pursuing. "We at DKFZ have the opportunity to work with a large amount of real patient data. However, the analysis potential is far from being exhausted," says the scientist. For this reason, Stahl-Toyota's group has teamed up with a DAX-listed company in the region that has great expertise in machine learning but lacks data to apply its knowledge.

In May 2019, both areas, machine learning and patient data, were brought together at the DKFZ in an attempt to predict the therapeutic recommendations of the Heidelberg molecular tumor board by means of machine learning. At this event, the so-called Data Jamboree, a team of clinicians, bioinformaticians and researchers from the DKFZ met with machine-learning experts for a three-day workshop. Because the experts were invited on site, the data never left the premises of the DKFZ. Likewise, all data processing was carried out in the DKFZ's own data centers. "This way, we were able to comply with the strict data protection regulations for these sensitive data and were still able to work with external, subject-specific experts on our common project," reported Stahl-Toyota, who led the workshop and prepared it together with Ilona Binenbaum and Analie Pascoe-Perez.

She says that a first meeting was already held in March to discuss possible use cases and the content of the data. In the following weeks, the DKFZ produced short anonymized sample files. These were sent to the cooperation partners and formed the basis for discussion during weekly telephone conferences. In return, the employees of the DAX-listed company provided a list of requirements and a script so that the hardware and software could be prepared for the workshop.

For the Jamboree itself, the MITRO team provided the real data of about 1000 MASTER patients with the help of bioinformaticians and clinicians. In the MASTER (Molecularly Aided Stratification for Tumor Eradication Research) study, the molecular tumor board assigns younger adults with advanced-stage cancer across all histologies and patients with rare tumors to different therapy baskets, which represent therapy groups like chemotherapy, radiation or surgical removal. Stahl-Toyota and her colleagues wanted to use machine learning algorithms to determine this basket allocation at the workshop.

The goal of the jamboree was to answer the following question: "How accurately can we predict the therapy recommendations by the molecular tumor board on a basket level?"

At the beginning of the workshop, patient data sets were prepared for modeling. "We had to filter and simplify the data from the MASTER study in order to answer our question within a short period of time," explains Stahl-Toyota. In a first step, the genetic changes of the patients were classified into groups. In a second step, the models were trained. For each of the seven "therapy baskets", six different methods were used. On the last day, the level of detail of the input data was increased to take into account the characteristics of different types of gene variation. "The jamboree was a great success. The best results arose when we worked with the real data and did not look at a PowerPoint slide," concludes Stahl-Toyota. "Only in this way can we improve the algorithm and add value for patients."
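For readers who want a concrete picture of the workflow described above, the following is a minimal Python sketch, not the code used at the workshop. It assumes that grouping genetic changes means mapping each altered gene to a coarser group, and that each basket is treated as its own yes/no prediction task. All gene, group and basket names are placeholders, and the article does not name the six methods, so a trivial majority-vote baseline stands in for a real model.

```python
# Hypothetical illustration only: gene, group and basket names are placeholders,
# not taken from the MASTER study.
GENE_TO_GROUP = {"GENE_A": "group_1", "GENE_B": "group_1", "GENE_C": "group_2"}
GROUPS = sorted(set(GENE_TO_GROUP.values()))


def encode_patient(altered_genes):
    """Step 1: turn a patient's altered genes into one 0/1 feature per group."""
    hit = {GENE_TO_GROUP[g] for g in altered_genes if g in GENE_TO_GROUP}
    return [1 if group in hit else 0 for group in GROUPS]


def basket_labels(recommendations, baskets):
    """Build one binary target list per basket (1 = basket was recommended)."""
    return {b: [1 if b in recs else 0 for recs in recommendations] for b in baskets}


def majority_baseline(y_train):
    """Stand-in 'method': always predict the more frequent label (ties -> 1)."""
    value = 1 if 2 * sum(y_train) >= len(y_train) else 0
    return [value] * len(y_train)


def accuracy(y_true, y_pred):
    """Fraction of patients for whom the prediction matches the recommendation."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    patients = [["GENE_A", "GENE_C"], ["GENE_B"], ["GENE_X"]]
    recommendations = [{"basket_1"}, {"basket_2"}, {"basket_1", "basket_2"}]
    features = [encode_patient(p) for p in patients]  # inputs a real model would use
    labels = basket_labels(recommendations, ["basket_1", "basket_2"])
    # Step 2: evaluate every method on every basket.
    for basket, y in labels.items():
        print(basket, accuracy(y, majority_baseline(y)))
```

In a real setting, the baseline would be replaced by actual classifiers that use the encoded features, and accuracy would be measured on patients held out from training.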

© dkfz.de
