Generation of synthetic segmented medical images for tumor detection

This project took place in the summer term 2022. You CAN NOT apply to this project anymore!

Results of this project are explained in detail in the final report.

About the Chair for Application and Middleware Systems
At the chair we perform research at the intersection of distributed systems and data management. Our topics include distributed machine learning, graph processing, event-based systems and blockchains.

Motivation

Accounting for over 1.7 million deaths in 2018, lung cancer is one of the most common causes of cancer death worldwide. Because tumors are hard to detect with the human eye, computer-aided diagnosis has the potential to support doctors in the early discovery of nodules. Recent studies have shown that neural-network-based object detection models can achieve error rates comparable to those of doctors. Training these models, however, requires massive amounts of images, and each image must carry not only the information whether it contains a tumor (label) but also the precise location of the tumor (segmentation). Since medical images are highly sensitive and hospitals cannot share their data with other parties due to privacy constraints, training data for these models is scarce.

There are three main challenges that hinder the application of recent methods in the medical environment:

  1. Medical images are highly sensitive, and due to data privacy constraints, scans from multiple hospitals usually cannot be combined at a central location to train a joint model.
  2. The acquired images depend strongly on the settings of the scanner used, its manufacturer, and the physique of the patients in a given region, which hinders the transfer and exchange of pre-trained models between hospitals.
  3. The datasets of existing medical images are highly imbalanced and sparse: fortunately, most people who undergo screening turn out not to have a tumor in their chest. We will show in section three that existing open datasets do not provide enough tumorous images to train an object detection model.

Goal of the project

  1. We aim to identify promising approaches to generating synthetic segmented medical images from a theoretical perspective.
  2. We aim to implement and benchmark different annotation methods in an end-to-end pipeline to provide the project partner with a recommendation on which approaches are promising to pursue for an industrial implementation of the task.