Semi-Supervised Labeling of Data with Varying Distributions

  • Sponsored by: PreciBake GmbH 
  • Scientific Lead: Mathias Sundholm
  • Project Lead: Dr. Ricardo Acevedo Cabra
  • Term: Winter Semester 2018

PreciBake, a Munich-based company with offices in New York and Mumbai, develops AI solutions for the food and baking industry. Our AI team continuously develops and improves our ML algorithms for tasks such as image classification, object detection, and tracking.

A challenge often faced with real-world data sets is that they have uneven data distributions with respect to properties such as class labels, lighting conditions, or camera angles. Maintaining large, clean, and uniform data sets becomes increasingly time-consuming and often requires expert knowledge. An additional challenge is that models easily overfit to particular environments or illuminations. Hence, models trained in a controlled lab environment might perform worse once deployed in the real world. Deployed models therefore require regular maintenance (retraining on newly labeled data) to maintain good classification performance for our systems.

We propose to use recently introduced semi-supervised deep learning methods to keep the labeled data set small, clean, and maintainable while retaining good classification performance. In this project you will work on improving the classification performance for identifying different baking products. The ultimate goal is to use a large amount of unlabeled data to improve model accuracy and robustness. Such algorithms have the potential to generalize better across changing environments and to improve themselves automatically over time as new data becomes available.
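The project description does not name a specific semi-supervised method. As a rough illustration of the general idea, the sketch below uses pseudo-labeling (self-training), which is one common approach and an assumption here, not necessarily the method used in the project. A simple nearest-centroid classifier stands in for a deep network, and all function names, the confidence threshold, and the number of rounds are hypothetical.

```python
import numpy as np

def fit_centroids(X, y, n_classes):
    """Return one mean feature vector per class (nearest-centroid classifier).

    Assumes every class has at least one labeled sample.
    """
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])

def predict_proba(centroids, X):
    """Softmax over negative squared distances to each class centroid."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -d2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def pseudo_label(X_lab, y_lab, X_unlab, n_classes, threshold=0.95, rounds=5):
    """Grow the training set with confidently predicted unlabeled samples.

    threshold and rounds are illustrative values, not tuned settings.
    """
    X_train, y_train = X_lab.copy(), y_lab.copy()
    remaining = X_unlab.copy()
    for _ in range(rounds):
        if len(remaining) == 0:
            break
        centroids = fit_centroids(X_train, y_train, n_classes)
        proba = predict_proba(centroids, remaining)
        confident = proba.max(axis=1) >= threshold
        if not confident.any():
            break
        # Treat confident predictions as labels and add them to the training set.
        X_train = np.vstack([X_train, remaining[confident]])
        y_train = np.concatenate([y_train, proba[confident].argmax(axis=1)])
        remaining = remaining[~confident]
    # Refit on the enlarged training set.
    return fit_centroids(X_train, y_train, n_classes), X_train, y_train
```

In a real setting, the feature vectors would come from images of baking products and the classifier would be a neural network. The loop structure (train, predict on unlabeled data, keep confident predictions, retrain) stays the same.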

Results: The results of this project are explained in detail in the final documentation and presentation.