Planning and Control using Model-Based Reinforcement Learning

This project took place in the winter term 2020. You CANNOT apply to this project anymore!

Results of this project are explained in detail in the final documentation and presentation.

  • Sponsored by: PreciBake GmbH
  • Project Leader: Dr. Ricardo Acevedo Cabra
  • Scientific Lead: M.Sc. Mathias Sundholm, M.Sc. Hamdi Belhassen
  • TUM Co-Mentor: PhD candidate Michael Rauchensteiner
  • Term: Winter Semester 2020

PreciBake is a company based in Munich, New York and Mumbai that develops AI solutions for the food-tech and baking industries. Our AI team is continuously developing and improving our ML algorithms for tasks such as image classification, object detection and tracking.

Supervised models have proven very effective for specialized tasks such as image classification and object detection. Real-world systems, however, are often much larger and more complex, and may consist of several software and machine learning components working together. Such systems are hard to optimize in a supervised fashion, as there might not be a known mapping between observations and optimal actions. Reinforcement learning agents, on the other hand, are more flexible and do not rely on dense ground-truth labels in order to learn. Instead, they adapt their behavior to maximize a well-defined reward function.

In practice, training reinforcement learning agents for real systems is complicated: in order to learn, the agent needs to interact with the environment, which might be neither safe nor feasible. A model-based agent instead uses an internal model of the environment to predict the consequences of its future actions, which can improve both sample efficiency and AI safety compared with model-free agents. Using this model, the agent can plan a sequence of actions, select the next action based on the most promising predicted trajectory, and discard trajectories that would lead to failure.
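To make the planning idea above concrete, here is a minimal sketch of random-shooting model-predictive control, one simple way to plan with a model. It is an illustration only, not the method developed in this project. The dynamics model `toy_model`, the reward `toy_reward`, the failure check `is_failure`, the planner `plan_next_action` and all parameters (horizon, number of candidates, failure limit) are hypothetical. In a real model-based agent the model would be learned from interaction data rather than known in advance.

```python
import random
from typing import Callable, Optional


# Hypothetical stand-in for a learned dynamics model: a 1-D point mass whose
# next position is the current position plus the chosen action.
def toy_model(state: float, action: float) -> float:
    return state + action


# Hypothetical reward: the closer the predicted state is to the target 0, the better.
def toy_reward(state: float) -> float:
    return -abs(state)


# Hypothetical failure condition: leaving the safe region [-limit, limit].
def is_failure(state: float, limit: float = 5.0) -> bool:
    return abs(state) > limit


def plan_next_action(
    state: float,
    model: Callable[[float, float], float],
    reward: Callable[[float], float],
    failure: Callable[[float], bool],
    horizon: int = 5,
    num_candidates: int = 200,
    rng: Optional[random.Random] = None,
) -> float:
    """Random-shooting planner: sample action sequences, roll each out through
    the model, discard those predicted to fail, and return the first action of
    the sequence with the highest predicted return (0.0 if all of them fail)."""
    rng = rng if rng is not None else random.Random()
    best_return = float("-inf")
    best_first_action = 0.0
    for _ in range(num_candidates):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        s, total, failed = state, 0.0, False
        for a in actions:
            s = model(s, a)  # predict the consequence of the action
            if failure(s):
                failed = True
                break
            total += reward(s)
        if failed:
            continue  # discard trajectories that would lead to failure
        if total > best_return:
            best_return, best_first_action = total, actions[0]
    return best_first_action


if __name__ == "__main__":
    rng = random.Random(0)
    state = 3.0
    # Receding horizon: execute only the first planned action, then re-plan.
    for step in range(10):
        action = plan_next_action(state, toy_model, toy_reward, is_failure, rng=rng)
        state = toy_model(state, action)  # here the "real" environment equals the model
        print(f"step {step}: action={action:+.2f}, state={state:+.2f}")
```

Executing only the first action and re-planning at every step lets the agent correct for model errors as new observations arrive. More sample-efficient planners, such as the cross-entropy method, refine the sampling distribution iteratively instead of sampling actions uniformly.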

The goal of the project is to develop agents that are capable of planning and of estimating the consequences of their future actions, and that can therefore adapt to new or changing environments.