Multi-Agent Reinforcement Learning for Logistics

This project was offered in the summer term 2020. You can no longer apply to this project!

Results of this project are explained in detail in the final documentation and presentation.

Domain: Decision-making in complex and stochastic logistics systems is a major challenge. To improve resource efficiency, such as the utilization of a railway network or the performance of autonomous robots in a chaotic warehouse, many providers of logistics services rely on heuristics. Because logistics systems operate at such a large scale, even a minimal improvement (e.g., 1%) in the decision-making processes translates into millions in saved costs and a significant reduction of CO2 emissions.

Methods: Artificial Intelligence has achieved phenomenal results in complex decision-making tasks. Techniques like multi-agent reinforcement learning (MARL) surpass human performance in highly complex domains like StarCraft 2. The data science & artificial intelligence department at MaibornWolff cooperates with providers of logistics services to apply these state-of-the-art methods to logistics systems.

Goal: The goal of this project is to optimize the logistics process of a simulated chaotic warehouse (CW). The metric to minimize is the agents’ average number of steps taken to complete transactions. This optimization can be achieved in two ways: through smart modeling of the CW and/or through the implementation of a sophisticated MARL algorithm.
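
As a rough illustration, the evaluation metric could be computed as sketched below. The episode-log structure used here is an assumption for illustration only and does not reflect the actual interface of the provided simulation environment.

```python
# Hypothetical sketch of the evaluation metric: the agents' average number
# of steps needed to complete a transaction. The log structure is assumed.

def average_steps_per_transaction(completed_transactions):
    """completed_transactions: list of dicts with a 'steps' entry,
    one per finished inbound- or outbound-transaction."""
    if not completed_transactions:
        return float("inf")  # nothing finished yet
    total_steps = sum(t["steps"] for t in completed_transactions)
    return total_steps / len(completed_transactions)


# Example usage with made-up numbers:
log = [{"steps": 42}, {"steps": 35}, {"steps": 58}]
print(average_steps_per_transaction(log))  # 45.0
```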

A chaotic warehouse consists of the following components (a minimal data-model sketch follows the list):

  • items
  • staging-in-area (one): items of an inbound-transaction arrive here
  • staging-out-area (one): items of an outbound-transaction need to be delivered here
  • bins (multiple): hold items
  • agents: move through the CW to pick and put items
  • inbound-transactions: specify bundles of items to be picked from the staging-in-area and put into bins 
  • outbound-transactions: specify bundles of items to be picked from bins and delivered to the staging-out-area
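
To make the component list more concrete, here is a minimal sketch of the CW components as Python dataclasses. All names and fields are illustrative assumptions; the provided simulation environment defines its own (likely different) data model.

```python
# Illustrative data model of the chaotic warehouse (CW) components.
# Names and fields are assumptions, not the project's actual code.
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Item:
    item_id: int


@dataclass
class Bin:
    position: Tuple[int, int]              # grid coordinates in the CW
    contents: List[Item] = field(default_factory=list)


@dataclass
class Agent:
    position: Tuple[int, int]
    carrying: List[Item] = field(default_factory=list)


@dataclass
class Transaction:
    items: List[Item]
    direction: str                         # "inbound": staging-in -> bins,
                                           # "outbound": bins -> staging-out
    completed: bool = False
```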

Starting Point: As a participant in this project, you will get access to the implementation of a baseline algorithm and the simulation environment of the CW (Python code). The baseline algorithm is based on Q-learning and uses a deep Q-network (DQN). To meet the goal, you can build on the provided code to develop your solution.
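
For orientation, the following is a minimal sketch of what a DQN-style baseline might look like in PyTorch, assuming a flat state vector per agent and a discrete action space (e.g., move/pick/put actions). Layer sizes, the action space, and the state encoding are assumptions; the provided baseline code defines the actual architecture and training loop.

```python
# Minimal DQN-style sketch (assumptions: flat state vector, discrete actions).
import random
import torch
import torch.nn as nn


class DQN(nn.Module):
    """A small feed-forward network mapping a state to one Q-value per action."""

    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),  # one Q-value per action
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)


def select_action(q_net: DQN, state: torch.Tensor, epsilon: float, n_actions: int) -> int:
    """Epsilon-greedy action selection, as commonly used with Q-learning."""
    if random.random() < epsilon:
        return random.randrange(n_actions)        # explore
    with torch.no_grad():
        return int(q_net(state).argmax().item())  # exploit


# Example usage with made-up dimensions:
q_net = DQN(state_dim=16, n_actions=6)
dummy_state = torch.zeros(16)
action = select_action(q_net, dummy_state, epsilon=0.1, n_actions=6)
```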

What you’ll learn: From playing games to driving cars to controlling fleets of autonomous trains/robots to optimally distributing e-scooters in a city: many problems require decision-making over time in dynamic environments. Throughout this project, you’ll understand how real-world decision-making processes can be modeled and simulated. Furthermore, you’ll learn how AI can be used to learn optimized strategies for these processes.