Symbolic representation of a DRL-CO pipeline: a neural network estimates node weights, and a CO model uses them to generate an action. The environment returns a reward and the next state based on that action.

Scalable Reinforcement Learning for Industrial Applications

by Heiko Hoppe

While Deep Reinforcement Learning (DRL) has achieved significant success in areas such as robotic control and natural language processing, it still struggles in industrial applications such as inventory planning, vehicle dispatching, and machine scheduling. These difficulties arise mainly from the large, combinatorially structured action spaces of industrial problems, which prevent effective training of DRL algorithms. For example, the action space of an inventory planning problem grows exponentially with the number of items considered: if each of n items can be set to one of k allowed stock levels, there are k^n joint actions.
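The growth of the action space can be illustrated with a short calculation. This is a minimal sketch that assumes each item independently takes one of a fixed number of stock levels; the function name and the example sizes are hypothetical and not taken from the project.

```python
def joint_action_count(num_items: int, levels_per_item: int) -> int:
    """Number of joint actions if every item independently picks one stock level."""
    return levels_per_item ** num_items


# Illustrative sizes only: 10 stock levels per item, growing item counts.
for num_items in (2, 5, 10, 20):
    print(num_items, joint_action_count(num_items, 10))
```

Under this assumption, a flat DRL policy would need one output per joint action, which becomes impractical after only a handful of items.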

To overcome these challenges, we combine DRL and combinatorial optimization (CO) in hybrid architectures. The DRL part of such a pipeline generalizes over states and is a powerful predictor of future dynamics. The CO part addresses the combinatorial structure of the action space and allows for dimensionality reduction: effectively, the CO component maps the lower-dimensional DRL output to the higher-dimensional combinatorial action space.
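The division of labor between the two parts can be sketched as follows. This is an illustrative toy, not the project's implementation: a linear scorer stands in for the neural network, and the CO layer is assumed to be a simple selection problem (choose at most `budget` nodes with maximum total weight). All names, features, and parameters are hypothetical.

```python
from typing import List, Sequence


def score_nodes(features: Sequence[Sequence[float]], params: Sequence[float]) -> List[float]:
    """Stand-in for the neural network: one linear score (weight) per node."""
    return [sum(p * x for p, x in zip(params, f)) for f in features]


def co_layer(weights: Sequence[float], budget: int) -> List[int]:
    """Solve max sum_i w_i * y_i  s.t.  sum_i y_i <= budget, y_i in {0, 1}.

    For this simple constraint, picking the highest positive weights is optimal.
    """
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    chosen = {i for i in ranked[:budget] if weights[i] > 0}
    return [1 if i in chosen else 0 for i in range(len(weights))]


# Hypothetical state: four nodes with two features each.
features = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [1.0, 1.0]]
params = [0.8, -0.3]

weights = score_nodes(features, params)  # low-dimensional output: one number per node
action = co_layer(weights, budget=2)     # combinatorial action: a feasible subset of nodes
print(weights, action)
```

The scorer only has to produce one weight per node, while the CO layer is responsible for turning those weights into a feasible action, which is the dimensionality reduction described above.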

We study several new algorithms for training these pipelines. First, Multi-Agent DRL is a highly adaptable methodology for a range of combinatorial action spaces, including those with both discrete and continuous components. Second, Structured DRL enables differentiation through CO layers using Fenchel-Young losses, incorporating the dynamics of CO into DRL to enhance training stability and performance. Third, Action Space Mappings for DRL can improve the scalability of DRL in high-dimensional action spaces. Using these methodologies, we aim to construct scalable DRL algorithms for a variety of industrial problems.
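To give a flavor of how a Fenchel-Young loss yields gradients through a CO layer, the sketch below uses the perturbed variant, whose gradient with respect to the scores is the expected CO solution under random noise minus a target solution. This is an assumption chosen for illustration: the text does not state which Fenchel-Young variant is used, and the top-k oracle, the fixed target action, and the step size are all hypothetical.

```python
import random
from typing import List, Sequence


def top_k(weights: Sequence[float], k: int) -> List[float]:
    """CO oracle (assumed): indicator vector of the k largest weights."""
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    chosen = set(ranked[:k])
    return [1.0 if i in chosen else 0.0 for i in range(len(weights))]


def perturbed_fy_gradient(theta: Sequence[float], target: Sequence[float], k: int,
                          eps: float = 1.0, samples: int = 200, seed: int = 0) -> List[float]:
    """Monte Carlo estimate of E[top_k(theta + eps * Z)] - target, with Z ~ N(0, I)."""
    rng = random.Random(seed)
    mean = [0.0] * len(theta)
    for _ in range(samples):
        noisy = [t + eps * rng.gauss(0.0, 1.0) for t in theta]
        solution = top_k(noisy, k)
        mean = [m + s / samples for m, s in zip(mean, solution)]
    return [m - t for m, t in zip(mean, target)]


# Hypothetical target action; in a DRL setting it would be derived from the learning signal.
target = [1.0, 0.0, 0.0, 1.0]
theta = [0.0, 0.0, 0.0, 0.0]

for step in range(20):
    grad = perturbed_fy_gradient(theta, target, k=2, seed=step)
    theta = [t - 0.5 * g for t, g in zip(theta, grad)]  # plain gradient descent on the scores

print(theta, top_k(theta, 2))
```

In a full pipeline the gradient would be backpropagated into the network that produces `theta` rather than applied to the scores directly; the sketch only shows that the CO layer itself needs no derivative, just repeated calls to the solver.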

Publications

Hoppe H., Enders T., Cappart Q., Schiffer M. (2024): Global Rewards in Multi-Agent Deep Reinforcement Learning for Autonomous Mobility on Demand Systems. In Proceedings of the 6th Annual Learning for Dynamics & Control Conference (L4DC), Proceedings of Machine Learning Research (PMLR), 242, pp. 260–272.

Conferences

6th Annual Learning for Dynamics & Control Conference (L4DC), 15.07.2024–17.07.2024, Oxford, United Kingdom, conference paper

OR 2024 – International Conference on Operations Research 2024, 03.09.2024–06.09.2024, Munich, Germany, conference presentation

Supervisor

Prof. Dr. Maximilian Schiffer

Business Analytics and Intelligent Systems
