In a large variety of applications, deep learning has proven to be extremely powerful and flexible for highly complicated analytical and predictive tasks. A complete theoretical, in particular mathematical, understanding and explanation of deep neural networks and the associated algorithmic procedures has yet to be found.
The aim of this PhD project is to combine different mathematical approaches from the theory of dynamical systems to provide a new, rigorous framework for machine learning in the context of deep neural networks. The basic idea is to identify a fast and a slow time scale in the learning dynamics. In this way, a deep neural network can be interpreted as a two-timescale dynamical system.
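A minimal sketch of such a two-timescale structure, written with generic symbols that are only illustrative and not fixed by the project description, is the standard fast-slow form

\[
\varepsilon\,\dot{x} = f(x, w), \qquad \dot{w} = g(x, w), \qquad 0 < \varepsilon \ll 1,
\]

where the fast variables x stand for the activations propagating through the network, the slow variables w for the trainable weights, and the small parameter ε encodes the separation between the two time scales.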
The fast dynamics describes the propagation of information through a large adaptive network. To study the fast process, novel integro-partial differential equations are derived and analyzed. In this context, the theory of graph operators, also called graphops, will be used and extended.

The process of adapting the coefficients of the neural network represents the learning procedure and can be interpreted as the slow dynamics. The slow process is studied with a focus on stochastic gradient descent: metastability properties are derived, and the interpretation of the learning iteration as a random dynamical system is exploited.

Finally, the two time scales are combined into a full model of the neural network dynamics. In this model, the interplay between stochastic learning and dynamical robustness can be studied, and the expectation is to find, explain and predict particular patterns and their representational significance. The mathematical tools used include stochastic dynamics, ergodic theory, adaptive networks, graph limits and multiscale dynamics.
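As an illustration of the slow time scale discussed above, one minimal way to view stochastic gradient descent as a random dynamical system, again with generic symbols that are assumptions of this sketch rather than part of the project description, is the random iteration

\[
w_{k+1} = w_k - \eta\, \nabla_w \ell(w_k; \xi_k),
\]

where w_k denotes the network weights at step k, η the learning rate, ℓ the loss, and ξ_k the randomly drawn mini-batch. Composing the random maps w ↦ w − η ∇_w ℓ(w; ξ) over the noise realizations generates a random dynamical system, and its metastable sets correspond to regions of the loss landscape in which the iterates linger for long times before rare transitions.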