# Acceleration of Neural Network Training with Microsoft DeepSpeed

This project took place in summer term 2022; you can **no longer** apply to this project!

**Results** of this project are explained in detail in the **final report**.

- Sponsored by: TUM Chair of Aerodynamics and Fluid Mechanics, MDSI Prof. Dr.-Ing. N. Adams.
- Project Lead: Dr. Ricardo Acevedo Cabra
- Scientific Lead: PhD candidate Ludger Paehler
- TUM Co-Mentor: Dr. Ricardo Acevedo Cabra
- Term: Summer semester 2022

**Project Description**

Neural network training is a time- and cost-intensive endeavor for modern neural network architectures: training an extreme-scale natural language processing model such as GPT-3 requires up to 1,287 MWh of energy. To rein in such excessive energy consumption and make neural network training more efficient, approaches such as Microsoft DeepSpeed compile custom compute kernels for the neural network on the available hardware, thereby utilizing its compute capability more effectively. Extending this approach across a network, and combining it with DeepSpeed's auxiliary custom parallelization primitives and reduced-precision optimizers, leads to speed-ups of more than 2x.
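As a rough sketch of what such a setup looks like in code (the config values, model handling, and hyperparameters below are illustrative assumptions, not the project's actual settings), a minimal DeepSpeed configuration might be:

```python
# Minimal DeepSpeed setup sketch. Config values are illustrative
# assumptions, not the project's actual settings.
ds_config = {
    "train_batch_size": 64,
    "fp16": {"enabled": True},          # reduced-precision training
    "zero_optimization": {"stage": 1},  # partition optimizer states
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

def wrap_with_deepspeed(model):
    """Wrap an existing torch.nn.Module for accelerated training."""
    import deepspeed  # deferred: only needed once training starts
    engine, optimizer, _, _ = deepspeed.initialize(
        model=model,
        model_parameters=model.parameters(),
        config=ds_config,
    )
    # During training, engine.backward(loss) and engine.step() replace
    # the usual loss.backward() / optimizer.step() calls.
    return engine, optimizer
```

`deepspeed.initialize` returns an engine that applies mixed precision and ZeRO partitioning transparently, so the same training loop runs on one GPU or many.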

In this project the students will work on accelerating research-level neural networks, specifically a generative adversarial network (GAN) and a graph neural network (GNN) used for the acceleration of smoothed-particle hydrodynamics. We will begin by introducing the DeepSpeed concepts at the start of the project and experimenting with them on a simple GAN to illustrate the core ideas. Afterwards we will advance to the research GAN and accelerate its training with custom-accelerated kernels, which are to be benchmarked and tested on GPU servers. This work will then be extended to GNNs, which are used for surrogate-based acceleration of smoothed-particle hydrodynamics solvers by approximating the time evolution of the particle system. Here we will delve deep into PyTorch Geometric to accelerate the kernels we use, and thereby the network's training.
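To illustrate the core operation such a GNN performs, aggregating information from neighbouring particles, here is a dependency-free sketch of one message-passing step (1-D positions and scalar features are simplifying assumptions; the actual model uses PyTorch Geometric layers on full particle states):

```python
def message_passing_step(positions, features, radius):
    """Average the features of all neighbours within `radius` of each particle.

    1-D positions and scalar features are simplifying assumptions; the real
    surrogate model operates on full particle states via PyTorch Geometric.
    """
    n = len(positions)
    out = []
    for i in range(n):
        msgs = [
            features[j]
            for j in range(n)
            if j != i and abs(positions[j] - positions[i]) <= radius
        ]
        # An isolated particle keeps its own feature.
        out.append(sum(msgs) / len(msgs) if msgs else features[i])
    return out

# Three particles on a line; the third is out of range of the other two.
print(message_passing_step([0.0, 1.0, 5.0], [1.0, 3.0, 10.0], radius=2.0))
# → [3.0, 1.0, 10.0]
```

Stacking several such steps lets information propagate across the particle system, which is how the surrogate approximates the SPH solver's time evolution.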

The work done to accelerate the neural network training is expected to be open-sourced, and upstreamed to help accelerate the training of other neural networks.

**Tools & Datasets**

- Machine Learning Stack:
    - PyTorch
    - Microsoft DeepSpeed
    - TorchScript
    - PyTorch Geometric

**Datasets**:

- GAN Dataset: Schlieren dataset of ~30k segmented samples, which grows to ~300k samples at training time through added random noise that prevents overfitting
- SPH Datasets: Taylor-Green vortex in 2D & 3D, with 8k time series each; more comparable datasets are available if necessary
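The noise-based augmentation mentioned for the GAN dataset could be sketched as follows (Gaussian noise and a factor of ten are assumptions for illustration; the actual augmentation scheme may differ):

```python
import random

def augment(sample, copies=10, sigma=0.01):
    """Return `copies` noisy variants of one sample to enlarge the dataset.

    Gaussian noise with standard deviation `sigma` is an assumed scheme;
    a factor of ten would turn ~30k samples into ~300k at training time.
    """
    return [[x + random.gauss(0.0, sigma) for x in sample] for _ in range(copies)]

noisy = augment([0.5, 0.7, 0.2])
print(len(noisy))  # → 10
```

Because each epoch sees slightly perturbed copies rather than identical samples, the generator cannot simply memorize the training set.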

Students accepted to this project should attend (unless they have proven prior knowledge) online workshops at the LRZ from 19.04. - 22.04.22. More information will be provided to accepted students.