Acceleration of Neural Network Training with Microsoft DeepSpeed

This project took place in the summer term of 2022; applications are no longer possible.

Results of this project are explained in detail in the final report.

Video describing the Acceleration of Neural Network Training project

Sponsored by: TUM Chair of Aerodynamics and Fluid Mechanics, MDSI Prof. Dr.-Ing. N. Adams.
Project Lead: Dr. Ricardo Acevedo Cabra
Scientific Lead: PhD candidate Ludger Paehler
TUM Co-Mentor: Dr. Ricardo Acevedo Cabra
Term: Summer semester 2022

Illustration describing the Acceleration of Neural Network Training project

Project Description
Training modern neural network architectures is time- and cost-intensive, with the training of extreme-scale natural language processing models such as GPT-3 requiring up to 1287 MWh of energy. To rein in such excessive energy consumption and make neural network training more efficient, approaches such as Microsoft DeepSpeed compile custom compute kernels for the neural network on the existing hardware, thereby using its compute capability more effectively. Extending this approach across a network and combining it with auxiliary custom parallelisation primitives and reduced-precision optimizers leads to speed-ups of more than 2x.
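As a rough illustration of how reduced precision and parallelisation are typically switched on in DeepSpeed, the following minimal sketch builds a configuration dictionary. The keys are standard DeepSpeed configuration options, but every value (batch size, learning rate, ZeRO stage) is an assumption chosen for illustration and not a setting used in this project.

```python
import json

# Illustrative DeepSpeed configuration. All values are assumptions for this
# sketch, not the settings used in the project.
ds_config = {
    "train_batch_size": 64,                # global batch size across all GPUs
    "optimizer": {
        "type": "Adam",
        "params": {"lr": 1e-4},
    },
    "fp16": {"enabled": True},             # reduced-precision (half) training
    "zero_optimization": {"stage": 2},     # partition optimizer state and gradients
}

# Serialise the configuration so it can be stored next to the training script.
config_text = json.dumps(ds_config, indent=2)
print(config_text)

# With PyTorch and DeepSpeed installed, a model would then be wrapped roughly
# like this (not executed here; `model` is a hypothetical torch.nn.Module):
#
#   import deepspeed
#   engine, optimizer, _, _ = deepspeed.initialize(
#       model=model,
#       model_parameters=model.parameters(),
#       config=ds_config,
#   )
```

The training loop would then call the returned engine for the forward pass, backward pass and optimizer step instead of the plain PyTorch model and optimizer.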

In this project, the students will work on the acceleration of research-level neural networks, specifically a generative adversarial network (GAN) and a graph neural network (GNN) for the acceleration of smoothed-particle hydrodynamics. We will begin by introducing the DeepSpeed concepts and experimenting with them on a simple GAN to illustrate the core ideas. Afterwards, we will advance to the research GAN and accelerate it with custom kernels, which are to be benchmarked and tested on GPU servers. This work will then be extended to GNNs, which are used for surrogate-based acceleration of smoothed-particle hydrodynamics solvers by approximating the time evolution of the particle system. Here we will delve deep into PyTorch Geometric to accelerate the kernels we use and, with them, the network’s training.
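To make the idea of a graph-based surrogate for a particle system more concrete, the following sketch shows the two basic ingredients in plain Python: connecting particles within a cutoff radius, and aggregating information from neighbours before advancing the system by one time step. It is a toy stand-in under stated assumptions, not the project's model: the function names, the `strength` parameter and the fixed update rule are hypothetical, and a real GNN would replace the fixed rule with learned functions. PyTorch Geometric provides a comparable `radius_graph` utility for tensors.

```python
import math


def radius_graph(positions, cutoff):
    """Return directed edges (i, j) for all particle pairs closer than cutoff."""
    edges = []
    for i, p in enumerate(positions):
        for j, q in enumerate(positions):
            if i != j and math.dist(p, q) < cutoff:
                edges.append((i, j))
    return edges


def aggregate_messages(velocities, edges):
    """For every particle, average the relative velocity of its neighbours."""
    dim = len(velocities[0])
    sums = [[0.0] * dim for _ in velocities]
    counts = [0] * len(velocities)
    for i, j in edges:
        for d in range(dim):
            sums[i][d] += velocities[j][d] - velocities[i][d]
        counts[i] += 1
    return [[s / c if c else 0.0 for s in row] for row, c in zip(sums, counts)]


def rollout_step(positions, velocities, cutoff, dt, strength=0.1):
    """Advance the particle system by one step.

    The aggregated message acts as a placeholder acceleration; a trained GNN
    would predict this quantity instead (assumption of this sketch).
    """
    edges = radius_graph(positions, cutoff)
    messages = aggregate_messages(velocities, edges)
    new_velocities = [
        [v + dt * strength * m for v, m in zip(vel, msg)]
        for vel, msg in zip(velocities, messages)
    ]
    new_positions = [
        [p + dt * v for p, v in zip(pos, vel)]
        for pos, vel in zip(positions, new_velocities)
    ]
    return new_positions, new_velocities


# Three particles in 2D: the first two are neighbours, the third is isolated.
positions = [(0.0, 0.0), (0.5, 0.0), (3.0, 0.0)]
velocities = [(1.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
positions, velocities = rollout_step(positions, velocities, cutoff=1.0, dt=0.1)
```

Calling `rollout_step` repeatedly yields a trajectory, which is how a surrogate model approximates the time evolution that the solver would otherwise compute.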

The work done to accelerate the neural network training is expected to be open-sourced and upstreamed to help accelerate the training of other neural networks.

Tools & Datasets

Machine Learning Stack:
PyTorch
Microsoft DeepSpeed
TorchScript
PyTorch Geometric

Datasets:

GAN Dataset: Schlieren dataset of ~30k segmented samples, which becomes 300k segmented samples at training time because random noise is added to prevent overfitting
SPH Datasets: Taylor-Green vortex in 2D & 3D, with 8k time series each; more comparable datasets are available if necessary
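The expansion of the GAN dataset from ~30k to 300k samples through added noise can be sketched as follows. This is a minimal illustration under assumptions: the factor of 10 copies per sample is inferred from the two sample counts, while the Gaussian noise model, the value of `sigma` and the function name are hypothetical. Whether the original pipeline stores fixed noisy copies or resamples the noise on the fly is not stated.

```python
import random


def expand_with_noise(samples, copies=10, sigma=0.05, seed=0):
    """Return `copies` noisy variants of every sample.

    samples: list of flat lists of floats (e.g. flattened image pixels)
    copies:  noisy variants per sample (10 turns ~30k samples into ~300k)
    sigma:   standard deviation of the added Gaussian noise (assumed value)
    """
    rng = random.Random(seed)  # seeded for reproducibility
    augmented = []
    for sample in samples:
        for _ in range(copies):
            augmented.append([x + rng.gauss(0.0, sigma) for x in sample])
    return augmented


# Tiny stand-in dataset: three "images" with four pixels each.
dataset = [[0.0, 0.5, 1.0, 0.5] for _ in range(3)]
augmented = expand_with_noise(dataset)
print(len(dataset), "->", len(augmented))
```

Because each noisy copy differs slightly from its source, the network sees more variation per sample, which is the mechanism by which the added noise counteracts overfitting.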

Students accepted to this project should attend online workshops at the LRZ from 19.04. to 22.04.2022, unless they have proven knowledge. More information will be provided to accepted students.


Be part of TUM-DI-LAB!

Mentors:

Cutting-edge knowledge is essential for our lab. Professors, postdocs and doctoral students are welcome as project mentors. Find out here how to become a mentor.

Partners:

Industrial partners are indispensable for TUM-DI-LAB. Find out here how to become a partner.

Munich Data Science Institute

TUM-DI-LAB has been part of the Munich Data Science Institute (MDSI) since October 2021.