How to Train the Best Deep Learning Models for the Edge?

This project took place in the summer term of 2022; you can no longer apply to this project!

Results of this project are explained in detail in the final report.

For the past decade, most of the innovation within the field of machine learning has come from a constant desire to beat the current state of the art and reach human performance in fields such as computer vision and language understanding. Many of these breakthroughs have been a product of a steady reduction in the cost of the computational resources used to train models such as neural networks. However, a few years ago we reached an inflection point, and some argue that we can no longer rely on Moore's law to make our computational resources cheaper.

Therefore, in this project we will investigate how to reduce the size and computational complexity of neural networks. This has tremendous implications not only for extremely large models but also for small ones, as we equip more and more of our everyday devices with sensors and the ability to make smart decisions based on sensor inputs.

Imagine the scenario of a self-driving car: we need to make sure that our models are good enough to recognize pedestrians, that they can recognize them fast enough to avoid a collision, and that they fit on whatever hardware is available in the car. These are some of the challenges and constraints that we will investigate and tackle throughout this project.

By studying the human brain, researchers have developed methods to address some of these challenges. It has been found that only between 0.5% and 2% of the brain's neurons are active at any given time, and similarly, only between 1% and 5% of all possible connections between neurons exist, which makes the neurons in the human brain both sparsely active and sparsely connected.

Building on these observations, researchers have developed methods such as sparsity, pruning, quantization, and knowledge distillation for reducing the complexity of neural networks. The idea behind sparsity is to actively reduce the number of non-zero elements; pruning, specifically, removes unnecessary neurons and connections. In quantization, the idea is to encode the neurons and connections at a lower precision, and finally, knowledge distillation is about training smaller networks to mimic the behavior of larger ones.
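To make these ideas a bit more concrete, here is a minimal sketch of magnitude pruning and post-training dynamic quantization in PyTorch. The toy model, the 50% sparsity target, and the choice of 8-bit integers are illustrative assumptions, not part of the project specification.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# A small illustrative model; any network with Linear or Conv layers would do.
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# Pruning: zero out the 50% of weights with the smallest magnitude
# in every Linear layer (the 50% target is an arbitrary example).
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # bake the zeros into the weight tensor

# Quantization: store and compute the Linear weights as 8-bit integers
# instead of 32-bit floats, shrinking their storage footprint roughly 4x.
quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)
```

Knowledge distillation follows the same spirit but on the training side: a small "student" network is trained on a combination of the true labels and the soft predictions of a larger "teacher" network.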

However, most of this work is limited to only one or two metrics, such as accuracy and latency or accuracy and memory footprint. We argue that it is paramount to include a larger set of metrics in the design process of neural networks, including throughput, latency, peak memory usage, storage memory, sparsity, and more, which is what we will be tackling throughout the project.
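As a rough illustration of what such a multi-metric evaluation could look like, the sketch below collects parameter count, sparsity, storage size, and average per-inference latency for a PyTorch model. The function name and the exact set of metrics are our own illustrative choices, not the project's official evaluation protocol.

```python
import io
import time
import torch

def edge_metrics(model, example_input, runs=100):
    # Parameter count and sparsity (fraction of zero-valued weights).
    total = sum(p.numel() for p in model.parameters())
    zeros = sum((p == 0).sum().item() for p in model.parameters())

    # Storage memory: size of the serialized state dict in bytes.
    buffer = io.BytesIO()
    torch.save(model.state_dict(), buffer)
    storage_bytes = buffer.getbuffer().nbytes

    # Latency: average wall-clock time per forward pass.
    model.eval()
    with torch.no_grad():
        model(example_input)  # warm-up
        start = time.perf_counter()
        for _ in range(runs):
            model(example_input)
        latency_ms = (time.perf_counter() - start) / runs * 1000

    return {
        "parameters": total,
        "sparsity": zeros / total,
        "storage_MB": storage_bytes / 1e6,
        "latency_ms": latency_ms,
    }
```

In practice, one would also measure throughput under batching and peak memory usage on the actual target hardware, since numbers collected on a development machine rarely transfer directly to an edge device.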

So, if you are interested in learning how to design and develop machine learning systems, getting hands-on experience with the latest tools, and helping to tackle the next frontier of machine learning, join this project!

Students accepted to this project should attend online workshops at the LRZ from 19.04. to 22.04.22, unless they have proven prior knowledge. More information will be provided to accepted students.