The Handbook of Growing: An Empirical Guide for Practitioners
- Sponsored by: TUM Algorithmic Machine Learning and Explainable AI, Munich Data Science Institute (MDSI)
- Project lead: Dr. Ricardo Acevedo Cabra
- Scientific lead: Ferdinand Kapl, Vincent Pauline, Tobias Höppe, Prof. Stefan Bauer
- Term: Summer semester 2026
- Application deadline: Sunday 25.01.2026
Apply to this project here
Summary
We will build a practical, compute-aware “handbook” of growing strategies for deep neural networks (when and how to add depth or width during training to increase the model's parameter count) through a systematic ablation study, simple rules of thumb, and open-source tooling.
Motivation & Background
Growing architectures (adding layers or width over time) can save compute relative to strong baselines trained from scratch and can exhibit an inductive bias toward improved reasoning in text (Saunshi et al., 2024). For frontier LLMs and VLMs, where pre-training and fine-tuning incur substantial cost, well-designed growth schedules can yield sizable wall-clock and energy savings while preserving or improving downstream capabilities. Yet practitioners lack clear guidance on when and how to grow. This project distills empirical best practices into actionable recipes by systematically investigating the design space.
Project Goals
- G1: When-to-grow. Derive simple, robust triggers (e.g., loss-improvement plateaus, gradient/curvature surrogates) and stage lengths (e.g., equal tokens or equal FLOPs per stage); a trigger sketch follows this list.
- G2: How-to-grow. Compare existing growth operators (growing in depth, in width, or jointly) and propose novel operators with improved knowledge reuse.
- G3: Optimizer & HP transfer. Provide recipes for mapping optimizer state (e.g., momenta, Adam statistics), LR warm-ups/decays, and regularization across growth events.
- G4: Open toolkit & report. Release a lightweight PyTorch library with scripts and a short Handbook summarizing recommendations and trade-offs.
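To make G1 concrete, here is a minimal sketch of a loss-plateau growth trigger. The class name `PlateauGrowthTrigger`, the window size, and the improvement threshold are illustrative assumptions, not project recommendations.

```python
from collections import deque


class PlateauGrowthTrigger:
    """Signals a growth event when the smoothed training loss stops improving."""

    def __init__(self, window: int = 500, min_rel_improvement: float = 0.01):
        self.window = window
        self.min_rel_improvement = min_rel_improvement
        self.losses = deque(maxlen=2 * window)  # keep two consecutive windows

    def update(self, loss: float) -> bool:
        """Record one training-step loss; return True once a plateau is detected."""
        self.losses.append(loss)
        if len(self.losses) < 2 * self.window:
            return False  # not enough history yet
        history = list(self.losses)
        prev = sum(history[:self.window]) / self.window   # older window mean
        curr = sum(history[self.window:]) / self.window   # recent window mean
        # Fire when the recent window improved on the older one by less than
        # the relative threshold.
        return (prev - curr) < self.min_rel_improvement * abs(prev)
```

In a training loop this would be checked once per step, e.g. `if trigger.update(loss.item()): grow(...)`, where `grow` stands in for whichever operator is chosen; comparing such a detector against fixed equal-token or equal-FLOP stage lengths is exactly the kind of ablation G1 targets.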
Key Methods (student-implemented)
Implement, investigate and improve:
- Growth triggers. Loss-slope/plateau detectors, scaling-law-derived thresholds, and simple but strong baselines.
- Operators. Depth growth (e.g., layer stacking) vs. width growth (e.g., learnable mappings).
- Optimizer state mapping. Preserving training dynamics; learning-rate (and other hyperparameter) adaptation post-growth (both sketched after this list).
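The sketch below illustrates one simple depth-growth operator (duplicating every layer, in the spirit of stacking) together with a naive mapping of Adam statistics onto the duplicated parameters. The helper names and the duplicate-every-layer scheme are illustrative assumptions, not the operators the project will necessarily adopt.

```python
import copy

import torch
import torch.nn as nn


def grow_depth_by_stacking(layers: nn.ModuleList) -> tuple[nn.ModuleList, dict]:
    """Duplicate each layer in place (1, 2, ... -> 1, 1, 2, 2, ...).

    Returns the grown ModuleList and a map from each new layer index to the
    old layer index it was copied from, so optimizer state can be re-attached.
    """
    grown, source = [], {}
    for old_idx, layer in enumerate(layers):
        for _ in range(2):  # each old layer contributes two copies
            grown.append(copy.deepcopy(layer))
            source[len(grown) - 1] = old_idx
    return nn.ModuleList(grown), source


def transfer_adam_state(old_opt: torch.optim.Adam,
                        old_layers: nn.ModuleList,
                        new_layers: nn.ModuleList,
                        source: dict,
                        lr: float) -> torch.optim.Adam:
    """Create a fresh Adam over the grown layers and copy the accumulated
    statistics (exp_avg, exp_avg_sq, step) from each new parameter's source."""
    new_opt = torch.optim.Adam(new_layers.parameters(), lr=lr)
    for new_idx, old_idx in source.items():
        old_params = list(old_layers[old_idx].parameters())
        new_params = list(new_layers[new_idx].parameters())
        for p_old, p_new in zip(old_params, new_params):
            if p_old in old_opt.state:  # parameter has accumulated statistics
                new_opt.state[p_new] = {
                    k: v.clone() if torch.is_tensor(v) else v
                    for k, v in old_opt.state[p_old].items()
                }
    return new_opt


# Usage (illustrative): grow a toy stack of linear layers mid-training.
if __name__ == "__main__":
    dim = 32
    layers = nn.ModuleList([nn.Linear(dim, dim) for _ in range(4)])
    opt = torch.optim.Adam(layers.parameters(), lr=3e-4)

    # ... train for a while, then a growth trigger fires ...
    loss = sum(layer(torch.randn(8, dim)).pow(2).mean() for layer in layers)
    loss.backward()
    opt.step()

    grown_layers, source = grow_depth_by_stacking(layers)
    opt = transfer_adam_state(opt, layers, grown_layers, source, lr=3e-4)
```

A width-growth operator would instead expand weight matrices with a learnable (or function-preserving) mapping; the same source-index bookkeeping then determines how Adam statistics are tiled, rescaled, or re-initialized after the growth event.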
Open-Source Data
- Vision: CIFAR-10/100, ImageNet.
- Language: OpenWebText, Fineweb-Edu.
Expected Outcomes
- A concise Handbook with practical guidelines and scaling laws for growing.
- An open-source PyTorch library: growth schedulers, operator implementations, optimizer-state transfer, and compute-normalized evaluation.
Student Profile
Team of up to five students; strong PyTorch, ML engineering, and HPC experience; interest in modern architectures and alternative training paradigms.
References
Saunshi, Nikunj, Stefani Karp, Shankar Krishnan, Sobhan Miryoosefi, Sashank Jakkam Reddi, and Sanjiv Kumar (2024). “On the inductive bias of stacking towards improving reasoning”. In: Advances in Neural Information Processing Systems 37, pp. 71437–71464.
