Variational Training for Better Models at Lower Cost
About the Project
Modern AI models are highly capable, but as they scale they become costly to train, maintain, and update. In this project, we develop new fundamental learning algorithms for neural networks that drastically reduce the cost of training, for example by improving a model's ability to adapt and learn continually, or by enabling sparsity and low-precision training. We focus on variational Bayesian learning methods, which can address these issues and which our recent research shows to be effective for large neural networks. The project aims to further advance variational learning methods for large deep networks and to demonstrate their application to sustainable AI training.
Research and Open Positions
The project focuses on the following research directions:
- Effective new variational learning algorithms for large deep networks (e.g., pre-training, sparsity, low-precision)
- Applications in continual learning, distributed learning, active learning, reinforcement learning, etc.
- Mechanistic interpretability, sensitivity analysis, and influence functions
- Theoretical foundations of variational learning (PAC-Bayes, optimization in spaces of measures, etc.)
I am looking to hire interns for fully funded internships of about 5 months or longer to work with me at RIKEN AIP in Tokyo in FY2026 (earliest starting date: October 2026). For eligibility criteria and more information, please see the internship program website. If you are interested, please get in touch by email:
Tutorials, Slides, Other Materials
- The Improved Variational Online Newton (IVON) Optimizer, GitHub link.
Publications
- T. Möllenhoff*, S. Swaroop*, F. Doshi-Velez, M. E. Khan. Federated ADMM from Bayesian Duality. In Proceedings of the International Conference on Learning Representations (ICLR), 2026. [code]
- N. Daheim, C. Meister, T. Möllenhoff, I. Gurevych. Uncertainty-Aware Decoding with Minimum Bayes' Risk. In Proceedings of the International Conference on Learning Representations (ICLR), 2025.
- B. Cong, N. Daheim, Y. Shen, D. Cremers, R. Yokota, M. E. Khan, T. Möllenhoff. Variational Low-Rank Adaptation using IVON. NeurIPS Workshop on Fine-Tuning in Modern ML (FITML), 2024. [code]
- Y. Shen*, N. Daheim*, B. Cong, P. Nickl, G. M. Marconi, C. Bazan, R. Yokota, I. Gurevych, D. Cremers, M. E. Khan, T. Möllenhoff. Variational Learning is Effective for Large Deep Networks. In Proceedings of the International Conference on Machine Learning (ICML), 2024. [code]
- P. Nickl, L. Xu, D. Tailor, T. Möllenhoff, M. E. Khan. The Memory-Perturbation Equation: Understanding Model's Sensitivity to Data. In Proceedings of the Conference on Neural Information Processing Systems (NeurIPS), 2023. [code]
- T. Möllenhoff, M. E. Khan. SAM as an Optimal Relaxation of Bayes. In Proceedings of the International Conference on Learning Representations (ICLR), 2023. [code]
- E. M. Kiral, T. Möllenhoff, M. E. Khan. The Lie-Group Bayesian Learning Rule. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2023. [code]