Variational AI: Training Better Models at Lower Cost

KAKENHI Grant-in-Aid for Scientific Research (A), FY 2026-2031. Project no. 26H02541.

About the Project

Contemporary AI models are highly capable, but as they scale they become costly to train, maintain, and update. In this project, we develop new foundational learning algorithms for neural networks that drastically reduce AI's training cost by improving its ability to adapt and continually learn from experience in an open world. We focus on variational Bayesian-like training methods, which can address these issues and which our recent research has shown to be effective for large neural networks. The project aims to further advance variational learning methods for large deep networks and to demonstrate their application to sustainable AI training.

Official KAKENHI record

Research and Open Positions

The project focuses on the following research directions:

Effective New Variational Learning Algorithms for Large Deep Networks (e.g., pre-training)
Variational Methods for Adaptive, Continual, Distributed and Reinforcement Learning in Deep Networks
Mechanistic Interpretability, Sensitivity Analysis and Influence Functions
Theoretical Foundations of Variational Learning (PAC-Bayes, Optimization in Spaces of Measures, etc.)

I am looking to hire two interns (fully funded internships of about 6 months) to work with me at RIKEN AIP in Tokyo in FY2026. If you are interested in a position or in collaborating, please get in touch by email:

Tutorials, Teaching and Other Materials

The Improved Variational Online Newton (IVON) Optimizer (GitHub link).

Publications

T. Möllenhoff*, S. Swaroop*, F. Doshi-Velez, M. E. Khan. Federated ADMM from Bayesian Duality. In Proceedings of the International Conference on Learning Representations (ICLR), 2026. [code]
N. Daheim, C. Meister, T. Möllenhoff, I. Gurevych. Uncertainty-Aware Decoding with Minimum Bayes' Risk. In Proceedings of the International Conference on Learning Representations (ICLR), 2025.
B. Cong, N. Daheim, Y. Shen, D. Cremers, R. Yokota, M. E. Khan, T. Möllenhoff. Variational Low-Rank Adaptation using IVON. NeurIPS Workshop on Fine-Tuning in Modern ML (FITML), 2024. [code]
Y. Shen*, N. Daheim*, B. Cong, P. Nickl, G. M. Marconi, C. Bazan, R. Yokota, I. Gurevych, D. Cremers, M. E. Khan, T. Möllenhoff. Variational Learning is Effective for Large Deep Networks. In Proceedings of the International Conference on Machine Learning (ICML), 2024. [code]
P. Nickl, L. Xu, D. Tailor, T. Möllenhoff, M. E. Khan. The Memory-Perturbation Equation: Understanding Model's Sensitivity to Data. In Proceedings of the Conference on Neural Information Processing Systems (NeurIPS), 2023. [code]
T. Möllenhoff, M. E. Khan. SAM as an Optimal Relaxation of Bayes. In Proceedings of the International Conference on Learning Representations (ICLR), 2023. [code]
E. M. Kiral, T. Möllenhoff, M. E. Khan. The Lie-Group Bayesian Learning Rule. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2023. [code]