# SFR: Sparse Function-space Representation of Neural Networks
**Function-space Parameterization of Neural Networks for Sequential Learning**
Aidan Scannell\*, Riccardo Mereu\*, Paul Chang, Ella Tamir, Joni Pajarinen, Arno Solin
*International Conference on Learning Representations (ICLR 2024)*

**Sparse Function-space Representation of Neural Networks**
Aidan Scannell\*, Riccardo Mereu\*, Paul Chang, Ella Tamir, Joni Pajarinen, Arno Solin
*ICML 2023 Workshop on Duality Principles for Modern Machine Learning*
## Abstract
Sequential learning paradigms pose challenges for gradient-based deep learning due to difficulties incorporating new data and retaining prior knowledge. While Gaussian processes elegantly tackle these problems, they struggle with scalability and handling rich inputs, such as images. To address these issues, we introduce a technique that converts neural networks from weight space to function space, through a dual parameterization. Our parameterization offers: (i) a way to scale function-space methods to large data sets via sparsification, (ii) retention of prior knowledge when access to past data is limited, and (iii) a mechanism to incorporate new data without retraining. Our experiments demonstrate that we can retain knowledge in continual learning and incorporate new data efficiently. We further show its strengths in uncertainty quantification and guiding exploration in model-based RL.
## TL;DR

**SFR** is a “post-hoc” Bayesian deep learning method: it equips any trained NN with uncertainty estimates.

**SFR** can be viewed as a function-space Laplace approximation for NNs (see the sketch below).

**SFR** has several benefits over the weight-space Laplace approximation for NNs:
- Its function-space representation is effective for regularization in continual learning (CL)
- It has good uncertainty estimates, which we use to guide exploration in model-based reinforcement learning (RL)
- It can incorporate new data without retraining the NN
| | SFR | GP | Laplace BNN |
|---|---|---|---|
| Function-space | ✅ | ✅ | ❌ (weight space) |
| Image inputs | ✅ | ❌ | ✅ |
| Large data | ✅ | ❌ | ✅ |
| Incorporate new data fast | ✅/❌ | ✅ | ❌ (requires retraining) |
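In more detail, SFR builds on the linearized-Laplace view of a trained network. As a brief sketch (the standard linearization, not the full derivation in the paper): expanding the network around the trained weights $w^*$,

```math
f_w(x) \approx f_{w^*}(x) + J_{w^*}(x)\,(w - w^*),
```

a Gaussian posterior over the weights, $w \sim \mathcal{N}(w^*, \Sigma)$, induces a GP over functions with mean $f_{w^*}(x)$ and covariance $k(x, x') = J_{w^*}(x)\,\Sigma\,J_{w^*}(x')^\top$, where $J_{w^*}(x)$ is the network Jacobian. SFR sparsifies this function-space posterior with inducing points through its dual parameterization, so the cost scales with the number of inducing points rather than the data set size or the number of weights.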
## Usage
See the notebooks for how to use our code for both regression and classification.
### Minimal example
Here’s a short example:
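(The snippet below is an illustrative sketch, not the exact interface: the `SFR` import path, constructor arguments, and the `fit`/`predict` methods are placeholders, so see the notebooks for the real API.)

```python
import torch

# 1. Train any NN as usual (a tiny 1D regression MLP for illustration).
X = torch.linspace(-3, 3, 100).unsqueeze(-1)
y = torch.sin(X) + 0.1 * torch.randn_like(X)
network = torch.nn.Sequential(
    torch.nn.Linear(1, 64), torch.nn.Tanh(), torch.nn.Linear(64, 1)
)
optimizer = torch.optim.Adam(network.parameters(), lr=1e-2)
for _ in range(500):
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(network(X), y)
    loss.backward()
    optimizer.step()

# 2. Post hoc: wrap the trained NN with SFR for function-space
#    uncertainty. NB: `SFR`, `num_inducing`, `fit`, and `predict` are
#    illustrative placeholders -- check the notebooks for the real API.
from sfr import SFR  # hypothetical import path

model = SFR(network=network, likelihood="gaussian", num_inducing=32)
model.fit(X, y)               # builds the sparse dual parameterization
mean, var = model.predict(X)  # predictive mean and uncertainty estimates
```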
## Citation
Please consider citing our conference paper:
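```bibtex
@inproceedings{scannell2024function,
  title     = {Function-space Parameterization of Neural Networks for Sequential Learning},
  author    = {Scannell, Aidan and Mereu, Riccardo and Chang, Paul and Tamir, Ella and Pajarinen, Joni and Solin, Arno},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year      = {2024}
}
```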
Or our workshop paper:
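```bibtex
@inproceedings{scannell2023sparse,
  title     = {Sparse Function-space Representation of Neural Networks},
  author    = {Scannell, Aidan and Mereu, Riccardo and Chang, Paul and Tamir, Ella and Pajarinen, Joni and Solin, Arno},
  booktitle = {ICML 2023 Workshop on Duality Principles for Modern Machine Learning},
  year      = {2023}
}
```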