This seminar series provides a platform for young researchers (PhD student or post-doc level) to give invited talks about their research. We aim for a diverse set of talks and speakers on topics related to probabilistic machine learning.

The presentations are virtual talks (~30 min + 15 min Q&A) given live over Zoom. To stimulate discussion and allow for the presentation of ongoing work, the talks will not be recorded unless the speaker separately requests it. The schedule is in Eastern European (Summer) Time; click here for EET (winter time) conversion.

Seminar organizers: Martin Trapp and Arno Solin

Combining Pseudo-Point and State Space Approximations for Sum-Separable Gaussian Processes

Will Tebbutt (University of Cambridge)

Abstract: State space approximations and pseudo-point approximations can be combined in a principled manner to yield scalable approximate inference algorithms for sums of separable Gaussian processes. In this talk, I will: (1) show how this combination can be performed for variational pseudo-point approximations via a simple conditional independence result, (2) discuss how existing exact inference algorithms for state space models can be re-purposed for approximate inference, (3) interpret existing related work in light of our work, and (4) briefly discuss some experimental results in a spatio-temporal context. For more info, please see our recent UAI paper.

About the presenter: Will is a PhD student with Rich Turner in the Machine Learning Group at Cambridge, and is interested in probabilistic modelling in general. He is particularly interested in Gaussian processes: how to specify and scale them in large spatio-temporal settings, how best to write software to work with them, and challenges faced in climate science for which they might be helpful.

Automated Augmented Conjugate Inference for Gaussian Processes

Théo Galy-Fajou (TU Berlin)

Abstract: Gaussian Processes are a tool of choice for modelling functions with uncertainties. However, inference is analytically tractable only for the classical case of regression with Gaussian noise, since all other likelihoods are not conjugate with the Gaussian prior. In this talk, I will show how one can transform a large class of likelihoods into conditionally conjugate distributions by augmenting them with latent variables. These augmented models have the advantage that, while the posterior inference is still not fully analytic, the full conditionals are! Consequently, one can work easily (and efficiently!) with algorithms like Gibbs sampling or Coordinate Ascent VI (CAVI) and outperform existing inference methods.

About the presenter: Théo Galy-Fajou is a PhD candidate at TU Berlin under the supervision of Prof. Manfred Opper. His work focuses specifically on Gaussian processes and ways to scale them easily to more data and more complex models. He also has a general interest in all approximate Bayesian inference techniques. He is heavily involved in open-source development for inference and visualization techniques of Bayesian methods in the Julia programming language.

Finite Mixture Models Do Not Reliably Learn the Number of Components

Diana Cai (Princeton University)

Abstract: Scientists and engineers are often interested in learning the number of subpopulations (or components) present in a data set. A common suggestion is to use a finite mixture model (FMM) with a prior on the number of components. Past work has shown the resulting FMM component-count posterior is consistent; that is, the posterior concentrates on the true, generating number of components. But consistency requires the assumption that the component likelihoods are perfectly specified, which is unrealistic in practice. In this paper, we add rigour to data-analysis folk wisdom by proving that under even the slightest model misspecification, the FMM component-count posterior diverges: the posterior probability of any particular finite number of components converges to 0 in the limit of infinite data. Contrary to intuition, posterior-density consistency is not sufficient to establish this result. We develop novel sufficient conditions that are more realistic and easily checkable than those common in the asymptotics literature. We illustrate the practical consequences of our theory on simulated and real data.

About the presenter: Diana Cai is a Ph.D. candidate in computer science at Princeton University and is advised by Ryan Adams and Barbara Engelhardt. Her research spans the areas of machine learning and statistics and focuses on developing robust and scalable methods for probabilistic modelling and inference, with an emphasis on flexible, interpretable, and nonparametric machine learning methods. Previously, Diana obtained an A.B. in computer science and statistics from Harvard University, an M.S. in statistics from the University of Chicago, and an M.A. in computer science from Princeton University. Her research is supported in part by a Google PhD Fellowship in Machine Learning.

Causal Decision-making Meets Gaussian Processes

Virginia Aglietti (University of Warwick / DeepMind)

Abstract: Solving decision-making problems in a variety of domains such as healthcare or operations research requires experimentation. By performing interventions, one can understand how a system behaves when an action is taken and thus infer the cause-effect relationships of a phenomenon. Experiments are usually expensive, time-consuming, and may present ethical issues. Therefore, researchers generally have to trade off cost, time, and other practical considerations to decide which experiments to conduct in order to learn about a system. In this talk, I will present two methodologies that, by linking causal inference, experimental design and Gaussian process (GP) modelling, make it possible to efficiently learn the causal effects in a graph and identify the optimal intervention to perform. Firstly, I will show how to construct a multi-task causal GP model, the DAG-GP model, which captures the non-trivial correlation structure across different experimental outputs. By sharing experimental information, the DAG-GP model accurately estimates the causal effects in a variety of experimental settings while enabling proper uncertainty quantification. I will then demonstrate how this model, and more generally GP models, can be used within decision-making algorithms to choose which experiments to perform. In particular, I will introduce the Causal Bayesian Optimization algorithm, and I will show how incorporating the knowledge of the causal graph in Bayesian Optimization improves the ability to reason about optimal decision making while decreasing the optimization cost and avoiding suboptimal solutions.

About the presenter: Virginia is a final year PhD student in Statistics at the University of Warwick and a visiting researcher at The Alan Turing Institute. She is supervised by Dr. Theodoros Damoulas. In September, Virginia will join DeepMind as a Research Scientist to work on causal probabilistic models. Virginia is interested in linking probabilistic models, specifically Gaussian processes, and causality to develop algorithms for causal decision making under uncertainty. She has expertise in working with Gaussian processes and variational inference, with a particular focus on models for point processes. In terms of applications, she is particularly interested in spatio-temporal problems in the social sciences.

Equivariant Probabilistic Generative Modelling

Priyank Jaini (University of Amsterdam)

Abstract: In this talk, I will discuss recent work on developing generative models for efficient sampling and inference by incorporating inductive biases in the form of equivariances. I will begin by introducing the Equivariant Stein Variational Gradient Descent algorithm – an equivariant sampling method based on Stein’s identity for sampling from densities with symmetries. Equivariant SVGD explicitly incorporates symmetry information in a density through equivariant kernels, which makes the resultant sampler efficient both in terms of sample complexity and the quality of generated samples. Subsequently, I will demonstrate the use of Equivariant SVGD by defining equivariant energy-based models to model invariant densities that are learned using contrastive divergence. I will then discuss the applications of these equivariant energy models for modelling joint densities in regression and classification tasks for image datasets, many-body particle systems and molecular structure generation. Finally, if time permits, I will touch on methods for sampling using diffusion models and neural transport augmented Monte Carlo methods for more efficient sampling in discrete spaces with applications in denoising, Bayesian posterior sampling, and training light-weight Bayesian quantised neural nets.

About the presenter: Priyank Jaini is a postdoctoral researcher at the University of Amsterdam and Bosch-Delta Lab, working with Prof. Max Welling. Before that, he completed his PhD at the University of Waterloo under the supervision of Prof. Pascal Poupart and Prof. Yaoliang Yu, where he received the doctoral dissertation award from the faculty of Math for his PhD thesis. His research interests lie in building tractable probabilistic models for reasoning under uncertainty. Recently, he has been interested in incorporating inductive biases in the form of symmetries through equivariances in probabilistic modelling and applying them to downstream tasks like molecular generation and modelling many-body particle systems.

Tractable Probabilistic Reasoning for Trustworthy AI

YooJung Choi (UCLA)

Abstract: Automated decision-making systems are increasingly being deployed in areas with personal and societal impacts: from personalized ads to medical diagnosis and criminal justice. This has led to a growing interest in, and need for, trustworthy AI and ML systems, that is, models that are robust, explainable, fair, and so on. It is important to note that these guarantees only hold with respect to a certain model of the world, with inherent uncertainties. In this talk, I will present how probabilistic modelling and inference, by incorporating a distribution, offer a principled way to handle different kinds of uncertainties when reasoning about decision-making system behaviours. For example, labels in training data may be biased; I will show that probabilistic circuits, a class of tractable probabilistic models (TPMs), can be effective in enforcing and auditing fairness properties by explicitly modelling a latent unbiased label. Another common source of uncertainty is missing values at prediction time, which turns fairness and robustness queries that account for the missing values into computationally hard inference tasks. I will demonstrate how TPMs can again tractably answer these complex queries.

About the presenter: YooJung Choi is a Ph.D. candidate in the Computer Science Department at UCLA, advised by Prof. Guy Van den Broeck. Her research is broadly in the areas of artificial intelligence and machine learning, with a focus on probabilistic modelling and inference for automated decision-making. In particular, she is interested in developing algorithms for tractable inference of complex queries and characterizing the boundaries of tractable inference. Her work also focuses on applying these results to address fairness, robustness, and explainability and, more generally, to work towards trustworthy AI/ML.

Probabilistic Inference, Message Passing, and Hybrid Models

Christian Knoll (TU Graz)

Abstract: Probabilistic graphical models are flexible models for representing complex high-dimensional distributions and for incorporating domain knowledge in an intuitive and expressive way. For probabilistic inference, one often relies on recursive message passing methods. While these methods are efficient for restricted model classes (e.g., for trees), they only serve as approximation methods for more complex models. In this talk, I will show how we can enhance the performance of message passing methods from two opposing angles: by simplifying the model itself with the utilized inference method in mind, and by modifying the inference method with the underlying model in mind. To this end, I will show how we can advance our understanding of message passing methods by considering them as a dynamical system and by applying tools from system theory. These insights will then suggest various improvements. Recently, we have also complemented message passing methods with neural networks. I will discuss how such hybrid models benefit from the flexibility of neural networks in combination with the implicit underlying model assumptions.

About the presenter: Christian Knoll is a postdoc researcher at the signal processing and speech communications laboratory at the Graz University of Technology. His research interests include machine learning, statistical signal processing, and probabilistic graphical models. He is particularly interested in understanding and improving message passing methods for performing probabilistic inference in graphical models. Recently, he has also been interested in combining graphical models with neural networks.

Score-based Generative Modeling and the Diffusion Schrödinger Bridge

James Thornton (University of Oxford)

Abstract: Progressively applying Gaussian noise transforms complex data distributions into approximately Gaussian ones. Reversing this dynamic defines a generative model. When the forward noising process is given by a Stochastic Differential Equation (SDE), Song et al. (2021) demonstrate how the time-inhomogeneous drift of the associated reverse-time SDE may be estimated using score-matching. A limitation of this approach is that the forward-time SDE must be run for a sufficiently long time for the final distribution to be approximately Gaussian. In contrast, solving the Schrödinger Bridge problem (SB), i.e. an entropy-regularized optimal transport problem on path spaces, yields diffusions which generate samples from the data distribution in finite time. We present Diffusion SB (DSB), an original approximation of the Iterative Proportional Fitting (IPF) procedure to solve the SB problem, and provide theoretical analysis along with generative modeling experiments. The first DSB iteration recovers the methodology proposed by Song et al. (2021), with the flexibility of using shorter time intervals, as subsequent DSB iterations reduce the discrepancy between the final-time marginal of the forward (resp. backward) SDE with respect to the prior (resp. data) distribution. Beyond generative modeling, DSB offers a widely applicable computational optimal transport tool as the continuous state-space analogue of the popular Sinkhorn algorithm (Cuturi, 2013). Joint work with Valentin De Bortoli, Jeremy Heng and Arnaud Doucet.

About the presenter: James is a PhD student at the University of Oxford, supervised by George Deligiannidis and Arnaud Doucet. His research interests are at the intersection of Optimal Transport, sampling methods and machine learning.

Topographic Generative Models Learn Structured Representations

Andy Keller (University of Amsterdam)

Abstract: Topographic generative models can be seen as a class of generative models where the latent variables have an underlying topographic (or spatial) organization which determines their correlation structure. Such structure is widely observed in biological neural networks; however, its computational value is still debated, and it therefore lacks adoption by the deep learning community at large. In this talk, we will describe the statistical motivations behind early topographic generative models like Topographic ICA, and show how such priors can be integrated into modern deep neural networks by introducing the Topographic Variational Autoencoder (TVAE). Further, we will show how topographic representations can be seen as generalized structured representations, and demonstrate how topographic organization over space and time can be leveraged to induce the learning of equivariant sets of features we call capsules. Finally, we will show preliminary results comparing the representations learned by deep TVAEs with fMRI recordings, demonstrating the emergence of localized specialized regions similar to the face area observed in primates.

About the presenter: T. Anderson Keller (Andy) is a fourth-year PhD student supervised by Max Welling at the University of Amsterdam. His work is focused on probabilistic generative models inspired by observations and theories from neuroscience. His current interests broadly include: developing unsupervised methods for structured representation learning (e.g. equivariant & invariant representations), exploring the computational benefits of topographically organized representations, and improving techniques for efficiently training deep latent variable models. His past research includes studying fast-weight recurrent neural networks while part of the Intel AI Lab and developing methods for training unconstrained normalizing flows.

Bayesian Deep Learning with Linearised Neural Networks

Javier Antorán (University of Cambridge)

Abstract: Despite their ubiquity in modern data-driven decision-making systems, neural networks are not very well understood. A symptom of this is that network hyperparameters are almost always chosen via cross-validation, an expensive approach that scales poorly in the number of hyperparameters. Additionally, obtaining robust uncertainty estimates for neural network predictions remains an open problem. The probabilistic framework holds the promise of providing both an objective for model selection and reliable uncertainty estimates. However, for the case of neural networks, exact probabilistic inference is intractable. This talk introduces the Linearised Laplace approximation for Bayesian deep learning. We examine the assumptions behind linearised Laplace, particularly in conjunction with model selection. We show that these interact poorly with some now-standard features of deep learning (stochastic approximation methods and normalisation layers) and make recommendations for how to better adapt this classic method to the modern setting. We provide theoretical support for our recommendations and validate them empirically on MLPs, classic CNNs, residual networks with and without normalisation layers, generative autoencoders and transformers. As a case study, we take a deep dive into Bayesian deep learning methods for tomographic reconstruction. Using linearised Laplace, we construct a probabilistic Deep Image Prior over reconstructed images. Inference in this model allows us to choose U-Net architecture parameters without the need for cross-validation and yields state-of-the-art uncertainty calibration for tomographic reconstruction.

About the presenter: Javier Antorán is a third-year PhD student at the University of Cambridge, supervised by Jose Miguel Hernandez-Lobato and Max Welling. His research focuses on probabilistic modelling and inference. Specifically, Javier’s research spans Bayesian deep learning, Gaussian processes, causal inference and interpretable machine learning. Prior to starting his PhD, Javier worked as a telecommunications engineer developing communications infrastructure for the ATLAS experiment at CERN, and co-founded the startup ARISE, which develops machine learning technology to increase the efficiency of processes in the agricultural sector.

The Coupled Rejection Sampler

Adrien Corenflos (Aalto University)

Abstract: Coupling methods have recently been used to compute unbiased estimates of Markov chain Monte Carlo and particle smoothing expectations. However, in most cases, sampling from couplings has a random run time, the variance of which can be infinite. This behaviour, acceptable in distributed computing, is highly problematic in the parallel computing framework. We propose a limited-variance coupled rejection sampling method for sampling from couplings of arbitrary distributions. We show how we can modify the coupled rejection method to use an ensemble of proposals so as to asymptotically recover a maximal coupling while decreasing the total run time of the algorithm. We then discuss the important special case of coupling Gaussian distributions with different means and covariances, and show how the rejection sampling method can be optimised in this case. We then apply the method to sampling from couplings of Gaussian tails, perform coupled Gibbs sampling, couple parallel resampling algorithms in particle filtering, and couple manifold MALA.

About the presenter: Adrien Corenflos is a PhD student at Aalto University under Simo Särkkä. Prior to this, he worked as a quantitative researcher at JP Morgan, UK. His research interests typically orbit around Monte Carlo methods for machine learning, with an emphasis on sequential Monte Carlo.

Low-Cost Bayesian Methods for Fixing Neural Nets' Overconfidence

Agustinus Kristiadi (University of Tübingen)

Abstract: Well-calibrated predictive uncertainty of neural networks, essentially making them know when they don’t know, is paramount in safety-critical applications. However, deep neural networks are overconfident in regions both far from and near the training data. In our work, we study Bayesian neural networks (BNNs) and their extensions to mitigate this issue. First, we show that being Bayesian, even just at the last layer and in a post-hoc manner, helps mitigate overconfidence in deep ReLU classifiers. Then, we provide a cost-effective Gaussian-process extension to ReLU BNNs that guarantees that ReLU nets will never be overconfident in the region far from the data. Furthermore, we propose two ways of improving the calibration of general BNNs in the out-of-distribution (OOD) regions near the data: (i) training the uncertainty of Laplace approximations and (ii) leveraging OOD data during training. Finally, we provide a simple library, laplace-torch, to facilitate the use of modern Laplace approximations in deep learning. This library gives users a way to turn a standard pre-trained deep net into a BNN in a cost-efficient manner.

About the presenter: Agustinus Kristiadi is a final-year Ph.D. student at the University of Tübingen, under the supervision of Philipp Hennig. Before this, he studied computer science at the University of Bonn under Asja Fischer. His current interest is in the intersection of Bayesian deep learning and Riemannian geometry.

Embedded-model flows: Combining the inductive biases of model-free deep learning and explicit probabilistic modeling

Gianluigi Silvestri (OnePlanet Research Center and Donders Institute for Brain, Cognition and Behaviour)

Abstract: Normalizing flows have shown great success as general-purpose density estimators. However, many real-world applications require the use of domain-specific knowledge, which normalizing flows cannot readily incorporate. We propose embedded-model flows (EMF), which alternate general-purpose transformations with structured layers that embed domain-specific inductive biases. These layers are automatically constructed by converting user-specified differentiable probabilistic models into equivalent bijective transformations. We also introduce gated structured layers, which allow bypassing the parts of the models that fail to capture the statistics of the data. We demonstrate that EMFs can be used to induce desirable properties such as multimodality, hierarchical coupling and continuity. Furthermore, we show that EMFs enable a high-performance form of variational inference where the structure of the prior model is embedded in the variational architecture. In our experiments, we show that this approach outperforms state-of-the-art methods in common structured inference problems.

About the presenter: Gianluigi Silvestri is a PhD candidate in Probabilistic Machine Learning at OnePlanet Research Center and Donders Institute for Brain, Cognition and Behaviour, in Nijmegen, the Netherlands. His research interests include Variational Inference, Normalizing Flows and Bayesian Reinforcement Learning.

Parameter elimination in particle Gibbs sampling

Anna Wigren (Uppsala University)

Abstract: Bayesian joint parameter and state inference in non-linear state-space models is a difficult problem due to the often high-dimensional state sequence. Particle Gibbs (PG) is well-suited for solving this type of inference problem but produces correlated samples. In this talk, I describe how the correlation can be reduced by marginalizing out one or more parameters from the state update when conjugacy relations exist between the parameter prior and the complete data likelihood. Deriving the marginalized conjugacy relations is often time-consuming, but probabilistic programming can be employed to automate the process. I also introduce a marginalized PG sampler for multiple time series described by a common state-space model structure, where subsets of the parameters are shared between different models. The spread of mosquito-borne diseases, where some parameters are location-specific, and some are disease-specific, is one example. In theory, it is possible to update all models concurrently, but sequential Monte Carlo becomes inefficient as the number of time series increases. Our suggested marginalized PG sampler instead updates one model at a time, conditioned on the remaining datasets, and can be formulated in a modular fashion that greatly facilitates its implementation.

About the presenter: Anna Wigren is a final year PhD student at the Division of Systems and Control at Uppsala University, supervised by Fredrik Lindsten (Linköping University). Her main research interests include sequential Monte Carlo and Markov chain Monte Carlo methods.

Adaptive Design in Real Time

Desi Ivanova (University of Oxford)

Abstract: Designing sequences of adaptive experiments to maximise the information gathered about an underlying process is a key challenge in science and engineering. Bayesian Experimental Design (BED) is a powerful mathematical framework for tackling the optimal design problem. Despite the huge potential of obtaining information more quickly and efficiently, the widespread adoption of adaptive BED has been severely limited by the costly computations required at each experiment iteration. In this talk, I’ll present a new method, called Deep Adaptive Design (DAD), that alleviates this problem. DAD marks a critical change from previous BED methods in that it optimises a policy instead of individual designs during the experiment. The policy is parametrised by a neural network, taking as inputs past data and returning the design to use at the next experiment iteration. Using a single pass through the network, DAD enables quick and adaptive design decisions in real time.

About the presenter: Desi Ivanova is a second-year grad student at the University of Oxford, working with Tom Rainforth and Yee Whye Teh. Her research interests include probabilistic machine learning and inference with applications to experimental design, causality and data compression. Before joining Oxford, Desi worked as a quantitative researcher at Goldman Sachs.

Aspects of modelling COVID-19: Understanding and quantifying the uncertainty

Swapnil Mishra (University of Copenhagen)

Abstract: Despite trends in modern medicine and epidemiological control, the risk of outbreaks from novel and previously existing pathogens is currently greater than ever. Indeed, the current outbreak of SARS-CoV-2 has exposed the need for precise, robust, and principled mathematical modelling of disease outbreaks that can perform well with noisy and potentially biased data. To tackle these challenges, I will present a unifying view of modelling infectious diseases that contributes to a new understanding of the spread of diseases and their epidemiological properties. The unified framework allows flexible probabilistic models that are capable of fitting complex and noisy data from different sources. I will touch upon how the new unified framework, built using Stan (numpyro), has helped us to characterize the initial spread of SARS-CoV-2 and quantify the altered epidemiological characteristics of various ‘variants of concern’ (VOCs).

About the presenter: I am an Assistant Professor at the University of Copenhagen (UCPH), where I am primarily working at the intersection of public health, machine learning and Bayesian modelling. Before this, I was fortunate enough to spend my post-doc years at the School of Public Health, Imperial College London, where I worked primarily with Professor Samir Bhatt and Dr Seth Flaxman. I am working with colleagues at Imperial College London, the University of Oxford and UCPH to model the spread of COVID-19. I did my Ph.D. at the Research School of Computer Science, The Australian National University, under the supervision of Professor Lexing Xie and Dr Marian-Andrei Rizoiu. I have also worked with Professor Wray Buntine at Monash University.

Transformed Gaussian Processes to specify non-stationary function priors

Juan Maroñas (Universidad Autónoma of Madrid)

Abstract: In this talk, I will introduce Transformed Gaussian Processes, stochastic processes specified by transforming samples from a Gaussian process using an invertible transformation (warping function). These processes can be easily made non-stationary by parameterizing the warping function through an input-dependent transformation. I show how this is achieved with a Bayesian Neural Network implemented with Monte Carlo dropout, with the additional benefit of incorporating uncertainties, effectively regularizing the model. This new model can match the performance of a Deep Gaussian Process at a fraction of its cost and also allows us to incorporate inductive biases in the function that we are trying to model (e.g. positivity constraints), among other benefits. Training and predictions can be scaled using a sparse variational inference algorithm. We also show how the basic idea of Transformed Gaussian Processes can be used to create a set of C dependent function priors, which can provide similar or better results than an SVGP in classification problems with a large number of classes while being an order of magnitude faster.

About the presenter: Since February 2022, I have been a postdoctoral researcher at Universidad Autónoma of Madrid, working with Daniel Hernández Lobato. I defended my PhD in January 2022 at Universidad Politécnica de Valencia. My research interests are Bayesian modelling (Gaussian Processes, Bayesian Neural Networks and hierarchical latent variable models) and the different ways of performing inference. I am also interested in deep generative models, Bayes decision theory and model calibration. Prior to this, I trained Deep Convolutional Neural Networks for computer vision in different domains and also worked on other application problems such as speech enhancement using deep learning. In the coming years, I am planning to study probabilistic circuits and stochastic differential equations in depth.

A Kernel Stein Test for Comparing Latent Variable Models

Heishiro Kanagawa (Gatsby Computational Neuroscience Unit, University College London)

Abstract: I will describe a kernel-based nonparametric test of relative goodness of fit, where the goal is to compare two models, both of which may have unobserved latent variables, such that the marginal distributions of the observed variables are intractable. Given the premise that “all models are wrong,” the goal of the test is to determine whether one model significantly outperforms the other in respect of a reference data sample. The test generalises earlier kernel Stein discrepancy (KSD) tests to the case of latent variable models, a much more general class than the fully observed models treated previously. The new test, with a properly calibrated threshold, has a well-controlled type-I error. In the case of models with low-dimensional latent structure and high-dimensional observations, our test significantly outperforms the relative maximum mean discrepancy test, which is based on samples from the models, and does not exploit the latent structure. I will illustrate the test on probabilistic topic models of arXiv articles.

About the presenter: Heishiro Kanagawa is a PhD student at the Gatsby Computational Neuroscience Unit, supervised by Arthur Gretton, and is joining Newcastle University to work with Chris Oates. He is interested in evaluating machine learning and statistical models and developing diagnostic tools.

Bayesian Optimization over Discrete and Mixed Spaces via Probabilistic Reparameterization

Sam Daulton (Meta / University of Oxford)

Abstract: Optimizing expensive-to-evaluate black-box functions of discrete (and potentially continuous) design parameters is a ubiquitous problem in science and engineering applications. Bayesian optimization (BO) is a popular sample-efficient method that selects promising designs to evaluate by optimizing an acquisition function (AF) over some domain based on a probabilistic surrogate model. However, maximizing the AF over mixed or high-cardinality discrete search spaces is challenging as we cannot use standard gradient-based methods or evaluate the AF at every point in the search space. To address this issue, we propose using probabilistic reparameterization (PR). Instead of directly optimizing the AF over the search space containing discrete parameters, we maximize the expectation of the AF over a probability distribution defined by continuous parameters. We prove that under suitable reparameterizations, the BO policy that maximizes the probabilistic objective is the same as that which maximizes the AF, and therefore, PR enjoys the same regret bounds as the underlying AF. Moreover, our approach provably converges to a stationary point of the probabilistic objective under gradient ascent using scalable, unbiased estimators of both the probabilistic objective and its gradient, and therefore, as the number of starting points and gradient steps increases, our approach will recover a maximizer of the AF (an often-neglected requisite for commonly used BO regret bounds). We validate our approach empirically and demonstrate state-of-the-art optimization performance on many real-world applications. PR is complementary to (and benefits) recent work and naturally generalizes to settings with multiple objectives and black-box constraints.

About the presenter: Sam Daulton is a research scientist at Meta, a DPhil candidate in machine learning at the University of Oxford, and co-creator of BoTorch—an open-source library for Bayesian optimization research. Sam works with Eytan Bakshy and Max Balandat at Meta and Mike Osborne at Oxford. His current research focuses on methods for Bayesian optimization in challenging scenarios. Previously, Sam worked with Finale Doshi-Velez at Harvard University on efficient and robust methods for transfer learning.

Mitigating Modality Collapse in Multimodal VAEs via Impartial Optimization

Adrián Javaloy (University of Saarland)

Abstract: A number of variational autoencoders (VAEs) have recently emerged with the aim of modelling multimodal data, e.g., to jointly model images and their corresponding captions. Still, multimodal VAEs tend to focus solely on a subset of the modalities, e.g., by fitting the image while neglecting the caption. We refer to this limitation as modality collapse. In this presentation, I argue that this effect is a consequence of modality-specific gradients conflicting during the training of multimodal VAEs. After this talk, you will be able to detect which parts of your model’s computational graph can suffer from gradient conflict (which I call impartiality blocks), as well as how to leverage existing gradient-conflict solutions from multitask learning to mitigate modality collapse. In other words, you will learn how to encourage impartial optimization across modalities. The framework I introduce is general, and we have successfully applied it to several multimodal VAE models, losses, and datasets from the literature, and have empirically shown that it significantly improves the reconstruction performance, conditional generation, and coherence of the latent space across modalities.

About the presenter: I am a final-year PhD student at the University of Saarland under the supervision of Isabel Valera, where I primarily work on probabilistic machine learning models and their training dynamics (and challenges), especially when the training data comprises multiple modalities (such as images and their captions, or tabular data, where different columns represent fundamentally different quantities). My interests are broad, however, and I try to keep up with the machine learning literature without losing my own sanity. Before moving to Saarland, I started my PhD at the Max Planck Institute for Intelligent Systems with Isabel Valera, and previously I pursued a double bachelor’s degree in Computer Science Engineering and Mathematics at the University of Murcia.

Learning to Dynamically Optimise Algorithms

André Biedenkapp (University of Freiburg)

Abstract: The performance of many algorithms in the fields of hard combinatorial problem solving, machine learning or AI in general depends on hyperparameter tuning. Automated methods have been proposed to relieve users of the tedious and error-prone task of manually searching for performance-optimized configurations. However, there is still a lot of untapped potential. Existing solution approaches often neglect the non-stationarity of hyperparameters, where different hyperparameter values are optimal at different stages of an algorithm’s run. Taking the non-stationarity into account in the optimization procedure promises much better performance but also poses many new challenges. In this talk we will discuss existing solution approaches to classical hyperparameter optimization and explore ways of tackling the non-stationarity of hyperparameters, in particular by means of reinforcement learning.

About the presenter: I am a researcher at the University of Freiburg, Germany. My primary research interest is in the field of artificial intelligence, with a focus on automated machine learning and algorithm configuration, i.e., the problem of automatically tuning (machine learning) algorithms to maximize their performance. In particular, I focus on using reinforcement learning to tackle the problem of dynamically configuring algorithms. I completed my bachelor’s degree in 2015 and my master’s degree in 2017 in computer science at the University of Freiburg. From February 2018 to October 2022, I did my Ph.D. at the University of Freiburg, at the Machine Learning Chair under the supervision of Prof. Dr. Frank Hutter and Prof. Dr. Marius Lindauer (Leibniz University Hannover). In October 2022 I successfully defended my PhD (Dr. rer. nat.) with the topic ‘Dynamic Algorithm Configuration by Reinforcement Learning’.

Adversarial Attacks in Linear Regression

Antonio Horta Ribeiro (Uppsala University)

Abstract: State-of-the-art machine learning models can be vulnerable to very small input perturbations that are adversarially constructed. Adversarial attacks are a popular framework for studying these vulnerabilities. They consider worst-case input disturbances designed to maximize model error and have received a lot of attention due to their impact on the performance of state-of-the-art models. Adversarial training considers extending model training with these examples and is an effective approach to defend against such attacks. This talk will explore adversarial attacks and training in linear regression. There is a strong reason for this focus: for linear regression, adversarial training can be formulated as a convex and quadratic problem. Moreover, many interesting phenomena that can be observed in nonlinear models are still present. The setup is used to study the role of high dimensionality in robustness and to reveal the connection between adversarial training, parameter-shrinking methods and minimum-norm solutions.

About the presenter: I’m a postdoctoral researcher at Uppsala University in Sweden. My work studies techniques to extract information and learn the intrinsic behaviour of time series, signals and dynamical systems. I have a focus on large-scale models that can perform such tasks, with a special interest in their robustness and generalization capabilities. While motivated by a range of applications, I am particularly interested in the new prospects these advances can bring to medicine. I did my Ph.D. at UFMG, a top-ranked university in Brazil, under the supervision of Luis A. Aguirre. My thesis, defended in 2020, went on to be awarded the best thesis in Engineering and Physical Sciences by the university. The following year, I joined Thomas Schön’s group in Uppsala. For my contributions to machine learning and control with applications in cardiology, I was also awarded the Benzelius Award, granted by the Royal Society of Sciences in Uppsala to young researchers.

Score-based Generative Models on Riemannian manifolds

Emile Mathieu (University of Cambridge)

Abstract: Score-based generative models (SGMs) are a powerful class of generative models that exhibit remarkable empirical performance. Score-based generative modelling consists of a noising stage, whereby a diffusion is used to gradually add Gaussian noise to data, and a generative model, which entails a denoising process defined by approximating the time-reversal of the diffusion. Existing SGMs assume that data is supported on a Euclidean space, i.e. a manifold with flat geometry. In many domains such as robotics, geoscience or protein modelling, data is often naturally described by distributions living on Riemannian manifolds, and current SGM techniques are not appropriate. We introduce here Riemannian Score-based Generative Models (RSGMs), a class of generative models extending SGMs to Riemannian manifolds. We demonstrate our approach on a variety of manifolds, and in particular on spherical data from earth and climate science.

About the presenter: I am a Postdoctoral Research Associate in the Cambridge Machine Learning Group, working with Prof Richard Turner and Prof José Miguel Hernández-Lobato. Previously, I was a Postdoctoral Research Associate in the OxCSML group in the Department of Statistics at Oxford, where I was a PhD student prior to that. My research interests centre around deep probabilistic machine learning with a focus on encoding problem symmetries and geometrical constraints, with application to the natural sciences. In the past year, I have in particular been working on extending and analysing score-based diffusion models.

On the Importance of Priors in Bayesian Deep Learning

Vincent Fortuin (University of Cambridge / Helmholtz AI)

Abstract: While Bayesian deep learning has been a popular field of research in recent years, most of the work has focused on improving inference methods for better performance and lower computational costs. Conversely, the priors have often been ignored and merely chosen to be isotropic Gaussian, for mathematical and computational convenience. In this talk, I will review recent work that calls this popular practice into question and highlights pathologies that can arise from prior misspecification in Bayesian neural networks. I will then present different methods that can aid the selection of better priors and I will discuss the advantages of function-space priors over weight-space ones.

About the presenter: Vincent Fortuin is an incoming research group leader at Helmholtz AI in Munich and a postdoctoral researcher at the University of Cambridge, working in the Machine Learning Group. He is also a Research Fellow at St. John’s College, and a Branco Weiss Fellow. His research focuses on the interface between deep learning and probabilistic modeling. He is particularly interested in developing models that are more interpretable and data efficient, following the Bayesian paradigm. To this end, he is mostly trying to find better priors and more efficient inference techniques for Bayesian deep learning. Apart from that, he is also interested in deep generative modeling, meta-learning, and PAC-Bayesian theory.

Robust Bayesian Inference for Simulator-based Models via the MMD Posterior Bootstrap

Harita Dellaporta (University of Warwick)

Abstract: Simulator-based models are models for which the likelihood is intractable but simulation of synthetic data is possible. They are often used to describe complex real-world phenomena, and as such can often be misspecified in practice. In this talk, I will present a novel algorithm based on the posterior bootstrap and maximum mean discrepancy estimators. This leads to a highly-parallelisable Bayesian inference algorithm with strong robustness properties. This is demonstrated through an in-depth theoretical study which includes generalisation bounds, frequentist consistency and robustness of our posterior guarantees. The approach is then illustrated on a range of examples including a g-and-k distribution and a toggle-switch model.

About the presenter: Harita is a third-year PhD student at the Warwick CDT in Mathematics & Statistics under the supervision of Prof. Theo Damoulas and a placement Enrichment Student at the Alan Turing Institute. Prior to this, Harita obtained an MSc in Computational Statistics & Machine Learning from UCL. Her research focuses on generalised notions of Bayesian inference with emphasis on different types of robustness such as model misspecification and measurement error. She is also interested in causal inference and applications of robust methodologies in healthcare.

Tractable Uncertainty for Causal Structure Learning

Benjie Wang (University of Oxford)

Abstract: Causal structure learning aims to discover the causal directed acyclic graph (DAG) responsible for generating a given dataset. However, a point estimate can be flawed due to limited data as well as non-identifiability of the underlying DAG. Bayesian approaches to structure learning instead seek to characterize a full posterior distribution over DAGs, but are typically very computationally expensive in high dimensions. In this talk, I will present Tractable Uncertainty for STructure learning (TRUST), a framework for approximate posterior inference that relies on probabilistic circuits, a type of tractable probabilistic model, as the representation of our posterior belief over causal DAGs. In comparison to Monte-Carlo posterior approximations, our representation can capture a much richer space of DAGs, while also being able to tractably reason about the uncertainty; for example, inferring the most likely completion of a partial graph, or the expected linear causal effect. Finally, I will show how our posterior representations can be learned by exploiting existing structure learning algorithms together with variational inference, leading empirically to improvement in both the quality of inferred structures and posterior uncertainty.

About the presenter: Benjie Wang is a final year DPhil student in Computer Science at the University of Oxford under the supervision of Marta Kwiatkowska. Previously, he completed his Masters in Statistics supervised by Tom Rainforth. His interests span probabilistic machine learning, causality, robustness and verification of probabilistic models, and automated decision-making. His research currently focuses on theory and methodology in tractable probabilistic modelling, and their application to computational problems in causal discovery and inference.

Probabilistic Numerical Simulation of Differential Equation Models

Peter Nicholas Krämer (University of Tübingen)

Abstract: The numerical simulation of differential equations underpins many modelling decisions made in the natural sciences. Solving differential equations with probabilistic numerical algorithms promises better uncertainty quantification than with non-probabilistic approaches, but until recently, probabilistic solvers have been inefficient, unstable, and generally impractical. In this talk, I will explain the fundamentals of probabilistic numerical algorithms for the simulation of ordinary differential equations. Building on this, I will survey the stable and efficient implementation of probabilistic numerical solvers and discuss generalisations of the algorithm to partial differential equations.

About the presenter: Nico is a final-year PhD student in Machine Learning at the University of Tübingen supervised by Philipp Hennig. His research interests are probabilistic numerical algorithms and the simulation of differential equations. Before his PhD, Nico obtained an MSc in Mathematics from the University of Bonn.

STRUM: Extractive Aspect-Based Contrastive Summarization

Beliz Gunel (Google Research)

Abstract: Comparative decisions, such as picking between two cars or deciding between two hiking trails, require the users to visit multiple webpages and contrast the choices along relevant aspects. Given the impressive capabilities of pre-trained large language models, we ask whether they can help automate such analysis. We refer to this task as extractive aspect-based contrastive summarization, which involves constructing a structured summary that compares the choices along relevant aspects. In this paper, we propose a novel method called STRUM for this task that can generalize across domains without requiring any human-written summaries or fixed aspect list as supervision. Given a set of relevant input webpages, STRUM solves this problem using two pre-trained T5-based large language models: the first fine-tuned for aspect and value extraction, and the second fine-tuned for natural language inference. We showcase the abilities of our method across different domains, identify shortcomings, and discuss questions that we believe will be critical in this new line of research.

About the presenter: Beliz Gunel is a Research Scientist at Google Research, where she currently focuses on using large language models for structured summarization. Her research interests lie broadly at the intersection of natural language processing, data-efficient machine learning, and representation learning. She earned her PhD from Stanford University in 2022, where she worked on leveraging prior knowledge and structure for data-efficient machine learning. During her PhD studies, she was a research intern at Microsoft Research, Meta AI, and Google Brain. She is a regular reviewer for the NeurIPS, ICML, ACL, EMNLP, and NAACL conferences. She also co-organized the Representation Learning on Graphs and Manifolds Workshop at ICLR in 2019 and the Women in Machine Learning (WiML) Workshop at NeurIPS in 2022.

Variational Learning is Effective for Large Deep Networks

Thomas Möllenhoff (RIKEN Center for Advanced Intelligence Project)

Abstract: In this talk, I present extensive evidence against the common belief that variational Bayesian learning is ineffective for large neural networks. First, I show that a recent deep learning method called sharpness-aware minimization (SAM) solves an optimal convex relaxation of the variational Bayesian objective. Then, I demonstrate that a direct optimization of the variational objective with an Improved Variational Online Newton method (IVON) can consistently match or outperform Adam for training large networks such as GPT-2 and ResNets from scratch. IVON’s computational costs are nearly identical to Adam’s, but its predictive uncertainty is better. The talk concludes with several new use cases of variational learning where we improve fine-tuning and model merging in Large Language Models, accurately predict generalization error, and faithfully estimate sensitivity to data.

About the presenter: Thomas Möllenhoff received his PhD in Informatics from the Technical University of Munich in 2020. From 2020 to 2023, he was a post-doc in the Approximate Bayesian Inference Team at RIKEN. Since 2023, he has worked at RIKEN as a tenured research scientist. His research focuses on optimization and Bayesian deep learning and has received several awards, including the Best Paper Honorable Mention award at CVPR 2016 and first place in the NeurIPS 2021 Challenge on Approximate Inference.

Exploiting Properties of the Gaussian Likelihood for Label Noise Robustness in Classification

Erik Englesson (KTH Royal Institute of Technology)

Abstract: A natural way of estimating heteroscedastic label noise in regression is to model the observed (potentially noisy) target as a sample from a normal distribution, whose parameters can be learned by minimizing the negative log-likelihood. This formulation has desirable loss attenuation properties, as it reduces the contribution of high-error examples. Intuitively, this behaviour can improve robustness against label noise by reducing overfitting. We propose an extension of this simple and probabilistic approach to classification that has the same desirable loss attenuation properties. We evaluate the effectiveness of the method by measuring its robustness against label noise in classification. In follow-up work, we improve the method’s robustness by modelling and estimating a shift (non-zero mean) in the Gaussian noise distribution, which we show makes it possible for the method to correct noisy labels.

About the presenter: Erik is a postdoc at the Division of Robotics, Perception and Learning at KTH, supervised by Hossein Azizpour. His research interests are related to robustness and uncertainty in deep learning. Erik likes to bring time-tested ideas from fields such as statistics and information theory to deep learning. His PhD thesis was about robustness to label noise, which arises from aleatoric uncertainty in the data generation process. In the near future, Erik plans to connect aleatoric uncertainty, label noise, and epistemic uncertainty, and also bring ideas from Gaussian processes to deep learning.

Generative Models for Biomolecular Prediction, Dynamics, and Design

Hannes Stark & Bowen Jing (MIT Computer Science and Artificial Intelligence Laboratory)

Abstract: We lay out three avenues in which we think generative models are especially valuable for modeling biomolecules. 1) Hard prediction tasks can be better addressed with generative models that can suggest and rank multiple solutions (e.g., docking). 2) The dynamics and conformations of biomolecules can be captured with generative models (e.g., protein conformational ensembles and MD trajectories). 3) Designing new biomolecules can be accelerated, informed by samples or likelihoods from generative models (e.g., protein binder or regulatory DNA design).

About the presenters: Bowen and Hannes are 4th- and 3rd-year PhD students at MIT working with Bonnie Berger, Tommi Jaakkola, and Regina Barzilay. Their interests center around using generative models for biomolecular applications ranging from protein engineering to molecular dynamics.

On Conditional Diffusion Models for PDE Simulations

Aliaksandra Shysheya (University of Cambridge)

Abstract: Modeling partial differential equations (PDEs) is of crucial importance in science and engineering. Some of the most common tasks include 1) forecasting, where the aim is to predict future states based on an initial one, as well as 2) inverse problems, such as data assimilation (DA), with the goal of reconstructing an aspect of the PDE (e.g. a coefficient, the initial condition, or the full trajectory) given some partial observations of the solution to the PDE. However, most previous numerical and machine learning approaches that target forecasting cannot be applied out-of-the-box for data assimilation. Recently, diffusion models have emerged as a powerful tool for conditional generation, being able to flexibly incorporate observations without retraining. In this talk, I will discuss our recent work in this domain, where we perform a comparative study of score-based diffusion models for forecasting and assimilation of sparse observations. In particular, we focus on diffusion models that are either presented with the conditional information during training, or conditioned after unconditional training. We address the shortcomings of previous work and develop methods that are able to successfully tackle the combination of forecasting and data assimilation, a task commonly encountered in real-world scenarios such as weather modeling.

About the presenter: Sasha is a 3rd-year PhD student at the Computational and Biological Learning lab of the University of Cambridge, UK, supervised by Prof Richard E Turner. Her main research interests are in generative modeling and data-efficient machine learning.

Software and MCMC methods for sampling from complex distributions

Nikola Surjanovic (University of British Columbia)

Abstract: We introduce a software package, Pigeons.jl, that provides a way to leverage distributed computation to obtain samples from complicated probability distributions, such as multimodal posteriors arising in Bayesian inference and high-dimensional distributions in statistical mechanics. Pigeons.jl provides simple APIs to perform such computations single-threaded, multi-threaded, and/or distributed over thousands of MPI-communicating machines. Our software provides several Markov kernels, including our newly proposed algorithm, autoMALA. This MCMC algorithm, based on the Metropolis-adjusted Langevin algorithm, automatically sets its step size at each iteration based on the local geometry of the target distribution. Our experiments demonstrate that autoMALA is competitive with related state-of-the-art MCMC methods in terms of the number of log density evaluations per effective sample, and that it outperforms state-of-the-art samplers on targets with varying geometries.

About the presenter: Nikola Surjanovic is a Vanier Scholar pursuing a PhD in Statistics at the University of British Columbia under the supervision of Dr. Alexandre Bouchard-Côté and Dr. Trevor Campbell. His research interests include scalable Bayesian inference and machine learning. Nikola is also a core contributor and founding member of the Pigeons software project for distributed sampling from difficult distributions and for solving computational Lebesgue integration problems.

Robust probabilistic circuits for efficient and reliable predictions

Fabrizio Ventola (TU Darmstadt)

Abstract: Probabilistic circuits are prominent tractable probabilistic models which can provide exact answers to a wide range of probabilistic queries in a tractable way. Given their sparse nature and the structural constraints enabling exact inference, it is challenging to induce these models in high-dimensional real-world domains such as time series and raw images. In this talk, I show how we can leverage spectral modeling and the clear probabilistic semantics of probabilistic circuits to learn models able to provide efficient and reliable predictions in these challenging domains, and how to make these particular models more robust to distribution shift and out-of-distribution data.

About the presenter: Fabrizio Ventola is concluding his PhD at the AI & ML Lab of TU Darmstadt (Germany), advised by Kristian Kersting. His research focuses on deep tractable probabilistic models and how to enable them to efficiently provide insights on big data collections, such as performing accurate and reliable predictions in the presence of noise and missing data. He co-organized the last three editions of the workshop on Tractable Probabilistic Modeling (TPM), which were co-located with UAI, and a series of seminars mainly centred on AutoML and intelligent systems for data management.

Physics-inspired Deep Representation Learning

Yogesh Verma (Aalto University)

Abstract: The past decade has witnessed a remarkable surge in deep learning applications, spanning from recommender systems to fundamental sciences like biology and chemistry. However, when applied to scientific domains, deep learning methods lack effective inductive biases, leading to suboptimal performance and limited generalizability. By bridging the gap between deep learning and physical inductive biases, we aim to develop robust, interpretable, and scientifically grounded AI methods. In this talk, I will present our recent works on developing physics-inspired deep learning approaches in AI4Science, such as modeling weather and climate, generating macromolecules, and designing topological GNNs for drug discovery.

About the presenter: Yogesh Verma is a doctoral researcher at Aalto University, supervised by Vikas Garg and Markus Heinonen. His research interests include developing physics-inspired deep learning methods for modeling dynamical systems, (geometric and topological) deep learning, generative modeling and robust Bayesian inference.

Scalable Approximate Bayesian Methods for Uncertainty Quantification in DNNs (Tentative)

Olivier Laurent (University of Paris-Saclay)

Abstract: TBA

Turing.jl: a general-purpose probabilistic programming language (Tentative)

Tor Fjelde (University of Cambridge)

Abstract: TBA