This seminar series provides a platform for young researchers (PhD student or post-doc level) to give invited talks about their research, with the aim of having a diverse set of talks and speakers on topics related to probabilistic machine learning.

The presentations are virtual talks (~30 min + 15 min Q&A) given live over Zoom. To stimulate discussion and allow for the presentation of ongoing work, the talks will not be recorded unless the speaker separately requests it. The schedule is in Eastern European Standard Time; click here for a conversion.

Seminar organizers: Martin Trapp and Arno Solin

Combining Pseudo-Point and State Space Approximations for Sum-Separable Gaussian Processes

Will Tebbutt (University of Cambridge)

Abstract: State space approximations and pseudo point approximations can be combined in a principled manner to yield scalable approximate inference algorithms for sums of separable Gaussian processes. In this talk, I will: 1. show how this combination can be performed for variational pseudo point approximations via a simple conditional independence result, 2. discuss how existing exact inference algorithms for state space models can be re-purposed for approximate inference, 3. interpret existing related work in light of our work, and 4. briefly discuss some experimental results in a spatio-temporal context. For more info, please see our recent UAI paper.
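As a rough illustration of the state space ingredient mentioned in point 2 only, the sketch below evaluates the log marginal likelihood of a one-dimensional GP in two ways: with a dense Cholesky factorisation and with a scalar Kalman filter. The Matérn-1/2 (exponential) kernel, the function names, and the toy data are assumptions chosen for illustration; this is not the method from the talk and it does not involve pseudo-points.

```python
import numpy as np

def gp_loglik_dense(t, y, variance, lengthscale, noise):
    """Log marginal likelihood via a dense Cholesky factorisation, O(N^3)."""
    K = variance * np.exp(-np.abs(t[:, None] - t[None, :]) / lengthscale)
    L = np.linalg.cholesky(K + noise * np.eye(len(t)))
    alpha = np.linalg.solve(L, y)
    return (-0.5 * alpha @ alpha
            - np.log(np.diag(L)).sum()
            - 0.5 * len(t) * np.log(2 * np.pi))

def gp_loglik_kalman(t, y, variance, lengthscale, noise):
    """Same quantity via a Kalman filter, O(N). Assumes t is sorted."""
    m, P = 0.0, variance              # stationary prior on the latent state
    loglik, t_prev = 0.0, t[0]
    for tk, yk in zip(t, y):
        a = np.exp(-(tk - t_prev) / lengthscale)         # transition coefficient
        m, P = a * m, a * a * P + variance * (1.0 - a * a)  # predict
        S = P + noise                                     # predictive variance of y_k
        loglik += -0.5 * (np.log(2 * np.pi * S) + (yk - m) ** 2 / S)
        gain = P / S
        m, P = m + gain * (yk - m), (1.0 - gain) * P      # update
        t_prev = tk
    return loglik

# Hypothetical toy data for the comparison.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 10.0, size=50))
y = np.sin(t) + 0.1 * rng.normal(size=50)

dense_ll = gp_loglik_dense(t, y, variance=1.0, lengthscale=2.0, noise=0.01)
kalman_ll = gp_loglik_kalman(t, y, variance=1.0, lengthscale=2.0, noise=0.01)
print(dense_ll, kalman_ll)
```

For this kernel the two computations target the same quantity, so the printed values should agree up to floating-point error, while the filter scales linearly in the number of observations.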

About the presenter: Will is a PhD student with Rich Turner in the Machine Learning Group at Cambridge, and is interested in probabilistic modelling in general. He is particularly interested in Gaussian processes: how to specify and scale them in large spatio-temporal settings, how best to write software to work with them, and challenges faced in climate science for which they might be helpful.

Automated Augmented Conjugate Inference for Gaussian Processes

Théo Galy-Fajou (TU Berlin)

Abstract: Gaussian Processes are a tool of choice for modelling functions with uncertainties. However, the inference is only tractable analytically for the classical case of regression with Gaussian noise since all other likelihoods are not conjugate with the Gaussian prior. In this talk, I will show how one can transform a large class of likelihoods into conditionally conjugate distributions by augmenting them with latent variables. These augmented models have the advantage that, while the posterior inference is still not fully analytic, the full conditionals are! Consequently, one can work easily (and efficiently!) with algorithms like Gibbs sampling or Coordinate Ascent VI (CAVI) and outperform existing inference methods.

About the presenter: Théo Galy-Fajou is a PhD candidate at TU Berlin under the supervision of Prof. Manfred Opper. His work focuses specifically on Gaussian processes and ways to scale them easily to more data and more complex models. He also has a general interest in all approximate Bayesian inference techniques. He is heavily involved in open-source development for inference and visualization techniques of Bayesian methods in the Julia programming language.

Finite Mixture Models Do Not Reliably Learn the Number of Components

Diana Cai (Princeton University)

Abstract: Scientists and engineers are often interested in learning the number of subpopulations (or components) present in a data set. A common suggestion is to use a finite mixture model (FMM) with a prior on the number of components. Past work has shown the resulting FMM component-count posterior is consistent; that is, the posterior concentrates on the true, generating number of components. But consistency requires the assumption that the component likelihoods are perfectly specified, which is unrealistic in practice. In this paper, we add rigour to data-analysis folk wisdom by proving that under even the slightest model misspecification, the FMM component-count posterior diverges: the posterior probability of any particular finite number of components converges to 0 in the limit of infinite data. Contrary to intuition, posterior-density consistency is not sufficient to establish this result. We develop novel sufficient conditions that are more realistic and easily checkable than those common in the asymptotics literature. We illustrate the practical consequences of our theory on simulated and real data.

About the presenter: Diana Cai is a Ph.D. candidate in computer science at Princeton University and is advised by Ryan Adams and Barbara Engelhardt. Her research spans the areas of machine learning and statistics and focuses on developing robust and scalable methods for probabilistic modelling and inference, with an emphasis on flexible, interpretable, and nonparametric machine learning methods. Previously, Diana obtained an A.B. in computer science and statistics from Harvard University, an M.S. in statistics from the University of Chicago, and an M.A. in computer science from Princeton University. Her research is supported in part by a Google PhD Fellowship in Machine Learning.

Causal Decision-making Meets Gaussian Processes

Virginia Aglietti (University of Warwick / DeepMind)

Abstract: Solving decision-making problems in a variety of domains such as healthcare or operations research requires experimentation. By performing interventions, one can understand how a system behaves when an action is taken and thus infer the cause-effect relationships of a phenomenon. Experiments are usually expensive, time-consuming, and may present ethical issues. Therefore, researchers generally have to trade off cost, time, and other practical considerations to decide which experiments to conduct in order to learn about a system. In this talk, I will present two methodologies that, by linking causal inference, experimental design and Gaussian process (GP) modelling, make it possible to efficiently learn the causal effects in a graph and identify the optimal intervention to perform. Firstly, I will show how to construct a multi-task causal GP model, the DAG-GP model, which captures the non-trivial correlation structure across different experimental outputs. By sharing experimental information, the DAG-GP model accurately estimates the causal effects in a variety of experimental settings while enabling proper uncertainty quantification. I will then demonstrate how this model, and more generally GP models, can be used within decision-making algorithms to choose experiments to perform. In particular, I will introduce the Causal Bayesian Optimization algorithm, and I will show how incorporating the knowledge of the causal graph in Bayesian Optimization improves the ability to reason about optimal decision making while decreasing the optimization cost and avoiding suboptimal solutions.

About the presenter: Virginia is a final year PhD student in Statistics at the University of Warwick and a visiting researcher at The Alan Turing Institute. She is supervised by Dr. Theodoros Damoulas. In September, Virginia will join DeepMind as a Research Scientist to work on causal probabilistic models. Virginia is interested in linking probabilistic models, specifically Gaussian processes, and causality to develop algorithms for causal decision making under uncertainty. She has expertise in working with Gaussian processes and variational inference, with a particular focus on models for point processes. In terms of applications, she is particularly interested in spatio-temporal problems in the social sciences.

Equivariant Probabilistic Generative Modelling

Priyank Jaini (University of Amsterdam)

Abstract: In this talk, I will discuss recent work on developing generative models for efficient sampling and inference by incorporating inductive biases in the form of equivariances. I will begin by introducing the Equivariant Stein Variational Gradient Descent algorithm – an equivariant sampling method based on Stein’s identity for sampling from densities with symmetries. Equivariant SVGD explicitly incorporates symmetry information in a density through equivariant kernels, which makes the resultant sampler efficient both in terms of sample complexity and the quality of generated samples. Subsequently, I will demonstrate the use of Equivariant SVGD by defining equivariant energy-based models to model invariant densities that are learned using contrastive divergence. I will then discuss the applications of these equivariant energy models for modelling joint densities in regression and classification tasks for image datasets, many-body particle systems and molecular structure generation. Finally, if time permits, I will touch on methods for sampling using diffusion models and neural transport augmented Monte Carlo methods for more efficient sampling in discrete spaces with applications in denoising, Bayesian posterior sampling, and training light-weight Bayesian quantised neural nets.

About the presenter: Priyank Jaini is a postdoctoral researcher at the University of Amsterdam and Bosch-Delta Lab, working with Prof. Max Welling. Before that, he completed his PhD at the University of Waterloo under the supervision of Prof. Pascal Poupart and Prof. Yaoliang Yu, where he received the doctoral dissertation award from the faculty of Math for his PhD thesis. His research interests lie in building tractable probabilistic models for reasoning under uncertainty. Recently, he has been interested in incorporating inductive biases in the form of symmetries through equivariances in probabilistic modelling and applying them to downstream tasks like molecular generation and modelling many-body particle systems.

Tractable Probabilistic Reasoning for Trustworthy AI

YooJung Choi (UCLA)

Abstract: Automated decision-making systems are increasingly being deployed in areas with personal and societal impacts: from personalized ads to medical diagnosis and criminal justice. This has led to a growing interest in, and need for, trustworthy AI and ML systems, that is, models that are robust, explainable, fair, and so on. It is important to note that these guarantees only hold with respect to a certain model of the world, with inherent uncertainties. In this talk, I will present how probabilistic modelling and inference, by incorporating a distribution, offer a principled way to handle different kinds of uncertainties when reasoning about decision-making system behaviours. For example, labels in training data may be biased; I will show that probabilistic circuits, a class of tractable probabilistic models (TPMs), can be effective in enforcing and auditing fairness properties by explicitly modelling a latent unbiased label. Another common source of uncertainty is missing values at prediction time, which makes fairness and robustness queries that account for them computationally hard inference tasks. I will also demonstrate how TPMs can again tractably answer these complex queries.

About the presenter: YooJung Choi is a Ph.D. candidate in the Computer Science Department at UCLA, advised by Prof. Guy Van den Broeck. Her research is broadly in the areas of artificial intelligence and machine learning, with a focus on probabilistic modelling and inference for automated decision-making. In particular, she is interested in developing algorithms for tractable inference of complex queries and characterizing the boundaries of tractable inference. Her work also focuses on applying these results to address fairness, robustness, and explainability and, in general, aims towards trustworthy AI/ML.

Probabilistic Inference, Message Passing, and Hybrid Models

Christian Knoll (TU Graz)

Abstract: Probabilistic graphical models are flexible models for representing complex high-dimensional distributions and for incorporating domain knowledge in an intuitive and expressive way. For making probabilistic inference, one often relies on recursive message passing methods. While these methods are efficient for restricted model classes (e.g., for trees), they only serve as approximation methods for more complex models. In this talk, I will show how we can enhance the performance of message passing methods from two opposing angles: by simplifying the model itself with the utilized inference method in mind, and by modifying the inference method with the underlying model in mind. To this end, I will show how we can advance our understanding of message passing methods by considering them as a dynamic system and by applying tools from system theory. These insights will then suggest various improvements. Recently, we have also complemented message passing methods with neural networks. I will discuss how such hybrid models benefit from the flexibility of neural networks in combination with the implicit underlying model assumptions.

About the presenter: Christian Knoll is a postdoctoral researcher at the signal processing and speech communications laboratory at the Graz University of Technology. His research interests include machine learning, statistical signal processing, and probabilistic graphical models. He is particularly interested in understanding and improving message passing methods for performing probabilistic inference in graphical models. Recently, he has also been interested in combining graphical models with neural networks.

Score-based Generative Modeling and the Diffusion Schrödinger Bridge

James Thornton (University of Oxford)

Abstract: Progressively applying Gaussian noise transforms complex data distributions to approximately Gaussian. Reversing this dynamic defines a generative model. When the forward noising process is given by a Stochastic Differential Equation (SDE), Song et al. (2021) demonstrate how the time-inhomogeneous drift of the associated reverse-time SDE may be estimated using score-matching. A limitation of this approach is that the forward-time SDE must be run for a sufficiently long time for the final distribution to be approximately Gaussian. In contrast, solving the Schrödinger Bridge problem (SB), i.e. an entropy-regularized optimal transport problem on path spaces, yields diffusions which generate samples from the data distribution in finite time. We present Diffusion SB (DSB), an original approximation of the Iterative Proportional Fitting (IPF) procedure to solve the SB problem, and provide theoretical analysis along with generative modeling experiments. The first DSB iteration recovers the methodology proposed by Song et al. (2021), with the flexibility of using shorter time intervals, as subsequent DSB iterations reduce the discrepancy between the final-time marginal of the forward (resp. backward) SDE with respect to the prior (resp. data) distribution. Beyond generative modeling, DSB offers a widely applicable computational optimal transport tool as the continuous state-space analogue of the popular Sinkhorn algorithm (Cuturi, 2013). Joint work with Valentin De Bortoli, Jeremy Heng and Arnaud Doucet.
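For readers unfamiliar with the discrete-state counterpart named in the abstract, here is a minimal sketch of the Sinkhorn algorithm for entropy-regularised optimal transport between two small point clouds. It is not DSB itself; the function name `sinkhorn`, the toy data, the cost normalisation, and the regularisation strength `epsilon` are all assumptions made for illustration.

```python
import numpy as np

def sinkhorn(a, b, cost, epsilon=0.5, n_iters=200):
    """Entropy-regularised OT plan between discrete marginals a and b.

    Alternately rescales rows and columns of the Gibbs kernel, which is
    Iterative Proportional Fitting on a finite state space.
    """
    K = np.exp(-cost / epsilon)       # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)               # rescale rows to match marginal a
        v = b / (K.T @ u)             # rescale columns to match marginal b
    return u[:, None] * K * v[None, :]

# Hypothetical toy problem: two small 1-D point clouds with uniform weights.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 1))
y = rng.normal(loc=1.0, size=(6, 1))
cost = (x - y.T) ** 2
cost = cost / cost.max()              # normalise so epsilon has a fixed scale
a = np.full(5, 1.0 / 5)
b = np.full(6, 1.0 / 6)

plan = sinkhorn(a, b, cost)
print(plan.sum(axis=1))               # approximately a
print(plan.sum(axis=0))               # approximately b
```

Each loop iteration performs one pair of marginal-fitting steps; with smaller `epsilon` the plan approaches the unregularised transport plan but typically needs more iterations and a log-domain implementation for numerical stability.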

About the presenter: James is a PhD student at the University of Oxford, supervised by George Deligiannidis and Arnaud Doucet. His research interests are at the intersection of Optimal Transport, sampling methods and machine learning.