Abstract : Particle methods have become a popular tool for sampling, optimization, inversion, and filtering. Moreover, gradient-free versions of particle systems may implicitly carry gradient information that can be leveraged for better performance. In this minisymposium we bring together experts in Ensemble Kalman filtering, Ensemble Kalman inversion, and Ensemble Kalman sampling, as well as other particle-based algorithms such as Consensus-based optimization, Affine invariant Langevin dynamics, and Stein Variational Gradient Descent, and collect recent advances based on homotopy approaches, ensemble enrichment, Wasserstein gradient flows, and discrepancy flows.
Organizer(s) : Robert Gruhlke, Johannes Hertrich, David Sommer, Philipp Wacker
[05135] Improving Ensemble Kalman Filter performance by adaptively controlling the ensemble
Format : Talk at Waseda University
Author(s) :
Ruben Harris (FU Berlin)
Claudia Schillings (FU Berlin)
Abstract : We present efficient strategies to improve the performance of Ensemble Kalman Inversion by adaptively controlling the ensemble.
Due to their low computational costs and straightforward implementation, filtering methods such as the Ensemble Kalman Filter have become very popular for inverse problems over the last few years. They have been demonstrated to work well even for highly nonlinear, complex models. We discuss variants of Ensemble Kalman Inversion (EKI) that aim to improve the accuracy of the estimate by adaptively choosing the particles in the ensemble.
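For context, a minimal sketch of one basic EKI particle update is given below (standard form with empirical covariances, assuming a forward map G, data y, noise covariance Gamma, and step size h); the adaptive ensemble-control strategies discussed in the talk are not part of this sketch.

```python
# Minimal sketch of one Ensemble Kalman Inversion (EKI) update step,
# assuming a forward map G: R^d -> R^k, data y and noise covariance Gamma.
import numpy as np

def eki_step(U, G, y, Gamma, h=1.0):
    """U: (J, d) ensemble, G: callable, y: (k,) data, Gamma: (k, k) noise covariance."""
    J = U.shape[0]
    GU = np.array([G(u) for u in U])           # forward evaluations, shape (J, k)
    u_mean, g_mean = U.mean(axis=0), GU.mean(axis=0)
    # empirical cross-covariance C^{uG} and covariance C^{GG}
    Cug = (U - u_mean).T @ (GU - g_mean) / J
    Cgg = (GU - g_mean).T @ (GU - g_mean) / J
    K = Cug @ np.linalg.solve(Cgg + Gamma / h, np.eye(len(y)))
    return U + (y - GU) @ K.T                  # Kalman-type particle update
```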
[03030] Ensemble-based gradient inference for particle methods in optimization and sampling
Format : Talk at Waseda University
Author(s) :
Philipp Wacker (University of Canterbury)
Claudia Schillings (FU Berlin)
Claudia Totzeck (University of Wuppertal)
Abstract : We discuss how some ensemble-based methods for optimization and sampling can be augmented by inexact estimates of the gradient of a potential function, approximated in a straightforward way from pointwise evaluations of the potential on the ensemble. This approximated gradient can be inserted in place of an exact gradient in sampling methods derived from Langevin dynamics, and it can be used as an additional term in global optimization methods such as Consensus-based optimization.
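As a rough illustration of the idea, the following sketch estimates a gradient from pointwise potential evaluations over the ensemble by a linear least-squares fit; this is a generic construction under the stated assumptions, not necessarily the authors' exact estimator.

```python
# Sketch of an ensemble-based gradient estimate: fit a linear model
# V(x) ~ V(mean) + g . (x - mean) to the pointwise values V(x_j).
import numpy as np

def ensemble_gradient(X, V_vals):
    """X: (J, d) ensemble positions, V_vals: (J,) potential evaluations."""
    dX = X - X.mean(axis=0)                    # centred positions
    dV = V_vals - V_vals.mean()                # centred potential values
    g, *_ = np.linalg.lstsq(dX, dV, rcond=None)
    return g                                   # approximate gradient at the ensemble mean
```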
[05138] Less interaction with forward models in Langevin dynamics: Enrichment and Homotopy
Format : Talk at Waseda University
Author(s) :
Robert Gruhlke (FU Berlin)
Martin Eigel (WIAS Berlin)
David Sommer (WIAS Berlin)
Abstract : Ensemble methods like EKS and ALDI are widely used for Bayesian inference problems but suffer from a large number of forward calls and possible lack of convergence for multimodal distributions. We propose adaptive ensemble enrichment strategies to reduce the total number of forward calls. The method is extended for more complex distributions using adapted Langevin dynamics based on a homotopy formalism. Numerical investigations on benchmark problems demonstrate the method's advantages over state-of-the-art Langevin samplers.
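For orientation, a common homotopy (tempering) construction between a tractable reference density and the target is sketched below in generic form; the adapted Langevin dynamics and the enrichment strategy of the talk may differ in the details.

```latex
% Illustrative homotopy between a reference \pi_0 and the target \pi_1,
% together with Langevin dynamics driven by the time-dependent target.
\[
  \pi_t(x) \;\propto\; \pi_0(x)^{1-\beta(t)}\,\pi_1(x)^{\beta(t)},
  \qquad \beta(0)=0,\ \beta(1)=1,
\]
\[
  \mathrm{d}X_t \;=\; \nabla_x \log \pi_t(X_t)\,\mathrm{d}t \;+\; \sqrt{2}\,\mathrm{d}W_t .
\]
```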
Abstract : In recent years, various interacting particle samplers have been developed to sample from complex target distributions, such as those found in Bayesian inverse problems. These samplers are motivated by the mean-field limit perspective and implemented as ensembles of particles that move in the product state space according to coupled stochastic differential equations. The ensemble approximation and numerical time stepping used to simulate these systems can introduce bias and affect the invariance of the particle system with respect to the target distribution. To correct for this, we investigate the use of a Metropolization step, similar to the Metropolis-adjusted Langevin algorithm. We examine both ensemble- and particle-wise Metropolization and prove basic convergence of the resulting ensemble Markov chain to the target distribution. Our results demonstrate the benefits of this correction in numerical examples for popular interacting particle samplers such as ALDI, CBS, and stochastic SVGD.
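A minimal particle-wise Metropolis accept/reject step in the spirit of MALA is sketched below, assuming a Gaussian proposal built from the sampler's drift; the ensemble-wise variant and the specific samplers (ALDI, CBS, stochastic SVGD) are not shown.

```python
# Particle-wise Metropolis correction for a drift-based proposal (MALA-style sketch).
import numpy as np

def metropolis_step(x, drift, log_target, h, rng):
    """x: (d,) current particle, drift: callable b(x), h: step size, rng: np.random.Generator."""
    prop = x + h * drift(x) + np.sqrt(2 * h) * rng.standard_normal(x.shape)

    def log_q(z, m):
        # log density (up to a constant) of the Gaussian proposal q(z | m)
        return -np.sum((z - m - h * drift(m)) ** 2) / (4 * h)

    log_alpha = (log_target(prop) + log_q(x, prop)) - (log_target(x) + log_q(prop, x))
    return prop if np.log(rng.uniform()) < log_alpha else x
```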
[05149] Computing log-densities of time-reversed diffusion processes through Hamilton-Jacobi-Bellman equations
Format : Talk at Waseda University
Author(s) :
David Sommer (WIAS Berlin)
Robert Gruhlke (FU Berlin)
Martin Eigel (WIAS Berlin)
Abstract : Sampling from densities is a common challenge in uncertainty quantification. Langevin dynamics are a popular tool for this task but rely on certain properties of the log-density. To assimilate a larger class of distributions, a time-inhomogeneous drift term can be defined using intermediate log-densities. We propose learning these log-densities by propagation of the target distribution through an Ornstein-Uhlenbeck process, solving the associated Hamilton-Jacobi-Bellman equation using an implicit scheme and compressed polynomials for spatial discretization.
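As background, for an Ornstein-Uhlenbeck process the negative log-density of the propagated distribution satisfies an HJB-type equation obtained from the Fokker-Planck equation via the Hopf-Cole transform; a generic form (assuming drift -x, diffusion sqrt(2), and spatial dimension d) is sketched below. The implicit scheme and the compressed-polynomial discretization of the talk are not shown.

```latex
% For dX_t = -X_t\,dt + \sqrt{2}\,dW_t with marginal densities \rho_t,
% the negative log-density v_t = -\log\rho_t satisfies
\[
  \partial_t v_t(x) \;=\; \Delta v_t(x) \;-\; |\nabla v_t(x)|^2 \;+\; x\cdot\nabla v_t(x) \;-\; d,
  \qquad v_0 = -\log \rho_0 .
\]
```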
[04723] Simulation of Wasserstein gradient flows with low-rank tensor methods for sampling
Format : Talk at Waseda University
Author(s) :
Vitalii Aksenov (Weierstrass Institute for Applied Analysis and Stochastics)
Martin Eigel (Weierstrass Institute for Applied Analysis and Stochastics)
Abstract : We aim to adapt Eulerian methods for Wasserstein gradient flows to high-dimensional problems such as Bayesian inversion, importance sampling, and generative modelling by utilizing low-rank tensor methods. The normalized density is approximated in a tractable format, which additionally allows application to density estimation and rare event detection. An ODE governing the evolution of samples can be defined with the help of intermediate density and flux variables, linking the approach to particle methods.
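For reference, the generic Eulerian/Lagrangian link behind such methods is recalled below: the Wasserstein gradient flow of a functional F and the sample ODE driven by the corresponding velocity; the specific low-rank tensor construction of the talk is not reflected here.

```latex
% Wasserstein gradient flow of F and the ODE moving samples along it;
% the flux is j_t = \rho_t v_t.
\[
  \partial_t \rho_t = \nabla\cdot\Big(\rho_t\,\nabla \tfrac{\delta F}{\delta \rho}[\rho_t]\Big),
  \qquad
  \dot{x}_t = v_t(x_t) := -\nabla \tfrac{\delta F}{\delta \rho}[\rho_t](x_t),
\]
% e.g. for F[\rho] = \mathrm{KL}(\rho\,\|\,\pi) one obtains
% v_t = \nabla\log\pi - \nabla\log\rho_t.
```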
[05014] Overparameterization of Deep ResNet: Zero Loss and Mean-Field Analysis
Format : Talk at Waseda University
Author(s) :
Zhiyan Ding (University of California, Berkeley)
Qin Li (University of Wisconsin, Madison)
Shi Chen (University of Wisconsin, Madison)
Stephen Wright (University of Wisconsin, Madison)
Abstract : In this talk, I will focus on using mean-field analysis to study the overparameterization of neural networks. Finding parameters in a deep neural network (NN) that fit training data is a nonconvex optimization problem, but a basic first-order optimization method (gradient descent) finds a global optimizer with a perfect fit (zero loss) in many practical situations. In this talk, I will investigate this phenomenon in the case of Residual Neural Networks (ResNet) with smooth activation functions in a limiting regime in which both the number of layers (depth) and the number of weights in each layer (width) go to infinity. First, I will rigorously show that the gradient descent for parameter training becomes a gradient flow for a probability distribution that is characterized by a partial differential equation (PDE) in the large-NN limit. Next, I will introduce conditions that ensure the solution to the PDE converges in the training-time limit to a zero-loss solution. Together, these results suggest that the training of the ResNet gives a near-zero loss if the ResNet is large enough.
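As a rough sketch of the limiting regime (under an assumed 1/(LM) scaling that may differ from the talk's precise setting), a ResNet layer update becomes a depth-continuous dynamics driven by the weight distribution, and training turns into a gradient flow for that distribution.

```latex
% Assumed layer update for depth L and width M,
%   z_{l+1} = z_l + \tfrac{1}{LM}\sum_{m=1}^{M} \sigma(z_l, \theta_{l,m}),
% which, as L, M \to \infty, becomes
\[
  \frac{\mathrm{d}}{\mathrm{d}s} Z(s) \;=\; \int \sigma\big(Z(s), \theta\big)\, \rho_s(\mathrm{d}\theta),
  \qquad s \in [0,1],
\]
% with gradient descent on the weights turning into a gradient flow for \rho_s(\theta),
% characterized by a PDE.
```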
[04766] Wasserstein Steepest Descent Flows for Discrepancy Flows with Riesz Kernels
Format : Talk at Waseda University
Author(s) :
Johannes Hertrich (TU Berlin)
Robert Beinert (TU Berlin)
Gabriele Steidl (TU Berlin)
Abstract : We introduce Wasserstein steepest descent flows based on the geometric Wasserstein tangent space. These are locally absolutely continuous curves in the Wasserstein space whose tangent vectors point into a steepest descent direction of a given functional. This allows the use of Euler forward schemes instead of the minimizing movement schemes introduced by Jordan, Kinderlehrer and Otto. Under certain assumptions, we show that there exists a unique Wasserstein steepest descent flow, which coincides with the Wasserstein gradient flow. For the special example of interaction energies with non-smooth Riesz kernels, we derive analytic formulas for the corresponding Wasserstein steepest descent flows.
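For orientation, the two time-stepping schemes contrasted in the abstract take the following generic forms (assumptions on the functional and on the tangent construction are omitted; for general measures, velocity plans replace the velocity field written here).

```latex
% Implicit JKO / minimizing movement scheme with step size \tau:
\[
  \rho_{k+1} \in \operatorname*{arg\,min}_{\rho}
  \Big\{\, \mathcal{F}(\rho) + \tfrac{1}{2\tau}\, W_2^2(\rho, \rho_k) \,\Big\},
\]
% versus an explicit Euler forward step along a steepest descent tangent v_k at \rho_k:
\[
  \rho_{k+1} = (\mathrm{id} + \tau\, v_k)_{\#}\, \rho_k .
\]
```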
[04729] Neural Wasserstein Gradient Flows for Discrepancies with Riesz Kernels
Format : Talk at Waseda University
Author(s) :
Fabian Altekrüger (HU Berlin/ TU Berlin)
Johannes Hertrich (TU Berlin)
Gabriele Steidl (TU Berlin)
Abstract : Wasserstein gradient flows of maximum mean discrepancy (MMD) functionals with non-smooth Riesz kernels show a rich structure, as singular measures can become absolutely continuous ones and conversely. We propose to approximate the backward scheme of Jordan, Kinderlehrer and Otto for computing such Wasserstein gradient flows, as well as a forward scheme for so-called Wasserstein steepest descent flows, by neural networks (NNs). Since we cannot restrict ourselves to absolutely continuous measures, we have to deal with transport plans and velocity plans instead of the usual transport maps and velocity fields. Indeed, we approximate the disintegration of both plans by generative NNs which are learned with respect to appropriate loss functions. For the interaction energy we provide analytic formulas for Wasserstein schemes starting at a Dirac measure. Finally, we illustrate our neural MMD flows by numerical examples.
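For reference, the standard definitions of the Riesz kernel and of the squared maximum mean discrepancy with respect to a target measure are recalled below; they are included only as background for the flows discussed in this session.

```latex
% Riesz kernel and squared MMD to a fixed target \nu:
\[
  K(x,y) = -\|x-y\|^{r}, \qquad r \in (0,2),
\]
\[
  \mathrm{MMD}_K^2(\mu,\nu) = \iint K\,\mathrm{d}\mu\,\mathrm{d}\mu
  \;-\; 2 \iint K\,\mathrm{d}\mu\,\mathrm{d}\nu
  \;+\; \iint K\,\mathrm{d}\nu\,\mathrm{d}\nu .
\]
```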