Abstract : Lifting high-dimensional problems to an infinite-dimensional space and designing algorithms in that setting has been a fruitful idea in many areas of applied mathematics, including inverse problems, optimisation, and partial differential equations. This approach, sometimes referred to as "optimise-then-discretise", allows one to develop algorithms that are inherently dimension- and discretisation-independent and typically perform better in high-dimensional problems. In machine learning, this approach has gained significant attention in the context of operator learning. This workshop explores approaches that involve the approximation of functions with values in an infinite-dimensional space and their connections to partial differential equations.
Organizer(s) : Bamdad Hosseini, Yury Korolev, Jonas Latz
[02760] Approximation by structured deep neural networks
Format : Talk at Waseda University
Author(s) :
Dingxuan Zhou (University of Sydney)
Abstract : Deep learning based on deep neural networks with structured architectures has been powerful in practical applications but is less well understood theoretically. Structured neural networks are particularly difficult to analyze. An important family of structured neural networks is that of deep convolutional neural networks, whose convolutional architecture is key to computational efficiency but raises scientific challenges. We describe a mathematical theory of approximating and learning functions or operators by structured deep neural networks.
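As a toy illustration of the computational point above (a sketch only, not the structured networks analysed in the talk; all sizes and names are made up for the example), the snippet below stacks 1D convolutional layers with small filters, so the number of trainable weights per layer is set by the filter size rather than by the layer width:

```python
# Minimal sketch (not the speaker's construction): a deep 1D convolutional
# network built from small filters, illustrating that the number of trainable
# parameters per layer scales with the filter size rather than the layer width.
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv_layer(x, w, b):
    # "Full" 1D convolution: the output length grows by len(w) - 1 per layer.
    return relu(np.convolve(x, w, mode="full") + b)

rng = np.random.default_rng(0)
depth, filter_size = 4, 5
x = rng.standard_normal(32)                # input signal in R^32

for _ in range(depth):
    w = rng.standard_normal(filter_size)   # only `filter_size` weights per layer
    b = rng.standard_normal()              # single bias (could be a vector)
    x = conv_layer(x, w, b)

print(x.shape)   # width grows linearly with depth: 32 + depth*(filter_size-1)
```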
[03794] Learning High-Dimensional Banach-Valued Functions from Limited Data with Deep Neural Networks
Format : Talk at Waseda University
Author(s) :
Nick Dexter (Florida State University)
Ben Adcock (Simon Fraser University)
Sebastian Moraga (Simon Fraser University)
Simone Brugiapaglia (Concordia University)
Abstract : Reconstructing high-dimensional functions from few samples is important for uncertainty quantification in computational science. Deep learning has achieved impressive results in parameterized PDE problems with solutions in Hilbert or Banach spaces. This work proposes a novel algorithmic approach combining deep learning, compressed sensing, orthogonal polynomials, and finite elements to approximate smooth functions in infinite-dimensional Banach spaces. Theoretical analysis provides explicit guarantees on error and sample complexity, and numerical experiments demonstrate accurate approximations on challenging benchmark problems.
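A heavily simplified sketch of the general recipe, for intuition only: a vector-valued (discretised function-space-valued) map of one parameter is expanded in Legendre polynomials, and the coefficients are recovered from few samples with a joint-sparsity penalty. The toy target, the scalar parameter, and the use of scikit-learn's MultiTaskLasso as a stand-in for the weighted sparse-recovery step are assumptions for the example, not the authors' method:

```python
# Illustrative sketch only (assumed setup, not the authors' algorithm): recover
# Legendre-expansion coefficients of a vector-valued map y(t) from a few random
# samples, using a joint-sparsity penalty in the spirit of compressed sensing.
import numpy as np
from numpy.polynomial import legendre
from sklearn.linear_model import MultiTaskLasso

rng = np.random.default_rng(0)
n_samples, max_degree, K = 30, 60, 50      # few samples, many basis functions

def target(t):
    # stand-in for a parametric solution map t -> u(t), discretised in R^K
    grid = np.linspace(0.0, 1.0, K)
    return np.sin(np.pi * grid) * np.exp(-t) + 0.1 * np.cos(3 * t) * grid

t_train = rng.uniform(-1.0, 1.0, n_samples)
Y = np.stack([target(t) for t in t_train])            # (n_samples, K)
A = legendre.legvander(t_train, max_degree)           # (n_samples, max_degree+1)

model = MultiTaskLasso(alpha=1e-3, max_iter=50_000).fit(A, Y)
C = model.coef_.T                                     # (max_degree+1, K)

t_test = 0.3
y_pred = legendre.legvander(np.array([t_test]), max_degree) @ C + model.intercept_
print(np.linalg.norm(y_pred - target(t_test)) / np.linalg.norm(target(t_test)))
```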
[04519] Kernel methods for learning operators between infinite dimensional Banach spaces
Format : Talk at Waseda University
Author(s) :
Pau Batlle (California Institute of Technology)
Matthieu Darcy (California Institute of Technology)
Houman Owhadi (California Institute of Technology)
Bamdad Hosseini (University of Washington)
Abstract : We introduce a kernel-based framework for learning operators between Banach spaces. We show that even with simple kernels, our approach is competitive in terms of cost-accuracy trade-off and either matches or beats the performance of NN methods on a majority of PDE-based benchmarks. Additionally, our framework offers several advantages inherited from kernel methods: simplicity, interpretability, convergence guarantees, a priori error estimates, and Bayesian UQ. It is therefore a natural benchmark for operator learning problems.
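For intuition, a minimal kernel regression between discretised function spaces might look as follows (the toy operator, grid, and Gaussian kernel are assumptions for the example; the framework in the talk is considerably more general):

```python
# A minimal sketch in the spirit of the talk (assumed discretisation, not the
# authors' code): vanilla kernel ridge regression mapping a discretised input
# function u in R^m to a discretised output function v in R^m.
import numpy as np

def rbf_kernel(U1, U2, lengthscale=1.0):
    # Gram matrix of a Gaussian kernel on the discretised input functions.
    d2 = ((U1[:, None, :] - U2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * lengthscale**2))

rng = np.random.default_rng(0)
m, n_train = 64, 200
grid = np.linspace(0.0, 1.0, m)

# Toy operator: pointwise nonlinearity of a random smooth input (stand-in for a PDE solve).
U = np.array([np.sin(rng.uniform(1, 4) * np.pi * grid + rng.uniform(0, 2)) for _ in range(n_train)])
V = np.tanh(U)                                          # "solution" functions

lam = 1e-8
K = rbf_kernel(U, U, lengthscale=3.0)
alpha = np.linalg.solve(K + lam * np.eye(n_train), V)   # representer coefficients

u_test = np.sin(2.5 * np.pi * grid)[None, :]
v_pred = rbf_kernel(u_test, U, lengthscale=3.0) @ alpha
print(np.abs(v_pred - np.tanh(u_test)).max())           # pointwise error of the surrogate
```

Even this plain kernel ridge regressor has a closed-form solve and inherits interpretability and a priori error estimates from the kernel literature, which is the point the abstract makes about simple kernels.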
[03223] A duality framework for generalization analysis of random feature models and two-layer neural networks
Format : Talk at Waseda University
Author(s) :
Hongrui Chen (Peking University)
Jihao Long (Princeton University)
Lei Wu (Peking University)
Abstract : We consider the problem of learning functions in the $\mathcal{F}_{p,\pi}$ and Barron spaces, which are natural function spaces that arise in the high-dimensional analysis of random feature models (RFMs) and two-layer neural networks. Through a duality analysis, we reveal that the approximation and estimation of these spaces can be considered equivalent in a certain sense. This enables us to focus on the easier problem of approximation and estimation when studying the generalization of both models. The dual equivalence is established by defining an information-based complexity that can effectively control estimation errors. Additionally, we demonstrate the flexibility of our duality framework through comprehensive analyses of two concrete applications.
The first application is to study learning functions in $\mathcal{F}_{p,\pi}$ with RFMs. We prove that the learning does not suffer from the curse of dimensionality as long as $p>1$, implying RFMs can work beyond the kernel regime. Our analysis extends existing results (Celentano et al., 2021) to the noisy case and removes the requirement of overparameterization.
The second application is to investigate the learnability of reproducing kernel Hilbert space (RKHS) under the uniform metric. We derive both lower and upper bounds of the minimax estimation error by using the spectrum of the associated kernel. We then apply these bounds to dot-product kernels and analyze how they scale with the input dimension. Our results suggest that learning with ReLU (random) features is generally intractable in terms of reaching high uniform accuracy.
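As a concrete reminder of what a random feature model is (a generic sketch, not the paper's $\mathcal{F}_{p,\pi}$ setting): the inner weights are sampled once and frozen, and only the outer coefficients are fitted, here by ridge regression on a toy target:

```python
# A small illustration (generic setup, not the paper's exact model): a random
# feature model phi(x) = sigma(Wx + b) with fixed random W, b, fitted by ridge
# regression on the features; only the outer coefficients are learned.
import numpy as np

rng = np.random.default_rng(0)
d, n_features, n_samples = 10, 500, 1000

W = rng.standard_normal((n_features, d))          # frozen random weights
b = rng.uniform(0, 2 * np.pi, n_features)

def features(X):
    return np.maximum(X @ W.T + b, 0.0)           # ReLU random features

X = rng.standard_normal((n_samples, d))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n_samples)   # toy target + noise

Phi = features(X)
lam = 1e-3
coef = np.linalg.solve(Phi.T @ Phi + lam * np.eye(n_features), Phi.T @ y)

X_test = rng.standard_normal((200, d))
mse = np.mean((features(X_test) @ coef - np.sin(X_test[:, 0])) ** 2)
print(mse)
```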
Abstract : Approximation properties of infinitely wide neural networks have been studied by several authors in the last few years. New function spaces have been introduced that consist of functions that can be efficiently (i.e., with dimension-independent rates) approximated by neural networks of finite width, e.g. Barron spaces for networks with a single hidden layer. Typically, these functions act between Euclidean spaces, with a high-dimensional input space and a lower-dimensional output space. As neural networks gain popularity in inherently infinite-dimensional settings such as inverse problems and imaging, it becomes necessary to analyse the properties of neural networks as nonlinear operators acting between infinite-dimensional spaces. In this talk, I will discuss a generalisation of Barron spaces to functions that map between Banach spaces and present Monte Carlo approximation rates of order $1/\sqrt{n}$.
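For orientation, here is the finite-dimensional picture that the talk generalises, in standard Barron-space notation (a sketch; the definitions for the Banach-valued case differ in the details):

$$
f(x) \;=\; \mathbb{E}_{(a,w,b)\sim\mu}\!\left[\, a\,\sigma(\langle w, x\rangle + b)\,\right],
\qquad
f_n(x) \;=\; \frac{1}{n}\sum_{i=1}^{n} a_i\,\sigma(\langle w_i, x\rangle + b_i),
\qquad
\| f - f_n \| \;\lesssim\; \frac{C(f)}{\sqrt{n}} .
$$

The Monte Carlo rate $1/\sqrt{n}$ is dimension-independent, which is what makes these spaces attractive in high and infinite dimensions.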
[04824] Analysis of Neural Networks : Blessings of Width, Curses of Depth
Format : Online Talk on Zoom
Author(s) :
Lénaïc Chizat (EPFL)
Abstract : I will present several results around the large-width limit of gradient-based algorithms for artificial neural networks (NNs). After a review of two-layer NNs, I will discuss the case of deep NNs, where the non-linear dynamics that arises turns out to be much less tractable because of how the random matrices of the initialization interact. I will finally elaborate on the case of deep linear NNs, where we have obtained a complete description of the dynamics.
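Schematically, and only as background in standard mean-field notation (not the speaker's exact formulation), the large-width limit of a two-layer network replaces the average over neurons by a measure over parameters, and gradient descent on the weights becomes a Wasserstein gradient flow of the loss functional $F$ on that measure:

$$
f_m(x) \;=\; \frac{1}{m}\sum_{j=1}^{m}\phi(x;\theta_j)\;\xrightarrow[\;m\to\infty\;]{}\;\int \phi(x;\theta)\,\mathrm{d}\mu(\theta),
\qquad
\partial_t \mu_t \;=\; \nabla_\theta\!\cdot\!\Big(\mu_t\,\nabla_\theta \tfrac{\delta F}{\delta \mu}(\mu_t)\Big).
$$

The talk contrasts this tractable two-layer picture with the deep case, where the interaction of the random initialization matrices complicates the limiting dynamics.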
[02466] Mirror Descent with Relative Smoothness in Measure Spaces, with application to Sinkhorn and EM
Abstract : Many problems in machine learning can be formulated as optimizing a convex functional over a vector space of measures. This paper studies the convergence of the mirror descent algorithm in this infinite-dimensional setting. Defining Bregman divergences through directional derivatives, we derive the convergence of the scheme for relatively smooth and convex pairs of functionals. Such assumptions allow us to handle non-smooth functionals such as the Kullback–Leibler (KL) divergence. Applying our result to joint distributions and KL, we show that Sinkhorn's primal iterations for entropic optimal transport in the continuous setting correspond to a mirror descent, and we obtain a new proof of its (sub)linear convergence. We also show that Expectation Maximization (EM) can always formally be written as a mirror descent. When optimizing only over the latent distribution while fixing the mixture parameters, which corresponds to the Richardson–Lucy deconvolution scheme in signal processing, we derive sublinear rates of convergence.
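In standard notation (a schematic reminder, not the paper's precise assumptions), the mirror descent scheme with step size $\tau$ and a Bregman divergence $D_\phi$ defined through directional derivatives of a convex functional $\phi$ reads:

$$
\mu_{k+1} \in \operatorname*{arg\,min}_{\mu}\; \Big\{ \tau\,\langle F'(\mu_k),\, \mu - \mu_k \rangle + D_\phi(\mu \,|\, \mu_k) \Big\},
\qquad
D_\phi(\mu \,|\, \nu) \;=\; \phi(\mu) - \phi(\nu) - \langle \phi'(\nu),\, \mu - \nu \rangle .
$$

Taking $\phi$ to be the negative entropy makes $D_\phi$ the KL divergence, which is the choice underlying the Sinkhorn and EM interpretations above.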
[05291] Covariance-Modulated Optimal Transport Geometry
Format : Talk at Waseda University
Author(s) :
Franca Hoffmann (California Institute of Technology)
André Schlichting (University of Münster)
Martin Burger (DESY and University of Hamburg)
Daniel Matthes (Technische Universitaet Muenchen)
Matthias Erbar (Universitaet Bielefeld)
Abstract : We present a variant of the dynamical optimal transport problem in which the energy to be minimised is modulated by the covariance matrix of the current distribution. Such transport metrics arise naturally in mean-field limits of certain ensemble Kalman methods for solving inverse problems. We show that, up to degrees of freedom given by rotations, the transport problem splits into two coupled minimisation problems: one for the evolution of the mean and covariance of the interpolating curve, and one for its shape. A similar splitting into the evolution of moments and shapes of the distribution can be observed at the level of the gradient flows. These exhibit better convergence properties than the classical Wasserstein metric, with exponential convergence rates independent of the Gaussian target.
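Schematically, and in the spirit of the related Kalman–Wasserstein metric (a sketch; the authors' exact normalisation may differ), the dynamical formulation modulates the Benamou–Brenier kinetic energy by the covariance of the current distribution:

$$
\inf_{(\rho_t,\,v_t)} \int_0^1\!\!\int \big\langle v_t,\; C(\rho_t)^{-1} v_t \big\rangle\,\mathrm{d}\rho_t\,\mathrm{d}t
\quad\text{s.t.}\quad
\partial_t \rho_t + \nabla\cdot(\rho_t v_t) = 0,
\qquad
C(\rho) = \int \big(x - m(\rho)\big)\otimes\big(x - m(\rho)\big)\,\mathrm{d}\rho(x),
$$

where $m(\rho)$ denotes the mean of $\rho$.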
[04018] Learning PDE operators with neural networks
Format : Talk at Waseda University
Author(s) :
Elizabeth Qian (Georgia Institute of Technology)
Abstract : The term ‘surrogate modeling’ in computational science and engineering refers to the development of computationally efficient approximations for expensive simulations, such as those arising from numerical solution of partial differential equations (PDEs). Surrogate modeling is an enabling methodology for many-query computations in science and engineering, which include iterative methods in optimization and sampling methods in uncertainty quantification. Over the last few years, several approaches to surrogate modeling for PDEs using neural networks have emerged, motivated by successes in using neural networks to approximate nonlinear maps in other areas. In principle, the relative merits of these different approaches can be evaluated by understanding, for each one, the cost required to achieve a given level of accuracy. However, the absence of a complete theory of approximation error for these approaches makes it difficult to assess this cost-accuracy trade-off. In this talk we provide a careful numerical study of this issue, comparing a variety of different neural network architectures for operator approximation across a range of problems arising from PDE models in continuum mechanics.
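For concreteness, the simplest possible neural surrogate of this kind (a deliberately naive stand-in, not one of the architectures compared in the talk; the data and operator below are toy assumptions) maps a discretised input field to a discretised output field with a fully connected network:

```python
# A deliberately simple stand-in (not one of the architectures compared in the
# talk): a fully connected network mapping a discretised PDE input field to a
# discretised solution field, trained to minimise the mean-squared error.
import torch
import torch.nn as nn

m = 64                                   # number of grid points
model = nn.Sequential(
    nn.Linear(m, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, m),
)

# Toy data: inputs u on a grid, outputs v = "solution operator" applied to u.
torch.manual_seed(0)
grid = torch.linspace(0.0, 1.0, m)
U = torch.sin(torch.rand(1000, 1) * 6.28 * grid)      # random sinusoidal inputs
V = torch.tanh(U)                                     # stand-in solution operator

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
for epoch in range(200):
    opt.zero_grad()
    loss = loss_fn(model(U), V)
    loss.backward()
    opt.step()

print(float(loss))   # training error; cost-accuracy studies also track test error
```

Cost-accuracy studies of the kind described in the talk then vary the architecture, training-set size, and resolution, and track test error against training cost.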
[04864] Learning solution operators for PDEs with uncertainty
Format : Talk at Waseda University
Author(s) :
Emilia Magnani (University of Tübingen)
Nicholas Krämer (University of Tübingen)
Runa Eschenhagen (University of Cambridge)
Philipp Hennig (University of Tübingen)
Lorenzo Rosasco (University of Genova & MIT)
Abstract : We provide a Bayesian formulation of the problem of learning solution operators of PDEs in the formalism of Gaussian processes. We extend this treatment to recent deep architectures (neural operators) that have shown promising results in tackling this task. We provide them with uncertainty estimates through methods from Bayesian deep learning. Finally, we consider particular types of operators (convolutions) and investigate the process of learning these in the context of functional regression in inverse problems.
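As a baseline illustration of the Gaussian-process formalism (generic GP regression on discretised functions; the kernel, data, and toy operator are assumptions for the example, and the talk's neural-operator treatment and Bayesian deep-learning estimates go well beyond this), a predictive mean and variance can be computed in closed form:

```python
# A minimal sketch (generic GP regression, not the authors' neural-operator
# construction): Gaussian process regression from discretised input functions
# to discretised outputs, returning a predictive mean and a pointwise variance
# as a simple uncertainty estimate.
import numpy as np

def k(U1, U2, ell=2.0):
    d2 = ((U1[:, None, :] - U2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ell**2))

rng = np.random.default_rng(0)
m, n = 32, 100
grid = np.linspace(0, 1, m)
U = np.array([np.sin(w * np.pi * grid) for w in rng.uniform(1, 3, n)])  # inputs
V = np.tanh(U)                                                          # outputs

noise = 1e-4
L = np.linalg.cholesky(k(U, U) + noise * np.eye(n))

u_star = np.sin(1.7 * np.pi * grid)[None, :]
Ks = k(u_star, U)                                   # (1, n)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, V)) # (n, m)
mean = Ks @ alpha                                   # predictive mean, (1, m)
w_ = np.linalg.solve(L, Ks.T)                       # (n, 1)
var = k(u_star, u_star) - w_.T @ w_                 # predictive variance (shared across outputs)
print(mean.shape, var.item())
```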
[04774] Unsupervised Learning of the Total Variation Flow
Format : Talk at Waseda University
Author(s) :
Tamara G Grossmann (University of Cambridge)
Sören Dittmer (University of Cambridge)
Yury Korolev (University of Bath)
Carola-Bibiane Schönlieb (University of Cambridge)
Abstract : The total variation (TV) flow generates a scale-space representation of an image based on the TV functional. This gradient flow exhibits desirable features for images and enables texture analysis. The standard numerical approach requires solving multiple non-smooth optimisation problems, which is often prohibitively expensive. We propose the TVflowNET, a neural network approach to computing the solution. We significantly speed up the computation time and show that the TVflowNET approximates the TV flow solution with high fidelity.
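In standard notation, the TV flow that TVflowNET is trained to emulate is the gradient flow of the (non-smooth) total variation functional:

$$
\partial_t u(t) \in -\partial\,\mathrm{TV}\big(u(t)\big), \qquad u(0) = f, \qquad \mathrm{TV}(u) = \int_\Omega |Du|,
$$

where $\partial\,\mathrm{TV}$ denotes the subdifferential; the scale-space representation is the family $\{u(t)\}_{t \ge 0}$.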
[03051] On solving/learning nonlinear PDEs with GPs
Format : Talk at Waseda University
Author(s) :
Houman Owhadi (California Institute of Technology)
Abstract : We present a simple, rigorous, and unified framework for solving and learning arbitrary nonlinear PDEs with GPs. The proposed approach inherits the error bounds of kernel interpolation methods and the near-linear complexity of linear solvers for dense kernel matrices. Its generalization to high-dimensional PDEs comes with error bounds exhibiting a tradeoff between dimensionality and regularity. Parts of this talk are joint work with Pau Batlle Franch, Yifan Chen, Bamdad Hosseini, Florian Schäfer, and Andrew Stuart.