# Registered Data

## [00088] Machine learning in infinite dimensions

**Session Date & Time**:- 00088 (1/3) : 4C (Aug.24, 13:20-15:00)
- 00088 (2/3) : 4D (Aug.24, 15:30-17:10)
- 00088 (3/3) : 4E (Aug.24, 17:40-19:20)

**Type**: Proposal of Minisymposium**Abstract**: Lifting high-dimensional problems to an infinite-dimensional space and designing algorithms in that setting has been a fruitful idea in many areas of applied mathematics, including inverse problems, optimisation, and partial differential equations. This approach is sometimes referred to as ''optimise-then-discretise'' and allows one to develop algorithms that are inherently dimension- and discretisation-independent and typically perform better in high-dimensional problems. In the context of machine learning, this approach has gained significant attention in the context of operator learning. This workshop explores approaches that involve the approximation of functions with values in an infinite-dimensional space and their connections to partial differential equations.**Organizer(s)**: Bamdad Hosseini, Yury Korolev, Jonas Latz**Classification**:__46E40__,__68T07__,__62H30__,__Operator learning, representer theorems, PDE learning, vector-valued neural networks__**Speakers Info**:- Houman Owhadi (California Institute of Technology)
- Elizabeth Qian (Georgia Institute of Technology)
- Ding-Xuan Zhou (University of Sydney)
- Lei Wu (Peking University)
- Nick Dexter (Florida State University)
- Pau Batlle Franch (California Institute of Technology)
- Emilia Magnani (University of Tübingen)
- Lénaïc Chizat (École polytechnique fédérale de Lausanne)
- Tamara Grossmann (University of Cambridge)
- Lu Lu (University of Pennsylvania)
- Anna Korba (École nationale de la statistique et de l'administration économique Paris)
- Franca Hoffmann (University of Bonn)

**Talks in Minisymposium**:**[02466] Mirror Descent with Relative Smoothness in Measure Spaces, with application to Sinkhorn and EM****Author(s)**:**Anna Korba**(ENSAE/CREST)- Pierre-Cyril Aubin-Frankowski (INRIA/Ecole Normale Supérieure)
- Flavien Léger (INRIA)

**Abstract**: Many problems in machine learning can be formulated as optimizing a convex functional over a vector space of measures. This paper studies the convergence of the mirror descent algorithm in this infinite-dimensional setting. Defining Bregman divergences through directional derivatives, we derive the convergence of the scheme for relatively smooth and convex pairs of functionals. Such assumptions allow to handle non-smooth functionals such as the Kullback–Leibler (KL) divergence. Applying our result to joint distributions and KL, we show that Sinkhorn’s primal iterations for entropic optimal transport in the continuous setting correspond to a mirror descent, and we obtain a new proof of its (sub)linear convergence. We also show that Expectation Maximization (EM) can always formally be written as a mirror descent. When optimizing only on the latent distribution while fixing the mixtures parameters – which corresponds to the Richardson–Lucy deconvolution scheme in signal processing – we derive sublinear rates of convergence.

**[02760] Approximation by structured deep neural networks****Author(s)**:**Dingxuan Zhou**(University of Sydney)

**Abstract**: Deep learning based on deep neural networks possessing network architectures has been powerful in practical applications but is less understood theoretically. Structured neural networks are particularly difficult to analyze. An important family of structured neural networks is deep convolutional neural networks possessing convolutional structures. The convolutional architecture is key for the computational efficiency but raises scientific challenges. We describe a mathematical theory of approximating and learning functions or operators by structured deep neural networks.

**[03051] On solving/learning nonlinear PDEs with GPs****Author(s)**:**Houman Owhadi**(California Institute of Technology)

**Abstract**: We present a simple, rigorous, and unified framework for solving and learning arbitrary nonlinear PDEs with GPs. The proposed approach inherits the error bounds of kernel interpolation methods and the near-linear complexity of linear solvers for dense kernel matrices. Its generalization to high-dimensional PDEs comes with error bounds exhibiting a tradeoff between dimensionality and regularity. Parts of this talk are joint work with Pau Batlle Franch, Yifan Chen, Bamdad Hosseini, Florian Schäfer, and Andrew Stuart.

**[03216] Reliable extrapolation of deep neural operators informed by physics or sparse observations****Author(s)**:**Lu Lu**(University of Pennsylvania)- Min Zhu (University of Pennsylvania)

**Abstract**: Deep neural operators can learn nonlinear mappings between infinite-dimensional function spaces via deep neural networks. As promising surrogate solvers of partial differential equations (PDEs) for real-time prediction, deep neural operators such as deep operator networks (DeepONets) provide a new simulation paradigm in science and engineering. Pure data-driven neural operators and deep learning models, in general, are usually limited to interpolation scenarios, where new predictions utilize inputs within the support of the training set. However, in the inference stage of real-world applications, the input may lie outside the support, i.e., extrapolation is required, which may result to large errors and unavoidable failure of deep learning models. Here, we address this challenge of extrapolation for deep neural operators. First, we systematically investigate the extrapolation behavior of DeepONets by quantifying the extrapolation complexity via the 2-Wasserstein distance between two function spaces and propose a new behavior of bias-variance trade-off for extrapolation with respect to model capacity. Subsequently, we develop a complete workflow, including extrapolation determination, and we propose five reliable learning methods that guarantee a safe prediction under extrapolation by requiring additional information -- the governing PDEs of the system or sparse new observations. The proposed methods are based on either fine-tuning a pre-trained DeepONet or multifidelity learning. We demonstrate the effectiveness of the proposed framework for various types of parametric PDEs. Our systematic comparisons provide practical guidelines for selecting a proper extrapolation method depending on the available information, desired accuracy, and required inference speed.

**[03223] A duality framework for generalization analysis of random feature models and two-layer neural networks****Author(s)**:- Hongrui Chen (Peking University)
- Jihao Long (Princeton University)
**Lei Wu**(Peking University)

**Abstract**: We consider the problem of learning functions in the $\mathcal{F}_{p,\pi}$ and Barron spaces, which are natural function spaces that arise in the high-dimensional analysis of random feature models (RFMs) and two-layer neural networks. Through a duality analysis, we reveal that the approximation and estimation of these spaces can be considered equivalent in a certain sense. This enables us to focus on the easier problem of approximation and estimation when studying the generalization of both models. The dual equivalence is established by defining an information-based complexity that can effectively control estimation errors. Additionally, we demonstrate the flexibility of our duality framework through comprehensive analyses of two concrete applications. The first application is to study learning functions in $\mathcal{F}_{p,\pi}$ with RFMs. We prove that the learning does not suffer from the curse of dimensionality as long as $p>1$, implying RFMs can work beyond the kernel regime. Our analysis extends existing results (Celentano et al., 2021) to the noisy case and removes the requirement of overparameterization. The second application is to investigate the learnability of reproducing kernel Hilbert space (RKHS) under the {\em uniform metric}. We derive both lower and upper bounds of the minimax estimation error by using the spectrum of the associated kernel. We then apply these bounds to dot-product kernels and analyze how they scale with the input dimension. Our results suggest that learning with ReLU (random) features is generally intractable in terms of reaching high uniform accuracy.

**[03794] Learning High-Dimensional Banach-Valued Functions from Limited Data with Deep Neural Networks****Author(s)**:**Nick Dexter**(Florida State University)- Ben Adcock (Simon Fraser University)
- Sebastian Moraga (Simon Fraser University)
- Simone Brugiapaglia (Concordia University)

**Abstract**: Reconstructing high-dimensional functions from few samples is important for uncertainty quantification in computational science. Deep learning has achieved impressive results in parameterized PDE problems with solutions in Hilbert or Banach spaces. This work proposes a novel algorithmic approach using DL, compressed sensing, orthogonal polynomials, and finite elements to approximate smooth functions in infinite-dimensional Banach spaces. Theoretical analysis provides explicit guarantees on error and sample complexity, and numerical experiments demonstrate accurate approximations on challenging benchmark problems.

**[04018] Learning PDE operators with neural networks****Author(s)**:**Elizabeth Qian**(Georgia Institute of Technology)

**Abstract**: The term ‘surrogate modeling’ in computational science and engineering refers to the development of computationally efficient approximations for expensive simulations such as those arising from numerical solution of partial differential equations (PDEs). Surrogate modeling is an enabling methodology for many-query computations in science and engineering which include iterative methods in optimization and sampling methods in uncertainty quantification. Over the last few years several approaches to surrogate modeling for PDEs using neural networks have emerged motivated by successes in using neural networks to approximate nonlinear maps in other areas. In principle the relative merits of these different approaches can be evaluated by understanding for each one the cost required to achieve a given level of accuracy. However the absence of a complete theory of approximation error for these approaches makes it difficult to assess this cost-accuracy trade-off. In this talk we provide a careful numerical study of this issue comparing a variety of different neural network architectures for operator approximation across a range of problems arising from PDE models in continuum mechanics.

**[04519] Kernel methods for learning operators between infinite dimensional Banach spaces****Author(s)**:**Pau Batlle**(California Institute of Technology)- Matthieu Darcy (California Institute of Technology )
- Houman Owhadi (California Institute of Technology)
- Bamdad Hosseini (University of Washington)

**Abstract**: We introduce a kernel-based framework for learning operators between Banach spaces. We show that even with simple kernels, our approach is competitive in terms of cost-accuracy trade-off and either matches or beats the performance of NN methods on a majority of PDE-based benchmarks. Additionally, our framework offers several advantages inherited from kernel methods: simplicity, interpretability, convergence guarantees, a priori error estimates, and Bayesian UQ. It is therefore a natural benchmark for operator learning problems.

**[04774] Unsupervised Learning of the Total Variation Flow****Author(s)**:**Tamara G Grossmann**(University of Cambridge)- Sören Dittmer (University of Cambridge)
- Yury Korolev (University of Bath)
- Carola-Bibiane Schönlieb (University of Cambridge)

**Abstract**: The total variation (TV) flow generates a scale-space representation of an image based on the TV functional. This gradientflow observes desirable features for images and enables texture analysis. The standard numerical approach requires solving multiple non-smooth optimisation problems, this is often prohibitively expensive. We propose the TVflowNET, a neural network approach to compute the solution. We significantly speed up the computation time and show that the TVflowNET approximates the TV flow solution with high fidelity.

**[04824] Analysis of Neural Networks : Blessings of Width, Curses of Depth****Author(s)**:**Lénaïc Chizat**(EPFL)

**Abstract**: I will present several results around the large-width limit of gradient-based algorithms for artificial neural networks (NNs). After a review of two-layer NNs, I will discuss the case of deep NNs where the non-linear dynamics that arises turns out much less tractable, because of how the random matrices of the initialization interact. I will finally elaborate on the case of deep linear NNs where we have obtained a complete description of the dynamics.

**[04864] Learning solution operators for PDEs with uncertainty****Author(s)**:**Emilia Magnani**(University of Tübingen)- Nicholas Krämer (University of Tübingen)
- Runa Eschenhagen (University of Cambridge)
- Philipp Hennig (University of Tübingen)
- Lorenzo Rosasco (University of Genova & MIT)

**Abstract**: We provide a Bayesian formulation of the problem of learning solution operators of PDEs in the formalism of Gaussian processes. We extend this treatment to recent deep architectures (neural operators) that have shown promising results to tackle this task. We provide them with uncertainty estimates through methods from Bayesian deep learning. Finally, we consider particular types of operators (convolutions) and investigate the process of learning these in the context of functional regression in inverse problems.

**[05291] Covariance-Modulated Optimal Transport Geometry****Author(s)**:**Franca Hoffmann**(California Institute of Technology)- André Schlichting (University of Münster)
- Martin Burger (DESY and University of Hamburg)
- Daniel Matthes (Technische Universitaet Muenchen)
- Matthias Erbar (Universitaet Bielefeld)

**Abstract**: We present a variant of the dynamical optimal transport problem in which the energy to be minimised is modulated by the covariance matrix of the current distribution. Such transport metrics arise naturally in mean-field limits of certain ensemble Kalman methods for solving inverse problems. We show that the transport problem splits into two coupled minimization problems up to degrees of freedom given by rotations: one for the evolution of mean and covariance of the interpolating curve, and one for its shape. Similarly, on the level of the gradient flows a similar splitting into the evolution of moments and shapes of the distribution can be observed. Those show better convergence properties in comparison to the classical Wasserstein metric in terms of exponential convergence rates independent of the Gaussian target.