# Registered Data

## [00193] Adversarial robustness at the interface of analysis, geometry and statistics

**Session Time & Room**:**Type**: Proposal of Minisymposium**Abstract**: Stability and robustness have emerged as essential properties for modern machine learning methods. In this three-part minisymposium, we gather researchers from mathematics, statistics, and computer science that have been driving the research in this field in a variety of directions, offering a platform for scientific exchange and aiming at sparking new collaborations in this vibrant and important field. Some of the topics that will be covered by this mini-symposium include regularization methods and insights from variational calculus for training robust models, numerical methods for solving min-max problems, distributionally robust optimization, GANs, geometric insights on adversarial robustness, among others.**Organizer(s)**: Tim Roith, Nicolás García Trillos, Martin Burger**Classification**:__62Gxx__,__68Q32__,__49Qxx__**Minisymposium Program**:- 00193 (1/3) :
__5B__@__E503__[Chair: Nicolás García Trillos] **[00265] Distributionally Robust Gaussian Process Regression and Bayesian Inverse Problems****Format**: Talk at Waseda University**Author(s)**:**Jose Blanchet**(Stanford)

**Abstract**: We study a distributionally robust optimization formulation (i.e., a min-max game) for two representative problems in Bayesian nonparametric estimation: Gaussian process regression and, more generally, linear inverse problems. Our formulation seeks the best mean-squared error predictor, in an infinite-dimensional space, against an adversary who chooses the worst-case model in a Wasserstein ball around a nominal infinite-dimensional Bayesian model. The transport cost is chosen to control features such as the degree of roughness of the sample paths that the adversary is allowed to inject. We show that the game has a well-defined value (i.e., strong duality holds in the sense that max-min equals min-max) and that there exists a unique Nash equilibrium which can be computed by a sequence of finite-dimensional approximations. Crucially, the worst-case distribution is itself Gaussian. We explore properties of the Nash equilibrium and the effects of hyperparameters through a set of numerical experiments, demonstrating the versatility of our modeling framework.

**[01932] Adversarial distributional robustness from Wasserstein ascent-descent particle dynamics****Format**: Talk at Waseda University**Author(s)**:**Camilo García Trillos**(University College London)- Nicolas Garcia Trillos (University of Wisconsin Madison)

**Abstract**: We propose iterative algorithms to solve adversarial problems in a variety of supervised learning settings. Our algorithms, suggested by ascent-descent dynamics in a projected Wasserstein space, take the form of a system of interacting particles. We show the particle dynamics converge toward mean-field limit equations as the number of particles grows. In turn, the mean-field dynamics converge, as time goes to infinity, to epsilon-Nash equilibria of the original adversarial learning problem. We study, moreover, some advantages found on a nonconvex- strongly concave case -in a sense to be made precise in the talk-. Joint work with Nicolás García Trillos.

**[00273] Optimal Algorithms for Stochastic Nested Composition Optimization with Applications to Robust Training****Format**: Online Talk on Zoom**Author(s)**:**Krishnakumar Balasubramanian**(UC DavisUC Davis)

**Abstract**: Many robust training problems could be cast in the form of optimizing a nested composition of $T$ functions. Examples include distributionally robust optimization, risk-averse learning, robust meta learning, etc. In this talk, I will discuss stochastic optimization algorithms for minimizing nested composition of $T$ functions with and without bi-level structures. Assuming access to noisy evaluations of the functions and their gradients through a stochastic first-order oracle, I will present an algorithm using moving-average stochastic estimates for solving the above problem. We show that the proposed algorithm can achieve a sample complexity of $\mathcal{O}(1/\epsilon^4)$ for converging to an $\epsilon$-stationary point of the problem. To the best of our knowledge, this is the first time that such an online algorithm designed for the (un)constrained multi-level setting, obtains the optimal sample complexity of the smooth single-level setting, under mild assumptions on the stochastic first-order oracle.

**[00274] Convergence of GDA for mean field two-player zero-sum games****Format**: Talk at Waseda University**Author(s)**:**Yulong Lu**(University of Massachusetts Amherst)

**Abstract**: Min-max optimization problems arise from many problems in machine learning ,such as generative modeling and adversarial learning. In general, finding the global Nash-equilibrium of a two-player zero-sum game is difficult when the objective function lacks convexity and concavity assumptions. In this talk, I will introduce a mean-field setting of the game problem and discuss some global convergence results of GDA for finding mixed equilibria on the space of probability measures.

- 00193 (2/3) :
__5C__@__E503__[Chair: Tim Roith] **[00292] Gamma convergence of a nonlocal perimeter from adversarial machine learning****Format**: Talk at Waseda University**Author(s)**:**Leon Bungert**(University of Bonn)- Kerrek Stinson (University of Bonn)

**Abstract**: Adversarial training is a robust machine learning method which seeks to compute a classifier which is stable with respect to adversarial attacks. Recent analysis has shown that adversarial training admits different reformulations, e.g., as distributionally robust optimization, multi-marginal optimal transport, or geometric regularization problem. In this last context, adversarial training is equivalent to regularized empirical risk minimization $\min_{A\subset\mathbb{R}^d}\mathcal R_\mathrm{emp}(A) + \varepsilon\operatorname{Per}_\varepsilon(A)$ where a nonlocal perimeter $\operatorname{Per}_\varepsilon $ of the classifier is penalized. The nonlocality is parametrized with the so-called “adversarial budget” $\varepsilon>0$ which models the strength of the adversary. In this talk I will discuss local limits of this nonlocal perimeter as the adversarial budget goes to zero. Under generic conditions we prove Gamma convergence of $\operatorname{Per}_\varepsilon$ to a weighted local perimeter as $\varepsilon \to 0$. This is joint work with Kerrek Stinson from the University of Bonn.

**[00313] Provable Adversarial Robustness via Optimal Transport****Format**: Talk at Waseda University**Author(s)**:**Muni Sreenivas Pydi**(Université Paris Dauphine-PSL)

**Abstract**: In this talk, we explore the fundamental limits of adversarially robust classification using optimal transport. We give two characterizations of the best error: as an optimal transport cost between the true data distributions, and as the Bayes error of a minimax hypothesis test involving Wasserstein uncertainty sets. The first characterization leads to a recipe for finding the optimal classifier, and the second leads to the existence of a Nash equilibrium.

**[00277] Adversarial learning and the Wasserstein barycenter problem****Format**: Talk at Waseda University**Author(s)**:**Matt Jacobs**(Purdue University)

**Abstract**: In this talk, I will show that the adversarial training problem is equivalent to a generalized version of the Wasserstein barycenter problem. The connection between these problems allows us to completely characterize the optimal adversarial strategy and to bring in tools from optimal transport to analyze and compute optimal classifiers. We will then use these tools to better understand the regularizing effect of adversarial training.

**[00299] Optimal Adversarial Classification: geometry, regularity, and topology****Format**: Talk at Waseda University**Author(s)**:**Ryan Murray**(North Carolina State University)

**Abstract**: Classification is a fundamental task in data science and machine learning, and in the past ten years there have been significant improvements on classification tasks (e.g. via deep learning). However, recently there have been a number of works demonstrating that these improved algorithms can be "fooled" using specially constructed adversarial examples. In turn, there has been increased attention given to creating machine learning algorithms which are more robust against adversarial attacks. In this talk I will discuss delicate mathematical and geometric information which can inferred about optimal adversarial classifiers. In particular, I will describe types of available regularity theory, and draw connections with non-local isoperimetric problems which have been popular in the variational community. Time permitting, I will also discuss recent algorithmic advances which can detect and track topological changes induced by the presence of an adversary.

- 00193 (3/3) :
__5D__@__E503__[Chair: Tim Roith] **[00330] Adversarial flows****Format**: Talk at Waseda University**Author(s)**:**Lukas Weigand**(Friedrich-Alexander-Universität Erlangen-Nürnberg )- Tim Roith (Friedrich-Alexander-Universität Erlangen-Nürnberg)
- Martin Burger (Friedrich-Alexander-Universität Erlangen-Nürnberg)

**Abstract**: The fast gradient method and the fast gradient sign method are two popular methods to generate adversaries of neural networks. Iterative applications of those methods yield differential equations, corresponding to $p$-curves of maximum slope in $\mathbb{R}^d$ for the limit case $p=\infty$. We extend current analysis to this limit case, which allow us to generate distributional adversaries by corresponding $\infty$-curves of maximum slope in the $\infty$-Wasserstein space.

**[02000] Distributionally Robust Linear Predictors using the Max-Sliced Wasserstein Metric****Format**: Talk at Waseda University**Author(s)**:**Cynthia Rush**(Columbia University)

**Abstract**: We study the classical problem of predicting an outcome variable, Y, using a linear combination of a d-dimensional covariate vector, X. We provide conditions under which linear predictors that minimize the worst-case prediction error over a ball of distributions determined by a type of max-sliced Wasserstein metric are equivalent to linear predictors whose coefficients solve: inf_β (E[(Y - < β, X >)^r])^(1/r) + 𝛿 || β ||, where r >1 and 𝛿 > 0 is a regularization parameter. A detailed analysis of the statistical properties of this metric yields a simple recommendation for the choice of regularization parameter. The suggested order of 𝛿, after a suitable normalization of the covariates, is typically d/n, up to logarithmic factors. Our recommendation is computationally straightforward to implement, pivotal, has provable out-of-sample performance guarantees, and does not rely on sparsity assumptions about the true data generating process. This is joint work with Jose Montiel Olea, Amilcar Velez and Johannes Wiesel.

**[00352] Minimax results for Surrogate risks in Adversarial Learning****Format**: Talk at Waseda University**Author(s)**:**Natalie Frank**(NYU)

**Abstract**: Robustness to adversarial perturbations is of paramount concern in modern machine learning. One of the state-of-the-art methods for training robust classifiers is adversarial training, which involves minimizing a supremum-based surrogate risk. We prove a minimax theorem for this adversarial surrogate risk and discuss some of the algorithmic implications. Specifically, we use this minimax result to characterize the statistical consistency of surrogate risks in the adversarial setting.

**[00349] Robust second-order estimation algorithms****Format**: Online Talk on Zoom**Author(s)**:**Po-Ling Loh**(University of Cambridge)

**Abstract**: We provide a new computationally-efficient algorithm that finds estimators for the empirical risk minimization problem. We show that these estimators are robust for general statistical models. Our workhorse is a novel robust variant of Newton’s method, and we provide conditions under which our version of Newton’s method variant provides accurate estimators for general convex objectives.

- 00193 (1/3) :