Abstract : Over the last twenty years, data science, machine learning, and deep learning in particular have begun transforming the global economy and modern life. While much attention is focused on purely empirical data mining results, there are considerable mathematical structures and a growing body of theory about how these structures relate to observable properties of real-world systems. Discovering such structures may lead to important mathematical insights and implications for practitioners. The minisymposium will facilitate interactions between harmonic analysts and experts on the theory of data science, machine learning, and deep learning to foster further research in this fast-developing area.
[01801] Algorithms for quantizing neural networks with guaranteed accuracy
Format : Talk at Waseda University
Author(s) :
Rayan Saab (University of California San Diego)
Abstract : We present both deterministic and stochastic algorithms for quantizing neural networks, which can achieve significant savings in cost, computation time, memory, and power consumption while preserving the network's performance. Our methods are data-driven, computationally efficient, and have provable error guarantees. We showcase our results through numerical experiments on large multi-layer networks. Time permitting, we also discuss open problems and connections to other areas of research. This is joint work, in parts, with Jinjie Zhang, Yixuan Zhou, and Johannes Maly.
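As a rough illustration of what "data-driven" quantization with a tracked error can look like, the following is a minimal sketch of a greedy scheme that quantizes one weight at a time while compensating for the error accumulated so far. It is not claimed to be the algorithm of the talk; the function name `greedy_quantize`, the ternary alphabet, and the random data are all assumptions made for illustration.

```python
import numpy as np

def greedy_quantize(w, X, alphabet):
    """Quantize the weight vector w (shape d) using data X (shape m x d).

    Hypothetical sketch: u tracks the running error X[:, :t] @ (w[:t] - q[:t]).
    At step t the quantized weight q[t] is the alphabet element that minimizes
    the norm of the updated error u + (w[t] - q[t]) * X[:, t].
    """
    m, d = X.shape
    alphabet = np.asarray(alphabet, dtype=float)
    q = np.zeros(d)
    u = np.zeros(m)
    for t in range(d):
        x = X[:, t]
        norm_sq = x @ x
        # Unconstrained minimizer of ||u + (w[t] - c) * x||^2 over real c.
        target = w[t] + (x @ u) / norm_sq if norm_sq > 0 else w[t]
        # The objective is a parabola in c, so the nearest alphabet element is optimal.
        q[t] = alphabet[np.argmin(np.abs(alphabet - target))]
        u = u + (w[t] - q[t]) * x
    return q, u

# Illustrative data (assumed): Gaussian inputs, weights in [-1, 1], ternary alphabet.
rng = np.random.default_rng(0)
X = rng.standard_normal((64, 32))
w = rng.uniform(-1.0, 1.0, size=32)
alphabet = np.array([-1.0, 0.0, 1.0])

q, u = greedy_quantize(w, X, alphabet)
relative_error = np.linalg.norm(u) / np.linalg.norm(X @ w)
print("relative error of the quantized neuron on the data:", relative_error)
```

The returned vector `u` equals `X @ (w - q)`, so its norm measures how much the output of the quantized neuron deviates from the original on the given data.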
[01538] Learning nonlinear functionals using deep ReLU networks
Format : Talk at Waseda University
Author(s) :
Linhao Song (City University of Hong Kong)
Jun Fan (Hong Kong Baptist University)
Dingxuan Zhou (University of Sydney)
Abstract : Functional neural networks have been proposed and studied in order to approximate nonlinear continuous functionals defined on $L^p([-1, 1]^s)$ for integers $s\ge1$ and $1\le p<\infty$. However, their theoretical properties are largely unknown beyond universality of approximation, or the existing analysis does not apply to the rectified linear unit (ReLU) activation function. In this talk we investigate the approximation power of functional deep ReLU networks and establish their rates of approximation under mild regularity conditions.
[01578] Compressed sensing for the sparse Radon transform
Format : Talk at Waseda University
Author(s) :
Giovanni S. Alberti (University of Genoa)
Matteo Santacesaria (University of Genoa)
S. Ivan Trapasso (Polytechnic University of Turin)
Alessandro Felisi (University of Genoa)
Abstract : Compressed sensing allows for the recovery of signals from a number of measurements that is proportional, up to logarithmic factors, to their sparsity. The classical theory considers random linear measurements or subsampled isometries, with applications to, e.g., MRI. I will show how compressed sensing can be applied to the sparse Radon transform, where a finite number of angles are considered. The result follows from a new theory of compressed sensing for infinite-dimensional ill-posed problems.
[01808] Super-resolution of sparse measures: recent advances
Format : Online Talk on Zoom
Author(s) :
Dmitry Batenkov (Tel Aviv University)
Abstract : The problem of sparse super-resolution asks to recover a linear combination of Dirac point measures from low-frequency and inaccurate measurements. This is a popular model for applications including spectral estimation, direction of arrival, imaging of point sources, and sampling of signals below the Nyquist limit.
In this talk I will describe recent results on deriving optimal recovery bounds and corresponding algorithms for this model and its generalizations to sparse distributions defined on homogeneous Riemannian manifolds.
[04107] Proximal neural networks and Plug-and-Play methods
Format : Talk at Waseda University
Author(s) :
Gabriele Steidl (TU Berlin)
Johannes Hertrich (TU Berlin)
Sebastian Jonas Neumayer (EPFL)
Abstract : We introduce stable tight frame proximal neural networks (PNNs) which are by construction averaged operators. For the training of PNNs, we propose a stochastic gradient descent on (a submanifold of) the Stiefel manifold. First, we apply cPNN based denoisers within a Plug-and-Play framework and provide convergence results for the corresponding PnP forward-backward splitting algorithm based on an oracle construction. Second, we use the averagedness property of PNNs to construct a new architecture within residual flows.
[03587] The Bayesian Learning Rule
Format : Talk at Waseda University
Author(s) :
Mohammad Emtiyaz Khan (RIKEN Center for AIP)
Abstract : Humans and animals have a natural ability to autonomously learn and quickly adapt to their surroundings. How can we design machines that do the same? In this talk, I will present Bayesian principles to bridge such gaps between humans and machines. I will show that a wide variety of machine-learning algorithms are instances of a single learning rule derived from Bayesian principles. The rule unravels a dual perspective yielding new mechanisms for knowledge transfer in learning machines. My hope is to convince the audience that Bayesian principles are indispensable for an AI that learns as efficiently as we do.
[02888] Learning linear operators: Infinite-dimensional regression as a well-behaved non-compact inverse problem
Format : Talk at Waseda University
Author(s) :
Nicole Mücke (TU Brunswick)
Abstract : We consider the problem of learning a linear operator between two Hilbert spaces from empirical observations, which we interpret as least squares regression in infinite dimensions. We show that this goal can be reformulated as an inverse problem with the feature that its forward operator is generally non-compact. We prove that this inverse problem is equivalent to the known compact inverse problem associated with scalar response regression. Our framework allows for obtaining dimension-free rates for generic learning algorithms. They hold for a variety of practically relevant scenarios in functional regression as well as nonlinear regression with operator-valued kernels and match those of classical kernel regression with scalar response.
[02915] A Non-Asymptotic Analysis of Dropout in the Linear Model
Format : Talk at Waseda University
Author(s) :
Gabriel Clara (University of Twente)
Sophie Langer (University of Twente)
Johannes Schmidt-Hieber (University of Twente)
Abstract : We investigate the statistical behavior of iterates generated by gradient descent with dropout in a linear model. Non-asymptotic convergence rates for expectations and covariance matrices of the iterates are presented. Difficulties arising from the interaction between gradient descent dynamics and the variance added by dropout are examined. The results motivate and support discussion of statistical aspects of dropout, focusing on optimality of the variance.
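To make the setting concrete, the sketch below runs gradient descent with dropout on a linear model and checks a standard identity: averaging the squared loss over i.i.d. Bernoulli($p$) dropout masks gives a rescaled least-squares term plus a data-dependent quadratic penalty. The loss scaling, step size, retain probability `p`, and all function names are assumptions for illustration and need not match the exact setup analyzed in the talk.

```python
import itertools
import numpy as np

def dropout_gd_step(w, X, y, p, lr, rng):
    """One gradient step on 0.5 * ||y - X D w||^2 with a fresh Bernoulli(p) mask D."""
    mask = rng.binomial(1, p, size=w.shape[0]).astype(float)
    residual = y - X @ (mask * w)
    grad = -mask * (X.T @ residual)
    return w - lr * grad

def expected_dropout_loss(w, X, y, p):
    """Closed form of E_D ||y - X D w||^2 for a diagonal mask D with i.i.d. Bernoulli(p) entries."""
    fit = np.sum((y - p * (X @ w)) ** 2)
    penalty = p * (1 - p) * np.sum(np.sum(X ** 2, axis=0) * w ** 2)
    return fit + penalty

def enumerated_dropout_loss(w, X, y, p):
    """The same expectation, computed by brute force over all 2^d masks."""
    d = w.shape[0]
    total = 0.0
    for bits in itertools.product([0.0, 1.0], repeat=d):
        mask = np.array(bits)
        kept = mask.sum()
        prob = p ** kept * (1 - p) ** (d - kept)
        total += prob * np.sum((y - X @ (mask * w)) ** 2)
    return total

# Illustrative data (assumed): small Gaussian design, noisy linear responses.
rng = np.random.default_rng(1)
X = rng.standard_normal((20, 4))
w_true = np.array([1.0, -2.0, 0.5, 0.0])
y = X @ w_true + 0.1 * rng.standard_normal(20)
p = 0.8

closed_form = expected_dropout_loss(w_true, X, y, p)
brute_force = enumerated_dropout_loss(w_true, X, y, p)

# Run dropout gradient descent; the iterates stay random because of the masks.
w_iter = np.zeros(4)
for _ in range(500):
    w_iter = dropout_gd_step(w_iter, X, y, p, lr=0.01, rng=rng)
print("closed form:", closed_form, "enumeration:", brute_force)
print("iterate after 500 dropout steps:", w_iter)
```

The penalty term in `expected_dropout_loss` is the variance that dropout adds to the loss; the iterates themselves keep fluctuating around the minimizer of the averaged loss, which is the kind of interaction the abstract refers to.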
Ryan Michael O'Dowd (Claremont Graduate University)
Abstract : In theoretical analysis of function approximation in the context of machine learning, a standard approach is to assume that given data lies on an unknown manifold. We view the unknown manifold as a sub-manifold of an ambient hypersphere and construct a one-shot approximation using spherical polynomials. Our approach does not require pre-processing of the data to obtain information about the manifold other than its dimension. We give optimal rates of approximation for relatively ``rough'' functions.
[04364] Distribution learning for count data
Format : Talk at Waseda University
Author(s) :
Xin Guo (The University of Queensland)
Qiang Fu (The University of British Columbia)
Tian-Yi Zhou (Georgia Institute of Technology)
Hien Nguyen (The University of Queensland)
Abstract : Parameter and density estimation for count models are classical problems in statistics, and are widely used in many branches of physical and social sciences. Grouped and right-censored (GRC) counts are widely used in criminology, demography, epidemiology, marketing, sociology, psychology and other related disciplines to study behavioural and event frequencies, especially when sensitive research topics or individuals with possibly lower cognitive capacities are at stake. Yet, the co-existence of grouping and right-censoring poses major difficulties in regression analysis. To implement generalised linear regression of GRC counts, we derive modified Poisson estimators and their asymptotic properties, develop a hybrid line search algorithm for parameter inference, demonstrate the finite-sample performance of these estimators via simulation, and evaluate their empirical applicability based on survey data of drug use in America. This method has a clear methodological advantage over the ordered logistic model for analysing GRC counts. We will also present our recent work on mixing density estimation through kernel methods and deep neural networks.
[05188] Hierarchical systems of exponential bases for partitions of intervals
Format : Online Talk on Zoom
Author(s) :
Goetz Pfander (Catholic University Eichstätt Ingolstadt, Mathematical Institute for Machine Learning and Data Science)
David Walnut (George Mason University)
Abstract : Fourier series form a cornerstone of analysis; they allow the expansion of a complex-valued 1-periodic function in the basis of integer frequency exponentials. A simple rescaling argument shows that by splitting the integers into evens and odds, we obtain orthogonal bases for functions defined on the first and the second half of the unit interval, respectively. We develop generalizations of this curiosity and show that, for example, for any finite partition of the unit interval into subintervals there exists a partition of the integers into subsets, each of which forms a basis for functions supported on the respective subinterval.
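The rescaling argument for the even/odd splitting can be written out as a short computation. The shorthand $e_n(x) = e^{2\pi i n x}$ is introduced here for convenience and is not taken from the abstract.

```latex
\[
\int_0^{1/2} e_{2n}(x)\,\overline{e_{2m}(x)}\,dx
  = \int_0^{1/2} e^{2\pi i\,2(n-m)x}\,dx
  = \frac{1}{2}\int_0^{1} e^{2\pi i (n-m)t}\,dt
  = \frac{1}{2}\,\delta_{nm}
  \qquad (t = 2x),
\]
\[
\int_{1/2}^{1} e_{2n+1}(x)\,\overline{e_{2m+1}(x)}\,dx
  = \int_{1/2}^{1} e^{2\pi i\,2(n-m)x}\,dx
  = \frac{1}{2}\int_0^{1} e^{2\pi i (n-m)t}\,dt
  = \frac{1}{2}\,\delta_{nm}
  \qquad (t = 2x - 1).
\]
```

Completeness follows from the same substitutions: $x \mapsto 2x$ carries $\{e_{2n}\}$ on $[0, 1/2]$ to the Fourier basis on $[0, 1]$, and since $e_{2n+1} = e_1\, e_{2n}$ with $|e_1| = 1$ and $e_{2n}$ is $\tfrac12$-periodic, the odd exponentials form an orthogonal basis of $L^2([1/2, 1])$ as well.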