Abstract : Training neural networks remains challenging for complex or recurrent architectures. The random feature approach sidesteps training deep layers by sampling random weights and fitting only the final output layer to data with linear least squares. When applied to recurrent networks, this is called a reservoir computer. These methods can approximate functions, operators, and dynamical systems. This minisymposium seeks to unify the knowledge and experience of both communities on the topics of 1) scaling to high-dimensional, large-volume data, 2) hyperparameter learning, 3) performance evaluation and comparison, and 4) theoretical understanding of features and reservoirs.
Organizer(s) : Oliver Dunbar, Georg Gottwald, Matthew Levine, Nicholas Nelsen
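The random feature approach described in the minisymposium abstract can be illustrated with a minimal sketch. It is not taken from any of the talks: the cosine features, the weight scale, the ridge parameter, the toy target sin(3x), and the function names `fit_random_features` and `predict` are all assumptions chosen for illustration.

```python
import numpy as np

def fit_random_features(x, y, n_features=100, scale=3.0, ridge=1e-8, seed=0):
    """Sample random hidden weights once, then fit only the linear output layer."""
    rng = np.random.default_rng(seed)
    # Hidden-layer weights and phases are drawn at random and never trained.
    W = rng.normal(0.0, scale, size=(x.shape[1], n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    Phi = np.cos(x @ W + b)
    # Ridge-regularized least squares, written as an augmented lstsq problem
    # (assumed regularization; avoids forming the normal equations).
    A = np.vstack([Phi, np.sqrt(ridge) * np.eye(n_features)])
    rhs = np.concatenate([y, np.zeros(n_features)])
    coef, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return W, b, coef

def predict(x, W, b, coef):
    """Evaluate the fitted random feature model at new inputs."""
    return np.cos(x @ W + b) @ coef

# Toy regression problem (hypothetical): learn sin(3x) on [-1, 1].
x_train = np.linspace(-1.0, 1.0, 200)[:, None]
y_train = np.sin(3.0 * x_train[:, 0])
W, b, coef = fit_random_features(x_train, y_train)

x_test = np.linspace(-0.9, 0.9, 50)[:, None]
rf_test_error = np.max(np.abs(predict(x_test, W, b, coef) - np.sin(3.0 * x_test[:, 0])))
```

The only trained quantity is `coef`, so the fit reduces to a single linear solve.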
[02997] Scalable Gaussian Process Regression with Quadrature-based Features
Format : Talk at Waseda University
Author(s) :
Paz Fink Shustin (Tel Aviv University)
Abstract : Gaussian processes provide a powerful probabilistic kernel learning framework, which allows high-quality nonparametric learning via methods such as Gaussian process regression. Nevertheless, the learning phase requires unrealistically massive computation for large datasets. In this talk, we present a quadrature-based approach for scaling up Gaussian process regression via a low-rank approximation of the kernel matrix. The low-rank structure is utilized to achieve effective hyperparameter learning, training, and prediction. Our Gauss-Legendre features method is inspired by the well-known random Fourier features approach, which also builds low-rank approximations via numerical integration. However, our method is capable of generating high-quality kernel approximations using a number of features that is poly-logarithmic in the number of training points, while similar guarantees require a number that is at least linear in the number of training points when using random Fourier features. The utility of our method for learning with low-dimensional datasets is demonstrated using numerical experiments.
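The idea of building kernel features from a quadrature rule can be sketched as follows. This is not the speaker's construction. It assumes a one-dimensional Gaussian kernel with unit length scale, a hypothetical frequency cutoff of 6, and a hypothetical helper `gauss_legendre_features`. It replaces the Monte Carlo sampling of random Fourier features with Gauss-Legendre nodes on the truncated spectral integral.

```python
import numpy as np

def gauss_legendre_features(x, n_nodes=32, cutoff=6.0):
    """Deterministic Fourier features for k(x, y) = exp(-(x - y)^2 / 2) in 1D."""
    # Gauss-Legendre nodes and weights on [-1, 1], rescaled to [-cutoff, cutoff].
    t, a = np.polynomial.legendre.leggauss(n_nodes)
    omega = cutoff * t
    # Spectral density of the unit Gaussian kernel (standard normal density).
    density = np.exp(-0.5 * omega**2) / np.sqrt(2.0 * np.pi)
    c = cutoff * a * density  # nonnegative quadrature weights
    phase = np.outer(x, omega)
    # cos/sin pairs so that the inner product gives sum_j c_j cos(omega_j (x - y)).
    return np.hstack([np.sqrt(c) * np.cos(phase), np.sqrt(c) * np.sin(phase)])

x = np.linspace(-1.0, 1.0, 50)
Phi = gauss_legendre_features(x)   # 50 x 64 feature matrix
K_approx = Phi @ Phi.T             # low-rank kernel approximation
K_exact = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)
gl_kernel_error = np.max(np.abs(K_approx - K_exact))
```

In this toy setting the kernel matrix is approximated by a product of thin feature matrices, which is the structure that makes downstream training and prediction cheaper.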
[03226] Advances in Time Series Analysis With Reservoir Computing
Format : Talk at Waseda University
Author(s) :
Braden John Thorne (University of Western Australia)
Michael Small (University of Western Australia)
Débora Cristina Corrêa (University of Western Australia)
Ayham Zaitouny (University of Doha for Science and Technology)
Abstract : Reservoir computers have proven to be powerful embedding machines for dynamical systems. However, bridging the gap from their machine learning origins to time series analysis is still relatively new, with great potential for novel discoveries. In this talk, we will outline what reservoir time series analysis is and why one should care about it amidst the ecosystem of other embedding-based techniques. We will then present some use cases and applications to motivate future work.
[03272] Error analysis of random feature neural networks for Black-Scholes-type PDEs
Format : Online Talk on Zoom
Author(s) :
Lukas Gonon (Imperial College London)
Abstract : We mathematically analyse the learning performance of random feature neural networks for learning solutions to a class of PDEs which includes the Black-Scholes PDE as a special case. In contrast to other existing mathematical results on neural network-based PDE learning, in our context it is possible to obtain a full error analysis addressing all error components (approximation, generalization and optimization), with the derived bounds (convergence rates and constants) not suffering from the curse of dimensionality.
[04091] Theoretical advances for learning functions and operators with random features
Format : Talk at Waseda University
Author(s) :
Nicholas H. Nelsen (California Institute of Technology)
Abstract : This talk provides a complete error analysis of operator learning with random features (RF). The theoretical results are developed in a fully general infinite-dimensional input-output setting. The highlights include strong consistency of RF estimators under model misspecification and minimax optimal convergence rates. This work also contributes theory for rigorous uncertainty quantification by establishing (i) new pointwise error bounds for vector-valued Gaussian process (GP) regression and (ii) strong consistency of RF estimators of GPs.
[04530] Photonic reservoir computing with small networks
Format : Online Talk on Zoom
Author(s) :
Joseph David Hart (US Naval Research Laboratory)
Thomas Carroll (US Naval Research Laboratory)
Francesco Sorrentino (University of New Mexico)
Joel Q Grim (US Naval Research Laboratory)
Allan Bracker (US Naval Research Laboratory)
Abstract : The model-free training permitted by reservoir computing makes it particularly attractive for implementation in analog physical hardware, which can offer significant improvements in speed and power requirements over digital hardware. In many cases, however, it can be difficult or undesirable to build a large, tunable analog network. In this talk, we will present recent results using photonic analog hardware to implement reservoir computers made up of small networks.
[04854] Latent GP-ODEs with Informative Priors
Format : Online Talk on Zoom
Author(s) :
Ilze Amanda Auzina (University of Amsterdam)
Cagatay Yildiz (University of Tuebingen)
Efstratios Gavves (University of Amsterdam)
Abstract : We propose a novel framework by combining a generative and a Bayesian nonparametric model which learns a physically meaningful latent representation and solves an ODE system in latent space. The model is able to account for uncertainty as well as to be constrained with informative physical priors. The method demonstrates its ability to learn dynamics from high dimensional data and we obtain state-of-the-art performance compared to earlier nonparametric ODE models on dynamic forecasting.
[04314] A Framework for Hyperparameter Optimization for Randomized Machine Learning
Format : Talk at Waseda University
Author(s) :
Oliver Dunbar (Division of Geological and Planetary Sciences, California Institute of Technology)
Nicholas Nelsen (California Institute of Technology)
Maya Mutic (Princeton University)
Abstract : Randomization can be used to replace layers of neural networks or kernel matrices of Gaussian processes. This approach accelerates numerical methods and converts training to a least-squares problem. In practice, however, the necessary hyperparameter optimization becomes more challenging, as the optimization objective functions are non-deterministic. In the context of random features, we present a framework and algorithm based on the ensemble Kalman filter that can automate this optimization, and demonstrate practical performance through illustrative examples.
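To give a flavor of ensemble Kalman methods applied to a noisy objective, the following is a generic sketch of a perturbed-observation ensemble Kalman inversion update on a toy linear problem. It is not the authors' framework or algorithm. The forward map, its evaluation noise, the ensemble size, the iteration count, and the name `eki_step` are all assumptions made for illustration.

```python
import numpy as np

def eki_step(ensemble, forward, y, noise_cov, rng):
    """One derivative-free ensemble Kalman update of the parameter ensemble."""
    G = np.array([forward(theta) for theta in ensemble])   # (J, m) outputs
    d_theta = ensemble - ensemble.mean(axis=0)
    d_G = G - G.mean(axis=0)
    J = ensemble.shape[0]
    C_tg = d_theta.T @ d_G / (J - 1)   # parameter-output cross-covariance
    C_gg = d_G.T @ d_G / (J - 1)       # output covariance
    # Each member sees its own perturbed copy of the data.
    perturbed = y + rng.multivariate_normal(np.zeros(len(y)), noise_cov, size=J)
    innovation = np.linalg.solve(C_gg + noise_cov, (perturbed - G).T).T
    return ensemble + innovation @ C_tg.T

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0], [2.0, 1.0]])
theta_true = np.array([1.0, -0.5])   # hypothetical "best" hyperparameters
y = A @ theta_true

def forward(theta):
    # Toy stand-in for a non-deterministic objective: a linear map plus
    # a small random evaluation error on every call.
    return A @ theta + 0.01 * rng.normal(size=A.shape[0])

noise_cov = 0.01 * np.eye(A.shape[0])
ensemble = rng.normal(0.0, 2.0, size=(50, 2))
for _ in range(20):
    ensemble = eki_step(ensemble, forward, y, noise_cov, rng)

eki_error = np.linalg.norm(ensemble.mean(axis=0) - theta_true)
```

The update uses only forward evaluations and ensemble statistics, with no gradients, which is one reason ensemble methods are attractive when the objective is random.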
[04861] Next-Generation Reservoir Computing, and On Explaining the Surprising Success of a Random Neural Network for Forecasting Chaos
Format : Online Talk on Zoom
Author(s) :
Erik Matthew Bollt (Clarkson University)
Abstract : Machine learning is widely popular and successful, including for data-driven science, especially for forecasting complex dynamical systems. Reservoir computers (RC) have emerged as random neural networks, for simplicity and computational advantage, where only read-out weights are trained. That it is cheap is clear, but that it works at all is perhaps a surprise, which we explain here. Furthermore, our discussion leads to a new, equivalent, and even simpler variant that we call next-generation reservoir computing (NG-RC).
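The statement that only the read-out weights are trained can be made concrete with a minimal echo state network sketch. It is not the speaker's model and does not implement NG-RC. The reservoir size, the spectral radius of 0.9, the input scaling, the ridge parameter, and the sine-wave prediction task are assumptions chosen for illustration.

```python
import numpy as np

def run_reservoir(u, W_in, W_res):
    """Drive a fixed random recurrent network with the scalar input sequence u."""
    states = np.zeros((len(u), W_res.shape[0]))
    r = np.zeros(W_res.shape[0])
    for t, u_t in enumerate(u):
        r = np.tanh(W_res @ r + W_in * u_t)
        states[t] = r
    return states

rng = np.random.default_rng(0)
n = 100
# Random, untrained recurrent and input weights.
W_res = rng.normal(size=(n, n))
W_res *= 0.9 / np.max(np.abs(np.linalg.eigvals(W_res)))  # set spectral radius to 0.9
W_in = rng.uniform(-0.5, 0.5, size=n)

# Toy task (hypothetical): one-step-ahead prediction of a sine wave.
t = np.arange(1200)
u = np.sin(0.2 * t)
target = np.sin(0.2 * (t + 1))
states = run_reservoir(u, W_in, W_res)

# Train only the linear read-out by ridge regression, after a washout period.
washout, n_train, ridge = 100, 800, 1e-6
R = states[washout:n_train]
w_out = np.linalg.solve(R.T @ R + ridge * np.eye(n), R.T @ target[washout:n_train])

# Evaluate on the held-out tail of the sequence.
pred = states[n_train:] @ w_out
esn_rmse = np.sqrt(np.mean((pred - target[n_train:]) ** 2))
```

Training touches `w_out` alone, so its cost is that of one regularized linear solve regardless of how the recurrent weights were drawn.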
[05356] Minimax optimal inference of inhomogeneous diffusions
Format : Online Talk on Zoom
Author(s) :
Grant Rotskoff (Stanford University)
Abstract : Inferring a diffusion equation from discretely-observed measurements is a statistical challenge of significant importance in a variety of fields, from single-molecule tracking in biophysical systems to modeling financial instruments. Assuming that the underlying dynamical process obeys a $d$-dimensional stochastic differential equation of the form
$$
d\boldsymbol{x}_t=\boldsymbol{b}(\boldsymbol{x}_t)dt+\Sigma(\boldsymbol{x}_t)d\boldsymbol{w}_t,
$$
we show that no diffusion estimator using $N$ discretely sampled data points converges faster than $N^{-\frac{2s+2}{2s+2+d}}$ when the drift $\boldsymbol{b}$ and diffusion tensor $D = \Sigma\Sigma^{T}$ are $s$-Hölder and $(s+1)$-Hölder continuous, respectively. We further propose neural network estimators for both $D$ and $\boldsymbol{b}$, establish convergence guarantees, and show that the estimators achieve a nearly optimal rate for correlated data.
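As a quick arithmetic illustration of the rate exponent, with hypothetical values $s=1$ and $d=2$ or $d=10$ that do not come from the talk:
$$
\frac{2s+2}{2s+2+d}\bigg|_{s=1,\,d=2}=\frac{4}{6}=\frac{2}{3},
\qquad
\frac{2s+2}{2s+2+d}\bigg|_{s=1,\,d=10}=\frac{4}{14}=\frac{2}{7}.
$$
The exponent shrinks as the dimension $d$ grows and approaches $1$ as the smoothness $s$ increases.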
[05361] Nonlinear Time Series Analysis and Data Driven Forecasting: Regional Weather Prediction & Earth's Geodynamo
Format : Online Talk on Zoom
Author(s) :
Luke Fairbanks (UCSD)
Ashley Thorshov (UCSD)
Abstract : Among the zoo of frameworks and methods for analyzing complex systems are some that combine tools from mathematics and physics with machine learning in an attempt to obtain more interpretable, and possibly better, time series predictions of those systems. Our work in the domains of weather prediction and the geodynamo is a model for the interdisciplinary union between experimentalists, theorists, and computational researchers such as ourselves in the pursuit of complex system algorithmic synchronization.
[04474] Random Features for Epidemic Prediction
Format : Online Talk on Zoom
Author(s) :
Esha Saha (University of Waterloo)
Lam Ho (Dalhousie University)
Giang Tran (University of Waterloo)
Abstract : Predicting the evolution of diseases is challenging. Compartmental models stratify the population into compartments and model the dynamics using dynamical systems. These predefined systems may not capture the true dynamics of the epidemic due to the complexity of variable interactions. We propose Sparsity and Delay Embedding based Forecasting (SPADE4) for predicting epidemics. SPADE4 predicts the future trajectory of a variable without knowledge of the underlying system, using random features and Takens' delay embedding theorem.
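The delay embedding step mentioned in the abstract can be sketched as a small helper. This is not the SPADE4 implementation. The function name `delay_embed`, the embedding dimension, and the lag are hypothetical choices, and the random feature regression that would consume these delay vectors is omitted.

```python
import numpy as np

def delay_embed(series, dim, lag=1):
    """Stack lagged copies of a scalar series into delay vectors (one per row)."""
    series = np.asarray(series, dtype=float)
    n_rows = len(series) - (dim - 1) * lag
    if n_rows <= 0:
        raise ValueError("series is too short for this dim and lag")
    # Column i holds the series shifted forward by i * lag samples.
    return np.column_stack([series[i * lag : i * lag + n_rows] for i in range(dim)])

# Toy observed variable: six consecutive measurements 0, 1, ..., 5.
embedded = delay_embed(np.arange(6), dim=3, lag=1)
```

Each row of `embedded` is a window of consecutive observations of the single measured variable, which stands in for the unobserved state of the full system.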