[00752] Theory and efficient methods for large-scale structured optimization models
Session Time & Room : 2E (Aug.22, 17:40-19:20) @D408
Type : Proposal of Minisymposium
Abstract : This mini-symposium mainly concerns large-scale structured optimization models that are popular in the statistical and machine learning communities. We focus on the design of efficient algorithms for solving these large-scale structured optimization problems and on the convergence theory of the proposed algorithms.
[01213] Augmented Lagrangian method for matrix optimization
Format : Talk at Waseda University
Author(s) :
Chao Ding (Chinese Academy of Sciences)
Abstract : In this talk, we will introduce some new convergence results for matrix optimization problems, including nonlinear semidefinite programming and nonsmooth optimization on Riemannian manifolds.
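As a point of reference only (not the specific formulation of the talk), a generic augmented Lagrangian update for an equality-constrained matrix problem $\min_X f(X)$ subject to $h(X)=0$, with multiplier $y^k$ and penalty parameter $\sigma_k$, reads
\begin{align*}
X^{k+1} &\in \operatorname*{arg\,min}_{X}\; f(X) + \langle y^k, h(X)\rangle + \tfrac{\sigma_k}{2}\,\|h(X)\|^2,\\
y^{k+1} &= y^k + \sigma_k\, h(X^{k+1}).
\end{align*}
Nonlinear semidefinite and manifold constraints modify the constraint map $h$ and the feasible set accordingly.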
[02195] Determinantal point processes for sampling minibatches in SGD
Format : Online Talk on Zoom
Author(s) :
Rémi Bardenet (Université de Lille)
Subhroshekhar Ghosh (National University of Singapore)
Meixia Lin (Singapore University of Technology and Design)
Abstract : In this work, we contribute an orthogonal polynomial-based determinantal point process paradigm for performing minibatch sampling in SGD. Our approach leverages the specific data distribution at hand, which endows it with greater sensitivity and power than existing data-agnostic methods. We substantiate our method via a detailed theoretical analysis of its convergence properties, interweaving between the discrete data set and the underlying continuous domain. In particular, we show how specific DPPs and a string of controlled approximations can lead to gradient estimators with a variance that decays faster with the batch size than under uniform sampling. Coupled with existing finite-time guarantees for SGD on convex objectives, this entails that, for a large enough batch size and a fixed budget of item-level gradients to evaluate, DPP minibatches lead to a smaller bound on the mean squared approximation error than uniform minibatches. Moreover, our estimators are amenable to a recent algorithm that directly samples linear statistics of DPPs without sampling the underlying DPP, thereby reducing computational overhead.
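As a rough illustration of the minibatch-reweighting idea only (it does not implement the authors' orthogonal-polynomial DPP sampler), the sketch below runs minibatch SGD with a pluggable sampler that returns indices together with importance weights; the uniform sampler shown merely stands in for a DPP-based one, and all names (sgd_with_weighted_minibatches, sample_minibatch, grad_i) are illustrative, not taken from the paper.

    import numpy as np

    def sgd_with_weighted_minibatches(grad_i, theta0, n, batch_size, steps,
                                      step_size, sample_minibatch):
        # grad_i(theta, i): gradient of the i-th loss term.
        # sample_minibatch(n, k): returns (indices, weights) such that the
        # weighted sum below is an unbiased estimate of the average gradient.
        theta = np.asarray(theta0, dtype=float).copy()
        for _ in range(steps):
            idx, w = sample_minibatch(n, batch_size)
            g = sum(w_j * grad_i(theta, i) for i, w_j in zip(idx, w))
            theta -= step_size * g
        return theta

    def uniform_minibatch(n, k, rng=np.random.default_rng(0)):
        # Uniform sampling without replacement: inclusion probability k/n,
        # hence Horvitz-Thompson weight 1/(n * k/n) = 1/k for each item.
        return rng.choice(n, size=k, replace=False), np.full(k, 1.0 / k)

    # Toy least-squares usage
    rng = np.random.default_rng(1)
    A = rng.standard_normal((200, 5))
    b = A @ np.ones(5) + 0.1 * rng.standard_normal(200)
    grad_i = lambda theta, i: A[i] * (A[i] @ theta - b[i])
    theta = sgd_with_weighted_minibatches(grad_i, np.zeros(5), n=200,
                                          batch_size=20, steps=500,
                                          step_size=0.01,
                                          sample_minibatch=uniform_minibatch)

A DPP-based sampler would plug in through the same (indices, weights) interface, with weights derived from its marginal inclusion probabilities.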
[02194] Bregman Proximal Point Algorithm Revisited: A New Inexact Version and its Inertial Variant
Format : Talk at Waseda University
Author(s) :
Lei Yang (Sun Yat-Sen University)
Kim-Chuan Toh (National University of Singapore)
Abstract : In this talk, we focus on a general convex optimization problem, which covers various classic problems from different areas and, in particular, includes many optimal-transport-related problems that have arisen in recent years. To solve this problem, we revisit the classic Bregman proximal point algorithm (BPPA) and introduce a new inexact stopping condition for solving the subproblems, which can circumvent the underlying feasibility difficulty that often appears in existing inexact conditions when the problem has a complex feasible set. Our inexact condition also covers several existing inexact conditions as special cases and hence makes our inexact BPPA (iBPPA) more flexible for fitting different scenarios in practice. As an application to the standard optimal transport (OT) problem, our iBPPA with the entropic proximal term can bypass some numerical instability issues that usually plague the popular Sinkhorn algorithm in the OT community. The iteration complexity of $O(1/k)$ and the convergence of the sequence are also established for our iBPPA under some mild conditions. Moreover, inspired by Nesterov's acceleration technique, we develop an inertial variant of our iBPPA, denoted by V-iBPPA, and establish the iteration complexity of $O(1/k^{\lambda})$, where $\lambda\geq1$ is a quadrangle scaling exponent of the kernel function. In particular, when the proximal parameter is a constant and the kernel function is strongly convex with Lipschitz continuous gradient (hence $\lambda=2$), our V-iBPPA achieves a faster rate of $O(1/k^2)$, just as existing accelerated inexact proximal point algorithms do. Some preliminary numerical experiments for solving the standard OT problem are conducted to show the convergence behaviors of our iBPPA and V-iBPPA under different inexactness settings. The experiments also empirically verify the potential of our V-iBPPA for improving the convergence speed.
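For orientation only, here is a minimal sketch of a Bregman proximal point scheme with an entropic (KL) proximal term applied to the standard OT problem; each subproblem is solved inexactly by a fixed, small number of Sinkhorn-type inner iterations, which is a crude stand-in for the talk's principled inexact stopping condition, and the function name and parameters (entropic_proximal_ot, prox_param, inner_iters) are assumptions for illustration.

    import numpy as np

    def entropic_proximal_ot(C, a, b, prox_param=1.0, outer_iters=50, inner_iters=10):
        # OT: min_P <C, P>  s.t.  P 1 = a, P^T 1 = b, P >= 0, with a, b probability vectors.
        # Outer step: P^{k+1} ~ argmin_P <C, P> + (1/prox_param) * KL(P || P^k),
        # whose solution has the form diag(u) K diag(v) with K = P^k * exp(-prox_param * C).
        m, n = C.shape
        P = np.outer(a, b)                        # feasible initial transport plan
        for _ in range(outer_iters):
            K = P * np.exp(-prox_param * C)       # kernel of the KL-prox subproblem
            u, v = np.ones(m), np.ones(n)
            for _ in range(inner_iters):          # inexact inner solve (Sinkhorn-type)
                u = a / (K @ v)
                v = b / (K.T @ u)
            P = u[:, None] * K * v[None, :]
        return P

    # Toy usage: random cost matrix with uniform marginals
    rng = np.random.default_rng(0)
    C = rng.random((30, 40))
    a, b = np.full(30, 1 / 30), np.full(40, 1 / 40)
    P = entropic_proximal_ot(C, a, b)
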
[02388] On efficient and scalable computation of the nonparametric maximum likelihood estimator in mixture models
Format : Talk at Waseda University
Author(s) :
Yangjing ZHANG (Chinese Academy of Sciences)
Ying Cui (University of Minnesota)
Bodhisattva Sen (Columbia University)
Kim-Chuan Toh (National University of Singapore)
Abstract : Nonparametric maximum likelihood estimation is a classic and important method for estimating mixture models from finitely many observations. In this talk, we propose an efficient semismooth-Newton-based augmented Lagrangian method (ALM). By carefully exploiting the structure of the ALM subproblem, we show that the computational cost of the generalized Hessian is independent of the number of grid points. Extensive experiments are conducted to show the effectiveness of our approach.
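To make the estimation problem concrete (this is a baseline illustration, not the semismooth-Newton ALM of the talk), the sketch below sets up the discretized Gaussian-location NPMLE over a grid of candidate atoms and runs the classical fixed-point (EM) update on the mixing weights; the function name npmle_em, the grid size, and the iteration count are assumptions for illustration.

    import numpy as np
    from scipy.stats import norm

    def npmle_em(X, grid, iters=500):
        # Discretized NPMLE: maximize (1/n) * sum_i log( sum_j w_j * phi(X_i - grid_j) )
        # over mixing weights w in the simplex; solved here with the classical
        # EM / fixed-point update as a simple baseline.
        L = norm.pdf(X[:, None] - grid[None, :])    # n x m likelihood matrix
        n, m = L.shape
        w = np.full(m, 1.0 / m)
        for _ in range(iters):
            dens = L @ w                             # mixture density at each observation
            w *= (L / dens[:, None]).mean(axis=0)    # w_j <- w_j * (1/n) sum_i L_ij / dens_i
        return w

    # Toy usage: observations from a two-component Gaussian location mixture
    rng = np.random.default_rng(0)
    X = np.concatenate([rng.normal(-2, 1, 300), rng.normal(2, 1, 200)])
    grid = np.linspace(X.min(), X.max(), 100)
    w = npmle_em(X, grid)

The likelihood matrix L above has one column per grid point, which is where the talk's observation, that the generalized Hessian cost can be made independent of the number of grid points, becomes relevant at scale.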