Abstract : Non-convex optimization has become increasingly important for modern data science applications, with many challenging open problems remaining unsolved. Due to the absence of convexity, the well-developed paradigms of convex optimization cannot be fully extended to this setting, and the theory of non-convex optimization consequently lags far behind the practice. In this mini-symposium, we focus on recent advances in non-convex optimization, including the analysis and understanding of the fundamental nature of non-convex optimization, algorithmic bias/implicit regularization of gradient-based algorithms, fast convergent algorithms for non-convex problems, and their applications in inverse problems, imaging, machine learning, and many other areas.
[02446] Convergence rate analysis of a Dykstra-type projection algorithm
Format : Talk at Waseda University
Author(s) :
Xiaozhou Wang (The Hong Kong Polytechnic University)
Ting Kei Pong (The Hong Kong Polytechnic University)
Abstract : We extend Dykstra's projection algorithm to find projections onto the intersection of linear preimages of closed convex sets. The algorithm only makes use of projections onto the latter sets and applications of the linear maps and their adjoints in every iteration. An explicit convergence rate is derived when each set is $C^{1,\alpha}$-cone reducible for some $\alpha\in (0,1]$, under standard relative interior conditions. Concrete examples are constructed to illustrate the necessity of some of our assumptions.
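For orientation, the classical Dykstra iteration for two closed convex sets $C_1, C_2$ (the special case without linear maps; the talk's variant additionally handles linear preimages using only the projections $P_{C_i}$ and the maps and their adjoints) reads, for a point $z$ to be projected, with $y^0 = z$ and $p^0 = q^0 = 0$:
$$
x^{k} = P_{C_1}\big(y^{k-1} + p^{k-1}\big), \qquad p^{k} = y^{k-1} + p^{k-1} - x^{k},
$$
$$
y^{k} = P_{C_2}\big(x^{k} + q^{k-1}\big), \qquad q^{k} = x^{k} + q^{k-1} - y^{k},
$$
and both $x^k$ and $y^k$ converge to $P_{C_1\cap C_2}(z)$ whenever $C_1\cap C_2\neq\emptyset$; the talk derives explicit rates for the extended algorithm under $C^{1,\alpha}$-cone reducibility and relative interior conditions.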
[04336] Critical points of the projection onto the set of low rank tensors
Format : Talk at Waseda University
Author(s) :
Shenglong Hu (Hangzhou Dianzi University)
Abstract : In 2009, SIAM von Neumann Prize winner Yousef Saad posed the open problem of characterizing the convergence rate of the classical alternating polar decomposition method for the low-rank orthogonal tensor approximation problem. This problem was in fact initiated by Gene Golub in 2001 for the rank-one case and has received considerable study over the past twenty years. In 2015, concrete examples were given showing that the convergence rate may be sublinear, linear, or superlinear. In this talk, we show that for a generic tensor the algorithm converges linearly, without any further assumption, by studying the critical points of the projection onto the set of low-rank tensors.
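For reference, in the rank-one special case (the setting of Golub's original question) the alternating iteration maximizes $\langle \mathcal{T}, u\otimes v\otimes w\rangle$ over unit vectors for a given tensor $\mathcal{T}\in\mathbb{R}^{n_1\times n_2\times n_3}$ by cyclically updating
$$
u \leftarrow \frac{\mathcal{T}(\cdot, v, w)}{\|\mathcal{T}(\cdot, v, w)\|},\qquad
v \leftarrow \frac{\mathcal{T}(u, \cdot, w)}{\|\mathcal{T}(u, \cdot, w)\|},\qquad
w \leftarrow \frac{\mathcal{T}(u, v, \cdot)}{\|\mathcal{T}(u, v, \cdot)\|};
$$
in the general low-rank orthogonal setting, each normalization is replaced, roughly speaking, by the orthogonal factor of a polar decomposition of the corresponding matrix contraction, which is the alternating polar decomposition method referred to above.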
[05256] Nonconvex Semi-algebraic Optimization: From Exact to Convergent Conic Program Relaxations
Format : Talk at Waseda University
Author(s) :
Jeya Jeyakumar (UNSW Sydney)
Abstract : Semi-algebraic optimization is the study of optimization problems whose feasible set is defined by polynomial inequalities, called a semi-algebraic set. In addition to the usual tools of nonlinear optimization, such as convex analysis and linear algebra, powerful techniques of real algebraic geometry, such as representation theorems for polynomials, and conic programming methods, such as semi-definite programming and copositive programming, can be employed to study these problems. In this talk, I will describe the key results in this area, highlighting our recent work on the development of exact conic programming relaxations and convergent hierarchies of conic programming relaxations for classes of semi-algebraic optimization problems.
[04681] Convergence theory for mean-field optimization methods
Format : Talk at Waseda University
Author(s) :
Atsushi Nitanda (Kyushu Institute of Technology)
Denny Wu (University of Toronto)
Taiji Suzuki (The University of Tokyo)
Abstract : The optimization of mean-field models has recently attracted attention due to its connection with the training of two-layer neural networks in the mean-field regime. To analyze the optimization dynamics of mean-field models, we have established a convex-analytic theory and derived several optimization methods. In this talk, we present recent advances in the theory of mean-field optimization methods.
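To fix ideas, a typical mean-field objective for a two-layer network, written here as an illustrative instance only (notation is not taken from the talk), is
$$
\min_{\mu\in\mathcal{P}(\mathbb{R}^p)}\; \mathbb{E}_{(x,y)}\Big[\ell\Big(\int h(\theta, x)\,\mathrm{d}\mu(\theta),\; y\Big)\Big] \;+\; \lambda\,\mathrm{Ent}(\mu),
$$
where the network output is the average of the neurons $h(\theta,\cdot)$ over the parameter distribution $\mu$ and $\mathrm{Ent}$ is a (negative) entropy regularizer; since $\mu\mapsto\int h\,\mathrm{d}\mu$ is linear, the objective is convex in $\mu$ whenever $\ell$ is convex in its first argument, which is the structure exploited by the convex-analytic theory mentioned above.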
[03322] Convergence Theorem for Deep Neural Network with ReLU Activation
Format : Talk at Waseda University
Author(s) :
Gue Myung Lee (Department of Applied Mathematics, Pukyong National University)
Jae Hyoung Lee (Department of Applied Mathematics, Pukyong National University)
Kwang Baik Lee (Department of Applied Mathematics, Pukyong National University)
Abstract : A deep neural network (deep learning model) is an artificial neural network consisting of an input layer, an output layer, and many hidden layers between them. The gradient descent algorithm is the key algorithm of the learning system for a deep neural network. Thus, to understand the learning system, we need to understand the convergence of the descent algorithm for the cost function of an $L$-layer neural network. The cost function is a composition of the weight variables, bias variables, and activation functions. In this talk, we consider a convergence theorem for the descent algorithm applied to the cost function of an $L$-layer neural network with the ReLU function $\sigma(z) = \max\{0, z\}$ as its activation function. First, we show that the cost function is semi-algebraic and locally Lipschitz. Secondly, we show that the sequence generated by the descent algorithm with the Clarke generalized gradient globally converges to a stationary point (critical point) under certain assumptions. Thirdly, we prove that the set of critical values of the cost function is finite.
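As a minimal illustration (a generic squared-loss form assumed here, not necessarily the authors' exact setup), the cost of an $L$-layer ReLU network on data $\{(x_i, y_i)\}_{i=1}^N$ can be written as
$$
F(W_1, b_1,\ldots, W_L, b_L) = \frac{1}{N}\sum_{i=1}^{N}\Big\| W_L\,\sigma\big(W_{L-1}\cdots\sigma(W_1 x_i + b_1)\cdots + b_{L-1}\big) + b_L - y_i \Big\|^2,
$$
with $\sigma(z)=\max\{0,z\}$ applied entrywise; a descent scheme of the indicated kind then takes the form $\theta^{k+1} = \theta^k - \gamma_k g^k$, where $\theta^k = (W_1^k, b_1^k,\ldots, W_L^k, b_L^k)$ and $g^k$ is a Clarke generalized gradient of $F$ at $\theta^k$.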
[05068] Theoretical and practical applications of signomial rings to polynomial optimization
Format : Talk at Waseda University
Author(s) :
Mareike Dressler (UNSW Sydney)
Abstract : Signomials generalize polynomials by allowing arbitrary real exponents, at the expense of restricting the resulting function to the positive orthant. In this talk, I present a signomial Positivstellensatz based on conditional "sums of arithmetic-geometric exponentials" (SAGE). The Positivstellensatz applies to compact sets which need not be convex or even basic semi-algebraic. In the first part of the talk, I explain how this result is derived through the newly-defined concept of signomial rings. Then I show how the same concept leads to a novel convex relaxation hierarchy of lower bounds for signomial optimization. These relaxations (which are based on relative entropy programming) can be solved more reliably than those arising from earlier SAGE-based Positivstellensätze. Moreover, this increase in reliability comes at no apparent cost of longer solver runtimes or worse bounds. Numerical examples are provided to illustrate the performance of the hierarchy on a problem in chemical reaction networks.
This talk is based on joint work with Riley Murray.
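For background (recalled from earlier SAGE literature; notation may differ from the talk), a signomial is $f(x)=\sum_{i=1}^{m} c_i\exp(\alpha_i\cdot x)$ with coefficients $c_i\in\mathbb{R}$ and arbitrary exponents $\alpha_i\in\mathbb{R}^n$. An AGE signomial, one whose coefficients are nonnegative except possibly the $k$-th, is globally nonnegative if and only if there exists $\nu\ge 0$ with
$$
\sum_{i\neq k}\nu_i\,(\alpha_i-\alpha_k)=0
\qquad\text{and}\qquad
\sum_{i\neq k}\nu_i\log\frac{\nu_i}{e\,c_i}\;\le\; c_k,
$$
a relative entropy condition; SAGE certificates write a signomial as a sum of such AGE signomials, which is why the relaxations built from them are relative entropy programs.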
[01258] Global convergence of the gradient method for functions definable in o-minimal structures
Format : Online Talk on Zoom
Author(s) :
Cedric Josz (Columbia University)
Abstract : We consider the gradient method with variable step size for minimizing functions that are definable in o-minimal structures on the real field and differentiable with locally Lipschitz gradients. A sufficient condition ensuring global convergence is discussed, with implications for the convergence rate and convergence to a local minimum. Applications include principal component analysis, matrix sensing, and linear neural networks.
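Schematically, the method in question is the iteration
$$
x_{k+1} = x_k - \alpha_k\,\nabla f(x_k),\qquad \alpha_k > 0,
$$
where $f$ is definable in an o-minimal structure on the real field and differentiable with locally Lipschitz gradient; the specific condition on the variable step sizes $\alpha_k$ that guarantees convergence of the whole sequence is the subject of the talk.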
[04380] Proximal methods for nonsmooth and nonconvex fractional programs: when sparse optimization meets fractional programs
Format : Talk at Waseda University
Author(s) :
Guoyin Li (University of New South Wales)
Radu Ioan Bot (University of Vienna)
Minh Dao (Royal Melbourne Institute of Technology)
Abstract : Nonsmooth and nonconvex fractional programs are ubiquitous and also highly challenging. This class includes the composite optimization problems studied extensively in recent years, and encompasses many important modern optimization problems arising from diverse areas, such as the recently proposed scale-invariant sparse signal reconstruction problem in signal processing, robust Sharpe ratio optimization problems in finance, and the sparse generalized eigenvalue problem in discriminant analysis. In this talk, we will introduce extrapolated proximal methods for solving nonsmooth and nonconvex fractional programs and analyse their convergence behaviour. Interestingly, we will show that the proposed algorithm exhibits linear convergence for the sparse generalized eigenvalue problem with either cardinality regularization or sparsity constraints. This is achieved by identifying the explicit desingularization function of the Kurdyka-Lojasiewicz inequality for the merit function of the fractional optimization models. Finally, if time permits, we will present some preliminary encouraging numerical results for the proposed methods on sparse signal reconstruction and sparse Fisher discriminant analysis.
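To make the problem class concrete, the fractional template and a representative instance (the sparse generalized eigenvalue problem, written here in one common form whose details may differ from the talk) are
$$
\min_{x\in C}\;\frac{f(x)}{g(x)},
\qquad\text{e.g.}\qquad
\min_{x\neq 0}\;\frac{x^\top A x}{x^\top B x}+\lambda\,\|x\|_0
\quad\text{or}\quad
\min_{x\neq 0,\;\|x\|_0\le s}\;\frac{x^\top A x}{x^\top B x},
$$
where $B\succ 0$, $g>0$ on the constraint set $C$, and sparsity enters either as a cardinality regularizer or as a constraint, matching the two settings for which linear convergence is established.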
[03321] Error bounds based on facial residual functions
Format : Talk at Waseda University
Author(s) :
Bruno Lourenço (Institute of Statistical Mathematics)
Scott B. Lindstrom (Curtin University)
Ting Kei Pong (The Hong Kong Polytechnic University)
Abstract : In this talk, we overview some recent error bound results obtained under the framework of facial residual functions. This includes new tight error bounds for optimization problems with constraints involving exponential cones, p-cones, and others. Time permitting, we will briefly illustrate the applications of such results in proving precise convergence rates of certain algorithms and in determining automorphism groups of cones.
[02798] Doubly majorized algorithm for sparsity-inducing optimization problems with regularizer-compatible constraints
Format : Talk at Waseda University
Author(s) :
Tianxiang Liu (Tokyo Institute of Technology )
Ting Kei Pong (The Hong Kong Polytechnic University)
Akiko Takeda (The University of Tokyo)
Abstract : We consider a class of sparsity-inducing optimization problems whose constraint set is regularizer-compatible. By exploiting absolute-value symmetry and other properties of the regularizer, we propose a new algorithm, called the Doubly Majorized Algorithm (DMA). Without invoking any commonly used constraint qualification conditions, we show that every cluster point of the sequence generated by DMA is a new type of stationary point inspired by the notion of L-stationarity. Finally, the numerical performance of DMA on variants of ordered LASSO is also illustrated.
[04271] Differentiating Nonsmooth Solutions to Parametric Monotone Inclusion Problems
Format : Talk at Waseda University
Author(s) :
Antonio José Silveti-Falls (Université Paris-Saclay, CentraleSupélec, INRIA OPIS, France)
Edouard Pauwels (Université Toulouse 3 Paul Sabatier)
Jérôme Bolte (Université Toulouse Capitole, Toulouse School of Economics)
Abstract : Understanding the differentiability and regularity of the solution to a monotone inclusion problem is an important question with consequences for convex optimization, machine learning, signal processing, and beyond. Past attempts have been made either under very restrictive assumptions that ensure the solution is continuously differentiable or using mathematical tools that are incompatible with automatic differentiation. In this talk, we discuss how to leverage path differentiability and a recent result on nonsmooth implicit differentiation calculus to give sufficient conditions ensuring that the solution to a monotone inclusion problem will be path differentiable and provide formulas for computing its generalized gradient. Our approach is fully compatible with automatic differentiation and comes with assumptions which are easy to check, roughly speaking: semialgebraicity and strong monotonicity. We illustrate the scope of our results by considering three fundamental composite problem settings: strongly convex problems, dual solutions to convex minimization problems and primal-dual solutions to min-max problems.
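For intuition, in the smooth and strongly monotone special case the question reduces to classical implicit differentiation: if $x(\theta)$ is the unique solution of $F(x,\theta)=0$ with $F(\cdot,\theta)$ smooth and strongly monotone, then
$$
\partial_x F\big(x(\theta),\theta\big)\,\mathrm{D}x(\theta)+\partial_\theta F\big(x(\theta),\theta\big)=0,
\qquad
\mathrm{D}x(\theta)=-\,\partial_x F\big(x(\theta),\theta\big)^{-1}\partial_\theta F\big(x(\theta),\theta\big).
$$
The results described above provide a nonsmooth analogue of such formulas, stated for generalized gradients compatible with automatic differentiation, under semialgebraicity and strong monotonicity.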
[01315] Optimal Neural Network Approximation of Wasserstein Gradient Direction via Convex Optimization
Format : Online Talk on Zoom
Author(s) :
Yifei Zack Wang (Stanford University)
Peng Chen (Georgia Institute of Technology)
Mert Pilanci (Stanford University)
Wuchen Li (University of South Carolina)
Abstract : The computation of the Wasserstein gradient direction is essential for posterior sampling problems and scientific computing. For finite samples, we approximate the score function in the family of two-layer networks with squared-ReLU activations. We derive a semi-definite programming (SDP) relaxation of the variational problem, which can be efficiently solved by standard interior-point methods. Numerical experiments, including PDE-constrained Bayesian inference and parameter estimation in COVID-19 modeling, demonstrate the effectiveness of the proposed method.
[01537] Data-informed deep optimization
Format : Talk at Waseda University
Author(s) :
Lulu Zhang (Shanghai Jiao Tong University)
Abstract : Motivated by the impressive success of deep learning, we explore the application of deep learning to a specific class of optimization problems lacking explicit formulas for both the objective function and the constraints. In this work, we propose a data-informed deep optimization (DiDo) approach emphasizing the adaptive fitting of the feasible region. To demonstrate the effectiveness of our DiDo approach, we consider a practical design case from industry and a 100-dimensional toy example.
Abstract : Towards designing new algorithms that benefit from the best of both first- and second-order optimization methods, we introduce a new dynamical system, called VM-DIN-AVD, at the interface between second-order dynamics with inertia and Newton's method. This system extends the class of inertial Newton-like dynamics by featuring a time-dependent parameter in front of the acceleration, called variable mass. For strongly convex optimization, we provide guarantees on how the Newtonian and inertial behaviors of the system can be non-asymptotically controlled by means of this variable mass. A connection with the Levenberg-Marquardt (or regularized Newton) method is also made. We then show the effect of the variable mass on the asymptotic rate of convergence of the dynamics and, in particular, how it can turn the latter into an accelerated Newton method. We present numerical experiments supporting our findings.
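One plausible written form of such a system, consistent with the description above but with the precise damping and parameter choices left to the talk, attaches a time-dependent mass $\varepsilon(t)>0$ to the acceleration of an inertial Newton-like dynamic:
$$
\varepsilon(t)\,\ddot{x}(t)+\gamma\,\dot{x}(t)+\beta\,\nabla^2 f\big(x(t)\big)\,\dot{x}(t)+\nabla f\big(x(t)\big)=0;
$$
formally, letting $\varepsilon(t)\to 0$ yields $\big(\gamma I+\beta\nabla^2 f(x)\big)\dot{x}=-\nabla f(x)$, a Levenberg-Marquardt (regularized Newton) flow, while a constant mass keeps the system in the inertial regime.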
[01611] Inertial quasi-Newton methods for monotone inclusion
Format : Talk at Waseda University
Author(s) :
Shida Wang (Universität Tübingen)
Jalal Fadili (Normandie Univ, ENSICAEN)
Peter Ochs (Universität Tübingen)
Abstract : We introduce an inertial quasi-Newton Forward-Backward Splitting Algorithm to solve a class of monotone inclusion problems. While the inertial step is computationally cheap, in general, the bottleneck is the evaluation of the resolvent operator. A change of the metric makes its computation hard even for (otherwise in the standard metric) simple operators. In order to fully exploit the advantage of adapting the metric, we develop a new efficient resolvent calculus for a low-rank perturbed standard metric, which accounts exactly for quasi-Newton metrics. Moreover, we prove the convergence of our algorithms, including linear convergence rates in case one of the two considered operators is strongly monotone. Beyond the general monotone inclusion setup, we instantiate a novel inertial quasi-Newton Primal-Dual Hybrid Gradient Method for solving saddle point problems. The favourable performance of our inertial quasi-Newton PDHG method is demonstrated on several numerical experiments in image processing.
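Schematically, in generic variable-metric notation (with step sizes absorbed into the metric, and with the parameter rules and assumptions as specified in the talk), an inertial forward-backward step for the inclusion $0\in A(x)+B(x)$ under a metric $M_k$ reads
$$
y^k = x^k+\alpha_k\,(x^k-x^{k-1}),
\qquad
x^{k+1} = \big(M_k+A\big)^{-1}\big(M_k\,y^k-B(y^k)\big),
$$
so the bottleneck is the resolvent $(M_k+A)^{-1}$; when $M_k$ is a low-rank perturbation of a standard metric, as in quasi-Newton updates, the resolvent calculus developed in the talk reduces this evaluation, roughly speaking, to the resolvent of $A$ in the standard metric plus a low-dimensional correction.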
[05250] Extrapolated Proximal Algorithms for Nonconvex and Nonsmooth Min-max problems
Format : Talk at Waseda University
Author(s) :
Peter Wu (UNSW)
Guoyin Li (The University of New South Wales)
Minh Dao (RMIT)
Abstract : In this talk, we consider an extrapolated proximal algorithm for solving nonsmooth and nonconvex min-max problems. We establish convergence of the full sequence to a stationary point under some mild assumptions. If time permits, numerical experiments on its application to multi-domain robust sparse learning will be presented.
[05578] Global stability of first-order methods for coercive tame functions
Format : Talk at Waseda University
Author(s) :
Cédric Josz (Columbia University)
Lexiao Lai (Columbia University)
Abstract : We consider first-order methods with constant step size for minimizing locally Lipschitz coercive functions that are tame in an o-minimal structure on the real field. We prove that if the method is approximated by subgradient trajectories, then the iterates eventually remain in a neighborhood of a connected component of the set of critical points. Under suitable model-dependent regularity assumptions, this result applies to the random reshuffling and momentum methods and to the random-permutations cyclic coordinate descent method.