Registered Data

[00893] Higher Order-type Optimization Methods for Machine Learning

  • Session Time & Room :
    • 00893 (1/3) : 1E (Aug.21, 17:40-19:20) @D501
    • 00893 (2/3) : 2C (Aug.22, 13:20-15:00) @D501
    • 00893 (3/3) : 2D (Aug.22, 15:30-17:10) @D501
  • Type : Proposal of Minisymposium
  • Abstract : Higher order optimization methods are popular and powerful tools to accelerate, robustify, and enhance the performance of first order algorithms. Despite the general prevalence of first order schemes, deterministic and stochastic higher order-type methods have recently gained increasing attention and have been successfully utilized to solve challenging large-scale learning tasks, reinforcement learning problems, and other big data applications. The purpose of this minisymposium is to highlight recent advances and to discuss novel techniques and strategies in the development and analysis of deterministic and stochastic higher order-type methods for large-scale minimization problems and machine learning applications.
  • Organizer(s) : Andre Milzarek, Zaiwen Wen
  • Classification : 90C06, 90C30
  • Minisymposium Program :
    • 00893 (1/3) : 1E @D501 [Chair: Andre Milzarek]
      • [03305] An efficient skipping BFGS algorithm with nice Hessian correction properties
        • Author(s) :
          • Yu-Hong Dai (AMSS)
        • Abstract : An efficient skipping BFGS algorithm is designed with nice Hessian correction properties. We analyze the skipping method from the perspective of improving quasi-Newton equations by operators and derive stronger quadratic termination properties. Global convergence and superlinear convergence results are established under classical assumptions. Numerical experiments illustrate the efficiency of the proposed method compared with the standard BFGS method.
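        A minimal sketch of the skipping idea in Python (the paper's actual correction rule and skip criterion are not reproduced here; the classical curvature test s^T y >= eps * ||s||^2 is assumed purely for illustration):

        ```python
        import numpy as np

        def bfgs_update(H, s, y, eps=1e-8):
            """One inverse-Hessian BFGS update with a skipping rule: if the
            curvature along the step is too weak, the update is skipped and
            the previous approximation is kept."""
            sy = s @ y
            if sy < eps * (s @ s):  # assumed skip criterion, for illustration only
                return H
            rho = 1.0 / sy
            I = np.eye(len(s))
            V = I - rho * np.outer(s, y)
            return V @ H @ V.T + rho * np.outer(s, s)
        ```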
      • [02797] An Overview of Stochastic Quasi-Newton Methods for Large-Scale Machine Learning
        • Author(s) :
          • Tiande Guo (University of Chinese Academy of Sciences)
          • Yan Liu (Nankai University)
          • Congying Han (University of Chinese Academy of Sciences)
        • Abstract : Numerous intriguing optimization problems arise from the advancement of machine learning. Second-order algorithms have clear advantages in dealing with highly nonlinear and ill-conditioned problems. This paper provides a review of recent developments in stochastic variants of quasi-Newton methods, which construct Hessian approximations using only gradient information. We concentrate on BFGS-based methods in stochastic settings and highlight the algorithmic improvements that enable these algorithms to work in various scenarios. Future research on stochastic quasi-Newton methods should focus on enhancing their applicability, lowering the computational and storage costs, and improving the convergence rate.
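        Most of the surveyed BFGS-based stochastic methods share the L-BFGS two-loop recursion as their computational core, applied to mini-batch gradients; a generic sketch is given below (the specific curvature-pair constructions of the individual variants are not reproduced):

        ```python
        import numpy as np

        def two_loop_direction(grad, s_list, y_list):
            """L-BFGS two-loop recursion: returns -H_k @ grad built from the
            stored curvature pairs (s_i, y_i). In the stochastic variants,
            grad is a mini-batch gradient and the pairs may come from
            averaged iterates or subsampled Hessian actions."""
            q = grad.copy()
            alphas = []
            rhos = [1.0 / (s @ y) for s, y in zip(s_list, y_list)]
            for s, y, rho in reversed(list(zip(s_list, y_list, rhos))):
                a = rho * (s @ q)
                alphas.append(a)
                q -= a * y
            if s_list:  # scale the initial Hessian approximation
                gamma = (s_list[-1] @ y_list[-1]) / (y_list[-1] @ y_list[-1])
                q *= gamma
            for (s, y, rho), a in zip(zip(s_list, y_list, rhos), reversed(alphas)):
                b = rho * (y @ q)
                q += (a - b) * s
            return -q
        ```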
      • [02795] Riemannian Natural Gradient Methods
        • Format : Talk at Waseda University
        • Author(s) :
          • Zaiwen Wen (Peking University)
          • Jiang Hu (The Chinese University of Hong Kong)
          • Ruicheng Ao (Peking University)
          • Anthony Man-Cho So (The Chinese University of Hong Kong)
          • Minghan Yang (Peking University)
        • Abstract : In this talk, we present a Riemannian natural gradient method for large-scale optimization problems on Riemannian manifolds. Such problems arise in various machine learning and signal processing applications. The notion of Fisher information matrix is extended from the Euclidean setting. We establish the almost-sure global convergence and local convergence of our proposed method under standard assumptions. Numerical experiments on applications arising from machine learning demonstrate the advantages of the proposed method over state-of-the-art ones.
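        For illustration, a single natural-gradient step on the unit sphere, using a damped empirical Fisher matrix built from per-sample Jacobians; the talk's Riemannian Fisher construction is more general, and the toy manifold, damping, and step size below are assumptions:

        ```python
        import numpy as np

        def rng_step(x, euclid_grad, jac, step=0.1, damping=1e-3):
            """One illustrative Riemannian natural-gradient step on the unit
            sphere. jac stacks per-sample model Jacobians at x, from which a
            damped empirical Fisher F = J^T J / n + damping * I is formed.
            The Fisher-preconditioned gradient is projected onto the tangent
            space and the iterate is retracted back by normalization."""
            n = jac.shape[0]
            F = jac.T @ jac / n + damping * np.eye(len(x))
            d = np.linalg.solve(F, euclid_grad)   # natural-gradient direction
            d = d - (x @ d) * x                   # tangent-space projection (sphere)
            x_new = x - step * d
            return x_new / np.linalg.norm(x_new)  # retraction: normalize
        ```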
      • [04755] Newton-PMR: Newton Subspace Methods with Complexity Guarantees for Non-convex Optimization
        • Format : Talk at Waseda University
        • Author(s) :
          • Yang Liu (University of Oxford)
          • Andre Milzarek (The Chinese University of Hong Kong, Shenzhen)
          • Fred Roosta (The University of Queensland)
        • Abstract : Recently, Newton-MR methods have been introduced for solving nonconvex unconstrained smooth optimization problems. Leveraging the inherent ability of the minimum residual (MINRES) inner solver to detect directions of nonpositive curvature (NPC), Newton-MR variants enjoy a variety of optimal complexity guarantees. However, the application of these methods to modern high-dimensional problems remains challenging. To address this, we present novel variants that incorporate certain dimensionality reduction techniques. In particular, our proposed methods build on recent results showing that preconditioning MINRES with a positive semi-definite but singular preconditioner is in fact equivalent to solving a low-dimensional problem whose dimension corresponds to the nullity of the preconditioning matrix. Utilizing these dimensionality reduction properties of preconditioned MINRES, we present novel variants of Newton-MR, called Newton-PMR, which can be readily applied to high-dimensional problems while achieving desirable complexity guarantees.
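        The dimensionality-reduction mechanism can be pictured with an explicit random sketch; in Newton-PMR the reduction is realized implicitly through a singular positive semi-definite preconditioner inside MINRES, so the explicit Gaussian sketch below is only a stand-in for that mechanism:

        ```python
        import numpy as np

        def subspace_newton_step(grad, hess_vec, dim, sketch_dim, rng):
            """Illustrative dimension-reduced Newton step: a random sketch
            S (dim x sketch_dim) restricts the Newton system to a subspace,
            solving (S^T H S) z = -S^T g and returning d = S z. hess_vec
            only needs Hessian-vector products, never the full Hessian."""
            S = rng.standard_normal((dim, sketch_dim)) / np.sqrt(sketch_dim)
            HS = np.column_stack([hess_vec(S[:, j]) for j in range(sketch_dim)])
            H_red = S.T @ HS  # sketched Hessian (sketch_dim x sketch_dim)
            # lstsq tolerates an indefinite/singular reduced system, in the
            # spirit of a minimum-residual inner solve
            z = np.linalg.lstsq(H_red, -S.T @ grad, rcond=None)[0]
            return S @ z
        ```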
    • 00893 (2/3) : 2C @D501 [Chair: Yu-Hong Dai]
      • [02799] Real-time tool path planning using deep learning for subtractive manufacturing
        • Author(s) :
          • Yi-Fei Feng (University of Chinese Academy of Sciences)
          • Li-Yong Shen (University of Chinese Academy of Sciences)
          • Hong-Yu Ma (University of Chinese Academy of Sciences)
          • Chun-Ming Yuan (Academy of Mathematics and Systems Science, CAS)
          • Xin Jiang (University of Chinese Academy of Sciences)
        • Abstract : Tool path planning is a crucial component of computer-aided design and manufacturing. To generate suitable tool paths, previous methods based on traditional optimization often require a long computational time. To achieve real-time planning, we propose an efficient neural network-based direct tool path generation method; the whole process takes only a few microseconds. As an auxiliary result, a first tool path dataset with confined scallop height is established for training.
      • [02796] NeuroPrim: An Attention-based Model for Solving NP-hard Spanning Tree Problems
        • Author(s) :
          • Yuchen Shi (University of Chinese Academy of Sciences)
          • Congying Han (University of Chinese Academy of Sciences)
          • Tiande Guo (University of Chinese Academy of Sciences)
        • Abstract : We define a Markov Decision Process (MDP) for general combinatorial optimization problems on graphs and propose NeuroPrim, a novel framework that reduces the action and state space using the technique of Prim's algorithm and is trained by REINFORCE to solve various spanning tree problems. We apply it to three difficult problems in Euclidean space, namely the Degree-constrained Minimum Spanning Tree (DCMST) problem, the Minimum Routing Cost Spanning Tree (MRCST) problem, and the Steiner Tree Problem in graphs (STP). Experimental results on literature instances show that our model is able to outperform strong heuristics and obtain small optimality gaps for instances with up to 250 vertices. In addition, we find no significant degradation on problem instances with as many as 1000 vertices, which demonstrates that our model has strong generalization ability.
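        A sketch of the Prim-constrained rollout, with the attention model replaced by a stand-in table of edge logits and the REINFORCE parameter update itself omitted:

        ```python
        import numpy as np

        def prim_rollout(policy_logits, rng):
            """Illustrative Prim-constrained MDP rollout. As in Prim's
            algorithm, the state is the current tree and the action space is
            masked to edges joining the tree to a new vertex; a policy
            (here a fixed logit table, where NeuroPrim would use an
            attention model) samples the next edge."""
            n = policy_logits.shape[0]
            in_tree = {0}
            edges, logp = [], 0.0
            while len(in_tree) < n:
                cand = [(u, v) for u in in_tree for v in range(n) if v not in in_tree]
                logits = np.array([policy_logits[u, v] for u, v in cand])
                p = np.exp(logits - logits.max())
                p /= p.sum()
                k = rng.choice(len(cand), p=p)
                u, v = cand[k]
                edges.append((u, v))
                logp += np.log(p[k])
                in_tree.add(v)
            return edges, logp  # logp feeds the REINFORCE gradient estimator
        ```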
      • [02832] Streaming Algorithms for Maximizing the Difference of Submodular Functions
        • Author(s) :
          • Wenguo Yang
          • Cheng Lu (UCAS)
          • Suixiang Gao (UCAS)
        • Abstract : In this paper, we study the problem of maximizing the Difference of two Submodular (DS) functions in the streaming model, where elements of the ground set arrive one at a time in an arbitrary order. We present one-pass streaming algorithms for both the unconstrained and cardinality-constrained problems. Our analysis shows that the algorithms we propose are able to produce solutions with provable approximation guarantees. To the best of our knowledge, this is the first theoretical guarantee for the DS maximization problem in the streaming model.
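        The single-pass structure can be sketched with a simple threshold rule; the thresholds that yield the paper's approximation guarantees are not reproduced here, so tau below is a placeholder:

        ```python
        def ds_stream(elements, f, g, tau=0.0, k=None):
            """One-pass streaming sketch for maximizing f(S) - g(S) with f
            and g submodular (set functions given as callables on Python
            sets). Each arriving element is kept if its marginal DS gain
            exceeds the threshold tau, subject to an optional cardinality
            budget k."""
            S = set()
            for e in elements:
                if k is not None and len(S) >= k:
                    break
                gain = (f(S | {e}) - f(S)) - (g(S | {e}) - g(S))
                if gain > tau:
                    S.add(e)
            return S
        ```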
      • [03350] Recursive Importance Sketching for Rank Constrained Least Squares
        • Author(s) :
          • Xudong Li (Fudan University)
          • Yuetian Luo (University of Wisconsin-Madison)
          • Anru Zhang (Duke University)
          • Wen Huang (Xiamen University)
        • Abstract : In this talk, we propose a new Recursive Importance Sketching algorithm for rank constrained least squares optimization (RISRO). As its name suggests, the algorithm is based on a new sketching framework, recursive importance sketching. Several existing algorithms in the literature can be reinterpreted under this framework, and RISRO offers clear advantages over them. RISRO is easy to implement and computationally efficient: the core procedure in each iteration is only solving a dimension-reduced least squares problem. In contrast to numerous existing algorithms with a local geometric convergence rate, we establish local quadratic-linear and quadratic rates of convergence for RISRO under some mild conditions. In addition, we discover a deep connection between RISRO and Riemannian manifold optimization on fixed-rank matrices. The effectiveness of RISRO is demonstrated in two applications in machine learning and statistics: low-rank matrix trace regression and phase retrieval. Simulation studies demonstrate the superior numerical performance of RISRO.
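        A deliberately simplified, one-sided version of the dimension-reduced core step for matrix trace regression; full RISRO sketches with both singular subspaces and their orthogonal complements, which is essential for its fast convergence, so the version below only conveys the dimension-reduction idea:

        ```python
        import numpy as np

        def reduced_ls_step(As, y, U):
            """Simplified illustration: restrict the unknown to the current
            column space, X = U @ W with U of size p1 x r, solve the reduced
            least squares problem for W only, then refresh the column space
            from the new iterate."""
            # reduced covariates: <A_i, U W> = <U^T A_i, W>
            Phi = np.stack([(U.T @ A).ravel() for A in As])  # n x (r * p2)
            w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
            W = w.reshape(U.shape[1], As[0].shape[1])
            X = U @ W
            U_new, _, _ = np.linalg.svd(X, full_matrices=False)
            return X, U_new[:, :U.shape[1]]
        ```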
    • 00893 (3/3) : 2D @D501 [Chair: Tiande Guo]
      • [03278] A semismooth Newton stochastic proximal point algorithm with variance reduction
        • Format : Talk at Waseda University
        • Author(s) :
          • Andre Milzarek (The Chinese University of Hong Kong, Shenzhen)
          • Fabian Schaipp (Technical University of Munich)
          • Michael Ulbrich (Technical University of Munich)
        • Abstract : We present an implementable stochastic proximal point (SPP) method for a class of weakly convex, composite optimization problems. The proposed stochastic proximal point algorithm incorporates a variance reduction mechanism and the resulting SPP updates are solved using an inexact semismooth Newton framework. We establish detailed convergence results that take the inexactness of the SPP steps into account. Finally, numerical experiments are shown illustrating that SPP competes favorably with other state-of-the-art methods.
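        A sketch of one variance-reduced SPP step for a smooth toy problem, with the subproblem solved by a plain Newton iteration on its residual; the talk's semismooth Newton framework for weakly convex composite terms is not reproduced here:

        ```python
        import numpy as np

        def spp_vr_step(x_k, i, grads, hesss, x_snap, full_grad, alpha=0.5, tol=1e-10):
            """One variance-reduced stochastic proximal-point step. The SPP
            subproblem x = x_k - alpha * v_i(x), with the SVRG-type estimator
            v_i(x) = grad_i(x) - grad_i(x_snap) + full_grad, is solved by a
            Newton iteration on the subproblem residual F(x)."""
            x = x_k.copy()
            for _ in range(20):
                v = grads[i](x) - grads[i](x_snap) + full_grad
                F = x - x_k + alpha * v                   # subproblem residual
                if np.linalg.norm(F) < tol:
                    break
                J = np.eye(len(x)) + alpha * hesss[i](x)  # residual Jacobian
                x -= np.linalg.solve(J, F)
            return x
        ```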