[02387] Recent Advances on Distributed Optimization
Session Time & Room : 2E (Aug.22, 17:40-19:20) @E804
Type : Proposal of Minisymposium
Abstract : Distributed algorithms have emerged as a key driving force in solving large-scale optimization and machine learning problems, with applications ranging from training deep neural network models over GPU clusters to boosting edge intelligence over networks of cellphones, tablets, and wearables. In this minisymposium, we will review recent advances in distributed optimization, with a focus on state-of-the-art sample and communication complexities, novel communication compression techniques, and new decentralized or federated algorithms. By bringing together leading researchers and practitioners in this field, we hope to identify key challenges and opportunities for advancing distributed optimization.
[03522] Optimal Gradient Tracking for Decentralized Optimization
Format : Talk at Waseda University
Author(s) :
Ming Yan (The Chinese University of Hong Kong, Shenzhen)
Abstract : We focus on solving the decentralized optimization problem over a network. Assuming smoothness and strong convexity, we propose Optimal Gradient Tracking (OGT), which simultaneously achieves the optimal gradient computation complexity and the optimal communication complexity. Its development involves two building blocks that are of independent interest. The first is the decentralized gradient tracking method SSGT, which achieves the optimal gradient computation complexity. The second is a loopless method that achieves performance similar to that of Chebyshev acceleration.
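For background, the following is a minimal sketch of the vanilla decentralized gradient tracking iteration (in the spirit of DIGing), not the OGT or SSGT algorithms proposed in the talk; the mixing matrix W, the local gradient oracles grads, and the step size alpha are illustrative assumptions.

    import numpy as np

    def gradient_tracking(W, grads, x0, alpha=0.1, iters=200):
        # W     : (n, n) doubly stochastic mixing matrix of the network
        # grads : list of n callables; grads[i](x) returns node i's local gradient
        # x0    : (n, d) array of initial local iterates, one row per node
        n, d = x0.shape
        x = x0.copy()
        g = np.stack([grads[i](x[i]) for i in range(n)])  # current local gradients
        y = g.copy()                                      # tracker of the average gradient
        for _ in range(iters):
            x = W @ x - alpha * y                         # mix with neighbors, then descend
            g_new = np.stack([grads[i](x[i]) for i in range(n)])
            y = W @ y + g_new - g                         # gradient-tracking update
            g = g_new
        return x.mean(axis=0)                             # network-average iterate

The tracker y maintains each node's running estimate of the network-wide average gradient, which is what allows exact convergence with a constant step size.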
[03541] Optimal Complexity in Distributed Learning with Communication Compression
Format : Talk at Waseda University
Author(s) :
Yutong He (Peking University)
Xinmeng Huang (University of Pennsylvania)
Wotao Yin (Alibaba (US) Group)
Kun Yuan (Peking University)
Abstract : Recent advances in distributed optimization and learning have shown that communication compression is one of the most effective means of reducing communication overhead. While there have been many results on convergence rates under communication compression, a theoretical lower bound has been missing.
Analyses of algorithms with communication compression attribute convergence to two abstract compressor properties: unbiasedness and contractiveness. In this talk, we consider distributed stochastic algorithms for minimizing smooth convex and nonconvex objective functions under communication compression. We establish convergence lower bounds for algorithms using either unbiased or contractive compressors. To close the gap between these lower bounds and the existing upper bounds, we further propose an algorithm, NEOLITHIC, which nearly attains our lower bounds (up to logarithmic factors) under mild conditions. Experimental results validate our findings.
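For context, below are minimal sketches of the two standard compressor classes the abstract refers to: random-k sparsification as an unbiased compressor and top-k sparsification as a contractive one. These are textbook examples chosen for illustration, not components of NEOLITHIC.

    import numpy as np

    def rand_k(x, k, rng=None):
        # Unbiased compressor: keep k coordinates chosen uniformly at random
        # and rescale by d/k, so that E[rand_k(x)] = x.
        rng = np.random.default_rng() if rng is None else rng
        d = x.size
        idx = rng.choice(d, size=k, replace=False)
        out = np.zeros_like(x)
        out[idx] = (d / k) * x[idx]
        return out

    def top_k(x, k):
        # Contractive (biased) compressor: keep the k largest-magnitude entries;
        # it satisfies ||top_k(x) - x||^2 <= (1 - k/d) * ||x||^2.
        idx = np.argsort(np.abs(x))[-k:]
        out = np.zeros_like(x)
        out[idx] = x[idx]
        return out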
[03556] Asymptotic Network Independence in Distributed Stochastic Gradient Methods
Format : Talk at Waseda University
Author(s) :
Shi Pu (The Chinese University of Hong Kong, Shenzhen)
Abstract : We discuss the so-called asymptotic network independence property in distributed stochastic optimization, which holds whenever a distributed method executed over a network of $n$ nodes asymptotically converges to the optimal solution at a rate comparable to that of a centralized method with the same computational power as the entire network; it is as if the network is not even there! We explain this property through examples involving the training of machine learning models and present a short mathematical analysis. We also discuss the transient times required for distributed stochastic gradient methods to achieve network-independent convergence rates. Finally, we introduce some recent work on distributed random reshuffling (RR) methods.
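As an illustration of the setting, here is a minimal decentralized stochastic gradient (DSGD) sketch, assuming a doubly stochastic mixing matrix W and local stochastic gradient oracles stoch_grads; it shows the kind of method for which the network-independence property is studied, not a particular algorithm from the talk.

    import numpy as np

    def dsgd(W, stoch_grads, x0, alpha=0.05, iters=1000):
        # W           : (n, n) doubly stochastic mixing matrix
        # stoch_grads : list of n callables; stoch_grads[i](x) samples a stochastic gradient at node i
        # x0          : (n, d) initial local iterates, one row per node
        n, d = x0.shape
        x = x0.copy()
        for _ in range(iters):
            g = np.stack([stoch_grads[i](x[i]) for i in range(n)])  # one sample per node
            x = W @ x - alpha * g                                   # gossip step plus local SGD step
        return x.mean(axis=0)  # network-average iterate

After a transient period, the averaged iterate of such a method behaves like centralized SGD that draws n samples per step, which is the property the talk refers to.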
[03570] Unified and Refined Analysis of Decentralized Optimization and Learning Algorithms
Format : Talk at Waseda University
Author(s) :
Sulaiman A Alghunaim (Kuwait University)
Abstract : Decentralized multi-agent optimization is a powerful paradigm with numerous applications in learning and engineering design. In these setups, a network of agents is linked by a graph, and agents are only allowed to share information locally. Through localized interactions, they seek the minimizer of a global optimization problem. In decentralized consensus problems, the agents are coupled through a common consensus variable on which they must agree.
This talk will present a unified and improved analysis for decentralized consensus optimization methods. We demonstrate how the analyses of several state-of-the-art bias-corrected decentralized methods, such as EXTRA, Exact-Diffusion, NIDS, and Gradient-Tracking methods, can be unified under a single decentralized algorithmic framework that encompasses them. We develop a novel analysis technique that establishes the framework's convergence under nonconvex, convex, and strongly convex objectives, and we provide refined and improved convergence rate bounds. The analysis also reveals important characteristics of these methods, such as how their performance is influenced by the network graph.
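For concreteness, the following is a minimal sketch of one member of this family, Exact-Diffusion in its adapt-correct-combine form, assuming a symmetric doubly stochastic mixing matrix W and local gradient oracles grads; it illustrates the bias-correction step, not the unified framework or its analysis.

    import numpy as np

    def exact_diffusion(W, grads, x0, alpha=0.1, iters=200):
        # W     : (n, n) symmetric doubly stochastic mixing matrix
        # grads : list of n callables; grads[i](x) returns node i's local gradient
        # x0    : (n, d) initial local iterates, one row per node
        n, d = x0.shape
        x = x0.copy()
        psi_prev = x.copy()
        W_bar = 0.5 * (W + np.eye(n))        # averaged combination matrix
        for _ in range(iters):
            g = np.stack([grads[i](x[i]) for i in range(n)])
            psi = x - alpha * g              # adapt: local gradient step
            phi = psi + x - psi_prev         # correct: remove the steady-state bias
            x = W_bar @ phi                  # combine: average with neighbors
            psi_prev = psi
        return x

The correction step is what distinguishes this family from plain decentralized gradient descent: it removes the steady-state bias caused by heterogeneous local objectives.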