Abstract : Combinatorial and discrete mathematical problems arise in many real-life applications, and their solution requires designing, analyzing, and implementing discrete algorithms. Research in this area therefore brings together mathematicians, computer scientists, statisticians, domain scientists, and engineers to solve problems of applied and computational combinatorics. The proposed two-part minisymposium covers scientific computing, machine learning, and graph algorithms. The objective is to summarize the latest discrete algorithmic developments and their applications, together with computational studies of those applications. This minisymposium follows similar ones on combinatorial aspects of sparse matrix computations held at ICIAM 2019, ICIAM 2015, and ICIAM 2011.
Abstract : There has been significant interest in graph analytics due to its applications in many domains, including social network and Web analytics, machine learning, biology, and physical simulations. Real-world graphs today are massive and dynamic. Because many real-world graphs change rapidly, it is crucial to design dynamic algorithms that efficiently maintain graph statistics upon updates, since the cost of recomputation from scratch can be prohibitive. Furthermore, due to the high frequency of updates, we can improve performance by using parallelism to process updates in batches. This talk presents new graph algorithms in this parallel batch-dynamic setting.
Specifically, we present the first parallel batch-dynamic algorithm for approximate k-core decomposition that is efficient in both theory and practice. Our algorithm is based on our novel parallel level data structure, inspired by the sequential level data structures of Bhattacharya et al. and Henzinger et al. Given a graph with n vertices and a batch of B updates, our algorithm maintains a (2 + epsilon)-approximation of the coreness values of all vertices (for any constant epsilon > 0) in O(B log^2(n)) amortized work and O(log^2(n) loglog(n)) span (parallel time) with high probability. We implement and experimentally evaluate our algorithm, and demonstrate significant speedups over state-of-the-art serial and parallel implementations for dynamic k-core decomposition.
We have also designed new parallel batch-dynamic algorithms for low out-degree orientation, maximal matching, clique counting, graph coloring, minimum spanning forest, and single-linkage clustering, some of which use our parallel level data structure.
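For reference, coreness values can be computed from scratch with the classic sequential peeling algorithm, which is precisely the recomputation that the batch-dynamic algorithm above avoids on every update. A minimal Python sketch (the adjacency-dict representation and the function name are ours, for illustration):

def coreness(adj):
    """Exact k-core decomposition by peeling.

    adj: dict mapping each vertex to a set of neighbors.
    Repeatedly removes a vertex of minimum remaining degree; the
    coreness of a vertex is the running maximum of the degree it
    has at the moment of its removal.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    core = {}
    remaining = set(adj)
    k = 0
    while remaining:
        # O(n) scan for clarity; a bucket queue gives O(m + n) overall.
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core

On a triangle {a, b, c} with a pendant vertex d attached to c, this returns coreness 2 for a, b, and c, and coreness 1 for d.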
[03109] Approximation: a Paradigm for Designing Parallel Graph Algorithms
Format : Online Talk on Zoom
Author(s) :
Alex Pothen (Purdue University)
S M Ferdous (Pacific Northwest National Lab)
Abstract : We describe a paradigm for designing parallel algorithms by approximation techniques. Instead of solving a problem exactly, for which parallel algorithms may not exist, we seek a solution with provable approximation guarantees. Furthermore, we design these algorithms to be concurrent. We discuss linear and submodular matching and edge cover problems for which such algorithms have been designed, and describe their use in solving problems in sparse matrix computations and load balancing problems in quantum chemistry.
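As a concrete instance of the paradigm: for maximum-weight matching, the greedy algorithm that scans edges in nonincreasing weight order gives a 1/2-approximation, and its locally-dominant-edge variants admit efficient parallel implementations. A minimal sequential sketch (the (weight, u, v) edge format is assumed for illustration):

def greedy_matching(edges):
    """1/2-approximate maximum weight matching.

    edges: iterable of (weight, u, v) tuples.
    Every optimal edge that is skipped shares an endpoint with an
    already-chosen edge of no smaller weight, and each chosen edge
    blocks at most two optimal edges, so the matched weight is at
    least half the optimum.
    """
    matched = set()
    matching = []
    for w, u, v in sorted(edges, key=lambda e: e[0], reverse=True):
        if u not in matched and v not in matched:
            matching.append((u, v, w))
            matched.update((u, v))
    return matching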
[05454] Accelerating AI using Fast and Feasible Matrix Multiplication
Format : Talk at Waseda University
Author(s) :
Oded Schwartz (The Hebrew University)
Abstract : AI requires large computational resources, both for training and for inference, and much of that time, typically between 45% and 95%, is spent on matrix multiplication. Most current math libraries (for CPU and GPU) and all state-of-the-art hardware accelerators (such as Google's TPU and Intel Habana Labs' Gaudi) are based on the cubic-time classic matrix multiplication algorithm, despite more than five decades of research on sub-cubic time algorithms. In this talk I will review several of the challenges in utilizing fast matrix multiplication algorithms, and recent solutions that can enable their faster application in practice.
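For background, the best-known practical sub-cubic method is Strassen's algorithm, which replaces the eight half-size products of the classic recursion with seven, giving O(n^{2.81}) arithmetic. A minimal NumPy sketch for square power-of-two matrices, with the base-case cutoff that practical implementations typically use:

import numpy as np

def strassen(A, B, cutoff=64):
    """Strassen's matrix multiplication: 7 recursive half-size
    products instead of 8, for O(n^2.81) arithmetic overall.
    Falls back to the classic product below the cutoff."""
    n = A.shape[0]
    if n <= cutoff:
        return A @ B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    M1 = strassen(A11 + A22, B11 + B22, cutoff)
    M2 = strassen(A21 + A22, B11, cutoff)
    M3 = strassen(A11, B12 - B22, cutoff)
    M4 = strassen(A22, B21 - B11, cutoff)
    M5 = strassen(A11 + A12, B22, cutoff)
    M6 = strassen(A21 - A11, B11 + B12, cutoff)
    M7 = strassen(A12 - A22, B21 + B22, cutoff)
    C = np.empty((n, n), dtype=M1.dtype)
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C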
[02991] Nested dissection ordering on GPUs
Format : Talk at Waseda University
Author(s) :
Xiaoye Sherry Li (Lawrence Berkeley National Lab)
Abstract : Nested dissection ordering is an important preprocessing step in sparse matrix factorizations: it is effective in reducing the amount of fill in the factored matrices. As of now, there is no good algorithm or software to compute such an ordering on GPUs. We have made progress in this direction. We will present our new algorithms designed for GPUs, including multilevel graph coarsening and finding small separators at each level, and report results on parallel runtime and ordering quality.
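For context, the nested dissection recursion itself is simple to state; the hard part on GPUs is computing good small separators in parallel, which is where the multilevel coarsening mentioned above comes in. A minimal sketch, where find_separator is a hypothetical plug-in routine that splits a vertex set into (left, right, separator):

def nested_dissection(adj, vertices, find_separator):
    """Return an elimination ordering: order the two halves first
    and the separator last, which confines fill in the factors to
    the separator blocks.

    adj: dict mapping each vertex to a set of neighbors.
    find_separator: callable (adj, vertices) -> (left, right, sep);
    the plug-in point for, e.g., a multilevel-coarsening separator.
    """
    if len(vertices) <= 2:  # tiny base case: order directly
        return list(vertices)
    left, right, sep = find_separator(adj, vertices)
    return (nested_dissection(adj, left, find_separator)
            + nested_dissection(adj, right, find_separator)
            + list(sep))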
[03099] On heuristics for the Birkhoff--von Neumann decomposition
Format : Online Talk on Zoom
Author(s) :
Bora Ucar (CNRS and ENS de Lyon)
Jeremy Cohen (CREATIS, CNRS)
Abstract : The Birkhoff--von Neumann decomposition expresses a doubly stochastic matrix as a convex combination of permutation matrices. This talk will be an introduction to this decomposition. We will discuss algorithmic and combinatorial problems associated with this decomposition.
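To make the object concrete: Birkhoff's theorem guarantees that the support of a doubly stochastic matrix always contains a perfect matching, so a natural greedy heuristic repeatedly extracts a permutation from the support and subtracts it with the largest feasible coefficient. A minimal sketch using SciPy's assignment solver (one simple heuristic shown for illustration, not necessarily the ones analyzed in the talk):

import numpy as np
from scipy.optimize import linear_sum_assignment

def birkhoff_greedy(M, tol=1e-12):
    """Greedy Birkhoff--von Neumann decomposition heuristic.

    M: doubly stochastic matrix (rows and columns sum to 1).
    Returns a list of (coefficient, permutation) pairs, with each
    permutation given by its column indices, whose weighted sum
    reconstructs M up to the tolerance.
    """
    M = np.array(M, dtype=float)
    terms = []
    while M.max() > tol:
        # Max-weight assignment on log-entries; the huge finite
        # penalty keeps (near-)zero entries out of the matching.
        W = np.where(M > tol, np.log(np.maximum(M, tol)), -1e18)
        rows, cols = linear_sum_assignment(-W)
        coeff = M[rows, cols].min()
        if coeff <= tol:
            break  # only numerical residue remains
        terms.append((coeff, cols.copy()))
        M[rows, cols] -= coeff
    return terms

Each iteration zeroes at least one entry of M and entries never grow back, so the loop terminates after at most n^2 extractions.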
[03435] Recent Advances in Streaming (Hyper)Graph Partitioning
Format : Online Talk on Zoom
Author(s) :
Christian Schulz (Heidelberg University)
Abstract : Partitioning a (hyper)graph into balanced blocks such that few edges run between blocks is a key problem for large-scale distributed processing.
Currently, there is a gap in the space of available partitioning algorithms. On the one hand, there are streaming algorithms that have been adopted to partition massive graph data on small machines. In the streaming model, vertices arrive one at a time together with their neighborhoods and must be assigned to a block immediately. These algorithms can partition huge graphs quickly with little memory, but they produce partitions of low solution quality. On the other hand, there are offline (shared-memory) multilevel algorithms that produce partitions of high quality but need a machine with enough memory to partition huge networks.
In this talk, we present recent advances in the area of streaming algorithms for the problem. First, we present a buffered streaming approach: this model allows reading more than one node and its neighborhood at a time. This enables our algorithm to leverage multilevel techniques and thus significantly improve solution quality while, surprisingly, also improving the overall complexity of the algorithm.
Second, we present a shared-memory streaming multi-recursive partitioning scheme that performs recursive multi-sections on the fly, without knowing the overall input graph, to compute hierarchical partitionings. If the topology of a distributed system is known, the communication costs can be further optimized by mapping partitions onto processing elements.
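As a baseline for the streaming model described above, a one-pass greedy partitioner assigns each arriving vertex to the block that contains most of its already-placed neighbors, minus a penalty on block size that maintains balance (a Fennel-style objective). A minimal sketch (the parameter defaults are illustrative; the Fennel paper gives principled choices):

def stream_partition(stream, k, alpha=1.0, gamma=1.5):
    """One-pass greedy (Fennel-style) streaming graph partitioner.

    stream: iterable of (vertex, neighbors) pairs in arrival order.
    k: number of blocks. Each vertex is assigned once, on arrival,
    and the decision is never revisited; alpha and gamma control
    the balance penalty, trading cut quality against balance.
    """
    block_of = {}        # vertex -> block id
    size = [0] * k       # current block sizes
    for v, neighbors in stream:
        def score(i):
            # Already-placed neighbors in block i, minus the marginal
            # balance penalty for growing block i by one vertex.
            gain = sum(1 for u in neighbors if block_of.get(u) == i)
            penalty = alpha * ((size[i] + 1) ** gamma - size[i] ** gamma)
            return gain - penalty
        best = max(range(k), key=score)
        block_of[v] = best
        size[best] += 1
    return block_of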
[04746] Efficient Data Structures for Sparse Dynamic Graphs
Format : Talk at Waseda University
Author(s) :
Helen Xu (Lawrence Berkeley National Laboratory)
Abstract : Sparse graphs underlie many key applications such as social network analysis and scientific computing. This talk discusses dynamic graph data structures and how to choose structures that optimize for locality, which is key to performance in graph computations. The focus is on how different use cases, which give rise to different data and access patterns, can influence the design of these structures, and how to exploit those differences to maximize performance.
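One simple way to see the locality trade-off: pointer-based adjacency lists make updates cheap but scatter neighbors across memory, whereas keeping each vertex's neighbors in a single sorted, contiguous array makes scans and intersections cache-friendly at the cost of shifting elements on insertion. A minimal sketch of the latter design point (shown for illustration; not the specific data structures covered in the talk):

from bisect import bisect_left

class SortedAdjacency:
    """Dynamic undirected graph with each neighbor list stored in a
    contiguous sorted array: inserts and deletes cost O(deg) due to
    shifting, but scans and sorted-list intersections (e.g., for
    triangle counting) enjoy good spatial locality."""

    def __init__(self):
        self.nbrs = {}  # vertex -> sorted list of neighbors

    def add_edge(self, u, v):
        for a, b in ((u, v), (v, u)):
            lst = self.nbrs.setdefault(a, [])
            i = bisect_left(lst, b)
            if i == len(lst) or lst[i] != b:  # skip duplicate edges
                lst.insert(i, b)

    def remove_edge(self, u, v):
        for a, b in ((u, v), (v, u)):
            lst = self.nbrs.get(a, [])
            i = bisect_left(lst, b)
            if i < len(lst) and lst[i] == b:
                lst.pop(i)

    def neighbors(self, u):
        return self.nbrs.get(u, [])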
[04669] Communication Efficient Stratified SGD on Distributed-Memory Systems
Format : Online Talk on Zoom
Author(s) :
Nabil Abubaker (Bilkent University)
Ozan Karsavuran (Lawrence Berkeley National Lab)
Cevdet Aykanat (Bilkent University)
Abstract : Stratified Stochastic Gradient Descent (SSGD) enables stale-free low-rank approximation of sparse matrices on distributed-memory systems. We present a framework to efficiently scale SSGD on HPC systems. The framework leverages point-to-point communication to reduce the bandwidth overhead, and a novel combinatorial algorithm to reduce the upper bound on the number of messages from O(K^2) to O(K log K), where K is the number of processors. Our experiments show an order-of-magnitude performance gap in favor of our framework-enabled SSGD compared to the state of the art.
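For intuition on the message-count reduction: rather than having each of the K processors message all K - 1 others directly, partial results can be combined and forwarded over log K rounds of pairwise exchanges along a hypercube, for K log K total (larger) messages. The schedule below illustrates this classic store-and-forward pattern; the paper's combinatorial algorithm may differ:

def hypercube_schedule(K):
    """Rounds of pairwise exchanges realizing all-to-all delivery
    with K * log2(K) messages instead of K * (K - 1).

    K must be a power of two. In round r (r = 1, 2, 4, ...), each
    processor p exchanges one combined message with partner p XOR r;
    after log2(K) rounds, every processor's data has reached every
    other processor.
    """
    assert K > 0 and K & (K - 1) == 0, "K must be a power of two"
    rounds, r = [], 1
    while r < K:
        rounds.append([(p, p ^ r) for p in range(K) if p < p ^ r])
        r <<= 1
    return rounds

For example, hypercube_schedule(8) yields three rounds of four disjoint pairs, i.e., 8 * log2(8) = 24 point-to-point messages in total.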