Registered Data

[01060] Exploring Arithmetic and Data Representation Beyond the Standard in HPC

  • Session Time & Room :
    • 01060 (1/3) : 1C (Aug.21, 13:20-15:00) @E803
    • 01060 (2/3) : 1D (Aug.21, 15:30-17:10) @E803
    • 01060 (3/3) : 1E (Aug.21, 17:40-19:20) @E803
  • Type : Proposal of Minisymposium
  • Abstract : This mini-symposium explores the potential of using arithmetic operations and data representations other than FP32 (single precision) and FP64 (double precision) in HPC numerical computations. Such approaches include not only higher or lower floating-point precision but also integer representations, error handling, rounding, and more, and they target not only performance but also the quality and reliability of computations. We explore this challenge from various angles, from low-level implementations to applications.
  • Organizer(s) : Daichi Mukunoki, Naohito Nakasato, Tomonori Kouya
  • Classification : 68M99, 65Y04, 65Y05
  • Minisymposium Program :
    • 01060 (1/3) : 1C @E803 [Chair: Daichi Mukunoki]
      • [01783] FP-ANR: A representation format to handle floating-point cancellation at runtime
        • Format : Talk at Waseda University
        • Author(s) :
          • David DEFOUR (Universite de Perpignan)
        • Abstract : When dealing with floating-point numbers, there are several sources of error that can drastically reduce the numerical quality of computed results. Among those errors, the loss of significance, or cancellation, occurs for example during the subtraction of two nearly equal numbers. In this talk, we propose a representation format named Floating-Point Adaptive Noise Reduction (FP-ANR). This format embeds cancellation information directly into the floating-point representation thanks to a dedicated pattern. With this format, insignificant trailing bits lost during cancellation are removed from every manipulated floating-point number. The immediate consequence is that it increases the numerical confidence of computed values. The proposed representation format corresponds to a simple and efficient implementation of significance arithmetic, based on and compatible with the IEEE-754 standard.
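        • Code sketch : A minimal C++ illustration of the cancellation problem that FP-ANR targets, using plain IEEE-754 doubles (the FP-ANR format itself is not shown):

        ```cpp
        #include <cmath>
        #include <cstdio>

        int main() {
            // Subtracting two nearly equal numbers cancels the leading bits:
            // the result still occupies a full IEEE-754 double, but most of
            // its trailing bits carry no information (the "insignificant"
            // bits FP-ANR is designed to flag).
            double a = 1.0 + 1e-15;   // rounded to the nearest FP64 value
            double b = 1.0;
            double diff = a - b;      // exact mathematical answer: 1e-15
            std::printf("computed  : %.17e\n", diff);
            std::printf("expected  : %.17e\n", 1e-15);
            std::printf("rel. error: %.2e\n", std::fabs(diff - 1e-15) / 1e-15);
            // About 50 of the 52 fraction bits were cancelled; FP-ANR encodes
            // that loss in the representation instead of keeping stale bits.
            return 0;
        }
        ```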
      • [01812] Precision autotuning using stochastic arithmetic
        • Format : Talk at Waseda University
        • Author(s) :
          • Quentin Ferro (Sorbonne University)
          • Stef Graillat (Sorbonne University)
          • Thibault Hilaire (Sorbonne University)
          • Fabienne Jezequel (Sorbonne University)
        • Abstract : We present PROMISE, a tool that makes it possible to provide a mixed precision version of a program by taking into account the requested accuracy on the computed results. With PROMISE the numerical quality of results is verified using Discrete Stochastic Arithmetic that enables one to estimate round-off errors. PROMISE has been used for floating point auto-tuning on neural networks to lower their precision while keeping an accurate output.
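        • Code sketch : A toy imitation of the Discrete Stochastic Arithmetic (CESTAC) idea behind PROMISE's verification step; the noisy() helper below is hypothetical and not part of the CADNA/PROMISE API:

        ```cpp
        #include <cmath>
        #include <cstdio>
        #include <random>

        // Run the same computation several times with the rounding randomly
        // perturbed, then estimate how many digits the samples agree on.
        static std::mt19937 gen(42);

        double noisy(double x) {
            std::uniform_real_distribution<double> u(-1.0, 1.0);
            return x * (1.0 + u(gen) * 0x1p-52);   // ~1 ulp random noise
        }

        int main() {
            const int N = 8;
            double s[N], mean = 0.0;
            for (int i = 0; i < N; ++i) {
                double a = noisy(1.0 + 1e-12);     // ill-conditioned step:
                s[i] = noisy(a - 1.0);             // massive cancellation
                mean += s[i];
            }
            mean /= N;
            double sd = 0.0;
            for (int i = 0; i < N; ++i) sd += (s[i] - mean) * (s[i] - mean);
            sd = std::sqrt(sd / (N - 1));
            // Common significant decimal digits across runs (CESTAC-style):
            std::printf("mean = %.6e, ~%.1f significant digits\n",
                        mean, std::log10(std::fabs(mean) / sd));
        }
        ```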
      • [01839] Implementation of highly optimized multiple precision BLAS: Strassen vs. Ozaki scheme
        • Format : Talk at Waseda University
        • Author(s) :
          • Tomonori Kouya (Shizuoka Institute of Science and Technology)
        • Abstract : We have developed a highly optimized, extended multiple-precision basic linear algebra subprograms (BLAS) library using technologies such as AVX2 and OpenMP. In particular, the Strassen algorithm and the Ozaki scheme that we employ are distinctive methods for accelerating matrix multiplication. In this talk, we describe the software structure of our library and highlight the advantages and disadvantages of the Strassen algorithm and the Ozaki scheme through benchmark tests using fixed- and arbitrary-precision floating-point arithmetic.
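        • Code sketch : A simplified, one-level version of the error-free splitting that underlies the Ozaki scheme, shown for a dot product; the full scheme uses several slices and maps each partial product onto an ordinary high-performance GEMM:

        ```cpp
        #include <cmath>
        #include <cstdio>
        #include <vector>

        // Split each v[i] = hi[i] + lo[i] so the high parts keep few enough
        // leading bits that high*high products, summed over n terms, stay
        // exactly representable in FP64.
        static void split(const std::vector<double>& v, int n,
                          std::vector<double>& hi, std::vector<double>& lo) {
            double mu = 0.0;
            for (double t : v) mu = std::fmax(mu, std::fabs(t));
            int e; std::frexp(mu, &e);                     // mu < 2^e
            int tau = (int)std::ceil((53.0 + std::log2((double)n)) / 2.0);
            const double sigma = std::ldexp(1.0, e + tau); // split constant
            hi.resize(n); lo.resize(n);
            for (int i = 0; i < n; ++i) {
                volatile double s = v[i] + sigma;          // Dekker-style
                hi[i] = s - sigma;                         // top bits
                lo[i] = v[i] - hi[i];                      // exact remainder
            }
        }

        int main() {
            const int n = 4;
            std::vector<double> x = {1.0, 1e-8, 3.14159, -2.5};
            std::vector<double> y = {2.0, 4e7, -1.0, 1e-3};
            std::vector<double> xh, xl, yh, yl;
            split(x, n, xh, xl);
            split(y, n, yh, yl);
            // x.y = xh.yh + xh.yl + xl.yh + xl.yl; xh.yh is error-free
            double d[4] = {0, 0, 0, 0};
            for (int i = 0; i < n; ++i) {
                d[0] += xh[i] * yh[i];  d[1] += xh[i] * yl[i];
                d[2] += xl[i] * yh[i];  d[3] += xl[i] * yl[i];
            }
            std::printf("dot = %.17g\n", ((d[3] + d[2]) + d[1]) + d[0]);
        }
        ```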
      • [03869] Accelerating 128-bit Matrix Multiplication for Applications using FPGAs
        • Format : Talk at Waseda University
        • Author(s) :
          • Fumiya Kono (Shizuoka Institute of Science and Technology)
        • Abstract : General Matrix Multiplication (GEMM) is at the core of various scientific applications. Since the number of bits required to represent floating-point numbers depends on the individual application, the precision of GEMM is critical. In particular, semidefinite programming requires higher precision, such as binary128. We have investigated methods for accelerating GEMM in binary128 using FPGAs, because binary128 arithmetic is very slow without hardware unit support. This talk presents an evaluation of our designs on several Intel FPGAs.
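        • Code sketch : A naive software-emulated binary128 GEMM baseline of the kind that motivates FPGA acceleration, assuming GCC's __float128 extension and libquadmath (compile with -lquadmath):

        ```cpp
        #include <quadmath.h>
        #include <cstdio>
        #include <vector>

        int main() {
            const int n = 64;
            // binary128 values: every + and * below is a software library
            // call, typically orders of magnitude slower than hardware FP64.
            std::vector<__float128> A(n * n, 1.0Q / 3.0Q),
                                    B(n * n, 3.0Q), C(n * n, 0.0Q);
            for (int i = 0; i < n; ++i)           // naive triple loop
                for (int k = 0; k < n; ++k) {
                    __float128 aik = A[i * n + k];
                    for (int j = 0; j < n; ++j)
                        C[i * n + j] += aik * B[k * n + j];
                }
            char buf[64];
            quadmath_snprintf(buf, sizeof buf, "%.33Qg", C[0]);
            std::printf("C[0][0] = %s\n", buf);   // expect ~ n = 64
        }
        ```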
    • 01060 (2/3) : 1D @E803 [Chair: Naohito Nakasato]
      • [04191] Multiple Integer Divisions with an Invariant Dividend
        • Format : Talk at Waseda University
        • Author(s) :
          • Daisuke Takahashi (University of Tsukuba)
        • Abstract : In this talk, we propose an algorithm for multiple integer divisions with an invariant dividend and monotonically increasing or decreasing divisors. We show that if the dividend and divisors satisfy a certain condition, then once a single quotient has been calculated by actual division, each remaining quotient can be obtained by correcting the previously calculated quotient at most once.
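        • Code sketch : A toy illustration of the correction idea, assuming monotonically increasing divisors with d >= sqrt(N) so that each quotient drops by at most one per step; the talk's precise applicability condition may differ:

        ```cpp
        #include <cstdint>
        #include <cstdio>

        int main() {
            const uint64_t N = 1000000ULL;   // invariant dividend
            uint64_t d = 1000;               // first divisor, d*d >= N
            uint64_t q = N / d;              // the single true division
            for (int i = 0; i < 8; ++i) {
                ++d;                         // next (larger) divisor
                // quotients are non-increasing; start from the previous q
                while (q * d > N) --q;       // correction: under the stated
                                             // condition this runs at most once
                std::printf("N / %llu = %llu (check: %llu)\n",
                            (unsigned long long)d, (unsigned long long)q,
                            (unsigned long long)(N / d));
            }
        }
        ```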
      • [04628] Reduced-Precision Data Representation on Sparse Matrix-Vector Multiplications
        • Format : Talk at Waseda University
        • Author(s) :
          • Daichi Mukunoki (RIKEN Center for Computational Science)
          • Masatoshi Kawai (Nagoya University)
          • Toshiyuki Imamura (RIKEN Center for Computational Science)
        • Abstract : In sparse iterative solvers, data and arithmetic precision affect convergence and solution accuracy, and their precision can therefore be optimized. There is thus demand for the central kernel, sparse matrix-vector multiplication (SpMV), in different precisions. Owing to its memory-intensive nature, SpMV can benefit from low-precision floating-point formats to improve performance. In this talk, we demonstrate the performance of our SpMV implementations on CPU and GPU, which allow precision adjustment in 8-bit increments from 16 to 64 bits.
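        • Code sketch : A minimal illustration of the memory-bandwidth argument: CSR SpMV with matrix values stored in FP32 (half the memory traffic of FP64) while accumulating in FP64; the talk's implementations generalize the stored width in 8-bit steps:

        ```cpp
        #include <cstdio>
        #include <vector>

        int main() {
            // 3x3 CSR example:  [2 0 1; 0 3 0; 4 0 5]
            std::vector<int>   rowptr = {0, 2, 3, 5};
            std::vector<int>   colidx = {0, 2, 1, 0, 2};
            std::vector<float> val    = {2, 1, 3, 4, 5};  // reduced precision
            std::vector<double> x = {1.0, 2.0, 3.0}, y(3);
            for (int i = 0; i < 3; ++i) {
                double sum = 0.0;                         // FP64 accumulator
                for (int k = rowptr[i]; k < rowptr[i + 1]; ++k)
                    sum += (double)val[k] * x[colidx[k]]; // convert on load
                y[i] = sum;
            }
            for (double v : y) std::printf("%g\n", v);    // 5, 6, 19
        }
        ```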
      • [04862] High-performance multidimensional integration
        • Format : Online Talk on Zoom
        • Author(s) :
          • Elise Helene de Doncker (Western Michigan University)
        • Abstract : Techniques for parallel multidimensional integration will be presented, with applications to Feynman loop integrals in high-energy physics. Integrand singularities are addressed with adaptive region partitioning, transformations, and convergence acceleration via linear or nonlinear extrapolation. Parallel implementations are commonly layered over a platform that interfaces with the underlying computer architecture, including MPI (Message Passing Interface), OpenMP for iterated multi-threaded integration, and CUDA kernels supporting integrand evaluation for lattice-type rules on thousands of GPU CUDA cores.
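        • Code sketch : A one-dimensional illustration of convergence acceleration by Richardson (Romberg) extrapolation over trapezoidal sums; the talk applies linear and nonlinear extrapolation to adaptively partitioned multidimensional integrals:

        ```cpp
        #include <cmath>
        #include <cstdio>

        double trapezoid(double (*f)(double), double a, double b, int n) {
            double h = (b - a) / n, s = 0.5 * (f(a) + f(b));
            for (int i = 1; i < n; ++i) s += f(a + i * h);
            return s * h;
        }

        int main() {
            auto f = [](double x) { return std::exp(-x * x); };
            const int L = 5;
            double T[L][L];
            for (int i = 0; i < L; ++i)       // trapezoid at halved steps
                T[i][0] = trapezoid(f, 0.0, 1.0, 1 << i);
            for (int j = 1; j < L; ++j)       // Richardson step: cancel the
                for (int i = j; i < L; ++i)   // O(h^(2j)) error term
                    T[i][j] = T[i][j - 1] +
                        (T[i][j - 1] - T[i - 1][j - 1]) /
                        (std::pow(4.0, j) - 1.0);
            // exact value is (sqrt(pi)/2) * erf(1) ~ 0.746824132812427
            std::printf("integral ~ %.15f\n", T[L - 1][L - 1]);
        }
        ```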
      • [04875] Introducing MPLAPACK 2.0.1: An Extension of BLAS and LAPACK for Multiple Precision Computation
        • Format : Talk at Waseda University
        • Author(s) :
          • Maho Nakata (RIKEN)
        • Abstract : MPLAPACK, a multiple-precision extension of LAPACK, offers enhanced numerical linear algebra capabilities. Translated from Fortran 90 to C++ using FABLE, MPLAPACK 2.0.1 supports MPBLAS, real and complex versions, and all LAPACK features except mixed-precision routines. Porting legacy C/C++ numerical codes is straightforward, and several underlying numerical libraries provide diverse precision levels. MPLAPACK offers OpenMP acceleration for some routines and CUDA support for specific double-double versions. MPLAPACK achieves strong performance and is available under the 2-clause BSD license on GitHub.
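        • Code sketch : A hedged usage sketch of MPLAPACK's naming convention (BLAS dgemm becomes Rgemm, with the scalar type set by the chosen precision backend); the header and library names below assume the double-double (qd) build and may differ between releases, so consult the MPLAPACK README on GitHub:

        ```cpp
        #include <mpblas_dd.h>     // assumed header name for the dd MPBLAS
        #include <qd/dd_real.h>
        #include <iostream>

        int main() {
            mplapackint n = 2;
            dd_real A[4] = {1.0, 2.0, 3.0, 4.0}; // column-major, as in BLAS
            dd_real B[4] = {1.0, 0.0, 0.0, 1.0}; // identity matrix
            dd_real C[4] = {0.0, 0.0, 0.0, 0.0};
            // C := 1*A*B + 0*C in ~106-bit double-double arithmetic
            Rgemm("n", "n", n, n, n, dd_real(1.0), A, n, B, n,
                  dd_real(0.0), C, n);
            std::cout << "C(0,0) = " << C[0] << std::endl;  // expect 1
            return 0;
        }
        ```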
    • 01060 (3/3) : 1E @E803 [Chair: Tomonori Kouya]
      • [04912] Evaluation of various arithmetic for linear algebra on GPU and FPGA
        • Format : Talk at Waseda University
        • Author(s) :
          • Naohito Nakasato (University of Aizu)
        • Abstract : We present an evaluation of various non-standard floating-point (FP) arithmetics for linear algebra. We accelerate matrix multiplication in 128-bit FP arithmetic on both GPUs and FPGAs. We also evaluate other FP formats, such as posit, as well as reduced-precision FP arithmetic. We discuss the energy efficiency and the numerical accuracy of non-standard FP arithmetic on GPU and FPGA accelerators for dense matrices.
      • [05094] Using quad-precision numbers for preconditioner of domain decomposition method
        • Format : Talk at Waseda University
        • Author(s) :
          • Hiroshi Kawai (Toyo University)
          • Masao Ogino (Daido University)
          • Ryuji Shioya (Toyo University)
        • Abstract : The domain decomposition method with the BDD preconditioner is one of the effective parallelization methods for the finite element method. The coarse grid correction in BDD handles a medium-sized linear system, which can become a bottleneck in a parallel environment. To accelerate this step, the inverse approach is adopted: it replaces forward and back substitution with parallel matrix-vector multiplication. Double-double arithmetic is used to preserve the accuracy of the inverse matrix.
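        • Code sketch : A minimal sketch of the double-double arithmetic the talk relies on: a value is the unevaluated sum of two FP64 numbers (~106 bits of significand), built from error-free transformations (compile without -ffast-math, which would break them):

        ```cpp
        #include <cstdio>

        struct dd { double hi, lo; };

        // Knuth's TwoSum: s + err == a + b exactly, using only FP64 ops.
        static dd two_sum(double a, double b) {
            double s = a + b;
            double v = s - a;
            double err = (a - (s - v)) + (b - v);
            return {s, err};
        }

        // Double-double addition (simplified; production code such as the
        // QD library normalizes more carefully).
        static dd dd_add(dd a, dd b) {
            dd s = two_sum(a.hi, b.hi);
            double lo = s.lo + a.lo + b.lo;
            return two_sum(s.hi, lo);        // renormalize
        }

        int main() {
            // 1 + 2^-80 is not representable in FP64 alone, but the
            // double-double pair keeps the low part exactly:
            dd x = {1.0, 0.0};
            dd y = {0x1p-80, 0.0};
            dd z = dd_add(x, y);
            std::printf("hi = %.17g, lo = %.17g\n", z.hi, z.lo); // lo = 2^-80
        }
        ```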