Compressing Floating-Point Number Stream for Numerical Applications
Grad. Sch. of Inf. Sci. & Technol., Univ. of Tokyo, Tokyo, Japan
First International Conference on Networking and Computing (ICNC), 2010
@conference{tomari2010compressing,
title={Compressing Floating-Point Number Stream for Numerical Applications},
author={Tomari, H. and Inaba, M. and Hiraki, K.},
booktitle={2010 First International Conference on Networking and Computing},
pages={112--119},
year={2010},
organization={IEEE}
}
Clusters of commodity computers and general-purpose machines equipped with accelerators such as GPGPUs are now common platforms for computationally intensive tasks like scientific simulations. Both technologies provide high performance at relatively low cost. However, interconnect bandwidth that is low relative to computing performance hinders efficient operation of both clusters and accelerators for the many algorithms that require heavy data transmission. For clusters the network is one of the major performance bottlenecks; for accelerators it is the peripheral bus that transfers data from the host to the memory on the accelerator card. In this paper, we propose a method of accelerating floating-point-intensive algorithms by compressing the floating-point number stream. Using an efficient software encoder and hardware decoder, the method eliminates redundancy in the exponent part of the numbers in the stream and compacts the entire array to 82.8% of its original size at the theoretical limit. This compression ratio is better than that of Gzip or Bzip2 on floating-point data. The reduction in communication time directly reduces total running time for applications whose performance is largely dominated by communication. We implemented a high-speed decoder on an FPGA that operates at over 6 GB/s. We estimated application performance using FFT on a cluster and matrix multiplication on the GRAPE-DR accelerator, and our approach is useful in both configurations.
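To illustrate the general idea of exploiting exponent redundancy, the following minimal C sketch splits IEEE 754 doubles into a separate exponent stream and a sign+mantissa stream, so that the often-repetitive exponents could then be coded compactly on their own. The split-stream layout and the function names (split_exponents, merge_exponents) are assumptions made for this illustration only; the paper's actual encoder and FPGA decoder are not described here and may differ.

/* Sketch: separate the 11-bit exponents of IEEE 754 doubles from the
 * sign+mantissa bits.  In numerical data, nearby values share exponents,
 * so the exponent stream is highly redundant and compresses well.
 * Illustrative only; not the paper's actual encoding format. */
#include <stdint.h>
#include <string.h>
#include <stdio.h>
#include <stddef.h>

/* Split an array of doubles into exponents (11 bits, stored in 16-bit
 * slots for simplicity) and the remaining sign + mantissa bits. */
static void split_exponents(const double *in, size_t n,
                            uint16_t *exponents, uint64_t *rest)
{
    for (size_t i = 0; i < n; i++) {
        uint64_t bits;
        memcpy(&bits, &in[i], sizeof bits);              /* reinterpret bit pattern */
        exponents[i] = (uint16_t)((bits >> 52) & 0x7FF); /* 11-bit biased exponent  */
        rest[i] = bits & ~(0x7FFULL << 52);              /* sign + 52-bit mantissa  */
    }
}

/* Reassemble the original doubles from the two streams. */
static void merge_exponents(const uint16_t *exponents, const uint64_t *rest,
                            size_t n, double *out)
{
    for (size_t i = 0; i < n; i++) {
        uint64_t bits = rest[i] | ((uint64_t)(exponents[i] & 0x7FF) << 52);
        memcpy(&out[i], &bits, sizeof bits);
    }
}

int main(void)
{
    /* Values of similar magnitude: the exponent stream repeats heavily. */
    double v[4] = { 1.5, 1.75, 2.25, -3.0 };
    uint16_t e[4];
    uint64_t m[4];
    double r[4];

    split_exponents(v, 4, e, m);
    merge_exponents(e, m, 4, r);

    for (int i = 0; i < 4; i++)
        printf("exp=%u  roundtrip=%g\n", (unsigned)e[i], r[i]);
    return 0;
}

In practice, a run-length or entropy coder applied to the exponent stream would realize the size reduction; the sketch above only shows the lossless field separation and round-trip reconstruction.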
April 20, 2011 by hgpu