Compressing Floating-Point Number Stream for Numerical Applications
Grad. Sch. of Inf. Sci. & Technol., Univ. of Tokyo, Tokyo, Japan
First International Conference on Networking and Computing (ICNC), 2010
@conference{tomari2010compressing,
title={Compressing Floating-Point Number Stream for Numerical Applications},
author={Tomari, H. and Inaba, M. and Hiraki, K.},
booktitle={2010 First International Conference on Networking and Computing},
pages={112--119},
year={2010},
organization={IEEE}
}
Clusters of commodity computers and general-purpose machines equipped with accelerators such as GPGPUs are now common platforms for computationally intensive tasks like scientific simulations. Both technologies provide high performance at relatively low cost. However, interconnect bandwidth that is low relative to computing performance hinders efficient operation of both clusters and accelerators for the many algorithms that require heavy data transmission. For clusters the network is one of the major performance bottlenecks; for accelerators it is the peripheral bus that transfers data from the host to the memory on the accelerator card. In this paper, we propose a method of accelerating floating-point-intensive algorithms by compressing the floating-point number stream. Using an efficient software encoder and hardware decoder, the method eliminates redundancy in the exponent part of the numbers in the stream and compacts the entire array to 82.8% of its original size at the theoretical limit. This compression ratio is better than that of Gzip or Bzip2 on floating-point data. The reduction in communication time directly reduces total running time for applications whose performance is largely dominated by communication. We implemented a high-speed decoder on an FPGA that operates at over 6 GB/s. We estimated application performance using FFT on a cluster and matrix multiplication on the GRAPE-DR accelerator, and our approach is useful in both configurations.
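To illustrate the general idea of exploiting exponent redundancy, the following minimal C sketch splits IEEE 754 doubles into a separate exponent stream and a sign+mantissa stream, so that the often-repetitive exponents could then be coded compactly on their own. The split-stream layout and the function names (split_exponents, merge_exponents) are assumptions made for this illustration only; the paper's actual encoder and FPGA decoder are not described here and may differ.

/* Sketch: separate the 11-bit exponents of IEEE 754 doubles from the
 * sign+mantissa bits.  In numerical data, nearby values share exponents,
 * so the exponent stream is highly redundant and compresses well.
 * Illustrative only; not the paper's actual encoding format. */
#include <stdint.h>
#include <string.h>
#include <stdio.h>
#include <stddef.h>

/* Split an array of doubles into exponents (11 bits, stored in 16-bit
 * slots for simplicity) and the remaining sign + mantissa bits. */
static void split_exponents(const double *in, size_t n,
                            uint16_t *exponents, uint64_t *rest)
{
    for (size_t i = 0; i < n; i++) {
        uint64_t bits;
        memcpy(&bits, &in[i], sizeof bits);              /* reinterpret bit pattern */
        exponents[i] = (uint16_t)((bits >> 52) & 0x7FF); /* 11-bit biased exponent  */
        rest[i] = bits & ~(0x7FFULL << 52);              /* sign + 52-bit mantissa  */
    }
}

/* Reassemble the original doubles from the two streams. */
static void merge_exponents(const uint16_t *exponents, const uint64_t *rest,
                            size_t n, double *out)
{
    for (size_t i = 0; i < n; i++) {
        uint64_t bits = rest[i] | ((uint64_t)(exponents[i] & 0x7FF) << 52);
        memcpy(&out[i], &bits, sizeof bits);
    }
}

int main(void)
{
    /* Values of similar magnitude: the exponent stream repeats heavily. */
    double v[4] = { 1.5, 1.75, 2.25, -3.0 };
    uint16_t e[4];
    uint64_t m[4];
    double r[4];

    split_exponents(v, 4, e, m);
    merge_exponents(e, m, 4, r);

    for (int i = 0; i < 4; i++)
        printf("exp=%u  roundtrip=%g\n", (unsigned)e[i], r[i]);
    return 0;
}

In practice, a run-length or entropy coder applied to the exponent stream would realize the size reduction; the sketch above only shows the lossless field separation and round-trip reconstruction.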
April 20, 2011 by hgpu