FZ-GPU: A Fast and High-Ratio Lossy Compressor for Scientific Computing Applications on GPUs
Indiana University, Bloomington, IN, USA
arXiv:2304.12557 [cs.DC] (2 May 2023)
@article{zhang2023fz,
title={FZ-GPU: A Fast and High-Ratio Lossy Compressor for Scientific Computing Applications on GPUs},
author={Zhang, Boyuan and Tian, Jiannan and Di, Sheng and Yu, Xiaodong and Feng, Yunhe and Liang, Xin and Tao, Dingwen and Cappello, Franck},
journal={arXiv preprint arXiv:2304.12557},
year={2023}
}
Today’s large-scale scientific applications running on high-performance computing (HPC) systems generate vast data volumes. Thus, data compression is becoming a critical technique to mitigate the storage burden and data-movement cost. However, existing lossy compressors for scientific data cannot achieve a high compression ratio and high throughput simultaneously, hindering their adoption in many applications requiring fast compression, such as in-memory compression. To this end, we develop FZ-GPU, a fast and high-ratio error-bounded lossy compressor for scientific data on GPUs. Specifically, we first design a new compression pipeline that consists of fully parallelized quantization, bitshuffle, and our newly designed fast encoding. Then, we propose a series of deep architectural optimizations for each kernel in the pipeline to take full advantage of CUDA architectures: we propose a warp-level optimization that avoids data conflicts for bit-wise operations in bitshuffle, maximize shared memory utilization, and eliminate unnecessary data movements by fusing different compression kernels. Finally, we evaluate FZ-GPU on two NVIDIA GPUs (i.e., A100 and RTX A4000) using six representative scientific datasets from SDRBench. Results on the A100 GPU show that FZ-GPU achieves an average speedup of 4.2X over cuSZ and an average speedup of 37.0X over a multi-threaded CPU implementation of our algorithm under the same error bound. FZ-GPU also achieves an average speedup of 2.3X and an average compression ratio improvement of 2.0X over cuZFP under the same data distortion.
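The first two pipeline stages named in the abstract (error-bounded quantization followed by bitshuffle) can be illustrated with a small sequential sketch. This is not FZ-GPU's code: all function names are hypothetical, the 1D delta predictor and the zigzag sign mapping are assumptions made for illustration, the 16-bit code width is an arbitrary choice, and the paper's fast encoding stage and its GPU kernel optimizations are not reproduced here.

```python
from typing import List

def quantize(data: List[float], eb: float) -> List[int]:
    """Map each value to an integer bin of width 2*eb (error-bounded)."""
    return [round(x / (2.0 * eb)) for x in data]

def dequantize(codes: List[int], eb: float) -> List[float]:
    """Reconstruct each value as the center of its bin."""
    return [q * 2.0 * eb for q in codes]

def delta_encode(codes: List[int]) -> List[int]:
    """Assumed predictor: store each code as the difference from its predecessor."""
    out, prev = [], 0
    for q in codes:
        out.append(q - prev)
        prev = q
    return out

def delta_decode(deltas: List[int]) -> List[int]:
    out, prev = [], 0
    for d in deltas:
        prev += d
        out.append(prev)
    return out

def zigzag(n: int) -> int:
    """Assumed sign mapping: small magnitudes become small unsigned integers."""
    return (n << 1) if n >= 0 else ((-n) << 1) - 1

def unzigzag(u: int) -> int:
    return (u >> 1) if (u & 1) == 0 else -((u + 1) >> 1)

def bitshuffle(words: List[int], width: int = 16) -> List[int]:
    """Bit transpose: plane k collects bit k of every word (word i -> bit i)."""
    for w in words:
        if not 0 <= w < (1 << width):
            raise ValueError("word does not fit in the chosen bit width")
    planes = []
    for k in range(width):
        plane = 0
        for i, w in enumerate(words):
            plane |= ((w >> k) & 1) << i
        planes.append(plane)
    return planes

def bitunshuffle(planes: List[int], count: int) -> List[int]:
    """Inverse of bitshuffle for `count` words."""
    words = [0] * count
    for k, plane in enumerate(planes):
        for i in range(count):
            words[i] |= ((plane >> i) & 1) << k
    return words

def compress_sketch(data: List[float], eb: float) -> List[int]:
    """Quantize, delta-encode, zigzag, then bitshuffle into 16 bit planes."""
    return bitshuffle([zigzag(d) for d in delta_encode(quantize(data, eb))])

def decompress_sketch(planes: List[int], count: int, eb: float) -> List[float]:
    deltas = [unzigzag(u) for u in bitunshuffle(planes, count)]
    return dequantize(delta_decode(deltas), eb)
```

For smooth data the deltas are small, so after the bit transpose the high-order bit planes are entirely zero. Long zero runs of this kind are what a subsequent encoding stage can remove cheaply, which is the intuition behind placing bitshuffle between quantization and encoding. Each reconstructed value differs from its original by at most `eb`, up to floating-point rounding.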
May 7, 2023 by hgpu