cuSZp2: A GPU Lossy Compressor with Extreme Throughput and Optimized Compression Ratio

Yafan Huang, Sheng Di, Guanpeng Li, Franck Cappello
Computer Science Department, University of Iowa, Iowa City, IA, USA
International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2024

@inproceedings{huang2024cuszp2,
   title={cuSZp2: A GPU Lossy Compressor with Extreme Throughput and Optimized Compression Ratio},
   author={Huang, Yafan and Di, Sheng and Li, Guanpeng and Cappello, Franck},
   booktitle={SC24: International Conference for High Performance Computing, Networking, Storage and Analysis},
   pages={1--18},
   year={2024},
   organization={IEEE}
}

Existing GPU lossy compressors suffer from expensive data movement overheads, inefficient memory access patterns, and high synchronization latency, resulting in limited throughput. This work proposes cuSZp2, a generic single-kernel error-bounded lossy compressor that runs purely on GPUs and is designed for applications that require high speed, such as large-scale GPU simulation and large language model training. In particular, cuSZp2 proposes a novel lossless encoding method, optimizes memory access patterns, and hides synchronization latency, achieving extreme end-to-end throughput and an optimized compression ratio. Experiments on an NVIDIA A100 GPU with 9 real-world HPC datasets demonstrate that, even with higher compression ratios and data quality, cuSZp2 delivers on average 332.42 GB/s and 513.04 GB/s end-to-end throughput for compression and decompression, respectively, which is around 2x that of existing pure-GPU compressors and 200x that of CPU-GPU hybrid compressors.
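
To make the term "error-bounded" concrete, the following is a minimal CPU-side sketch, not the cuSZp2 kernel. It assumes plain linear quantization with a bin width of twice the error bound, a common way to guarantee a pointwise absolute error bound; the abstract does not say which quantization scheme cuSZp2 uses. The function names and sample data are hypothetical and chosen only for illustration.

```python
import math

def quantize(values, error_bound):
    """Map each float to the index of its nearest bin of width 2 * error_bound.

    Assumption for illustration: linear quantization, not taken from the abstract.
    """
    step = 2.0 * error_bound
    return [int(math.floor(v / step + 0.5)) for v in values]

def dequantize(codes, error_bound):
    """Reconstruct each value as the center of its bin."""
    step = 2.0 * error_bound
    return [q * step for q in codes]

def max_abs_error(original, reconstructed):
    """Largest pointwise absolute difference between two sequences."""
    return max(abs(a - b) for a, b in zip(original, reconstructed))

# Hypothetical sample data and error bound.
data = [0.0, 0.12, 0.5, -1.337, 3.14159, 2.71828]
eb = 0.01

codes = quantize(data, eb)        # small integers, easier to encode losslessly
restored = dequantize(codes, eb)  # lossy reconstruction

# Every reconstructed value stays within the error bound (up to float rounding).
print(max_abs_error(data, restored) <= eb + 1e-12)
```

The integer codes are what a subsequent lossless encoding stage would compress; the pointwise error is fixed entirely by the quantization step, so the lossless stage can be changed without affecting data quality.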
