Effects of compression on data intensive algorithms

Ahmed Adnan Aqrawi
Department of Computer and Information Science, Norwegian University of Science and Technology
Norwegian University of Science and Technology, 2010


   title={Effects of Compression on Data Intensive Algorithms},
   author={Aqrawi, A.A. and Elster, A.C.},
   journal={Norwegian University of Science and Technology},






In recent years, the gap between bandwidth and computational throughput has become a major challenge in high performance computing (HPC). Data intensive algorithms are particularly affected by the limitations of I/O bandwidth and latency. In this thesis project, data compression is explored so that fewer bytes need to be read from disk. The computational capabilities of the GPU are then utilized for faster decompression. Seismic filtering algorithms, which are known to be very data intensive, are used as test cases. In the thesis, both lossless and lossy compression algorithms are considered. We have developed, optimized and implemented several compression algorithms for both the CPU and GPU using C, OpenMP and NVIDIA CUDA. A scheme for utilizing both the CPU and GPU using asynchronous I/O to further improve performance is also developed. Compression algorithms studied and optimized include RLE, Huffman encoding, the 1D-3D DCT, the 1D-3D fast DCT (AAN algorithm), and the fast LOT. The 3D convolution and Hough transform filtering algorithms are also developed and optimized. Lossy compression algorithms using transform coding are also studied. Compression with these transforms involves three steps: 1) transformation, 2) quantization and 3) encoding. Transformation and quantization are shown to be especially suitable for the GPU because of their parallelizable nature. The encoding step is shown to be best done on the CPU because of its sequential nature. The GPU and CPU are used in asynchronous co-operation to perform the compression on seismic data sets of up to 32GB. Transform coding is lossy, but the errors we experience are minimally visible and are within acceptable loss given the type of data (a max. of 0.46% ME and 81 rMSE for our seismic data sets). [...] HDD with a 70MB/s transfer rate, and a speedup of 3.3 for a modern SSD with a 140MB/s transfer rate.
Several other results on both the recent NVIDIA Tesla C1060 GPU and the new Fermi-based NVIDIA Tesla C2050 GPU, as well as results for using the CPU and GPU together with asynchronous I/O, are included. The major bottleneck is now the PCI Express bus, and for files that do not compress well, I/O bandwidth and latency are still an issue.

* * *

HGPU group © 2010-2017 hgpu.org

All rights belong to the respective authors
