Posts
Apr, 21
Automatically generating and tuning GPU code for sparse matrix-vector multiplication from a high-level representation
We propose a system-independent representation of sparse matrix formats that allows a compiler to generate efficient, system-specific code for sparse matrix operations. To show the viability of such a representation we have developed a compiler that generates and tunes code for sparse matrix-vector multiplication (SpMV) on GPUs. We evaluate our framework on six state-of-the-art matrix […]
Apr, 21
Automatically Tuning Sparse Matrix-Vector Multiplication for GPU Architectures
Graphics processors are increasingly used in scientific applications due to their high computational power, which comes from hardware with multiple-level parallelism and memory hierarchy. Sparse matrix computations frequently arise in scientific applications, for example, when solving PDEs on unstructured grids. However, traditional sparse matrix algorithms are difficult to efficiently parallelize for GPUs due to irregular […]
Apr, 21
Assessment of GPU computational enhancement to a 2D flood model
This paper presents a study of the computational enhancement of a Graphics Processing Unit (GPU) enabled 2D flood model. The objectives are to demonstrate the significant speedup of a new GPU-enabled full dynamic wave flood model and to present the effect of model spatial resolution on its speedup. A 2D dynamic flood model based on […]
Apr, 21
Solving knapsack problems on GPU
A parallel implementation via CUDA of the dynamic programming method for the knapsack problem on NVIDIA GPU is presented. A GTX 260 card with 192 cores (1.4GHz) is used for computational tests and processing times obtained with the parallel code are compared to the sequential one on a CPU with an Intel Xeon 3.0GHz. The […]
Apr, 21
Auto-Tuning CUDA Parameters for Sparse Matrix-Vector Multiplication on GPUs
Graphics Processing Unit (GPU) has become an attractive coprocessor for scientific computing due to its massive processing capability. The sparse matrix-vector multiplication (SpMV) is a critical operation in a wide variety of scientific and engineering applications, such as sparse linear algebra and image processing. This paper presents an auto-tuning framework that can automatically compute and […]
Apr, 21
A performance prediction model for the CUDA GPGPU platform
The significant growth in computational power of modern Graphics Processing Units (GPUs) coupled with the advent of general purpose programming environments like NVIDIA’s CUDA, has seen GPUs emerging as a very popular parallel computing platform. Till recently, there has not been a performance model for GPGPUs. The absence of such a model makes it difficult […]
Apr, 21
Fine-sorting One-dimensional Particle-In-Cell Algorithm with Monte-Carlo Collisions on a Graphics Processing Unit
Particle-in-cell (PIC) simulations with Monte-Carlo collisions are used in plasma science to explore a variety of kinetic effects. One major problem is the long run-time of such simulations. Even on modern computer systems, PIC codes take a considerable amount of time for convergence. Most of the computations can be massively parallelized, since particles behave independently […]
Apr, 20
Exploring scalability of FIR filter realizations on Graphics Processing Units
General-Purpose Computing on Graphics Processing Units (GPGPU) has lately been of great interest due to the release of architectures and software that simplifies programming graphics cards. This study explores how performance scales with FIR digital filters by varying the number of taps and the samples. We also discuss the trade-offs with various techniques for GPGPU […]
Apr, 20
Stream processing of moment invariants for real-time classifiers
This paper introduces a general purpose graphics processing unit (GPGPU) stream processing implementation of moment invariants using an integral image or summed area table approach. Summed area tables have been used to help attain real-time performance for some classifier systems, however due to the computational complexity of moment invariants, a high throughput computational platform is […]
Apr, 20
Improving the performance of PIR Protocol in Outsourced Databases
Outsourcing database as service instead of using in-house database management is a new trend emerging in a computing industry; there has been growing interest in outsourcing database services in both the commercial world and the research community. In this paper, we present analysis of non-concurrent model of fast single-database Private Information Retrieval (PIR) scheme for […]
Apr, 20
A GPU-based calculation using the three-dimensional FDTD method for electromagnetic field analysis
Numerical simulations with the numerical human model using the finite-difference time domain (FDTD) method have recently been performed frequently in a number of fields in biomedical engineering. However, the FDTD calculation runs too slowly. We focus, therefore, on general purpose programming on the graphics processing unit (GPGPU). The three-dimensional FDTD method was implemented on the […]
Apr, 20
Accelerating Linpack Performance with Mixed Precision Algorithm on CPU+GPGPU Heterogeneous Cluster
In this paper, the mixed precision algorithm to solve the linear system of equations and the implementation of HPL package are introduced. We use this mixed precision algorithm to improve HPL package on CPU + GPGPU heterogeneous clusters, which is named for GHPL, and give the implementation mechanisms in detail. The experimental results are measured […]

