Posts
Apr, 14
An implementation of the tile QR factorization for a GPU and multiple CPUs
The tile QR factorization provides an efficient and scalable way to factor a dense matrix in parallel on multicore processors. This article presents an efficient implementation of the algorithm on a system with a powerful GPU and many multicore CPUs.
Apr, 14
Parallelization of PageRank on Multicore Processors
PageRank is a prominent metric used by search engines for ranking search results. The PageRank of a particular web page is a function of the PageRanks of all the web pages pointing to it. The algorithm works on a large number of web pages and is thus computationally intensive. The need of hardware […]
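As a rough illustration of the dependency described in this abstract, here is a minimal sequential sketch of PageRank by power iteration. It is not the parallel implementation from the post: the function name `pagerank`, the damping factor of 0.85, the fixed iteration count, the handling of pages without outgoing links, and the three-page example graph are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration.

    links: dict mapping each page to the list of pages it links to.
    damping and iterations are illustrative defaults, not values from the post.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution

    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank equally to the pages it points to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Assumption: a page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical three-page graph: a -> b, a -> c, b -> c, c -> a.
example_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(example_links)
```

In this sketch each page's new rank depends only on the previous iteration's ranks, so the per-page updates within one iteration are independent of each other, which is what makes the computation a natural candidate for multicore parallelization.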
Apr, 14
phiGEMM: a CPU-GPU library for porting Quantum ESPRESSO on hybrid systems
GPU computing has revolutionized HPC by bringing the performance of the supercomputer to the desktop. Attractive price, performance, and power characteristics allow multiple GPUs to be plugged into both desktop machines and supercomputer nodes for increased performance. Excellent performance and scalability can be achieved for some problems using hybrid combinations of multiple GPUs […]
Apr, 14
GPU parallel computing: Programming language, debugging tools and data structures
With many cores driven by high memory bandwidth, today’s graphics processing unit (GPU) has evolved into an absolute computing workhorse. More and more scientists, researchers and software developers are using GPUs to accelerate their algorithms and applications. Developing complex programs and software on the GPU, however, is still far from easy with existing tools provided […]
Apr, 13
A GPU Memory System Comparison for an Elliptic Test Problem
This paper presents GPU-based solutions to the Poisson equation with homogeneous Dirichlet boundary conditions in two spatial dimensions. This problem has well-understood behavior, yet its computation is similar to that of many more complex real-world problems. We analyze the GPU performance using three types of memory access in the CUDA memory model (direct access to global memory, texture access, […]
Apr, 13
Software Model Checking for GPGPU Programs, Towards a Verification Tool
The tremendous computing power of GPUs has made them the focus of unprecedented attention for applications other than graphics and gaming. Apart from the highly parallel nature of the programs to be run on GPUs, the sought-after gain in computing power is only achieved with low-level tuning at the thread level […]
Apr, 13
Writing a modular GPGPU program in Java
This paper proposes a Java to CUDA runtime program translator for scientific-computing applications. Traditionally, these applications have been written in Fortran or C without using a rich modularization mechanism. Our translator enables those applications to be written in Java and run on GPGPUs while exploiting a rich modularization mechanism in Java. This translator dynamically generates […]
Apr, 13
Parallel programming with CUDA
This report documents our master's thesis project, which is about parallel programming with CUDA, the NVIDIA GPU architecture with support for general-purpose computing. The purpose of the thesis is to uncover the qualities of CUDA as a parallel computing platform, determining the possibilities and limitations of its ability to handle different types of algorithms. […]
Apr, 13
Design of high-performance parallelized gene predictors in MATLAB
BACKGROUND: This paper proposes a method of implementing parallel gene prediction algorithms in MATLAB. The proposed designs are based either on Goertzel’s algorithm or on FFTs and have been implemented using varying amounts of parallelism on a central processing unit (CPU) and on a graphics processing unit (GPU). FINDINGS: Results show that an implementation using […]
Apr, 12
Spatial Indexing of Large-Scale Geo-Referenced Point Data on GPGPUs Using Parallel Primitives
Modern positioning and locating technologies, e.g., GPS, have generated huge amounts of geo-referenced point data that are crucial to understanding environmental and socio-economic phenomena. Unfortunately, traditional disk-resident databases are inefficient in handling large-scale point data. In this study, we propose to utilize the massive data-parallel processing power of General Purpose computing on Graphics Processing […]
Apr, 12
Verifying GPU Kernels by Test Amplification
We present a novel technique for verifying properties of data parallel GPU programs via test amplification. The key insight behind our work is that we can use the technique of static information flow to amplify the result of a single test execution over the set of all inputs and interleavings that affect the property being […]
Apr, 12
Programming issues for video analysis on Graphics Processing Units
Video processing is a part of signal processing where input and/or output signals are video streams. It covers a wide variety of applications that are generally very compute-intensive due to their algorithmic complexity. Moreover, many of these applications demand real-time performance. Fulfilling these requirements necessitates the use of hardware acceleration such as Graphics Processing […]