Posts
May, 24
Tuning a Finite Difference Computation for Parallel Vector Processors
Current CPU and GPU architectures heavily use data and instruction parallelism at different levels. Floating point operations are organised in vector instructions of increasing vector length. For reasons of performance it is mandatory to use the vector instructions efficiently. Several ways of tuning a model problem finite difference stencil computation are discussed. The combination of […]
May, 24
Compiler optimizations for directive-based programming for accelerators
Parallel programming is difficult. For regular computation on central processing units, application programming interfaces such as OpenMP, which augment normal sequential programs with preprocessor directives to achieve parallelism, have proven to be easy for programmers and they provide good multithreaded performance. OpenACC is a fork of the OpenMP project, which aims to provide a similar […]
May, 24
Fine-Grained Resource Sharing for Concurrent GPGPU Kernels
General purpose GPU (GPGPU) programming frameworks such as OpenCL and CUDA allow running individual computation kernels sequentially on a device. However, in some cases it is possible to utilize device resources more efficiently by running kernels concurrently. This raises questions about load balancing and resource allocation that have not previously warranted investigation. For example, what […]
May, 23
GMProf: A Low-Overhead, Fine-Grained Profiling Approach for GPU Programs
Driven by the cost-effectiveness and the power-efficiency, GPUs are being increasingly used to accelerate computations in many domains. However, developing highly efficient GPU implementations requires a lot of expertise and effort. Thus, tool support for tuning GPU programs is urgently needed, and more specifically, low-overhead mechanisms for collecting fine-grained runtime information are critically required. Unfortunately, […]
May, 23
Molecular Distance Geometry Optimization Using Geometric Build-up and Evolutionary Techniques on GPU
We present a combination of methods addressing the molecular distance problem, implemented on a graphics processing unit. First, we use geometric build-up and depth-first graph traversal. Next, we refine the solution by simulated annealing. For an exact but sparse distance matrix, the build-up method reconstructs the 3D structures with a root-mean-square error (RMSE) in the […]
May, 23
Medical Image Registration using OpenCL
Medical image registration is a computational task involving the spatial realignment of multiple sets of images of the same or different modalities. A novel method of using the Open Computing Language (OpenCL) framework to accelerate affine image registration across multiple processing architectures is presented. The use of this method on graphics processors results in a […]
May, 23
Investigating Warp Size Impact in GPUs
There are a number of design decisions that impact a GPU’s performance. Among such decisions, choosing the right warp size can deeply influence the rest of the design. Small warps reduce the performance penalty associated with branch divergence at the expense of a reduction in memory coalescing. Large warps enhance memory coalescing significantly but also […]
May, 23
Adaptive fast multipole methods on the GPU
We present a highly general implementation of fast multipole methods on graphics processing units (GPUs). Our two-dimensional double precision code features an asymmetric type of adaptive space discretization leading to a particularly elegant and flexible implementation. All steps of the multipole algorithm are efficiently performed on the GPU, including the initial phase which assembles the […]
May, 20
Self-Tuning Distribution of DB-Operations on Hybrid CPU/GPU Platforms
A current research trend focuses on accelerating database operations with the help of GPUs (Graphics Processing Units). Since GPU algorithms are not necessarily faster than their CPU counterparts, it is important to use them only if they outperform their CPU counterparts. In this paper, we address this problem by constructing a decision model for a […]
May, 20
High-Level Support for Pipeline Parallelism on Many-Core Architectures
With the increasing architectural diversity of many-core architectures, the challenges of parallel programming and code portability will sharply rise. The EU project PEPPHER addresses these issues with a component-based approach to application development on top of a task-parallel execution model. Central to this approach are multi-architectural components which encapsulate different implementation variants of application functionality […]
May, 20
Computing 2D Alpha Shapes Using GPU
This report presents an approach to compute Alpha Shapes for a 2D un-weighted point set using the graphics processing unit (GPU). The problem of alpha shapes has been well-defined and algorithms have been developed to compute it efficiently in 2D and 3D using CPU. However, the nature of this problem makes it well-suited for solving […]
May, 20
Spatial Data Structures, Sorting and GPU Parallelism for Situated-agent Simulation and Visualisation
Spatial data partitioning techniques are important for obtaining fast and efficient simulations of N-Body particle and spatial agent-based models, where they considerably reduce redundant entity interaction computation times. Highly parallel techniques based on concurrent threading can be deployed to further speed up such simulations. We study the use of GPU accelerators and highly data […]