
Posts

May, 25

Effective Sparse Matrix Representation for the GPU Architectures

General purpose computation on graphics processing units (GPUs) is prominent in today's high performance computing era. Porting or accelerating data parallel applications onto the GPU yields a baseline performance improvement because of the increased number of computational units. Better performance can be achieved if application-specific fine tuning is done with respect to the […]
May, 25

Accelerating In-Memory Graph Database traversal using GPGPUS

The paper aims to provide a comparative analysis of the performance of in-memory databases as opposed to a customised graph database written from the ground up, whose joins (searches) are performed on a GPGPU. This is done primarily to serve as a proof of concept of how databases that are represented as graphs can benefit by fostering […]
May, 25

Parallel simulation of mixed-abstraction SystemC models on GPUs and multicore CPUs

This work presents a methodology that parallelizes the simulation of mixed-abstraction level SystemC models across multicore CPUs and graphics processing units (GPUs) for improved simulation performance. Given a SystemC model, we partition it into processes suitable for GPU execution and CPU execution. We convert the processes identified for GPU execution into GPU kernels with additional […]
May, 24

Java on CUDA architecture

A traditional CPU is able to run only a few complex threads concurrently. On the other hand, a GPU allows concurrent execution of hundreds or thousands of simpler threads. The GPU was originally designed for computer graphics, but nowadays it is being used for general-purpose calculations using GPGPU technology. CUDA, one of the […]
May, 24

Sparse direct solvers with accelerators over DAG runtimes

The current trend in high performance computing shows a dramatic increase in the number of cores on shared memory compute nodes. Algorithms, especially those related to linear algebra, need to be adapted to these new computer architectures in order to be efficient. PASTIX is a sparse parallel direct solver that incorporates a dynamic […]
May, 24

Tuning a Finite Difference Computation for Parallel Vector Processors

Current CPU and GPU architectures make heavy use of data and instruction parallelism at different levels. Floating point operations are organised in vector instructions of increasing vector length. For performance reasons it is mandatory to use the vector instructions efficiently. Several ways of tuning a model-problem finite difference stencil computation are discussed. The combination of […]
May, 24

Compiler optimizations for directive-based programming for accelerators

Parallel programming is difficult. For regular computation on central processing units, application programming interfaces such as OpenMP, which augment normal sequential programs with preprocessor directives to achieve parallelism, have proven easy for programmers and provide good multithreaded performance. OpenACC is a fork of the OpenMP project that aims to provide a similar […]
May, 24

Fine-Grained Resource Sharing for Concurrent GPGPU Kernels

General purpose GPU (GPGPU) programming frameworks such as OpenCL and CUDA allow running individual computation kernels sequentially on a device. However, in some cases it is possible to utilize device resources more efficiently by running kernels concurrently. This raises questions about load balancing and resource allocation that have not previously warranted investigation. For example, what […]
May, 23

GMProf: A Low-Overhead, Fine-Grained Profiling Approach for GPU Programs

Driven by cost-effectiveness and power-efficiency, GPUs are being increasingly used to accelerate computations in many domains. However, developing highly efficient GPU implementations requires a lot of expertise and effort. Thus, tool support for tuning GPU programs is urgently needed; more specifically, low-overhead mechanisms for collecting fine-grained runtime information are critically required. Unfortunately, […]
May, 23

Molecular Distance Geometry Optimization Using Geometric Build-up and Evolutionary Techniques on GPU

We present a combination of methods addressing the molecular distance geometry problem, implemented on a graphics processing unit. First, we use geometric build-up and depth-first graph traversal. Next, we refine the solution by simulated annealing. For an exact but sparse distance matrix, the build-up method reconstructs the 3D structures with a root-mean-square error (RMSE) in the […]
May, 23

Medical Image Registration using OpenCL

Medical image registration is a computational task involving the spatial realignment of multiple sets of images of the same or different modalities. A novel method of using the Open Computing Language (OpenCL) framework to accelerate affine image registration across multiple processing architectures is presented. The use of this method on graphics processors results in a […]
May, 23

Investigating Warp Size Impact in GPUs

There are a number of design decisions that impact a GPU's performance. Among them, choosing the right warp size can deeply influence the rest of the design. Small warps reduce the performance penalty associated with branch divergence at the expense of reduced memory coalescing. Large warps enhance memory coalescing significantly but also […]

* * *


HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us: