high performance computing on graphics processing units: hgpu.org

Posts

Jun, 18

OpenACC – First Experiences with Real-World Applications

Today’s trend to use accelerators like GPGPUs in heterogeneous computer systems has entailed several low-level APIs for accelerator programming. However, programming these APIs is often tedious and therefore unproductive. To tackle this problem, recent approaches employ directive-based high-level programming for accelerators. In this work, we present our first experiences with OpenACC, an API consisting of […]

OpenCL

Jun, 17

GPUmotif: An Ultra-Fast and Energy-Efficient Motif Analysis Program Using Graphics Processing Units

Computational detection of TF binding patterns has become an indispensable tool in functional genomics research. With the rapid advance of new sequencing technologies, large amounts of protein-DNA interaction data have been produced. Analyzing this data can provide substantial insight into the mechanisms of transcriptional regulation. However, the massive amount of sequence data presents daunting challenges. […]

CUDA

Jun, 17

Branch and Data Herding: Reducing Control and Memory Divergence for Error-tolerant GPU Applications

Control and memory divergence between threads within the same execution bundle, or warp, have been shown to cause significant performance bottlenecks for GPU applications. In this paper, we exploit the observation that many GPU applications exhibit error tolerance to propose branch and data herding. Branch herding eliminates control divergence by forcing all threads in a […]

CUDA

Jun, 16

ScatterAlloc: Massively Parallel Dynamic Memory Allocation for the GPU

In this paper, we analyze the special requirements of a dynamic memory allocator that is designed for massively parallel architectures such as Graphics Processing Units (GPUs). We show that traditional strategies, which work well on CPUs, are not well suited for use on GPUs and present the thorough design of ScatterAlloc, which can efficiently […]

CUDA

Jun, 16

E-MOGA: A General Purpose Platform for Multi Objective Genetic Algorithm running on CUDA

This paper introduces an Enhanced Multi Objective Genetic Algorithm (E-MOGA) running on Compute Unified Device Architecture (CUDA) hardware, as a general purpose tool that can solve conflict optimization problems. The tool demonstrates significant speed gains using affordable, scalable and commercially available hardware. The objectives of this research are: to enhance the general purpose Multi Objective […]

CUDA

Jun, 16

Accelerating Lambert’s Problem on the GPU in MATLAB

The challenges and benefits of using the GPU to compute solutions to Lambert’s Problem are discussed. Three algorithms (Universal Variables, Gooding’s algorithm, and Izzo’s algorithm) were adapted for GPU computation directly within MATLAB. The robustness of each algorithm was considered, along with the speed at which it could be computed on each of three computers. […]

CUDA

Jun, 16

Parallel Primitives based Spatial Join of Geospatial Data on GPGPUs

Modern GPU architectures closely resemble supercomputers. Commodity GPUs that are already installed in personal and cluster computers can be used to boost the performance of spatial databases and GIS. In this study, we report our preliminary work on designing and implementing a spatial join algorithm on GPUs by using generic parallel primitives that have […]

CUDA

Jun, 16

GiST Scan Acceleration using Coprocessors

Efficient lookups in huge, possibly multi-dimensional datasets are crucial for the performance of numerous use cases that generate multiple search operations at the same time, like point queries in ray tracing or spatial joins in collision detection of interactive 3D applications. These applications greatly benefit from index structures that quickly filter relevant candidates for further […]

CUDA

Jun, 15

Energy Efficiency Analysis of GPUs

In the last few years, Graphics Processing Units (GPUs) have become a great tool for massively parallel computing. GPUs are specifically designed for throughput and face several design challenges, especially what are known as the Power and Memory Walls. In these devices, available resources should be used to enhance performance and throughput, as the performance […]

CUDA

Jun, 14

SAGA: SystemC Acceleration on GPU Architectures

SystemC is a widespread language for HW/SW system simulation and design exploration, and thus a key development platform in embedded system design. However, the growing complexity of SoC designs is having an impact on simulation performance, leading to limited SoC exploration potential, which in turn affects development and verification schedules and time-to-market for new designs. […]

CUDA

Jun, 14

Performance Gains in Conjugate Gradient Computation with Linearly Connected GPU Multiprocessors

Conjugate gradient is an important iterative method used for solving least squares problems. It is compute-bound and generally involves only simple matrix computations. One would expect that we could fully parallelize such computation on the GPU architecture with multiple Stream Multiprocessors (SMs), each consisting of many SIMD processing units. While implementing a conjugate gradient method […]

CUDA

Jun, 14

Exploiting Unexploited Computing Resources for Computational Logics

We present an investigation of the use of GPGPU techniques to parallelize the execution of a satisfiability solver, based on the traditional DPLL procedure – which, in spite of its simplicity, still represents the core of the most competitive solvers. The investigation tackles some interesting problems, including the use of a predominantly data-parallel architecture, like […]

CUDA
