high performance computing on graphics processing units: hgpu.org

Posts

Sep, 8

Experiences with Mapping Non-linear Memory Access Patterns into GPUs

Modern graphics processing units (GPUs) are very powerful computational systems on a chip. For this reason, there is growing interest in using these units as general-purpose hardware accelerators (GPGPU). To facilitate the programming of general-purpose applications, NVIDIA introduced the CUDA programming environment. CUDA provides a simplified abstraction of the underlying complex GPU […]

CUDA

Sep, 8

A Fast GPU Implementation for Solving Sparse Ill-Posed Linear Equation Systems

Image reconstruction, a very compute-intensive process in general, can often be reduced to large linear equation systems represented as sparse under-determined matrices. Solvers for these equation systems (not restricted to image reconstruction) spend most of their time in sparse matrix-vector multiplications (SpMV). In this paper, we present a GPU-accelerated scheme for a Conjugate Gradient […]

CUDA
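Because the abstract names SpMV as the dominant cost, a plain CPU reference of SpMV over the common CSR (compressed sparse row) layout may help readers see what a GPU kernel has to parallelize. This is a minimal sketch under the assumption that the matrix is stored in CSR; it is not the paper's GPU scheme, and the function name `csr_spmv` is hypothetical. In a simple "one thread per row" GPU mapping, each iteration of the outer loop would become an independent thread.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR format.

    values  : nonzero entries of A, stored row by row
    col_idx : column index of each entry in `values`
    row_ptr : row i occupies values[row_ptr[i]:row_ptr[i + 1]]
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    # Each row is independent, which is what a GPU kernel would exploit.
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Under-determined 2x3 example: A = [[4, 0, 1], [0, 2, 0]]
print(csr_spmv([4.0, 1.0, 2.0], [0, 2, 1], [0, 2, 3], [1.0, 2.0, 3.0]))
```

For the 2x3 example, the call returns `[7.0, 4.0]`. Only nonzeros are stored and touched, which is why the irregular, index-driven reads of `x` (rather than arithmetic) usually dominate SpMV cost on both CPUs and GPUs.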

Sep, 8

Programming Many-Core Chips

This book presents new concepts, techniques and promising programming models for designing software for chips with "many" (hundreds to thousands) processor cores. Given the scale of parallelism inherent to these chips, software designers face new challenges in terms of operating systems, middleware and applications. This book will serve as an invaluable, single-source reference to the state-of-the-art […]

CUDA • OpenCL

Sep, 8

GPU Computation in Bioinspired Algorithms: A Review

Bioinspired methods usually require large amounts of computational resources. For this reason, parallelization is an interesting alternative for decreasing the execution time and providing accurate results. In this sense, there has recently been growing interest in developing parallel algorithms using graphics processing units (GPUs), also referred to as GPU computation. Advances […]

CUDA • OpenCL

Sep, 8

Towards GPGPU Assisted Computing in Virtualized Environments

General Purpose Computation on Graphics Processing Units (GPGPU) makes it possible to use the massive computing power of modern graphics cards for generic high-performance computing. However, new virtualization technologies typically do not support high-performance graphics cards, and as a consequence GPGPU resources cannot be used in typical virtualization setups. In this paper, we […]

CUDA • OpenCL

Sep, 8

Implementing Independent Component Analysis in General-Purpose GPU Architectures

New computational architectures, such as multi-core processors and graphics processing units (GPUs), pose challenges to application developers. Although in the case of general-purpose GPU programming, environments and toolkits such as CUDA and OpenCL have simplified application development, different ways of thinking about memory access, storage, and program execution are required. This paper presents a strategy […]

CUDA • OpenCL

Sep, 8

Automatic OpenCL Device Characterization: Guiding Optimized Kernel Design

The OpenCL standard allows targeting a large variety of CPU, GPU and accelerator architectures using a single unified programming interface and language. While the standard guarantees portability of functionality for complying applications and platforms, performance portability on such a diverse set of hardware is limited. Devices may vary significantly in memory architecture as well as […]

OpenCL

Sep, 8

Accelerating Clustering Coefficient Calculations on a GPU Using OPENCL

The growth in multicore CPUs and the emergence of powerful manycore GPUs have led to a proliferation of parallel applications. Many applications are not straightforward to parallelize. This paper examines the performance of a parallelized implementation for calculating measurements of complex networks. We present an algorithm for calculating the clustering coefficient, a topological feature of complex networks, […]

OpenCL
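For readers unfamiliar with the measure, the local clustering coefficient of a vertex with degree k is the fraction of its k(k-1)/2 neighbour pairs that are themselves connected. The sketch below is a sequential Python reference, assuming an undirected graph without self-loops stored as a dict of neighbour sets; the name `local_clustering` and the convention of returning 0.0 for vertices with fewer than two neighbours are assumptions, not details taken from the paper. Because each vertex is computed independently, a GPU version would plausibly assign one OpenCL work-item per vertex, though the paper's actual algorithm is not shown here.

```python
from itertools import combinations


def local_clustering(adj):
    """Local clustering coefficient for every vertex of an undirected graph.

    adj: dict mapping each vertex to the set of its neighbours (no self-loops).
    Vertices with fewer than two neighbours get 0.0 by convention.
    """
    coeffs = {}
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            coeffs[v] = 0.0
            continue
        # Count edges among the neighbours of v (triangles through v).
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        coeffs[v] = 2.0 * links / (k * (k - 1))
    return coeffs


# Triangle 0-1-2 with an extra pendant vertex 3 attached to 0.
graph = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(local_clustering(graph))
```

In the example, vertex 0 has three neighbours but only the pair (1, 2) is linked, giving 1/3; vertices 1 and 2 each score 1.0, and the pendant vertex 3 scores 0.0.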

Sep, 7

Pegasus: coordinated scheduling for virtualized accelerator-based systems

Heterogeneous multi-cores (platforms comprising both general-purpose and accelerator cores) are becoming increasingly common. While applications wish to freely utilize all cores present on such platforms, operating systems continue to view accelerators as specialized devices. The Pegasus system described in this paper uses an alternative approach that offers a uniform resource usage model for all cores […]

CUDA

Sep, 7

GPU-Based approaches for multiobjective local search algorithms. A case study: the flowshop scheduling problem

Multiobjective local search algorithms are efficient methods for solving complex problems in science and industry. Even though these heuristics significantly reduce the computational time needed to explore the solution search space, this cost remains exorbitant when very large problem instances must be solved. As a result, the use of graphics processing units […]

Sep, 7

Automatic CPU-GPU communication management and optimization

The performance benefits of GPU parallelism can be enormous, but unlocking this performance potential is challenging. The applicability and performance of GPU parallelizations are limited by the complexities of CPU-GPU communication. To address these communication problems, this paper presents the first fully automatic system for managing and optimizing CPU-GPU communication. This system, called the CPU-GPU […]

CUDA

Sep, 7

High performance computation and interactive display of molecular orbitals on GPUs and multi-core CPUs

The visualization of molecular orbitals (MOs) is important for analyzing the results of quantum chemistry simulations. The functions describing the MOs are computed on a three-dimensional lattice, and the resulting data can then be used for plotting isocontours or isosurfaces for visualization as well as for other types of analyses. Existing software packages that render […]

CUDA
