high performance computing on graphics processing units: hgpu.org

Posts

Jun, 12

FFT-SPA Non-Binary LDPC Decoding on GPU

It is well known that non-binary LDPC codes outperform binary LDPC codes in BER performance for the same code length. The superior BER performance of non-binary codes comes at the expense of more complex decoding algorithms that demand higher computational power. In this paper, we propose parallel signal processing algorithms for performing the FFT-SPA […]

CUDA

Jun, 12

OpenCL Implementation of a Color Based Object Tracking

In this paper we present an algorithm for real-time object tracking based on color. First, a two-layer perceptron is trained to cope with scene illumination changes. Based on this training, a piece of OpenCL code is generated to harness the power of GPU computing. Then, color based object tracking is done […]

OpenCL

Jun, 12

Performance of a GPU-based Direct Summation Algorithm for Computation of Small Angle Scattering Profile

Small Angle Scattering (SAS) of X-rays or neutrons is an experimental technique that provides valuable structural information for biological macromolecules under physiological conditions and with no limitation on the molecular size. In order to refine molecular structure against experimental SAS data, ab initio prediction of the scattering profile must be recomputed hundreds of thousands of […]

OpenCL

Jun, 10

OCLoptimizer: An Iterative Optimization Tool for OpenCL

Nowadays, computers include several computational devices with parallel capacities, such as multicore processors and Graphics Processing Units (GPUs). OpenCL enables the programming of all these kinds of devices. An OpenCL program consists of host code that discovers the computational devices available in the host system and queues up commands to the devices, and […]

OpenCL

Jun, 10

Accelerating Genetic Programming Using Graphics Processing Units

Evolution through natural selection offers the possibility of automatically generating functionally complex solutions to a wide range of problems. Methods such as Genetic Programming (GP) show the promise of this approach but tend to stagnate after relatively few generations. To research this issue, execution speed must be substantially improved. This thesis presents work to accelerate […]

CUDA

Jun, 10

Processing XPath Structural Constraints on GPU

Technologies such as CUDA and OpenCL have popularized the use of graphics cards (GPUs) for general purpose programming, often with impressive performance gains. However, using such cards to speed up XML database processing has yet to be fully explored. XML databases offer much flexibility for Web-oriented systems. Nonetheless, such flexibility comes at a considerable computational […]

CUDA, OpenCL

Jun, 10

A flexible algorithm for calculating pair interactions on SIMD architectures

Calculating interactions or correlations between pairs of particles is typically the most time-consuming task in particle simulation or correlation analysis. Straightforward implementations using a double loop over particle pairs have traditionally worked well, especially since compilers usually do a good job of unrolling the inner loop. In order to reach high performance on modern CPU […]

CUDA
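As a point of reference for the "double loop over particle pairs" that the abstract above mentions, here is a minimal sketch of that straightforward baseline. It is not the paper's algorithm: the function name `pair_energy`, the `cutoff` parameter, and the 1/r pair term are all hypothetical choices made only for illustration.

```python
import math


def pair_energy(positions, cutoff):
    """Sum a hypothetical 1/r term over all unique particle pairs.

    positions: list of (x, y, z) tuples
    cutoff:    pairs at distance >= cutoff are skipped (assumed parameter)
    """
    total = 0.0
    n = len(positions)
    # Outer loop over particle i, inner loop over j > i so that
    # each unordered pair is visited exactly once.
    for i in range(n - 1):
        xi, yi, zi = positions[i]
        for j in range(i + 1, n):
            xj, yj, zj = positions[j]
            r = math.sqrt((xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2)
            if r < cutoff:
                total += 1.0 / r
    return total
```

The inner loop has a trip count that depends on `i` and contains a data-dependent branch, which is the kind of structure that SIMD-oriented reformulations typically try to regularize.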

Jun, 10

Recent Advances on GPU Computing in Operations Research

In the last decade, Graphics Processing Units (GPUs) have gained increasing popularity as accelerators for High Performance Computing (HPC) applications. Recent GPUs are not only powerful graphics engines but also highly threaded parallel computing processors that can achieve sustainable speedup compared with CPUs. In this context, researchers try to exploit the capability of […]

Jun, 9

GPU Acceleration of Algebraic Multigrid for Low-Frequency Finite Element Methods

This paper introduces a GPU acceleration of a Wavelet-based Algebraic Multigrid used as a preconditioner for solving Laplace’s equation discretized by the Finite Element Method. We conduct some tests using a CPU-based direct solver, a CPU-based Preconditioned Conjugate Gradient (PCG), and a GPU-based PCG. Finally, we report the solution time and the speed-up achieved in solving […]

Jun, 9

Understanding Dynamic Parallelism at Any Scale with Allinea’s Unified Tools (webinar)

Dynamic Parallelism is a great new feature introduced by NVIDIA in CUDA 5. As powerful features like this are introduced, the complexity of debugging and profiling often increases. This webinar will provide technical insight into how Allinea’s powerful tools can save the day if bugs come up when developing with Dynamic Parallelism. The webinar, presented […]

Jun, 8

GPU Acceleration of Particle Advection Workloads in a Parallel, Distributed Memory Setting

Although there has been significant research in GPU acceleration, both of parallel simulation codes (i.e., GPGPU) and of single GPU visualization and analysis algorithms, there has been relatively little research devoted to visualization and analysis algorithms on GPU clusters. This oversight is significant: parallel visualization and analysis algorithms have markedly different characteristics – computational load, […]

CUDA

Jun, 8

High Resolution Sparse Voxel DAGs

We show that a binary voxel grid can be represented orders of magnitude more efficiently than using a sparse voxel octree (SVO) by generalising the tree to a directed acyclic graph (DAG). While the SVO allows for efficient encoding of empty regions of space, the DAG additionally allows for efficient encoding of identical regions of […]

CUDA
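The idea of generalising a voxel tree to a DAG by sharing identical regions can be sketched as a bottom-up deduplication pass. The following is only an illustration under simplifying assumptions, not the paper's data layout: a node is either a boolean leaf or a tuple of eight children, and the names `build_dag` and `count_tree_nodes` are hypothetical.

```python
def build_dag(node, pool):
    """Return a canonical id for node, sharing identical subtrees.

    node: a bool leaf (filled / empty) or a tuple of 8 child nodes
    pool: dict mapping a structural key to its unique node id
    """
    if not isinstance(node, tuple):
        key = ("leaf", bool(node))
    else:
        # Children are canonicalised first, so two subtrees with the same
        # content produce the same key and collapse into one DAG node.
        key = ("node",) + tuple(build_dag(child, pool) for child in node)
    if key not in pool:
        pool[key] = len(pool)
    return pool[key]


def count_tree_nodes(node):
    """Count nodes when every subtree is stored separately (plain octree)."""
    if not isinstance(node, tuple):
        return 1
    return 1 + sum(count_tree_nodes(child) for child in node)


# A small scene in which the same 2x2x2 block appears three times.
block = (True, False, False, False, False, False, False, True)
root = (block, block, False, False, False, False, block, False)

pool = {}
root_id = build_dag(root, pool)
tree_nodes = count_tree_nodes(root)  # nodes in the fully expanded tree
dag_nodes = len(pool)                # unique nodes after sharing
```

In this toy scene the repeated block is stored once in `pool`, so `dag_nodes` is much smaller than `tree_nodes`; how large the saving is for real scenes depends on how much geometry repeats.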

* * *

Recent source codes

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

SYCL Container

Most viewed papers (last 30 days)