high performance computing on graphics processing units: hgpu.org

Posts

Jul, 15

A framework for cost based optimization of hybrid CPU/GPU query plans in database systems

Current database research has identified the computational power of GPUs as a way to increase the performance of database systems. As GPU algorithms are not necessarily faster than their CPU counterparts, it is important to use the GPU only if it is beneficial for query processing. In a general database context, only few research […]

CUDA

Jul, 15

KernelGen – the design and implementation of a next generation compiler platform for accelerating numerical models on GPUs

GPUs are becoming pervasive in scientific computing. Originally serving as peripheral accelerators, they are now gradually turning into central computing nodes. However, most current directive-based approaches for parallelizing sequential legacy code, such as OpenACC and HMPP, simply off-load "hot" CPU code onto GPUs, entailing many limitations such as unsupported external calls and coarse-grained […]

CUDA

Jul, 15

Parallelization of SAT Algorithms on GPUs

The Boolean Satisfiability Problem is one of the most important problems in computer science, with applications spanning many areas of research. Despite this importance and the extensive study and improvements that have been made, no efficient solution to the problem has been found to date. In recent years, nVidia introduced CUDA, a platform […]

CUDA

Jul, 15

CUDA-C implementation of the ADER-DG method for linear hyperbolic PDEs

We implement the ADER-DG numerical method in the CUDA-C language to run the code on a Graphics Processing Unit (GPU). We focus on solving linear hyperbolic partial differential equations, where the method can be expressed as a combination of precomputed matrix multiplications, making it a good candidate for GPU hardware. Moreover, the […]

CUDA

Jul, 15

GPU Based Implementation of Recursive Digital Filtering Algorithms

Recursive filtering is widely used in many signal processing applications. Speeding up the computation of recursive filtering using many processing elements is difficult because of the dependency problem. In this paper, massively parallel computation of recursive filtering algorithms using GPGPUs (General Purpose Graphics Processing Units) is studied. The proposed method uses the multi-block parallel processing algorithm, […]

CUDA

Jul, 15

Exploiting Space and Time Coherence in Grid-based Sorting

In recent years, many approaches for real-time simulation of physical phenomena using particles have been proposed. Many of these use 3D grids for representing spatial distributions and employ a collision detection technique where particles must be sorted with respect to the cells they occupy. In this paper we propose several techniques that make it possible […]

OpenCL

Jul, 15

Near-LSPA Performance at MSA Complexity

The tradeoff between error-correcting performance and numerical complexity of LDPC decoding algorithms is a well-known problem. In this paper we depict the unseen error-floor performance of the Self-Corrected Min-Sum algorithm for long length DVB-S2 codes. We developed a massively parallel simulation using GPUs which allowed a comprehensive BER characterization either in the waterfall or in […]

CUDA

Jul, 14

Equilibrium and Non-Equilibrium Ising Models by Means of PCA

We propose a unified approach to reversible and irreversible PCA dynamics, and we show that in the case of 1D and 2D nearest neighbour Ising systems with periodic boundary conditions we are able to compute the stationary measure of the dynamics also when the latter is irreversible. We also show how, according to [DPSS12], the […]

CUDA

Jul, 14

Benchmarking Intel Xeon Phi to Guide Kernel Design

With a minimum of 50 cores, Intel’s Xeon Phi is a true many-core architecture. Featuring fairly powerful cores, two levels of caches, and a very fast interconnection, the Xeon Phi is able to achieve a theoretical peak of 1000 GFLOPs and over 240 GB/s. These numbers, as well as its flexibility – it can be used […]

Jul, 13

The CUDA Handbook: A Comprehensive Guide to GPU Programming

The CUDA Handbook begins where CUDA by Example (Addison-Wesley, 2011) leaves off, discussing CUDA hardware and software in greater detail and covering both CUDA 5.0 and Kepler. Every CUDA developer, from the casual to the most sophisticated, will find something here of interest and immediate usefulness. Newer CUDA developers will see how the hardware processes […]

CUDA

Jul, 13

Identifying the Key Features of Intel Xeon Phi: A Comparative Approach

With the increasing diversity of many-core processors, it becomes more and more difficult to guarantee performance portability with a unified programming model. The main reason lies in the architecture disparities, e.g., CPUs and GPUs have different architectural features from each other, which leads to the differences in performance optimization techniques. Thus, it is of great […]

Jul, 13

Optimized MFCC Feature Extraction on GPU

In this paper, we update our previous research on Mel-Frequency Cepstral Coefficient (MFCC) feature extraction [1] and describe the optimizations required for improving throughput on Graphics Processing Units (GPUs). We not only demonstrate that the feature extraction process is suitable for GPUs and a substantial reduction in computation time can be obtained by performing […]

CUDA

Recent source codes

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

SYCL Container

CASS: Cuda-Amd aSSembly

Cluser of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?
