
Posts

Jul, 15

KernelGen – the design and implementation of a next generation compiler platform for accelerating numerical models on GPUs

GPUs are becoming pervasive in scientific computing. Originally serving as peripheral accelerators, they are now gradually turning into central computing nodes. However, most current directive-based approaches for parallelizing sequential legacy code, such as OpenACC and HMPP, simply off-load "hot" CPU code onto GPUs, which entails many limitations, such as unsupported external calls and coarse-grained […]
Jul, 15

Parallelization of SAT Algorithms on GPUs

The Boolean Satisfiability Problem is one of the most important problems in computer science, with applications spanning many areas of research. Despite this importance and the extensive study and improvements that have been made, no efficient solution to the problem has been found to date. In recent years, nVidia introduced CUDA, a platform […]
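
To make the problem statement concrete: a formula in conjunctive normal form (CNF) is satisfiable if some truth assignment makes every clause true. The Python sketch below is a naive exhaustive search, not the algorithms studied in the paper (practical solvers use DPLL/CDCL-style search). It assumes DIMACS-style clauses, where the integer +k denotes variable k and -k its negation, and it illustrates why SAT invites parallel hardware: each candidate assignment can be tested independently of all the others.

```python
from itertools import product

def satisfies(clauses, assignment):
    """True if every clause contains at least one true literal.
    Literals are DIMACS-style ints: +k is variable k, -k is NOT variable k;
    assignment[k - 1] holds the truth value of variable k."""
    return all(
        any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
        for clause in clauses
    )

def brute_force_sat(clauses, num_vars):
    """Try all 2**num_vars assignments and return the first satisfying one
    (as a list of bools), or None if the formula is unsatisfiable.
    Each test is independent, so on a GPU one thread could own one candidate."""
    for bits in product([False, True], repeat=num_vars):
        if satisfies(clauses, bits):
            return list(bits)
    return None

formula = [[1, -2], [2, 3], [-1, -3]]
print(brute_force_sat(formula, 3))  # → [False, False, True]
```

The search space doubles with every variable, which is why even massive parallelism only helps alongside smarter search.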
Jul, 15

CUDA-C implementation of the ADER-DG method for linear hyperbolic PDEs

We implement the ADER-DG numerical method in CUDA-C so that the code runs on a Graphics Processing Unit (GPU). We focus on linear hyperbolic partial differential equations, for which the method can be expressed as a combination of precomputed matrix multiplications, making it a good candidate for GPU hardware. Moreover, the […]
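
The point that the scheme reduces to precomputed matrix products can be sketched as follows, in Python for readability rather than CUDA-C. For illustration only, it assumes a single term R_e = K Q_e Aᵀ per element, where K (a reference-element matrix shared by all elements), A (the constant coefficient matrix of the linear system) and Q_e (the element's coefficients) are hypothetical names; an actual ADER-DG update combines several such products with time-integration and numerical-flux terms.

```python
def matmul(X, Y):
    """Dense product of nested lists X (m x k) and Y (k x n)."""
    return [[sum(X[i][p] * Y[p][j] for p in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def element_updates(K, A, Q_elements):
    """Hypothetical per-element term R_e = K @ Q_e @ A^T.
    K: n_dof x n_dof reference matrix, precomputed once and shared by all elements.
    A: n_var x n_var constant coefficient matrix of the linear PDE system.
    Q_e: n_dof x n_var coefficients of element e.
    Elements are independent, so on a GPU this maps to a batched GEMM."""
    At = transpose(A)
    return [matmul(matmul(K, Q), At) for Q in Q_elements]

# Two degrees of freedom, one scalar variable advected at speed 2.
print(element_updates([[0, 1], [0, 0]], [[2]], [[[1], [3]]]))  # → [[[6], [0]]]
```

Because K and A are the same for every element, only the small Q_e blocks differ, which is what makes the workload regular enough for GPU batched matrix routines.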
Jul, 15

GPU Based Implementation of Recursive Digital Filtering Algorithms

Recursive filtering is widely used in many signal processing applications. Speeding up recursive filtering with many processing elements is difficult because each output depends on previously computed outputs. In this paper, massively parallel computation of recursive filtering algorithms using GPGPUs (General Purpose Graphics Processing Units) is studied. The proposed method uses the multi-block parallel processing algorithm, […]
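
The dependency problem and one common way around it can be shown on a first-order filter y[n] = a·y[n-1] + x[n]. The Python sketch below is a generic block decomposition, assumed for illustration and not necessarily the paper's multi-block algorithm: blocks are filtered independently from a zero initial state, a short sequential pass carries each block's true end value forward, and every sample is then corrected by the carried-in value times a power of a.

```python
def iir_sequential(x, a):
    """Reference first-order recursive filter y[n] = a*y[n-1] + x[n], y[-1] = 0."""
    y, prev = [], 0.0
    for v in x:
        prev = a * prev + v
        y.append(prev)
    return y

def iir_blocked(x, a, block):
    """Block-parallel evaluation of the same filter.
    Pass 1 (parallel over blocks): filter each block from a zero initial state.
    Pass 2 (sequential, one value per block): propagate the true end value.
    Pass 3 (parallel over samples): add a**(j+1) times the carried-in value."""
    blocks = [x[i:i + block] for i in range(0, len(x), block)]
    local = [iir_sequential(b, a) for b in blocks]            # pass 1
    carries, carry = [], 0.0
    for b in local:                                            # pass 2
        carries.append(carry)
        carry = b[-1] + a ** len(b) * carry
    y = []
    for b, c in zip(local, carries):                           # pass 3
        y.extend(v + a ** (j + 1) * c for j, v in enumerate(b))
    return y

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
print(iir_blocked(x, 0.5, 3))
print(iir_sequential(x, 0.5))  # same values
```

Only pass 2 is sequential, and its length is the number of blocks rather than the number of samples.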
Jul, 15

Exploiting Space and Time Coherence in Grid-based Sorting

In recent years, many approaches for real-time simulation of physical phenomena using particles have been proposed. Many of these use 3D grids for representing spatial distributions and employ a collision detection technique where particles must be sorted with respect to the cells they occupy. In this paper we propose several techniques that make it possible […]
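
The baseline that such techniques start from, sorting particles by the grid cell they occupy, is commonly done with a counting sort. The Python sketch below is that generic baseline under assumed conventions (a uniform 3D grid with cubic cells of size cell_size, x-fastest linear cell indices); it does not attempt the coherence-exploiting techniques the paper proposes.

```python
def cell_index(pos, cell_size, dims):
    """Linear index (x fastest) of the grid cell containing pos = (x, y, z),
    clamped to the grid bounds."""
    ix, iy, iz = (min(max(int(p // cell_size), 0), d - 1)
                  for p, d in zip(pos, dims))
    return (iz * dims[1] + iy) * dims[0] + ix

def sort_by_cell(positions, cell_size, dims):
    """Counting sort of particle ids by cell.
    Returns (order, cell_start): order lists particle ids grouped by cell, and
    the particles of cell c are order[cell_start[c]:cell_start[c + 1]]."""
    n_cells = dims[0] * dims[1] * dims[2]
    cells = [cell_index(p, cell_size, dims) for p in positions]
    counts = [0] * n_cells
    for c in cells:                    # histogram (atomic increments on a GPU)
        counts[c] += 1
    cell_start = [0] * (n_cells + 1)
    for c in range(n_cells):           # exclusive prefix sum (parallel scan on a GPU)
        cell_start[c + 1] = cell_start[c] + counts[c]
    fill = cell_start[:-1]             # slicing makes a copy: next free slot per cell
    order = [0] * len(positions)
    for pid, c in enumerate(cells):    # scatter each particle into its cell's range
        order[fill[c]] = pid
        fill[c] += 1
    return order, cell_start

pts = [(1.5, 0.0, 0.0), (0.2, 0.0, 0.0), (1.1, 0.0, 0.0)]
print(sort_by_cell(pts, 1.0, (2, 1, 1)))  # → ([1, 0, 2], [0, 1, 3])
```

With cell_start available, collision detection only has to scan a particle's own cell and its neighbours.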
Jul, 15

Near-LSPA Performance at MSA Complexity

The tradeoff between error-correcting performance and numerical complexity of LDPC decoding algorithms is a well-known problem. In this paper we present the previously unseen error-floor performance of the Self-Corrected Min-Sum algorithm for long DVB-S2 codes. We developed a massively parallel simulation on GPUs, which allowed a comprehensive BER characterization either in the waterfall or in […]
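
For readers unfamiliar with the decoder, the two ingredients named in the title can be sketched in Python. The check-node rule below is the standard min-sum approximation: the message back along an edge is the product of the signs of the other incoming LLRs times their smallest magnitude. The self-correction shown, erasing a variable-to-check message whose sign flipped since the previous iteration, follows the commonly described Self-Corrected Min-Sum rule and is an assumption here, not a detail from this abstract.

```python
def check_node_min_sum(incoming):
    """Min-sum check-node update for one check node.
    incoming: variable-to-check LLRs on the node's edges.
    Returns, for each edge i, sign(product of the others) * min |others|."""
    out = []
    for i in range(len(incoming)):
        others = incoming[:i] + incoming[i + 1:]
        sign = 1
        for v in others:
            if v < 0:
                sign = -sign
        out.append(sign * min(abs(v) for v in others))
    return out

def self_correct(new_msg, prev_msg):
    """Assumed self-correction rule: erase (set to 0) a variable-to-check
    message whose sign differs from the previous iteration's nonzero message."""
    if prev_msg != 0 and (new_msg > 0) != (prev_msg > 0):
        return 0.0
    return new_msg

print(check_node_min_sum([2.0, -1.0, 3.0]))  # → [-1.0, 2.0, -1.0]
print(self_correct(-0.5, 1.2))               # → 0.0
```

This version is quadratic in the check-node degree for clarity; efficient decoders track only the two smallest magnitudes and the overall sign.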
Jul, 14

Equilibrium and Non-Equilibrium Ising Models by Means of PCA

We propose a unified approach to reversible and irreversible PCA dynamics, and we show that in the case of 1D and 2D nearest-neighbour Ising systems with periodic boundary conditions we are able to compute the stationary measure of the dynamics even when the latter is irreversible. We also show how, according to [DPSS12], the […]
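
What distinguishes a probabilistic cellular automaton (PCA) from ordinary single-spin dynamics is that every site is updated at once from the previous configuration, which also makes it a natural fit for GPUs. The Python sketch below uses a generic synchronous heat-bath rule on a periodic 1D chain, chosen for illustration only; it is not the specific reversible or irreversible dynamics analysed in the paper, and beta (inverse temperature) and J (coupling) are assumed parameter names.

```python
import math
import random

def pca_step(spins, beta, J, rng):
    """One synchronous update of a periodic 1D Ising chain (spins are +1/-1).
    Every site is resampled from the *old* configuration:
    P(s_i = +1) = 1 / (1 + exp(-2 * beta * h_i)), h_i = J * (s_{i-1} + s_{i+1})."""
    n = len(spins)
    new = []
    for i in range(n):
        h = J * (spins[i - 1] + spins[(i + 1) % n])   # periodic neighbours
        p_up = 1.0 / (1.0 + math.exp(-2.0 * beta * h))
        new.append(1 if rng.random() < p_up else -1)
    return new

rng = random.Random(0)
state = [1] * 16
for _ in range(100):
    state = pca_step(state, beta=0.5, J=1.0, rng=rng)
print(sum(state) / len(state))  # magnetisation per spin after 100 parallel steps
```

Because each new spin reads only the old configuration, the loop over i has no internal dependencies and could run as one GPU thread per site.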
Jul, 14

Benchmarking Intel Xeon Phi to Guide Kernel Design

With a minimum of 50 cores, Intel’s Xeon Phi is a true many-core architecture. Featuring fairly powerful cores, two levels of cache, and a very fast interconnect, the Xeon Phi can achieve a theoretical peak of 1000 GFLOPS and over 240 GB/s. These numbers, as well as its flexibility – it can be used […]
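
As a rough check of the quoted peak, theoretical double-precision throughput is the product of core count, clock rate, and floating-point operations per core per cycle. The parameters below are assumptions, not taken from the abstract: a 60-core part clocked at about 1.05 GHz whose 512-bit vector unit completes one fused multiply-add on 8 doubles per cycle (2 × 8 = 16 FLOPs per cycle).

```latex
P_{\text{peak}} = N_{\text{cores}} \times f_{\text{clock}} \times \frac{\text{FLOPs}}{\text{cycle}}
               = 60 \times 1.05\,\text{GHz} \times 16
               \approx 1008\ \text{GFLOPS}
```

Reaching this figure requires every core to issue full-width FMAs each cycle, which is why kernel design on such hardware centres on vectorization.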
Jul, 13

The CUDA Handbook: A Comprehensive Guide to GPU Programming

The CUDA Handbook begins where CUDA by Example (Addison-Wesley, 2011) leaves off, discussing CUDA hardware and software in greater detail and covering both CUDA 5.0 and Kepler. Every CUDA developer, from the casual to the most sophisticated, will find something here of interest and immediate usefulness. Newer CUDA developers will see how the hardware processes […]
Jul, 13

Identifying the Key Features of Intel Xeon Phi: A Comparative Approach

With the increasing diversity of many-core processors, it becomes more and more difficult to guarantee performance portability with a unified programming model. The main reason lies in architectural disparities: e.g., CPUs and GPUs have different architectural features, which call for different performance optimization techniques. Thus, it is of great […]
Jul, 13

Optimized MFCC Feature Extraction on GPU

In this paper, we update our previous research on Mel-Frequency Cepstral Coefficient (MFCC) feature extraction [1] and describe the optimizations required for improving throughput on Graphics Processing Units (GPUs). We not only demonstrate that the feature extraction process is suitable for GPUs and that a substantial reduction in computation time can be obtained by performing […]
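
As background, MFCC extraction typically frames and windows the signal, takes the FFT power spectrum, applies a bank of triangular filters spaced evenly on the mel scale, takes the logarithm, and finishes with a DCT. The Python sketch below covers only the mel filterbank and DCT stages, using the HTK-style formula m = 2595 log10(1 + f/700); that convention and the function names are assumptions, since the abstract does not give the paper's filterbank parameters.

```python
import math

def hz_to_mel(f):
    """HTK-style mel scale: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_band_edges(n_filters, f_min, f_max):
    """n_filters + 2 frequencies in Hz, equally spaced on the mel scale.
    Filter k rises from edges[k], peaks at edges[k+1], falls to edges[k+2]."""
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_hi - m_lo) / (n_filters + 1)
    return [mel_to_hz(m_lo + i * step) for i in range(n_filters + 2)]

def triangular_weight(f, lo, centre, hi):
    """Weight of frequency f in the triangular filter (lo, centre, hi)."""
    if f <= lo or f >= hi:
        return 0.0
    if f <= centre:
        return (f - lo) / (centre - lo)
    return (hi - f) / (hi - centre)

def dct_ii(x, n_coeffs):
    """Unnormalised DCT-II of log filterbank energies -> cepstral coefficients."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * N)) for n in range(N))
            for k in range(n_coeffs)]

edges = mel_band_edges(10, 0.0, 8000.0)
print([round(e) for e in edges])  # filters get wider as frequency increases
```

Each frame is processed independently, and the filterbank and DCT are fixed matrices, so both stages batch naturally across frames on a GPU.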
Jul, 13

GPU Simulation of Radiation in Matter

Parallel programming on GPUs is introduced in the context of simulating collision energy loss and bremsstrahlung for charged particles propagating in matter. The employed Monte Carlo methods and the involved physics are presented, followed by an introduction to the concepts of GPU parallel programming for the Nvidia CUDA architecture. The simulations implemented in C++ and […]
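
A basic ingredient of Monte Carlo transport codes of this kind is sampling the distance a particle travels before its next discrete interaction, such as a bremsstrahlung event. The Python sketch below uses inverse-transform sampling of an exponential distribution and assumes, for simplicity, a constant mean free path λ; in a real simulation λ depends on the particle energy and the material, and this is not taken from the paper's code.

```python
import math
import random

def sample_free_path(mean_free_path, rng):
    """Distance to the next discrete interaction, sampled by inverse transform
    from an exponential distribution: s = -lambda * ln(1 - U), U ~ Uniform[0, 1).
    Using 1 - U avoids log(0), since rng.random() can return exactly 0.0."""
    return -mean_free_path * math.log(1.0 - rng.random())

rng = random.Random(42)
samples = [sample_free_path(2.0, rng) for _ in range(100_000)]
print(sum(samples) / len(samples))  # close to the mean free path, 2.0
```

Each particle history draws its own random numbers and never reads another's state, which is what makes one-thread-per-particle GPU implementations straightforward.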

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us:

contact@hpgu.org