high performance computing on graphics processing units: hgpu.org

Posts

Nov, 15

Accelerating the Gillespie Exact Stochastic Simulation Algorithm Using Hybrid Parallel Execution on Graphics Processing Units

The Gillespie Stochastic Simulation Algorithm (GSSA) and its variants are cornerstone techniques to simulate reaction kinetics in situations where the concentration of the reactant is too low to allow deterministic techniques such as differential equations. The inherent limitations of the GSSA include the time required for executing a single run and the need for multiple […]

Nov, 15

High Dimensional Spaces and Modelling in the task of Speaker Recognition

Automatic speaker recognition has made significant progress in the last two decades. Huge speech corpora containing thousands of speakers recorded on several channels are at hand, and methods utilizing as much information as possible have been developed. Nowadays, state-of-the-art methods are based on Gaussian mixture models used to estimate relevant statistics from feature vectors extracted […]

CUDA

Nov, 14

Load Balanced Parallel GPU Out-of-Core for Continuous LOD Model Visualization

Rendering massive 3D models has been recognized as a challenging task. Due to the limited size of GPU memory, a massive model containing hundreds of millions of primitives cannot fit into most modern GPUs. By applying parallel level-of-detail (LOD), as proposed in [1], only a portion of the primitives, rather than the whole model, is necessary […]

CUDA • OpenGL

Nov, 14

G-SNPM – A GPU-based SNP mapping tool

MOTIVATION AND OBJECTIVES: In genotyping analysis, researchers often need to merge genetic datasets coming from different genotyping platforms that use different sets of Single Nucleotide Polymorphisms (SNPs) to represent genetic polymorphisms. To do this, it is necessary to know the exact position of a SNP in a chromosome and update this information […]

CUDA

Nov, 14

Performance modeling of atomic additions on GPU scratchpad memory

GPU application implementations using scatter approaches will fall into write contention due to atomic updates of output elements, if these result from more than one input element. Colliding threads will be serialized, seriously harming performance. Dealing with these issues requires a proper understanding of the behavior of the scratchpad or shared memory under conflicting accesses […]

CUDA
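The contention described in the abstract above can be illustrated with a toy model. This is only a minimal sketch, not the performance model from the paper: it assumes threads issue atomic updates in groups of 32, and that each group needs as many serialized steps as the number of threads hitting its most contended output element. The function name `serialized_steps` and the `WARP_SIZE` constant are hypothetical, and real hardware behavior (bank layout, lock implementation) is ignored.

```python
from collections import Counter

WARP_SIZE = 32  # assumption: threads issue atomics in groups of 32


def serialized_steps(bin_indices, warp_size=WARP_SIZE):
    """Return the modelled number of serialized atomic steps for a scatter.

    Each group of `warp_size` threads is charged as many steps as the
    number of threads updating its most contended output element, since
    colliding atomic updates to one address execute one at a time.
    """
    steps = 0
    for start in range(0, len(bin_indices), warp_size):
        warp = bin_indices[start:start + warp_size]
        steps += max(Counter(warp).values())
    return steps


conflict_free = list(range(32))  # every thread updates its own element
fully_colliding = [0] * 32       # every thread updates element 0

print(serialized_steps(conflict_free))    # 1
print(serialized_steps(fully_colliding))  # 32
```

Under these assumptions, the fully colliding case costs 32 times as many steps as the conflict-free case, which is the serialization effect the abstract refers to.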

Nov, 14

A simple method to accelerate fringe analysis algorithms based on graphics processing unit and MATLAB

With its fast development over the past few years, multicore has become a revolutionary technique for improving the performance of computing devices, ranging from supercomputers to cell phones. Among multicore processors, the graphics processing unit (GPU) stands out because of its huge computational performance and comparatively low cost. It can be used as a coprocessor […]

CUDA

Nov, 14

Correctly rounding elementary functions on GPU

The IEEE 754-2008 standard recommends the correct rounding of elementary functions. This requires solving the Table Maker’s Dilemma, which implies a huge amount of CPU computation time. In this paper we consider accelerating such computations, namely Lefèvre’s algorithm, on Graphics Processing Units (GPUs), which are massively parallel architectures with partial SIMD execution (Single […]

CUDA

Nov, 14

Efficient similarity search on multimedia databases

Manipulating and retrieving multimedia data has received increasing attention with the advent of cloud storage facilities. The ability to query by similarity over large data collections is mandatory to improve storage and user interfaces. However, these are expensive operations to perform on the CPU alone; thus, it is convenient to take into account High […]

CUDA

Nov, 14

An architecture for real time fluid simulation using multiple GPUs

Simulating natural phenomena such as water and smoke is an important topic for increasing scene realism in video games and other real-time simulations. However, this kind of simulation requires numerically solving the Navier-Stokes equations, which is a computationally expensive task. Additionally, to achieve a more immersive simulation, interaction between the flow […]

CUDA

Nov, 14

Real-Time Scheduling Using GPUs – Advanced and More Accurate Proof of Feasibility

This paper reports our evaluation of OpenCL as a platform for hard real-time scheduling. In particular, we evaluated which types of tasks run faster on a GPGPU than on a CPU. We investigated computational tasks, memory-intensive tasks (especially tasks using low-latency GDDR memory) and disk-intensive tasks. This study is the part […]

OpenCL

Nov, 14

Kernel Weaver: Automatically Fusing Database Primitives for Efficient GPU Computation

Data warehousing applications represent an emerging application arena that requires the processing of relational queries and computations over massive amounts of data. Modern general purpose GPUs are high bandwidth architectures that potentially offer substantial improvements in throughput for these applications. However, there are significant challenges that arise due to the overheads of data movement through […]

CUDA

Nov, 14

Real-Time Surface Extraction and Visualization of Medical Images using OpenCL and GPUs

Marching Cubes (MC) is an algorithm that extracts surfaces from volumetric scalar data. It is used extensively in visualization and analysis of medical data from modalities like CT and MR, usually after a 3D segmentation of the structures of interest has been performed. Implementations of MC on CPUs are slow, taking several seconds (even minutes) […]

OpenCL

