high performance computing on graphics processing units: hgpu.org

Posts

Jun, 1

GPS forward model computing study on CPU/GPU co-processing parallel system using CUDA

Profiles of refraction and bending angle, which are computed through the forward model for GPS RO (Global Positioning System radio occultation), are extremely important for assimilating GPS radio occultation data into NWP (Numerical Weather Prediction) forecast systems. The daily processing of GPS RO data in the assimilation system takes a considerable amount of time, thus there is an […]

CUDA

Jun, 1

GPUMLib: A new Library to combine Machine Learning algorithms with Graphics Processing Units

The Graphics Processing Unit (GPU) is a highly parallel, many-core device with enormous computational power, especially well-suited to address Machine Learning (ML) problems that can be expressed as data-parallel computations. As problems become increasingly demanding, parallel implementations of ML algorithms become critical for developing hybrid intelligent real-world applications. The relatively low cost of GPUs combined […]

CUDA

Jun, 1

A GPU/CUDA implementation of the collection-diffusion model to compute SER of large area and complex circuits

This work reports the CUDA implementation of the collection-diffusion model to compute the soft-error rate (SER) of large area and/or complex circuits on graphics processing units (GPU). We detail the time parallelization introduced in the algorithm to accelerate the SER calculation by one order of magnitude. Code performance is evaluated on an NVIDIA Tesla C1060 […]

CUDA

Jun, 1

CUDA Accelerated LTL Model Checking

Recent technological developments have made various many-core hardware platforms available. For example, a SIMD-like hardware architecture has become easily accessible to many users whose computers are equipped with modern NVIDIA GPU cards with CUDA technology. In this paper we redesign the maximal accepting predecessors algorithm for LTL model checking in terms of matrix-vector product in order […]

CUDA

Jun, 1

Implementations of Parallel Computation of Euclidean Distance Map in Multicore Processors and GPUs

Given a 2-D binary image of size n×n, the Euclidean Distance Map (EDM) is a 2-D array of the same size in which each element stores the Euclidean distance to the nearest black pixel. It is known that a sequential algorithm can compute the EDM in O(n^2) time, and thus this algorithm is optimal. Also, work-time […]
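
To make the EDM definition concrete, here is a minimal brute-force sketch in Python. It is not the O(n^2) algorithm the abstract refers to, nor the paper's multicore or GPU implementation; it simply checks every black pixel for every cell. The function name `euclidean_distance_map` and the convention that `1` marks a black pixel are assumptions made for illustration.

```python
import math


def euclidean_distance_map(image):
    """Brute-force EDM of a square binary image.

    Assumption (not from the source): image is a list of n rows of length n,
    where 1 marks a black pixel and 0 marks a white pixel.
    Cells stay at infinity if the image contains no black pixel.
    """
    n = len(image)
    # Collect the coordinates of all black pixels once.
    black = [(r, c) for r in range(n) for c in range(n) if image[r][c] == 1]

    edm = [[math.inf] * n for _ in range(n)]
    for r in range(n):
        for c in range(n):
            for br, bc in black:
                # Euclidean distance from cell (r, c) to this black pixel.
                d = math.hypot(r - br, c - bc)
                if d < edm[r][c]:
                    edm[r][c] = d
    return edm


if __name__ == "__main__":
    # A 3x3 image with a single black pixel in the top-left corner.
    sample = [
        [1, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
    ]
    for row in euclidean_distance_map(sample):
        print(["%.3f" % d for d in row])
```

Every output cell is computed independently of the others, which is the kind of data parallelism that multicore and GPU implementations can exploit; the brute-force inner loop over black pixels is what an optimal algorithm avoids.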

Jun, 1

The Parallel Processing Based on CUDA for Convolution Filter FDK Reconstruction of CT

Computed tomography (CT) technology has been used in many fields, but the slow speed of CT image reconstruction is unacceptable in some situations. Parallel processing on a graphics processing unit (GPU) is a good option for accelerating CT image reconstruction. But the general-purpose GPU programming model is difficult to use […]

CUDA

Jun, 1

Parallel particle filter algorithm in face tracking

This paper proposes a parallel particle filter algorithm for face tracking with the help of a GPU (Graphics Processing Unit). Due to illumination and occlusion problems, face tracking based on a single cue is usually not stable. Three different visual cues (color histogram, edge orientation histogram, and wavelet feature) are integrated under the framework of […]

Jun, 1

Robust Low Complexity Feature Tracking using CUDA

In this paper, we propose a real-time video processing implementation of a Robust Low Complexity Feature Tracking (RLCT) algorithm on GPU (Graphics Processing Unit) using the CUDA (Compute Unified Device Architecture) paradigm. The RLCT outperforms state-of-the-art implementations of pyramidal KLT (Kanade-Lucas-Tomasi) on GPU by removing the overhead of the image pyramid construction, by predicting the […]

CUDA

Jun, 1

Texture-Based Visualization of Unsteady 3D Flow by Real-Time Advection and Volumetric Illumination

This paper presents an interactive technique for the dense texture-based visualization of unsteady 3D flow, taking into account issues of computational efficiency and visual perception. High efficiency is achieved by a 3D graphics processing unit (GPU)-based texture advection mechanism that implements logical 3D grid structures by physical memory in the form of 2D textures. This […]

Jun, 1

Extremely fast simulator for decoding LDPC codes

Decoding low-density parity-check (LDPC) codes requires a lot of computation time, particularly when bit error rates as low as 10^-9 are needed. In this paper, we improve the simulation speed by making use of an inexpensive graphics processing unit (GPU). A dedicated program is written to utilize the hardware resources in the GPU to decode […]

May, 31

Recent trends in software and hardware for GPGPU computing: A comprehensive survey

With the growth of Graphics Processor (GPU) programmability and processing power, graphics hardware has become a compelling platform for computationally demanding tasks in a wide variety of application domains. This state-of-the-art paper gives the technical motivations that underlie GPU computing and describes the hardware and software developments that have led to the recent […]

May, 31

Impact of the Random Number generator quality on particle swarm optimization algorithm running on graphic processor units

Particle swarm optimization (PSO) is a bioinspired technique widely used to solve real optimization problems. In recent years, the use of Graphics Processing Units (GPU) has been proposed for some general-purpose computing applications, and several PSO implementations on GPU have already been proposed. The major benefit of implementing PSO on a GPU is the possibility […]

* * *

Recent source codes

SYCL Container

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

PELSI: Power-Efficient Layer-Switched Inference

Ouroboros: Virtualized Queues for dynamic memory management

MSCCL++: A GPU-driven communication stack for scalable AI applications

Benchmark compute shader of Unity against InteropUnityCUDA
