high performance computing on graphics processing units: hgpu.org

Posts

Apr, 14

OpenCL/OpenGL approach for studying active Brownian motion

This work presents a methodology for studying active Brownian dynamics on ratchet potentials using interoperating OpenCL and OpenGL frameworks. Programming details and optimization issues are discussed, followed by a comparison of performance on different devices. Visualization time using an OpenGL buffer shared with OpenCL has been tested against another technique which, while using OpenGL, […]

OpenCL • OpenGL

Apr, 13

23rd International Conference on Parallel Computational Fluid Dynamics 2011, ParCFD 2011

ParCFD is the annual international conference devoted to the discussion of recent developments and applications of parallel computing in the field of CFD and related disciplines. Since the establishment of the ParCFD conference series, parallel computers have become the dominant form of large-scale computing. The emergence of multi-core and heterogeneous architectures in parallel computers has created new […]

Apr, 13

Hardware-Efficient Belief Propagation

Loopy belief propagation (BP) is an effective solution for assigning labels to the nodes of a graphical model such as the Markov random field (MRF), but it requires high memory, bandwidth, and computational costs. Furthermore, the iterative, pixel-wise, and sequential operations of BP make it difficult to parallelize the computation. In this paper, we propose […]

CUDA

Apr, 13

Speeding up K-Means Algorithm by GPUs

Cluster analysis plays a critical role in a wide variety of applications, but it now faces a computational challenge due to continuously increasing data volumes. Parallel computing is one of the most promising solutions for overcoming this challenge. In this paper, we target parallelizing k-Means, which is one of the most […]

Apr, 13

Accelerate Cache Simulation with Generic GPU

Trace-driven cache simulation is the most widely used method to evaluate different cache structures. Several techniques have been proposed to reduce the simulation time of sequential trace-driven simulation. An obvious way to achieve fast parallel simulation is to simulate the individual independent sets of a cache concurrently on different compute resources. We propose improvements to […]

CUDA
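The set-level independence mentioned in the abstract above can be sketched in a few lines. This is only an illustration, not the paper's proposed improvements (the abstract is cut off before describing them). It assumes a set-associative cache with LRU replacement; the function names `simulate_set` and `simulate_by_sets` and all parameters are hypothetical.

```python
from collections import OrderedDict, defaultdict


def simulate_set(tags, ways):
    """Count misses for one cache set, assuming LRU replacement."""
    lru = OrderedDict()  # least recently used tag sits at the front
    misses = 0
    for tag in tags:
        if tag in lru:
            lru.move_to_end(tag)          # hit: mark as most recently used
        else:
            misses += 1
            if len(lru) >= ways:
                lru.popitem(last=False)   # evict the least recently used tag
            lru[tag] = True
    return misses


def simulate_by_sets(trace, line_size, num_sets, ways):
    """Split an address trace by cache set, then simulate each set alone."""
    per_set = defaultdict(list)
    for addr in trace:
        block = addr // line_size
        per_set[block % num_sets].append(block // num_sets)
    # Accesses to different sets never interact, so each call below is
    # independent and could be handed to a separate compute resource.
    return sum(simulate_set(tags, ways) for tags in per_set.values())
```

Calling `simulate_by_sets(trace, line_size=64, num_sets=2, ways=1)` on a list of integer addresses returns the total miss count. The sum is evaluated sequentially here; the point is only that the per-set calls share no state.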

Apr, 13

NQueens on CUDA: Optimization Issues

Today's commercial off-the-shelf computer systems are multicore computing systems that combine a CPU, a graphics processor (GPU) and custom devices. In comparison with CPU cores, graphics cards are capable of executing hundreds to thousands of compute units in parallel. To benefit from these GPU computing resources, applications have to be parallelized and adapted to the […]

CUDA

Apr, 13

Memory Saving Discrete Fourier Transform on GPUs

This paper shows an alternative method to compute the two-dimensional Discrete Fourier Transform. While current GPU Fourier transform libraries need a large buffer for storing intermediate results, our method can compute the same output with far less memory. It works by exploiting the separability of the Fourier transform. Using this scheme, it is […]

OpenCL
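The separability that the abstract above relies on can be checked with a small sketch. This is not the authors' memory-saving scheme (the abstract is truncated before describing it); it only shows that a 2D DFT equals 1D DFTs over rows followed by 1D DFTs over columns. The function names are hypothetical, and the naive O(n^2) transform stands in for a real FFT.

```python
import cmath


def dft_1d(x):
    """Naive O(n^2) 1D DFT of a sequence of numbers."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def dft_2d_separable(a):
    """2D DFT computed as 1D DFTs over rows, then over columns."""
    rows = [dft_1d(row) for row in a]            # transform each row
    cols = [dft_1d(col) for col in zip(*rows)]   # transform each column
    return [list(r) for r in zip(*cols)]         # transpose back to row-major


def dft_2d_direct(a):
    """2D DFT straight from the double-sum definition, for comparison."""
    m, n = len(a), len(a[0])
    return [
        [
            sum(
                a[r][c] * cmath.exp(-2j * cmath.pi * (u * r / m + v * c / n))
                for r in range(m)
                for c in range(n)
            )
            for v in range(n)
        ]
        for u in range(m)
    ]
```

For a small input such as `[[1, 2, 3], [4, 5, 6]]`, both functions agree up to floating-point rounding, which is the property a row-then-column implementation depends on.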

Apr, 13

A Multi-GPU Spectrometer System for Real-Time Wide Bandwidth Radio Signal Analysis

This paper describes the implementation of a large bandwidth multi-GPU signal processing system for radio astronomy observation. This system performs very large Fast Fourier Transform (FFT) and spectrum analysis to achieve real-time analysis of a large bandwidth spectrum. This is accomplished by implementing a four-step FFT algorithm in Compute Unified Device Architecture (CUDA). The key […]

CUDA

Apr, 13

GPU-Accelerated KLT Tracking with Monte-Carlo-Based Feature Reselection

Many computer vision methods rely on frame registration information obtained with algorithms such as the Kanade-Lucas-Tomasi (KLT) feature tracker, which is known for its excellent performance in that area. Various research groups proposed methods to extend its performance, both in terms of execution time and stability. Recent research has shown that current graphics processing units […]

Apr, 13

Efficient Collision Detection and Physics-Based Deformation for Haptic Simulation with Local Spherical Hash

While real-time computer graphics rely on a frame rate of 30 iterations per second to fool the eye and render smooth motion transitions, computer haptics deals with the sense of touch, which requires a higher rate of around 1 kHz to avoid discontinuities. The use of haptics in interactive applications such as surgical simulations or games, […]

CUDA

Apr, 13

Community Structure Discovery algorithm on GPU with CUDA

Automatic search and community discovery in large, complex networks has important practical applications. It is difficult to trade off computing speed against clustering exactness. Improving clustering exactness requires decreasing the time complexity. In this paper, a novel algorithm for single instruction, multiple data (SIMD) architecture processors, based on the Newman algorithm, is proposed. The […]

CUDA

Apr, 13

Efficient parallelized particle filter design on CUDA

Particle filtering is widely used in numerous nonlinear applications which require reconfigurability, fast prototyping, and online parallel signal processing. The emerging computing platform, CUDA, may be regarded as the most appealing platform for such implementation. However, there is not yet literature exploring how to utilize CUDA for particle filters. This paper aims to provide two […]

CUDA

* * *

Recent source codes

SYCL Container

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

PELSI: Power-Efficient Layer-Switched Inference

Ouroboros: Virtualized Queues for dynamic memory management

MSCCL++: A GPU-driven communication stack for scalable AI applications

Benchmark compute shader of Unity against InteropUnityCUDA
