high performance computing on graphics processing units: hgpu.org

Posts

Mar, 11

Multi Agent Navigation on the GPU

We present a unique and elegant graphics hardware realization of multi agent simulation. Specifically, we adapted Velocity Obstacles that suits well parallel computation on single instruction, multiple thread, SIMT, type architecture. We explore hash based nearest neighbors search to considerably optimize the algorithm when mapped on to the GPU. Moreover, to alleviate inefficiencies of agent […]

CUDA

Mar, 11

Real-time human detection using histograms of oriented gradients on a GPU

Human detection has always been an important part of computer vision but many implementations lack the real-time performance that real world applications require. This paper presents a real-time implementation of human detection in video using the state-of-the-art histograms of oriented gradients method. Each image in the video sequence is tested at multiple scales using a […]

CUDA

Mar, 10

Tabu Search with two approaches to parallel flowshop evaluation on CUDA platform

The introduction of NVidia’s powerful Tesla GPU hardware and Compute Unified Device Architecture (CUDA) platform enable many-core parallel programming. As a result, existing algorithms implemented on a GPU can run many times faster than on modern CPUs. Relatively little research has been done so far on GPU implementations of discrete optimisation algorithms. In this paper, […]

CUDA

Mar, 10

Skeletal rigid skinning with blending patches on the GPU

In this paper, we present a novel skeletal rigid skinning approach. First, we introduce a skeleton extraction technique that produces refined skeletons appropriate for animation from decomposed solid models. Then, to avoid the artifacts generated in previous rigid skinning approaches and the associated high training costs, we develop an efficient and robust rigid skinning technique […]

OpenGL

Mar, 10

Implementing Ultrasound Beamforming on the GPU using CUDA

Todays ultrasound equipment consists mainly of a PC that is attached to several large cards that process the received signals in hardware. These cards take up a lot of space and are costly to develop. As processing power in PC’s increase it is possible move some of this signal processing from specialized hardware to standard […]

CUDA

Mar, 10

Real-Time Crowd Rendering and Interactions on GPU

The simulation of large crowds of characters is important in many fields of virtual reality, as they can increase the credibility of the virtual environments. Rendering large crowd of characters requires a great mount of computational power. To increase the efficiency for this render, we propose a GPU-based crowd rendering method. We present a novel […]

Mar, 10

A GPU-enhanced cluster for accelerated FMS

The forces modeling and simulation (FMS) community has often been hampered by constraints in computing: not enough resolution, not enough entities, not enough behavioral variants. High performance computing can ameliorate those constraints. The use of Linux clusters is one path to higher performance; the use of graphics processing units (GPU) as accelerators is another. Merging […]

Mar, 10

Realtime phase-based optical flow on the GPU

Phase-based optical flow algorithms are characterized by high precision and robustness, but also by high computational requirements. Using the CUDA platform, we have implemented a phase-based algorithm that maps exceptionally well on the GPUs architecture. This optical flow algorithm revolves around a reliability measure that evaluates the consistency of phase information over time. By exploiting […]

CUDA

Mar, 10

An Efficient SAR Processor Based on GPU via CUDA

A novel and efficient Synthetic Aperture Radar (SAR) processor is introduced in this paper. This new processor is implemented on the Graphics Processing Unit (GPU). GPU is traditionally used for graphics rendering, but in recent years, it has rapidly evolved as a highly-parallel processor with tremendous computation capability and ultra-high memory bandwidth. The algorithm of […]

CUDA

Mar, 10

Using a GPU to accelerate die and mold fabrication

The authors present a GPU-based method for generating and verifying cutter paths for numerically controlled milling. A CAM system based on this technology is now employed in production at Mazda Motor Corporation for manufacturing stamping dies. This system can compute cutter paths more than 20 times faster than previous methods

Mar, 10

A Predictive Shutdown Technique for GPU Shader Processors

As technology continues to shrink, reducing leakage is critical to achieve energy efficiency. Previous works on low-power GPU (graphics processing unit) focus on techniques for dynamic power reduction, such as DVFS (Dynamic Voltage/Frequency Scaling) and clock gating. In this paper, we explore the potential of adopting architecture-level power gating techniques for leakage reduction on GPU. […]

Mar, 10

Performance analysis and optimization strategies for a D3Q19 lattice Boltzmann kernel on nVIDIA GPUs using CUDA

This paper presents implementation strategies and optimization approaches for a D3Q19 lattice Boltzmann flow solver on nVIDIA graphics processing units (GPUs). Using the STREAM benchmarks we demonstrate the GPU parallelization approach and obtain an upper limit for the flow solver performance. We discuss the GPU-specific implementation of the solver with a focus on memory alignment […]

CUDA

* * *

high performance computing on graphics processing units: hgpu.org

Posts

Multi Agent Navigation on the GPU

Real-time human detection using histograms of oriented gradients on a GPU

Tabu Search with two approaches to parallel flowshop evaluation on CUDA platform

Skeletal rigid skinning with blending patches on the GPU

Implementing Ultrasound Beamforming on the GPU using CUDA

Real-Time Crowd Rendering and Interactions on GPU

A GPU-enhanced cluster for accelerated FMS

Realtime phase-based optical flow on the GPU

An Efficient SAR Processor Based on GPU via CUDA

Using a GPU to accelerate die and mold fabrication

A Predictive Shutdown Technique for GPU Shader Processors

Performance analysis and optimization strategies for a D3Q19 lattice Boltzmann kernel on nVIDIA GPUs using CUDA

Recent source codes

AutoDock-GPU: AutoDock for GPUs and other accelerators

NCCLX: collective communication framework

Tutoring LLM into a Better CUDA Optimizer

Kernel Library for LLM Serving

Adaptivity in AdaptiveCpp: Optimizing Performance by Leveraging Runtime Information During JIT-Compilation

Neptune: Advanced ML Operator Fusion for Locality and Parallelism on GPUs

Genten: Software for Generalized Tensor Decompositions by Sandia National Laboratories

Interleaved Learning and Exploration: A Self-Adaptive Fuzz Testing Framework for MLIR

Pinocchio: PINpointing Orbit Crossing Collapsed Hierarchical Objects

KernelCoder: trained on a curated dataset of reasoning traces and CUDA kernel pairs

Most viewed papers (last 30 days)