high performance computing on graphics processing units: hgpu.org

Posts

Apr, 12

Implementation and optimization of image processing algorithms on handheld GPU

The advent of GPUs with programmable shaders on handheld devices has motivated embedded application developers to use the GPU to offload computationally intensive tasks and relieve the burden on the embedded CPU. In this work, we propose an image processing toolkit for handheld GPUs with programmable shaders, built on the OpenGL ES 2.0 API. By using the image processing […]

OpenGL
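Image processing with OpenGL ES 2.0 shaders is typically done by rendering a full-screen quad so that a fragment shader runs once per output pixel, with each invocation independent of the others. The Python sketch below only emulates that execution model on the CPU to make the per-pixel structure visible. `run_fragment_pass`, `sample` and the 3x3 box blur used as the example shader are hypothetical illustrations, not part of the toolkit described in the paper.

```python
from typing import Callable, List

# Grayscale image stored row-major: img[y][x], values in [0, 1].
Image = List[List[float]]
# A "shader" receives the source image and the output pixel coordinates.
Shader = Callable[[Image, int, int], float]


def sample(img: Image, x: int, y: int) -> float:
    """Texture fetch with clamp-to-edge addressing (cf. GL_CLAMP_TO_EDGE)."""
    h, w = len(img), len(img[0])
    return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]


def run_fragment_pass(src: Image, shader: Shader) -> Image:
    """Emulate one full-screen render pass: call `shader` once per output pixel.

    On the GPU these calls would be independent fragment-shader invocations
    running in parallel; a plain loop is used here only to show the model.
    """
    h, w = len(src), len(src[0])
    return [[shader(src, x, y) for x in range(w)] for y in range(h)]


def box_blur_3x3(img: Image, x: int, y: int) -> float:
    """Hypothetical example shader: mean of the 3x3 neighbourhood of (x, y)."""
    total = sum(sample(img, x + dx, y + dy)
                for dy in (-1, 0, 1) for dx in (-1, 0, 1))
    return total / 9.0
```

A call such as `blurred = run_fragment_pass(img, box_blur_3x3)` corresponds to one render pass; multi-stage filters would chain passes, feeding each output back in as the next input (on the device, usually by rendering into a texture).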

Apr, 12

Power analysis and optimizations for GPU architecture using a power simulator

As one of the most popular many-core architectures, GPUs have demonstrated their power in many non-graphics applications. Traditional general-purpose computing systems tend to integrate the GPU as a co-processor to accelerate parallel computing tasks. Meanwhile, GPUs also consume a great deal of power, accounting for a large proportion of total system power consumption. In this […]

Apr, 12

CUDA Memory Optimizations for Large Data-Structures in the Gravit Simulator

Modern GPUs open a completely new field for optimizing embarrassingly parallel algorithms. Implementing an algorithm on a GPU confronts the programmer with a new set of challenges for program optimization. The most notable include isolating the part of the algorithm that can be optimized to run on the GPU; tuning the program […]

CUDA

Apr, 12

Automated development of applications for graphical processing units using rewriting rules

Recently, there has been active development of parallel programming methods for implementing general-purpose algorithms on graphical processing units (GPUs). Using this specialized hardware can increase performance significantly, but it requires low-level programming and an understanding of the underlying hardware and software platform. Therefore, there is a need to automate the development process. This paper presents a technique […]

Apr, 12

Stream-Centric Stereo Matching and View Synthesis: A High-Speed Approach on GPUs

In this paper, we propose a real-time image-based rendering (IBR) system. It is specifically designed for photorealistic view synthesis at high speed on the graphics processing unit (GPU). We guide the design of the proposed IBR system with two high-level ideas. First, for cost-effective IBR, as long as the synthesized views look visually plausible, the estimated disparity and […]

Apr, 12

An Analytical Approach to the Design of Parallel Block Cipher Encryption/Decryption: A CPU/GPU Case Study

GPUs are at the forefront of a radical transformation that is taking place in software design. The ability to process multiple data streams simultaneously is delivering substantial benefits to a large collection of domains. Depending on the application, these benefits can be expanded by utilizing the not-insignificant power of traditional CPUs. Multi-core CPUs with a […]

Apr, 12

GPU Accelerated Path-Planning for Multi-agents in Virtual Environments

Many games are populated by synthetic humanoid actors that act as autonomous agents. Animating humanoids in real-time applications is still a challenge when the problem involves reaching a precise location in a virtual world (path-planning) and moving realistically according to their own personality, intentions and mood (motion planning). In this paper we present […]

CUDA

Apr, 12

The fast evaluation of hidden Markov models on GPU

Evaluating the probability of an observation sequence under a hidden Markov model is compute-intensive. Several fast algorithms exist, of which the forward-backward procedure is the most popular. The forward-backward procedure saves much computation, but its time complexity is still O(N^2 T), where N is the number of states and T is the length of the observation sequence, so the algorithm remains computationally expensive. […]
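The O(N^2 T) cost quoted above comes from the induction step of the forward procedure, which computes P(O | model) on its own. The Python sketch below implements that standard forward pass for a discrete HMM so the cost is visible: T - 1 induction steps, each filling N entries that each sum over N predecessors. The parameter names `pi`, `A` and `B` follow common textbook notation and are our assumptions; the paper's GPU implementation is not reproduced here.

```python
from typing import List, Sequence


def forward_probability(pi: Sequence[float],
                        A: Sequence[Sequence[float]],
                        B: Sequence[Sequence[float]],
                        obs: Sequence[int]) -> float:
    """Return P(obs | model) for a discrete HMM using the forward procedure.

    pi[i]   -- probability of starting in state i          (N states)
    A[i][j] -- probability of a transition from state i to j
    B[i][k] -- probability of emitting symbol k in state i
    obs     -- observation sequence of symbol indices       (length T)
    """
    n = len(pi)
    # Initialisation: alpha_1(i) = pi_i * b_i(o_1)
    alpha: List[float] = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Induction, T - 1 steps of O(N^2) work each:
    # alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1})
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    # Termination: P(O | model) = sum_i alpha_T(i)
    return sum(alpha)
```

For example, with `pi = [0.6, 0.4]`, `A = [[0.7, 0.3], [0.4, 0.6]]`, `B = [[0.5, 0.5], [0.1, 0.9]]` and `obs = [0, 1]`, the function returns approximately 0.2156. The N new alpha values in each step do not depend on one another, which is the kind of per-state parallelism a GPU version could exploit; for long sequences, a practical version would also rescale alpha or work in log space to avoid underflow.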

Apr, 12

Accelerating System-Level Design Tasks Using Commodity Graphics Hardware: A Case Study

Many system-level design tasks (e.g., timing analysis, hardware/software partitioning and design space exploration) involve computational kernels that are intractable (usually NP-hard). As a result, they have long running times even for mid-sized problems. In this paper we explore the possibility of using commodity graphics processing units (GPUs) to accelerate such tasks that commonly arise in […]

OpenGL

Apr, 12

Simulating Spiking Neural P systems without delays using GPUs

We present in this paper our work regarding simulating a type of P system known as a spiking neural P system (SNP system) using graphics processing units (GPUs). GPUs, because of their architectural optimization for parallel computations, are well-suited for highly parallelizable problems. Due to the advent of general purpose GPU computing in recent years, […]

CUDA

Apr, 11

Interactive Simulation and Visualization of Fluids with Surface Raycasting

We present a method to couple particle-based fluid simulation methods such as Smoothed Particle Hydrodynamics (SPH) and volume rendering in order to visualize the fluid. A volume is generated from the fluid’s implicit density field so volume raycasting can be performed to render the surface on the GPU. The volume generation algorithm is also implemented […]

OpenGL
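The abstract above describes sampling the fluid's implicit density field into a volume and then raycasting that volume to find the surface. The Python sketch below walks through both steps on the CPU under our own assumptions: density is defined with the poly6 kernel (a common SPH choice), the volume is a regular n x n x n grid filled by brute force, each ray is marched with a fixed step and trilinear interpolation, and the surface is taken to be the first sample whose density reaches a chosen iso-value. None of these function names or choices come from the paper.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Volume = List[List[List[float]]]  # vol[i][j][k] = density at (i, j, k) * cell


def poly6(r2: float, h: float) -> float:
    """Poly6 SPH kernel W(r, h), evaluated from the squared distance r2."""
    if r2 >= h * h:
        return 0.0
    return 315.0 / (64.0 * math.pi * h ** 9) * (h * h - r2) ** 3


def build_density_volume(particles: Sequence[Vec3], mass: float, h: float,
                         n: int, cell: float) -> Volume:
    """Sample rho(x) = sum_j m * W(|x - x_j|, h) on an n x n x n grid (n >= 2)."""
    vol = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                x, y, z = i * cell, j * cell, k * cell
                vol[i][j][k] = sum(
                    mass * poly6((x - px) ** 2 + (y - py) ** 2 + (z - pz) ** 2, h)
                    for px, py, pz in particles)
    return vol


def sample_trilinear(vol: Volume, cell: float, p: Sequence[float]) -> float:
    """Trilinearly interpolate the volume at world position p (0 outside the grid)."""
    n = len(vol)
    g = [c / cell for c in p]                      # continuous grid coordinates
    if any(c < 0.0 or c > n - 1 for c in g):
        return 0.0
    i0 = [min(int(c), n - 2) for c in g]           # lower corner of the cell
    f = [c - i for c, i in zip(g, i0)]             # fractional offsets in [0, 1]
    value = 0.0
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = ((f[0] if dx else 1.0 - f[0])
                     * (f[1] if dy else 1.0 - f[1])
                     * (f[2] if dz else 1.0 - f[2]))
                value += w * vol[i0[0] + dx][i0[1] + dy][i0[2] + dz]
    return value


def raycast_surface(vol: Volume, cell: float, origin: Vec3, direction: Vec3,
                    iso: float, step: float, t_max: float) -> Optional[float]:
    """March origin + t * direction; return the first t with density >= iso."""
    t = 0.0
    while t <= t_max:
        p = [o + t * d for o, d in zip(origin, direction)]
        if sample_trilinear(vol, cell, p) >= iso:
            return t
        t += step
    return None                                    # ray missed the surface
```

For instance, a single particle at (1, 1, 1) with h = 0.5, sampled on a 9^3 grid with cell = 0.25, and a ray from (0, 1, 1) along +x with the iso-value set to half the peak density, hits the surface at roughly t = 0.79. A GPU renderer would typically trace one such ray per screen pixel; refining each hit with a few bisection steps between the last two samples would reduce the stair-stepping that a fixed step size causes.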

Apr, 11

Real-time 3-D object recognition using scale invariant feature transform and stereo vision

Scale invariant feature transform (SIFT) and stereo vision are applied together to recognize objects in real time. This work reports the performance of a GPU (graphics processing unit) based real-time feature detector in capturing the features of 3D objects when the objects undergo rotational and translational motions in cluttered backgrounds. We have compared the performance […]

OpenGL
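The abstract above pairs SIFT features with stereo vision. One standard way stereo supplies the third dimension is triangulation in a rectified camera pair: a feature matched at columns x_l and x_r has disparity d = x_l - x_r, and its depth is Z = fB/d. The Python sketch below applies that relation to a single match. It assumes rectified pinhole cameras with focal length `f` in pixels, principal point (`cx`, `cy`) and baseline `baseline`, and that SIFT matching has already been done; the function name is hypothetical, and we do not know whether the paper computes depth this way.

```python
from typing import Tuple


def triangulate_rectified(xl: float, xr: float, y: float, f: float,
                          baseline: float, cx: float, cy: float
                          ) -> Tuple[float, float, float]:
    """Return the 3D point (X, Y, Z), in the left camera's frame, of one feature
    matched between the left and right images of a rectified stereo pair.

    xl, xr   -- column of the feature in the left / right image (pixels)
    y        -- row of the feature (identical in both images after rectification)
    f        -- focal length in pixels; cx, cy -- principal point in pixels
    baseline -- distance between the two camera centres (e.g. metres)
    """
    d = xl - xr                       # disparity in pixels
    if d <= 0:
        # A point in front of the rig must appear further right in the left image.
        raise ValueError("non-positive disparity: bad match or point at infinity")
    z = f * baseline / d              # depth: Z = f * B / d
    # Back-project the left-image pixel through the pinhole model.
    return ((xl - cx) * z / f, (y - cy) * z / f, z)
```

With f = 700, baseline = 0.12 m and a match at xl = 420, xr = 350, the disparity is 70 pixels and Z is about 1.2 m. Because Z = fB/d, a one-pixel matching error changes the depth of distant (small-disparity) features much more than that of nearby ones.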

