high performance computing on graphics processing units: hgpu.org

Posts

Jul, 4

A Fast GPU-Based Motion Estimation Algorithm for H.264/AVC

H.264/AVC is the most recent predictive video compression standard; it outperforms other existing video coding standards at the cost of higher computational complexity. In recent years, heterogeneous computing has emerged as a cost-efficient solution for high-performance computing. In the literature, several algorithms have been proposed to accelerate video compression, but so far there have not been […]

CUDA

Jul, 4

GPU Parallelization of an Unstructured Overset Grid Incompressible Navier-Stokes Solver for Moving Bodies

In pursuit of high-fidelity solutions to the fluid flow equations in a short span of time, Graphics Processing Units (GPUs), which were originally intended for gaming applications, are currently being used to accelerate Computational Fluid Dynamics codes. With a high peak throughput of about 1 TFLOPS on a PC, GPUs seem […]

CUDA

Jul, 3

The 19th IEEE International Symposium on High Performance Computer Architecture Collocated with PPoPP-2013 and CGO-2013, HPCA-2013

The International Symposium on High-Performance Computer Architecture provides a high-quality forum for scientists and engineers to present their latest research findings in this rapidly-changing field. Authors are invited to submit papers on all aspects of high-performance computer architecture. Topics of interest include, but are not limited to: * Processor, cache, and memory architectures * Parallel […]

Jul, 3

Automatic Optimization of In-Flight Memory Transactions for GPU Accelerators based on a Domain-Specific Language for Medical Imaging

Efficient memory bandwidth utilization on GPU accelerators is crucial for memory-bound applications. In medical imaging, the performance of many kernels is limited by the available memory bandwidth since only a few operations are performed per pixel. For such kernels only a fraction of the compute power provided by GPU accelerators can be exploited […]

CUDA, OpenCL

Jul, 3

Two Stage Data Mining Technique for Fast Monsoon Onset Prediction

The onset of the monsoon is eagerly awaited in the Indian sub-continent, as it has a deep impact on the economic and social domains, and hence has been monitored and studied in great depth. With the advent of satellite imagery, it’s now possible to monitor the different parameters that affect or are affected by the monsoon in […]

CUDA

Jul, 3

Parallel Processing using FPGAs and GPUs

This report covers the use of parallel architectures such as Graphics Processing Units (GPUs) for general-purpose computation. It also covers filter design using Field Programmable Gate Arrays, exploiting their inherently parallel nature. Least Mean Square filters, an adaptive filter algorithm, are implemented on a Xilinx Virtex 5 FPGA, and tested […]

CUDA

Jul, 3

Using OpenCL: Programming Massively Parallel Computers

In 2011 many computer users were exploring the opportunities and the benefits of the massive parallelism offered by heterogeneous computing. In 2000 the Khronos Group, a not-for-profit industry consortium, was founded to create standard open APIs for parallel computing, graphics and dynamic media. Among them has been OpenCL, an open system for programming heterogeneous computers […]

OpenCL

Jul, 3

On the Use of GPUs in Realizing Cost-Effective Distributed RAID

The exponential growth in user and application data entails new means for providing fault tolerance and protection against data loss. High Performance Computing (HPC) storage systems, which are at the forefront of handling the data deluge, typically employ hardware RAID at the backend. However, such solutions are costly, do not ensure end-to-end data integrity, and […]

CUDA

Jul, 2

kANN on the GPU with Shifted Sorting

We describe the implementation of a simple method for finding k approximate nearest neighbors (ANNs) on the GPU. While the performance of most ANN algorithms depends heavily on the distributions of the data and query points, our approach has a very regular data access pattern. It performs as well as state of the art methods […]

CUDA

Jul, 2

Acceleration of bilateral filtering algorithm for manycore and multicore architectures

This work explores multicore and manycore acceleration for the embarrassingly parallel, compute-intensive bilateral filtering kernel. For manycore architectures, we have created a pair-symmetric algorithm to avoid redundant calculations. For multicore architectures, we improve the algorithm by using low-level single instruction, multiple data (SIMD) parallelism across multiple threads. We propose architecture-specific optimizations, such […]

CUDA

Jul, 2

Deformation of skeleton based implicit objects

In this paper we present a precise contact modeling environment for skeleton based implicit objects. To render the scene composed of these implicit objects, we have implemented the state-of-the-art raycasting algorithm, called marching points, on GPU using CUDA. Further, we introduce how to interactively deform the implicit objects when they collide. To achieve this we […]

CUDA

Jul, 2

Halo Gathering Scalability for Large Scale Multi-dimensional Sznajd Opinion Models Using Data Parallelism with GPUs

The Sznajd model of opinion formation exhibits complex phase transitional and growth behaviour and can be studied with numerical simulations on a number of different network structures. Large system sizes and detailed statistical sampling of the model both require data-parallel computing to accelerate simulation performance. Data structures and computational performance issues are reported for simulations […]

CUDA
