high performance computing on graphics processing units: hgpu.org

Posts

Jan, 3

Wavelet Encoding and Multi-GPU Programming

We investigate compression of large-volume spatial data using the wavelet transform, computed in a massively parallel fashion on NVIDIA graphics processing units (GPUs). In particular, Haar basis wavelets are used to achieve compression ratios of 100x or more. Computation is distributed over a cluster of multiple nodes with multiple GPUs per node. Significantly […]

CUDA
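As a rough illustration of the Haar-based compression the excerpt above mentions, here is a minimal single-threaded sketch in Python. It is not the paper's multi-GPU implementation: the function names (`haar_forward`, `haar_inverse`, `compress`) and the simple hard-threshold rule are assumptions made for this example only.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(signal):
    """Multi-level orthonormal Haar transform of a list whose length is a power of two."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = [float(x) for x in signal]
    length = n
    while length > 1:
        half = length // 2
        # Pairwise scaled averages (coarse part) and differences (detail part).
        avg = [(out[2 * i] + out[2 * i + 1]) / SQRT2 for i in range(half)]
        diff = [(out[2 * i] - out[2 * i + 1]) / SQRT2 for i in range(half)]
        out[:length] = avg + diff
        length = half
    return out


def haar_inverse(coeffs):
    """Invert haar_forward by rebuilding each level from coarse to fine."""
    n = len(coeffs)
    out = [float(c) for c in coeffs]
    length = 2
    while length <= n:
        half = length // 2
        rec = [0.0] * length
        for i in range(half):
            a, d = out[i], out[half + i]
            rec[2 * i] = (a + d) / SQRT2
            rec[2 * i + 1] = (a - d) / SQRT2
        out[:length] = rec
        length *= 2
    return out


def compress(coeffs, threshold):
    """Hypothetical compression rule: zero every coefficient below the threshold."""
    return [c if abs(c) >= threshold else 0.0 for c in coeffs]
```

For a step signal such as `[4, 4, 4, 4, 8, 8, 8, 8]`, only two coefficients survive thresholding, and `haar_inverse` recovers the input from them. Smooth or piecewise-constant data compresses well for the same reason: most detail coefficients are zero or close to it.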

Jan, 3

Adhoc On-Demand Distance Vector Protocol For Energy Efficiency

The use of computer networks is growing drastically, and with it the need to enhance existing network protocols and enforce communication security. Researchers use tools such as network simulators to test new scenarios and protocols in a controlled and reproducible environment. They allow the user to represent various topologies, simulate […]

CUDA

Jan, 3

Accelerating Simulation Codes through the GeMTC Framework

GPU computing uses a high-level language to run the sequential part of the code on the CPU and speeds up the parallel part by running it on GPUs. However, GPUs are SIMD by default, which means they can run only a single instruction on multiple data. The introduction of the GeMTC framework [1] addresses these limitations by […]

CUDA

Jan, 3

Nemo: A parallelized Lagrangian particle-tracking model

Lagrangian particle-tracking models are a computationally intensive, but massively parallelizable method for investigating marine larval dispersal processes, seed dispersal of plants, or a variety of other material transport processes. In order to fully capture the distribution of potential dispersal patterns, highly efficient models with the capacity to simulate tens of millions or more particles are […]

CUDA

Jan, 2

Fast Parallel Image Registration on CPU and GPU for Diagnostic Classification of Alzheimer’s Disease

Nonrigid image registration is an important, but time-consuming task in medical image analysis. In typical neuroimaging studies, multiple image registrations are performed, i.e. for atlas-based segmentation or template construction. Faster image registration routines would therefore be beneficial. In this paper we explore acceleration of the image registration package elastix by a combination of several techniques: […]

OpenCL

Jan, 2

Interactive Ray-tracing Based on OptiX to Visualize Signed Distance Fields

We propose a parallel ray-tracing technique to visualize signed distance fields generated from triangular meshes based on NVIDIA OptiX. Our method visualizes signed distance fields with various distance offset values at interactive rates (2-12 fps). Our method utilizes a parallel kd-tree implementation to query the nearest triangle and the sphere tracing method to visualize the […]

CUDA, OpenGL
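The sphere tracing method named in the excerpt above can be sketched as follows. This is a minimal CPU illustration, not the paper's OptiX implementation: it uses an analytic sphere as the signed distance field instead of a kd-tree query against a triangle mesh, and the names `sphere_sdf`, `sphere_trace` and the `offset` parameter are assumptions for this example.

```python
import math


def sphere_sdf(p, center=(0.0, 0.0, 5.0), radius=1.0):
    """Signed distance from point p to a sphere (stand-in for a mesh-based field)."""
    return math.dist(p, center) - radius


def sphere_trace(origin, direction, sdf, offset=0.0,
                 max_steps=128, eps=1e-4, max_dist=100.0):
    """March along a ray, stepping by the distance the field guarantees is empty.

    Returns the ray parameter t at the hit point, or None if nothing is hit.
    `offset` shifts the visualized iso-surface outward from the original shape.
    """
    norm = math.sqrt(sum(c * c for c in direction))
    d = [c / norm for c in direction]
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * c for o, c in zip(origin, d))
        dist = sdf(p) - offset
        if dist < eps:
            return t          # close enough to the iso-surface: report a hit
        t += dist             # safe step: no surface lies within `dist` of p
        if t > max_dist:
            break
    return None
```

Calling `sphere_trace((0, 0, 0), (0, 0, 1), sphere_sdf)` hits the sphere at `t = 4.0`, and passing `offset=0.5` moves the hit to `t = 3.5`, which is how a single field can be rendered at several distance offsets.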

Jan, 2

A Unified Rolling Shutter and Motion Blur Model for 3D Visual Registration

Motion blur and rolling shutter deformations both inhibit visual motion registration, whether it be due to a moving sensor or a moving target. Whilst both deformations exist simultaneously, no models have been proposed to handle them together. Furthermore, neither deformation has been considered previously in the context of monocular full-image 6 degrees of freedom registration […]

OpenCL

Jan, 2

Optimal polygonal L1 linearization and fast interpolation of nonlinear systems

The analysis of complex nonlinear systems is often carried out using simpler piecewise linear representations of them. We propose a principled and practical technique to linearize and evaluate arbitrary continuous nonlinear functions using polygonal (continuous piecewise linear) models under the L1 norm. A thorough error analysis is developed to guide an optimal design of two […]

CUDA

Jan, 2

Achieving TeraCUPS on Longest Common Subsequence Problem using GPGPUs

In this paper, we describe a novel technique to optimize the longest common subsequence (LCS) algorithm for the one-to-many matching problem on GPUs by transforming the computation into bit-wise operations and a post-processing step. The former can be highly optimized and achieves more than a trillion operations (cell updates) per second (CUPS), a first for LCS algorithms. The […]

CUDA
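To show what "transforming the computation into bit-wise operations" can look like, here is a sketch of the classic bit-parallel LCS-length recurrence, written in Python with arbitrary-precision integers. It is not necessarily the formulation used in the paper, it computes only the LCS length, and the function name `lcs_length` is an assumption for this example.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b, using bit-parallel updates.

    Each bit of `v` corresponds to one position of `a`, so a single pass over `b`
    updates a whole column of the dynamic-programming table per character.
    """
    m = len(a)
    mask = (1 << m) - 1

    # Match masks: bit i of match[c] is set when a[i] == c.
    match = {}
    for i, ch in enumerate(a):
        match[ch] = match.get(ch, 0) | (1 << i)

    v = mask  # all ones: no matches recorded yet
    for ch in b:
        u = v & match.get(ch, 0)
        # The addition propagates carries across runs of ones; the subtraction
        # clears the matched bits. The result is truncated to m bits.
        v = ((v + u) | (v - u)) & mask

    # Every zero bit in v marks one unit of LCS length.
    return m - bin(v).count("1")
```

For example, `lcs_length("AGGTAB", "GXTXAYB")` returns 4, corresponding to the subsequence `GTAB`. On a GPU, the same recurrence would operate on fixed-width machine words, with carries passed between words.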

Dec, 31

4kUHD H264 wireless live video streaming using CUDA

Ultra-high-definition video streaming has been explored in recent years. Most recently, the possibility of 4kUHD video streaming over wireless 802.11n was presented, using pre-encoded video. Live encoding for streaming using x264 has proven to be very slow. The use of parallel encoding with CUDA has been explored to speed up the process. However, there […]

CUDA

Dec, 31

High-Speed Turbo Equalization for GPP-based Software Defined Radios

High data rate waveforms for software defined radios (SDR) have to cope with frequency selective fading due to the mobile use in different harsh transmission environments. The received signal needs to be equalized in order to restore the transmitted information. Turbo equalization is a promising approach to deal with the inter-symbol interference occurring at the […]

OpenCL

Dec, 31

Efficient Processing of MRFs for Unconstrained-Pose Face Recognition

The paper addresses the problem of pose-invariant recognition of faces via an MRF matching model. Unlike previous costly matching approaches, the proposed algorithm employs effective techniques to reduce the MRF inference time. To this end, processing is done in a parallel fashion on a GPU employing a dual decomposition framework. The optimisation is further accelerated […]

CUDA

Recent source codes

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

SYCL Container

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?
