high performance computing on graphics processing units: hgpu.org

Posts

Jan, 2

Fast Parallel Image Registration on CPU and GPU for Diagnostic Classification of Alzheimer’s Disease

Nonrigid image registration is an important but time-consuming task in medical image analysis. In typical neuroimaging studies, multiple image registrations are performed, for example for atlas-based segmentation or template construction. Faster image registration routines would therefore be beneficial. In this paper we explore acceleration of the image registration package elastix by a combination of several techniques: […]

OpenCL

Jan, 2

Interactive Ray-tracing Based on OptiX to Visualize Signed Distance Fields

We propose a parallel ray-tracing technique to visualize signed distance fields generated from triangular meshes based on NVIDIA OptiX. Our method visualizes signed distance fields with various distance offset values at interactive rates (2-12 fps). Our method utilizes a parallel kd-tree implementation to query the nearest triangle and the sphere tracing method to visualize the […]

CUDA, OpenGL
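The sphere tracing step named in the abstract above can be illustrated with a minimal CPU sketch. This is not the authors' OptiX implementation: the analytic `sphere_sdf` is a stand-in for their mesh-based distance query (which the abstract says uses a kd-tree), and the function names, the `offset` parameter, and the step limits are all assumptions made for illustration.

```python
import math

def sphere_sdf(p, radius=1.0):
    """Signed distance from point p to a sphere centred at the origin.
    Hypothetical stand-in for a mesh-based signed distance field."""
    return math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2) - radius

def sphere_trace(origin, direction, sdf, offset=0.0,
                 max_steps=128, eps=1e-4, t_max=100.0):
    """March along a ray (direction assumed unit length) and return the
    ray parameter t of the first hit on the surface sdf(p) == offset,
    or None if nothing is hit."""
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        dist = sdf(p) - offset      # distance to the offset surface
        if dist < eps:              # close enough: report a hit
            return t
        t += dist                   # safe step: no surface is nearer than dist
        if t > t_max:               # ray left the scene
            break
    return None

# A ray from z = -3 looking down +z meets the unit sphere at t = 2.
hit = sphere_trace((0.0, 0.0, -3.0), (0.0, 0.0, 1.0), sphere_sdf)
```

Passing a non-zero `offset` renders an inflated or deflated surface, which is one plausible reading of the "distance offset values" the abstract mentions.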

Jan, 2

A Unified Rolling Shutter and Motion Blur Model for 3D Visual Registration

Motion blur and rolling shutter deformations both inhibit visual motion registration, whether it be due to a moving sensor or a moving target. Whilst both deformations exist simultaneously, no models have been proposed to handle them together. Furthermore, neither deformation has been considered previously in the context of monocular full-image 6 degrees of freedom registration […]

OpenCL

Jan, 2

Optimal polygonal L1 linearization and fast interpolation of nonlinear systems

The analysis of complex nonlinear systems is often carried out using simpler piecewise linear representations of them. We propose a principled and practical technique to linearize and evaluate arbitrary continuous nonlinear functions using polygonal (continuous piecewise linear) models under the L1 norm. A thorough error analysis is developed to guide an optimal design of two […]

CUDA

Jan, 2

Achieving TeraCUPS on Longest Common Subsequence Problem using GPGPUs

In this paper, we describe a novel technique to optimize the longest common subsequence (LCS) algorithm for the one-to-many matching problem on GPUs by transforming the computation into bit-wise operations and a post-processing step. The former can be highly optimized and achieves more than a trillion operations (cell updates) per second (CUPS), a first for LCS algorithms. The […]

CUDA
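To make the idea of computing LCS with bit-wise operations concrete, here is a minimal sketch of a well-known bit-parallel LCS-length recurrence, run on the CPU with Python's arbitrary-precision integers. It is an assumption that this resembles the paper's formulation; the abstract is truncated and does not give the recurrence. The function names are hypothetical.

```python
def lcs_length_bitparallel(pattern: str, text: str) -> int:
    """LCS length via a bit-vector recurrence: one bit per pattern
    character, one add/subtract/or update per text character."""
    m = len(pattern)
    if m == 0:
        return 0
    mask = (1 << m) - 1
    # match[ch] has bit i set where pattern[i] == ch.
    match = {}
    for i, ch in enumerate(pattern):
        match[ch] = match.get(ch, 0) | (1 << i)
    v = mask                          # all ones: no matches found yet
    for ch in text:
        u = v & match.get(ch, 0)
        v = ((v + u) | (v - u)) & mask
    # Each zero bit in v corresponds to one character of the LCS.
    return m - bin(v).count("1")

def lcs_one_to_many(pattern, texts):
    """One-to-many matching: score one pattern against many texts.
    Each text is independent, so this loop is what would be spread
    across GPU threads."""
    return [lcs_length_bitparallel(pattern, t) for t in texts]

scores = lcs_one_to_many("AGGTAB", ["GXTXAYB", "AGGTAB", "XYZ"])
```

The final bit count is simply how this formulation reads off the length; it is not necessarily the post-processing step the abstract refers to.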

Dec, 31

4kUHD H264 wireless live video streaming using CUDA

Ultra-High definition video streaming has been explored in recent years. Most recently the possibility of 4kUHD video streaming over wireless 802.11n was presented, using pre-encoded video. Live encoding for streaming using x264 has proven to be very slow. The use of parallel encoding has been explored to speed up the process using CUDA. However there […]

CUDA

Dec, 31

High-Speed Turbo Equalization for GPP-based Software Defined Radios

High data rate waveforms for software defined radios (SDR) have to cope with frequency selective fading due to the mobile use in different harsh transmission environments. The received signal needs to be equalized in order to restore the transmitted information. Turbo equalization is a promising approach to deal with the inter-symbol interference occurring at the […]

OpenCL

Dec, 31

Efficient Processing of MRFs for Unconstrained-Pose Face Recognition

The paper addresses the problem of pose-invariant recognition of faces via an MRF matching model. Unlike previous costly matching approaches, the proposed algorithm employs effective techniques to reduce the MRF inference time. To this end, processing is done in a parallel fashion on a GPU employing a dual decomposition framework. The optimisation is further accelerated […]

CUDA

Dec, 31

Improved Sequential & Parallel Designs and Implementations of the Eight Direction Prewitt Edge Detection

The exponential growth of the world’s technological industry has an important impact on our lives; we are witnessing an expansion in computer power combined with a noticeable development of digital camera capabilities. To keep up with the requirements of the digitalized world, the focus has been set on the computer vision field. One of the […]

Dec, 31

Real Time Background Subtraction On GPU Using CUDA

Although trivial background subtraction algorithms, such as median-based, Gaussian-based and kernel density-based approaches, can perform quite fast, they are not robust enough to be used in various computer vision problems. Some complex algorithms usually give better results, but are too slow to be applied to real-time systems. Here, we examine the GPU architecture […]

CUDA

Dec, 29

Developing a High Performance Software Library with MPI and CUDA for Matrix Computations

Nowadays, the paradigm of parallel computing is changing. CUDA is now a popular programming model for general purpose computations on GPUs, and a great number of applications have been ported to CUDA, obtaining speedups of orders of magnitude compared to optimized CPU implementations. Hybrid approaches that combine the message passing model with the shared memory model […]

CUDA

Dec, 29

Optimizing LZSS Compression on GPGPUs

In this paper, we present an algorithm and provide design improvements needed to port the serial Lempel-Ziv-Storer-Szymanski (LZSS) lossless data compression algorithm to a parallelized version suitable for general purpose graphics processing units (GPGPUs), specifically for NVIDIA’s CUDA framework. The two main stages of the algorithm, substring matching and encoding, are studied in detail to […]

CUDA

* * *

Recent source codes

Mutual-Supervised Learning for Sequential-to-Parallel Code Translation

Hardware Compute Partitioning on NVIDIA GPUs for Composable Systems

KISim: Kubernetes Intelligent Scheduling Simulator

Efficient GPU Implementation of Multi-Precision Integer Division

exa-AMD: Exascale Accelerated Materials Discovery

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools
