high performance computing on graphics processing units: hgpu.org

Posts

Dec, 3

A Multi-View Stereo Implementation on Massively Parallel Hardware

In recent years, we have seen several approaches to implement hardware-accelerated multi-view stereo (MVS) algorithms employing the graphics processing unit (GPU) for fast and parallel computation. To our knowledge, all of them resort to various rendering passes to perform their computations. In contrast, modern GPU compute frameworks give access to the massively parallel compute capability […]

CUDA

Dec, 3

Impact of Floating-Point Precision on Boundary Layer Instabilities Modeled on Fermi GPU

We have implemented two-dimensional and three-dimensional Rayleigh-Benard convection for infinite Prandtl number, appropriate for the Earth’s mantle, on a single Fermi GPU by utilizing a second-order finite-difference method. The code was written in C for CUDA and heavily utilized optimized CUBLAS routines. These implementations enjoyed performance on the order of 535 GFLOP/s and 100 GFLOP/s in […]

CUDA

Dec, 3

Implementations of a Parallel Algorithm for Computing Euclidean Distance Map in Multicore Processors and GPUs

Given a 2-D binary image of size n×n, the Euclidean Distance Map (EDM) is a 2-D array of the same size in which each element stores the Euclidean distance to the nearest black pixel. It is known that a sequential algorithm can compute the EDM in O(n^2) time, and thus this algorithm is optimal. Also, work-time […]
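To make the EDM definition above concrete, here is a minimal brute-force sketch in Python. It is not the parallel algorithm from the paper and not the optimal O(n^2) sequential one; it compares every pixel against every black pixel, so it is only suitable for tiny images. The function name `euclidean_distance_map` and the convention that 1 marks a black pixel are assumptions made for illustration.

```python
import math


def euclidean_distance_map(image):
    """Brute-force EDM of a binary image given as a list of rows.

    Assumed convention (not from the source): 1 = black pixel, 0 = white pixel.
    Each output element is the Euclidean distance to the nearest black pixel,
    or math.inf if the image contains no black pixel at all.
    """
    n_rows = len(image)
    n_cols = len(image[0]) if n_rows else 0

    # Collect the coordinates of all black pixels once.
    black = [(r, c) for r in range(n_rows) for c in range(n_cols) if image[r][c] == 1]

    edm = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            if black:
                # Distance to the closest black pixel (0.0 if this pixel is black).
                d = min(math.hypot(r - br, c - bc) for br, bc in black)
            else:
                d = math.inf
            row.append(d)
        edm.append(row)
    return edm


if __name__ == "__main__":
    # A 3x3 image with a single black pixel in the top-left corner.
    sample = [
        [1, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
    ]
    for line in euclidean_distance_map(sample):
        print(["%.3f" % d for d in line])
```

Because each output element depends only on the input image, every pixel could be computed independently, which hints at why the problem maps naturally onto multicore processors and GPUs.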

CUDA

Dec, 3

Design and Optimization of Hybrid MD5-Blowfish Encryption on GPUs

Nowadays, data plays an indispensable role in almost all industrial areas. Data integrity and security over the Internet, other types of media and applications have become major concerns in the computing world. If confidential or sensitive data is forged, juggled or wiretapped by an attacker, capital losses might occur. Encryption is one of the […]

CUDA

Dec, 3

Mapping the Arnold web with a graphic processing unit

The Arnold diffusion constitutes a dynamical phenomenon which may occur in the phase space of a non-integrable Hamiltonian system whenever the number of the system degrees of freedom is M>=3. The diffusion is mediated by a web-like structure of resonance channels, which penetrates the phase space and allows the system to explore the whole energy […]

CUDA

Dec, 3

Ultrasound Image Simulation with GPU-based Ray Tracing

Medical simulators are gaining importance because the experience and skills necessary to perform many of the medical procedures are difficult to obtain due to patient safety and ethical issues. With the development of graphic cards, stereographic and haptic devices, more VR-based simulators are being created. We are developing an interactive ultrasound image simulation that include […]

Dec, 3

Comparison of CPML Implementations for the GPU-Accelerated FDTD Solver

Three distinctively different implementations of convolutional perfectly matched layer for the FDTD method on CUDA enabled graphics processing units are presented. All implementations store additional variables only inside the convolutional perfectly matched layers, and the computational speeds scale according to the thickness of these layers. The merits of the different approaches are discussed, and a […]

CUDA

Dec, 3

EM+TV for Reconstruction of Cone-beam CT with Curved Detectors using GPU

Computerized tomography (CT) plays a critical role in the practice of modern medicine. However, the radiation associated with CT is significant. Methods that can enable CT imaging at reduced radiation exposure without sacrificing image quality are therefore extremely important. This paper introduces a novel method for enabling improved reconstruction at lower radiation exposure levels. The […]

CUDA

Dec, 2

On the design of architecture-aware algorithms for emerging applications

This dissertation maps various kernels and applications to a spectrum of programming models and architectures and also presents architecture-aware algorithms for different systems. The kernels and applications discussed in this dissertation have widely varying computational characteristics. For example, we consider both dense numerical computations and sparse graph algorithms. This dissertation also covers emerging applications from […]

CUDA

Dec, 2

Effective GPU Strategies for LU Decomposition

GPUs are becoming an attractive computing platform not only for traditional graphics computation but also for general-purpose computation because of the computational power, programmability and comparatively low cost of modern GPUs. This has led to a variety of complex GPGPU applications with significant performance improvements. The LU decomposition represents a fundamental step in many computationally […]

CUDA

Dec, 2

Parallel-META: A high-performance computational pipeline for metagenomic data analysis

Metagenomic methods directly sequence and analyze genome information from microbial communities. There are usually hundreds of genomes from different microbial species in the same community, and the main computational tasks for metagenomics data analysis include taxonomical and functional component of these genomes in the microbial community. Metagenomic data analysis is both data- and […]

Dec, 2

Efficient Cubic B-spline Image Interpolation on a GPU

Applying a geometric transformation to an image requires an interpolation step. When applied to image rotation, the presently most efficient GPU implementation of cubic spline image interpolation still costs about 10 times as much as linear interpolation. This implementation involves two steps: a prefilter step performs a two-pass forward-backward recursive filter, then a cubic polynomial […]

CUDA

Recent source codes

OpScanner

Atlas CLI: Machine Learning (ML) Lifecycle & Transparency Manager

transformers_tvm: Implementation of Encoder Decoder transformer on TVM

INT v.s. FP: A framework to compare low-bit integer and float-point formats

AutoDock-GPU: AutoDock for GPUs and other accelerators

NCCLX: collective communication framework

Tutoring LLM into a Better CUDA Optimizer

Adaptivity in AdaptiveCpp: Optimizing Performance by Leveraging Runtime Information During JIT-Compilation

Kernel Library for LLM Serving

Neptune: Advanced ML Operator Fusion for Locality and Parallelism on GPUs

Most viewed papers (last 30 days)