high performance computing on graphics processing units: hgpu.org

Posts

Aug, 4

High Quality Image Reconstruction of Point Models

The parallel and discrete nature of many image-based techniques, such as interpolation methods, makes them highly suitable for GPU implementation. An important advantage is that their complexity is bounded by the screen resolution rather than by the data size. Consequently, the direct surface reconstruction of points using image-based methods is an attractive solution for interactively rendering large data sets. […]

OpenGL

Aug, 4

Using Graphics Processors to Accelerate the Solution of Out-of-Core Linear Systems

We investigate the use of graphics processors (GPUs) to accelerate the solution of large-scale linear systems when the problem data is larger than the main memory of the system and storage on disk is employed. Our solution addresses the programmability problem with a combination of the high-level approach in libflame (the FLAME library for dense […]

CUDA

Aug, 4

Efficient MIMD architectures for high-performance ray tracing

Ray tracing efficiently models complex illumination effects to improve visual realism in computer graphics. Typical modern GPUs use wide SIMD processing, and have achieved impressive performance for a variety of graphics processing including ray tracing. However, SIMD efficiency can be reduced due to the divergent branching and memory access patterns that are common in ray […]

Aug, 3

A Variant of Mersenne Twister Suitable for Graphic Processors

The author proposes pseudorandom number generators suitable for execution on a graphic processor. They generate pseudorandom numbers in device memory. Each generator uses shared memory as its internal state space and constant memory as a look-up table for a linear transformation. Output formats of the generator are 32-bit […]

CUDA

Aug, 3

High-Performance Pseudo-Random Number Generation on Graphics Processing Units

This work considers the deployment of pseudo-random number generators (PRNGs) on graphics processing units (GPUs), developing an approach based on the xorgens generator to rapidly produce pseudo-random numbers of high statistical quality. The chosen algorithm has configurable state size and period, making it ideal for tuning to the GPU architecture. We present a comparison of […]

CUDA
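The abstract above names the xorgens generator, which builds on xorshift recurrences combined with a Weyl sequence. The following is only a minimal sketch of that general idea, not the paper's implementation and not xorgens itself: it uses Marsaglia's classic 32-bit xorshift with shifts (13, 17, 5), and the class name `XorshiftWeyl` and the constant `WEYL_INCREMENT` are assumptions chosen for illustration.

```python
MASK32 = 0xFFFFFFFF
# Odd 32-bit increment for the Weyl sequence (illustrative assumption,
# not a parameter taken from the paper).
WEYL_INCREMENT = 0x9E3779B9


def xorshift32_step(x: int) -> int:
    """Advance a 32-bit xorshift state once, using shifts (13, 17, 5)."""
    x ^= (x << 13) & MASK32
    x ^= x >> 17
    x ^= (x << 5) & MASK32
    return x


class XorshiftWeyl:
    """Hypothetical toy generator: xorshift state plus a Weyl counter."""

    def __init__(self, seed: int) -> None:
        # The xorshift state must never be zero, so fall back to 1.
        self.x = (seed & MASK32) or 1
        self.w = 0

    def next_u32(self) -> int:
        """Return the next 32-bit output."""
        self.x = xorshift32_step(self.x)
        self.w = (self.w + WEYL_INCREMENT) & MASK32
        # Adding the Weyl term perturbs the purely linear xorshift output.
        return (self.x + self.w) & MASK32


if __name__ == "__main__":
    gen = XorshiftWeyl(seed=1)
    print([gen.next_u32() for _ in range(4)])
```

On a GPU, each thread block would typically hold one such state in fast on-chip memory. Giving each generator a different seed, as a simple approach might, does not by itself guarantee statistically independent streams.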

Aug, 3

Contour-based algorithm for vectorization of satellite images

Object recognition in high-resolution satellite images is a complex task that is time-consuming and demanding for the operator. This paper describes an innovative approach to solving this problem. Based on monochromatic high-resolution satellite images (in the process of using data from the QuickBird satellite with a maximum resolution […]

CUDA

Aug, 3

Exploring reconfigurable architectures for explicit finite difference option pricing models

This paper explores the application of reconfigurable hardware and graphics processing units (GPUs) to the acceleration of financial computation using the finite difference (FD) method. A parallel pipelined architecture has been developed to support concurrent valuation of independent options with high pricing throughput. Our FPGA implementation running at 106 MHz on an xc4vlx160 device demonstrates […]

CUDA
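For readers unfamiliar with the explicit finite difference method mentioned in the abstract above, here is a minimal CPU sketch for a European put under the Black-Scholes model. It is not the paper's FPGA or GPU design; the function name, the grid sizes, and the boundary conditions are assumptions chosen for illustration.

```python
import math


def european_put_explicit_fd(spot, strike, rate, sigma, maturity,
                             s_max=200.0, n_space=200, n_time=2000):
    """Price a European put with an explicit finite difference scheme.

    Illustrative sketch: uniform grid in the asset price S on [0, s_max],
    marching from the payoff at maturity back to the valuation date.
    """
    ds = s_max / n_space
    dt = maturity / n_time
    # The explicit scheme is only conditionally stable; reject steps that
    # would make the central coefficient negative at the top of the grid.
    if dt * (sigma ** 2 * n_space ** 2 + rate) > 1.0:
        raise ValueError("time step too large for the explicit scheme")

    # Terminal condition: put payoff max(K - S, 0).
    values = [max(strike - i * ds, 0.0) for i in range(n_space + 1)]

    for step in range(1, n_time + 1):
        tau = step * dt  # time remaining to maturity at the new level
        new = [0.0] * (n_space + 1)
        new[0] = strike * math.exp(-rate * tau)  # S = 0: discounted strike
        new[n_space] = 0.0                       # S = s_max: put is worthless
        for i in range(1, n_space):
            a = 0.5 * dt * (sigma ** 2 * i ** 2 - rate * i)
            b = 1.0 - dt * (sigma ** 2 * i ** 2 + rate)
            c = 0.5 * dt * (sigma ** 2 * i ** 2 + rate * i)
            # Each new node depends only on three old nodes, so all nodes
            # of one time level can be updated in parallel.
            new[i] = a * values[i - 1] + b * values[i] + c * values[i + 1]
        values = new

    # Linear interpolation of the grid values at the requested spot price.
    pos = spot / ds
    i = min(int(pos), n_space - 1)
    frac = pos - i
    return (1.0 - frac) * values[i] + frac * values[i + 1]


if __name__ == "__main__":
    print(european_put_explicit_fd(100.0, 100.0, 0.05, 0.2, 1.0))
```

Because every option is priced from its own grid, many independent options can be valued concurrently, which is the kind of workload the abstract describes.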

Aug, 3

PIR: PMaC’s Idiom Recognizer

The speed of the memory subsystem often constrains the performance of large-scale parallel applications. Experts tune such applications to use hierarchical memory subsystems efficiently. Hardware accelerators, such as GPUs, can potentially improve memory performance beyond the capabilities of traditional hierarchical systems. However, the addition of such specialized hardware complicates code porting and tuning. During porting […]

Aug, 3

Surface quality assessment of subdivision surfaces on programmable graphics hardware

We propose a method of subdivision surface quality assessment by reflection lines on programmable graphics hardware (GPU). Using reflection lines is effective for surface quality assessment because the shapes of these lines are changed according to a slight variance of surface shapes. This fact also implies that reflection lines should be calculated precisely. We introduce […]

Aug, 3

A fast hybrid time-synchronous/event approach to parallel discrete event simulation of queuing networks

The trend in computing architectures has been toward multi-core central processing units (CPUs) and graphics processing units (GPUs). An affordable and highly parallelizable GPU is a practical example of a Single Instruction, Multiple Data (SIMD) architecture oriented toward stream processing. While GPU architectures and languages are fairly easily employed for inherently time-synchronous simulation models, it […]

CUDA

Aug, 3

Simulating 3-D Lung Dynamics Using a Programmable Graphics Processing Unit

Medical simulations of lung dynamics promise to be effective tools for teaching and training clinical and surgical procedures related to lungs. Their effectiveness may be greatly enhanced when visualized in an augmented reality (AR) environment. However, the computational requirements of AR environments limit the availability of the central processing unit (CPU) for the lung dynamics […]

Aug, 3

Programming finite-difference time-domain for graphics processor units using compute unified device architecture

Recently, graphics processing units (GPUs) have become hardware platforms for high-performance scientific computing. The unavailability of high-level languages to program graphics cards had prevented the widespread use of GPUs. Relatively recently, the Compute Unified Device Architecture (CUDA) development environment was introduced by NVIDIA and made GPU programming much easier. This […]

CUDA
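As background for the finite-difference time-domain (FDTD) method named in the title above, the sketch below shows a one-dimensional leapfrog update in normalized units. It is a CPU illustration only and is not taken from the paper; the function name `fdtd_1d`, the Gaussian source parameters, and the grid size are assumptions.

```python
import math


def fdtd_1d(n_cells=200, n_steps=100, source=100, courant=0.5):
    """Run a 1-D FDTD simulation and return the final electric field.

    Illustrative sketch in normalized units: the magnetic field is scaled
    by the wave impedance so that both updates use the Courant number.
    """
    ez = [0.0] * n_cells        # electric field at integer grid points
    hy = [0.0] * (n_cells - 1)  # magnetic field at half grid points

    for n in range(n_steps):
        # Update H from the spatial difference of E.
        for i in range(n_cells - 1):
            hy[i] += courant * (ez[i + 1] - ez[i])
        # Update interior E from the spatial difference of H. The two end
        # nodes are never touched, so they act as perfect conductors.
        for i in range(1, n_cells - 1):
            ez[i] += courant * (hy[i] - hy[i - 1])
        # Hard source: overwrite one node with a Gaussian pulse in time.
        ez[source] = math.exp(-((n - 30.0) / 10.0) ** 2)

    return ez


if __name__ == "__main__":
    field = fdtd_1d()
    print(max(abs(v) for v in field))
```

Within each half step every cell reads only values from the previous half step, so the two inner loops map naturally onto one GPU thread per cell.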
