high performance computing on graphics processing units: hgpu.org

Posts

Mar, 17

Accelerated ray tracing for radiotherapy dose calculations on a GPU

PURPOSE: The graphics processing unit (GPU) on modern graphics cards offers the possibility of accelerating arithmetically intensive tasks. By splitting the work into a large number of independent jobs, order-of-magnitude speedups have been reported. In this article, the possible speedup of PLATO’s ray tracing algorithm for dose calculations using a GPU is investigated. METHODS: A GPU […]

CUDA

Mar, 17

Task Scheduling of Parallel Processing in CPU-GPU Collaborative Environment

With the rapid development of the GPU (Graphics Processing Unit) in recent years, GPGPU (General-Purpose computation on GPU) has become an important technique in scientific research. However, the GPU may well be seen more as a cooperator than as a rival to the CPU. We therefore focus on exploiting the combined power of the CPU and the GPU to solve generic problems […]
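The abstract is cut off before the scheduling method is described. As a purely hypothetical illustration of CPU-GPU cooperation, the simplest static policy splits a batch of independent work items between the two devices in proportion to their measured throughput, so that both finish at roughly the same time. The function name `split_work` and the rate parameters are assumptions made for this sketch; it is not the authors' scheduler.

```python
def split_work(n_items: int, cpu_rate: float, gpu_rate: float) -> tuple[int, int]:
    """Hypothetical static CPU/GPU partition (not the paper's scheduler).

    cpu_rate and gpu_rate are assumed throughputs in items per second,
    e.g. measured on a short calibration run. Each device gets a share
    proportional to its throughput, so both finish at about the same time.
    Returns (cpu_items, gpu_items).
    """
    if n_items < 0 or cpu_rate <= 0 or gpu_rate <= 0:
        raise ValueError("n_items must be >= 0 and rates must be positive")
    gpu_items = round(n_items * gpu_rate / (cpu_rate + gpu_rate))
    return n_items - gpu_items, gpu_items


# Example: the GPU is assumed to process items four times faster than the CPU.
cpu_items, gpu_items = split_work(100, cpu_rate=1.0, gpu_rate=4.0)
print(cpu_items, gpu_items)  # 20 80
```

A static split like this only works when per-item cost is uniform and the rates are stable; dynamic schemes (e.g. work queues that both devices pull from) handle irregular workloads better.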

Mar, 17

Language virtualization for heterogeneous parallel computing

As heterogeneous parallel systems become dominant, application developers are being forced to turn to an incompatible mix of low-level programming models (e.g. OpenMP, MPI, CUDA, OpenCL). However, these models do little to shield developers from the difficult problems of parallelization, data decomposition and machine-specific details. Most programmers are having a difficult time using these programming models […]

Mar, 17

Implementation of algorithms with a fine-grained parallelism on GPUs

The efficiency of implementations of algorithms with fine-grained parallelism on GPUs that support the CUDA architecture is studied. Cellular automata and difference schemes are used as test cases. Several implementation variants are proposed and their efficiency is analyzed. An example of a GPU application for modeling the process of carbon monoxide oxidation on the catalyst […]

CUDA
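Both test workloads named in the abstract above share the property that makes them fine-grained: each cell's new value depends only on its neighbours at the previous time step, so every cell can be updated independently (on a GPU, typically one thread per cell). The minimal Python sketch below is not the paper's code; it shows one explicit finite-difference step for the 1D heat equation and one step of a simple cellular automaton (elementary rule 90, chosen here only as an example). Both read from the old array and write to a new one, which is the double-buffering a GPU kernel would also need.

```python
def heat_step(u: list[float], r: float) -> list[float]:
    """One explicit (FTCS) step of u_t = a * u_xx, with r = a * dt / dx**2.

    The scheme is stable for r <= 0.5. End points are held fixed
    (Dirichlet boundary conditions). Every interior cell reads only the
    old array, so the loop iterations are independent of each other.
    """
    new = list(u)  # copy, so the boundary values stay unchanged
    for i in range(1, len(u) - 1):
        new[i] = u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return new


def rule90_step(cells: list[int]) -> list[int]:
    """One synchronous step of elementary cellular automaton rule 90.

    Each cell becomes the XOR of its two neighbours; the lattice is periodic.
    """
    n = len(cells)
    return [cells[(i - 1) % n] ^ cells[(i + 1) % n] for i in range(n)]


print(heat_step([0.0, 0.0, 1.0, 0.0, 0.0], r=0.25))  # [0.0, 0.25, 0.5, 0.25, 0.0]
print(rule90_step([0, 0, 0, 1, 0, 0, 0]))            # [0, 0, 1, 0, 1, 0, 0]
```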

Mar, 17

RDMA-Based Job Migration Framework for MPI over InfiniBand

Coordinated checkpoint and recovery is a common approach to achieving fault tolerance on large-scale systems. The traditional mechanism dumps the process images of all the processes involved in the parallel job to a local disk or to a central storage area. When a failure occurs, the processes are restarted and restored to the latest checkpoint image. […]

Mar, 17

Live, Video-Rate Super-Resolution Microscopy Using Structured Illumination and Rapid GPU-Based Parallel Processing

Structured illumination fluorescence microscopy is a powerful super-resolution method that is capable of achieving a resolution below 100 nm. Each super-resolution image is computationally constructed from a set of differentially illuminated images. However, real-time application of structured illumination microscopy (SIM) has generally been limited due to the computational overhead needed to generate super-resolution images. Here, […]

CUDA

Mar, 17

Performance analysis of single-phase, multiphase, and multicomponent lattice-Boltzmann fluid flow simulations on GPU clusters

The lattice-Boltzmann method is well suited for implementation in the single-instruction multiple-data (SIMD) environments provided by general-purpose graphics processing units (GPGPUs). This paper discusses the integration of these GPGPU programs with OpenMP to create lattice-Boltzmann applications for multi-GPU clusters. In addition to the standard single-phase single-component lattice-Boltzmann method, the performance of more complex multiphase, multicomponent […]

CUDA
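The SIMD fit mentioned in the abstract above comes from locality: the collision step at each lattice node uses only that node's own distribution values. As an illustration only, here is the BGK collision for the common D2Q9 lattice in plain Python. The abstract does not say which lattice or collision operator the paper uses, so D2Q9 and the single-relaxation-time (BGK) model are assumptions; the streaming step and the OpenMP/multi-GPU decomposition the paper discusses are not shown.

```python
# Standard D2Q9 lattice: discrete velocities e_i and weights w_i.
E = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
     (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def equilibrium(rho: float, ux: float, uy: float) -> list[float]:
    """Second-order equilibrium distribution f_i^eq (lattice units, c_s^2 = 1/3)."""
    usq = ux * ux + uy * uy
    feq = []
    for (ex, ey), w in zip(E, W):
        eu = ex * ux + ey * uy
        feq.append(w * rho * (1.0 + 3.0 * eu + 4.5 * eu * eu - 1.5 * usq))
    return feq


def bgk_collide(f: list[float], tau: float) -> list[float]:
    """BGK collision at one node: relax f towards f^eq with relaxation time tau.

    Density and velocity are recovered from the node's own nine values,
    so every node can be processed independently (e.g. one GPU thread per node).
    """
    rho = sum(f)
    ux = sum(fi * ex for fi, (ex, _) in zip(f, E)) / rho
    uy = sum(fi * ey for fi, (_, ey) in zip(f, E)) / rho
    feq = equilibrium(rho, ux, uy)
    return [fi - (fi - fe) / tau for fi, fe in zip(f, feq)]
```

Because f^eq is built from the node's own density and velocity, the collision conserves mass and momentum exactly (up to rounding), which is a useful sanity check when porting such a kernel.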

Mar, 17

Memory-Scalable GPU Spatial Hierarchy Construction

Recent GPU algorithms for constructing spatial hierarchies achieve promising performance for moderately complex models by using the BFS (breadth-first search) construction order. While being able to exploit the massive parallelism on the GPU, the BFS order consumes excessive GPU memory, which becomes a serious issue. In this paper, we propose to use the PBFS (partial […]

CUDA

Mar, 17

CUDA Compatible GPU as an Efficient Hardware Accelerator for AES Cryptography

This paper presents a study of the efficiency in applying modern Graphics Processing Units in symmetric key cryptographic solutions. It describes both traditional style approaches based on the OpenGL graphics API and new ones based on the recent technology trends of major hardware vendors. It presents an efficient implementation of the Advanced Encryption Standard (AES) […]

CUDA • OpenGL

Mar, 17

High-Throughput Transaction Executions on Graphics Processors

OLTP (On-Line Transaction Processing) is an important business system sector in various traditional and emerging online services. Due to the increasing number of users, OLTP systems require high throughput for executing tens of thousands of transactions in a short time period. Encouraged by the recent success of GPGPU (General-Purpose computation on Graphics Processors), we propose […]

CUDA

Mar, 16

Mutual information computation and maximization using GPU

We present a GPU implementation to compute both mutual information and its derivatives. Mutual information computation is a highly demanding process due to the enormous number of exponential computations. It is therefore the bottleneck in many image registration applications. However, we show that these computations are fully parallelizable and can be efficiently ported onto the […]

CUDA
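Exponentials of the kind the abstract above describes typically come from Gaussian (Parzen-window) density estimates. Assuming that estimator, a minimal sketch of the computation follows: every sample pair contributes a Gaussian weight to every bin of a joint histogram, and mutual information is then computed from the normalized histogram. The bin count, the bandwidth `sigma` and intensities scaled to [0, 1] are illustrative choices, the derivatives the paper also computes are omitted, and this is not the authors' GPU implementation.

```python
import math


def parzen_joint_histogram(xs, ys, bins, sigma):
    """Soft joint histogram of two intensity sequences scaled to [0, 1].

    Every sample pair adds a Gaussian (Parzen-window) weight to every bin.
    The kernel is separable, so each sample needs 2 * bins exponentials.
    Samples are independent of each other, which makes this step parallelizable.
    """
    centers = [(k + 0.5) / bins for k in range(bins)]
    joint = [[0.0] * bins for _ in range(bins)]
    for x, y in zip(xs, ys):
        wx = [math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2)) for c in centers]
        wy = [math.exp(-((y - c) ** 2) / (2.0 * sigma ** 2)) for c in centers]
        for i in range(bins):
            for j in range(bins):
                joint[i][j] += wx[i] * wy[j]
    return joint


def mutual_information(joint):
    """I(X;Y) = sum_ij p_ij * log(p_ij / (p_i * p_j)) in nats, from a joint histogram."""
    total = sum(sum(row) for row in joint)
    px = [sum(row) / total for row in joint]
    py = [sum(col) / total for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, value in enumerate(row):
            if value > 0.0:
                pxy = value / total
                mi += pxy * math.log(pxy / (px[i] * py[j]))
    return mi


print(mutual_information([[1, 0], [0, 1]]))  # ln 2, about 0.6931
print(mutual_information([[1, 1], [1, 1]]))  # 0.0
```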

Mar, 16

Direct evaluation of NURBS curves and surfaces on the GPU

This paper presents a new method to evaluate and display trimmed NURBS surfaces using the Graphics Processing Unit (GPU). Trimmed NURBS surfaces, the de facto standard in commercial 3D CAD modeling packages, are currently tessellated into triangles before being sent to the graphics card for display since there is no native hardware support for NURBS. […]

OpenGL
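Evaluating a NURBS curve at a parameter value u means computing the B-spline basis functions N_{i,p}(u) with the Cox-de Boor recursion and taking the weighted rational combination of the control points; evaluating directly on the GPU means doing this per sample instead of pre-tessellating into triangles. The Python sketch below evaluates a 2D NURBS curve on the CPU purely to illustrate that arithmetic. It is not the paper's method and covers neither surfaces nor trimming. The quarter-circle control points and weights are the textbook example of a conic that NURBS represent exactly.

```python
import math


def basis(i, p, u, knots):
    """Cox-de Boor recursion for the B-spline basis function N_{i,p}(u)."""
    if p == 0:
        if knots[i] <= u < knots[i + 1]:
            return 1.0
        # Close the last non-empty span so that u == knots[-1] is covered.
        if u == knots[-1] and knots[i] < knots[i + 1] == knots[-1]:
            return 1.0
        return 0.0
    left = right = 0.0
    denom = knots[i + p] - knots[i]
    if denom > 0:  # terms with zero-length spans are defined as 0
        left = (u - knots[i]) / denom * basis(i, p - 1, u, knots)
    denom = knots[i + p + 1] - knots[i + 1]
    if denom > 0:
        right = (knots[i + p + 1] - u) / denom * basis(i + 1, p - 1, u, knots)
    return left + right


def nurbs_point(u, ctrl, weights, knots, p):
    """Point on a 2D NURBS curve: sum(N_i * w_i * P_i) / sum(N_i * w_i)."""
    sx = sy = wsum = 0.0
    for i, ((x, y), w) in enumerate(zip(ctrl, weights)):
        b = basis(i, p, u, knots) * w
        sx += b * x
        sy += b * y
        wsum += b
    return (sx / wsum, sy / wsum)


# Quarter of the unit circle as a degree-2 NURBS curve.
ctrl = [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
weights = [1.0, math.sqrt(2.0) / 2.0, 1.0]
knots = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
x, y = nurbs_point(0.5, ctrl, weights, knots, 2)
print(round(x, 6), round(y, 6))  # 0.707107 0.707107
```

Every evaluated point lies exactly on the unit circle, which a polynomial (non-rational) B-spline cannot achieve; this exactness is one reason NURBS are the CAD standard the abstract mentions.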



Recent source codes

SimSYCL: Synchronous, single-threaded, library-only SYCL implementation for debugging and verification

GPU plugin for PySCF

QArray

Celerity: High-level C++ for Accelerator Clusters

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

CIFAR-10 Airbench: 94% on CIFAR-10 in 3.29 seconds

LOOPer: a polyhedral compiler for expressing fast and portable data parallel algorithms

OpenMC Monte Carlo Code

Polygeist: C/C++ frontend for MLIR

Parallel Gaussian process with kernel approximation in CUDA

Most viewed papers (last 30 days)