high performance computing on graphics processing units: hgpu.org

Posts

Mar, 28

GPU architecture evaluation for multispectral and hyperspectral image analysis

Graphics processing unit (GPU) architectures are widely used for resource-intensive computation. Initially dedicated to imaging, vision and graphics, these architectures now serve a wide range of general-purpose applications. The GPU structure, however, does not suit all applications, which can lead to performance shortfalls. Among several applications, the aim of this work is to analyze GPU […]

Mar, 28

GPU accelerated real time polarimetric image processing through the use of CUDA

Recent advancements in semiconductor fabrication have led to a dramatic increase in the size of the data sets produced by advanced imaging sensors. While increased pixel counts lead to greater area coverage and higher resolution, they also result in longer image processing times. If real-time image processing is required, power and size requirements go up as large […]

CUDA

Mar, 28

GPU Based Spot Noise Parallel Algorithm for 2D Vector Field Visualization

The graphics processing unit (GPU) has evolved into a parallel computation platform thanks to its massively multithreaded architecture. Due to its high computational power, the GPU has been used to deal with many problems that can be easily parallelized. This paper will present a GPU based spot noise parallel algorithm for 2D vector field visualization. It uses spot […]

CUDA

Mar, 28

A Chunking Method for Euclidean Distance Matrix Calculation on Large Dataset Using Multi-GPU

Calculating a Euclidean distance matrix is a data-intensive operation and becomes computationally prohibitive for large datasets. Recent developments in Graphics Processing Units (GPUs) have produced superb performance on scientific computing problems using massively parallel processing cores. However, due to the limited size of device memory, many GPU based algorithms have low capability in solving problems […]
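The abstract above is truncated, so the paper's actual chunking scheme is not shown here. As a rough illustration of the general idea only, the following CPU-only Python sketch builds a distance matrix one tile at a time, where each tile stands in for a piece of work small enough to fit in a device's memory. The names `distance_block` and `chunked_distance_matrix` and the `chunk_size` parameter are hypothetical and do not come from the paper.

```python
import math


def distance_block(rows, cols):
    """Euclidean distances between every point in rows and every point in cols."""
    return [[math.dist(p, q) for q in cols] for p in rows]


def chunked_distance_matrix(points, chunk_size):
    """Assemble the full n x n matrix one tile at a time.

    Assumption: each (chunk_size x chunk_size) tile represents the amount
    of work that would fit in one device's memory in a multi-GPU setting.
    """
    n = len(points)
    matrix = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, chunk_size):
        rows = points[i0:i0 + chunk_size]
        for j0 in range(0, n, chunk_size):
            cols = points[j0:j0 + chunk_size]
            tile = distance_block(rows, cols)
            # Copy the finished tile into its place in the full matrix.
            for di, tile_row in enumerate(tile):
                matrix[i0 + di][j0:j0 + len(cols)] = tile_row
    return matrix
```

Because the tiles are independent of one another, they could in principle be assigned to different devices; how the paper distributes and schedules them is not stated in the excerpt.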

Mar, 28

GPU-Based Fast Minimum Spanning Tree Using Data Parallel Primitives

The minimum spanning tree is a classical problem in graph theory that plays a key role in a broad domain of applications. This paper proposes a minimum spanning tree algorithm using Prim’s approach on NVIDIA GPUs under the CUDA architecture. By using a newly developed GPU-based Min-Reduction data parallel primitive in the key step of the algorithm, higher […]

CUDA
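To make the role of a min-reduction in Prim's approach concrete, here is a small sequential Python sketch. It is not the paper's GPU primitive: `min_reduce` is a hypothetical stand-in that scans the key array on the CPU, which is the step a data parallel reduction would replace, and `prim_mst_weight` is an illustrative name.

```python
import math


def min_reduce(keys, in_tree):
    """Index of the cheapest vertex not yet in the tree, or -1 if none is reachable.

    Assumption: on a GPU this scan would be a parallel min-reduction.
    """
    best = -1
    for v, k in enumerate(keys):
        if in_tree[v] or k == math.inf:
            continue
        if best == -1 or k < keys[best]:
            best = v
    return best


def prim_mst_weight(n, edges):
    """Total weight of a minimum spanning tree grown from vertex 0.

    edges is a list of (u, v, weight) tuples for an undirected graph.
    """
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    keys = [math.inf] * n      # cheapest known edge into each vertex
    in_tree = [False] * n
    keys[0] = 0
    total = 0
    for _ in range(n):
        u = min_reduce(keys, in_tree)
        if u == -1:            # remaining vertices are disconnected
            break
        in_tree[u] = True
        total += keys[u]
        # Relax the edges leaving the newly added vertex.
        for v, w in adj[u]:
            if not in_tree[v] and w < keys[v]:
                keys[v] = w
    return total
```

Each iteration performs one reduction over all vertices, which is why the speed of that primitive matters for the algorithm as a whole.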

Mar, 28

A Batched GPU Algorithm for Set Intersection

Intersection of inverted lists is a frequently used operation in search engine systems. Efficient CPU and GPU intersection algorithms for large problem sizes are well studied. We propose an efficient GPU algorithm for high performance intersection of inverted index lists on the CUDA platform. This algorithm feeds queries to the GPU in batches and can thus take full […]

CUDA
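The set intersection abstract above describes intersecting inverted lists and feeding queries in batches. The sketch below shows those two ideas on the CPU in plain Python, assuming sorted posting lists and two-term queries. It is an illustration only, not the paper's algorithm, and `intersect_sorted`, `run_in_batches` and `batch_size` are hypothetical names.

```python
def intersect_sorted(a, b):
    """Merge-style intersection of two sorted lists of document IDs."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def run_in_batches(index, queries, batch_size):
    """Answer two-term queries against an inverted index, one batch at a time.

    Assumption: on a GPU, each batch would be sent to the device together
    so that many intersections run concurrently.
    """
    results = []
    for start in range(0, len(queries), batch_size):
        batch = queries[start:start + batch_size]
        for term_a, term_b in batch:
            results.append(intersect_sorted(index[term_a], index[term_b]))
    return results
```

Grouping queries this way amortizes the per-launch and transfer overhead across many small intersections, which is the usual motivation for batching work sent to an accelerator.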

Mar, 28

GMH: A Message Passing Toolkit for GPU Clusters

Driven by the market demand for high-definition 3D graphics, commodity graphics processing units (GPUs) have evolved into highly parallel, multi-threaded, many-core processors, which are ideal for data parallel computing. Many applications have been ported to run on a single GPU with tremendous speedups using general C-style programming languages such as CUDA. However, large applications require […]

CUDA

Mar, 28

Two improved GPU acceleration strategies for force-directed graph layout

The force-directed approach is one of the most widely used methods in graph drawing research. However, its running time grows intolerably as the graph size increases, which restricts the algorithm’s practicality. With the aid of the GPU (graphics processing unit) computing platform, we can speed up the graph layout at low cost, but […]

Mar, 28

Acceleration of Hessenberg Reduction for Nonsymmetric Eigenvalue Problems Using GPU

Solving large-scale dense nonsymmetric eigenvalue problems is required in many areas of scientific and engineering computing, such as vibration analysis of automobiles and analysis of electronic diffraction patterns. In this study, we focus on the Hessenberg reduction step and consider accelerating it using a GPU. Our main strategy is to use CUBLAS, an optimized […]

CUDA

Mar, 28

Efficient Discrete Range Searching primitives on the GPU with applications

Graphics processing units provide large computational power at a very low price, which positions them as a ubiquitous accelerator. Efficient primitives that can expand the range of operations performed on the GPU are thus important. Discrete Range Searching (DRS) is one such primitive with direct applications to string processing, document and text retrieval systems, and […]

Mar, 28

Graphical Processing Units (GPU) acceleration of finite-difference frequency-domain (FDFD) technique

The evolution of graphics processing units (GPUs), driven by the computer games business, has produced graphics hardware in the form of high-performance, programmable and inexpensive chips. Nowadays, the graphics card has a truly programmable architecture which allows data to be processed with high parallelism and a high memory access rate. That is the key motivation for […]

Mar, 28

Overcoming the GPU memory limitation on FDTD through the use of overlapping subgrids

The Finite Difference Time Domain (FDTD) method is widely used in electromagnetic simulations. Since this method is both data-intensive and computation-intensive, there are many initiatives to improve the scalability and performance of FDTD. Specifically, the use of GPUs to accelerate FDTD is in focus, which has a […]

Recent source codes

SimSYCL: Synchronous, single-threaded, library-only SYCL implementation for debugging and verification

GPU plugin for PySCF

QArray

Celerity: High-level C++ for Accelerator Clusters

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

CIFAR-10 Airbench: 94% on CIFAR-10 in 3.29 seconds

LOOPer: a polyhedral compiler for expressing fast and portable data parallel algorithms

OpenMC Monte Carlo Code

Polygeist: C/C++ frontend for MLIR

Parallel Gaussian process with kernel approximation in CUDA
