high performance computing on graphics processing units: hgpu.org

Posts

Nov, 7

Connectivity-Based Segmentation for GPU-Accelerated Mesh Decompression

We present a novel algorithm to partition large 3D meshes for GPU-accelerated decompression. Our formulation focuses on minimizing the replicated vertices between patches, and balancing the numbers of faces of patches for efficient parallel computing. First we generate a topology model of the original mesh and remove vertex positions. Then we assign the centers of […]

CUDA

Nov, 7

GPU Virtualization

In modern computing, the Graphical Processing Unit (GPU) has proven its worth beyond that of graphics rendering. Its usage is extended into the field of general purpose computing, where applications exploit the GPU’s massive parallelism to accelerate their tasks. Meanwhile, Virtual Machines (VM) continue to provide utility and security by emulating entire computer hardware platforms […]

CUDA

Nov, 6

Adapting Irregular Computations to Large CPU-GPU Clusters in the MADNESS Framework

Graphics Processing Units (GPUs) are becoming the workhorse of scalable computations. MADNESS is a scientific framework used especially for computational chemistry. Most MADNESS applications use operators that involve many small tensor computations, resulting in a less regular organization of computations on GPUs. A single GPU kernel may have to multiply by hundreds of small square […]

CUDA

Nov, 6

A Framework for Automated Generation of Specialized Function Variants

Efficient large-scale scientific computing requires efficient code, yet optimizing code to render it efficient simultaneously renders the code less readable, less maintainable, less portable, and requires detailed knowledge of low-level computer architecture, which the developers of scientific applications may lack. The necessary knowledge is subject to change over time as new architectures, such as GPGPU […]

CUDA

Nov, 6

All-Pairs Shortest Path Algorithms Using CUDA

Utilising graph theory is a common activity in computer science. Algorithms that perform computations on large graphs are not always cost effective, requiring supercomputers to achieve results in a practical amount of time. Graphics Processing Units provide a cost effective alternative to supercomputers, allowing parallel algorithms to be executed directly on the Graphics Processing Unit. […]

CUDA

Nov, 6

Design and Development of an Efficient H.264 Video Encoder for CPU/GPU using OpenCL

Video codecs have undergone dramatic improvements and increased in complexity over the years owing to various commercial products like mobiles and Tablet PCs. With the emergence of standards such as H.264, which has become the de facto standard for video, uniformity in the delivery of video is observed. With constraints of memory and transmission bandwidth, […]

OpenCL

Nov, 6

High-precision Monte Carlo study of the three-dimensional XY model on GPU

We perform large-scale Monte Carlo simulations of the classical XY model on a three-dimensional $L \times L \times L$ cubic lattice using the graphics processing unit (GPU). By the combination of Metropolis single-spin flip, over-relaxation and parallel-tempering methods, we simulate systems up to L=160. Performing the finite-size scaling analysis, we obtain estimates of the critical exponents […]

CUDA

Nov, 5

CUDA Programming: A Developer’s Guide to Parallel Computing with GPUs

If you need to learn CUDA but don’t have experience with parallel computing, CUDA Programming: A Developer’s Introduction offers a detailed guide to CUDA with a grounding in parallel fundamentals. It starts by introducing CUDA and bringing you up to speed on GPU parallelism and hardware, then delving into CUDA installation. Chapters on core concepts […]

CUDA

Nov, 5

A survey of GPU-based medical image computing techniques

Medical imaging currently plays a crucial role throughout the entire clinical applications from medical scientific research to diagnostics and treatment planning. However, medical imaging procedures are often computationally demanding due to the large three-dimensional (3D) medical datasets to process in practical clinical applications. With the rapidly enhancing performances of graphics processors, improved programming support, and […]

Nov, 5

Acceleration of Monte-Carlo Molecular Simulations on Hybrid Computing Architectures

Markov-Chain Monte-Carlo (MCMC) methods are an important class of simulation techniques, which execute a sequence of simulation steps, where each new step depends on the previous ones. Due to this fundamental dependency, MCMC methods are inherently hard to parallelize on any architecture. The upcoming generations of hybrid CPU/GPGPU architectures with their multi-core CPUs and tightly […]

CUDA

Nov, 5

Load-Balanced Multi-GPU Ambient Occlusion for Direct Volume Rendering

Ambient occlusion techniques were introduced to improve data comprehension by bringing soft fading shadows to the visualization of 3D datasets. They consist in attenuating light by considering the occlusion resulting from the presence of neighboring structures. Nevertheless they often come with an important precomputation cost, which prevents their use in interactive applications based on transfer […]

CUDA

Nov, 5

cphVB: A System for Automated Runtime Optimization and Parallelization of Vectorized Applications

Modern processor architectures, in addition to having still more cores, also require still more consideration to memory-layout in order to run at full capacity. The usefulness of most languages is deprecating as their abstractions, structures or objects are hard to map onto modern processor architectures efficiently. The work in this paper introduces a new abstract […]

OpenCL
