high performance computing on graphics processing units: hgpu.org

Posts

Dec, 22

Improving the usability of hierarchical representations for interactively labeling large image data sets

Image recognition systems require large image data sets for the training process. Annotating such data sets by hand takes users a lot of time and effort, and is therefore the bottleneck in the development of recognition systems. In order to simplify the creation of image recognition systems it is necessary to develop interaction concepts […]

Dec, 22

Beyond Amdahl’s Law: An Objective Function That Links Multiprocessor Performance Gains To Delay and Energy

Beginning with Amdahl’s law, we derive a general objective function that links parallel processing performance gains at the system level, to energy and delay in the sub-system microarchitecture structures. The objective function employs parameterized models of computation and communication to represent the characteristics of processors, memories, and communications networks. The interaction of the latter microarchitectural […]
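For reference, the starting point named in this abstract can be written down compactly. The block below states the textbook form of Amdahl's law only; the symbols $p$ (parallelizable fraction of the work) and $N$ (number of processors) are notation assumed here, not taken from the paper, and the paper's own objective function is not reproduced.

```latex
% Amdahl's law: speedup on N processors when a fraction p of the
% work can be parallelized and the remaining (1 - p) stays serial.
S(N) = \frac{1}{(1 - p) + \dfrac{p}{N}},
\qquad
\lim_{N \to \infty} S(N) = \frac{1}{1 - p}
```

As a quick check of the formula, $p = 0.9$ and $N = 10$ give $S = 1 / (0.1 + 0.09) \approx 5.26$, and no number of processors can push the speedup past $1 / (1 - 0.9) = 10$.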

Dec, 22

A block-asynchronous relaxation method for graphics processing units

In this paper, we analyze the potential of asynchronous relaxation methods on Graphics Processing Units (GPUs). For this purpose, we developed a set of asynchronous iteration algorithms in CUDA and compared them with a parallel implementation of synchronous relaxation methods on CPU-based systems. For a set of test matrices taken from the University of Florida […]

CUDA
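As a point of reference for the relaxation post above, the following is a minimal sketch of a synchronous Jacobi relaxation, the kind of baseline that asynchronous variants are usually compared against. It is not the paper's CUDA code: the function name `jacobi`, the fixed iteration count, and the small test system are all assumptions made for illustration.

```python
def jacobi(A, b, iterations=50):
    """Synchronous Jacobi relaxation for A x = b (illustrative sketch).

    Every component of the new iterate is computed from the previous
    iterate only, so all updates within one sweep are independent.
    """
    n = len(b)
    x = [0.0] * n  # start from the zero vector
    for _ in range(iterations):
        x_new = [0.0] * n
        for i in range(n):
            # Sum of off-diagonal terms using values from the previous sweep.
            off_diag = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x_new[i] = (b[i] - off_diag) / A[i][i]
        x = x_new  # all components advance together (the synchronous step)
    return x


# Hypothetical test system: strictly diagonally dominant, so Jacobi converges.
A = [[4.0, 1.0, 0.0],
     [1.0, 4.0, 1.0],
     [0.0, 1.0, 4.0]]
b = [6.0, 12.0, 14.0]  # chosen so that the exact solution is [1, 2, 3]
solution = jacobi(A, b)
```

In an asynchronous scheme the line `x = x_new` has no global counterpart: each component (or block of components) is updated with whatever values of the others happen to be available, which removes the synchronization point between sweeps.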

Dec, 22

AES Encryption and Decryption Using Direct3D 10 API

Current video cards (GPUs – Graphics Processing Units) are highly programmable, have become much more powerful than CPUs, and are very affordable. In this paper, we present an implementation for the AES algorithm using Direct3D 10 certified GPUs. The graphics API Direct3D 10 is the first version that allows the use of integer […]

CUDA

Dec, 22

GPU-based parallel collision detection for real-time motion planning

We present parallel algorithms to accelerate collision queries for sample-based motion planning. Our approach is designed for current many-core GPUs and exploits their data-parallelism and multi-threaded capabilities. In order to take advantage of the high number of cores, we present a clustering scheme and collision-packet traversal to perform efficient collision queries on multiple configurations simultaneously. Furthermore, […]

CUDA

Dec, 22

Efficient shallow water simulations on GPUs: Implementation, visualization, verification, and validation

In this paper, we present an efficient implementation of a state-of-the-art high-resolution explicit scheme for the shallow water equations on graphics processing units. The selected scheme is well-balanced, supports dry states, and is particularly suitable for implementation on graphics processing units. We verify and validate our implementation, and show that use of efficient single precision […]

CUDA

Dec, 22

Generative programming methods for parallel partial differential field equation solvers

This thesis describes a generative programming system that automatically constructs parallel simulations of complex systems that are based on field equations using finite differencing and explicit Runge-Kutta integration methods. Programming computational simulations by hand for different parallel architectures is both tedious and time consuming. Simulation frameworks struggle to target different architectures without losing performance. Automating […]

CUDA

Dec, 22

A GPU framework for parallel segmentation of volumetric images using discrete deformable model

Despite the ability of current GPUs to handle heavy parallel computation tasks, their use for solving medical image segmentation problems is still not fully exploited and remains challenging. Many difficulties may arise, related to, for example, the different image modalities, noise and artifacts of source images, or the shape and appearance variability […]

CUDA

Dec, 22

Extending adaptive sparse grids for stochastic collocation to hybrid parallel architectures

We are developing an adaptive sparse grid library tailored for emerging architectures that will allow the solution of stochastic problems of unprecedented size. This paper gives a brief overview of the problem at hand and presents initial results for a small GPU-based cluster. An outlook on large-scale distributed memory parallelization and our hybrid design approach […]

CUDA

Dec, 22

Solving Quadratic Programming Problems on Graphics Processing Unit

Quadratic Programming (QP) problems frequently appear as a core component when solving constrained optimal control or estimation problems. The focus of this paper is on accelerating an existing Interior Point Method (IPM) for solving QP problems by exploiting the parallel computing characteristics of the GPU. We compare the so-called data-parallel and the problem-parallel approaches to achieve speed […]

CUDA

Dec, 21

Porting and optimizing MAGFLOW on CUDA

The MAGFLOW lava simulation model is a cellular automaton developed by the Sezione di Catania of the Istituto Nazionale di Geofisica e Vulcanologia (INGV) and it represents the peak of the evolution of cell-based models for lava-flow simulation. The accuracy and adherence to reality achieved by the physics-based cell evolution of MAGFLOW comes at the […]

CUDA

Dec, 21

Computational Fluid Dynamics Simulations using Many Graphics Processors

Unsteady computational fluid dynamics simulations of turbulence are performed using up to 64 graphics processors. The results from two GPU clusters and a CPU cluster are compared. A second-order staggered-mesh spatial discretization is coupled with a low storage three-step Runge-Kutta time advancement and pressure projection at each substep. The pressure Poisson equation dominates the solution […]

CUDA

Recent source codes

OpScanner

Atlas CLI: Machine Learning (ML) Lifecycle & Transparency Manager

transformers_tvm: Implementation of Encoder Decoder transformer on TVM

INT v.s. FP: A framework to compare low-bit integer and float-point formats

AutoDock-GPU: AutoDock for GPUs and other accelerators

NCCLX: collective communication framework

Tutoring LLM into a Better CUDA Optimizer

Adaptivity in AdaptiveCpp: Optimizing Performance by Leveraging Runtime Information During JIT-Compilation

Kernel Library for LLM Serving

Neptune: Advanced ML Operator Fusion for Locality and Parallelism on GPUs

Most viewed papers (last 30 days)