high performance computing on graphics processing units: hgpu.org

Posts

Nov, 8

Accelerating incompressible flow computations with a Pthreads-CUDA implementation on small-footprint multi-GPU platforms

Graphics processing units (GPUs), originally designed for graphics rendering, have emerged as massively parallel “co-processors” to the central processing unit (CPU). Small-footprint multi-GPU workstations with hundreds of processing elements can substantially accelerate compute-intensive simulation science applications. In this study, we describe the implementation of an incompressible-flow Navier-Stokes solver for multi-GPU workstation platforms. A […]

CUDA
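The abstract above does not say how the flow domain is split across GPUs. A common pattern for Pthreads-plus-CUDA codes is one host thread per GPU, each owning a slab of the grid and exchanging one layer of ghost (halo) cells with its neighbours after every step. The sketch below assumes that pattern only for illustration: Python threads stand in for per-GPU host threads, a 1D Jacobi smoother stands in for the Navier-Stokes solver, and `jacobi_serial` / `jacobi_decomposed` are hypothetical names.

```python
import threading


def jacobi_serial(u, steps):
    """Reference: Jacobi smoothing of a 1D field with fixed end values."""
    u = list(u)
    for _ in range(steps):
        u = [u[0]] + [0.5 * (u[i - 1] + u[i + 1]) for i in range(1, len(u) - 1)] + [u[-1]]
    return u


def jacobi_decomposed(u, steps, n_parts):
    """Same update with the interior split into n_parts slabs.

    Each thread stands in for one GPU: it updates its own slab, publishes it to a
    shared host array, then refreshes its two ghost cells from its neighbours.
    """
    n = len(u)
    if n - 2 < n_parts:
        raise ValueError("each slab needs at least one interior cell")
    bounds = [1 + k * (n - 2) // n_parts for k in range(n_parts + 1)]
    shared = list(u)                       # host-side copy of the global field
    barrier = threading.Barrier(n_parts)

    def worker(k):
        lo, hi = bounds[k], bounds[k + 1]  # this slab owns cells lo..hi-1
        local = shared[lo - 1:hi + 1]      # owned cells plus one ghost cell per side
        for _ in range(steps):
            new = [0.5 * (local[i - 1] + local[i + 1]) for i in range(1, len(local) - 1)]
            local[1:-1] = new
            barrier.wait()                 # every slab has finished computing
            shared[lo:hi] = new            # publish owned cells
            barrier.wait()                 # every slab has published
            local[0], local[-1] = shared[lo - 1], shared[hi]  # halo exchange

    threads = [threading.Thread(target=worker, args=(k,)) for k in range(n_parts)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared
```

Both functions return identical lists, because every slab performs the same floating-point operations on the same inputs; the two barriers keep a fast slab from overwriting cells that a neighbour has not yet read.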

Nov, 8

A GPU-based matting Laplacian solver for high resolution image matting

The recently proposed matting Laplacian (Levin et al., IEEE Trans. Pattern Anal. Mach. Intell. 30(2):228-242, 2008) has proven to be a state-of-the-art method for solving the image matting problem. With this method, matting is formulated as solving a high-order linear system that is hard-constrained by the input trimap. The main drawback of this method, […]

CUDA

Nov, 8

Parallel Iterative Linear Solvers on GPU: A Financial Engineering Case

In many numerical applications arising from computational science and engineering problems, solving sparse linear systems is the most compute-intensive task, often prohibitively so. Consequently, linear solvers must be chosen carefully and implemented efficiently to harness the available computing resources. Krylov-subspace iterative solvers have been widely used for solving […]

CUDA
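Conjugate gradient (CG) is the simplest Krylov-subspace solver and is shown here only as an illustration: it requires a symmetric positive-definite matrix, and the abstract above does not say which Krylov method or matrix class the financial application uses (nonsymmetric systems call for methods such as BiCGStab or GMRES). The sketch stores the matrix in compressed sparse row (CSR) form, a layout commonly used by GPU sparse kernels; all helper names are hypothetical.

```python
def csr_matvec(indptr, indices, data, x):
    """y = A x for A stored in compressed sparse row (CSR) form."""
    return [sum(data[k] * x[indices[k]] for k in range(indptr[i], indptr[i + 1]))
            for i in range(len(indptr) - 1)]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def conjugate_gradient(indptr, indices, data, b, tol=1e-10, max_iter=1000):
    """Unpreconditioned CG for a symmetric positive-definite CSR matrix."""
    x = [0.0] * len(b)
    r = list(b)                 # residual b - A x for the starting guess x = 0
    p = list(r)                 # first search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:
            break
        Ap = csr_matvec(indptr, indices, data, p)
        alpha = rs / dot(p, Ap)                     # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]  # A-conjugate update
        rs = rs_new
    return x


def laplacian_1d_csr(n):
    """Test matrix: tridiag(-1, 2, -1) of size n, which is SPD."""
    indptr, indices, data = [0], [], []
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                indices.append(j)
                data.append(v)
        indptr.append(len(indices))
    return indptr, indices, data
```

Each iteration costs one sparse matrix-vector product plus a few vector updates and dot products, which is why GPU implementations of Krylov solvers concentrate on the sparse product and on reductions.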

Nov, 8

Parallel medical image reconstruction: from graphics processing units (GPU) to Grids

We present and compare a variety of parallelization approaches for a real-world case study on modern parallel and distributed computer architectures. Our case study is a production-quality, time-intensive algorithm for medical image reconstruction used in positron emission tomography (PET). We parallelize this algorithm for the main kinds of contemporary parallel architectures: shared-memory multiprocessors, distributed-memory clusters, graphics […]

CUDA
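The abstract above does not name the reconstruction algorithm. PET images are commonly reconstructed with expectation-maximisation methods, so the sketch below assumes the basic MLEM update, x_j <- (x_j / s_j) * sum_i a_ij * y_i / (sum_l a_il * x_l), with sensitivity s_j = sum_i a_ij; OSEM applies the same update to subsets of the measurements in turn. The tiny dense system matrix is hypothetical; real scanners use sparse or on-the-fly projectors.

```python
def mlem(A, y, n_iter):
    """Maximum-likelihood expectation maximisation for emission tomography.

    A[i][j]: probability that an emission in voxel j is detected in line of response i.
    y[i]:    measured counts in line of response i.
    """
    n_lor, n_vox = len(A), len(A[0])
    sens = [sum(A[i][j] for i in range(n_lor)) for j in range(n_vox)]  # s_j
    x = [1.0] * n_vox                       # uniform, strictly positive start image
    for _ in range(n_iter):
        # forward projection: expected counts for the current image
        proj = [sum(A[i][j] * x[j] for j in range(n_vox)) for i in range(n_lor)]
        ratio = [y[i] / proj[i] if proj[i] > 0 else 0.0 for i in range(n_lor)]
        # back projection of measured/expected ratios
        back = [sum(A[i][j] * ratio[i] for i in range(n_lor)) for j in range(n_vox)]
        x = [x[j] * back[j] / sens[j] if sens[j] > 0 else 0.0 for j in range(n_vox)]
    return x
```

The forward and back projections are independent per line of response and per voxel respectively, which is what makes this family of algorithms amenable to the shared-memory, cluster and GPU parallelizations the abstract compares.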

Nov, 8

Power Efficient Large Matrices Multiplication by Load Scheduling on Multi-core and GPU Platform with CUDA

Power efficiency is one of the most important issues in high performance computing (HPC), and it depends on both software and hardware. The power dissipation of a program depends on the algorithm design and on the power characteristics of the computer components on which the program runs. In this work, we measure and model the power consumption of large-matrix multiplication […]

CUDA
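The abstract above does not describe the scheduling policy. One simple static form, assumed here purely for illustration, gives each processor a contiguous block of rows of C = A B in proportion to its measured throughput, so that both finish at about the same time and neither sits idle (while still drawing power) waiting for the other. The throughput values and function names are hypothetical.

```python
def split_rows(n_rows, rates):
    """Contiguous row blocks sized in proportion to each device's throughput."""
    total = sum(rates)
    counts = [int(n_rows * r / total) for r in rates]
    counts[-1] += n_rows - sum(counts)  # rounding remainder goes to the last device
    bounds, start = [], 0
    for c in counts:
        bounds.append((start, start + c))
        start += c
    return bounds


def matmul_rows(A, B, lo, hi):
    """Rows lo..hi-1 of C = A B, naive triple loop (stands in for a per-device kernel)."""
    n_inner, n_cols = len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n_inner)) for j in range(n_cols)]
            for i in range(lo, hi)]


def matmul_partitioned(A, B, rates):
    """Assemble C block by block; a real code would run the blocks concurrently."""
    C = []
    for lo, hi in split_rows(len(A), rates):
        C.extend(matmul_rows(A, B, lo, hi))
    return C
```

With a hypothetical CPU:GPU throughput ratio of 1:4, `split_rows(10, [1.0, 4.0])` assigns rows 0-1 to the CPU and rows 2-9 to the GPU.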

Nov, 8

GPU-based Real-Time Soft Tissue Deformation with Cutting and Haptic Feedback

This article describes a series of contributions in the field of real-time simulation of soft tissue biomechanics. These contributions address various requirements for interactive simulation of complex surgical procedures. In particular, this article presents results in the areas of soft tissue deformation, contact modelling, simulation of cutting, and haptic rendering, which are all relevant to […]

Nov, 8

Phase diagram and critical behavior of the square-lattice Ising model with competing nearest- and next-nearest-neighbor interactions

Using the parallel tempering algorithm and GPU-accelerated techniques, we have performed large-scale Monte Carlo simulations of the Ising model on a square lattice with antiferromagnetic (repulsive) nearest-neighbor (NN) and next-nearest-neighbor (NNN) interactions of the same strength and subject to a uniform magnetic field. Both transitions from the (2×1) and row-shifted (2×2) ordered phases to the paramagnetic […]

CUDA
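As a serial sketch of the two ingredients named in the abstract above, the code assumes the Hamiltonian H = J1 sum_NN s_i s_j + J2 sum_NNN s_i s_j - h sum_i s_i with J1 = J2 > 0 (antiferromagnetic, equal strength), periodic boundaries, random-site Metropolis updates and replica exchange between neighbouring temperatures. Flipping spin i changes the energy by dE = 2 s_i (h - F_i), where F_i is the coupling-weighted sum of its neighbours. The paper's GPU update scheme is not described in the abstract, and the function names are hypothetical.

```python
import math
import random

NN = [(1, 0), (-1, 0), (0, 1), (0, -1)]      # nearest neighbours
NNN = [(1, 1), (1, -1), (-1, 1), (-1, -1)]   # next-nearest (diagonal) neighbours


def local_field(s, L, x, y, J1, J2):
    """J1 * (sum of NN spins) + J2 * (sum of NNN spins), periodic boundaries."""
    nn = sum(s[(x + dx) % L][(y + dy) % L] for dx, dy in NN)
    nnn = sum(s[(x + dx) % L][(y + dy) % L] for dx, dy in NNN)
    return J1 * nn + J2 * nnn


def energy(s, L, J1, J2, h):
    """H = J1 sum_NN s_i s_j + J2 sum_NNN s_i s_j - h sum_i s_i (each bond counted once)."""
    return sum(0.5 * s[x][y] * local_field(s, L, x, y, J1, J2) - h * s[x][y]
               for x in range(L) for y in range(L))


def metropolis_sweep(s, L, beta, J1, J2, h, rng):
    """L*L single-spin-flip Metropolis attempts at inverse temperature beta."""
    for _ in range(L * L):
        x, y = rng.randrange(L), rng.randrange(L)
        dE = 2 * s[x][y] * (h - local_field(s, L, x, y, J1, J2))
        if dE <= 0 or rng.random() < math.exp(-beta * dE):
            s[x][y] = -s[x][y]


def parallel_tempering(L, betas, J1, J2, h, n_sweeps, seed=0):
    """One replica per inverse temperature; after each sweep, neighbouring replicas
    swap with probability min(1, exp[(b_k - b_k+1) * (E_k - E_k+1)])."""
    rng = random.Random(seed)
    reps = [[[rng.choice((-1, 1)) for _ in range(L)] for _ in range(L)] for _ in betas]
    for _ in range(n_sweeps):
        for s, beta in zip(reps, betas):
            metropolis_sweep(s, L, beta, J1, J2, h, rng)
        E = [energy(s, L, J1, J2, h) for s in reps]
        for k in range(len(betas) - 1):
            arg = (betas[k] - betas[k + 1]) * (E[k] - E[k + 1])
            if arg >= 0 or rng.random() < math.exp(arg):
                reps[k], reps[k + 1] = reps[k + 1], reps[k]
                E[k], E[k + 1] = E[k + 1], E[k]
    return reps
```

One design consequence worth noting for a GPU version: with diagonal couplings, a two-colour checkerboard no longer separates interacting spins, so simultaneous updates need four sublattices, (x mod 2, y mod 2).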

Nov, 8

High-performance cone beam reconstruction using CUDA compatible GPUs

Compute Unified Device Architecture (CUDA) is a software development platform that allows C-like programs to run on NVIDIA graphics processing units (GPUs). This paper presents a method for accelerating cone beam reconstruction using CUDA-compatible GPUs. The proposed method accelerates the Feldkamp, Davis, and Kress (FDK) algorithm using three techniques: (1) off-chip memory […]

CUDA
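The three GPU techniques are cut off in the abstract above, but the FDK algorithm they accelerate has a standard structure: cosine-weight each projection, ramp-filter it row by row, then backproject with a distance weight. The sketch covers only the first two steps, assuming a flat detector centred on the central ray, detector coordinates (u, v) and source distance D in consistent units, and the Ram-Lak kernel h[0] = 1/(4 tau^2), h[n] = 0 for even n != 0, h[n] = -1/(n^2 pi^2 tau^2) for odd n, with tau the detector sample spacing. The function names are hypothetical.

```python
import math


def cosine_weight(proj, du, dv, D):
    """FDK pre-weighting: scale detector sample (u, v) by D / sqrt(D^2 + u^2 + v^2).

    proj[iv][iu] is one cone-beam projection; the detector centre is the central ray.
    """
    nv, nu = len(proj), len(proj[0])
    out = []
    for iv in range(nv):
        v = (iv - (nv - 1) / 2) * dv
        row = []
        for iu in range(nu):
            u = (iu - (nu - 1) / 2) * du
            row.append(proj[iv][iu] * D / math.sqrt(D * D + u * u + v * v))
        out.append(row)
    return out


def ramlak_kernel(n, tau):
    """Spatial-domain Ram-Lak (ramp) filter sample at offset n."""
    if n == 0:
        return 1.0 / (4.0 * tau * tau)
    if n % 2 == 0:
        return 0.0
    return -1.0 / (n * n * math.pi ** 2 * tau * tau)


def ramp_filter_row(row, tau):
    """Direct convolution along one detector row: q[n] = tau * sum_k h[n - k] * p[k]."""
    N = len(row)
    return [tau * sum(ramlak_kernel(n - k, tau) * row[k] for k in range(N))
            for n in range(N)]
```

The ramp kernel sums to zero, so a constant row filters to nearly zero away from its edges. Direct convolution costs O(N^2) per row; production codes usually filter with FFTs, and GPU implementations mainly target the backprojection step, which typically dominates the run time.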

Nov, 8

Accelerating glassy dynamics using graphics processing units

Modern graphics hardware offers peak performance close to 1 Tflop/s, and NVIDIA’s CUDA provides a flexible and convenient programming interface for exploiting these immense computing resources. We demonstrate the ability of GPUs to perform high-precision molecular dynamics simulations of nearly a million particles that run stably over many days. Particular emphasis is put on the numerical […]

CUDA
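The abstract above stresses stable runs over many days. Velocity Verlet is a widely used time-reversible, symplectic molecular dynamics integrator whose energy error, in exact arithmetic, oscillates rather than drifts; in practice floating-point rounding, especially in single precision, can still introduce a slow drift. We assume it here only as a representative scheme, since the abstract does not name the integrator; the harmonic test force is hypothetical, where an MD code would compute pair forces (e.g. Lennard-Jones) on the GPU.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Advance positions x and velocities v (lists, one entry per degree of freedom)."""
    f = force(x)
    for _ in range(n_steps):
        v = [vi + 0.5 * dt * fi / mass for vi, fi in zip(v, f)]  # half kick
        x = [xi + dt * vi for xi, vi in zip(x, v)]              # drift
        f = force(x)                                              # forces at new positions
        v = [vi + 0.5 * dt * fi / mass for vi, fi in zip(v, f)]  # second half kick
    return x, v


def harmonic_force(x, k=1.0):
    """Test force for a unit harmonic oscillator, F = -k x."""
    return [-k * xi for xi in x]
```

For a unit oscillator started at x = 1, v = 0 with dt = 0.01, the total energy 0.5 v^2 + 0.5 x^2 stays close to 0.5 even after 10,000 steps, which is the kind of long-run behaviour the abstract relies on.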

Nov, 8

The CUBLAS and CULA based GPU acceleration of adaptive finite element framework for bioluminescence tomography

In molecular imaging (MI), especially optical molecular imaging, bioluminescence tomography (BLT) has emerged as an effective modality for small-animal imaging. Finite element methods (FEMs), especially the adaptive finite element (AFE) framework, play an important role in BLT. The processing speed of FEMs and the AFE framework still needs to be improved, […]

CUDA

Nov, 8

Parallelized computation for computer simulation of electrocardiograms using personal computers with multi-core CPU and general-purpose GPU

Biological computations such as electrocardiological modelling and simulation usually require high-performance computing environments. This paper introduces a parallel implementation of computer simulation of electrocardiograms (ECGs) on a personal computer with an Intel Core 2 Quad Q6600 CPU and a GeForce 8800 GT GPU, using OpenMP and CUDA. It […]

CUDA

Nov, 8

Theory of square, rectangular, and microband electrodes through explicit GPU simulation

The use of microband electrodes in electrochemistry has expanded in recent years due to enhanced current densities, ease of fabrication, and available theory. Using explicit three-dimensional finite-difference simulation on the GPU, this paper simulates mass transport to square and rectangular (finite band) microelectrodes and quantifies the response of a finite band of any given length to […]
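The paper's simulations are three-dimensional. As a minimal one-dimensional analogue, the sketch below applies the explicit forward-time, centred-space (FTCS) scheme to diffusion towards a planar electrode after a diffusion-limited potential step, and checks the computed surface flux against the Cottrell result, flux = c * sqrt(D / (pi t)). Dimensionless units and the function names are assumptions. The explicit scheme is stable only for lam = D dt / dx^2 <= 1/2 in 1D (<= 1/6 on a uniform 3D grid), which is what limits the time step of explicit GPU solvers.

```python
import math


def simulate_chronoamperometry(D, dx, lam, t_end, n_cells):
    """Explicit FTCS finite differences for 1D diffusion to a planar electrode.

    Cell 0 is the electrode surface held at c = 0 (diffusion-limited potential step);
    the far boundary stays at the bulk value c = 1. Returns (elapsed time, surface flux).
    """
    if lam > 0.5:
        raise ValueError("explicit 1D scheme is unstable for D*dt/dx^2 > 1/2")
    dt = lam * dx * dx / D
    c = [0.0] + [1.0] * (n_cells - 1)
    n_steps = round(t_end / dt)
    for _ in range(n_steps):
        c = ([c[0]]
             + [c[i] + lam * (c[i + 1] - 2 * c[i] + c[i - 1]) for i in range(1, n_cells - 1)]
             + [c[-1]])
    flux = D * (c[1] - c[0]) / dx   # first-order difference at the electrode surface
    return n_steps * dt, flux


def cottrell_flux(D, t, c_bulk=1.0):
    """Analytical flux to a planar electrode after a potential step."""
    return c_bulk * math.sqrt(D / (math.pi * t))
```

The domain must be several diffusion lengths, sqrt(D t), deep so that the fixed far boundary does not influence the surface flux; with dx = 0.01 and 200 cells it is about 20 diffusion lengths at t = 0.01.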
