high performance computing on graphics processing units: hgpu.org

Posts

Jan, 14

Efficient spectral and pseudospectral algorithms for 3D simulations of whistler-mode waves in a plasma

Efficient spectral and pseudospectral algorithms for simulation of linear and nonlinear 3D whistler waves in a cold electron plasma are developed. These algorithms are applied to the simulation of whistler waves generated by loop antennas and spheromak-like stationary waves of considerable amplitude. The algorithms are linearly stable and show good stability properties for computations of […]

Jan, 14

GPU-Friendly Multi-View Stereo Reconstruction Using Surfel Representation and Graph Cuts

In this paper, we present a new surfel (surface element) based multi-view stereo algorithm that runs entirely on GPU. We utilize the flexibility of surfel-based 3D shape representation and global optimization by graph cuts in the same framework. Unlike previous works, the algorithm is optimized to massive parallel processing on GPU. First, we construct surfel […]

CUDA

Jan, 14

FSAI preconditioned CG algorithm combined with GPU technique for the finite element analysis of electromagnetic scattering problems

In order to efficiently solve the large sparse complex linear system arising from the vector finite element method (vector FEM) in electromagnetic scattering problems, the factorized sparse approximate inverse (FSAI) algorithm and the programmable graphics processing unit (GPU) are employed in the context of the conjugate gradient (CG) iterative method. The combination of the FSAI […]

Jan, 14

Parallel Deblocking Filtering in MPEG-4 AVC/H.264 on Massively-Parallel Architectures

The deblocking filter in the MPEG-4 AVC/H.264 standard is computationally complex because of its high content adaptivity, resulting in a significant number of data dependencies. These data dependencies interfere with parallel filtering of multiple macroblocks on massively-parallel architectures. In this paper, we introduce a novel macroblock partitioning scheme for concurrent deblocking in the MPEG-4 AVC/H.264 […]

CUDA

Jan, 14

Parallel Processing of the Building-Cube Method on a GPU Platform

The Building-Cube Method (BCM) based on equally-spaced Cartesian meshes has been proposed as a next generation CFD method. Due to the equally-spaced meshes, it is well suited for highly parallel computation. This paper proposes a parallel implementation scheme of BCM on a GPU cluster system, which needs efficient hierarchical parallel processing to exploit the potential […]

Jan, 13

Taming irregular EDA applications on GPUs

Recently general purpose computing on graphic processing units (GPUs) is rising as an exciting new trend in high-performance computing. Thus it is appealing to study the potential of GPU for Electronic Design Automation (EDA) applications. However, EDA generally involves irregular data structures such as sparse matrix and graph operations, which pose significant challenges for efficient […]

Jan, 13

Simulating Lattice Spin Models on Graphics Processing Units

Lattice spin models are useful for studying critical phenomena and allow the extraction of equilibrium and dynamical properties. Simulations of such systems are usually based on Monte Carlo (MC) techniques, and the main difficulty is often the large computational effort needed when approaching critical points. In this work, it is shown how such simulations can […]

CUDA

Jan, 13

GPU accelerated biochemical network simulation

MOTIVATION: Mathematical modelling is central to systems and synthetic biology. Using simulations to calculate statistics or to explore parameter space is a common means for analysing these models and can be computationally intensive. However, in many cases, the simulations are easily parallelizable. Graphics processing units (GPUs) are capable of efficiently running highly parallel programs and […]

CUDA

Jan, 13

Cardiac simulation on multi-GPU platform

The cardiac bidomain model is a popular approach to study electrical behavior of tissues and simulate interactions between the cells by solving partial differential equations. The iterative and data parallel model is an ideal match for the parallel architecture of Graphic Processing Units (GPUs). In this study, we evaluate the effectiveness of architecture-specific optimizations and […]

Jan, 13

190 TFlops Astrophysical N-body Simulation on a Cluster of GPUs

We present the results of a hierarchical N-body simulation on DEGIMA, a cluster of PCs with 576 graphic processing units (GPUs) and using an InfiniBand interconnect. DEGIMA stands for DEstination for GPU Intensive MAchine, and is located at Nagasaki Advanced Computing Center (NACC), Nagasaki University. In this work, we have upgraded DEGIMA's interconnect using InfiniBand. […]

CUDA

Jan, 13

Fitting Galaxies on GPUs

Structural parameters are normally extracted from observed galaxies by fitting analytic light profiles to the observations. Obtaining accurate fits to high-resolution images is a computationally expensive task, requiring many model evaluations and convolutions with the imaging point spread function. While these algorithms contain high degrees of parallelism, current implementations do not exploit this property. With […]

CUDA

Jan, 13

Hardware-Assisted Projected Tetrahedra

We present a flexible and highly efficient hardware-assisted volume renderer grounded on the original Projected Tetrahedra (PT) algorithm. Unlike recent similar approaches, our method is exclusively based on the rasterization of simple geometric primitives and takes full advantage of graphics hardware. Both vertex and geometry shaders are used to compute the tetrahedral projection, while the […]

CUDA

Recent source codes

CudaForge: An Agent Framework with Hardware Feedback for CUDA Kernel Optimization

LC Framework

pplx-garden: Perplexity open source garden for inference technology

Atlas CLI: Machine Learning (ML) Lifecycle & Transparency Manager

transformers_tvm: Implementation of Encoder Decoder transformer on TVM

OpScanner

INT v.s. FP: A framework to compare low-bit integer and float-point formats

AutoDock-GPU: AutoDock for GPUs and other accelerators

NCCLX: collective communication framework

Tutoring LLM into a Better CUDA Optimizer
