high performance computing on graphics processing units: hgpu.org

Posts

Nov, 15

Efficient Graph Comparison and Visualization Using GPU

This paper presents the application of several graph algorithms for comparison and visualization of real-world networks. In order to obtain an interactive and robust framework for analysis of large graphs, we use CUDA implementations of all-pairs shortest-paths (APSP) and breadth-first-search (BFS) algorithms along with CULA matrix decomposition routines. Such an approach allows for efficient computation of graph feature […]

CUDA

Nov, 14

A capabilities-aware framework for using computational accelerators in data-intensive computing

Multicore computational accelerators such as GPUs are now commodity components for high-performance computing at scale. While such accelerators have been studied in some detail as stand-alone computational engines, their integration in large-scale distributed systems raises new challenges and trade-offs. In this paper, we present an exploration of resource management alternatives for building asymmetric accelerator-based distributed […]

CUDA

Nov, 14

Large Scale Plane Wave Pseudopotential Density Functional Theory Calculations on GPU Clusters

In this work, we present our implementation of the density functional theory (DFT) plane wave pseudopotential (PWP) calculations on GPU clusters. This GPU version is developed based on a CPU DFT-PWP code: PEtot, which can calculate up to a thousand atoms on thousands of processors. Our test indicates that the GPU version can have a […]

CUDA

Nov, 14

Toward improved aeromechanics simulations using recent advancements in scientific computing

The proposed paper will present details on recent advancements in scientific computing in terms of integrating new hardware and software to greatly enhance the computational efficiency of comprehensive rotorcraft analysis. The focus will be on showing the tremendous computational accelerations that are possible (i.e., orders of magnitude speed up) by using software developments in the […]

CUDA

Nov, 14

Solving Incompressible Two-Phase Flows on Massively Parallel Multi-GPU Clusters

We present a fully multi-GPU-based double-precision solver for the three-dimensional two-phase incompressible Navier-Stokes equations. An in-depth performance analysis shows a realistic speed-up of the order of three by comparing equally priced GPUs and CPUs and more than a doubling in energy efficiency for GPUs. We observe profound strong and weak scaling on a multi-GPU cluster.

CUDA

Nov, 14

Pseudoscalar Meson in Two Flavors QCD with the Optimal Domain-Wall Fermion

We perform hybrid Monte Carlo (HMC) simulations of two flavors QCD with the optimal domain-wall fermion (ODWF) on the $ 16^3 \times 32 $ lattice (with lattice spacing $ a \sim 0.1 $ fm), for eight sea-quark masses corresponding to pion masses in the range 230-580 MeV. We calculate the mass and the decay constant […]

CUDA

Nov, 14

Multi GPU Implementation of the Simplex Algorithm

The Simplex algorithm is a well-known method to solve linear programming (LP) problems. In this paper, we propose an implementation via CUDA of the Simplex method on a multi GPU architecture. Computational tests have been carried out on randomly generated instances for non-sparse LP problems. The tests show a maximum speedup of 24.5 with […]

CUDA

Nov, 14

GPU-accelerated power pattern synthesis of aperiodic linear arrays

We deal with the development of a computationally effective approach for the synthesis of equivalently tapered, aperiodic linear arrays, i.e. arrays matching the requirements on the power pattern by acting only on the element positions and excitation phases. The computational effectiveness of the algorithm is reached by the development of a parallel Non Uniform Fast […]

CUDA

Nov, 14

AVSS2011 demo session: GPU enabled Smart Video Node

This paper presents an All-in-One video analytics system, a compact, multi-channel, real-time, video monitoring, event detection, alarm notification, event recording and browsing solution implemented on low cost hardware, taking advantage of NVIDIA’s GPU CUDA platform. An inventive distribution of video object detection and tracking processing chain between the GPUs and the CPU provides maximum efficiency […]

CUDA

Nov, 14

Seismic Wave Propagation Simulation Using Accelerated Support Operator Rupture Dynamics on Multi-GPU

The Support Operator Method (SOM) is a numerical method based on the finite difference method, and Support Operator Rupture Dynamics (SORD) is an application built on it. SORD can simulate 3D elastic wave propagation and spontaneous rupture on a hexahedral mesh, and it can be applied to various surface boundary conditions. The original application […]

CUDA

Nov, 14

Fast RCS prediction using multiresolution shooting and bouncing ray method on the GPU

This paper presents a GPU-based multiresolution shooting and bouncing ray (MSBR) method with the kd-tree acceleration structure for the fast radar cross section (RCS) prediction of electrically large and complex targets. The multiresolution grid algorithm can greatly reduce the total number of ray tubes, as it adaptively adjusts the density of ray tubes for regions […]

CUDA

Nov, 14

Efficient Implementation of the Simplex Method on a CPU-GPU System

The Simplex algorithm is a well-known method to solve linear programming (LP) problems. In this paper, we propose a parallel implementation of the Simplex method on a CPU-GPU system via CUDA. Double precision implementation is used in order to improve the quality of solutions. Computational tests have been carried out on randomly generated instances for […]

CUDA
