high performance computing on graphics processing units: hgpu.org

Posts

Nov, 15

High-performance Blob-based iterative reconstruction of electron tomography on multi-GPUs

Three-dimensional (3D) reconstruction of electron tomography (ET) has emerged as a leading technique to elucidate the molecular structures of complex biological specimens. Blob-based iterative methods are advantageous for 3D reconstruction of ET, but incur huge computational costs. Multiple graphics processing units (multi-GPUs) offer an affordable platform to meet these demands but, nevertheless, are not […]

CUDA

Nov, 15

An Ultrafast Scalable Many-core Motif Discovery Algorithm for Multiple GPUs

The identification of genome-wide transcription factor binding sites is a fundamental and crucial problem in fully understanding transcriptional regulatory processes. However, the high computational cost of many motif discovery algorithms heavily constrains their application to large-scale datasets. The rapid growth of genomic sequences and gene transcription data further worsens the situation and establishes a […]

CUDA

Nov, 15

Design of MILC Lattice QCD Application for GPU Clusters

We present an implementation of the improved staggered quark action lattice QCD computation designed for execution on a GPU cluster. The parallelization strategy is based on dividing the space-time lattice along the time dimension and distributing the sub-lattices among the GPU cluster nodes. We provide a mixed-precision floating-point GPU implementation of the multi-mass conjugate gradient […]

CUDA
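
The time-dimension decomposition mentioned in the MILC entry above can be illustrated with a minimal sketch. It is not taken from the paper: the function names `split_time_dimension` and `time_neighbours` are hypothetical, and the even block split and the periodic wrap-around in time are assumptions made only for illustration.

```python
# Illustrative sketch only: names and the periodic wrap-around are assumptions,
# not details taken from the MILC GPU implementation.

def split_time_dimension(nt, n_nodes):
    """Split time slices 0..nt-1 into n_nodes contiguous [start, stop) blocks.

    Any remainder is spread over the first nodes so block sizes differ by at most 1.
    """
    if n_nodes <= 0 or nt < n_nodes:
        raise ValueError("need at least one time slice per node")
    base, extra = divmod(nt, n_nodes)
    blocks = []
    start = 0
    for rank in range(n_nodes):
        size = base + (1 if rank < extra else 0)
        blocks.append((start, start + size))
        start += size
    return blocks


def time_neighbours(rank, n_nodes):
    """Return the (backward, forward) neighbour ranks along the time direction.

    Assumes a periodic lattice in time, so the first and last nodes are neighbours.
    """
    return ((rank - 1) % n_nodes, (rank + 1) % n_nodes)


if __name__ == "__main__":
    for rank, (t0, t1) in enumerate(split_time_dimension(64, 4)):
        print(rank, (t0, t1), time_neighbours(rank, 4))
```

Under this kind of split, each node would only need to exchange its first and last time slices with its two neighbours, which is one reason a one-dimensional cut is a simple starting point.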

Nov, 15

GStream: A General-Purpose Data Streaming Framework on GPU Clusters

Emerging accelerating architectures, such as GPUs, have proved successful in providing significant performance gains to various application domains. However, their viability to operate on general streaming data is still ambiguous. In this paper, we propose GStream, a general-purpose, scalable data streaming framework on GPUs. The contributions of GStream are as follows: (1) We provide powerful, […]

CUDA

Nov, 15

An MPI-CUDA implementation of an improved Roe method for two-layer shallow water systems

The numerical solution of two-layer shallow water systems is required to accurately simulate stratified fluids, which are ubiquitous in nature: they appear in atmospheric flows, ocean currents, oil spills, etc. Moreover, implementing the numerical schemes that solve these models in realistic scenarios places huge demands on computing power. In this paper, we tackle […]

CUDA

Nov, 15

gSLIC: a real-time implementation of SLIC superpixel segmentation

We introduce a parallel implementation of the Simple Linear Iterative Clustering (SLIC) superpixel segmentation. Our implementation uses the GPU and the NVIDIA CUDA framework. Using a single graphics card, our implementation achieves speedups of 10x to 20x over the sequential implementation. This allows us to use the superpixel segmentation method in real time. Our implementation is compatible with […]

CUDA

Nov, 15

Natural HPC substrate: Exploitation of mixed multicore CPU and GPUs

Recent GPU developments have attracted much interest in the HPC community. Since each GPU interface requires a dedicated host processor, the unused high-performance non-GPU processors are simply wasted. GPUs are energy intensive and are more likely to fail than CPUs; we are interested in using all processors to a) boost application performance, and b) […]

CUDA

Nov, 15

Efficient Graph Comparison and Visualization Using GPU

This paper presents the application of several graph algorithms to the comparison and visualization of real-world networks. To obtain an interactive and robust framework for the analysis of large graphs, we use CUDA implementations of all-pairs shortest paths (APSP) and breadth-first search (BFS) algorithms along with CULA matrix decomposition routines. Such an approach allows for efficient computation of graph feature […]

CUDA

Nov, 14

A capabilities-aware framework for using computational accelerators in data-intensive computing

Multicore computational accelerators such as GPUs are now commodity components for high-performance computing at scale. While such accelerators have been studied in some detail as stand-alone computational engines, their integration in large-scale distributed systems raises new challenges and trade-offs. In this paper, we present an exploration of resource management alternatives for building asymmetric accelerator-based distributed […]

CUDA

Nov, 14

Large Scale Plane Wave Pseudopotential Density Functional Theory Calculations on GPU Clusters

In this work, we present our implementation of the density functional theory (DFT) plane wave pseudopotential (PWP) calculations on GPU clusters. This GPU version is developed based on a CPU DFT-PWP code: PEtot, which can calculate up to a thousand atoms on thousands of processors. Our test indicates that the GPU version can have a […]

CUDA

Nov, 14

Toward improved aeromechanics simulations using recent advancements in scientific computing

The proposed paper will present details on recent advancements in scientific computing in terms of integrating new hardware and software to greatly enhance the computational efficiency of comprehensive rotorcraft analysis. The focus will be on showing the tremendous computational accelerations that are possible (i.e., orders of magnitude speed up) by using software developments in the […]

CUDA

Nov, 14

Solving Incompressible Two-Phase Flows on Massively Parallel Multi-GPU Clusters

We present a fully multi-GPU-based double-precision solver for the three-dimensional two-phase incompressible Navier-Stokes equations. An in-depth performance analysis shows a realistic speed-up of the order of three by comparing equally priced GPUs and CPUs and more than a doubling in energy efficiency for GPUs. We observe profound strong and weak scaling on a multi-GPU cluster.

CUDA
