high performance computing on graphics processing units: hgpu.org

Posts

Apr, 7

A Neighborhood Grid Data Structure for Massive 3D Crowd Simulation on GPU

Simulation and visualization of emergent crowds in real time is a computationally intensive task. This intensity mostly comes from the O(n^2) complexity of the traversal algorithm necessary for the proximity queries over all pairs of entities in order to compute the relevant mutual interactions. Previous works reduced this complexity by considerable factors, using adequate data structures […]
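To make the O(n^2) cost concrete, here is a minimal CPU-side sketch in Python that compares an all-pairs proximity query with a plain uniform-grid lookup. The excerpt does not show the paper's neighborhood grid, so this is a generic spatial-binning illustration and not the authors' data structure; the function names `brute_force_pairs` and `grid_pairs` and the choice of cell size equal to the query radius are assumptions made for this example.

```python
from collections import defaultdict
from itertools import product


def _dist2(p, q):
    """Squared Euclidean distance between two 3D points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def brute_force_pairs(points, radius):
    """All index pairs (i, j), i < j, closer than radius: O(n^2) distance checks."""
    r2 = radius * radius
    pairs = set()
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if _dist2(points[i], points[j]) < r2:
                pairs.add((i, j))
    return pairs


def grid_pairs(points, radius):
    """Same result, but each point is only tested against its 27 surrounding cells."""
    r2 = radius * radius
    # Bin every point into a cubic cell whose edge equals the query radius
    # (an assumption of this sketch), so neighbors are at most one cell away.
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        cells[tuple(int(c // radius) for c in p)].append(idx)

    pairs = set()
    for (cx, cy, cz), members in cells.items():
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            # .get() avoids creating empty cells while we iterate the dict.
            for j in cells.get((cx + dx, cy + dy, cz + dz), ()):
                for i in members:
                    if i < j and _dist2(points[i], points[j]) < r2:
                        pairs.add((i, j))
    return pairs
```

When entities are spread roughly evenly, the grid version performs a bounded number of comparisons per entity rather than n - 1, which is the kind of reduction the abstract attributes to adequate data structures.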

CUDA

Apr, 7

Context-aware volume navigation

The trackball metaphor is exploited in many applications where volumetric data needs to be explored. Although it provides an intuitive way to inspect the overall structure of objects of interest, an in-detail inspection can be tedious – or when cavities occur even impossible. Therefore we propose a context-aware navigation technique for the exploration of volumetric […]

OpenCL

Apr, 7

Practical examples of GPU computing optimization principles

In this paper, we provide examples to optimize signal processing or visual computing algorithms written for SIMT-based GPU architectures. These implementations demonstrate the optimizations for CUDA or its successors OpenCL and DirectCompute. We discuss the effect and optimization principles of memory coalescing, bandwidth reduction, processor occupancy, bank conflict reduction, local memory elimination and instruction optimization. […]

CUDA

OpenCL

Apr, 7

ARKCoS: Artifact-Suppressed Accelerated Radial Kernel Convolution on the Sphere

We describe a hybrid Fourier/direct space convolution algorithm for compact radial (azimuthally symmetric) kernels on the sphere. For high resolution maps covering a large fraction of the sky, our implementation takes advantage of the inexpensive massive parallelism afforded by consumer graphics processing units (GPUs). Applications involve modeling of instrumental beam shapes in terms of compact […]

CUDA

Apr, 7

Scaling Hierarchical N-body Simulations on GPU Clusters

This paper focuses on the use of GPGPU-based clusters for hierarchical N-body simulations. Whereas the behavior of these hierarchical methods has been studied in the past on CPU-based architectures, we investigate key performance issues in the context of clusters of GPUs. These include kernel organization and efficiency, the balance between tree traversal and force computation […]

CUDA

Apr, 6

Toward Harnessing DOACROSS Parallelism for Multi-GPGPUs

To exploit the full potential of GPGPUs for general purpose computing, DOACR parallelism abundant in scientific and engineering applications must be harnessed. However, the presence of cross-iteration data dependences in DOACR loops poses an obstacle to executing their computations concurrently using a massive number of fine-grained threads. This work focuses on iterative PDE solvers rich […]
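The abstract is truncated before it describes the paper's approach, so the following Python sketch only illustrates what a cross-iteration dependence looks like and one generic way to expose parallelism in such a loop: wavefront (anti-diagonal) ordering. The recurrence and the names `sweep_sequential` and `sweep_wavefront` are assumptions made for this example, not the authors' method.

```python
def sweep_sequential(grid):
    """Row-major sweep: cell (i, j) reads the already-updated (i-1, j) and (i, j-1)."""
    u = [row[:] for row in grid]
    for i in range(1, len(u)):
        for j in range(1, len(u[0])):
            u[i][j] = 0.5 * (u[i - 1][j] + u[i][j - 1]) + grid[i][j]
    return u


def sweep_wavefront(grid):
    """Same recurrence, visited one anti-diagonal (i + j = d) at a time."""
    u = [row[:] for row in grid]
    rows, cols = len(u), len(u[0])
    for d in range(2, rows + cols - 1):
        # Cells on one anti-diagonal depend only on the previous diagonal,
        # so this inner loop could be distributed across threads.
        for i in range(max(1, d - cols + 1), min(rows - 1, d - 1) + 1):
            j = d - i
            u[i][j] = 0.5 * (u[i - 1][j] + u[i][j - 1]) + grid[i][j]
    return u
```

Both functions produce identical values; only the visiting order differs. In the wavefront version the outer loop over diagonals stays sequential, while the inner loop carries no dependence between its iterations.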

Apr, 6

An algorithmic incremental and iterative development method to parallelize dusty-deck FORTRAN HPC codes in GPGPUs using CUDA

State-of-the-art high-speed and economical graphic card processors (GPUs) provide high multiprocessing power for high performance computing (HPC). But software development for high performance computing is profound and requires a good comprehension of algorithms, applications, and architectures. This paper outlines an incremental and iterative software development process for porting dusty-deck HPC application source codes to a […]

CUDA

Apr, 6

Heuristic Optimization Methods for Improving Performance of Recursive General Purpose Applications on GPUs

Due to the demand for high-definition graphics in the gaming and video markets, graphics processing units (GPUs) have drastically increased their computational capacities. General-purpose computation on GPUs uses the fragment shader multicore of these processing units to concurrently process data streams. However, the I/O overheads in recursive GPGPU applications have a negative impact in […]

Apr, 6

High Performance Hybrid Functional Petri Net Simulations of Biological Pathway Models on CUDA

Hybrid functional Petri nets are a widespread tool for representing and simulating biological models. Due to their potential for providing virtual drug testing environments, biological simulations have a growing impact on pharmaceutical research. Continuous research advancements in biology and medicine lead to exponentially increasing simulation times, thus raising the demand for performance accelerations by efficient […]

CUDA

Apr, 6

A Hybrid Computational Grid Architecture for Comparative Genomics

Comparative genomics provides a powerful tool for studying evolutionary changes among organisms, helping to identify genes that are conserved among species, as well as genes that give each organism its unique characteristics. However, the huge datasets involved make this approach impractical on traditional computer architectures, leading to prohibitively long runtimes. In this paper, we present […]

Apr, 6

Barnes-Hut treecode on GPU

General-purpose computation on graphics processing units (GPGPU) has become a popular field of study. Due to its high computing capacity and relatively low price, the GPU has been an ideal processing unit for many scientific applications, among which is N-body simulation. According to the published papers, a simple O(N^2) algorithm for N-body simulation has achieved some […]
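For reference, the simple O(N^2) algorithm mentioned in the abstract can be sketched as a direct summation over all pairs of bodies. This Python version shows only that baseline and does not build a Barnes-Hut tree; the function name `direct_accelerations`, the gravitational constant `G=1.0`, and the Plummer-style `softening` parameter are assumptions made for this example.

```python
import math


def direct_accelerations(positions, masses, G=1.0, softening=1e-3):
    """Gravitational acceleration on every body by summing over all other bodies."""
    n = len(positions)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    eps2 = softening * softening  # avoids division by zero for close pairs
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Vector from body i to body j.
            d = [positions[j][k] - positions[i][k] for k in range(3)]
            r2 = d[0] * d[0] + d[1] * d[1] + d[2] * d[2] + eps2
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += G * masses[j] * d[k] * inv_r3
    return acc
```

Every body visits every other body, which gives N(N-1) interactions per step. A treecode replaces distant groups of bodies with a single approximate interaction so that the per-body work grows much more slowly than N.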

CUDA

Apr, 6

Parallel and distributed seismic wave field modeling with combined Linux clusters and graphics processing units

General-purpose computing on graphics processing units (GPGPU) is a fast-developing method of high performance computing (HPC). In some cases even a low-end video card can be several to dozens of times faster than a modern CPU core. Seismic wave field modeling is one of the problems of this kind. But in some modern methods of […]

Recent source codes

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

SYCL Container

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?
