high performance computing on graphics processing units: hgpu.org

Posts

Jan, 25

Locality-Aware Work Stealing on Multi-CPU and Multi-GPU Architectures

Most recent HPC platforms have heterogeneous nodes composed of a combination of multi-core CPUs and accelerators, like GPUs. Scheduling on such architectures relies on a static partitioning and cost model. In this paper, we present a locality-aware work stealing scheduler for multi-CPU and multi-GPU architectures, which relies on the XKaapi runtime system. We show performance […]

CUDA

Jan, 25

Vlasov on GPU (VOG Project)

This work concerns the numerical simulation of the Vlasov-Poisson set of equations using semi-Lagrangian methods on Graphical Processing Units (GPUs). To accomplish this goal, modifications to traditional methods had to be implemented. First and foremost, a reformulation of semi-Lagrangian methods is performed, which enables us to rewrite the governing equations as a circulant matrix […]

OpenCL

Jan, 25

A GPU-accelerated Direct-sum Boundary Integral Poisson-Boltzmann Solver

In this paper, we present a GPU-accelerated direct-sum boundary integral method to solve the linear Poisson-Boltzmann (PB) equation. In our method, a well-posed boundary integral formulation is used to ensure fast convergence of Krylov-subspace-based linear algebraic solvers such as GMRES. The molecular surfaces are discretized with flat triangles and centroid collocation. […]

CUDA

Jan, 24

High Performance Lattice Boltzmann Solvers on Massively Parallel Architectures with Applications to Building Aeraulics

With the advent of low-energy buildings, the need for accurate building performance simulations has significantly increased. However, for the time being, the thermo-aeraulic effects are often taken into account through simplified or even empirical models, which fail to provide the expected accuracy. Resorting to computational fluid dynamics seems therefore unavoidable, but the required computational effort […]

CUDA

Jan, 24

Programmability and Performance Portability Aspects of Heterogeneous Multi-/Manycore Systems

We discuss three complementary approaches that can provide both portability and an increased level of abstraction for the programming of heterogeneous multicore systems. Together, these approaches also support performance portability, as currently investigated in the EU FP7 project PEPPHER. In particular, we consider (1) a library-based approach, here represented by the integration of the SkePU […]

CUDA

Jan, 24

Hybrid Single/Double Precision Floating-Point Computation on GPU Accelerators for 2-D FDTD

Acceleration of FDTD (Finite-Difference Time-Domain) is very important in computational electromagnetics. We propose a hybrid single/double precision floating-point computation to accelerate FDTD on GPUs. We apply single precision when the dynamic range of the electromagnetic field is low and double precision when the dynamic range is high. According to the experimental results, we achieved over 35 times […]

CUDA

Jan, 24

Developing and Evaluating clOpenCL Applications for Heterogeneous Clusters

In the last few years, the processing capabilities of computing systems have increased significantly, moving from single-core to multi-core and even many-core systems. Accompanying this evolution, local networks have also become faster, with multi-gigabit technologies like Infiniband, Myrinet and 10G Ethernet. Parallel/distributed programming tools and standards, like POSIX Threads, OpenMP and MPI, have helped to explore […]

CUDA • OpenCL

Jan, 24

Performance Study of Satellite Image Processing on Graphics Processors Unit Using CUDA

High-resolution satellite images are now widely used for a variety of mapping applications including photogrammetry, GIS data acquisition and visualization. As the spectral and spatial data size of satellite images increases, greater processing power is needed to process them. Parallel systems offer a solution to this problem. Parallel processing techniques have been […]

CUDA

Jan, 24

GPU-based 3D Wavelet Transform

A wide range of applications, such as volumetric medical data compression, video watermarking and video coding, use the three-dimensional wavelet transform (3D-DWT) in their algorithms. In this work, we present GPU algorithms, based on both global and shared memory, to compute the 3D-DWT on both the GTX280 and the GMT540 platforms. The results obtained show that […]

CUDA

Jan, 23

Reducing GPU Offload Latency via Fine-Grained CPU-GPU Synchronization

GPUs are seeing increasingly widespread use for general purpose computation due to their excellent performance for highly-parallel, throughput-oriented applications. For many workloads, however, the performance benefits of offloading are hindered by the large and unpredictable overheads of launching GPU kernels and of transferring data between CPU and GPU. This paper proposes and evaluates hardware and […]

CUDA

Jan, 23

The effects of nutrient chemotaxis on bacterial aggregation patterns with non-linear degenerate cross diffusion

This paper introduces a reaction-diffusion-chemotaxis model for bacterial aggregation patterns on the surface of thin agar plates. It is based on the non-linear degenerate cross diffusion model proposed by Kawasaki et al. (J. of Theor. Biol. 188(2) 1997) and it includes a suitable nutrient chemotactic term compatible with this type of diffusion. High resolution numerical […]

CUDA

Jan, 23

A survey on various computationally intensive parallel applications in High performance Computing System with OpenCL-MPI

As we are in the development phase of our own supercomputer, we have identified several applications that are too computationally intensive for a normal desktop computer to solve. These applications come from multiple disciplines, such as biomedicine, mathematics, fluid dynamics and genetic algorithms. We are identifying the parallel computations involved in […]

OpenCL

* * *

Recent source codes

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

SYCL Container
