high performance computing on graphics processing units: hgpu.org

Posts

Jul, 18

Robust GPGPU plugin development for RapidMiner

In recent years, a significant number of papers [1][2] have been published about general-purpose graphical processing unit (GPGPU) programs which are able to accelerate computationally intensive applications by several times over conventional CPU programs. These papers raise an important question: With the current developer tools, is it possible to integrate these GPU programs into a major […]

CUDA

Jul, 18

Temporal Blending for Adaptive SPH

In this paper we introduce a fast and consistent Smoothed Particle Hydrodynamics (SPH) technique which is suitable for convection-diffusion simulations of incompressible fluids. We apply our temporal blending technique to reduce the number of particles in the simulation while smoothly changing quantity fields. Our approach greatly reduces the error introduced in the pressure term when […]

Jul, 18

Fluid Simulation on Surfaces in the GPU

In this paper we present a method to simulate fluids on smooth surfaces of arbitrary topology using a graphics processing unit (GPU). To do this we use the parametrization of Catmull-Clark subdivision surfaces, and obtain the metric information of the distortion caused by this parametrization, so we can calculate differential operators of functions defined on […]

CUDA

Jul, 17

CuNesl: Compiling Nested Data-Parallel Languages for SIMT Architectures

Data-parallel languages feature fine-grained parallel primitives that can be supported by compilers targeting modern many-core architectures where data parallelism must be exploited to fully utilize the hardware. Previous research has focused on converting data-parallel languages for SIMD (single instruction multiple data) architectures. However, directly applying them to today’s SIMT (single instruction multiple thread) architectures does […]

CUDA

Jul, 17

Interactively Simulating Fluid based on SPH and CUDA

In this paper, we propose a novel method of interactive fluid simulation based on SPH, and implement it on CUDA (Compute Unified Device Architecture). First, we use SPH (Smoothed Particle Hydrodynamics) theory to simulate the motion of fluids. Second, we propose an interactive method between fluid and rigid objects. We treat the rigid objects as […]

CUDA • OpenGL

Jul, 17

CUSIMANN: An optimized simulated annealing software for GPUs

CUSIMANN (CUDA SIMULATED ANNEALING) is a free/open-source library for global optimization that provides a parallel implementation of the simulated annealing algorithm in CUDA.

CUDA

Jul, 17

Scalable Molecular Dynamics Simulation Using FPGAs and Multicore Processors

While Molecular Dynamics Simulation (MD) uses a large fraction of the world’s High Performance Compute cycles, the modeling of many physical phenomena remains far out of reach. Improving the cost-effectiveness of MD has therefore received much attention, especially in using accelerators or modifying the computation itself. While both approaches have demonstrated great potential, scalability has […]

Jul, 17

Multicore and Manycore Algorithms for Octrees

Octrees and compressed octrees are frequently used to represent data in a hierarchical form for high performance computing, graphics and database applications. Applications like N-body problems require building octrees multiple times. Therefore, efficient construction of octrees is critical to the efficiency of the entire application. With ever increasing data size, there is a requirement to […]

CUDA

Jul, 16

Optimizing MapReduce for GPUs with effective shared memory usage

Accelerators and heterogeneous architectures in general, and GPUs in particular, have recently emerged as major players in high performance computing. For many classes of applications, MapReduce has emerged as the framework for easing parallel programming and improving programmer productivity. There have already been several efforts on implementing MapReduce on GPUs. In this paper, we propose […]

Jul, 16

Sparse Matrix-Vector Multiplication on NVIDIA GPU

In this paper, we present our work on developing a new matrix format and a new sparse matrix-vector multiplication algorithm. The matrix format is HEC, which is a hybrid format. This matrix format is efficient for sparse matrix-vector multiplication and is friendly to preconditioners. Numerical experiments show that our sparse matrix-vector multiplication algorithm is efficient […]

CUDA

Jul, 16

Sparse Matrix Matrix Multiplication on Hybrid CPU+GPU Platforms

Sparse matrix-sparse/dense matrix multiplications, spgemm and csrmm, among other applications, find usage in various matrix formulations of graph problems. GPU-based supercomputers are presently experiencing severe performance issues on the Graph-500 benchmarks, a new HPC benchmark suite focusing on graph algorithms. Considering the difficulties in executing graph problems and the duality between graphs and matrices, […]

CUDA

Jul, 16

A Yoke of Oxen and a Thousand Chickens for Heavy Lifting Graph Processing

Large, real-world graphs are famously difficult to process efficiently. Not only do they have a large memory footprint, but most graph processing algorithms entail memory access patterns with poor locality, data-dependent parallelism, and a low compute-to-memory access ratio. Additionally, most real-world graphs have a low diameter and a highly heterogeneous node degree distribution. Partitioning these graphs […]

CUDA

* * *

Recent source codes

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

SYCL Container
