high performance computing on graphics processing units: hgpu.org

Posts

Dec, 26

Streaming GPU Singular Value and Dynamic Mode Decompositions

This work develops a parallelized algorithm to compute the dynamic mode decomposition (DMD) on a graphics processing unit using the streaming method of snapshots singular value decomposition. This allows the algorithm to operate efficiently on streaming data by avoiding redundant inner-products as new data becomes available. In addition, it is possible to leverage the native […]

CUDA

Dec, 22

High productivity multi-device exploitation with the Heterogeneous Programming Library

Heterogeneous devices require much more work from programmers than traditional CPUs, particularly when there are several of them, as each one has its own memory space. Multi-device applications must distribute kernel executions and, even worse, array portions that must be kept coherent among the different device memories and the host memory. In addition, when […]

OpenCL

Dec, 22

Log File Regular Expression Pattern Matching And Capture With GPUs

The information contained in a system is normally stored in log files. Most of the time, these files store the information in plain text, much of it unformatted. It is then necessary to extract parts of this information to understand what is going on in such a system. Currently, such information can be […]

CUDA

Dec, 20

Fluid Simulation: Smoothed Particle Hydrodynamics on the GPU

This report describes the physical concept of fluids as well as a mathematical model for fluids governed by the Navier-Stokes equations. The Smoothed Particle Hydrodynamics method (SPH) for simulating fluids is described, and implementation details of the method are explained. Numerical integration methods such as Euler and Leap-Frog integration are discussed. The presented result is […]

OpenCL

OpenGL

Dec, 20

An EoS-meter of QCD transition from deep learning

Supervised learning with a deep convolutional neural network is used to identify the QCD equation of state (EoS) employed in relativistic hydrodynamic simulations of heavy-ion collisions. The final-state particle spectra $\rho(p_T,\Phi)$ provide directly accessible information from experiments. High-level correlations of $\rho(p_T,\Phi)$ learned by the neural network act as an "EoS-meter", effective in detecting the nature […]

CUDA

Dec, 20

Performance Optimisation of Smoothed Particle Hydrodynamics Algorithms for Multi/Many-Core Architectures

We describe a strategy for code modernisation of Gadget, a widely used community code for computational astrophysics. The focus of this work is on node-level performance optimisation, targeting current multi/many-core Intel architectures. We identify and isolate a sample code kernel, which is representative of a typical Smoothed Particle Hydrodynamics (SPH) algorithm. The code modifications include […]

Dec, 20

Accelerating Spark RDD Operations with Local and Remote GPU Devices

Apache Spark is a distributed processing framework for large-scale data sets, where intermediate data sets are represented as RDDs (Resilient Distributed Datasets) and stored in memory distributed over machines. To accelerate its various computation-intensive operations, such as reduction and sort, we focus on GPU devices. We modified the Spark framework to invoke CUDA kernels when […]

CUDA

Dec, 20

gpuSPHASE – A shared memory caching implementation for 2D SPH using CUDA

Smoothed particle hydrodynamics (SPH) is a meshless Lagrangian method that has been successfully applied to computational fluid dynamics (CFD), solid mechanics and many other multi-physics problems. Using the method to solve transport phenomena in process engineering requires the simulation of several days to weeks of physical time. Based on the high computational demand of CFD […]

CUDA

Dec, 18

The 5th International Workshop on OpenCL (IWOCL), 2017

The International Workshop on OpenCL (IWOCL) is an annual meeting of OpenCL users, researchers, developers and suppliers to share OpenCL best practice and to promote the evolution and advancement of the OpenCL standard. The meeting is open to anyone who is interested in contributing to, and participating in, the OpenCL community. IWOCL is the premier […]

Dec, 18

International Conference on Biomacromolecules and Biomimetic Materials (ICBBM), 2017

The 2017 International Conference on Biomacromolecules and Biomimetic Materials (ICBBM 2017) will be held in Boracay, Philippines, during March 6-12, 2017. The objective of ICBBM 2017 is to present the latest research and results of scientists working on Biomacromolecules and Biomimetic Materials topics. This conference provides opportunities for delegates from different areas to exchange new ideas and […]

Dec, 17

GPU-Based Nonlocal Filtering for Large Scale SAR Processing

In the past few years nonlocal filters have emerged as a serious contender for denoising synthetic aperture radar (SAR) images, offering superior noise reduction and detail preservation compared to many other filters. In this manuscript we analyze how nonlocal filters, whose computational costs were so far prohibitive for large scale processing, can be implemented efficiently […]

OpenCL

Dec, 17

Efficient Realization of Householder Transform through Algorithm-Architecture Co-design for Acceleration of QR Factorization

We present an efficient realization of Householder Transform (HT) based QR factorization through algorithm-architecture co-design, where we achieve a performance improvement of 3-90x in terms of Gflops/watt over state-of-the-art multicore, General Purpose Graphics Processing Units (GPGPUs), Field Programmable Gate Arrays (FPGAs), and ClearSpeed CSX700. Theoretical and experimental analysis of classical HT is performed for opportunities to exhibit higher […]

CUDA

Recent source codes

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

SYCL Container
