high performance computing on graphics processing units: hgpu.org

Posts

Feb, 4

Fast Acceleration of 2D Wave Propagation Simulations Using Modern Computational Accelerators

Recent developments in modern computational accelerators like Graphics Processing Units (GPUs) and coprocessors provide great opportunities for making scientific applications run faster than ever before. However, efficient parallelization of scientific code using new programming tools like CUDA requires a high level of expertise that is not available to many scientists. This, plus the fact that […]

CUDA • OpenCL

Feb, 4

A Real-Time, GPU-Based, Non-Imaging Back-End for Radio Telescopes

Since the discovery of RRATs, interest in single pulse radio searches has increased dramatically. Due to the large data volumes generated by these searches, especially in planned surveys for future radio telescopes, such searches have to be conducted in real-time. This has led to the development of a multitude of search techniques and real-time pipeline […]

CUDA

Feb, 4

A Scalable Hybrid FPGA/GPU FX Correlator

Radio astronomical imaging arrays comprising large numbers of antennas, O(10^2-10^3), have posed a signal processing challenge because of the required O(N^2) cross correlation of signals from each antenna and requisite signal routing. This motivated the implementation of a Packetized Correlator architecture that applies Field Programmable Gate Arrays (FPGAs) to the O(N) "F-stage" transforming time domain […]

CUDA

Feb, 2

Parallelization of the Algorithm WHAM with NVIDIA CUDA

The aim of my thesis is to parallelize the Weighted Histogram Analysis Method (WHAM), a popular algorithm used to calculate the free energy of a molecular system in Molecular Dynamics simulations. WHAM works as a post-processing step in cooperation with another algorithm called Umbrella Sampling. The purpose of Umbrella Sampling is to add a biasing […]

CUDA

Feb, 2

Efficient Virtual Shadow Maps for Many Lights

Recently, several algorithms have been introduced that enable real-time performance for many lights in applications such as games. In this paper, we explore the use of hardware-supported virtual cube-map shadows to efficiently implement high-quality shadows from hundreds of light sources in real time and within a bounded memory footprint. In addition, we explore the utility […]

CUDA • OpenGL

Feb, 2

Task migration of DSP application specified with a DFG and implemented with the BSP computing model on a CPU-GPU cluster

Nowadays, computer applications are becoming more demanding while also requiring real-time results. Heterogeneous clusters, with their computing power, are a good answer to this demand. However, during execution a computing element of the cluster may fail or need maintenance, or the load may need to be re-balanced. In […]

CUDA

Feb, 2

Optimized Deep Learning Architectures with Fast Matrix Operation Kernels on Parallel Platform

In this paper, we introduce an optimized deep learning architecture with flexible layer structures and fast matrix operation kernels on a parallel computing platform (e.g. NVIDIA’s GPU). Carefully designed layer-wise strategies are used to integrate different kinds of deep architectures into a uniform neural training-testing system. Our fast matrix operation kernels are implemented in deep architectures’ […]

CUDA

Feb, 2

High energy electromagnetic particle transportation on the GPU

We present massively parallel high energy electromagnetic particle transportation through a finely segmented detector on a Graphics Processing Unit (GPU). Simulating events of energetic particle decay in a general-purpose high energy physics (HEP) detector requires intensive computing resources, due to the complexity of the geometry as well as physics processes applied to particles copiously produced […]

CUDA

Feb, 1

A TBB-CUDA Implementation for Background Removal in a video-based Fire Detection System

This paper presents a parallel TBB-CUDA implementation for accelerating the single-Gaussian distribution model, which is effective for background removal in the video-based Fire Detection System. In this framework, TBB mainly handles the initialization of the estimated Gaussian model on the CPU, and CUDA performs background removal and adaptation of the model on the GPU. […]

CUDA

Feb, 1

Buffer k-d Trees: Processing Massive Nearest Neighbor Queries on GPUs

We present a new approach for combining k-d trees and graphics processing units for nearest neighbor search. It is well known that a direct combination of these tools leads to unsatisfactory performance due to conditional computations and suboptimal memory accesses. To alleviate these problems, we propose a variant of the classical k-d tree data […]

OpenCL

Feb, 1

Speeding Up Object Detection: Fast Resizing in the Integral Image Domain

In this paper, we present an approach to resize integral images directly in the integral image domain. For the special case of resizing by a power of two, we propose a highly parallelizable variant of our approach, which is identical to bilinear resizing in the image domain in terms of results, but requires fewer operations […]

CUDA

Feb, 1

High Performance Computing of Dynamic Structural Response Analysis for the Integrated Earthquake Simulation

This paper proposes an application of high performance computing (HPC) to dynamic structural response analysis (DSRA) in order to enhance the capability and increase the efficiency of integrated earthquake simulation (IES). Object Based Structural Analysis (OBASAN) is a candidate DSRA program for IES. With OBASAN, the reliability of structural damage prediction can be increased by […]

CUDA


Recent source codes

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

SYCL Container
