high performance computing on graphics processing units: hgpu.org

Posts

Feb, 5

Bone structure analysis on multiple GPGPUs

Osteoporosis is a disease that affects a growing number of people by increasing the fragility of their bones. To improve understanding of bone quality, large-scale computer simulations are applied. ParOSol is a fast, scalable, and memory-efficient solver for such problems. It uses the preconditioned conjugate gradient algorithm with a multigrid preconditioner. […]

CUDA
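As an illustration of the solver family named in the abstract above, here is a minimal sketch of the preconditioned conjugate gradient iteration. It is not ParOSol's code: the function name `pcg` is hypothetical, the matrix is a small dense list of lists, and a simple Jacobi (diagonal) preconditioner is swapped in where ParOSol uses multigrid.

```python
def pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite A (list of lists).

    Assumption: a Jacobi (diagonal) preconditioner stands in for the
    multigrid preconditioner mentioned in the abstract.
    """
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    def precond(r):
        # Jacobi step: z = D^{-1} r, where D is the diagonal of A
        return [r[i] / A[i][i] for i in range(n)]

    x = [0.0] * n
    r = list(b)          # residual b - A x for the initial guess x = 0
    z = precond(r)
    p = list(z)          # first search direction
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            break        # residual norm is small enough
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = precond(r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


solution = pcg([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```

A production solver would replace `precond` with a multigrid cycle and keep the matrix in a sparse or matrix-free form; the outer loop stays the same.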

Feb, 4

Strategies for Maximizing Utilization in multi-CPU & multi-GPU Heterogeneous Architectures

This paper explores the possibility of efficiently executing a single application using multicores simultaneously with multiple GPU accelerators under a parallel task programming paradigm. In particular, we address the challenge of extending a parallel for template to allow its exploitation on heterogeneous architectures. Previous task frameworks that offer support for heterogeneous systems implement a variety […]

CUDA

Feb, 4

Parallel implementation of 3D protein structure similarity searches using a GPU and the CUDA

Searching for similar 3D protein structures is one of the primary processes employed in the field of structural bioinformatics. However, the computational complexity of this process means that it is constantly necessary to search for new methods that can perform such a process faster and more efficiently. Finding molecular substructures that complex protein structures have […]

CUDA

Feb, 4

Fast Acceleration of 2D Wave Propagation Simulations Using Modern Computational Accelerators

Recent developments in modern computational accelerators like Graphics Processing Units (GPUs) and coprocessors provide great opportunities for making scientific applications run faster than ever before. However, efficient parallelization of scientific code using new programming tools like CUDA requires a high level of expertise that is not available to many scientists. This, plus the fact that […]

CUDA • OpenCL

Feb, 4

A Real-Time, GPU-Based, Non-Imaging Back-End for Radio Telescopes

Since the discovery of RRATs, interest in single pulse radio searches has increased dramatically. Due to the large data volumes generated by these searches, especially in planned surveys for future radio telescopes, such searches have to be conducted in real-time. This has led to the development of a multitude of search techniques and real-time pipeline […]

CUDA

Feb, 4

A Scalable Hybrid FPGA/GPU FX Correlator

Radio astronomical imaging arrays comprising large numbers of antennas, O(10^2-10^3), have posed a signal processing challenge because of the required O(N^2) cross correlation of signals from each antenna and the requisite signal routing. This motivated the implementation of a Packetized Correlator architecture that applies Field Programmable Gate Arrays (FPGAs) to the O(N) "F-stage" transforming time domain […]

CUDA
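The O(N) and O(N^2) costs quoted in the abstract above follow from the general FX correlator structure: one Fourier transform per antenna, then a cross-multiplication for every antenna pair. The sketch below shows that structure only. The names `dft` and `fx_correlate` are hypothetical, a naive DFT replaces a real FFT or filter bank, and nothing here reflects the paper's FPGA/GPU implementation.

```python
import cmath


def dft(samples):
    """Naive discrete Fourier transform of one antenna's time samples."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def fx_correlate(signals):
    """Return per-channel products for every antenna pair (i, j) with i <= j."""
    # F-stage: one transform per antenna, so the work grows as O(N)
    spectra = [dft(s) for s in signals]
    visibilities = {}
    n_ant = len(signals)
    # X-stage: every pair of antennas, so the work grows as O(N^2)
    for i in range(n_ant):
        for j in range(i, n_ant):
            visibilities[(i, j)] = [
                a * b.conjugate() for a, b in zip(spectra[i], spectra[j])
            ]
    return visibilities


vis = fx_correlate([[1, 0, 0, 0], [0, 1, 0, 0], [1, 1, 1, 1]])
```

With N antennas the dictionary holds N(N+1)/2 entries, autocorrelations included, which is where the quadratic cost of the X-stage comes from.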

Feb, 2

Parallelization of the Algorithm WHAM with NVIDIA CUDA

The aim of my thesis is to parallelize the Weighted Histogram Analysis Method (WHAM), a popular algorithm used to calculate the free energy of a molecular system in Molecular Dynamics simulations. WHAM runs as a post-processing step in cooperation with another algorithm called Umbrella Sampling. The purpose of Umbrella Sampling is to add a biasing […]

CUDA

Feb, 2

Efficient Virtual Shadow Maps for Many Lights

Recently, several algorithms have been introduced that enable real-time performance for many lights in applications such as games. In this paper, we explore the use of hardware-supported virtual cube-map shadows to efficiently implement high-quality shadows from hundreds of light sources in real time and within a bounded memory footprint. In addition, we explore the utility […]

CUDA • OpenGL

Feb, 2

Task migration of DSP application specified with a DFG and implemented with the BSP computing model on a CPU-GPU cluster

Computer applications are becoming heavier while also requiring real-time results. Heterogeneous clusters, with their computing power, are a good answer to this demand. However, during execution a computing element of the cluster may fail or need maintenance, or the load may need to be re-balanced. In […]

CUDA

Feb, 2

Optimized Deep Learning Architectures with Fast Matrix Operation Kernels on Parallel Platform

In this paper, we introduce an optimized deep learning architecture with flexible layer structures and fast matrix operation kernels on a parallel computing platform (e.g. NVIDIA's GPU). Carefully designed layer-wise strategies are used to integrate different kinds of deep architectures into a uniform neural training-testing system. Our fast matrix operation kernels are implemented in deep architectures' […]

CUDA

Feb, 2

High energy electromagnetic particle transportation on the GPU

We present massively parallel high energy electromagnetic particle transportation through a finely segmented detector on a Graphics Processing Unit (GPU). Simulating events of energetic particle decay in a general-purpose high energy physics (HEP) detector requires intensive computing resources, due to the complexity of the geometry as well as physics processes applied to particles copiously produced […]

CUDA

Feb, 1

A TBB-CUDA Implementation for Background Removal in a video-based Fire Detection System

This paper presents a parallel TBB-CUDA implementation that accelerates the single-Gaussian distribution model, which is effective for background removal in a video-based fire detection system. In this framework, TBB mainly handles initialization of the estimated Gaussian model on the CPU, and CUDA performs background removal and adaptation of the model on the GPU. […]

CUDA
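To make the single-Gaussian background model in the abstract above concrete, here is a minimal CPU-only sketch. It keeps a running mean and variance per pixel, flags a pixel as foreground when it deviates by more than k standard deviations, and adapts the model on background pixels. The names `init_model` and `update_model`, the learning rate `alpha`, the threshold `k`, and the initial variance are all assumptions for illustration; the paper's actual update rule is not given in the abstract.

```python
def init_model(frame, initial_var=25.0):
    """Initialization step (done with TBB on the CPU in the paper).

    The first frame seeds the per-pixel mean; initial_var is an assumed value.
    """
    mean = [[float(p) for p in row] for row in frame]
    var = [[initial_var for _ in row] for row in frame]
    return mean, var


def update_model(mean, var, frame, alpha=0.05, k=2.5):
    """Background removal plus model adaptation (the CUDA step in the paper).

    Returns a mask in which True marks a foreground pixel. Each pixel is
    independent of the others, which is what makes the step easy to map
    onto one GPU thread per pixel.
    """
    mask = []
    for y, row in enumerate(frame):
        mask_row = []
        for x, p in enumerate(row):
            d = p - mean[y][x]
            # Foreground if the squared deviation exceeds (k * sigma)^2
            is_foreground = d * d > k * k * var[y][x]
            if not is_foreground:
                # Adapt the Gaussian only where the pixel looks like background
                mean[y][x] += alpha * d
                var[y][x] = (1 - alpha) * var[y][x] + alpha * d * d
            mask_row.append(is_foreground)
        mask.append(mask_row)
    return mask


mean, var = init_model([[10, 10], [10, 10]])
mask = update_model(mean, var, [[10, 200], [11, 10]])
```

In this example only the pixel that jumps from 10 to 200 is flagged, while the pixel that drifts from 10 to 11 is absorbed into the background model.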

