high performance computing on graphics processing units: hgpu.org

Posts

Jan, 23

Implementing Open-Source CUDA Runtime

Graphics processing units (GPUs) are the state of the art embracing the concept of many-core technology. Their significant advantage in performance and performance-per-watt compared to traditional microprocessors has facilitated development of GPUs in many compute applications. However, GPUs are often treated as "black-box" devices due to proprietary strategies of hardware vendors. One of the greatest […]

CUDA

Jan, 22

Data parallel patterns on CPU/GPU mix

We propose a model that uses a small set of quite simple parameters to devise a proper partitioning, between CPU and GPU cores, of the tasks deriving from structured data parallel patterns/algorithmic skeletons. The model takes into account both hardware related and application dependent parameters. It eventually computes the percentage of tasks to be executed on CPU […]

CUDA

Jan, 22

A GPU Accelerated Navier-Stokes Solver with Multi-level Granularity for Solving Sparse Implicit Systems

In recent years, researchers have employed a wide array of multi-physics computational tools, of varying sophistication, to simulate brownout conditions [1-3]. Among these tools, compressible high-fidelity Reynolds-Averaged Navier Stokes (RANS) solvers [3] depend the least on empirical assumptions. However, the high computational expense involved in RANS simulations of viscous, rotary environments, makes it less attractive […]

CUDA

Jan, 22

Accelerating Fast Fourier Transforms Using Hadoop and CUDA

There has been considerable research into improving Fast Fourier Transform (FFT) performance through parallelization and optimization for specialized hardware. However, even with those advancements, processing of very large files, over 1TB in size, still remains prohibitively slow. Analysts performing signal processing are forced to wait hours or days for results, which results in a disruption […]

CUDA

Jan, 22

The PEPPHER Composition Tool: Performance-Aware Dynamic Composition of Applications for GPU-based Systems

The PEPPHER component model defines an environment for annotation of native C/C++ based components for homogeneous and heterogeneous multicore and manycore systems, including GPU and multi-GPU based systems. For the same computational functionality, captured as a component, different sequential and explicitly parallel implementation variants using various types of execution units might be provided, together with […]

CUDA

Jan, 22

Analysis of Metallic Nanostructures by a Discontinuous Galerkin Time-Domain Maxwell Solver on Graphics Processing Units

In this thesis, we examine the optical properties of metallic nanostructures with typical feature sizes of the order of visible light. The interaction of light with such structures can be accurately described by classical electrodynamics. Thus, for the analysis of metallic nanostructures within this thesis, we will employ Maxwell’s equations [1] to model the physical […]

CUDA

Jan, 20

Accelerating Image Reconstruction in Dual-Head PET System by GPU and Symmetry Properties

Positron emission tomography (PET) is an important imaging modality in both clinical usage and research studies. We have developed a compact high-sensitivity PET system that consisted of two large-area panel PET detector heads, which produce more than 224 million lines of response and thus request dramatic computational demands. In this work, we employed a state-of-the-art […]

CUDA

Jan, 20

The Fast Multipole Method on the Cell processor

This paper presents the first deployment of the Fast Multipole Method on the Cell processor (PowerXCell 8i). We rely on the matrix formulation with BLAS routines of the FMB code (Fast Multipole with BLAS) in order to directly and efficiently offload the most time consuming operators of both far field and near field computations on […]

Jan, 20

Fast Positron Range Calculation in Heterogeneous Media for 3D PET Reconstruction

This paper presents a fast GPU-based solution to compensate positron range effects in heterogeneous media for iterative PET reconstruction. We assume a factorized approach, where projections are decomposed to phases according to the main physical effects. Positron range is the first effect in this chain, which causes a spatially varying blurring according to local material […]

CUDA

Jan, 20

Parallel Distributed Face Search System for National and Border Security

The CCTV surveillance industry is undergoing a sea change due to the adoption of IP technologies. This is allowing the integration of a plethora of new cameras and other sensors into huge integrated networks. Adoption of IP technologies is presenting opportunities for scalable visual analytics that has the potential to add enormous value to entire […]

OpenCL

Jan, 19

Duality based optical flow algorithms with applications

We consider the popular TV-L^1 optical flow formulation, and the so-called duality based algorithm for minimizing the TV-L^1 energy. The original formulation is extended to allow for vector valued images, and minimization results are given. In addition we consider different definitions of total variation regularization, and related formulations of the optical flow problem that may […]

CUDA

Jan, 18

GPU-accelerated regularisation of large diffusion-tensor volumes

We discuss the benefits, difficulties, and performance of a GPU implementation of the Chambolle-Pock algorithm for TGV (total generalised variation) denoising of medical diffusion tensor images. Whereas we have previously studied the denoising of 2D slices of $2 \times 2$ and $3 \times 3$ tensors, attaining satisfactory performance on a normal CPU, here we concentrate […]

