high performance computing on graphics processing units: hgpu.org

Posts

Feb, 1

Buffer k-d Trees: Processing Massive Nearest Neighbor Queries on GPUs

We present a new approach for combining k-d trees and graphics processing units for nearest neighbor search. It is well known that a direct combination of these tools leads to a non-satisfying performance due to conditional computations and suboptimal memory accesses. To alleviate these problems, we propose a variant of the classical k-d tree data […]

OpenCL

Feb, 1

Speeding Up Object Detection: Fast Resizing in the Integral Image Domain

In this paper, we present an approach to resize integral images directly in the integral image domain. For the special case of resizing by a power of two, we propose a highly parallelizable variant of our approach, which is identical to bilinear resizing in the image domain in terms of results, but requires fewer operations […]

CUDA

Feb, 1

High Performance Computing of Dynamic Structural Response Analysis for the Integrated Earthquake Simulation

This paper proposes an application of high performance computing (HPC) to dynamic structural response analysis (DSRA) in order to enhance the capability and increase the efficiency of integrated earthquake simulation (IES). Object Based Structural Analysis (OBASAN) is a candidate DSRA program for IES. With OBASAN, the reliability of structural damage prediction can be increased by […]

CUDA

Feb, 1

Survey on Efficient Linear Solvers for Porous Media Flow Models on Recent Hardware Architectures

In the past few years, High Performance Computing (HPC) technologies led to General Purpose Processing on Graphics Processing Units (GPGPU) and many-core architectures. These emerging technologies offer massive processing units and are interesting for porous media flow simulators that may be used for CO2 geological sequestration or Enhanced Oil Recovery (EOR) simulation. However, the crucial point is "are […]

CUDA

Jan, 30

Taking advantage of hybrid systems for sparse direct solvers via task-based runtimes

The ongoing hardware evolution exhibits an escalation in the number, as well as in the heterogeneity, of the computing resources. The pressure to maintain reasonable levels of performance and portability forces application developers to leave the traditional programming paradigms and explore alternative solutions. PaStiX is a parallel sparse direct solver, based on a dynamic […]

CUDA

Jan, 30

Towards Efficient Risk Quantification-Using GPUs and Variance Reduction Technique

Value-at-Risk (VaR) provides information about global risk in trading. Demand for high-speed VaR calculation is rising because financial institutions need to measure risk in real time. Researchers in HPC have also recently turned their attention to this kind of demanding application. In this master thesis, we introduce two complementary and different strategies […]

CUDA

Jan, 30

A Novel Graphical Processing Unit Method for Power Systems Security Analysis

There is an increasing need for computational power to drive software tools used in power systems planning and operations, since the emergence of modern energy markets and recent renewable generation technology fundamentally alters how energy flows through the existing power grid. While special-purpose hardware, including supercomputers, has been explored for this purpose, inexpensive commodity hardware […]

CUDA

Jan, 30

Comparing the Performance of Different x86 SIMD Instruction Sets for a Medical Imaging Application on Modern Multi- and Manycore Chips

Single Instruction, Multiple Data (SIMD) vectorization is a major driver of performance in current architectures, and is mandatory for achieving good performance with codes that are limited by instruction throughput. We investigate the efficiency of different SIMD-vectorized implementations of the RabbitCT benchmark. RabbitCT performs 3D image reconstruction by back projection, a vital operation in computed […]

CUDA

Jan, 30

GPU-Accelerated BWT Construction for Large Collection of Short Reads

Advances in DNA sequencing technology have stimulated the development of algorithms and tools for processing very large collections of short strings (reads). Short-read alignment and assembly are among the most well-studied problems. Many state-of-the-art aligners, at their core, have used the Burrows-Wheeler transform (BWT) as a main-memory index of a reference genome (typical example, NCBI […]

CUDA

Jan, 30

A GPU accelerated algorithm for 3D Delaunay triangulation

We propose the first algorithm to compute the 3D Delaunay triangulation (DT) on the GPU. Our algorithm uses massively parallel point insertion followed by bilateral flipping, a powerful local operation in computational geometry. Although a flipping algorithm is very amenable to parallel processing and has been employed to construct the 2D DT and the 3D […]

CUDA

Jan, 30

A CUDA Monte Carlo simulator for radiation therapy dosimetry based on Geant4

Geant4 is a large-scale particle physics package that facilitates every aspect of particle transport simulation. This includes, but is not limited to, geometry description, material definition, tracking of particles passing through and interacting with matter, storage of event data, and visualization. As more detailed and complex simulations are required in different application domains, there is […]

CUDA

Jan, 30

A QUDA-branch to compute disconnected diagrams in GPUs

Although QUDA allows for an efficient computation of many QCD quantities, it is surprisingly lacking tools to evaluate disconnected diagrams, for which GPUs are especially well suited. We aim to fill this gap by creating our own branch of QUDA, which includes new kernels and functions required to calculate fermion loops using several methods and […]

CUDA

* * *

Recent source codes

Specx: Speculative task-based runtime system

Mutual-Supervised Learning for Sequential-to-Parallel Code Translation

KISim: Kubernetes Intelligent Scheduling Simulator

Hardware Compute Partitioning on NVIDIA GPUs for Composable Systems

Efficient GPU Implementation of Multi-Precision Integer Division

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

exa-AMD: Exascale Accelerated Materials Discovery

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler
