high performance computing on graphics processing units: hgpu.org

Posts

Feb, 1

High Performance Computing of Dynamic Structural Response Analysis for the Integrated Earthquake Simulation

This paper proposes an application of high performance computing (HPC) to dynamic structural response analysis (DSRA) in order to enhance the capability and increase the efficiency of integrated earthquake simulation (IES). Object Based Structural Analysis (OBASAN) is a candidate DSRA program for IES. With OBASAN, the reliability of structural damage prediction can be increased by […]

CUDA

Feb, 1

Survey on Efficient Linear Solvers for Porous Media Flow Models on Recent Hardware Architectures

In the past few years, High Performance Computing (HPC) technologies led to General Purpose Processing on Graphics Processing Units (GPGPU) and many-core architectures. These emerging technologies offer massive processing units and are interesting for porous media flow simulators that may be used for CO2 geological sequestration or Enhanced Oil Recovery (EOR) simulation. However, the crucial point is "are […]

CUDA

Jan, 30

Taking advantage of hybrid systems for sparse direct solvers via task-based runtimes

The ongoing hardware evolution exhibits an escalation in the number, as well as in the heterogeneity, of the computing resources. The pressure to maintain reasonable levels of performance and portability forces application developers to leave the traditional programming paradigms and explore alternative solutions. PaStiX is a parallel sparse direct solver, based on a dynamic […]

CUDA

Jan, 30

Towards Efficient Risk Quantification-Using GPUs and Variance Reduction Technique

Value-at-Risk (VaR) provides information about global risk in trading. The demand for high-speed calculation of VaR is rising because financial institutions need to measure risk in real time. Researchers in HPC have also recently turned their attention to this kind of demanding application. In this master thesis, we introduce two complementary and different strategies […]

CUDA

Jan, 30

A Novel Graphical Processing Unit Method for Power Systems Security Analysis

There is an increasing need for computational power to drive software tools used in power systems planning and operations, since the emergence of modern energy markets and recent renewable generation technology fundamentally alters how energy flows through the existing power grid. While special-purpose hardware, including supercomputers, has been explored for this purpose, inexpensive commodity hardware […]

CUDA

Jan, 30

Comparing the Performance of Different x86 SIMD Instruction Sets for a Medical Imaging Application on Modern Multi- and Manycore Chips

Single Instruction, Multiple Data (SIMD) vectorization is a major driver of performance in current architectures, and is mandatory for achieving good performance with codes that are limited by instruction throughput. We investigate the efficiency of different SIMD-vectorized implementations of the RabbitCT benchmark. RabbitCT performs 3D image reconstruction by back projection, a vital operation in computed […]

CUDA

Jan, 30

GPU-Accelerated BWT Construction for Large Collection of Short Reads

Advances in DNA sequencing technology have stimulated the development of algorithms and tools for processing very large collections of short strings (reads). Short-read alignment and assembly are among the most well-studied problems. Many state-of-the-art aligners, at their core, have used the Burrows-Wheeler transform (BWT) as a main-memory index of a reference genome (typical example, NCBI […]

CUDA

Jan, 30

A GPU accelerated algorithm for 3D Delaunay triangulation

We propose the first algorithm to compute the 3D Delaunay triangulation (DT) on the GPU. Our algorithm uses massively parallel point insertion followed by bilateral flipping, a powerful local operation in computational geometry. Although a flipping algorithm is very amenable to parallel processing and has been employed to construct the 2D DT and the 3D […]

CUDA

Jan, 30

A CUDA Monte Carlo simulator for radiation therapy dosimetry based on Geant4

Geant4 is a large-scale particle physics package that facilitates every aspect of particle transport simulation. This includes, but is not limited to, geometry description, material definition, tracking of particles passing through and interacting with matter, storage of event data, and visualization. As more detailed and complex simulations are required in different application domains, there is […]

CUDA

Jan, 30

A QUDA-branch to compute disconnected diagrams in GPUs

Although QUDA allows for an efficient computation of many QCD quantities, it is surprisingly lacking tools to evaluate disconnected diagrams, for which GPUs are especially well suited. We aim to fill this gap by creating our own branch of QUDA, which includes new kernels and functions required to calculate fermion loops using several methods and […]

CUDA

Jan, 29

A Detailed GPU Cache Model Based on Reuse Distance Theory

As modern GPUs rely partly on their on-chip memories to counter the imminent off-chip memory wall, the efficient use of their caches has become important for performance and energy. However, optimising cache locality systematically requires insight into and prediction of cache behaviour. On sequential processors, stack distance or reuse distance theory is a well-known means […]

CUDA

Jan, 29

Hybrid algorithms for efficient Cholesky decomposition and matrix inverse using multicore CPUs with GPU accelerators

The use of linear algebra routines is fundamental to many areas of computational science, yet their implementation in software still forms the main computational bottleneck in many widely used algorithms. In machine learning and computational statistics, for example, the use of Gaussian distributions is ubiquitous, and routines for calculating the Cholesky decomposition, matrix inverse and […]

CUDA
