Posts
Sep, 22
Modification of self-organizing migration algorithm for OpenCL framework
This paper deals with a modification of the self-organizing migration algorithm using the OpenCL framework. The modification allows the algorithm to exploit modern parallel devices such as central processing units and graphics processing units. The main aim was to create an algorithm that shows a significant speedup over the sequential variant. The second aim was to make the algorithm robust […]
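As a rough illustration of the per-individual parallelism such a modification exposes, here is a minimal CUDA sketch of one migration step of SOMA (self-organizing migration algorithm) in the AllToOne strategy. The sphere objective, the fixed Step/PathLength constants and the pre-generated PRT mask are assumptions for the sketch, not details from the paper, and the paper itself targets OpenCL rather than CUDA; host setup is omitted.

```cuda
// Minimal CUDA sketch of one SOMA migration step (AllToOne strategy).
// Assumptions not taken from the post: sphere objective, one thread per
// individual, pre-generated PRT mask; the paper's own code uses OpenCL.
#include <cuda_runtime.h>

#define DIM 8
#define STEP 0.11f
#define PATH_LENGTH 3.0f

__device__ float sphere(const float *x) {          // toy objective function
    float s = 0.0f;
    for (int d = 0; d < DIM; ++d) s += x[d] * x[d];
    return s;
}

__global__ void somaMigrate(float *pop, float *fitness,
                            const float *leader, const float *prtMask,
                            int popSize) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // one individual per thread
    if (i >= popSize) return;

    float *x = &pop[i * DIM];
    float best[DIM], trial[DIM];
    for (int d = 0; d < DIM; ++d) best[d] = x[d];
    float bestFit = fitness[i];

    // Walk towards the leader along the PRT-perturbed direction,
    // keeping the best position found on the path.
    for (float t = STEP; t <= PATH_LENGTH; t += STEP) {
        for (int d = 0; d < DIM; ++d)
            trial[d] = x[d] + (leader[d] - x[d]) * t * prtMask[i * DIM + d];
        float f = sphere(trial);
        if (f < bestFit) {
            bestFit = f;
            for (int d = 0; d < DIM; ++d) best[d] = trial[d];
        }
    }
    for (int d = 0; d < DIM; ++d) x[d] = best[d];
    fitness[i] = bestFit;
}
```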
Sep, 21
Large-Scale Motion Modelling using a Graphical Processing Unit
The increased availability of Graphical Processing Units (GPUs) in personal computers has made parallel programming worthwhile and more accessible, but not necessarily easier. This thesis takes advantage of the power of a GPU, in conjunction with the Central Processing Unit (CPU), to simulate target trajectories for large-scale scenarios, such as wide-area maritime […]
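For a sense of how such trajectory simulation maps onto a GPU, here is a minimal sketch of per-target state propagation. The nearly-constant-velocity motion model and the State layout are assumptions of the sketch, not details taken from the thesis excerpt.

```cuda
// Illustrative CUDA kernel propagating many target states in parallel.
// Constant-velocity model and State layout are assumed, not from the thesis.
#include <cuda_runtime.h>

struct State { float x, y, vx, vy; };

__global__ void propagate(State *targets, int n, float dt) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // one target per thread
    if (i >= n) return;
    State s = targets[i];
    s.x += s.vx * dt;        // constant-velocity prediction over one time step
    s.y += s.vy * dt;
    targets[i] = s;
}

// Typical launch: propagate<<<(n + 255) / 256, 256>>>(d_targets, n, dt);
```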
Sep, 21
Some examples of instant computations of fluid dynamics on GPU
This paper is a summary of our experience with GPU and GPGPU computing for two-dimensional computational fluid dynamics on fine grids and for three-dimensional kinetic transport problems. The choice of computational approach is clearly critical for both speedup and efficiency. In our numerical experiments, we used a Lattice Boltzmann method (LBM) for the […]
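As a rough sketch of the kind of kernel an LBM solver runs on the GPU, here is the BGK collision step of a D2Q9 lattice, assuming a structure-of-arrays layout f[q * nCells + cell]. The D2Q9/BGK choice is an assumption; the paper's actual discretisation, streaming step and boundary handling are not shown.

```cuda
// BGK collision step of a D2Q9 Lattice Boltzmann solver (collision only;
// streaming, boundaries and host setup omitted). Layout is an assumption.
#include <cuda_runtime.h>

__constant__ float w[9]  = { 4.f/9,  1.f/9,  1.f/9,  1.f/9,  1.f/9,
                             1.f/36, 1.f/36, 1.f/36, 1.f/36 };
__constant__ int   cx[9] = { 0, 1, 0, -1,  0, 1, -1, -1,  1 };
__constant__ int   cy[9] = { 0, 0, 1,  0, -1, 1,  1, -1, -1 };

__global__ void bgkCollide(float *f, int nCells, float omega) {
    int cell = blockIdx.x * blockDim.x + threadIdx.x;  // one lattice node per thread
    if (cell >= nCells) return;

    // Macroscopic density and velocity from the nine populations.
    float rho = 0.f, ux = 0.f, uy = 0.f;
    float fi[9];
    for (int q = 0; q < 9; ++q) {
        fi[q] = f[q * nCells + cell];
        rho += fi[q];
        ux  += fi[q] * cx[q];
        uy  += fi[q] * cy[q];
    }
    ux /= rho;  uy /= rho;
    float usq = ux * ux + uy * uy;

    // Relax each population towards its equilibrium (BGK operator).
    for (int q = 0; q < 9; ++q) {
        float cu  = cx[q] * ux + cy[q] * uy;
        float feq = w[q] * rho * (1.f + 3.f * cu + 4.5f * cu * cu - 1.5f * usq);
        f[q * nCells + cell] = fi[q] - omega * (fi[q] - feq);
    }
}
```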
Sep, 21
Parallelization of Hierarchical Text Clustering on Multi-core CUDA Architecture
Text clustering is the problem of dividing text documents into groups such that documents in the same group are similar to one another and different from documents in other groups. Because texts tend to form hierarchies, text clustering is best performed using a hierarchical clustering method. An important aspect while clustering large […]
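The dominant cost in hierarchical (agglomerative) clustering is the pairwise similarity matrix, which parallelises naturally. A hedged CUDA sketch follows, assuming dense, L2-normalised tf-idf document vectors; the excerpt does not specify the paper's document representation.

```cuda
// Pairwise cosine-similarity matrix for agglomerative text clustering.
// Dense, L2-normalised row-major document vectors are assumed.
#include <cuda_runtime.h>

__global__ void cosineSim(const float *docs,  // nDocs x dim, rows unit-length
                          float *sim,         // nDocs x nDocs output
                          int nDocs, int dim) {
    int i = blockIdx.y * blockDim.y + threadIdx.y;
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nDocs || j >= nDocs) return;

    float dot = 0.f;                      // dot product of two unit vectors
    for (int k = 0; k < dim; ++k)
        dot += docs[i * dim + k] * docs[j * dim + k];
    sim[i * nDocs + j] = dot;             // equals the cosine similarity
}

// Launch with a 2D grid, e.g. dim3 block(16, 16) and
// dim3 grid((nDocs + 15) / 16, (nDocs + 15) / 16).
```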
Sep, 21
Fast and Efficient Automatic Memory Management for GPUs using Compiler-Assisted Runtime Coherence Scheme
Exploiting the performance potential of GPUs requires managing the data transfers to and from them efficiently, which is an error-prone and tedious task. In this paper, we develop a software coherence mechanism to fully automate all data transfers between the CPU and GPU without any assistance from the programmer. Our mechanism uses compiler analysis to […]
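A rough sketch of the idea behind such a runtime coherence scheme: track which copy of each buffer is stale and transfer lazily only when the other side touches the data. The class and its interface below are illustrative only; the paper's compiler analysis and the calls it inserts are not reproduced.

```cuda
// Illustrative lazy host/device coherence wrapper (not the paper's runtime).
#include <cuda_runtime.h>
#include <cstddef>

class CoherentBuffer {
    float *h_, *d_;
    size_t bytes_;
    bool deviceDirty_ = false, hostDirty_ = false;
public:
    explicit CoherentBuffer(size_t n) : bytes_(n * sizeof(float)) {
        h_ = new float[n];
        cudaMalloc(&d_, bytes_);
    }
    ~CoherentBuffer() { delete[] h_; cudaFree(d_); }

    float *hostPtr() {                   // CPU is about to read or write
        if (deviceDirty_) {              // pull the newer GPU copy first
            cudaMemcpy(h_, d_, bytes_, cudaMemcpyDeviceToHost);
            deviceDirty_ = false;
        }
        hostDirty_ = true;
        return h_;
    }
    float *devicePtr() {                 // a GPU kernel is about to run
        if (hostDirty_) {                // push the newer CPU copy first
            cudaMemcpy(d_, h_, bytes_, cudaMemcpyHostToDevice);
            hostDirty_ = false;
        }
        deviceDirty_ = true;
        return d_;
    }
};
```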
Sep, 21
Autotuning Wavefront Abstractions for Heterogeneous Architectures
We present our autotuned heterogeneous parallel programming abstraction for the wavefront pattern. An exhaustive search of the tuning space indicates that correctly setting the tuning factors yields an average 37x speedup over a sequential baseline. Our best automated machine-learning-based heuristic obtains 92% of this ideal speedup, averaged across our full range of wavefront examples.
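For readers unfamiliar with the pattern, a minimal CUDA sketch of a wavefront sweep: cells on the same anti-diagonal are independent, so each diagonal is one parallel step. The toy recurrence and the threads-per-block tuning factor are placeholders, not the paper's abstraction or benchmarks.

```cuda
// Wavefront sweep over an n x n grid, one kernel launch per anti-diagonal.
// The recurrence is a toy min-cost dependence on the three upper-left cells.
#include <cuda_runtime.h>

__global__ void diagStep(float *grid, int n, int diag) {
    // Cells (i, j) with i + j == diag and 1 <= i, j < n.
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    int i = max(1, diag - (n - 1)) + t;
    int j = diag - i;
    if (i >= n || j < 1 || j >= n) return;

    float up    = grid[(i - 1) * n + j];
    float left  = grid[i * n + (j - 1)];
    float diagv = grid[(i - 1) * n + (j - 1)];
    grid[i * n + j] += fminf(diagv, fminf(up, left));   // toy dependence
}

void wavefrontSweep(float *d_grid, int n, int threadsPerBlock) {
    for (int diag = 2; diag <= 2 * (n - 1); ++diag) {   // host drives the sweep
        int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
        diagStep<<<blocks, threadsPerBlock>>>(d_grid, n, diag);
    }
    cudaDeviceSynchronize();
}
```

Threads-per-block here stands in for the kind of tuning factor the post's autotuner would choose automatically.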
Sep, 20
Charged particles constrained to a curved surface
We study the motion of charged particles constrained to arbitrary two-dimensional curved surfaces but interacting in three-dimensional space via the Coulomb potential. To speed up the interaction calculations, we use the parallel compute capability of the Compute Unified Device Architecture (CUDA) of today's graphics boards. The particles and the curved surfaces are shown using the Open […]
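The expensive part of such a simulation is the all-pairs Coulomb interaction. A hedged CUDA sketch of the brute-force O(N^2) force kernel follows; the softening term, unit Coulomb constant and float3 layout are assumptions, and the surface constraint and rendering from the post are not shown.

```cuda
// Brute-force all-pairs Coulomb forces: F_i = sum_j q_i q_j (r_i - r_j) / r^3.
// Unit Coulomb constant and a small softening term are assumed for the sketch.
#include <cuda_runtime.h>

__global__ void coulombForces(const float3 *pos, const float *charge,
                              float3 *force, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // one particle per thread
    if (i >= n) return;

    float3 pi = pos[i];
    float3 f = make_float3(0.f, 0.f, 0.f);
    for (int j = 0; j < n; ++j) {                   // brute-force pair loop
        if (j == i) continue;
        float3 d = make_float3(pi.x - pos[j].x, pi.y - pos[j].y, pi.z - pos[j].z);
        float r2 = d.x * d.x + d.y * d.y + d.z * d.z + 1e-12f;  // softening
        float invR3 = rsqrtf(r2) / r2;              // 1 / r^3
        float s = charge[i] * charge[j] * invR3;
        f.x += s * d.x;  f.y += s * d.y;  f.z += s * d.z;
    }
    force[i] = f;
}
```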
Sep, 20
Evolutionary Clustering on CUDA
Unsupervised clustering of large data sets is a complicated task. Due to its complexity, various meta-heuristic machine learning algorithms have been used to automate the clustering process. Genetic and evolutionary algorithms have been deployed successfully to find clusters in data sets. GPU computing is a recent programming paradigm introducing high-performance parallel computing […]
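A hedged sketch of the part that maps best onto CUDA: scoring every candidate solution against the whole data set in parallel. The centroid-based chromosome encoding and the within-cluster squared-distance fitness are assumptions; the excerpt does not give the paper's encoding.

```cuda
// Parallel fitness evaluation for a population of clustering candidates.
// Each chromosome is assumed to encode k centroids; fitness is the summed
// squared distance of every point to its nearest centroid (pre-zeroed).
#include <cuda_runtime.h>
#include <float.h>

__global__ void evalPopulation(const float *data,      // nPoints x dim
                               const float *centroids, // popSize x k x dim
                               float *fitness,         // popSize, zero-initialised
                               int nPoints, int dim, int k) {
    int p = blockIdx.x * blockDim.x + threadIdx.x;  // data point index
    int c = blockIdx.y;                             // chromosome index
    if (p >= nPoints) return;

    const float *cent = &centroids[c * k * dim];
    float best = FLT_MAX;
    for (int j = 0; j < k; ++j) {                   // distance to nearest centroid
        float d2 = 0.f;
        for (int a = 0; a < dim; ++a) {
            float diff = data[p * dim + a] - cent[j * dim + a];
            d2 += diff * diff;
        }
        best = fminf(best, d2);
    }
    atomicAdd(&fitness[c], best);   // lower total distance = fitter chromosome
}

// Launch with dim3 grid((nPoints + 255) / 256, popSize), 256 threads per block.
```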
Sep, 20
Binaural Simulations Using Audio Rate FDTD Schemes and CUDA
Three-dimensional finite difference time domain (FDTD) schemes can be used as an approach to spatial audio simulation. By embedding a model of the human head in a 3D computational space, such simulations can emulate binaural sound localisation. This approach normally relies on high sample rates to give finely detailed models, and is computationally intensive. […]
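A minimal CUDA sketch of one time step of such a stencil, here a leapfrog update of the scalar wave equation on a regular 3D grid; boundary handling and the embedded head geometry are omitted, and the post's exact scheme may differ.

```cuda
// One FDTD time step for the 3D scalar wave equation (interior points only):
// p^{n+1} = 2 p^n - p^{n-1} + lambda^2 * laplacian(p^n), lambda = c*dt/dx.
#include <cuda_runtime.h>

__global__ void fdtdStep(const float *p, const float *pPrev, float *pNext,
                         int nx, int ny, int nz, float lambda2) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int z = blockIdx.z * blockDim.z + threadIdx.z;
    if (x < 1 || y < 1 || z < 1 || x >= nx - 1 || y >= ny - 1 || z >= nz - 1)
        return;

    int idx = (z * ny + y) * nx + x;
    float lap = p[idx - 1] + p[idx + 1]          // 7-point Laplacian stencil
              + p[idx - nx] + p[idx + nx]
              + p[idx - nx * ny] + p[idx + nx * ny]
              - 6.f * p[idx];
    pNext[idx] = 2.f * p[idx] - pPrev[idx] + lambda2 * lap;
}
```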
Sep, 20
Forecasting high frequency financial time series using parallel FFN with CUDA and ZeroMQ
Feed-forward neural networks (FFNs) are powerful data-modelling tools that have been used in many fields of science. In financial applications specifically, the number of factors affecting the market leads to models with a large number of input features and of hidden and output neurons. In financial problems, response time is crucial, and […]
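A hedged CUDA sketch of the core kernel a parallel FFN predictor runs, a single dense-layer forward pass; the sigmoid activation and row-major weight layout are assumptions, and the ZeroMQ transport from the post is outside the scope of the snippet.

```cuda
// Dense-layer forward pass: y = sigmoid(W x + b), one output neuron per thread.
// Activation choice and weight layout are assumptions of this sketch.
#include <cuda_runtime.h>

__global__ void denseForward(const float *W,     // nOut x nIn, row-major
                             const float *bias,  // nOut
                             const float *x,     // nIn (one input vector)
                             float *y,           // nOut
                             int nIn, int nOut) {
    int o = blockIdx.x * blockDim.x + threadIdx.x;  // one output neuron per thread
    if (o >= nOut) return;

    float acc = bias[o];
    for (int k = 0; k < nIn; ++k)
        acc += W[o * nIn + k] * x[k];
    y[o] = 1.f / (1.f + expf(-acc));   // sigmoid activation
}

// Chaining this kernel layer by layer gives the full forward pass.
```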
Sep, 20
GPU-Acceleration of Linear Algebra using OpenCL
In this report we have created a linear algebra API using OpenCL for use with MATLAB. We have demonstrated that the individual linear algebra components can be faster on the GPU than on the CPU. We found that the API is heavily memory bound, but still faster than MATLAB in our test case. The API components […]
Sep, 19
Direct GPU/FPGA Communication Via PCI Express
Parallel processing has hit mainstream computing in the form of CPUs, GPUs and FPGAs. While exploration proceeds on all three platforms individually and on the CPU-GPU pair, little has been done to exploit the synergy of GPU and FPGA. This is due in part to the cumbersome nature of communication between the two. This paper presents a […]