Papers on hgpu.org (.txt-file)

Pseudoscalar Meson in Two Flavors QCD with the Optimal Domain-Wall Fermion

pSTL-Bench: A Micro-Benchmark Suite for Assessing Scalability of C++ Parallel STL Implementations

PTask: Operating System Abstractions To Manage GPUs as Compute Devices

PTX2Kernel: Converting PTX Code into Compilable Kernels

PUGACE, a cellular Evolutionary Algorithm framework on GPUs

Pulsar Acceleration Searches on the GPU for the Square Kilometre Array

Pulsar search acceleration using FPGAs and OpenCL templates

Pulse-coupled neural network performance for real-time identification of vegetation during forced landing

Purine: A bi-graph based deep learning framework

Pushing the Envelope: Extreme Network Coding on the GPU

Pushing the limit of molecular dynamics with ab initio accuracy to 100 million atoms with machine learning

Pushing the limits for medical image reconstruction on recent standard multicore processors

Putting Automatic Polyhedral Compilation for GPGPU to Work

pVOCL: Power-Aware Dynamic Placement and Migration in Virtualized GPU Environments

PVR: Patch-to-Volume Reconstruction for Large Area Motion Correction of Fetal MRI

pyATF: Constraint-Based Auto-Tuning in Python

PyCOOL – a Cosmological Object-Oriented Lattice code written in Python

PyCUDA and PyOpenCL: A Scripting-Based Approach to GPU Run-Time Code Generation

PyCUDA: GPU Run-Time Code Generation for High-Performance Computing

PyFAI, a versatile library for azimuthal regrouping

PyFAI: a Python library for high performance azimuthal integration on GPU

PyFR: An Open Source Framework for Solving Advection-Diffusion Type Problems on Streaming Architectures using the Flux Reconstruction Approach

PyGraph: Robust Compiler Support for CUDA Graphs in PyTorch

pyGSL: A Graph Structure Learning Toolkit

pyJac: analytical Jacobian generator for chemical kinetics

PyMatting: A Python Library for Alpha Matting

pyMIC: A Python Offload Module for the Intel Xeon Phi Coprocessor

PyOMP: Parallel programming for CPUs and GPUs with OpenMP and Python

pyPaSWAS: Python-based multi-core CPU and GPU sequence alignment

PyPs, a programmable pass manager

Pyramid Methods in GPU-Based Image Processing

Pyramidal Image Blending Using CUDA Framework

PySAGES: flexible, advanced sampling methods accelerated with GPUs

PySchedCL: Leveraging Concurrency in Heterogeneous Data-Parallel Systems

PySPH: A Python framework for SPH

PySPH: a Python-based framework for smoothed particle hydrodynamics

PystachIO: Efficient Distributed GPU Query Processing with PyTorch over Fast Networks & Fast Storage

Python for Development of OpenMP and CUDA Kernels for Multidimensional Data

Python Non-Uniform Fast Fourier Transform (PyNUFFT): An Accelerated Non-Cartesian MRI Package on a Heterogeneous Platform (CPU/GPU)

Python Workflows on HPC Systems

Python-Based Quantum Chemistry Calculations with GPU Acceleration

PyTorch Hyperparameter Tuning – A Tutorial for spotPython

PyTorch: An Imperative Style, High-Performance Deep Learning Library

PyTorchPipe: a framework for rapid prototyping of pipelines combining language and vision

PyTransit: Fast and Easy Exoplanet Transit Modelling in Python

q-state Potts model metastability study using optimized GPU-based Monte Carlo algorithms

QArray: a GPU-accelerated constant capacitance model simulator for large quantum dot arrays

QCD on GPUs: cost effective supercomputing

QCD simulations with staggered fermions on GPUs

QCDGPU: open-source package for Monte Carlo lattice simulations on OpenCL-compatible multi-GPU systems

qecGPT: decoding Quantum Error-correcting Codes with Generative Pre-trained Transformers

QGTC: Accelerating Quantized GNN via GPU Tensor Core

Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping

QiMeng-Kernel: Macro-Thinking Micro-Coding Paradigm for LLM-Based High-Performance GPU Kernel Generation

QMCPACK: An open source ab initio Quantum Monte Carlo package for the electronic structure of atoms, molecules, and solids

QP: A Heterogeneous Multi-Accelerator Cluster

QPACE 2 and Domain Decomposition on the Intel Xeon Phi

QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators

QSL Squasher: A Fast Quasi-Separatrix Layer Map Calculator

Quadratic Pseudo-Boolean Optimization for Scene Analysis using CUDA

Qualcomm Snapdragon Mobile Platform OpenCL General Programming and Optimization

Quality comparison and acceleration for digital hologram generation method based on segmentation

Quality-score guided error correction for short-read sequencing data using CUDA

Quantifying NUMA and contention effects in multi-GPU systems

Quantifying OpenMP: Statistical Insights into Usage and Adoption

Quantifying the Energy Efficiency of FFT on Heterogeneous Platforms

Quantifying the Energy Efficiency of Object Recognition and Optical Flow

Quantifying the Impact of GPUs on Performance and Energy Efficiency in HPC Clusters

Quantile Mechanics II: Changes of Variables in Monte Carlo methods and a GPU-Optimized Normal Quantile

Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations

Quantum Boolean Image Denoising

Quantum chemical many-body theory on heterogeneous nodes

Quantum Chemistry for Solvated Molecules on Graphical Processing Units (GPUs) using Polarizable Continuum Models

Quantum Chemistry on Graphical Processing Units. 1. Strategies for Two-Electron Integral Evaluation

Quantum Chemistry on Graphical Processing Units. 2. Direct Self-Consistent-Field Implementation

Quantum Chemistry on Graphical Processing Units. 3. Analytical Energy Gradients, Geometry Optimization, and First Principles Molecular Dynamics

Quantum computer simulation using the CUDA programming model

Quantum Monte Carlo on graphical processing units

Quantum.Ligand.Dock: protein-ligand docking with quantum entanglement refinement on a GPU system

Quartile and Outlier Detection on Heterogeneous Clusters Using Distributed Radix Sort

Quasars spectra classification with the help of GPU computing

Quasi-maximum Accuracy Floating-point Computations with GPGPU for Applications in Digital Signal Processing

Quasi-real-time analysis of dynamic near field scattering data using a graphics processing unit

QUDA programming for staggered quarks

Query Optimization in Heterogeneous CPU/GPU Environment for Time Series Databases

Query Processing on Tensor Computation Runtimes

Query-Driven Visualization of Time-Varying Adaptive Mesh Refinement Data

Quick-CULLIDE: fast inter- and intra-object collision culling using graphics hardware

QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference

QuickProbs – A Fast Multiple Sequence Alignment Algorithm Designed for Graphics Processors

Quine-McCluskey algorithm on GPGPU

QYMSYM: A GPU-Accelerated Hybrid Symplectic Integrator That Permits Close Encounters

R2GUESS: A Graphics Processing Unit-Based R Package for Bayesian Variable Selection Regression of Multivariate Responses

Radeon PRO Solid State Graphics (SSG) API User Manual

Radial Basis Function Networks GPU-Based Implementation

Radiation Modeling Using the Uintah Heterogeneous CPU/GPU Runtime System

Radiative Heat Transfer Simulation Using Programmable Graphics Hardware

Radio astronomy beam forming on GPUs

Radio Astronomy Beam Forming on Many-Core Architectures

Titles: 100
open PDFs: 96
packages: 42
