Papers on hgpu.org (.txt-file)
Programming NVIDIA cards by means of transitive closure based parallelization algorithms

Programming of shared memory GPUs shared memory systems

Programming on Parallel Machines: GPU, Multicore, Clusters and More

Programming video cards for computational electromagnetics applications

Programming with Explicit Dependencies. A Framework for Portable Parallel Programming

Programming-Model Centric Debugging for Multicore Embedded Systems

Progressive Clustering of Big Data with GPU Acceleration and Visualization

Progressive High-Quality Response Surfaces for Visually Guided Sensitivity Analysis

Progressive Photon Mapping on GPUs

Progressive Semantic Segmentation

Projected tetrahedra revisited: a barycentric formulation applied to digital radiograph reconstruction using higher-order attenuation functions

Projectile Monte-Carlo Trajectory Analysis Using a Graphics Processing Unit

Projecting Tetrahedra with a Simplified Basis Graph

PROJECTION Algorithm for Motif Finding on GPUs

Promise of embedded system with GPU in artificial leg control: Enabling time-frequency feature extraction from electromyography

ProofWright: Towards Agentic Formal Verification of CUDA

Proposition for propagated occupation grids for non-rigid moving objects tracking

Prospects for scalable 3D FFTs on heterogeneous exascale systems

Prospects of GPGPU in the Auger Offline Software Framework

pROST: A Smoothed Lp-norm Robust Online Subspace Tracking Method for Realtime Background Subtraction in Video

PROST: Parallel robust online simple tracking

Protecting Real-Time GPU Applications on Integrated CPU-GPU SoC Platforms

Protein alignment algorithms with an efficient backtracking routine on multiple GPUs

Proteus: Efficient Resource Use in Heterogeneous Architectures

Proteus: Exploiting Numerical Precision Variability in Deep Neural Networks

Prototyping methodology of image processing applications on heterogeneous parallel systems

Provably Efficient GPU Algorithms

Providing performance portable numerics for Intel GPUs

Providing Source Code Level Portability Between CPU and GPU with MapCG

PSCToolkit: solving sparse linear systems with a large number of GPUs

Pseudo Random Number Generators on Graphics Processing Units, with Applications in Finance

Pseudo-random number generation for Brownian Dynamics and Dissipative Particle Dynamics simulations on GPU devices

Pseudo-Random Number Generation on GP-GPU

Pseudo-random number generators for Monte Carlo simulations on ATI Graphics Processing Units

Pseudo-random number generators for Monte Carlo simulations on Graphics Processing Units

Pseudorandom number generation on the GPU

Pseudorandom Numbers Generation for Monte Carlo Simulations on GPUs: OpenCL Approach

Pseudoscalar Meson in Two Flavors QCD with the Optimal Domain-Wall Fermion

pSTL-Bench: A Micro-Benchmark Suite for Assessing Scalability of C++ Parallel STL Implementations

PTask: Operating System Abstractions To Manage GPUs as Compute Devices

PTX2Kernel: Converting PTX Code into Compilable Kernels

PUGACE, a cellular Evolutionary Algorithm framework on GPUs

Pulsar Acceleration Searches on the GPU for the Square Kilometre Array

Pulsar search acceleration using FPGAs and OpenCL templates

Pulse-coupled neural network performance for real-time identification of vegetation during forced landing

Purine: A bi-graph based deep learning framework

Pushing the Envelope: Extreme Network Coding on the GPU

Pushing the limit of molecular dynamics with ab initio accuracy to 100 million atoms with machine learning

Pushing the limits for medical image reconstruction on recent standard multicore processors

Putting Automatic Polyhedral Compilation for GPGPU to Work

pVOCL: Power-Aware Dynamic Placement and Migration in Virtualized GPU Environments

PVR: Patch-to-Volume Reconstruction for Large Area Motion Correction of Fetal MRI

pyATF: Constraint-Based Auto-Tuning in Python

PyCOOL – a Cosmological Object-Oriented Lattice code written in Python

PyCUDA and PyOpenCL: A Scripting-Based Approach to GPU Run-Time Code Generation

PyCUDA: GPU Run-Time Code Generation for High-Performance Computing

PyFAI, a versatile library for azimuthal regrouping

PyFAI: a Python library for high performance azimuthal integration on GPU

PyFR: An Open Source Framework for Solving Advection-Diffusion Type Problems on Streaming Architectures using the Flux Reconstruction Approach

PyGraph: Robust Compiler Support for CUDA Graphs in PyTorch

pyGSL: A Graph Structure Learning Toolkit

pyJac: analytical Jacobian generator for chemical kinetics

PyMatting: A Python Library for Alpha Matting

pyMIC: A Python Offload Module for the Intel Xeon Phi Coprocessor

PyOMP: Parallel programming for CPUs and GPUs with OpenMP and Python

pyPaSWAS: Python-based multi-core CPU and GPU sequence alignment

PyPs, a programmable pass manager

Pyramid Methods in GPU-Based Image Processing

Pyramidal Image Blending Using CUDA Framework

PySAGES: flexible, advanced sampling methods accelerated with GPUs

PySchedCL: Leveraging Concurrency in Heterogeneous Data-Parallel Systems

PySPH: A Python framework for SPH

PySPH: a Python-based framework for smoothed particle hydrodynamics

Python for Development of OpenMP and CUDA Kernels for Multidimensional Data

Python Non-Uniform Fast Fourier Transform (PyNUFFT): An Accelerated Non-Cartesian MRI Package on a Heterogeneous Platform (CPU/GPU)

Python Workflows on HPC Systems

Python-Based Quantum Chemistry Calculations with GPU Acceleration

PyTorch Hyperparameter Tuning – A Tutorial for spotPython

PyTorch: An Imperative Style, High-Performance Deep Learning Library

PyTorchPipe: a framework for rapid prototyping of pipelines combining language and vision

PyTransit: Fast and Easy Exoplanet Transit Modelling in Python

q-state Potts model metastability study using optimized GPU-based Monte Carlo algorithms

QArray: a GPU-accelerated constant capacitance model simulator for large quantum dot arrays

QCD on GPUs: cost effective supercomputing

QCD simulations with staggered fermions on GPUs

QCDGPU: open-source package for Monte Carlo lattice simulations on OpenCL-compatible multi-GPU systems

qecGPT: decoding Quantum Error-correcting Codes with Generative Pre-trained Transformers

QGTC: Accelerating Quantized GNN via GPU Tensor Core

Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping

QiMeng-Kernel: Macro-Thinking Micro-Coding Paradigm for LLM-Based High-Performance GPU Kernel Generation

QMCPACK: An open source ab initio Quantum Monte Carlo package for the electronic structure of atoms, molecules, and solids

QP: A Heterogeneous Multi-Accelerator Cluster

QPACE 2 and Domain Decomposition on the Intel Xeon Phi

QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators

QSL Squasher: A Fast Quasi-Separatrix Layer Map Calculator

Quadratic Pseudo-Boolean Optimization for Scene Analysis using CUDA

Qualcomm Snapdragon Mobile Platform OpenCL General Programming and Optimization

Titles: 100
open PDFs: 94
packages: 40
