Papers on hgpu.org (.txt-file)
Performant low-order matrix-free finite element kernels on GPU architectures
Performing DCT8x8 Computation on GPU Using NVIDIA CUDA Technology
Performing efficient NURBS modeling operations on the GPU
PeriPy – A High Performance OpenCL Peridynamics Package
permGPU: Using graphics processing units in RNA microarray association studies
Permutation Index and GPU to Solve efficiently Many Queries
Persistent Kernels for Iterative Memory-bound GPU Applications
Persistent RNNs: Stashing Recurrent Weights On-Chip
Perturbation Functions in Computer Graphics
Petaflop biofluidics simulations on a two million-core system
Petascale Application of a Coupled CPU-GPU Algorithm for Simulation and Analysis of Multiphase Flow Solutions in Porous Medium Systems
Petascale computations for Large-scale Atomic and Molecular collisions
Petascale Direct Numerical Simulation of Blood Flow on 200K Cores and Heterogeneous Architectures
Petascale elliptic solvers for anisotropic PDEs on GPU clusters
Petascale turbulence simulation using a highly parallel fast multipole method
Petascale visualization: Approaches and initial results
PFAC Library: GPU-based string matching algorithm
PFunc: modern task parallelism for modern high performance computing
PG-PuReMD: A Parallel-GPU Reactive Molecular Dynamics Package
PGEM: Preemptive GPGPU Execution Model for Runtime Engines
Pgx: Hardware-accelerated parallel game simulation for reinforcement learning
Phase Based Volume Registration on the GPU with Application to Quantitative MRI
Phase Based Volume Registration Using CUDA
Phase diagram and critical behavior of the square-lattice Ising model with competing nearest- and next-nearest-neighbor interactions
Phase Transition in 3d Heisenberg Spin Glasses with Strong Random Anisotropies, through a Multi-GPU Parallelization
phiGEMM: a CPU-GPU library for porting Quantum ESPRESSO on hybrid systems
Phoenix: A Runtime Environment for High Performance Computing on Chip Multiprocessors
Photon mapping on programmable graphics hardware
Physical and graphical effects in OpenCL by example
Physical modeling and high-performance GPU computing for characterization, interception, and disruption of hazardous near-Earth objects
Physically Based Rendering: Implementation of Path Tracer
Physically-Based Interactive Flow Visualization Based on Schlieren and Interferometry Experimental Techniques
Physically-based interactive schlieren flow visualization
Physically-based painting style 3D image synthesis using GPU
Physically-Based Sound Synthesis on GPUs
Physically-based visual simulation on graphics hardware
Physics and Computing Performance of the Exa.TrkX TrackML Pipeline
Physis: An Implicitly Parallel Programming Model for Stencil Computations on Large-Scale GPU-Accelerated Supercomputers
Piccolo: building fast, distributed programs with partitioned tables
PIConGPU: A Fully Relativistic Particle-in-Cell Code for a GPU Cluster
PIConGPU: Predictive Simulations of Laser-Particle Accelerators with Manycore Hardware
Piecewise Tri-linear Contouring for Multi-material Volumes
PIGEON: Optimizing CUDA Code Generator for End-to-End Training and Inference of Relational Graph Neural Networks
Piko: A Design Framework for Programmable Graphics Pipelines
PILC: Practical Image Lossless Compression with an End-to-end GPU Oriented Neural Framework
PipeCNN: An OpenCL-Based FPGA Accelerator for Large-Scale Convolution Neuron Networks
Pipeline strategies to accelerate range query processing on a multi-GPU environment
Pipelined Iterative Solvers with Kernel Fusion for Graphics Processing Units
Pipelined MapReduce: A Decoupled MapReduce RunTime for Shared Memory Multi-Processors
Pipelined Training with Stale Weights of Deep Convolutional Neural Networks
Pipelining the Fast Multipole Method over a Runtime System
PIPS Is not (just) Polyhedral Software
PISTON: A Portable Cross-Platform Framework for Data-Parallel Visualization Operators
Pixel-Exact Rendering of Spacetime Finite Element Solutions
PixelPie: Maximal Poisson-disk Sampling with Rasterization
Places205-VGGNet Models for Scene Recognition
Planetary-Scale Terrain Composition
Plant Leaf Modeling and Rendering Based-On GPU
Plasma Visualization in Parallel using Particle Systems on Graphical Processing Units
Platform 2012, a Many-Core Computing Accelerator for Embedded SoCs: Performance Evaluation of Visual Analytics Applications
Platform Characterization for Domain-Specific Computing
Platform-independent parallelization of the Lattice Boltzmann method with OpenCL
Platform-Specific Optimization and Mapping of Stencil Codes through Refinement
Playdoh: A lightweight Python library for distributed computing and optimisation
PLB-HeC: A Profile-based Load-Balancing algorithm for Heterogeneous CPU-GPU Clusters
Plenoptic Rendering With Interactive Performance Using GPUs
PlinkGPU: A Framework for GPU Acceleration of Whole Genome Data Analysis
PM4Py-GPU: a High-Performance General-Purpose Library for Process Mining
PMT: Power Measurement Toolkit
PNG1 triangles for tangent plane continuous surfaces on the GPU
PoCL-R: A Scalable Low Latency Distributed OpenCL Runtime
PoCL-R: An Open Standard Based Offloading Layer for Heterogeneous Multi-Access Edge Computing with Server Side Scalability
pocl: A Performance-Portable OpenCL Implementation
Point Based Approximate Color Bleeding With Cuda
Point Based Color Bleeding with CUDA and Caching
Point Rendering in CUDA Path Tracer
Point Spread Function Estimation of Solar Surface Images with a Cooperative Particle Swarm Optimization on GPUs
Point to Line Mappings and Other Line Parameterizations not only for Hough Transform
Point to point processing of digital images using parallel computing
Point-wise Adaptive Filtering for Fast Monte Carlo Noise Reduction
Pointer Analysis for Semi-Automatic Code Parallelizers
Poisson-Boltzmann model for protein-surface electrostatic interactions and grid-convergence study using the PyGBe code
Policy-based Tuning for Performance Portability and Library Co-optimization
Polly – Polyhedral optimization in LLVM
Polly-ACC: Transparent compilation to heterogeneous hardware
Polyconvexification of the multi-label optical flow problem
Polymer Field-Theory Simulations on Graphics Processing Units
POMPEI: Programming with OpenMP4 for Exascale Investigations
PONDER – A Real time software backend for pulsar and IPS observations at the Ooty Radio Telescope
PopSparse: Accelerated block sparse matrix multiplication on IPU
Population Parallel GP on the G80 GPU
Porous Rock Simulations and Lattice Boltzmann on GPUs
Portability and Scalability of OpenMP Offloading on State-of-the-art Accelerators
Portability of Fortran’s ‘do concurrent’ on GPUs
Portable and Performant GPU/Heterogeneous Asynchronous Many-Task Runtime System
Portable and Transparent Software Managed Scheduling on Accelerators for Fair Resource Sharing
Portable C++ Code that can Look and Feel Like Fortran Code with Yet Another Kernel Launcher (YAKL)
Titles: 100
open PDFs: 95
packages: 29