Papers on hgpu.org (.txt-file)

Physics and Computing Performance of the Exa.TrkX TrackML Pipeline

Physis: An Implicitly Parallel Programming Model for Stencil Computations on Large-Scale GPU-Accelerated Supercomputers

PhysProver: Advancing Automatic Theorem Proving for Physics

Piccolo: building fast, distributed programs with partitioned tables

PIConGPU: A Fully Relativistic Particle-in-Cell Code for a GPU Cluster

PIConGPU: Predictive Simulations of Laser-Particle Accelerators with Manycore Hardware

Piecewise Tri-linear Contouring for Multi-material Volumes

PIGEON: Optimizing CUDA Code Generator for End-to-End Training and Inference of Relational Graph Neural Networks

Piko: A Design Framework for Programmable Graphics Pipelines

PILC: Practical Image Lossless Compression with an End-to-end GPU Oriented Neural Framework

PipeCNN: An OpenCL-Based FPGA Accelerator for Large-Scale Convolution Neuron Networks

Pipeline strategies to accelerate range query processing on a multi-GPU environment

Pipelined Iterative Solvers with Kernel Fusion for Graphics Processing Units

Pipelined MapReduce: A Decoupled MapReduce RunTime for Shared Memory Multi-Processors

Pipelined Training with Stale Weights of Deep Convolutional Neural Networks

Pipelining the Fast Multipole Method over a Runtime System

PIPS Is not (just) Polyhedral Software

PISTON: A Portable Cross-Platform Framework for Data-Parallel Visualization Operators

Pixel-Exact Rendering of Spacetime Finite Element Solutions

PixelPie: Maximal Poisson-disk Sampling with Rasterization

Places205-VGGNet Models for Scene Recognition

Planetary-Scale Terrain Composition

Plant Leaf Modeling and Rendering Based-On GPU

Plasma Visualization in Parallel using Particle Systems on Graphical Processing Units

Platform 2012, a Many-Core Computing Accelerator for Embedded SoCs: Performance Evaluation of Visual Analytics Applications

Platform Characterization for Domain-Specific Computing

Platform-independent parallelization of the Lattice Boltzmann method with OpenCL

Platform-Specific Optimization and Mapping of Stencil Codes through Refinement

Playdoh: A lightweight Python library for distributed computing and optimisation

PLB-HeC: A Profile-based Load-Balancing algorithm for Heterogeneous CPU-GPU Clusters

Plenoptic Rendering With Interactive Performance Using GPUs

PlinkGPU: A Framework for GPU Acceleration of Whole Genome Data Analysis

PM4Py-GPU: a High-Performance General-Purpose Library for Process Mining

PMT: Power Measurement Toolkit

PNG1 triangles for tangent plane continuous surfaces on the GPU

PoCL-R: A Scalable Low Latency Distributed OpenCL Runtime

PoCL-R: An Open Standard Based Offloading Layer for Heterogeneous Multi-Access Edge Computing with Server Side Scalability

pocl: A Performance-Portable OpenCL Implementation

Point Based Approximate Color Bleeding With Cuda

Point Based Color Bleeding with CUDA and Caching

Point Rendering in CUDA Path Tracer

Point Spread Function Estimation of Solar Surface Images with a Cooperative Particle Swarm Optimization on GPUs

Point to Line Mappings and Other Line Parameterizations not only for Hough Transform

Point to point processing of digital images using parallel computing

Point-wise Adaptive Filtering for Fast Monte Carlo Noise Reduction

Pointer Analysis for Semi-Automatic Code Parallelizers

Poisson-Boltzmann model for protein-surface electrostatic interactions and grid-convergence study using the PyGBe code

Policy-based Tuning for Performance Portability and Library Co-optimization

Polly – Polyhedral optimization in LLVM

Polly-ACC: Transparent compilation to heterogeneous hardware

Polyconvexification of the multi-label optical flow problem

Polymer Field-Theory Simulations on Graphics Processing Units

POMPEI: Programming with OpenMP4 for Exascale Investigations

PONDER – A Real time software backend for pulsar and IPS observations at the Ooty Radio Telescope

PopSparse: Accelerated block sparse matrix multiplication on IPU

Population Parallel GP on the G80 GPU

Porous Rock Simulations and Lattice Boltzmann on GPUs

Portability and Scalability of OpenMP Offloading on State-of-the-art Accelerators

Portability of Fortran’s ‘do concurrent’ on GPUs

Portable and Performant GPU/Heterogeneous Asynchronous Many-Task Runtime System

Portable and Transparent Software Managed Scheduling on Accelerators for Fair Resource Sharing

Portable C++ Code that can Look and Feel Like Fortran Code with Yet Another Kernel Launcher (YAKL)

Portable GPU-Based Artificial Neural Networks for Accelerated Data-Driven Modeling

Portable high-order finite element kernels I: Streaming Operations

Portable HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi

Portable Mapping of Data Parallel Programs to OpenCL for Heterogeneous Systems

Portable OpenCL Out-of-Order Execution Framework for Heterogeneous Platforms

Portable Parallel Kernels for High-Speed Beamforming in Synthetic Aperture Ultrasound Imaging

Portable parallelized blowfish via RenderScript

Portable Performance on Heterogeneous Architectures

Portable Programming Models for Heterogeneous Platforms

Portable Real-Time DCT Based Steganography Using OpenCL

Portable, high-performance containers for HPC

Portable, Scalable Approaches for Improving Asynchronous Many-Task Runtime Node Use

Portage: Bringing Hackers’ Wisdom to Science

Porting a high-order finite-element earthquake modeling application to NVIDIA graphics cards using CUDA

Porting a sparse linear algebra math library to Intel GPUs

Porting and optimizing MAGFLOW on CUDA

Porting Batched Iterative Solvers onto Intel GPUs with SYCL

Porting estimation of distribution algorithms to the cell broadband engine

Porting FEASTFLOW to the Intel Xeon Phi: Lessons Learned

Porting HPC Applications to AMD Instinct MI300A Using Unified Memory and OpenMP

Porting Large HPC Applications to GPU Clusters: The Codes GENE and VERTEX

Porting marine ecosystem model spin-up using transport matrices to GPUs

Porting numerical integration codes from CUDA to oneAPI: a case study

Porting of an Edge-Based CFD Solver to GPUs

Porting OpenACC to OpenMP on heterogeneous systems

Porting to the Intel Xeon Phi: Opportunities and Challenges

Porting tree-based hash table compression to GPGPU model checking

Poseidon: A System Architecture for Efficient GPU-based Deep Learning on Multiple Machines

Position-Dependent Arrays and Their Application for High Performance Code Generation

Possible planet-forming regions on submillimetre images

Poster: CUDA-Accelerated Continuous 2D Scatterplots

Poster: GPU-accelerated artificial neural network for QSAR modeling

Poster: GPU-accelerated rigid body fitting of atomic structures into electron density maps

Potential contribution of CNN-based solving of stiff ODEs and PDEs to enabling real-time Computational Engineering

Potential Energy Landscapes for the 2D XY Model: Minima, Transition States and Pathways

Potential of General Purpose Graphic Processing Unit for Energy Management System

Titles: 100
open PDFs: 96
packages: 29
