Papers on hgpu.org (.txt-file)
Pipeline strategies to accelerate range query processing on a multi-GPU environment
Pipelined Iterative Solvers with Kernel Fusion for Graphics Processing Units
Pipelined MapReduce: A Decoupled MapReduce RunTime for Shared Memory Multi-Processors
Pipelined Training with Stale Weights of Deep Convolutional Neural Networks
Pipelining the Fast Multipole Method over a Runtime System
PIPS Is not (just) Polyhedral Software
PISTON: A Portable Cross-Platform Framework for Data-Parallel Visualization Operators
Pixel-Exact Rendering of Spacetime Finite Element Solutions
PixelPie: Maximal Poisson-disk Sampling with Rasterization
Places205-VGGNet Models for Scene Recognition
Planetary-Scale Terrain Composition
Plant Leaf Modeling and Rendering Based-On GPU
Plasma Visualization in Parallel using Particle Systems on Graphical Processing Units
Platform 2012, a Many-Core Computing Accelerator for Embedded SoCs: Performance Evaluation of Visual Analytics Applications
Platform Characterization for Domain-Specific Computing
Platform-independent parallelization of the Lattice Boltzmann method with OpenCL
Platform-Specific Optimization and Mapping of Stencil Codes through Refinement
Playdoh: A lightweight Python library for distributed computing and optimisation
PLB-HeC: A Profile-based Load-Balancing algorithm for Heterogeneous CPU-GPU Clusters
Plenoptic Rendering With Interactive Performance Using GPUs
PlinkGPU: A Framework for GPU Acceleration of Whole Genome Data Analysis
PM4Py-GPU: a High-Performance General-Purpose Library for Process Mining
PMT: Power Measurement Toolkit
PNG1 triangles for tangent plane continuous surfaces on the GPU
PoCL-R: A Scalable Low Latency Distributed OpenCL Runtime
PoCL-R: An Open Standard Based Offloading Layer for Heterogeneous Multi-Access Edge Computing with Server Side Scalability
pocl: A Performance-Portable OpenCL Implementation
Point Based Approximate Color Bleeding With Cuda
Point Based Color Bleeding with CUDA and Caching
Point Rendering in CUDA Path Tracer
Point Spread Function Estimation of Solar Surface Images with a Cooperative Particle Swarm Optimization on GPUs
Point to Line Mappings and Other Line Parameterizations not only for Hough Transform
Point to point processing of digital images using parallel computing
Point-wise Adaptive Filtering for Fast Monte Carlo Noise Reduction
Pointer Analysis for Semi-Automatic Code Parallelizers
Poisson-Boltzmann model for protein-surface electrostatic interactions and grid-convergence study using the PyGBe code
Policy-based Tuning for Performance Portability and Library Co-optimization
Polly – Polyhedral optimization in LLVM
Polly-ACC: Transparent compilation to heterogeneous hardware
Polyconvexification of the multi-label optical flow problem
Polymer Field-Theory Simulations on Graphics Processing Units
POMPEI: Programming with OpenMP4 for Exascale Investigations
PONDER – A Real time software backend for pulsar and IPS observations at the Ooty Radio Telescope
PopSparse: Accelerated block sparse matrix multiplication on IPU
Population Parallel GP on the G80 GPU
Porous Rock Simulations and Lattice Boltzmann on GPUs
Portability and Scalability of OpenMP Offloading on State-of-the-art Accelerators
Portability of Fortran’s ‘do concurrent’ on GPUs
Portable and Performant GPU/Heterogeneous Asynchronous Many-Task Runtime System
Portable and Transparent Software Managed Scheduling on Accelerators for Fair Resource Sharing
Portable C++ Code that can Look and Feel Like Fortran Code with Yet Another Kernel Launcher (YAKL)
Portable GPU-Based Artificial Neural Networks for Accelerated Data-Driven Modeling
Portable high-order finite element kernels I: Streaming Operations
Portable HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi
Portable Mapping of Data Parallel Programs to OpenCL for Heterogeneous Systems
Portable OpenCL Out-of-Order Execution Framework for Heterogeneous Platforms
Portable Parallel Kernels for High-Speed Beamforming in Synthetic Aperture Ultrasound Imaging
Portable parallelized blowfish via RenderScript
Portable Performance on Heterogeneous Architectures
Portable Programming Models for Heterogeneous Platforms
Portable Real-Time DCT Based Steganography Using OpenCL
Portable, high-performance containers for HPC
Portable, Scalable Approaches for Improving Asynchronous Many-Task Runtime Node Use
Portage: Bringing Hackers’ Wisdom to Science
Porting a high-order finite-element earthquake modeling application to NVIDIA graphics cards using CUDA
Porting a sparse linear algebra math library to Intel GPUs
Porting and optimizing MAGFLOW on CUDA
Porting Batched Iterative Solvers onto Intel GPUs with SYCL
Porting estimation of distribution algorithms to the cell broadband engine
Porting FEASTFLOW to the Intel Xeon Phi: Lessons Learned
Porting HPC Applications to AMD Instinct MI300A Using Unified Memory and OpenMP
Porting Large HPC Applications to GPU Clusters: The Codes GENE and VERTEX
Porting marine ecosystem model spin-up using transport matrices to GPUs
Porting numerical integration codes from CUDA to oneAPI: a case study
Porting of an Edge-Based CFD Solver to GPUs
Porting OpenACC to OpenMP on heterogeneous systems
Porting to the Intel Xeon Phi: Opportunities and Challenges
Porting tree-based hash table compression to GPGPU model checking
Poseidon: A System Architecture for Efficient GPU-based Deep Learning on Multiple Machines
Position-Dependent Arrays and Their Application for High Performance Code Generation
Possible planet-forming regions on submillimetre images
Poster: CUDA-Accelerated Continuous 2D Scatterplots
Poster: GPU-accelerated artificial neural network for QSAR modeling
Poster: GPU-accelerated rigid body fitting of atomic structures into electron density maps
Potential contribution of CNN-based solving of stiff ODEs and PDEs to enabling real-time Computational Engineering
Potential Energy Landscapes for the 2D XY Model: Minima, Transition States and Pathways
Potential of General Purpose Graphic Processing Unit for Energy Management System
Power analysis and optimizations for GPU architecture using a power simulator
Power analysis of sorting algorithms on FPGA using OpenCL
Power and Performance Analysis of GPU-Accelerated Systems
Power and Performance Characterization of Computational Kernels on the GPU
Power and Performance Studies of the Explicit Multi-Threading (XMT) Architecture
Power Consumption Modeling and Prediction in a Hybrid CPU-GPU-MIC Supercomputer
Power Consumption of GPUs from a Software Perspective
Power consumption of mixed precision in the iterative solution of sparse linear systems
Power Control for GPU Clusters in processing large-scale streams
Power Flow Analysis on CUDA-based GPU
Titles: 100
open PDFs: 94
packages: 27