Papers on hgpu.org (.txt-file)
Performance Study of LU Decomposition on the Programmable GPU
Performance study of mapping irregular computations on GPUs
Performance Study of Satellite Image Processing on Graphics Processors Unit Using CUDA
Performance study of using the Direct Compute API for implementing Support vector machines on GPUs
Performance study on GPU offloading techniques using the Gauss matrix inverse algorithm
Performance Testing of GPU-Based Approximate Matching Algorithm on Network Traffic
Performance Tradeoff Spectrum of Integer and Floating Point Applications
Performance Tradeoff Spectrum of Integer and Floating Point Applications Kernels on Various GPUs
Performance Traps in OpenCL for CPUs
Performance Tuning for CUDA-Accelerated Neighborhood Denoising Filters
Performance Tuning for GPU-Embedded Systems: Machine-Learning-based and Analytical Model-driven Tuning Methodologies
Performance Upper Bound Analysis and Optimization of SGEMM on Fermi and Kepler GPUs
Performance-Analysis-Based Acceleration of Image Quality Assessment
Performance-aware component composition for GPU-based systems
Performance-Correctness Challenges in Emerging Heterogeneous Multicore Processors
Performance-efficient mechanisms for managing irregularity in throughput processors
Performance-Oriented Neural Architecture Search
Performance-Portable Many-Core Plasma Simulations: Porting PIConGPU to OpenPower and Beyond
Performance/power assessment of CNN packages on embedded automotive platforms
Performant low-order matrix-free finite element kernels on GPU architectures
Performing DCT8x8 Computation on GPU Using NVIDIA CUDA Technology
Performing efficient NURBS modeling operations on the GPU
PeriPy – A High Performance OpenCL Peridynamics Package
permGPU: Using graphics processing units in RNA microarray association studies
Permutation Index and GPU to Solve efficiently Many Queries
Persistent Kernels for Iterative Memory-bound GPU Applications
Persistent RNNs: Stashing Recurrent Weights On-Chip
Perturbation Functions in Computer Graphics
Petaflop biofluidics simulations on a two million-core system
Petascale Application of a Coupled CPU-GPU Algorithm for Simulation and Analysis of Multiphase Flow Solutions in Porous Medium Systems
Petascale computations for Large-scale Atomic and Molecular collisions
Petascale Direct Numerical Simulation of Blood Flow on 200K Cores and Heterogeneous Architectures
Petascale elliptic solvers for anisotropic PDEs on GPU clusters
Petascale turbulence simulation using a highly parallel fast multipole method
Petascale visualization: Approaches and initial results
PFAC Library: GPU-based string matching algorithm
PFunc: modern task parallelism for modern high performance computing
PG-PuReMD: A Parallel-GPU Reactive Molecular Dynamics Package
PGEM: Preemptive GPGPU Execution Model for Runtime Engines
Pgx: Hardware-accelerated parallel game simulation for reinforcement learning
Phase Based Volume Registration on the GPU with Application to Quantitative MRI
Phase Based Volume Registration Using CUDA
Phase diagram and critical behavior of the square-lattice Ising model with competing nearest- and next-nearest-neighbor interactions
Phase Transition in 3d Heisenberg Spin Glasses with Strong Random Anisotropies, through a Multi-GPU Parallelization
phiGEMM: a CPU-GPU library for porting Quantum ESPRESSO on hybrid systems
Phoenix: A Runtime Environment for High Performance Computing on Chip Multiprocessors
Photon mapping on programmable graphics hardware
Physical and graphical effects in OpenCL by example
Physical modeling and high-performance GPU computing for characterization, interception, and disruption of hazardous near-Earth objects
Physically Based Rendering: Implementation of Path Tracer
Physically-Based Interactive Flow Visualization Based on Schlieren and Interferometry Experimental Techniques
Physically-based interactive schlieren flow visualization
Physically-based painting style 3D image synthesis using GPU
Physically-Based Sound Synthesis on GPUs
Physically-based visual simulation on graphics hardware
Physics and Computing Performance of the Exa.TrkX TrackML Pipeline
Physis: An Implicitly Parallel Programming Model for Stencil Computations on Large-Scale GPU-Accelerated Supercomputers
Piccolo: building fast, distributed programs with partitioned tables
PIConGPU: A Fully Relativistic Particle-in-Cell Code for a GPU Cluster
PIConGPU: Predictive Simulations of Laser-Particle Accelerators with Manycore Hardware
Piecewise Tri-linear Contouring for Multi-material Volumes
PIGEON: Optimizing CUDA Code Generator for End-to-End Training and Inference of Relational Graph Neural Networks
Piko: A Design Framework for Programmable Graphics Pipelines
PILC: Practical Image Lossless Compression with an End-to-end GPU Oriented Neural Framework
PipeCNN: An OpenCL-Based FPGA Accelerator for Large-Scale Convolution Neuron Networks
Pipeline strategies to accelerate range query processing on a multi-GPU environment
Pipelined Iterative Solvers with Kernel Fusion for Graphics Processing Units
Pipelined MapReduce: A Decoupled MapReduce RunTime for Shared Memory Multi-Processors
Pipelined Training with Stale Weights of Deep Convolutional Neural Networks
Pipelining the Fast Multipole Method over a Runtime System
PIPS Is not (just) Polyhedral Software
PISTON: A Portable Cross-Platform Framework for Data-Parallel Visualization Operators
Pixel-Exact Rendering of Spacetime Finite Element Solutions
PixelPie: Maximal Poisson-disk Sampling with Rasterization
Places205-VGGNet Models for Scene Recognition
Planetary-Scale Terrain Composition
Plant Leaf Modeling and Rendering Based-On GPU
Plasma Visualization in Parallel using Particle Systems on Graphical Processing Units
Platform 2012, a Many-Core Computing Accelerator for Embedded SoCs: Performance Evaluation of Visual Analytics Applications
Platform Characterization for Domain-Specific Computing
Platform-independent parallelization of the Lattice Boltzmann method with OpenCL
Platform-Specific Optimization and Mapping of Stencil Codes through Refinement
Playdoh: A lightweight Python library for distributed computing and optimisation
PLB-HeC: A Profile-based Load-Balancing algorithm for Heterogeneous CPU-GPU Clusters
Plenoptic Rendering With Interactive Performance Using GPUs
PlinkGPU: A Framework for GPU Acceleration of Whole Genome Data Analysis
PM4Py-GPU: a High-Performance General-Purpose Library for Process Mining
PMT: Power Measurement Toolkit
PNG1 triangles for tangent plane continuous surfaces on the GPU
PoCL-R: A Scalable Low Latency Distributed OpenCL Runtime
PoCL-R: An Open Standard Based Offloading Layer for Heterogeneous Multi-Access Edge Computing with Server Side Scalability
pocl: A Performance-Portable OpenCL Implementation
Point Based Approximate Color Bleeding With Cuda
Point Based Color Bleeding with CUDA and Caching
Point Rendering in CUDA Path Tracer
Point Spread Function Estimation of Solar Surface Images with a Cooperative Particle Swarm Optimization on GPUs
Point to Line Mappings and Other Line Parameterizations not only for Hough Transform
Titles: 100
Open PDFs: 95
Packages: 26