Papers on hgpu.org (.txt-file)
Processing of Synthetic Aperture Radar data with GPGPU
Processing OLTP Workloads on Hybrid CPU/GPU Systems

Processing Posting Lists Using OpenCL

Processing XPath Structural Constraints on GPU

Production Floating Point Applications on FPGAs

Production Level CFD Code Acceleration for Hybrid Many-Core Architectures

Productive and Efficient Computational Science Through Domain-specific Abstractions

Productive High Performance Parallel Programming with Auto-tuned Domain-Specific Embedded Languages

Productive Performance Engineering for Weather and Climate Modeling with Python

Productivity, Portability, Performance: Data-Centric Python

Professional CUDA C Programming

Profile Util library: A quick and easy way to get MPI, OpenMP and GPU runtime information

Profile-guided optimization of critical medical imaging algorithms

Profiling Apple Silicon Performance for ML Training

Profiling based Out-of-core Hybrid Method for Large Neural Networks

Profiling Concurrent Vision Inference Workloads on NVIDIA Jetson – Extended

Profiling General Purpose GPU Applications

Profiling Heterogeneous Multi-GPU Systems to Accelerate Cortically Inspired Learning Algorithms

Profiling High Level Heterogeneous Programs: Using the SPOC GPGPU framework for OCaml

Profiling of Data-Parallel Processors

ProfInfer: An eBPF-based Fine-Grained LLM Inference Profiler

Program Acceleration in a Heterogeneous Computing Environment Using OpenCL, FPGA, and CPU

Program Analysis and Machine Learning based Approach to Predict Power Consumption of CUDA Kernel

Program optimization carving for GPU computing
Program Optimization of Array-Intensive SPEC2k Benchmarks on Multithreaded GPU Using CUDA and Brook+
Program Optimization of Stencil Based Application on the GPU-Accelerated System
Program optimization space pruning for a multithreaded GPU

Program Optimization Strategies for Data-Parallel Many-Core Processors

Program Optimization Study on a 128-Core GPU

PROGRAML: A Graph-based Program Representation for Data Flow Analysis and Compiler Optimizations

ProGraML: Graph-based Deep Learning for Program Optimization and Analysis

Programmability and Performance Portability Aspects of Heterogeneous Multi-/Manycore Systems

Programmability: Design Costs and Payoffs using AMD GPU Streaming Languages and Traditional Multi-Core Libraries

Programmable and Scalable Architecture for Graphics Processing Units

Programmable shaders for deformation rendering

Programming Abstractions and Optimization Techniques for GPU-based Heterogeneous Systems

Programming and Performance of Graphics Processors in Shock Waves Simulation by Finite Volume Method

Programming and Scheduling Model for Supporting Heterogeneous Accelerators in Linux

Programming CUDA and OpenCL: A Case Study Using Modern C++ Libraries

Programming Dense Linear Algebra Kernels on Vectorized Architectures

Programming Embedded Manycore: Refinement and Optimizing Compilation of a Parallel Action Language for Hierarchical State Machines

Programming finite-difference time-domain for graphics processor units using compute unified device architecture

Programming for scientific computing on peta-scale heterogeneous parallel systems

Programming framework for clusters with heterogeneous accelerators

Programming Frameworks for Distributed Smartphone Computing

Programming Future Parallel Architectures with Haskell and Intel ArBB

Programming GPUs with C++14 and Just-In-Time Compilation

Programming Heterogeneous Systems from an Image Processing DSL

Programming Heterogeneous Systems with General and Domain-Specific Frameworks

Programming hybrid systems with implicit memory based synchronization

Programming in CUDA for Kepler and Maxwell Architecture

Programming issues for video analysis on Graphics Processing Units

Programming Massively Parallel Architectures using MARTE: a Case Study

Programming massively parallel processors: A Hands-on approach
Programming Massively Parallel Processors with CUDA (audio course)

Programming model for a heterogeneous x86 platform
Programming Models and Runtimes for Heterogeneous Systems

Programming Models and Scheduling Techniques for Heterogeneous Architectures

Programming Models and Tools for Many-Core Platforms

Programming NVIDIA cards by means of transitive closure based parallelization algorithms

Programming of shared memory GPUs shared memory systems

Programming on Parallel Machines: GPU, Multicore, Clusters and More

Programming video cards for computational electromagnetics applications
Programming with Explicit Dependencies. A Framework for Portable Parallel Programming

Programming-Model Centric Debugging for Multicore Embedded Systems

Progressive Clustering of Big Data with GPU Acceleration and Visualization

Progressive High-Quality Response Surfaces for Visually Guided Sensitivity Analysis

Progressive Photon Mapping on GPUs

Progressive Semantic Segmentation

Projected tetrahedra revisited: a barycentric formulation applied to digital radiograph reconstruction using higher-order attenuation functions

Projectile Monte-Carlo Trajectory Analysis Using a Graphics Processing Unit

Projecting Tetrahedra with a Simplified Basis Graph
PROJECTION Algorithm for Motif Finding on GPUs

Promise of embedded system with GPU in artificial leg control: Enabling time-frequency feature extraction from electromyography

ProofWright: Towards Agentic Formal Verification of CUDA

Proposition for propagated occupation grids for non-rigid moving objects tracking

Prospects for scalable 3D FFTs on heterogeneous exascale systems

Prospects of GPGPU in the Auger Offline Software Framework

pROST: A Smoothed Lp-norm Robust Online Subspace Tracking Method for Realtime Background Subtraction in Video

PROST: Parallel robust online simple tracking

Protecting Real-Time GPU Applications on Integrated CPU-GPU SoC Platforms

Protein alignment algorithms with an efficient backtracking routine on multiple GPUs

Proteus: Efficient Resource Use in Heterogeneous Architectures

Proteus: Exploiting Numerical Precision Variability in Deep Neural Networks

Prototyping methodology of image processing applications on heterogeneous parallel systems

Provably Efficient GPU Algorithms

Providing performance portable numerics for Intel GPUs

Providing Source Code Level Portability Between CPU and GPU with MapCG

PSCToolkit: solving sparse linear systems with a large number of GPUs

Pseudo Random Number Generators on Graphics Processing Units, with Applications in Finance

Pseudo-random number generation for Brownian Dynamics and Dissipative Particle Dynamics simulations on GPU devices

Pseudo-Random Number Generation on GP-GPU
Pseudo-random number generators for Monte Carlo simulations on ATI Graphics Processing Units
Pseudo-random number generators for Monte Carlo simulations on Graphics Processing Units

Pseudorandom number generation on the GPU

Pseudorandom Numbers Generation for Monte Carlo Simulations on GPUs: OpenCL Approach

Titles: 100
open PDFs: 87
packages: 20
