Papers on hgpu.org (.txt-file)
Predicting NVIDIA’s Next-Day Stock Price: A Comparative Analysis of LSTM, MLP, ARIMA, and ARIMA-GARCH Models

Predicting the Execution Time of a kernel on a specific GPU using PTX code

Prediction of Performance and Power Consumption of GPGPU Applications

Predictive Data Race Detection for GPUs

Predictive Lazy Amplification: Synthesis and Rendering of Massive Procedural Scenes in Real Time

Predictive Modeling and Analysis of OP2 on Distributed Memory GPU Clusters

Predictive Runtime Code Scheduling for Heterogeneous Architectures

Preemptive Thread Block Scheduling with Online Structural Runtime Prediction for Concurrent GPGPU Kernels

Preliminary Experiences with the Uintah Framework on Intel Xeon Phi and Stampede

Preliminary Experiments with XKaapi on Intel Xeon Phi Coprocessor

Preliminary implementation of two parallel programs for fractal image coding on GPUs
Preliminary implementation of VQ image coding using GPGPU
Preliminary report: Initial evaluation of StdPar implementations on AMD GPUs for HPC

Preliminary results of autotuning GEMM kernels for the NVIDIA Kepler architecture-GeForce GTX 680

Preparing Ginkgo for AMD GPUs – A Testimonial on Porting CUDA Code to HIP

Pretty Good Accuracy in Matrix Multiplication with GPUs

Pricing composable contracts on the GP-GPU

Pricing of cross-currency interest rate derivatives on Graphics Processing Units

Pricing the American Option Using Reconfigurable Hardware

Primal Dual Affine Scaling on GPUs

Principal Kernel Analysis: A Tractable Methodology to Simulate Scaled GPU Workloads

Principles for Automated and Reproducible Benchmarking

Principles towards Real-Time Simulation of Material Point Method on Modern GPUs

Principles, Techniques, and Tools for Explicit and Automatic Parallelization

Priority-Based Task Management in a GPGPU Megakernel

PRISM-PSY: Precise GPU-Accelerated Parameter Synthesis for Stochastic Systems

Prius: A Runtime for Hybrid Computing

Probabilistic View-based 3D Curve Skeleton Computation on the GPU

Probing biomolecular machines with graphics processors

Probing the Statistical Validity of the Ductile-to-Brittle Transition in Metallic Nanowires Using GPU Computing

Process Time Comparison between GPU and CPU

Processing Big Data in Main Memory and on GPU

Processing data streams with hard real-time constraints on heterogeneous systems

Processing Hard Sphere Collisions on a GPU Using OpenCL

Processing Large-scale XML Files on GPGPU Cluster

Processing Markov Logic Networks with GPUs

Processing MPI Derived Datatypes on Noncontiguous GPU-Resident Data

Processing Neocognitron of Face Recognition on High Performance Environment Based on GPU with CUDA Architecture

Processing of Synthetic Aperture Radar data with GPGPU
Processing OLTP Workloads on Hybrid CPU/GPU Systems

Processing Posting Lists Using OpenCL

Processing XPath Structural Constraints on GPU

Production Floating Point Applications on FPGAs

Production Level CFD Code Acceleration for Hybrid Many-Core Architectures

Productive and Efficient Computational Science Through Domain-specific Abstractions

Productive High Performance Parallel Programming with Auto-tuned Domain-Specific Embedded Languages

Productive Performance Engineering for Weather and Climate Modeling with Python

Productivity, Portability, Performance: Data-Centric Python

Professional CUDA C Programming

Profile Util library: A quick and easy way to get MPI, OpenMP and GPU runtime information

Profile-guided optimization of critical medical imaging algorithms

Profiling Apple Silicon Performance for ML Training

Profiling based Out-of-core Hybrid Method for Large Neural Networks

Profiling Concurrent Vision Inference Workloads on NVIDIA Jetson – Extended

Profiling General Purpose GPU Applications

Profiling Heterogeneous Multi-GPU Systems to Accelerate Cortically Inspired Learning Algorithms

Profiling High Level Heterogeneous Programs: Using the SPOC GPGPU framework for OCaml

Profiling of Data-Parallel Processors

Program Acceleration in a Heterogeneous Computing Environment Using OpenCL, FPGA, and CPU

Program Analysis and Machine Learning based Approach to Predict Power Consumption of CUDA Kernel

Program optimization carving for GPU computing
Program Optimization of Array-Intensive SPEC2k Benchmarks on Multithreaded GPU Using CUDA and Brook+
Program Optimization of Stencil Based Application on the GPU-Accelerated System
Program optimization space pruning for a multithreaded gpu

Program Optimization Strategies for Data-Parallel Many-Core Processors

Program Optimization Study on a 128-Core GPU

PROGRAML: A Graph-based Program Representation for Data Flow Analysis and Compiler Optimizations

ProGraML: Graph-based Deep Learning for Program Optimization and Analysis

Programmability and Performance Portability Aspects of Heterogeneous Multi-/Manycore Systems

Programmability: Design Costs and Payoffs using AMD GPU Streaming Languages and Traditional Multi-Core Libraries

Programmable and Scalable Architecture for Graphics Processing Units

Programmable shaders for deformation rendering

Programming Abstractions and Optimization Techniques for GPU-based Heterogeneous Systems

Programming and Performance of Graphics Processors in Shock Waves Simulation by Finite Volume Method

Programming and Scheduling Model for Supporting Heterogeneous Accelerators in Linux

Programming CUDA and OpenCL: A Case Study Using Modern C++ Libraries

Programming Dense Linear Algebra Kernels on Vectorized Architectures

Programming Embedded Manycore: Refinement and Optimizing Compilation of a Parallel Action Language for Hierarchical State Machines

Programming finite-difference time-domain for graphics processor units using compute unified device architecture

Programming for scientific computing on peta-scale heterogeneous parallel systems

Programming framework for clusters with heterogeneous accelerators

Programming Frameworks for Distributed Smartphone Computing

Programming Future Parallel Architectures with Haskell and Intel ArBB

Programming GPUs with C++14 and Just-In-Time Compilation

Programming Heterogeneous Systems from an Image Processing DSL

Programming Heterogeneous Systems with General and Domain-Specific Frameworks

Programming hybrid systems with implicit memory based synchronization

Programming in CUDA for Kepler and Maxwell Architecture

Programming issues for video analysis on Graphics Processing Units

Programming Massively Parallel Architectures using MARTE: a Case Study

Programming massively parallel processors: A Hands-on approach
Programming Massively Parallel Processors with CUDA (audio course)

Programming model for a heterogeneous x86 platform
Programming Models and Runtimes for Heterogeneous Systems

Programming Models and Scheduling Techniques for Heterogeneous Architectures

Programming Models and Tools for Many-Core Platforms

Titles: 100
open PDFs: 91
packages: 19
