Papers on hgpu.org (.txt-file)

Practical parallel imaging compressed sensing MRI: Summary of two years of experience in accelerating body MRI of pediatric patients

Practical Patient-Specific Cardiac Blood Flow Simulations Using SPH

Practical Pre-stack Kirchhoff Time Migration of Seismic Processing on General Purpose GPU

Practical Random Linear Network Coding on GPUs

Practical Symbolic Execution Analysis and Methodology for GPU Programs

Practical Symbolic Race Checking of GPU Programs

Practical Symmetric Key Cryptography on Modern Graphics Hardware

Practically efficient methods for performing bit-reversed permutation in C++11 on the x86-64 architecture

Pragma Directed Shared Memory Centric Optimizations on GPUs

PRAGMA: A Profiling-Reasoned Multi-Agent Framework for Automatic Kernel Optimization

PRAND: GPU accelerated parallel random number generation library: Using most reliable algorithms and applying parallelism of modern GPUs and CPUs

Pre-Training LLMs on a budget: A comparison of three optimizers

Precise dynamic analysis for slack elasticity: adding buffering without adding bugs

Precise Energy Consumption Measurements of Heterogeneous Artificial Intelligence Workloads

Precision and Performance Analysis of C Standard Math Library Functions on GPUs

Precision and Performance: Floating Point and IEEE 754 Compliance for NVIDIA GPUs

Precision-Aware Soft Error Protection for GPUs

Precomputed Atmospheric Scattering

Precomputed compressive sensing for light transport acquisition

Precomputed Visibility Cuts for Interactive Relighting with Dynamic BRDFs

Preconditioned conjugate gradient solver for structural problems

Predictable GPGPU Computing in DNN-Driven Autonomous Systems

Predicting GPUDirect Benefits for HPC Workloads

Predicting NVIDIA’s Next-Day Stock Price: A Comparative Analysis of LSTM, MLP, ARIMA, and ARIMA-GARCH Models

Predicting the Execution Time of a kernel on a specific GPU using PTX code

Prediction of Performance and Power Consumption of GPGPU Applications

Predictive Data Race Detection for GPUs

Predictive Lazy Amplification: Synthesis and Rendering of Massive Procedural Scenes in Real Time

Predictive Modeling and Analysis of OP2 on Distributed Memory GPU Clusters

Predictive Runtime Code Scheduling for Heterogeneous Architectures

Preemptive Thread Block Scheduling with Online Structural Runtime Prediction for Concurrent GPGPU Kernels

Preliminary Experiences with the Uintah Framework on Intel Xeon Phi and Stampede

Preliminary Experiments with XKaapi on Intel Xeon Phi Coprocessor

Preliminary implementation of two parallel programs for fractal image coding on GPUs

Preliminary implementation of VQ image coding using GPGPU

Preliminary report: Initial evaluation of StdPar implementations on AMD GPUs for HPC

Preliminary results of autotuning GEMM kernels for the NVIDIA Kepler architecture-GeForce GTX 680

Preparing Ginkgo for AMD GPUs – A Testimonial on Porting CUDA Code to HIP

Pretty Good Accuracy in Matrix Multiplication with GPUs

Pricing composable contracts on the GP-GPU

Pricing of cross-currency interest rate derivatives on Graphics Processing Units

Pricing the American Option Using Reconfigurable Hardware

Primal Dual Affine Scaling on GPUs

Principal Kernel Analysis: A Tractable Methodology to Simulate Scaled GPU Workloads

Principles for Automated and Reproducible Benchmarking

Principles towards Real-Time Simulation of Material Point Method on Modern GPUs

Principles, Techniques, and Tools for Explicit and Automatic Parallelization

Priority-Based Task Management in a GPGPU Megakernel

PRISM-PSY: Precise GPU-Accelerated Parameter Synthesis for Stochastic Systems

Prius: A Runtime for Hybrid Computing

Probabilistic View-based 3D Curve Skeleton Computation on the GPU

Probing biomolecular machines with graphics processors

Probing the Statistical Validity of the Ductile-to-Brittle Transition in Metallic Nanowires Using GPU Computing

Process Time Comparison between GPU and CPU

Processing Big Data in Main Memory and on GPU

Processing data streams with hard real-time constraints on heterogeneous systems

Processing Hard Sphere Collisions on a GPU Using OpenCL

Processing Large-scale XML Files on GPGPU Cluster

Processing Markov Logic Networks with GPUs

Processing MPI Derived Datatypes on Noncontiguous GPU-Resident Data

Processing Neocognitron of Face Recognition on High Performance Environment Based on GPU with CUDA Architecture

Processing of synthetic Aperture Radar data with GPGPU

Processing OLTP Workloads on Hybrid CPU/GPU Systems

Processing Posting Lists Using OpenCL

Processing XPath Structural Constraints on GPU

Production Floating Point Applications on FPGAs

Production Level CFD Code Acceleration for Hybrid Many-Core Architectures

Productive and Efficient Computational Science Through Domain-specific Abstractions

Productive High Performance Parallel Programming with Auto-tuned Domain-Specific Embedded Languages

Productive Performance Engineering for Weather and Climate Modeling with Python

Productivity, Portability, Performance: Data-Centric Python

Professional CUDA C Programming

Profile Util library: A quick and easy way to get MPI, OpenMP and GPU runtime information

Profile-guided optimization of critical medical imaging algorithms

Profiling Apple Silicon Performance for ML Training

Profiling based Out-of-core Hybrid Method for Large Neural Networks

Profiling Concurrent Vision Inference Workloads on NVIDIA Jetson – Extended

Profiling General Purpose GPU Applications

Profiling Heterogeneous Multi-GPU Systems to Accelerate Cortically Inspired Learning Algorithms

Profiling High Level Heterogeneous Programs: Using the SPOC GPGPU framework for OCaml

Profiling of Data-Parallel Processors

Program Acceleration in a Heterogeneous Computing Environment Using OpenCL, FPGA, and CPU

Program Analysis and Machine Learning based Approach to Predict Power Consumption of CUDA Kernel

Program optimization carving for GPU computing

Program Optimization of Array-Intensive SPEC2k Benchmarks on Multithreaded GPU Using CUDA and Brook+

Program Optimization of Stencil Based Application on the GPU-Accelerated System

Program optimization space pruning for a multithreaded gpu

Program Optimization Strategies for Data-Parallel Many-Core Processors

Program Optimization Study on a 128-Core GPU

PROGRAML: A Graph-based Program Representation for Data Flow Analysis and Compiler Optimizations

ProGraML: Graph-based Deep Learning for Program Optimization and Analysis

Programmability and Performance Portability Aspects of Heterogeneous Multi-/Manycore Systems

Programmability: Design Costs and Payoffs using AMD GPU Streaming Languages and Traditional Multi-Core Libraries

Programmable and Scalable Architecture for Graphics Processing Units

Programmable shaders for deformation rendering

Programming Abstractions and Optimization Techniques for GPU-based Heterogeneous Systems

Programming and Performance of Graphics Processors in Shock Waves Simulation by Finite Volume Method

Programming and Scheduling Model for Supporting Heterogeneous Accelerators in Linux

Titles: 100
open PDFs: 93
packages: 21
