Papers on hgpu.org (.txt-file)
PRAND: GPU accelerated parallel random number generation library: Using most reliable algorithms and applying parallelism of modern GPUs and CPUs
Precise dynamic analysis for slack elasticity: adding buffering without adding bugs
Precise Energy Consumption Measurements of Heterogeneous Artificial Intelligence Workloads
Precision and Performance Analysis of C Standard Math Library Functions on GPUs
Precision and Performance: Floating Point and IEEE 754 Compliance for NVIDIA GPUs
Precision-Aware Soft Error Protection for GPUs
Precomputed Atmospheric Scattering
Precomputed compressive sensing for light transport acquisition
Precomputed Visibility Cuts for Interactive Relighting with Dynamic BRDFs
Preconditioned conjugate gradient solver for structural problems
Predictable GPGPU Computing in DNN-Driven Autonomous Systems
Predicting GPUDirect Benefits for HPC Workloads
Predicting NVIDIA’s Next-Day Stock Price: A Comparative Analysis of LSTM, MLP, ARIMA, and ARIMA-GARCH Models
Predicting the Execution Time of a kernel on a specific GPU using PTX code
Prediction of Performance and Power Consumption of GPGPU Applications
Predictive Data Race Detection for GPUs
Predictive Lazy Amplification: Synthesis and Rendering of Massive Procedural Scenes in Real Time
Predictive Modeling and Analysis of OP2 on Distributed Memory GPU Clusters
Predictive Runtime Code Scheduling for Heterogeneous Architectures
Preemptive Thread Block Scheduling with Online Structural Runtime Prediction for Concurrent GPGPU Kernels
Preliminary Experiences with the Uintah Framework on Intel Xeon Phi and Stampede
Preliminary Experiments with XKaapi on Intel Xeon Phi Coprocessor
Preliminary implementation of two parallel programs for fractal image coding on GPUs
Preliminary implementation of VQ image coding using GPGPU
Preliminary report: Initial evaluation of StdPar implementations on AMD GPUs for HPC
Preliminary results of autotuning GEMM kernels for the NVIDIA Kepler architecture-GeForce GTX 680
Preparing Ginkgo for AMD GPUs – A Testimonial on Porting CUDA Code to HIP
Pretty Good Accuracy in Matrix Multiplication with GPUs
Pricing composable contracts on the GP-GPU
Pricing of cross-currency interest rate derivatives on Graphics Processing Units
Pricing the American Option Using Reconfigurable Hardware
Primal Dual Affine Scaling on GPUs
Principal Kernel Analysis: A Tractable Methodology to Simulate Scaled GPU Workloads
Principles for Automated and Reproducible Benchmarking
Principles towards Real-Time Simulation of Material Point Method on Modern GPUs
Principles, Techniques, and Tools for Explicit and Automatic Parallelization
Priority-Based Task Management in a GPGPU Megakernel
PRISM-PSY: Precise GPU-Accelerated Parameter Synthesis for Stochastic Systems
Prius: A Runtime for Hybrid Computing
Probabilistic View-based 3D Curve Skeleton Computation on the GPU
Probing biomolecular machines with graphics processors
Probing the Statistical Validity of the Ductile-to-Brittle Transition in Metallic Nanowires Using GPU Computing
Process Time Comparison between GPU and CPU
Processing Big Data in Main Memory and on GPU
Processing data streams with hard real-time constraints on heterogeneous systems
Processing Hard Sphere Collisions on a GPU Using OpenCL
Processing Large-scale XML Files on GPGPU Cluster
Processing Markov Logic Networks with GPUs
Processing MPI Derived Datatypes on Noncontiguous GPU-Resident Data
Processing Neocognitron of Face Recognition on High Performance Environment Based on GPU with CUDA Architecture
Processing of Synthetic Aperture Radar data with GPGPU
Processing OLTP Workloads on Hybrid CPU/GPU Systems
Processing Posting Lists Using OpenCL
Processing XPath Structural Constraints on GPU
Production Floating Point Applications on FPGAs
Production Level CFD Code Acceleration for Hybrid Many-Core Architectures
Productive and Efficient Computational Science Through Domain-specific Abstractions
Productive High Performance Parallel Programming with Auto-tuned Domain-Specific Embedded Languages
Productive Performance Engineering for Weather and Climate Modeling with Python
Productivity, Portability, Performance: Data-Centric Python
Professional CUDA C Programming
Profile Util library: A quick and easy way to get MPI, OpenMP and GPU runtime information
Profile-guided optimization of critical medical imaging algorithms
Profiling based Out-of-core Hybrid Method for Large Neural Networks
Profiling General Purpose GPU Applications
Profiling Heterogeneous Multi-GPU Systems to Accelerate Cortically Inspired Learning Algorithms
Profiling High Level Heterogeneous Programs: Using the SPOC GPGPU framework for OCaml
Profiling of Data-Parallel Processors
Program Acceleration in a Heterogeneous Computing Environment Using OpenCL, FPGA, and CPU
Program Analysis and Machine Learning based Approach to Predict Power Consumption of CUDA Kernel
Program optimization carving for GPU computing
Program Optimization of Array-Intensive SPEC2k Benchmarks on Multithreaded GPU Using CUDA and Brook+
Program Optimization of Stencil Based Application on the GPU-Accelerated System
Program optimization space pruning for a multithreaded GPU
Program Optimization Strategies for Data-Parallel Many-Core Processors
Program Optimization Study on a 128-Core GPU
PROGRAML: A Graph-based Program Representation for Data Flow Analysis and Compiler Optimizations
ProGraML: Graph-based Deep Learning for Program Optimization and Analysis
Programmability and Performance Portability Aspects of Heterogeneous Multi-/Manycore Systems
Programmability: Design Costs and Payoffs using AMD GPU Streaming Languages and Traditional Multi-Core Libraries
Programmable and Scalable Architecture for Graphics Processing Units
Programmable shaders for deformation rendering
Programming Abstractions and Optimization Techniques for GPU-based Heterogeneous Systems
Programming and Performance of Graphics Processors in Shock Waves Simulation by Finite Volume Method
Programming and Scheduling Model for Supporting Heterogeneous Accelerators in Linux
Programming CUDA and OpenCL: A Case Study Using Modern C++ Libraries
Programming Dense Linear Algebra Kernels on Vectorized Architectures
Programming Embedded Manycore: Refinement and Optimizing Compilation of a Parallel Action Language for Hierarchical State Machines
Programming finite-difference time-domain for graphics processor units using compute unified device architecture
Programming for scientific computing on peta-scale heterogeneous parallel systems
Programming framework for clusters with heterogeneous accelerators
Programming Frameworks for Distributed Smartphone Computing
Programming Future Parallel Architectures with Haskell and Intel ArBB
Programming GPUs with C++14 and Just-In-Time Compilation
Programming Heterogeneous Systems from an Image Processing DSL
Programming Heterogeneous Systems with General and Domain-Specific Frameworks
Programming hybrid systems with implicit memory based synchronization
Titles: 100
open PDFs: 93
packages: 21