Papers on hgpu.org
Power Management and Optimization
Power Management for GPU-CPU Heterogeneous Systems
Power Management Techniques for Data Centers: A Survey
Power Modeling and Optimization for GPGPUs
Power Profiling and Optimization for Heterogeneous Multi-Core Systems
Power Profiling of GeMTC Many Task Computing
Power-aware Performance of Mixed Precision Linear Solvers for FPGAs and GPGPUs
Power-Efficient Accelerators for High-Performance Applications
Power-efficient medical image processing using PUMA
Power-Efficient Time-Sensitive Mapping in Heterogeneous Systems
Power-Efficient Work Distribution Method for CPU-GPU Heterogeneous System
Power-performance comparison of single-task driven many-cores
Power, Energy and Speed of Embedded and Server Multi-Cores applied to Distributed Simulation of Spiking Neural Networks: ARM in NVIDIA Tegra vs Intel Xeon quad-cores
PPOpenCL: a performance-portable OpenCL compiler with host and kernel thread code fusion
Practical Algorithms for Finding Extremal Sets
Practical and Robust Stenciled Shadow Volumes for Hardware-Accelerated Rendering
Practical and Theoretical Aspects of a Parallel Twig Join Algorithm for XML Processing using a GPGPU
Practical CFD Simulations on Programmable Graphics Hardware using SMAC
Practical considerations for GPU-accelerated CT
Practical craniofacial surgery simulator based on GPU accelerated lattice shape matching
Practical examples of GPU computing optimization principles
Practical Implementation of Lattice QCD Simulation on Intel Xeon Phi Knights Landing
Practical logarithmic rasterization for low-error shadow maps
Practical parallel imaging compressed sensing MRI: Summary of two years of experience in accelerating body MRI of pediatric patients
Practical Patient-Specific Cardiac Blood Flow Simulations Using SPH
Practical Pre-stack Kirchhoff Time Migration of Seismic Processing on General Purpose GPU
Practical Random Linear Network Coding on GPUs
Practical Symbolic Execution Analysis and Methodology for GPU Programs
Practical Symbolic Race Checking of GPU Programs
Practical Symmetric Key Cryptography on Modern Graphics Hardware
Practically efficient methods for performing bit-reversed permutation in C++11 on the x86-64 architecture
Pragma Directed Shared Memory Centric Optimizations on GPUs
PRAND: GPU accelerated parallel random number generation library: Using most reliable algorithms and applying parallelism of modern GPUs and CPUs
Pre-Training LLMs on a budget: A comparison of three optimizers
Precise dynamic analysis for slack elasticity: adding buffering without adding bugs
Precise Energy Consumption Measurements of Heterogeneous Artificial Intelligence Workloads
Precision and Performance Analysis of C Standard Math Library Functions on GPUs
Precision and Performance: Floating Point and IEEE 754 Compliance for NVIDIA GPUs
Precision-Aware Soft Error Protection for GPUs
Precomputed Atmospheric Scattering
Precomputed compressive sensing for light transport acquisition
Precomputed Visibility Cuts for Interactive Relighting with Dynamic BRDFs
Preconditioned conjugate gradient solver for structural problems
Predictable GPGPU Computing in DNN-Driven Autonomous Systems
Predicting GPUDirect Benefits for HPC Workloads
Predicting NVIDIA’s Next-Day Stock Price: A Comparative Analysis of LSTM, MLP, ARIMA, and ARIMA-GARCH Models
Predicting the Execution Time of a kernel on a specific GPU using PTX code
Prediction of Performance and Power Consumption of GPGPU Applications
Predictive Data Race Detection for GPUs
Predictive Lazy Amplification: Synthesis and Rendering of Massive Procedural Scenes in Real Time
Predictive Modeling and Analysis of OP2 on Distributed Memory GPU Clusters
Predictive Runtime Code Scheduling for Heterogeneous Architectures
Preemptive Thread Block Scheduling with Online Structural Runtime Prediction for Concurrent GPGPU Kernels
Preliminary Experiences with the Uintah Framework on Intel Xeon Phi and Stampede
Preliminary Experiments with XKaapi on Intel Xeon Phi Coprocessor
Preliminary implementation of two parallel programs for fractal image coding on GPUs
Preliminary implementation of VQ image coding using GPGPU
Preliminary report: Initial evaluation of StdPar implementations on AMD GPUs for HPC
Preliminary results of autotuning GEMM kernels for the NVIDIA Kepler architecture-GeForce GTX 680
Preparing Ginkgo for AMD GPUs – A Testimonial on Porting CUDA Code to HIP
Pretty Good Accuracy in Matrix Multiplication with GPUs
Pricing composable contracts on the GP-GPU
Pricing of cross-currency interest rate derivatives on Graphics Processing Units
Pricing the American Option Using Reconfigurable Hardware
Primal Dual Affine Scaling on GPUs
Principal Kernel Analysis: A Tractable Methodology to Simulate Scaled GPU Workloads
Principles for Automated and Reproducible Benchmarking
Principles towards Real-Time Simulation of Material Point Method on Modern GPUs
Principles, Techniques, and Tools for Explicit and Automatic Parallelization
Priority-Based Task Management in a GPGPU Megakernel
PRISM-PSY: Precise GPU-Accelerated Parameter Synthesis for Stochastic Systems
Prius: A Runtime for Hybrid Computing
Probabilistic View-based 3D Curve Skeleton Computation on the GPU
Probing biomolecular machines with graphics processors
Probing the Statistical Validity of the Ductile-to-Brittle Transition in Metallic Nanowires Using GPU Computing
Process Time Comparison between GPU and CPU
Processing Big Data in Main Memory and on GPU
Processing data streams with hard real-time constraints on heterogeneous systems
Processing Hard Sphere Collisions on a GPU Using OpenCL
Processing Large-scale XML Files on GPGPU Cluster
Processing Markov Logic Networks with GPUs
Processing MPI Derived Datatypes on Noncontiguous GPU-Resident Data
Processing Neocognitron of Face Recognition on High Performance Environment Based on GPU with CUDA Architecture
Processing of synthetic Aperture Radar data with GPGPU
Processing OLTP Workloads on Hybrid CPU/GPU Systems
Processing Posting Lists Using OpenCL
Processing XPath Structural Constraints on GPU
Production Floating Point Applications on FPGAs
Production Level CFD Code Acceleration for Hybrid Many-Core Architectures
Productive and Efficient Computational Science Through Domain-specific Abstractions
Productive High Performance Parallel Programming with Auto-tuned Domain-Specific Embedded Languages
Productive Performance Engineering for Weather and Climate Modeling with Python
Productivity, Portability, Performance: Data-Centric Python
Professional CUDA C Programming
Profile Util library: A quick and easy way to get MPI, OpenMP and GPU runtime information
Profile-guided optimization of critical medical imaging algorithms
Profiling Apple Silicon Performance for ML Training
Titles: 100
open PDFs: 92
packages: 17