Papers on hgpu.org (.txt-file)
Power analysis and optimizations for GPU architecture using a power simulator
Power analysis of sorting algorithms on FPGA using OpenCL

Power and Performance Analysis of GPU-Accelerated Systems

Power and Performance Characterization of Computational Kernels on the GPU

Power and Performance Studies of the Explicit Multi-Threading (XMT) Architecture

Power Consumption Modeling and Prediction in a Hybrid CPU-GPU-MIC Supercomputer

Power Consumption of GPUs from a Software Perspective

Power consumption of mixed precision in the iterative solution of sparse linear systems

Power Control for GPU Clusters in processing large-scale streams

Power Flow Analysis on CUDA-based GPU

Power Management and Optimization

Power Management for GPU-CPU Heterogeneous Systems

Power Management Techniques for Data Centers: A Survey

Power Modeling and Optimization for GPGPUs

Power Profiling and Optimization for Heterogeneous Multi-Core Systems

Power Profiling of GeMTC Many Task Computing

Power-aware Performance of Mixed Precision Linear Solvers for FPGAs and GPGPUs

Power-Efficient Accelerators for High-Performance Applications

Power-efficient medical image processing using PUMA

Power-Efficient Time-Sensitive Mapping in Heterogeneous Systems

Power-Efficient Work Distribution Method for CPU-GPU Heterogeneous System
Power-performance comparison of single-task driven many-cores

Power, Energy and Speed of Embedded and Server Multi-Cores applied to Distributed Simulation of Spiking Neural Networks: ARM in NVIDIA Tegra vs Intel Xeon quad-cores

PPOpenCL: a performance-portable OpenCL compiler with host and kernel thread code fusion

Practical Algorithms for Finding Extremal Sets

Practical and Robust Stenciled Shadow Volumes for Hardware-Accelerated Rendering

Practical and Theoretical Aspects of a Parallel Twig Join Algorithm for XML Processing using a GPGPU

Practical CFD Simulations on Programmable Graphics Hardware using SMAC

Practical considerations for GPU-accelerated CT

Practical craniofacial surgery simulator based on GPU accelerated lattice shape matching
Practical examples of GPU computing optimization principles
Practical FP4 Training for Large-Scale MoE Models on Hopper GPUs

Practical Implementation of Lattice QCD Simulation on Intel Xeon Phi Knights Landing

Practical logarithmic rasterization for low-error shadow maps

Practical parallel imaging compressed sensing MRI: Summary of two years of experience in accelerating body MRI of pediatric patients

Practical Patient-Specific Cardiac Blood Flow Simulations Using SPH

Practical Pre-stack Kirchhoff Time Migration of Seismic Processing on General Purpose GPU

Practical Random Linear Network Coding on GPUs

Practical Symbolic Execution Analysis and Methodology for GPU Programs

Practical Symbolic Race Checking of GPU Programs

Practical Symmetric Key Cryptography on Modern Graphics Hardware

Practically efficient methods for performing bit-reversed permutation in C++11 on the x86-64 architecture

Pragma Directed Shared Memory Centric Optimizations on GPUs

PRAGMA: A Profiling-Reasoned Multi-Agent Framework for Automatic Kernel Optimization

PRAND: GPU accelerated parallel random number generation library: Using most reliable algorithms and applying parallelism of modern GPUs and CPUs

Pre-Training LLMs on a budget: A comparison of three optimizers

Precise dynamic analysis for slack elasticity: adding buffering without adding bugs

Precise Energy Consumption Measurements of Heterogeneous Artificial Intelligence Workloads

Precision and Performance Analysis of C Standard Math Library Functions on GPUs

Precision and Performance: Floating Point and IEEE 754 Compliance for NVIDIA GPUs

Precision-Aware Soft Error Protection for GPUs

Precomputed Atmospheric Scattering

Precomputed compressive sensing for light transport acquisition

Precomputed Visibility Cuts for Interactive Relighting with Dynamic BRDFs

Preconditioned conjugate gradient solver for structural problems

Predictable GPGPU Computing in DNN-Driven Autonomous Systems

Predicting GPUDirect Benefits for HPC Workloads

Predicting NVIDIA’s Next-Day Stock Price: A Comparative Analysis of LSTM, MLP, ARIMA, and ARIMA-GARCH Models

Predicting the Execution Time of a kernel on a specific GPU using PTX code

Prediction of Performance and Power Consumption of GPGPU Applications

Predictive Data Race Detection for GPUs

Predictive Lazy Amplification: Synthesis and Rendering of Massive Procedural Scenes in Real Time

Predictive Modeling and Analysis of OP2 on Distributed Memory GPU Clusters

Predictive Runtime Code Scheduling for Heterogeneous Architectures

Preemptive Thread Block Scheduling with Online Structural Runtime Prediction for Concurrent GPGPU Kernels

Preliminary Experiences with the Uintah Framework on Intel Xeon Phi and Stampede

Preliminary Experiments with XKaapi on Intel Xeon Phi Coprocessor

Preliminary implementation of two parallel programs for fractal image coding on GPUs
Preliminary implementation of VQ image coding using GPGPU
Preliminary report: Initial evaluation of StdPar implementations on AMD GPUs for HPC

Preliminary results of autotuning GEMM kernels for the NVIDIA Kepler architecture-GeForce GTX 680

Preparing Ginkgo for AMD GPUs – A Testimonial on Porting CUDA Code to HIP

Pretty Good Accuracy in Matrix Multiplication with GPUs

Pricing composable contracts on the GP-GPU

Pricing of cross-currency interest rate derivatives on Graphics Processing Units

Pricing the American Option Using Reconfigurable Hardware

Primal Dual Affine Scaling on GPUs

Principal Kernel Analysis: A Tractable Methodology to Simulate Scaled GPU Workloads

Principles for Automated and Reproducible Benchmarking

Principles towards Real-Time Simulation of Material Point Method on Modern GPUs

Principles, Techniques, and Tools for Explicit and Automatic Parallelization

Priority-Based Task Management in a GPGPU Megakernel

PRISM-PSY: Precise GPU-Accelerated Parameter Synthesis for Stochastic Systems

Prius: A Runtime for Hybrid Computing

Private LLM Inference on Consumer Blackwell GPUs: A Practical Guide for Cost-Effective Local Deployment in SMEs

Probabilistic View-based 3D Curve Skeleton Computation on the GPU

Probing biomolecular machines with graphics processors

Probing the Statistical Validity of the Ductile-to-Brittle Transition in Metallic Nanowires Using GPU Computing

Process Time Comparison between GPU and CPU

Processing Big Data in Main Memory and on GPU

Processing data streams with hard real-time constraints on heterogeneous systems

Processing Hard Sphere Collisions on a GPU Using OpenCL

Processing Large-scale XML Files on GPGPU Cluster

Processing Markov Logic Networks with GPUs

Processing MPI Derived Datatypes on Noncontiguous GPU-Resident Data

Processing Neocognitron of Face Recognition on High Performance Environment Based on GPU with CUDA Architecture

Titles: 100
open PDFs: 91
packages: 21
