Papers on hgpu.org
Point to point processing of digital images using parallel computing
Point-wise Adaptive Filtering for Fast Monte Carlo Noise Reduction
Pointer Analysis for Semi-Automatic Code Parallelizers
Poisson-Boltzmann model for protein-surface electrostatic interactions and grid-convergence study using the PyGBe code
Policy-based Tuning for Performance Portability and Library Co-optimization
Polly – Polyhedral optimization in LLVM
Polly-ACC: Transparent compilation to heterogeneous hardware
Polyconvexification of the multi-label optical flow problem
Polymer Field-Theory Simulations on Graphics Processing Units
POMPEI: Programming with OpenMP4 for Exascale Investigations
PONDER – A Real time software backend for pulsar and IPS observations at the Ooty Radio Telescope
PopSparse: Accelerated block sparse matrix multiplication on IPU
Population Parallel GP on the G80 GPU
Porous Rock Simulations and Lattice Boltzmann on GPUs
Portability and Scalability of OpenMP Offloading on State-of-the-art Accelerators
Portability of Fortran’s ‘do concurrent’ on GPUs
Portable and Performant GPU/Heterogeneous Asynchronous Many-Task Runtime System
Portable and Transparent Software Managed Scheduling on Accelerators for Fair Resource Sharing
Portable C++ Code that can Look and Feel Like Fortran Code with Yet Another Kernel Launcher (YAKL)
Portable GPU-Based Artificial Neural Networks for Accelerated Data-Driven Modeling
Portable high-order finite element kernels I: Streaming Operations
Portable HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi
Portable Mapping of Data Parallel Programs to OpenCL for Heterogeneous Systems
Portable OpenCL Out-of-Order Execution Framework for Heterogeneous Platforms
Portable Parallel Kernels for High-Speed Beamforming in Synthetic Aperture Ultrasound Imaging
Portable parallelized blowfish via RenderScript
Portable Performance on Heterogeneous Architectures
Portable Programming Models for Heterogeneous Platforms
Portable Real-Time DCT Based Steganography Using OpenCL
Portable, high-performance containers for HPC
Portable, Scalable Approaches for Improving Asynchronous Many-Task Runtime Node Use
Portage: Bringing Hackers’ Wisdom to Science
Porting a high-order finite-element earthquake modeling application to NVIDIA graphics cards using CUDA
Porting a sparse linear algebra math library to Intel GPUs
Porting and optimizing MAGFLOW on CUDA
Porting Batched Iterative Solvers onto Intel GPUs with SYCL
Porting estimation of distribution algorithms to the cell broadband engine
Porting FEASTFLOW to the Intel Xeon Phi: Lessons Learned
Porting HPC Applications to AMD Instinct MI300A Using Unified Memory and OpenMP
Porting Large HPC Applications to GPU Clusters: The Codes GENE and VERTEX
Porting marine ecosystem model spin-up using transport matrices to GPUs
Porting numerical integration codes from CUDA to oneAPI: a case study
Porting of an Edge-Based CFD Solver to GPUs
Porting OpenACC to OpenMP on heterogeneous systems
Porting to the Intel Xeon Phi: Opportunities and Challenges
Porting tree-based hash table compression to GPGPU model checking
Poseidon: A System Architecture for Efficient GPU-based Deep Learning on Multiple Machines
Position-Dependent Arrays and Their Application for High Performance Code Generation
Possible planet-forming regions on submillimetre images
Poster: CUDA-Accelerated Continuous 2D Scatterplots
Poster: GPU-accelerated artificial neural network for QSAR modeling
Poster: GPU-accelerated rigid body fitting of atomic structures into electron density maps
Potential contribution of CNN-based solving of stiff ODEs and PDEs to enabling real-time Computational Engineering
Potential Energy Landscapes for the 2D XY Model: Minima, Transition States and Pathways
Potential of General Purpose Graphic Processing Unit for Energy Management System
Power analysis and optimizations for GPU architecture using a power simulator
Power analysis of sorting algorithms on FPGA using OpenCL
Power and Performance Analysis of GPU-Accelerated Systems
Power and Performance Characterization of Computational Kernels on the GPU
Power and Performance Studies of the Explicit Multi-Threading (XMT) Architecture
Power Consumption Modeling and Prediction in a Hybrid CPU-GPU-MIC Supercomputer
Power Consumption of GPUs from a Software Perspective
Power consumption of mixed precision in the iterative solution of sparse linear systems
Power Control for GPU Clusters in processing large-scale streams
Power Flow Analysis on CUDA-based GPU
Power Management and Optimization
Power Management for GPU-CPU Heterogeneous Systems
Power Management Techniques for Data Centers: A Survey
Power Modeling and Optimization for GPGPUs
Power Profiling and Optimization for Heterogeneous Multi-Core Systems
Power Profiling of GeMTC Many Task Computing
Power-aware Performance of Mixed Precision Linear Solvers for FPGAs and GPGPUs
Power-Efficient Accelerators for High-Performance Applications
Power-efficient medical image processing using PUMA
Power-Efficient Time-Sensitive Mapping in Heterogeneous Systems
Power-Efficient Work Distribution Method for CPU-GPU Heterogeneous System
Power-performance comparison of single-task driven many-cores
Power, Energy and Speed of Embedded and Server Multi-Cores applied to Distributed Simulation of Spiking Neural Networks: ARM in NVIDIA Tegra vs Intel Xeon quad-cores
PPOpenCL: a performance-portable OpenCL compiler with host and kernel thread code fusion
Practical Algorithms for Finding Extremal Sets
Practical and Robust Stenciled Shadow Volumes for Hardware-Accelerated Rendering
Practical and Theoretical Aspects of a Parallel Twig Join Algorithm for XML Processing using a GPGPU
Practical CFD Simulations on Programmable Graphics Hardware using SMAC
Practical considerations for GPU-accelerated CT
Practical craniofacial surgery simulator based on GPU accelerated lattice shape matching
Practical examples of GPU computing optimization principles
Practical Implementation of Lattice QCD Simulation on Intel Xeon Phi Knights Landing
Practical logarithmic rasterization for low-error shadow maps
Practical parallel imaging compressed sensing MRI: Summary of two years of experience in accelerating body MRI of pediatric patients
Practical Patient-Specific Cardiac Blood Flow Simulations Using SPH
Practical Pre-stack Kirchhoff Time Migration of Seismic Processing on General Purpose GPU
Practical Random Linear Network Coding on GPUs
Practical Symbolic Execution Analysis and Methodology for GPU Programs
Practical Symbolic Race Checking of GPU Programs
Practical Symmetric Key Cryptography on Modern Graphics Hardware
Practically efficient methods for performing bit-reversed permutation in C++11 on the x86-64 architecture
Pragma Directed Shared Memory Centric Optimizations on GPUs
Titles: 100
open PDFs: 91
packages: 24