Papers on hgpu.org (.txt-file)

Performance of GPU for Pricing Financial Derivatives: Convertible Bonds

Performance of GTX Titan X GPUs and Code Optimization

Performance of Implicit Solver Strategies on GPUs

Performance of inverse atomistic scale fracture modeling on GPGPU architectures

Performance of Kepler GTX Titan GPUs and Xeon Phi System

Performance of Optical Flow Techniques on Graphics Hardware

Performance of PETSc GPU Implementation with Sparse Matrix Storage Schemes

Performance Optimisation of Smoothed Particle Hydrodynamics Algorithms for Multi/Many-Core Architectures

Performance Optimisations for Heterogeneous Managed Runtime Systems

Performance Optimization of 3-D Lattice Boltzmann Flow Solver on a GPU

Performance Optimization of Clustering On GPU

Performance Optimization of Deep Learning Sparse Matrix Kernels on Intel Max Series GPU

Performance Optimization of GPU ELF-Codes

Performance Optimization of Memory Intensive Applications on FPGA Accelerator

Performance Optimization of Vision Apps on Mobile Application Processor

Performance Optimization using Multimodal Modeling and Heterogeneous GNN

Performance Optimization Using Partitioned SpMV on GPUs and Multicore CPUs

Performance optimizations for scalable CFD applications on hybrid CPU+MIC heterogeneous computing system with millions of cores

Performance portability analysis of SYCL with a classical CG on CPU, GPU, and FPGA

Performance Portability and Evaluation of Heterogeneous Components of SeisSol Targeted to Upcoming Intel HPC GPUs

Performance Portability Challenges for Fortran Applications

Performance Portability Evaluation for OpenACC on Intel Knights Corner and Nvidia Kepler

Performance portability evaluation of blocked stencil computations on GPUs

Performance Portability in Accelerated Parallel Kernels

Performance Portability of a GPU Enabled Factorization with the DAGuE Framework

Performance Portability of the Aeras Atmosphere Model to Next Generation Architectures using Kokkos

Performance Portability Strategies for Computational Fluid Dynamics (CFD) Applications on HPC Systems

Performance portability study of epistasis detection using SYCL on NVIDIA GPU

Performance Portability Study of Linear Algebra Kernels in OpenCL

Performance portability through machine learning guided kernel selection in SYCL libraries

Performance portability via C++ PSTL, SYCL, OpenMP, and HIP: the Gaia AVU-GSR case study

Performance Portability with the Chapel Language

Performance Portable GPU Code Generation for Matrix Multiplication

Performance Portable Gradient Computations Using Source Transformation

Performance Portable Monte Carlo Particle Transport on Intel, NVIDIA, and AMD GPUs

Performance potential for simulating spin models on GPU

Performance prediction of deep learning applications training in GPU as a service systems

Performance Predictions for General-Purpose Computation on GPUs

Performance study of filtered back-projection algorithms implemented on GPUs

Performance study of interference on GPU and CPU resources with multiple applications

Performance Study of LU Decomposition on the Programmable GPU

Performance study of mapping irregular computations on GPUs

Performance Study of Satellite Image Processing on Graphics Processors Unit Using CUDA

Performance study of using the Direct Compute API for implementing Support vector machines on GPUs

Performance study on GPU offloading techniques using the Gauss matrix inverse algorithm

Performance Testing of GPU-Based Approximate Matching Algorithm on Network Traffic

Performance Tradeoff Spectrum of Integer and Floating Point Applications

Performance Tradeoff Spectrum of Integer and Floating Point Applications Kernels on Various GPUs

Performance Traps in OpenCL for CPUs

Performance Tuning for CUDA-Accelerated Neighborhood Denoising Filters

Performance Tuning for GPU-Embedded Systems: Machine-Learning-based and Analytical Model-driven Tuning Methodologies

Performance Upper Bound Analysis and Optimization of SGEMM on Fermi and Kepler GPUs

Performance-Analysis-Based Acceleration of Image Quality Assessment

Performance-aware component composition for GPU-based systems

Performance-Correctness Challenges in Emerging Heterogeneous Multicore Processors

Performance-efficient mechanisms for managing irregularity in throughput processors

Performance-Oriented Neural Architecture Search

Performance-Portable Many-Core Plasma Simulations: Porting PIConGPU to OpenPower and Beyond

Performance/power assessment of CNN packages on embedded automotive platforms

Performant Automatic BLAS Offloading on Unified Memory Architecture with OpenMP First-Touch Style Data Movement

Performant low-order matrix-free finite element kernels on GPU architectures

Performant Unified GPU Kernels for Portable Singular Value Computation Across Hardware and Precision

Performing DCT8x8 Computation on GPU Using NVIDIA CUDA Technology

Performing efficient NURBS modeling operations on the GPU

PeriPy – A High Performance OpenCL Peridynamics Package

permGPU: Using graphics processing units in RNA microarray association studies

Permutation Index and GPU to Solve efficiently Many Queries

Persistent Kernels for Iterative Memory-bound GPU Applications

Persistent RNNs: Stashing Recurrent Weights On-Chip

Perturbation Functions in Computer Graphics

Petaflop biofluidics simulations on a two million-core system

Petascale Application of a Coupled CPU-GPU Algorithm for Simulation and Analysis of Multiphase Flow Solutions in Porous Medium Systems

Petascale computations for Large-scale Atomic and Molecular collisions

Petascale Direct Numerical Simulation of Blood Flow on 200K Cores and Heterogeneous Architectures

Petascale elliptic solvers for anisotropic PDEs on GPU clusters

Petascale turbulence simulation using a highly parallel fast multipole method

Petascale visualization: Approaches and initial results

PFAC Library: GPU-based string matching algorithm

PFunc: modern task parallelism for modern high performance computing

PG-PuReMD: A Parallel-GPU Reactive Molecular Dynamics Package

PGEM: Preemptive GPGPU Execution Model for Runtime Engines

Pgx: Hardware-accelerated parallel game simulation for reinforcement learning

Phase Based Volume Registration on the GPU with Application to Quantitative MRI

Phase Based Volume Registration Using CUDA

Phase diagram and critical behavior of the square-lattice Ising model with competing nearest- and next-nearest-neighbor interactions

Phase Transition in 3d Heisenberg Spin Glasses with Strong Random Anisotropies, through a Multi-GPU Parallelization

phiGEMM: a CPU-GPU library for porting Quantum ESPRESSO on hybrid systems

Phoenix: A Runtime Environment for High Performance Computing on Chip Multiprocessors

Photon mapping on programmable graphics hardware

Physical and graphical effects in OpenCL by example

Physical modeling and high-performance GPU computing for characterization, interception, and disruption of hazardous near-Earth objects

Physically Based Rendering: Implementation of Path Tracer

Physically-Based Interactive Flow Visualization Based on Schlieren and Interferometry Experimental Techniques

Physically-based interactive schlieren flow visualization

Physically-based painting style 3D image synthesis using GPU

Physically-Based Sound Synthesis on GPUs

Physically-based visual simulation on graphics hardware

Titles: 100
open PDFs: 93
packages: 20
