Papers on hgpu.org (.txt-file)
Performance Optimization of 3-D Lattice Boltzmann Flow Solver on a GPU
Performance Optimization of Clustering On GPU
Performance Optimization of Deep Learning Sparse Matrix Kernels on Intel Max Series GPU
Performance Optimization of GPU ELF-Codes
Performance Optimization of Memory Intensive Applications on FPGA Accelerator
Performance Optimization of Vision Apps on Mobile Application Processor
Performance Optimization using Multimodal Modeling and Heterogeneous GNN
Performance Optimization Using Partitioned SpMV on GPUs and Multicore CPUs
Performance optimizations for scalable CFD applications on hybrid CPU+MIC heterogeneous computing system with millions of cores
Performance portability analysis of SYCL with a classical CG on CPU, GPU, and FPGA
Performance Portability and Evaluation of Heterogeneous Components of SeisSol Targeted to Upcoming Intel HPC GPUs
Performance Portability Challenges for Fortran Applications
Performance Portability Evaluation for OpenACC on Intel Knights Corner and Nvidia Kepler
Performance portability evaluation of blocked stencil computations on GPUs
Performance Portability in Accelerated Parallel Kernels
Performance Portability of a GPU Enabled Factorization with the DAGuE Framework
Performance Portability of the Aeras Atmosphere Model to Next Generation Architectures using Kokkos
Performance Portability Strategies for Computational Fluid Dynamics (CFD) Applications on HPC Systems
Performance portability study of epistasis detection using SYCL on NVIDIA GPU
Performance Portability Study of Linear Algebra Kernels in OpenCL
Performance portability through machine learning guided kernel selection in SYCL libraries
Performance portability via C++ PSTL, SYCL, OpenMP, and HIP: the Gaia AVU-GSR case study
Performance Portability with the Chapel Language
Performance Portable GPU Code Generation for Matrix Multiplication
Performance Portable Gradient Computations Using Source Transformation
Performance Portable Monte Carlo Particle Transport on Intel, NVIDIA, and AMD GPUs
Performance potential for simulating spin models on GPU
Performance prediction of deep learning applications training in GPU as a service systems
Performance Predictions for General-Purpose Computation on GPUs
Performance study of filtered back-projection algorithms implemented on GPUs
Performance study of interference on GPU and CPU resources with multiple applications
Performance Study of LU Decomposition on the Programmable GPU
Performance study of mapping irregular computations on GPUs
Performance Study of Satellite Image Processing on Graphics Processors Unit Using CUDA
Performance study of using the Direct Compute API for implementing Support vector machines on GPUs
Performance study on GPU offloading techniques using the Gauss matrix inverse algorithm
Performance Testing of GPU-Based Approximate Matching Algorithm on Network Traffic
Performance Tradeoff Spectrum of Integer and Floating Point Applications
Performance Tradeoff Spectrum of Integer and Floating Point Applications Kernels on Various GPUs
Performance Traps in OpenCL for CPUs
Performance Tuning for CUDA-Accelerated Neighborhood Denoising Filters
Performance Tuning for GPU-Embedded Systems: Machine-Learning-based and Analytical Model-driven Tuning Methodologies
Performance Upper Bound Analysis and Optimization of SGEMM on Fermi and Kepler GPUs
Performance-Analysis-Based Acceleration of Image Quality Assessment
Performance-aware component composition for GPU-based systems
Performance-Correctness Challenges in Emerging Heterogeneous Multicore Processors
Performance-efficient mechanisms for managing irregularity in throughput processors
Performance-Oriented Neural Architecture Search
Performance-Portable Many-Core Plasma Simulations: Porting PIConGPU to OpenPower and Beyond
Performance/power assessment of CNN packages on embedded automotive platforms
Performant Automatic BLAS Offloading on Unified Memory Architecture with OpenMP First-Touch Style Data Movement
Performant low-order matrix-free finite element kernels on GPU architectures
Performant Unified GPU Kernels for Portable Singular Value Computation Across Hardware and Precision
Performing DCT8x8 Computation on GPU Using NVIDIA CUDA Technology
Performing efficient NURBS modeling operations on the GPU
PeriPy – A High Performance OpenCL Peridynamics Package
permGPU: Using graphics processing units in RNA microarray association studies
Permutation Index and GPU to Solve efficiently Many Queries
Persistent Kernels for Iterative Memory-bound GPU Applications
Persistent RNNs: Stashing Recurrent Weights On-Chip
Perturbation Functions in Computer Graphics
Petaflop biofluidics simulations on a two million-core system
Petascale Application of a Coupled CPU-GPU Algorithm for Simulation and Analysis of Multiphase Flow Solutions in Porous Medium Systems
Petascale computations for Large-scale Atomic and Molecular collisions
Petascale Direct Numerical Simulation of Blood Flow on 200K Cores and Heterogeneous Architectures
Petascale elliptic solvers for anisotropic PDEs on GPU clusters
Petascale turbulence simulation using a highly parallel fast multipole method
Petascale visualization: Approaches and initial results
PFAC Library: GPU-based string matching algorithm
PFunc: modern task parallelism for modern high performance computing
PG-PuReMD: A Parallel-GPU Reactive Molecular Dynamics Package
PGEM: Preemptive GPGPU Execution Model for Runtime Engines
Pgx: Hardware-accelerated parallel game simulation for reinforcement learning
Phase Based Volume Registration on the GPU with Application to Quantitative MRI
Phase Based Volume Registration Using CUDA
Phase diagram and critical behavior of the square-lattice Ising model with competing nearest- and next-nearest-neighbor interactions
Phase Transition in 3d Heisenberg Spin Glasses with Strong Random Anisotropies, through a Multi-GPU Parallelization
phiGEMM: a CPU-GPU library for porting Quantum ESPRESSO on hybrid systems
Phoenix: A Runtime Environment for High Performance Computing on Chip Multiprocessors
Photon mapping on programmable graphics hardware
Physical and graphical effects in OpenCL by example
Physical modeling and high-performance GPU computing for characterization, interception, and disruption of hazardous near-Earth objects
Physically Based Rendering: Implementation of Path Tracer
Physically-Based Interactive Flow Visualization Based on Schlieren and Interferometry Experimental Techniques
Physically-based interactive schlieren flow visualization
Physically-based painting style 3D image synthesis using GPU
Physically-Based Sound Synthesis on GPUs
Physically-based visual simulation on graphics hardware
Physics and Computing Performance of the Exa.TrkX TrackML Pipeline
Physis: An Implicitly Parallel Programming Model for Stencil Computations on Large-Scale GPU-Accelerated Supercomputers
Piccolo: building fast, distributed programs with partitioned tables
PIConGPU: A Fully Relativistic Particle-in-Cell Code for a GPU Cluster
PIConGPU: Predictive Simulations of Laser-Particle Accelerators with Manycore Hardware
Piecewise Tri-linear Contouring for Multi-material Volumes
PIGEON: Optimizing CUDA Code Generator for End-to-End Training and Inference of Relational Graph Neural Networks
Piko: A Design Framework for Programmable Graphics Pipelines
PILC: Practical Image Lossless Compression with an End-to-end GPU Oriented Neural Framework
PipeCNN: An OpenCL-Based FPGA Accelerator for Large-Scale Convolution Neuron Networks
Titles: 100
Open PDFs: 94
Packages: 24