Papers on hgpu.org (.txt-file)
Performance Evaluation of Quicksort with GPU Dynamic Parallelism for Gene-Expression Quantile Normalization
Performance Evaluation of R with Intel Xeon Phi Coprocessor
Performance Evaluation of Sparse Matrix Multiplication Kernels on Intel Xeon Phi
Performance Evaluation of the Intel Many Integrated Core Architecture for 3D Image Reconstruction in Computed Tomography
Performance evaluation of the multi-device OpenCL FDTD solver
Performance Evaluation of the NVIDIA GeForce 8800 GTX GPU for Machine Learning
Performance Evaluation of the Ocean-Land-Atmosphere Model Using Graphics Processing Units
Performance Evaluations of Document-Oriented Databases using GPU and Cache Structure
Performance Evaluations of Graph Database using CUDA and OpenMP-Compatible Libraries
Performance Exploration of Selected Manually and Automatically Parallelized Codes on GPUs
Performance Gains in Conjugate Gradient Computation with Linearly Connected GPU Multiprocessors
Performance Impact of Data Layout on the GPU-accelerated IDW Interpolation
Performance impact of dynamic parallelism on different clustering algorithms
Performance Impact of Memory Channels on Sparse and Irregular Algorithms
Performance Improvement of Data Mining in Weka through GPU Acceleration
Performance Improvement of Multichannel Audio by Graphics Processing Units
Performance Improvement of Optical Algorithms on Multicore Platforms
Performance Improvement of TOUGH2 Simulation with Graphics Processing Unit
Performance improvements for iterative electron tomography reconstruction using graphics processing units (GPUs)
Performance improvements of real-time crowd simulations
Performance in GPU Architectures: Potentials and Distances
Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs
Performance Modeling and Evaluation of Distributed Deep Learning Frameworks on GPUs
Performance modeling of atomic additions on GPU scratchpad memory
Performance Modeling, Optimization, and Characterization on Heterogeneous Architectures
Performance Modelling and Traffic Characterisation of Optical Networks
Performance Modelling of Deep Learning on Intel Many Integrated Core Architectures
Performance models for CPU-GPU data transfers
Performance models for CUDA streams on NVIDIA GeForce series
Performance Models for Heterogeneous Iterative Programs
Performance Monitoring of Multi-FPGA Systems
Performance of a code migration for the simulation of supersonic ejector flow to SMP, MIC and GPU using OpenMP, OpenMP+LEO, and OpenACC directives
Performance of a GPU-based Direct Summation Algorithm for Computation of Small Angle Scattering Profile
Performance of CPU and GPU HPC Architectures for off-design aircraft simulation
Performance of FORTRAN and C GPU Extensions for a Benchmark Suite of Fourier Pseudospectral Algorithms
Performance of GPU for Pricing Financial Derivatives: Convertible Bonds
Performance of GTX Titan X GPUs and Code Optimization
Performance of Implicit Solver Strategies on GPUs
Performance of inverse atomistic scale fracture modeling on GPGPU architectures
Performance of Kepler GTX Titan GPUs and Xeon Phi System
Performance of Optical Flow Techniques on Graphics Hardware
Performance of PETSc GPU Implementation with Sparse Matrix Storage Schemes
Performance Optimisation of Smoothed Particle Hydrodynamics Algorithms for Multi/Many-Core Architectures
Performance Optimisations for Heterogeneous Managed Runtime Systems
Performance Optimization of 3-D Lattice Boltzmann Flow Solver on a GPU
Performance Optimization of Clustering On GPU
Performance Optimization of Deep Learning Sparse Matrix Kernels on Intel Max Series GPU
Performance Optimization of GPU ELF-Codes
Performance Optimization of Memory Intensive Applications on FPGA Accelerator
Performance Optimization of Vision Apps on Mobile Application Processor
Performance Optimization using Multimodal Modeling and Heterogeneous GNN
Performance Optimization Using Partitioned SpMV on GPUs and Multicore CPUs
Performance optimizations for scalable CFD applications on hybrid CPU+MIC heterogeneous computing system with millions of cores
Performance portability analysis of SYCL with a classical CG on CPU, GPU, and FPGA
Performance Portability and Evaluation of Heterogeneous Components of SeisSol Targeted to Upcoming Intel HPC GPUs
Performance Portability Challenges for Fortran Applications
Performance Portability Evaluation for OpenACC on Intel Knights Corner and Nvidia Kepler
Performance portability evaluation of blocked stencil computations on GPUs
Performance Portability in Accelerated Parallel Kernels
Performance Portability of a GPU Enabled Factorization with the DAGuE Framework
Performance Portability of the Aeras Atmosphere Model to Next Generation Architectures using Kokkos
Performance Portability Strategies for Computational Fluid Dynamics (CFD) Applications on HPC Systems
Performance portability study of epistasis detection using SYCL on NVIDIA GPU
Performance Portability Study of Linear Algebra Kernels in OpenCL
Performance portability through machine learning guided kernel selection in SYCL libraries
Performance portability via C++ PSTL, SYCL, OpenMP, and HIP: the Gaia AVU-GSR case study
Performance Portability with the Chapel Language
Performance Portable GPU Code Generation for Matrix Multiplication
Performance Portable Monte Carlo Particle Transport on Intel, NVIDIA, and AMD GPUs
Performance potential for simulating spin models on GPU
Performance prediction of deep learning applications training in GPU as a service systems
Performance Predictions for General-Purpose Computation on GPUs
Performance study of filtered back-projection algorithms implemented on GPUs
Performance study of interference on GPU and CPU resources with multiple applications
Performance Study of LU Decomposition on the Programmable GPU
Performance study of mapping irregular computations on GPUs
Performance Study of Satellite Image Processing on Graphics Processors Unit Using CUDA
Performance study of using the Direct Compute API for implementing Support vector machines on GPUs
Performance study on GPU offloading techniques using the Gauss matrix inverse algorithm
Performance Testing of GPU-Based Approximate Matching Algorithm on Network Traffic
Performance Tradeoff Spectrum of Integer and Floating Point Applications
Performance Tradeoff Spectrum of Integer and Floating Point Applications Kernels on Various GPUs
Performance Traps in OpenCL for CPUs
Performance Tuning for CUDA-Accelerated Neighborhood Denoising Filters
Performance Tuning for GPU-Embedded Systems: Machine-Learning-based and Analytical Model-driven Tuning Methodologies
Performance Upper Bound Analysis and Optimization of SGEMM on Fermi and Kepler GPUs
Performance-Analysis-Based Acceleration of Image Quality Assessment
Performance-aware component composition for GPU-based systems
Performance-Correctness Challenges in Emerging Heterogeneous Multicore Processors
Performance-efficient mechanisms for managing irregularity in throughput processors
Performance-Oriented Neural Architecture Search
Performance-Portable Many-Core Plasma Simulations: Porting PIConGPU to OpenPower and Beyond
Performance/power assessment of CNN packages on embedded automotive platforms
Performant Automatic BLAS Offloading on Unified Memory Architecture with OpenMP First-Touch Style Data Movement
Performant low-order matrix-free finite element kernels on GPU architectures
Performing DCT8x8 Computation on GPU Using NVIDIA CUDA Technology
Performing efficient NURBS modeling operations on the GPU
PeriPy – A High Performance OpenCL Peridynamics Package
Titles: 100
open PDFs: 94
packages: 18