Papers on hgpu.org (.txt-file)
Performance assessment of CUDA and OpenACC in large scale combustion simulations

Performance Assessment of OpenMP Compilers Targeting NVIDIA V100 GPUs

Performance Assessment of using OpenCL on FPGA Systems for ODE Solvers

Performance Aware Convolutional Neural Network Channel Pruning for Embedded GPUs

Performance benchmarking of deep learning framework on Intel Xeon Phi

Performance Characterization and Optimization of Atomic Operations on AMD GPUs

Performance characterization of data-intensive kernels on AMD Fusion architectures

Performance Characterization of Multi-threaded Graph Processing Applications on Intel Many-Integrated-Core Architecture

Performance Comparison Between Cg-based and CUDA-based Matrix Multiplications

Performance Comparison for Neuroscience Application Benchmarks

Performance comparison of CFD-DEM solver MFiX-Exa, on GPUs and CPUs

Performance Comparison of Cholesky Decomposition on GPUs and FPGAs

Performance Comparison of Different OpenCL Implementations of LBM Simulation on Commodity Computer Hardware

Performance comparison of FPGA, GPU and CPU in image processing
Performance comparison of Gauss-Jordan elimination method using OpenMP and CUDA

Performance comparison of GPU and FPGA architectures for the SVM training problem

Performance Comparison of GPU, DSP and FPGA implementations of image processing and computer vision algorithms in embedded systems

Performance Comparison of GPUs with a Genetic Algorithm based on CUDA

Performance Comparison of Graphics Processors to Reconfigurable Logic: A Case Study

Performance comparison of Lattice Boltzmann fluid flow simulation using OpenCL and CUDA frameworks

Performance comparison of single-precision SPICE Model-Evaluation on FPGA, GPU, Cell, and multi-core processors

Performance Comparison with OpenMP Parallelization for Multi-core Systems
Performance Considerations When Using a Dedicated Ray Traversal Engine

Performance Counters based Power Modeling of Mobile GPUs using Deep Learning

Performance Debugging Frameworks for FPGA High-Level Synthesis

Performance Debugging of GPGPU Applications with the Divergence Map
Performance Degradation Analysis of GPU Kernels

Performance Drawbacks for Matrix Multiplication using Set Associative Cache in GPU devices

Performance Efficient DNA Sequence Detection on GPU Using Parallel Pattern Matching Approach

Performance Engineering for a Medical Imaging Application on the Intel Xeon Phi Accelerator

Performance Engineering for a Tall & Skinny Matrix Multiplication Kernel on GPUs

Performance engineering for the Lattice Boltzmann method on GPGPUs: Architectural requirements and performance results

Performance Engineering of the Kernel Polynomial Method on Large-Scale CPU-GPU Systems

Performance enhancement of MAGIC FDTD-PIC plasma-wave simulations using GPU processing
Performance Evaluation and Analysis of Sparse Matrix and Graph Kernels on Heterogeneous Processors

Performance Evaluation and Improvements of the PoCL Open-Source OpenCL Implementation on Intel CPUs

Performance Evaluation and Optimization of HPCG benchmark on CPU + MIC platform

Performance evaluation and optimization of random memory access on multicores with high productivity

Performance Evaluation and Tuning of An OpenCL based Matrix Multiplier

Performance Evaluation of Advanced Features in CUDA Unified Memory

Performance Evaluation of Blocking and Non-Blocking Concurrent Queues on GPUs

Performance Evaluation of Concurrent Lock-free Data Structures on GPUs

Performance Evaluation of Container-based Virtualization for High Performance Computing Environments

Performance Evaluation of CPU-GPU communication Depending on the Characteristic of Co-Located Workloads

Performance evaluation of CUDA programming for machining simulation

Performance evaluation of deep learning on smartphones

Performance Evaluation of Deep Learning Tools in Docker Containers

Performance Evaluation of Discrete Wavelet Transform Based on Image Compression Technique on Both CPU and GPU

Performance Evaluation of Edge Detection Techniques on GPU Using OpenCL

Performance Evaluation of Feature Extraction Algorithm on GPGPU

Performance evaluation of GPU memory hierarchy using the FFT

Performance evaluation of H.264/AVC decoding and visualization using the GPU

Performance Evaluation of Heterogeneous GPU Programming Frameworks for Hemodynamic Simulations

Performance evaluation of image processing algorithms on the GPU

Performance Evaluation of Intel Xeon Phi Coprocessor using XKaapi

Performance Evaluation of Mixed Precision Algorithms for Solving Sparse Linear Systems

Performance Evaluation of OpenMP’s Target Construct on GPUs – Exploring Compiler Optimizations

Performance Evaluation of OpenMP’s Target Construct on GPUs: Exploring Compiler Optimizations

Performance Evaluation of Parallel AES Implementations over CUDA GPU Framework

Performance Evaluation of Parallel Count Sort using GPU Computing with CUDA

Performance Evaluation of Particle Swarm Optimization Algorithms on GPU Using CUDA

Performance Evaluation of Python Parallel Programming Models: Charm4Py and mpi4py

Performance Evaluation of Query Processing Algorithms on GPGPUs

Performance Evaluation of Quicksort with GPU Dynamic Parallelism for Gene-Expression Quantile Normalization

Performance Evaluation of R with Intel Xeon Phi Coprocessor

Performance Evaluation of Sparse Matrix Multiplication Kernels on Intel Xeon Phi

Performance Evaluation of the Intel Many Integrated Core Architecture for 3D Image Reconstruction in Computed Tomography

Performance evaluation of the multi-device OpenCL FDTD solver
Performance Evaluation of the NVIDIA GeForce 8800 GTX GPU for Machine Learning

Performance Evaluation of the Ocean-Land-Atmosphere Model Using Graphics Processing Units

Performance Evaluations of Document-Oriented Databases using GPU and Cache Structure

Performance Evaluations of Graph Database using CUDA and OpenMP-Compatible Libraries

Performance Exploration of Selected Manually and Automatically Parallelized Codes on GPUs

Performance Gains in Conjugate Gradient Computation with Linearly Connected GPU Multiprocessors

Performance Impact of Data Layout on the GPU-accelerated IDW Interpolation

Performance impact of dynamic parallelism on different clustering algorithms

Performance Impact of Memory Channels on Sparse and Irregular Algorithms

Performance Improvement of Data Mining in Weka through GPU Acceleration

Performance Improvement of Multichannel Audio by Graphics Processing Units

Performance Improvement of Optical Algorithms on Multicore Platforms

Performance Improvement of TOUGH2 Simulation with Graphics Processing Unit

Performance improvements for iterative electron tomography reconstruction using graphics processing units (GPUs)

Performance improvements of real-time crowd simulations

Performance in GPU Architectures: Potentials and Distances

Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs

Performance Modeling and Evaluation of Distributed Deep Learning Frameworks on GPUs

Performance modeling of atomic additions on GPU scratchpad memory

Performance Modeling, Optimization, and Characterization on Heterogeneous Architectures

Performance Modelling and Traffic Characterisation of Optical Networks
Performance Modelling of Deep Learning on Intel Many Integrated Core Architectures

Performance models for CPU-GPU data transfers

Performance models for CUDA streams on NVIDIA GeForce series

Performance Models for Heterogeneous Iterative Programs

Performance Monitoring of Multi-FPGA Systems

Performance of a code migration for the simulation of supersonic ejector flow to SMP, MIC and GPU using OpenMP, OpenMP+LEO, and OpenACC directives

Performance of a GPU-based Direct Summation Algorithm for Computation of Small Angle Scattering Profile

Performance of Confidential Computing GPUs

Performance of CPU and GPU HPC Architectures for off-design aircraft simulation

Performance of FORTRAN and C GPU Extensions for a Benchmark Suite of Fourier Pseudospectral Algorithms

Titles: 100
open PDFs: 93
packages: 13
