Papers on hgpu.org (.txt-file)

Performance Evaluation and Improvements of the PoCL Open-Source OpenCL Implementation on Intel CPUs

Performance Evaluation and Optimization of HPCG benchmark on CPU + MIC platform

Performance evaluation and optimization of random memory access on multicores with high productivity

Performance Evaluation and Tuning of An OpenCL based Matrix Multiplier

Performance Evaluation of Advanced Features in CUDA Unified Memory

Performance Evaluation of Blocking and Non-Blocking Concurrent Queues on GPUs

Performance Evaluation of Concurrent Lock-free Data Structures on GPUs

Performance Evaluation of Container-based Virtualization for High Performance Computing Environments

Performance Evaluation of CPU-GPU communication Depending on the Characteristic of Co-Located Workloads

Performance evaluation of CUDA programming for machining simulation

Performance evaluation of deep learning on smartphones

Performance Evaluation of Deep Learning Tools in Docker Containers

Performance Evaluation of Discrete Wavelet Transform Based on Image Compression Technique on Both CPU and GPU

Performance Evaluation of Edge Detection Techniques on GPU Using OpenCL

Performance Evaluation of Feature Extraction Algorithm on GPGPU

Performance evaluation of GPU memory hierarchy using the FFT

Performance evaluation of H.264/AVC decoding and visualization using the GPU

Performance Evaluation of Heterogeneous GPU Programming Frameworks for Hemodynamic Simulations

Performance evaluation of image processing algorithms on the GPU

Performance Evaluation of Intel Xeon Phi Coprocessor using XKaapi

Performance Evaluation of Mixed Precision Algorithms for Solving Sparse Linear Systems

Performance Evaluation of OpenMP’s Target Construct on GPUs – Exploring Compiler Optimizations

Performance Evaluation of OpenMP’s Target Construct on GPUs: Exploring Compiler Optimizations

Performance Evaluation of Parallel AES Implementations over CUDA GPU Framework

Performance Evaluation of Parallel Count Sort using GPU Computing with CUDA

Performance Evaluation of Particle Swarm Optimization Algorithms on GPU Using CUDA

Performance Evaluation of Python Parallel Programming Models: Charm4Py and mpi4py

Performance Evaluation of Query Processing Algorithms on GPGPUs

Performance Evaluation of Quicksort with GPU Dynamic Parallelism for Gene-Expression Quantile Normalization

Performance Evaluation of R with Intel Xeon Phi Coprocessor

Performance Evaluation of Sparse Matrix Multiplication Kernels on Intel Xeon Phi

Performance Evaluation of the Intel Many Integrated Core Architecture for 3D Image Reconstruction in Computed Tomography

Performance evaluation of the multi-device OpenCL FDTD solver

Performance Evaluation of the NVIDIA GeForce 8800 GTX GPU for Machine Learning

Performance Evaluation of the Ocean-Land-Atmosphere Model Using Graphics Processing Units

Performance Evaluations of Document-Oriented Databases using GPU and Cache Structure

Performance Evaluations of Graph Database using CUDA and OpenMP-Compatible Libraries

Performance Exploration of Selected Manually and Automatically Parallelized Codes on GPUs

Performance Gains in Conjugate Gradient Computation with Linearly Connected GPU Multiprocessors

Performance Impact of Data Layout on the GPU-accelerated IDW Interpolation

Performance impact of dynamic parallelism on different clustering algorithms

Performance Impact of Memory Channels on Sparse and Irregular Algorithms

Performance Improvement of Data Mining in Weka through GPU Acceleration

Performance Improvement of Multichannel Audio by Graphics Processing Units

Performance Improvement of Optical Algorithms on Multicore Platforms

Performance Improvement of TOUGH2 Simulation with Graphics Processing Unit

Performance improvements for iterative electron tomography reconstruction using graphics processing units (GPUs)

Performance improvements of real-time crowd simulations

Performance in GPU Architectures: Potentials and Distances

Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs

Performance Modeling and Evaluation of Distributed Deep Learning Frameworks on GPUs

Performance modeling of atomic additions on GPU scratchpad memory

Performance Modeling, Optimization, and Characterization on Heterogeneous Architectures

Performance Modelling and Traffic Characterisation of Optical Networks

Performance Modelling of Deep Learning on Intel Many Integrated Core Architectures

Performance models for CPU-GPU data transfers

Performance models for CUDA streams on NVIDIA GeForce series

Performance Models for Heterogeneous Iterative Programs

Performance Monitoring of Multi-FPGA Systems

Performance of a code migration for the simulation of supersonic ejector flow to SMP, MIC and GPU using OpenMP, OpenMP+LEO, and OpenACC directives

Performance of a GPU-based Direct Summation Algorithm for Computation of Small Angle Scattering Profile

Performance of Confidential Computing GPUs

Performance of CPU and GPU HPC Architectures for off-design aircraft simulation

Performance of FORTRAN and C GPU Extensions for a Benchmark Suite of Fourier Pseudospectral Algorithms

Performance of GPU for Pricing Financial Derivatives: Convertible Bonds

Performance of GTX Titan X GPUs and Code Optimization

Performance of Implicit Solver Strategies on GPUs

Performance of inverse atomistic scale fracture modeling on GPGPU architectures

Performance of Kepler GTX Titan GPUs and Xeon Phi System

Performance of Optical Flow Techniques on Graphics Hardware

Performance of PETSc GPU Implementation with Sparse Matrix Storage Schemes

Performance Optimisation of Smoothed Particle Hydrodynamics Algorithms for Multi/Many-Core Architectures

Performance Optimisations for Heterogeneous Managed Runtime Systems

Performance Optimization of 3-D Lattice Boltzmann Flow Solver on a GPU

Performance Optimization of Clustering On GPU

Performance Optimization of Deep Learning Sparse Matrix Kernels on Intel Max Series GPU

Performance Optimization of GPU ELF-Codes

Performance Optimization of Memory Intensive Applications on FPGA Accelerator

Performance Optimization of Vision Apps on Mobile Application Processor

Performance Optimization using Multimodal Modeling and Heterogeneous GNN

Performance Optimization Using Partitioned SpMV on GPUs and Multicore CPUs

Performance optimizations for scalable CFD applications on hybrid CPU+MIC heterogeneous computing system with millions of cores

Performance portability analysis of SYCL with a classical CG on CPU, GPU, and FPGA

Performance Portability and Evaluation of Heterogeneous Components of SeisSol Targeted to Upcoming Intel HPC GPUs

Performance Portability Challenges for Fortran Applications

Performance Portability Evaluation for OpenACC on Intel Knights Corner and Nvidia Kepler

Performance portability evaluation of blocked stencil computations on GPUs

Performance Portability in Accelerated Parallel Kernels

Performance Portability of a GPU Enabled Factorization with the DAGuE Framework

Performance Portability of the Aeras Atmosphere Model to Next Generation Architectures using Kokkos

Performance Portability Strategies for Computational Fluid Dynamics (CFD) Applications on HPC Systems

Performance portability study of epistasis detection using SYCL on NVIDIA GPU

Performance Portability Study of Linear Algebra Kernels in OpenCL

Performance portability through machine learning guided kernel selection in SYCL libraries

Performance portability via C++ PSTL, SYCL, OpenMP, and HIP: the Gaia AVU-GSR case study

Performance Portability with the Chapel Language

Performance Portable GPU Code Generation for Matrix Multiplication

Performance Portable Gradient Computations Using Source Transformation

Titles: 100
open PDFs: 95
packages: 16
