## Papers on hgpu.org (.txt-file)

Comparison of GPU Architectures for Asynchronous Communication with Finite-Differencing Applications

Comparison of HPC Architectures for Computing All-Pairs Shortest Paths. Intel Xeon Phi KNL vs NVIDIA Pascal

Comparison of Hybrid Sorting Algorithms Implemented on Different Parallel Hardware Platforms

Comparison of OpenCL performance on different platforms using VexCL and Blaze

Comparison of OpenMP & OpenCL Parallel Processing Technologies

Comparison of OpenMP and OpenCL Parallel Processing Technologies

Comparison of parallel sorting algorithms

Comparison of Parallelisation Approaches, Languages, and Compilers for Unstructured Mesh Algorithms on GPUs

Comparison of Random Number Generators in Particle Swarm Optimization Algorithm

Comparison of Rectangular Matrix Multiplication with and without Border Conditions

Comparison of several parallel API for cloth modelling on modern GPUs

Comparison of SPMV performance on matrices with different matrix format using CUSP, cuSPARSE and ViennaCL

Comparison of Technologies for General-Purpose Computing on Graphics Processing Units

Comparison of Thread Execution Methods for GPU-oriented OpenCL Programs on Multicore Processors

COMPASS: a programmable data prefetcher using idle GPU shaders

Compensated Visual Hull for Defective Segmentation and Occlusion

Compensated Visual Hull with GPU-Based Optimization

Compensating Indirect Scattering for Immersive and Semi-Immersive Projection Displays

Competing computational approaches to reaction-diffusion equations in clusters of cells

Compilation and Design Space Exploration of Dataflow Programs for Heterogeneous CPU-GPU Platforms

Compilation for Heterogeneous Computing: Automating Analyses, Transformations and Decisions

Compilation techniques and language support to facilitate dependence-driven computation

Compile-time GPU memory access optimizations

Compiler and runtime support for enabling generalized reduction computations on heterogeneous parallel configurations

Compiler and runtime techniques for bulk-synchronous programming models on CPU architectures

Compiler Assisted Runtime Adaptation

Compiler Fuzzing through Deep Learning

Compiler optimizations for directive-based programming for accelerators

Compiler Optimizations for Industrial Unstructured Mesh CFD Applications on GPUs

Compiler Optimizations for SIMD/GPU/Multicore Architectures

Compiler support for general-purpose computation on GPUs

Compiler Support for High-level GPU Programming

Compiler Technologies in Deep Learning Co-Design: A Survey

Compiler-assisted distribution of OpenMP code for improved scalability

Compiler-Assisted Workload Consolidation For Efficient Dynamic Parallelism on GPU

Compiler-based Data Prefetching and Streaming Non-temporal Store Generation for the Intel Xeon Phi Coprocessor

Compiler-Based Tools to Aid in Data Transfer Optimization and On-Chip Debug of Heterogeneous Compute Systems

Compiler-centric across-stack deep learning acceleration

Compiler-directed memory management for heterogeneous MPSoCs

Compiler-Driven Performance on Heterogeneous Computing Platforms

Compiler-Level Explicit Cache for a GPGPU Programming Framework

CompilerGym: Robust, Performant Compiler Optimization Environments for AI Research

Compilers for Portable Programming of Heterogeneous Parallel & Approximate Computing Systems

Compiling a High-level Directive-Based Programming Model for GPGPUs

Compiling a high-level language for GPUs: (via language support for architectures and compilers)

Compiling an Array Language to a Graphics Processor

Compiling and Optimizing Java 8 Programs for GPU Execution

Compiling and Optimizing OpenMP 4.X Programs to OpenCL and SPIR

Compiling for a heterogeneous vector image processor

Compiling High Performance Recursive Filters

Compiling Parallel Functional Code with Data Parallel Idealised Algol

Compiling Python to a hybrid execution environment

Compiling Stream Applications for Heterogeneous Architectures

Complete PISO and SIMPLE solvers on Graphics Processing Units

Complexity Analysis and Algorithm Design for Reorganizing Data to Minimize Non-Coalesced Memory Accesses on GPU

Complexity effective memory access scheduling for many-core accelerator architectures

Composability of parallel codes on heterogeneous architectures

Composing Distributed Computations Through Task and Kernel Fusion

Composing multiple StarPU applications over heterogeneous machines: a supervised approach

Composition and Reuse with Compiled Domain-Specific Languages

Compositional Compilation for Sparse, Irregular Data Parallelism

Compositional Deep Learning in Futhark

Compound Word Transformer: Learning to Compose Full-Song Music over Dynamic Directed Hypergraphs

Compoundly weighted Voronoi: a sequential and parallel implementation

Comprehensive Analysis of High-Performance Computing Methods for Filtered Back-Projection

Comprehensive Evaluation of OpenCL-based Convolutional Neural Network Accelerators in Xilinx and Altera FPGAs

Comprehensive Evaluations of Cone-beam CT dose in Image-guided Radiation Therapy via GPU-based Monte Carlo simulations

Comprehensive Optimization of Parametric Kernels for Graphics Processing Units

Comprehensive Performance Monitoring for GPU Cluster Systems

Compressed Dynamic Mode Decomposition for Real-Time Object Detection

Compressed Facade Displacement Maps

Compressed Learning of Deep Neural Networks for OpenCL-Capable Embedded Systems

Compressed Multiple-Row Storage Format

Compressed Real Numbers for AI: a case-study using a RISC-V CPU

Compressed sensing using hidden Markov models with application to vision based aircraft tracking

Compressing DMA Engine: Leveraging Activation Sparsity for Training Deep Neural Networks

Compressing Floating-Point Number Stream for Numerical Applications

Compression Domain Volume Rendering

Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications

Compressive Phase Contrast Tomography

Computation of Air-Vortices Based on GPU Technology: Optimizing and Parallelizing a Model for Wake-Vortex Prediction Using OpenCL

Computation of electron quantum transport in graphene nanoribbons using GPU

Computation of Galois field expressions for quaternary logic functions on GPUs

Computation of gray-level co-occurrence matrix based on CUDA and its optimization

Computation of Large Covariance Matrices by SAMMY on Graphical Processing Units and Multicore CPUs

Computation of the Isogeometric Analysis Stiffness Matrix on GPU

Computation of the Spatial Impulse Response for Ultrasonic Fields on the Graphics Processing Units (GPU)

Computation of Troposphere Slant Delays on a GPU

Computation of Voronoi diagrams using a graphics processing unit

Computation on GPU of Eigenvalues and Eigenvectors of a Large Number of Small Hermitian Matrices

Computation on programmable graphics hardware

Computational advances in gravitational microlensing: a comparison of CPU, GPU, and parallel, large data codes

Computational Biology and Applied Bioinformatics

Computational cost estimates for parallel shared memory isogeometric multi-frontal solvers

Computational Experiments in Markov Chain Monte Carlo

Computational Fluid Dynamic on GPU

Computational Fluid Dynamics Simulations using Many Graphics Processors

Computational Fluid Dynamics Using Graphics Processing Units: Challenges and Opportunities

Computational Fluid Dynamics using OpenCL – a Practical Introduction

Computational Gravitational Dynamics with Modern Numerical Accelerators

Titles: 100

open PDFs: 93

packages: 15