Papers on hgpu.org (.txt-file)
Combining Data Parallelism and Task Parallelism for Efficient Performance on Hybrid CPU and GPU Systems

Combining Multiple Optimised FPGA-based Pulsar Search Modules Using OpenCL

Combining Performance and Productivity: Accelerating the Network Sensing Graph Challenge with GPUs and Commodity Data Science Software

Combining recent HPC techniques for 3D geophysics acceleration

Combustion Simulations Using Graphic Processing Units

Coming Soon: Research in a Cloud

Communication and Coordination Paradigms for Highly-Parallel Accelerators

Communication Architectures for Scalable GPU-centric Computing Systems

Communication Optimization for Multi GPU Implementation of Smith-Waterman Algorithm

Communication-Avoiding Optimization of Geometric Multigrid on GPUs

Communication-avoiding QR decomposition for GPUs

Communication-Efficient Large-Scale Distributed Deep Learning: A Comprehensive Survey

Communication-Minimizing 2D Convolution in GPU Registers

Communication-minimizing Asynchronous Tensor Parallelism

Community Structure Discovery algorithm on GPU with CUDA

Compact data structure and scalable algorithms for the sparse grid technique

Comparative Analysis of OpenACC, OpenMP and CUDA using Sequential and Parallel Algorithms

Comparative Evaluation of Binary Features

Comparative evaluation of platforms for parallel Ant Colony Optimization

Comparative Performance Analysis of Intel Xeon Phi, GPU, and CPU

Comparative Performance and Scalability Analysis of GPU-accelerated Database Operations

Comparative Study of Caffe, Neon, Theano, and Torch for Deep Learning

Comparative Study of Frequent Itemset Mining Techniques on Graphics Processor

Comparative Study of High Performance Computing Using Multi-core Parallel Systems

Comparative study of parallel programming models for multicore computing

Comparative Study of the Parallelization of the Smith-Waterman Algorithm on OpenMP and Cuda C

Comparing CUDA and OpenGL implementations for a Jacobi iteration

Comparing CUDA, OpenCL and OpenGL Implementations of the Cardiac Monodomain Equations

Comparing Energy Efficiency of CPU, GPU and FPGA Implementations for Vision Kernels

Comparing FPGAs to Graphics Accelerators and the Playstation 2 Using a Unified Source Description

Comparing GPU and CPU in OLAP Cubes Creation

Comparing GPU-based multi-volume ray casting techniques

Comparing Hardware Accelerators in Scientific Applications: A Case Study

Comparing Intra- and Inter-Processor Parallelism on Multi-Core Cell Processors for Scientific Simulations

Comparing Linear and Convex Relaxations for Stereo and Motion

Comparing Llama-2 and GPT-3 LLMs for HPC kernels generation

Comparing Many-Core Accelerator Frameworks

Comparing Parallel Functional Array Languages: Programming and Performance

Comparing Parallel Hardware Architectures for Visually Guided Robot Navigation

Comparing Parallel Simulation of Social Agents using Cilk and OpenCL

Comparing performance and energy efficiency of FPGAs and GPUs for high productivity computing

Comparing Performance and Portability between CUDA and SYCL for Protein Database Search on NVIDIA, AMD, and Intel GPUs

Comparing Programmer Productivity in OpenACC and CUDA: an Empirical Investigation

Comparing SYCL data transfer strategies for tracking use cases

Comparing the Performance of Different x86 SIMD Instruction Sets for a Medical Imaging Application on Modern Multi- and Manycore Chips

Comparing the Power and Performance of Intel’s SCC to State-of-the-Art CPUs and GPUs

Comparing the Treecode with FMM on GPUs for vortex particle simulations of a leapfrogging vortex ring

Comparing Two Generations of Embedded GPUs Running a Feature Detection Algorithm

Comparison and Analysis of GPGPU and Parallel Computing on Multi-Core CPU

Comparison and Analysis of GPU Energy Efficiency For CUDA and OpenCL

Comparison and Analysis of GPU Energy Efficiency For CUDA and OpenCL

Comparison based sorting for systems with multiple GPUs

Comparison between GPU and parallel CPU optimizations in viewshed analysis

Comparison of Cilk, Kaapi and CUDA for the Jacobi Method

Comparison of CPML Implementations for the GPU-Accelerated FDTD Solver

Comparison of different n-body algorithms on various hardware platforms using SYCL

Comparison of Different Parallel Implementations of the 2+1-Dimensional KPZ Model and the 3-Dimensional KMC Model

Comparison of FPGA and GPU implementations of real-time stereo vision

Comparison of Fragmentation/Dispersion Models for Asteroid Nuclear Disruption Mission Design

Comparison of GPU Architectures for Asynchronous Communication with Finite-Differencing Applications

Comparison of HPC Architectures for Computing All-Pairs Shortest Paths. Intel Xeon Phi KNL vs NVIDIA Pascal

Comparison of Hybrid Sorting Algorithms Implemented on Different Parallel Hardware Platforms

Comparison of OpenCL performance on different platforms using VexCL and Blaze

Comparison of OpenMP & OpenCL Parallel Processing Technologies

Comparison of OpenMP and OpenCL Parallel Processing Technologies

Comparison of parallel sorting algorithms

Comparison of Parallelisation Approaches, Languages, and Compilers for Unstructured Mesh Algorithms on GPUs

Comparison of Random Number Generators in Particle Swarm Optimization Algorithm

Comparison of Rectangular Matrix Multiplication with and without Border Conditions

Comparison of several parallel API for cloth modelling on modern GPUs

Comparison of SPMV performance on matrices with different matrix format using CUSP, cuSPARSE and ViennaCL

Comparison of Technologies for General-Purpose Computing on Graphics Processing Units

Comparison of Thread Execution Methods for GPU-oriented OpenCL Programs on Multicore Processors

COMPASS: a programmable data prefetcher using idle GPU shaders

Compensated Visual Hull for Defective Segmentation and Occlusion

Compensated Visual Hull with GPU-Based Optimization

Compensating Indirect Scattering for Immersive and Semi-Immersive Projection Displays

Competing computational approaches to reaction-diffusion equations in clusters of cells

Compilation and Design Space Exploration of Dataflow Programs for Heterogeneous CPU-GPU Platforms

Compilation for Heterogeneous Computing: Automating Analyses, Transformations and Decisions

Compilation techniques and language support to facilitate dependence-driven computation

Compile-time GPU memory access optimizations

Compile-Time Resource Safety for GPU APIs: A Low-Overhead Typestate Framework

Compiler and runtime support for enabling generalized reduction computations on heterogeneous parallel configurations

Compiler and Runtime Systems for Generative AI Models

Compiler and runtime techniques for bulk-synchronous programming models on CPU architectures

Compiler Assisted Runtime Adaptation

Compiler Fuzzing through Deep Learning

Compiler optimizations for directive-based programming for accelerators

Compiler Optimizations for Industrial Unstructured Mesh CFD Applications on GPUs

Compiler Optimizations for SIMD/GPU/Multicore Architectures

Compiler support for general-purpose computation on GPUs

Compiler Support for High-level GPU Programming

Compiler Support for Speculation in Decoupled Access/Execute Architectures

Compiler Technologies in Deep Learning Co-Design: A Survey

Compiler-assisted distribution of OpenMP code for improved scalability

Compiler-Assisted Workload Consolidation For Efficient Dynamic Parallelism on GPU

Compiler-based Data Prefetching and Streaming Non-temporal Store Generation for the Intel Xeon Phi Coprocessor

Compiler-Based Tools to Aid in Data Transfer Optimization and On-Chip Debug of Heterogeneous Compute Systems

Compiler-centric across-stack deep learning acceleration

Titles: 100
open PDFs: 94
packages: 17
