Papers on hgpu.org (.txt-file)
Climbing Mont Blanc – A Training Site for Energy Efficient Programming on Heterogeneous Multicore Processors

Clinically applicable Monte Carlo-based biological dose optimization for the treatment of head and neck cancers with spot-scanning proton therapy

clMAGMA: High Performance Dense Linear Algebra with OpenCL

clMF: A fine-grained and portable alternating least squares algorithm for parallel matrix factorization

Clock Math – A System for Solving SLEs Exactly

CLOP: A Multi-stage Compiler to Seamlessly Embed Heterogeneous Code

clOpenCL – Supporting Distributed Heterogeneous Computing in HPC Clusters

CLort: High Throughput and Low Energy Network Intrusion Detection on IoT Devices with Embedded GPUs

Closing the Ninja Performance Gap through Traditional Programming and Compiler Technology

Cloth Simulation Using AABB Hierarchies and GPU Parallelism

CloudCL: Single-Paradigm Distributed Heterogeneous Computing for Cloud Infrastructures

clpeak – peak performance of your opencl device

clRNG: A Random Number API with Multiple Streams for OpenCL

clSPARSE: A Vendor-Optimized Open-Source Sparse BLAS Library

clSpMV: A Cross-Platform OpenCL SpMV Framework on GPUs

CLTestCheck: Measuring Test Effectiveness for GPU Kernels

cltorch: a Hardware-Agnostic Backend for the Torch Deep Neural Network Library, Based on OpenCL

CLTune: A Generic Auto-Tuner for OpenCL Kernels

CLUEstering: a high-performance density-based clustering library for scientific computing

ClusCo: clustering and comparison of protein models

Cluster and Fast-Update Simulations of Regular and Rewired Lattice Ising Models Using CUDA and Graphical Processing Units

Cluster versus GPU implementation of an Orthogonal Target Detection Algorithm for Remotely Sensed Hyperspectral Images

Cluster-Level Tuning of a Shallow Water Equation Solver on the Intel MIC Architecture

Cluster-SkePU: A Multi-Backend Skeleton Programming Library for GPU Clusters

Clustering Based Search Algorithm For Motion Estimation

Clustering billions of data points using GPUs

Clustering coefficient queries on massive dynamic social networks

Clustering on GPU – A Brief Survey

Clustering Throughput Optimization on the GPU

ClusterWatch: Flexible, Lightweight Monitoring for High-end GPGPU Clusters

CMA-ES for Hyperparameter Optimization of Deep Neural Networks

CMCpy: Genetic Code-Message Coevolution Models in Python

CMLCompiler: A Unified Compiler for Classical Machine Learning

CnC-CUDA: declarative programming for GPUs

CNN2Gate: An Implementation of Convolutional Neural Networks Inference on FPGAs with Automated Design Space Exploration

CNNLab: a Novel Parallel Framework for Neural Networks using GPU and FPGA-a Practical Study with Trade-off Analysis

Co-design of a particle-in-cell plasma simulation code for Intel Xeon Phi: a first look at Knights Landing

Co-processing SPMD Computation on GPUs and CPUs on Shared Memory System

Co-processor acceleration of an unmodified parallel solid mechanics code with FEASTGPU

Co-tuning of Software Specializers and Hardware Accelerators within a CNN Application

Coalition Structure Generation with the Graphic Processor Unit

Coalition Structure Generation with the Graphics Processing Unit

Coarse grain parallelization of evolutionary algorithms on GPGPU cards with EASEA

Coating Process Monitoring Using Computer Vision

CoCoNet: Co-Optimizing Computation and Communication for Distributed Machine Learning

Code Generation Compiler for the OpenMP 4.0 Accelerator Model onto OMPSS

Code Generation for a Variety of Accelerators for a Graph DSL

Code Generation for Cryptographic Kernels using Multi-word Modular Arithmetic on GPU

Code Generation for Embedded Heterogeneous Architectures on Android

Code Generation for High-Level Synthesis of Multiresolution Applications on FPGAs

Code Generation from Functional to Imperative: Combining Destination-Passing Style and Views

Code Optimization and Performance Analysis of Oceanographic Software Package NEMO for GPGPU Systems

Code Optimization and Scaling of the Astrophysics Software Gadget on Intel Xeon Phi

Code optimization based on source to source transformations using profile guided metrics

Code Optimization on Kepler GPUs and Xeon Phi

Code Optimization Techniques for Graphics Processing Units

Code Refinement of Stencil Codes

Coding Ants: Using Ant Colony Optimization to Accelerate CT Reconstruction

CoDL: Efficient CPU-GPU Co-execution for Deep Learning Inference on Mobile Devices

Cofactorization on Graphics Processing Units

COFFEE: an Optimizing Compiler for Finite Element Local Assembly

Cognitive radio network for the smart grid: Experimental system architecture, control algorithms, security, and microgrid testbed

Coherence aware GPU-based ray casting for virtual colonoscopy

Coherent Photon Mapping on the Intel MIC Architecture

Coherent Spatiotemporal Filtering, Upsampling and Rendering of RGBZ Videos

Coherent transport by adiabatic passage on atom chips

Collaborative design and optimization using Collective Knowledge

Collaborative Diffusion on the GPU for Path-Finding in Games

Collaborative diffusion: programming antiobjects

Collaborative execution environment for heterogeneous parallel systems

Collage: Automated Integration of Deep Learning Backends

Collection skeletons: declarative abstractions for data collections

Collective Communication for 100k+ GPUs

Collision Detection Based on Fuzzy Scene Subdivision

Collision Detection of Triangle Meshes using GPU

Collision detection on the GPU

Collision Detection: Broad Phase Adaptation from Multi-Core to Multi-GPU Architecture

Collision for 75-step SHA-1: Intensive Parallelization with GPU

Collision-Driven Volumetric Deformation on the GPU

Collision-streams: fast GPU-based collision detection for deformable models

Color and motion-based particle filter target tracking in a network of overlapping cameras with multi-threading and GPGPU

Color Correction Acceleration Using a Color Cube and OpenCL

Color Me Noisy: Example-based Rendering of Hand-colored Animations with Temporal Noise Control

Color Seamlessness in Multi-Projector Displays Using Constrained Gamut Morphing

Colored stochastic shadow maps

Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training

Colour flux-tubes in static Pentaquark and Tetraquark systems

Column-Oriented Datalog on the GPU

Combinatorial Optimization of Work Distribution on Heterogeneous Systems

Combined acoustic and optical trapping

Combined Spatial and Temporal Blocking for High-Performance Stencil Computation on FPGAs Using OpenCL

Combining approximate inference methods for efficient learning on large computer clusters

Combining Belief Propagation and Successive Cancellation List Decoding of Polar Codes on a GPU Platform

Combining computer vision and physics simulations using GPGPU

Titles: 100
open PDFs: 93
packages: 27
