Papers on hgpu.org (.txt-file)
CHPS: An Environment for Collaborative Execution on Heterogeneous Desktop Systems
Chrono: a parallel multi-physics library for rigid-body, flexible-body, and fluid dynamics
Chunkflow: Distributed Hybrid Cloud Processing of Large 3D Images by Convolutional Nets
Cinematic Particle Systems with OpenCL
Circular Hough Transform in OpenCL
CitiusSynapse: A Deep Learning Framework for Embedded Systems
CL-VIS: Visualization Platform for Understanding and Checking the OpenCL Programs
CL2QCD – Lattice QCD based on OpenCL
Clacc: Translating OpenACC to OpenMP in Clang
Classical Mechanical Hard-Core Particles Simulated in a Rigid Enclosure using Multi-GPU Systems
Classical Simulation of Quantum Adiabatic Algorithms using Mathematica on GPUs
Classification-based Financial Markets Prediction using Deep Neural Networks
Classification of Higgs Boson Tau-Tau decays using GPU accelerated Neural Networks
Classification Performance of Convolutional Neural Networks
Classify QCD phase transition with deep learning
ClawHMMER: A Streaming HMMer-Search Implementation
CLBlast: A Tuned OpenCL BLAS Library
ClearPath: highly parallel collision avoidance for multi-agent simulation
ClearView: An Interactive Context Preserving Hotspot Visualization Technique
CLgrep: A Parallel String Matching Tool
Climbing Mont Blanc – A Training Site for Energy Efficient Programming on Heterogeneous Multicore Processors
Clinically applicable Monte Carlo-based biological dose optimization for the treatment of head and neck cancers with spot-scanning proton therapy
clMAGMA: High Performance Dense Linear Algebra with OpenCL
clMF: A fine-grained and portable alternating least squares algorithm for parallel matrix factorization
Clock Math – A System for Solving SLEs Exactly
CLOP: A Multi-stage Compiler to Seamlessly Embed Heterogeneous Code
clOpenCL – Supporting Distributed Heterogeneous Computing in HPC Clusters
CLort: High Throughput and Low Energy Network Intrusion Detection on IoT Devices with Embedded GPUs
Closing the Ninja Performance Gap through Traditional Programming and Compiler Technology
Cloth Simulation Using AABB Hierarchies and GPU Parallelism
CloudCL: Single-Paradigm Distributed Heterogeneous Computing for Cloud Infrastructures
clpeak – peak performance of your opencl device
clRNG: A Random Number API with Multiple Streams for OpenCL
clSPARSE: A Vendor-Optimized Open-Source Sparse BLAS Library
clSpMV: A Cross-Platform OpenCL SpMV Framework on GPUs
CLTestCheck: Measuring Test Effectiveness for GPU Kernels
cltorch: a Hardware-Agnostic Backend for the Torch Deep Neural Network Library, Based on OpenCL
CLTune: A Generic Auto-Tuner for OpenCL Kernels
ClusCo: clustering and comparison of protein models
Cluster and Fast-Update Simulations of Regular and Rewired Lattice Ising Models Using CUDA and Graphical Processing Units
Cluster versus GPU implementation of an Orthogonal Target Detection Algorithm for Remotely Sensed Hyperspectral Images
Cluster-Level Tuning of a Shallow Water Equation Solver on the Intel MIC Architecture
Cluster-SkePU: A Multi-Backend Skeleton Programming Library for GPU Clusters
Clustering Based Search Algorithm For Motion Estimation
Clustering billions of data points using GPUs
Clustering coefficient queries on massive dynamic social networks
Clustering on GPU – A Brief Survey
Clustering Throughput Optimization on the GPU
ClusterWatch: Flexible, Lightweight Monitoring for High-end GPGPU Clusters
CMA-ES for Hyperparameter Optimization of Deep Neural Networks
CMCpy: Genetic Code-Message Coevolution Models in Python
CMLCompiler: A Unified Compiler for Classical Machine Learning
CnC-CUDA: declarative programming for GPUs
CNN2Gate: An Implementation of Convolutional Neural Networks Inference on FPGAs with Automated Design Space Exploration
CNNLab: a Novel Parallel Framework for Neural Networks using GPU and FPGA-a Practical Study with Trade-off Analysis
Co-design of a particle-in-cell plasma simulation code for Intel Xeon Phi: a first look at Knights Landing
Co-processing SPMD Computation on GPUs and CPUs on Shared Memory System
Co-processor acceleration of an unmodified parallel solid mechanics code with FEASTGPU
Co-tuning of Software Specializers and Hardware Accelerators within a CNN Application
Coalition Structure Generation with the Graphic Processor Unit
Coalition Structure Generation with the Graphics Processing Unit
Coarse grain parallelization of evolutionary algorithms on GPGPU cards with EASEA
Coating Process Monitoring Using Computer Vision
CoCoNet: Co-Optimizing Computation and Communication for Distributed Machine Learning
Code Generation Compiler for the OpenMP 4.0 Accelerator Model onto OMPSS
Code Generation for a Variety of Accelerators for a Graph DSL
Code Generation for Embedded Heterogeneous Architectures on Android
Code Generation for High-Level Synthesis of Multiresolution Applications on FPGAs
Code Generation from Functional to Imperative: Combining Destination-Passing Style and Views
Code Optimization and Performance Analysis of Oceanographic Software Package NEMO for GPGPU Systems
Code Optimization and Scaling of the Astrophysics Software Gadget on Intel Xeon Phi
Code optimization based on source to source transformations using profile guided metrics
Code Optimization on Kepler GPUs and Xeon Phi
Code Optimization Techniques for Graphics Processing Units
Code Refinement of Stencil Codes
Coding Ants: Using Ant Colony Optimization to Accelerate CT Reconstruction
CoDL: Efficient CPU-GPU Co-execution for Deep Learning Inference on Mobile Devices
Cofactorization on Graphics Processing Units
COFFEE: an Optimizing Compiler for Finite Element Local Assembly
Cognitive radio network for the smart grid: Experimental system architecture, control algorithms, security, and microgrid testbed
Coherence aware GPU-based ray casting for virtual colonoscopy
Coherent Photon Mapping on the Intel MIC Architecture
Coherent Spatiotemporal Filtering, Upsampling and Rendering of RGBZ Videos
Coherent transport by adiabatic passage on atom chips
Collaborative design and optimization using Collective Knowledge
Collaborative Diffusion on the GPU for Path-Finding in Games
Collaborative diffusion: programming antiobjects
Collaborative execution environment for heterogeneous parallel systems
Collage: Automated Integration of Deep Learning Backends
Collision Detection Based on Fuzzy Scene Subdivision
Collision Detection of Triangle Meshes using GPU
Collision detection on the GPU
Collision Detection: Broad Phase Adaptation from Multi-Core to Multi-GPU Architecture
Collision for 75-step SHA-1: Intensive Parallelization with GPU
Titles: 100
open PDFs: 93
packages: 31