Papers on hgpu.org (.txt-file)
Character-level Transformer-based Neural Machine Translation
Characterizing and Detecting CUDA Program Bugs
Characterising Across-Stack Optimisations for Deep Convolutional Neural Networks
Characterising Bipartite Graph Matching Algorithms on GPUs
Characterization and Analysis of Dynamic Parallelism in Unstructured GPU Applications
Characterization and Exploitation of GPU Memory Systems
Characterization and Performance Analysis for 3D Benchmarks
Characterization and Transformation of Unstructured Control Flow in Bulk Synchronous GPU Applications
Characterization and Transformation of Unstructured Control Flow in GPU Applications
Characterization of FPGA-based High Performance Computers
Characterization of Lossy SIW Resonators Based on Multilayer Perceptron Neural Networks on Graphics Processing Unit
Characterization of OpenCL on a Scalable FPGA Architecture
Characterization of Speech Recognition Systems on GPU Architectures
Characterizing and Enhancing Global Memory Data Coalescing on GPUs
Characterizing and Evaluating a Key-value Store Application on Heterogeneous CPU-GPU Systems
Characterizing and Improving the Use of Demand-Fetched Caches in GPUs
Characterizing and Optimizing Irregular Applications on Graphics Processing Units
Characterizing and Predicting Scientific Workloads for Heterogeneous Computing Systems
Characterizing CUDA and OpenMP Synchronization Primitives
Characterizing Dataset Dependence for Sparse Matrix-Vector Multiplication on GPUs
Characterizing Deep Learning Training Workloads on Alibaba-PAI
Characterizing Optimizations to Memory Access Patterns using Architecture-Independent Program Features
Characterizing the Challenges and Evaluating the Efficacy of a CUDA-to-OpenCL Translator
Charged particles constrained to a curved surface
CHARM-SYCL: New Unified Programming Environment for Multiple Accelerator Types
Chat AI: A Seamless Slurm-Native Solution for HPC-Based Services
Chebyshev Filter Diagonalization on Modern Manycore Processors and GPGPUs
CheCL: Transparent Checkpointing and Process Migration of OpenCL Applications
CheCUDA: A Checkpoint/Restart Tool for CUDA Applications
Chest CT automatic analysis for lung nodules detection implemented on a GPU computing system
Chestnut: A GPU Programming Language for Non-Experts
CHO: A Benchmark Suite for OpenCL-based FPGA Accelerators
CHO: Towards a Benchmark Suite for OpenCL FPGA Accelerators
Cholla: A New Massively-Parallel Hydrodynamics Code For Astrophysical Simulation
CHPS: An Environment for Collaborative Execution on Heterogeneous Desktop Systems
Chrono: a parallel multi-physics library for rigid-body, flexible-body, and fluid dynamics
Chunkflow: Distributed Hybrid Cloud Processing of Large 3D Images by Convolutional Nets
CI/CD Efforts for Validation, Verification and Benchmarking OpenMP Implementations
Cinematic Particle Systems with OpenCL
Circular Hough Transform in OpenCL
CitiusSynapse: A Deep Learning Framework for Embedded Systems
CL-VIS: Visualization Platform for Understanding and Checking the OpenCL Programs
CL2QCD – Lattice QCD based on OpenCL
Clacc: Translating OpenACC to OpenMP in Clang
Classical Mechanical Hard-Core Particles Simulated in a Rigid Enclosure using Multi-GPU Systems
Classical Simulation of Quantum Adiabatic Algorithms using Mathematica on GPUs
Classification-based Financial Markets Prediction using Deep Neural Networks
Classification of Higgs Boson Tau-Tau decays using GPU accelerated Neural Networks
Classification Performance of Convolutional Neural Networks
Classify QCD phase transition with deep learning
ClawHMMER: A Streaming HMMer-Search Implementation
CLBlast: A Tuned OpenCL BLAS Library
ClearPath: highly parallel collision avoidance for multi-agent simulation
ClearView: An Interactive Context Preserving Hotspot Visualization Technique
CLgrep: A Parallel String Matching Tool
Climbing Mont Blanc – A Training Site for Energy Efficient Programming on Heterogeneous Multicore Processors
Clinically applicable Monte Carlo-based biological dose optimization for the treatment of head and neck cancers with spot-scanning proton therapy
clMAGMA: High Performance Dense Linear Algebra with OpenCL
clMF: A fine-grained and portable alternating least squares algorithm for parallel matrix factorization
Clock Math – A System for Solving SLEs Exactly
CLOP: A Multi-stage Compiler to Seamlessly Embed Heterogeneous Code
clOpenCL – Supporting Distributed Heterogeneous Computing in HPC Clusters
CLort: High Throughput and Low Energy Network Intrusion Detection on IoT Devices with Embedded GPUs
Closing the Ninja Performance Gap through Traditional Programming and Compiler Technology
Cloth Simulation Using AABB Hierarchies and GPU Parallelism
CloudCL: Single-Paradigm Distributed Heterogeneous Computing for Cloud Infrastructures
clpeak – peak performance of your opencl device
clRNG: A Random Number API with Multiple Streams for OpenCL
clSPARSE: A Vendor-Optimized Open-Source Sparse BLAS Library
clSpMV: A Cross-Platform OpenCL SpMV Framework on GPUs
CLTestCheck: Measuring Test Effectiveness for GPU Kernels
cltorch: a Hardware-Agnostic Backend for the Torch Deep Neural Network Library, Based on OpenCL
CLTune: A Generic Auto-Tuner for OpenCL Kernels
ClusCo: clustering and comparison of protein models
Cluster and Fast-Update Simulations of Regular and Rewired Lattice Ising Models Using CUDA and Graphical Processing Units
Cluster versus GPU implementation of an Orthogonal Target Detection Algorithm for Remotely Sensed Hyperspectral Images
Cluster-Level Tuning of a Shallow Water Equation Solver on the Intel MIC Architecture
Cluster-SkePU: A Multi-Backend Skeleton Programming Library for GPU Clusters
Clustering Based Search Algorithm For Motion Estimation
Clustering billions of data points using GPUs
Clustering coefficient queries on massive dynamic social networks
Clustering on GPU – A Brief Survey
Clustering Throughput Optimization on the GPU
ClusterWatch: Flexible, Lightweight Monitoring for High-end GPGPU Clusters
CMA-ES for Hyperparameter Optimization of Deep Neural Networks
CMCpy: Genetic Code-Message Coevolution Models in Python
CMLCompiler: A Unified Compiler for Classical Machine Learning
CnC-CUDA: declarative programming for GPUs
CNN2Gate: An Implementation of Convolutional Neural Networks Inference on FPGAs with Automated Design Space Exploration
CNNLab: a Novel Parallel Framework for Neural Networks using GPU and FPGA-a Practical Study with Trade-off Analysis
Co-design of a particle-in-cell plasma simulation code for Intel Xeon Phi: a first look at Knights Landing
Co-processing SPMD Computation on GPUs and CPUs on Shared Memory System
Co-processor acceleration of an unmodified parallel solid mechanics code with FEASTGPU
Co-tuning of Software Specializers and Hardware Accelerators within a CNN Application
Coalition Structure Generation with the Graphic Processor Unit
Coalition Structure Generation with the Graphics Processing Unit
Titles: 100
open PDFs: 95
packages: 36