Papers on hgpu.org (.txt-file)
Can GPGPU Programming Be Liberated from the Data-Parallel Bottleneck?
Can GPUs Sort Strings Efficiently?
Can PCM Benefit GPU? Reconciling Hybrid Memory Design with GPU Massive Parallelism for Energy Efficiency
Can Portability Improve Performance? An Empirical Study of Parallel Graph Analytics
Can We Run in Parallel? Automating Loop Parallelization for TornadoVM
Canadian Hydrogen Intensity Mapping Experiment (CHIME) Pathfinder
Candidate set parallelization strategies for Ant Colony Optimization on the GPU
CANNA: Neural Network Acceleration using Configurable Approximation on GPGPU
Canny edge detection on NVIDIA CUDA
Capability Models for Manycore Memory Systems: A Case-Study with Xeon Phi KNL
CAPRI: Prediction of Compaction-Adequacy for Handling Control-Divergence in GPGPU Architectures
Capturing the Memory Topology of GPUs
Caracal: dynamic translation of runtime environments for GPUs
Caractéristiques arithmétiques des processeurs graphiques (Arithmetic characteristics of graphics processors)
CaravelaMPI: Message Passing Interface for Parallel GPU-Based Applications
Cardiac Dysrhythmia Detection with GPU-Accelerated Neural Networks
Cardiac simulation on multi-GPU platform
Cardiac tissue simulation using graphics hardware
Cartesian SENSE and k-t SENSE reconstruction using commodity graphics hardware
Cascaded Segmentation-Detection Networks for Word-Level Text Spotting
Case Studies in Acceleration of Heston’s Stochastic Volatility Financial Engineering Model: GPU, Cloud and FPGA Implementations
Case Study: GPU-based implementation of sequence pair based floorplanning using CUDA
Case study: Interactive rendering of adaptive mesh refinement data
Case study: Runtime reduction of a buffer insertion algorithm using GPU parallel programming
CASE: A Compiler-Assisted SchEduling Framework for Multi-GPU Systems
Caustics Mapping: An Image-Space Technique for Real-Time Caustics
CAVE-CL: An OpenCL version of the package for detection and quantitative analysis of internal cavities in a system of overlapping balls: application to proteins
CBench: Analyzing Compute Performance for Modern NVIDIA and AMD GPUs
CBESW: sequence alignment on the Playstation 3
CBinfer: Change-Based Inference for Convolutional Neural Networks on Video Data
CDFC: Collision Detection Based on Fuzzy Clustering for Deformable Objects on GPU’s
Celeris: A GPU-accelerated open source software with a Boussinesq-type wave solver for real-time, interactive simulation and visualization
CELES: CUDA-accelerated simulation of electromagnetic scattering by large ensembles of spheres
Cell Charge Approximation for Accelerating Molecular Simulation on CUDA-Enabled GPU
cellGPU: massively parallel simulations of dynamic vertex models
Cellular automaton for ultra-fast watershed transform on GPU
Cellular Genetic Algorithms and Local Search for 3-SAT problem on Graphic Hardware
Cellular GPU Models to Euclidean Optimization Problems
Cellular Level Agent Based Modelling on the Graphics Processing Unit
cf4ocl: a C framework for OpenCL
CFD code adaptation to the FPGA architecture
CFD Simulation of Jet Cooling and Implementation of Flow Solvers in GPU
CFD-based analysis and two-level aerodynamic optimization on Graphics Processing Units
CFMDS: CUDA-based fast multidimensional scaling for genome-scale data
CFU Playground: Full-Stack Open-Source Framework for Tiny Machine Learning (tinyML) Acceleration on FPGAs
Cg: a system for programming graphics hardware in a C-like language
CGiS, a new Language for Data-parallel GPU Programming
CGO: G: Intelligent Heuristic Construction with Active Learning
Chai: Collaborative Heterogeneous Applications for Integrated-architectures
ChainerMN: Scalable Distributed Deep Learning Framework
Challenge benchmarks that must be conquered to sustain the gpu revolution
Challenges Adapting CUDA PIC Codes to multiple GPUs
Challenges and Opportunities in C/C++ Source-To-Source Compilation
Challenges and opportunities of obtaining performance from multi-core CPUs and many-core GPUs
Challenges and Techniques for Transparent Acceleration of Unmodified Big Data Applications
Challenges for a GPU-Accelerated Dynamic Programming Approach for Join-Order Optimization
Challenges for compiler support for exascale computing
Challenges of mapping financial analytics to many-core architecture
Challenges of medical image processing
Challenging cloning related problems with GPU-based algorithms
Chameleon: Virtualizing idle acceleration cores of a heterogeneous multicore processor for caching and prefetching
ChamNet: Towards Efficient Network Design through Platform-Aware Model Adaptation
CHAOS: A Parallelization Scheme for Training Convolutional Neural Networks on Intel Xeon Phi
Character-level Transformer-based Neural Machine Translation
Characterizing and Detecting CUDA Program Bugs
Characterising Across-Stack Optimisations for Deep Convolutional Neural Networks
Characterising Bipartite Graph Matching Algorithms on GPUs
Characterization and Analysis of Dynamic Parallelism in Unstructured GPU Applications
Characterization and Exploitation of GPU Memory Systems
Characterization and Performance Analysis for 3D Benchmarks
Characterization and Transformation of Unstructured Control Flow in Bulk Synchronous GPU Applications
Characterization and Transformation of Unstructured Control Flow in GPU Applications
Characterization of FPGA-based High Performance Computers
Characterization of Lossy SIW Resonators Based on Multilayer Perceptron Neural Networks on Graphics Processing Unit
Characterization of OpenCL on a Scalable FPGA Architecture
Characterization of Speech Recognition Systems on GPU Architectures
Characterizing and Enhancing Global Memory Data Coalescing on GPUs
Characterizing and Evaluating a Key-value Store Application on Heterogeneous CPU-GPU Systems
Characterizing and Improving the Use of Demand-Fetched Caches in GPUs
Characterizing and Optimizing Irregular Applications on Graphics Processing Units
Characterizing and Predicting Scientific Workloads for Heterogeneous Computing Systems
Characterizing Dataset Dependence for Sparse Matrix-Vector Multiplication on GPUs
Characterizing Deep Learning Training Workloads on Alibaba-PAI
Characterizing Optimizations to Memory Access Patterns using Architecture-Independent Program Features
Characterizing the Challenges and Evaluating the Efficacy of a CUDA-to-OpenCL Translator
Charged particles constrained to a curved surface
CHARM-SYCL: New Unified Programming Environment for Multiple Accelerator Types
Chebyshev Filter Diagonalization on Modern Manycore Processors and GPGPUs
CheCL: Transparent Checkpointing and Process Migration of OpenCL Applications
CheCUDA: A Checkpoint/Restart Tool for CUDA Applications
Chest CT automatic analysis for lung nodules detection implemented on a GPU computing system
Chestnut: A GPU Programming Language for Non-Experts
CHO: A Benchmark Suite for OpenCL-based FPGA Accelerators
CHO: Towards a Benchmark Suite for OpenCL FPGA Accelerators
Cholla: A New Massively-Parallel Hydrodynamics Code For Astrophysical Simulation
Titles: 100
Open PDFs: 89
Packages: 26