Papers on hgpu.org (.txt-file)
CuMAPz: a tool to analyze memory access patterns in CUDA

CuMF_SGD: Fast and Scalable Matrix Factorization

CuMF: scale matrix factorization using just ONE machine with GPUs

CuNesl: Compiling Nested Data-Parallel Languages for SIMT Architectures

CuNeuQuant: A CUDA Implementation of the NeuQuant Image Quantization Algorithm

CuParcone A High-Performance Evolvable Neural Network Model

CuPBoP-AMD: Extending CUDA to AMD Platforms

CuPBoP: CUDA for Parallelized and Broad-range Processors

CuPBoP: Making CUDA a Portable Language

cuPC: CUDA-based Parallel PC Algorithm for Causal Structure Learning on GPU

cuPentBatch – A batched pentadiagonal solver for NVIDIA GPUs

CuPP – A framework for easy CUDA integration

cuPSO: GPU Parallelization for Particle Swarm Optimization Algorithms

CURFIL: Random Forests for Image Labeling on GPU

Curling and clumping fur represented by texture layers

Curracurrong: a stream processing system for distributed environments

Current and Nascent SETI Instruments in the Radio and Optical

CUSA and CUDE: GPU-accelerated methods for estimating solvent accessible surface area and desolvation

cusFFT: A High-Performance Sparse Fast Fourier Transform Algorithm on GPUs

CUSHAW: a CUDA compatible short read aligner to large genomes based on the Burrows-Wheeler transform

CUSIMANN: An optimized simulated annealing software for GPUs

cuSLINK: Single-linkage Agglomerative Clustering on the GPU

cuSten – CUDA Finite Difference and Stencil Library

Custom Code Generation for a Graph DSL

Customizable Domain-Specific Computing

Customizable Memory Schemes for Data Parallel Accelerators

Customization of OpenCL Applications for Efficient Task Mapping under Heterogeneous Platform Constraints

Customizing Driving Directions with GPUs

Customizing Instruction Set Extensible Reconfigurable Processors using GPUs

cuSZ-I: High-Fidelity Error-Bounded Lossy Compression for Scientific Data on GPUs

cuSZ(x): Optimizing Error-Bounded Lossy Compression for Scientific Data on GPUs

cuSZp2: A GPU Lossy Compressor with Extreme Throughput and Optimized Compression Ratio

CUTE solutions for two-point correlation functions from large cosmological datasets

cuTT: A High-Performance Tensor Transpose Library for CUDA Compatible GPUs

CUVLE: Variable-Length Encoding on CUDA

cuZK: Accelerating Zero-Knowledge Proof with A Faster Parallel Multi-Scalar Multiplication Algorithm on GPUs

CVC: The Contourlet Video Compression algorithm for real-time applications

CVPI: A Computer Vision Library For Mobile and Embedded Platforms

Cyclic Reduction Tridiagonal Solvers on GPUs Applied to Mixed-Precision Multigrid

Cytochrome P450 site of metabolism prediction from 2D topological fingerprints using GPU accelerated probabilistic classifiers

CytonRL: an Efficient Reinforcement Learning Open-source Toolkit Implemented in C++

D-face: Parallel Implementation of CNN Based Face Classifier using Drone Data On K40 & Jetson TK1

D5.5.2 – Architectural Techniques to exploit SLACK & ACCURACY trade-offs

D5.5.3 – Design and implementation of the SIMD-MIMD GPU architecture

D5.5.4 – Characterization of Redundancy and Definition of Work Reuse

Daino: A High-level Framework for Parallel and Efficient AMR on GPUs

Daisen: A Framework for Visualizing Detailed GPU Execution

DAMS: distributed adaptive metaheuristic selection

Dandelion: a Compiler and Runtime for Heterogeneous Systems

Dank Learning: Generating Memes Using Deep Neural Networks

Dark Sky Simulations: Early Data Release

Darknet on OpenCL: a multi-platform tool for object detection and classification

DarKnight: An Accelerated Framework for Privacy and Integrity Preserving Deep Learning Using Trusted Hardware

Data access optimized applications on the GPU using NVIDIA CUDA

Data Acquisition with GPUs: The DAQ for the Muon g-2 Experiment at Fermilab

Data analysis and 3D evolution in High Energy Physics using graphic processor

Data Analysis of Minimally-Structured Heterogeneous Logs: An experimental study of log template extraction and anomaly detection based on Recurrent Neural Network and Naive Bayes

Data Assimilation using a GPU Accelerated Path Integral Monte Carlo Approach

Data Buffering Optimization Methods toward a Uniform Programming Interface for GPU-based Applications

Data Coherence Analysis and Optimization for Heterogeneous Computing

Data Compression using CUDA programming in GPU

Data driven scheduling approach for the multi-node multi-GPU Cholesky decomposition

Data handling inefficiencies between CUDA, 3D rendering, and system memory

Data Layout Optimization for Multi-Valued Containers in OpenCL

Data Layout Oriented Compilation Techniques in Vectorization for Multi-/Many-cores

Data Layout Transformation Exploiting Memory-Level Parallelism in Structured Grid Many-Core Applications

Data Layout Transformation for Structured-Grid Codes on GPU

Data Mining and Machine Learning in Astronomy

Data Mining Techniques in Parallel and Distributed Environment – A Comprehensive Survey

Data Mining Using Graphics Processing Units

Data Movement Optimization for High-Performance Computing

Data parallel acceleration of decision support queries using Cell/BE and GPUs

Data Parallel C++: Mastering DPC++ for Programming of Heterogeneous Systems using C++ and SYCL

Data parallel execution challenges and runtime performance of agent simulations on GPUs

Data parallel loop statement extension to CUDA: GpuC

Data parallel patterns on CPU/GPU mix

Data Parallel Quadtree Indexing and Spatial Query Processing of Complex Polygon Data on GPUs

Data Parallel Three-Dimensional Cahn-Hilliard Field Equation Simulation on GPUs with CUDA

Data Parallel Visualization and Rendering on the RAMSES Supercomputer with ANARI

Data Parallelism Exploiting for H.264 Encoder

Data Partitioning on Heterogeneous Multicore and Multi-GPU Systems Using Functional Performance Models of Data-Parallel Applications

Data registration module – a component of semantic simulation engine

Data Regression with Normal Equation on GPU using CUDA

Data Remanence and Digital Forensic Investigation for CUDA Graphics Processing Units

Data Sorting Using Graphics Processing Units

Data Stream Classification using Random Feature Functions and Novel Method Combinations

Data structure design for GPU based heterogeneous systems

Data Structures and Algorithms for Counting Problems on Graphs using GPU

Data Structures and Transformations for Physically Based Simulation on a GPU

Data Structures for Task-based Priority Scheduling

Data Transfer Matters for GPU Computing

Data transfer optimizations for heterogeneous managed runtime systems

Data transformations enabling loop vectorization on multithreaded data parallel architectures

Data Triage and Visual Analytics for Scientific Visualization

Data Visualization and Mining using the GPU

Data-aware scheduling of legacy kernels on heterogeneous platforms with distributed memory

Data-Aware Task Scheduling on Multi-accelerator Based Platforms

Titles: 100
open PDFs: 94
packages: 28
