Papers on hgpu.org (.txt-file)
CUDA: Scalable parallel programming for high-performance scientific computing
cudaBayesreg: Parallel Implementation of a Bayesian Multilevel Model for fMRI Data Analysis
CudaChain: A Practical GPU-accelerated 2D Convex Hull Algorithm
CUDACL: A tool for CUDA and OpenCL programmers
CUDACLAW: a Data Parallel Solution Framework for Hyperbolic PDEs
CUDACS: securing the cloud with CUDA-enabled secure virtualization
CudaDMA: Optimizing GPU Memory Bandwidth via Warp Specialization
CUDAEASY – a GPU Accelerated Cosmological Lattice Program
CudaGIS: Report on the Design and Realization of a Massive Data Parallel GIS on GPUs
Cudagrind: A Valgrind Extension for CUDA
CudaHull: Fast Parallel 3D Convex Hull on the GPU
CUDAICA: GPU optimization of Infomax-ICA EEG analysis
CUDAlign: using GPU to accelerate the comparison of megabase genomic sequences
cudaMap: a GPU accelerated program for gene expression connectivity mapping
CudaRF: A CUDA-based Implementation of Random Forests
CUDASA: Compute Unified Device and Systems Architecture
CUDASW++ 2.0: enhanced Smith-Waterman protein database search on CUDA-enabled GPUs based on SIMT and virtualized SIMD abstractions
CUDASW++ 3.0: accelerating Smith-Waterman protein database search by coupling CPU and GPU SIMD instructions
CUDASW++: optimizing Smith-Waterman sequence database searches for CUDA-enabled graphics processing units
cuDNN: Efficient Primitives for Deep Learning
CUDT: A CUDA Based Decision Tree Algorithm
Cue-independent extending inverse kinematics for robust pose estimation in 3D point clouds
CUED-RNNLM – An Open-Source Toolkit for Efficient Training and Evaluation of Recurrent Neural Network Language Models
cufftShift: High Performance CUDA-accelerated FFT-shift Library
cuFINUFFT: a load-balanced GPU library for general-purpose nonuniform FFTs
CUgrep: A GPU-based high performance multi-string matching system
cuGWAM: Genome-wide association multifactor dimensionality reduction using CUDA-enabled high-performance graphics processing unit
CuHMMer: A load-balanced CPU-GPU cooperative bioinformatics application
cuIBM — A GPU-accelerated Immersed Boundary Method
cuInspiral: prototype gravitational waves detection pipeline fully coded on GPU using CUDA
CUKNN: A parallel implementation of K-nearest neighbor on CUDA-enabled GPU
CULA: hybrid GPU accelerated linear algebra routines
CuLDA_CGS: Solving Large-scale LDA Problems on GPUs
cuLGT: Lattice Gauge Fixing on GPUs
CULLIDE: interactive collision detection between complex models in large environments using graphics hardware
CuMAPz: a tool to analyze memory access patterns in CUDA
CuMF_SGD: Fast and Scalable Matrix Factorization
CuMF: scale matrix factorization using just ONE machine with GPUs
CuNesl: Compiling Nested Data-Parallel Languages for SIMT Architectures
CuNeuQuant: A CUDA Implementation of the NeuQuant Image Quantization Algorithm
CuParcone: A High-Performance Evolvable Neural Network Model
CuPBoP-AMD: Extending CUDA to AMD Platforms
CuPBoP: CUDA for Parallelized and Broad-range Processors
CuPBoP: Making CUDA a Portable Language
cuPC: CUDA-based Parallel PC Algorithm for Causal Structure Learning on GPU
cuPentBatch – A batched pentadiagonal solver for NVIDIA GPUs
CuPP – A framework for easy CUDA integration
cuPSO: GPU Parallelization for Particle Swarm Optimization Algorithms
CURFIL: Random Forests for Image Labeling on GPU
Curling and clumping fur represented by texture layers
Curracurrong: a stream processing system for distributed environments
Current and Nascent SETI Instruments in the Radio and Optical
CUSA and CUDE: GPU-accelerated methods for estimating solvent accessible surface area and desolvation
cusFFT: A High-Performance Sparse Fast Fourier Transform Algorithm on GPUs
CUSHAW: a CUDA compatible short read aligner to large genomes based on the Burrows-Wheeler transform
CUSIMANN: An optimized simulated annealing software for GPUs
cuSLINK: Single-linkage Agglomerative Clustering on the GPU
cuSten – CUDA Finite Difference and Stencil Library
Custom Code Generation for a Graph DSL
Customizable Domain-Specific Computing
Customizable Memory Schemes for Data Parallel Accelerators
Customization of OpenCL Applications for Efficient Task Mapping under Heterogeneous Platform Constraints
Customizing Driving Directions with GPUs
Customizing Instruction Set Extensible Reconfigurable Processors using GPUs
cuSZ-I: High-Fidelity Error-Bounded Lossy Compression for Scientific Data on GPUs
cuSZ(x): Optimizing Error-Bounded Lossy Compression for Scientific Data on GPUs
CUTE solutions for two-point correlation functions from large cosmological datasets
cuTT: A High-Performance Tensor Transpose Library for CUDA Compatible GPUs
CUVLE: Variable-Length Encoding on CUDA
cuZK: Accelerating Zero-Knowledge Proof with A Faster Parallel Multi-Scalar Multiplication Algorithm on GPUs
CVC: The Contourlet Video Compression algorithm for real-time applications
CVPI: A Computer Vision Library For Mobile and Embedded Platforms
Cyclic Reduction Tridiagonal Solvers on GPUs Applied to Mixed-Precision Multigrid
Cytochrome P450 site of metabolism prediction from 2D topological fingerprints using GPU accelerated probabilistic classifiers
CytonRL: an Efficient Reinforcement Learning Open-source Toolkit Implemented in C++
D-face: Parallel Implementation of CNN Based Face Classifier using Drone Data On K40 & Jetson TK1
D5.5.2 – Architectural Techniques to exploit SLACK & ACCURACY trade-offs
D5.5.3 – Design and implementation of the SIMD-MIMD GPU architecture
D5.5.4 – Characterization of Redundancy and Definition of Work Reuse
Daino: A High-level Framework for Parallel and Efficient AMR on GPUs
Daisen: A Framework for Visualizing Detailed GPU Execution
DAMS: distributed adaptive metaheuristic selection
Dandelion: a Compiler and Runtime for Heterogeneous Systems
Dank Learning: Generating Memes Using Deep Neural Networks
Dark Sky Simulations: Early Data Release
Darknet on OpenCL: a multi-platform tool for object detection and classification
DarKnight: An Accelerated Framework for Privacy and Integrity Preserving Deep Learning Using Trusted Hardware
Data access optimized applications on the GPU using NVIDIA CUDA
Data Acquisition with GPUs: The DAQ for the Muon g-2 Experiment at Fermilab
Data analysis and 3D evolution in High Energy Physics using graphic processor
Data Analysis of Minimally-Structured Heterogeneous Logs: An experimental study of log template extraction and anomaly detection based on Recurrent Neural Network and Naive Bayes
Data Assimilation using a GPU Accelerated Path Integral Monte Carlo Approach
Data Buffering Optimization Methods toward a Uniform Programming Interface for GPU-based Applications
Data Coherence Analysis and Optimization for Heterogeneous Computing
Data Compression using CUDA programming in GPU
Data driven scheduling approach for the multi-node multi-GPU Cholesky decomposition
Data handling inefficiencies between CUDA, 3D rendering, and system memory
Titles: 100
open PDFs: 92
packages: 40