Papers on hgpu.org
Brownian dynamics simulations on CPU and GPU with BD_BOX
Browsing a Large Collection of Community Photos Based on Similarity on GPU
Browsing Large Image Datasets through Voronoi Diagrams
Brute force de-shredding algorithm using the GPU
Brute-Force k-Nearest Neighbors Search on the GPU
BSGP: bulk-synchronous GPU programming
Buffer k-d Trees: Processing Massive Nearest Neighbor Queries on GPUs
Buffer overflow vulnerabilities in CUDA: a preliminary analysis
Bufferless NOC Simulation of Large Multicore System on GPU Hardware
Build and Travel KD-Tree with CUDA
Building a Performance Model for Deep Learning Recommendation Model Training on GPUs
Building a Personal High Performance Computer with Heterogeneous Processors
Building a Real-Time Multi-GPU Platform: Robust Real-Time Interrupt Handling Despite Closed-Source Drivers
Building Correlators with Many-Core Hardware
Building Human Brain Network in 3D Coefficient Map Determined by X-ray Microtomography
Building Multiclass Nonlinear Classifiers with GPUs
Building Source-to-Source Compilers for Heterogeneous Targets
Building-Blocks for Performance Oriented DSLs
Bulk Execution of Oblivious Algorithms on the Unified Memory Machine, with GPU Implementation
Bulk GCD Computation Using a GPU to Break Weak RSA Keys
Bump Mapping Unparametrized Surfaces on the GPU
Bundled depth-map merging for multi-view stereo
Burrows-Wheeler Aligner: A Parallel Approach
BVH for efficient raytracing of dynamic metaballs on GPU
C and CUDA Implementation for SIRT and SART Reconstruction Algorithms
C Language Extensions for Hybrid CPU/GPU Programming with StarPU
C to Cellular Automata and Execution on CPU, GPU and FPGA
C-DAC’s Efforts – Application Kernels on HPC Cluster with GPU Accelerators
C-for-Metal: High Performance SIMD Programming on Intel GPUs
C++ AMP: Accelerated Massive Parallelism with Microsoft Visual C++
Cache and bandwidth aware matrix multiplication on the GPU
Cache Miss Analysis for GPU Programs Based on Stack Distance Profile
Cache-efficient numerical algorithms using graphics hardware
CADDIES: A New Framework for Rapid Development of Parallel Cellular Automata Algorithms for Flood Simulation
Caffe con Troll: Shallow Ideas to Speed Up Deep Learning
Caffe: Convolutional Architecture for Fast Feature Embedding
Caffeinated FPGAs: FPGA Framework For Convolutional Neural Networks
Caffeine: Towards Uniformed Representation and Acceleration for Deep Convolutional Neural Networks
CaffeLink: Mathematica binding for Caffe Deep Learning Framework
CaffePresso: An Optimized Library for Deep Learning on Embedded Accelerator-based platforms
Calamari – A High-Performance Tensorflow-based Deep Learning Package for Optical Character Recognition
Calculation by artificial compressibility method and virtual flux method on GPU
Calculation of fermion loops for eta-prime and nucleon scalar and electromagnetic form factors
Calculation of Force Field Grids for Molecular Docking Using Graphics Processing Unit
Calculation of HELAS amplitudes for QCD processes using graphics processing unit (GPU)
Calculation of Stochastic Heating and Emissivity of Cosmic Dust Grains with Optimization for the Intel Many Integrated Core Architecture
Calculation of weight vectors for wideband beamforming using Graphics Processing Units
CAMPAIGN: An open-source Library of GPU-accelerated Data Clustering Algorithms
Can CUDA be exposed through web services?
Can GPGPU Programming Be Liberated from the Data-Parallel Bottleneck?
Can GPUs Sort Strings Efficiently?
Can Large Language Models Predict Parallel Code Performance?
Can PCM Benefit GPU? Reconciling Hybrid Memory Design with GPU Massive Parallelism for Energy Efficiency
Can Portability Improve Performance? An Empirical Study of Parallel Graph Analytics
Can Tensor Cores Benefit Memory-Bound Kernels? (No!)
Can We Run in Parallel? Automating Loop Parallelization for TornadoVM
Canadian Hydrogen Intensity Mapping Experiment (CHIME) Pathfinder
Candidate set parallelization strategies for Ant Colony Optimization on the GPU
CANNA: Neural Network Acceleration using Configurable Approximation on GPGPU
Canny edge detection on NVIDIA CUDA
Capability Models for Manycore Memory Systems: A Case-Study with Xeon Phi KNL
CAPRI: Prediction of Compaction-Adequacy for Handling Control-Divergence in GPGPU Architectures
Capturing the Memory Topology of GPUs
Caracal: dynamic translation of runtime environments for GPUs
Caractéristiques arithmétiques des processeurs graphiques
CaravelaMPI: Message Passing Interface for Parallel GPU-Based Applications
Cardiac Dysrhythmia Detection with GPU-Accelerated Neural Networks
Cardiac simulation on multi-GPU platform
Cardiac tissue simulation using graphics hardware
Cartesian SENSE and k-t SENSE reconstruction using commodity graphics hardware
Cascaded Segmentation-Detection Networks for Word-Level Text Spotting
Case Studies in Acceleration of Heston’s Stochastic Volatility Financial Engineering Model: GPU, Cloud and FPGA Implementations
Case Study: GPU-based implementation of sequence pair based floorplanning using CUDA
Case study: Interactive rendering of adaptive mesh refinement data
Case study: Runtime reduction of a buffer insertion algorithm using GPU parallel programming
CASE: A Compiler-Assisted SchEduling Framework for Multi-GPU Systems
CASS: Nvidia to AMD Transpilation with Data, Models, and Benchmark
CATBench: A Compiler Autotuning Benchmarking Suite for Black-box Optimization
Caustics Mapping: An Image-Space Technique for Real-Time Caustics
CAVE-CL: An OpenCL version of the package for detection and quantitative analysis of internal cavities in a system of overlapping balls: application to proteins
CBench: Analyzing Compute Performance for Modern NVIDIA and AMD GPUs
CBESW: sequence alignment on the Playstation 3
CBinfer: Change-Based Inference for Convolutional Neural Networks on Video Data
CDFC: Collision Detection Based on Fuzzy Clustering for Deformable Objects on GPU’s
Celeris: A GPU-accelerated open source software with a Boussinesq-type wave solver for real-time, interactive simulation and visualization
CELES: CUDA-accelerated simulation of electromagnetic scattering by large ensembles of spheres
Cell Charge Approximation for Accelerating Molecular Simulation on CUDA-Enabled GPU
cellGPU: massively parallel simulations of dynamic vertex models
Cellular automaton for ultra-fast watershed transform on GPU
Cellular Genetic Algorithms and Local Search for 3-SAT problem on Graphic Hardware
Cellular GPU Models to Euclidean Optimization Problems
Cellular Level Agent Based Modelling on the Graphics Processing Unit
cf4ocl: a C framework for OpenCL
CFD code adaptation to the FPGA architecture
CFD Simulation of Jet Cooling and Implementation of Flow Solvers in GPU
CFD-based analysis and two-level aerodynamic optimization on Graphics Processing Units
Titles: 100
open PDFs: 86
packages: 30