Papers on hgpu.org (.txt-file)
BioEM: GPU-accelerated computing of Bayesian inference of electron microscopy images
Bioinformatics Sequence Comparisons on Manycore Processors
Biomedical and Clinical English Model Packages in the Stanza Python NLP Library
Biomedical image analysis on a cooperative cluster of GPUs and multicores
Biomolecular electrostatics simulation with a parallel FMM-based BEM, using up to 512 GPUs
Biomolecular electrostatics using a fast multipole BEM on up to 512 GPUs and a billion unknowns
Bit-GraphBLAS: Bit-Level Optimizations of Matrix-Centric Graph Processing on GPU
Bit-level Parallelization of 3DES Encryption on GPU
Bit-Packed Damaged Lattice Potts Model Simulations with CUDA and GPUs
Bit-Parallel Multiple Pattern Matching
Bit-Vectorized GPU Implementation of a Stochastic Cellular Automaton Model for Surface Growth
Bitcoin and The Age of Bespoke Silicon
BitCracker: BitLocker meets GPUs
Bitmap Filter: Speeding up Exact Set Similarity Joins with Bitwise Operations
Bitstream Database-Driven FPGA Programming Flow Based on Standard OpenCL
BlaBla: Linguistic Feature Extraction for Clinical Analysis in Multiple Languages
Black-Box Side-Channel Attacks Highlight the Importance of Countermeasures: An Analysis of the Xilinx Virtex-4 and Virtex-5 Bitstream Encryption Mechanism
BLAS Comparison on FPGA, CPU and GPU
Blasting through lattice calculations using CUDA
BLASX: A High Performance Level-3 BLAS Library for Heterogeneous Multi-GPU Computing
Blind image deconvolution algorithm on NVIDIA CUDA platform
Blink: Fast and Generic Collectives for Distributed ML
Blister: GPU-based rendering of Boolean combinations of free-form triangulated shapes
Block based Singular Value Decomposition approach to matrix factorization for recommender systems
Block Conjugate Gradient Solver in OpenCL
Block Time Step Storage Scheme for Astrophysical N-body Simulations
Block-asynchronous Multigrid Smoothers for GPU-accelerated Systems
Block-Relaxation Methods for 3D Constant-Coefficient Stencils on GPUs and Multicore CPUs
Block-Size Independence for GPU Programs
Blockchain Goes Green? Part II: Characterizing the Performance and Cost of Blockchains on the Cloud and at the Edge
Blocked All-Pairs Shortest Paths Algorithm on Intel Xeon Phi KNL Processor: A Case Study
Blocking Self-avoiding Walks Stops Cyber-epidemics: A Scalable GPU-based Approach
Blocks and Fuel: Frameworks for deep learning
Boda-RTC: Productive Generation of Portable, Efficient Code for Convolutional Neural Networks on Mobile Computing Platforms
Bohrium: Unmodified NumPy Code on CPU, GPU, and Cluster
Boids that see: Using self-occlusion for simulating large groups on GPUs
Bolt: Bridging the Gap between Auto-tuners and Hardware-native Performance
Bone structure analysis on multiple GPGPUs
Bone Structure Analysis with GPGPUs
Boosted Algorithms for Visual Object Detection on Graphics Processing Units
Boosting GPU Virtualization Performance with Hybrid Shadow Page Tables
Boosting Java Performance using GPGPUs
Boosting quantum evolutions using Trotter-Suzuki algorithms on GPUs
Boosting sphere decoding speed through Graphic Processing Units
BootCMatchG: An adaptive Algebraic MultiGrid linear solver for GPUs
BOPM implemented on a GPU-architecture
Bothnia: a dual-personality extension to the Intel integrated graphics driver
Bottleneck Analysis of Dynamic Graph Neural Network Inference on CPU and GPU
Bouncing Behavior of Microscopic Dust Aggregates
Bound the Peak Performance of SGEMM on GPU with software-controlled fast memory
Bounding the effect of partition camping in GPU kernels
Bounds on the Energy Consumption of Computational Kernels
Brain perfusion imaging: performance and accuracy
BrainCove: A Tool for Voxel-wise fMRI Brain Connectivity Visualization
BrainFrame: A heterogeneous accelerator platform for neuron simulations
BrainSlug: Transparent Acceleration of Deep Learning Through Depth-First Parallelism
Branch and Data Herding: Reducing Control and Memory Divergence for Error-tolerant GPU Applications
Breadth First Search Vectorization on the Intel Xeon Phi
Breadth-First Search using Dynamic Parallelism on the GPU
Breaking the GPU programming barrier with the auto-parallelising SAC compiler
Breaking the Memory Wall: A Study of I/O Patterns and GPU Memory Utilization for Hybrid CPU-GPU Offloaded Optimizers
Bridging Control-Centric and Data-Centric Optimization
Bridging OpenCL and CUDA: A Comparative Analysis and Translation
Bridging parallel and reconfigurable computing with multilevel PGAS and SHMEM+
Bridging the Gap between FPGAs and Multi-Processor Architectures: A Video Processing Perspective
Bridging the GPGPU-FPGA efficiency gap
Bridging the Performance-Programmability Gap for FPGAs via OpenCL: A Case Study with OpenDwarfs
Bridging the Semantic Gaps of GPU Acceleration for Scaleout CNN-based Big Data Processing: Think Big, See Small
Brief announcement: better speedups for parallel max-flow
Brief Announcement: On the Limits of Parallelizing Convolutional Neural Networks on GPUs
Bringing Auto-tuning to HIP: Analysis of Tuning Impact and Difficulty on AMD and Nvidia GPUs
Bringing OpenCL to Commodity RISC-V CPUs
Bringing Parallel Performance to Python with Domain-Specific Selective Embedded Just-in-Time Specialization
Brook for GPUs: Stream Computing on Graphics Hardware
Brownian Dynamics of Active Sphere Suspensions Confined Near a No-Slip Boundary
Brownian dynamics simulations on CPU and GPU with BD_BOX
Browsing a Large Collection of Community Photos Based on Similarity on GPU
Browsing Large Image Datasets through Voronoi Diagrams
Brute force de-shredding algorithm using the GPU
Brute-Force k-Nearest Neighbors Search on the GPU
BSGP: bulk-synchronous GPU programming
Buffer k-d Trees: Processing Massive Nearest Neighbor Queries on GPUs
Buffer overflow vulnerabilities in CUDA: a preliminary analysis
Bufferless NOC Simulation of Large Multicore System on GPU Hardware
Build and Travel KD-Tree with CUDA
Building a Performance Model for Deep Learning Recommendation Model Training on GPUs
Building a Personal High Performance Computer with Heterogeneous Processors
Building a Real-Time Multi-GPU Platform: Robust Real-Time Interrupt Handling Despite Closed-Source Drivers
Building Correlators with Many-Core Hardware
Building Human Brain Network in 3D Coefficient Map Determined by X-ray Microtomography
Building Multiclass Nonlinear Classifiers with GPUs
Building Source-to-Source Compilers for Heterogeneous Targets
Building-Blocks for Performance Oriented DSLs
Bulk Execution of Oblivious Algorithms on the Unified Memory Machine, with GPU Implementation
Titles: 100
open PDFs: 93
packages: 32