Papers on hgpu.org (.txt-file)
Batched Linear Algebra Problems on GPU Accelerators
Batched Matrix Computations on Hardware Accelerators
Batched Matrix Computations on Hardware Accelerators Based on GPUs
Batched QR and SVD Algorithms on GPUs with Applications in Hierarchical Matrix Compression
Batched Shift Reduce Parsing with Lists of Vectors on CUDA
Bayesian Image Restoration Using A Large-scale Total Patch Variation Prior
Bayesian inference for artificial perception using OpenCL on FPGAs and GPUs
Bayesian model comparison via sequential Monte Carlo
Bayesian neural networks for detecting epistasis in genetic association studies
Bayesian Neural Networks for Genetic Association Studies of Complex Disease
Bayesian Neural Networks in Data-Intensive High Energy Physics Applications
Bayesian Optimization for auto-tuning GPU kernels
Bayesian real-time perception algorithms on GPU
Bayesian Sparse Unsupervised Learning for Probit Models of Binary Data
Bayesian Sparsity-Path-Analysis of Genetic Association Signal using Generalized t Priors
Bayesian State-Space Modelling on High-Performance Hardware Using LibBi
BbmTTP: Beat-based Parallel Simulated Annealing Algorithm on GPGPUs for the Mirrored Traveling Tournament Problem
BEAGLE: an Application Programming Interface and High-Performance Computing Library for Statistical Phylogenetics
Beam Dynamics Simulations Using GPUs
Beam Dynamics Simulations with a GPU-accelerated Version of ELEGANT
Beauty And The Beast: Exploiting GPUs In Haskell
Beehive SPIR-V Toolkit: A Composable and Functional API for Runtime SPIR-V Code Generation
Behavioral graph fraud detection in E-commerce
Behavioral Non-portability in Scientific Numeric Computing
Behavioral Spherical Harmonics for Long-Range Agents’ Interaction
Belief Propagation by Message Passing in Junction Trees: Computing Each Message Faster Using GPU Parallelization
Belief Propagation on the GPU for Stereo Vision
Believe it or Not! Multi-core CPUs Can Match GPU Performance for FLOP-intensive Application!
Bempp-cl: A fast Python based just-in-time compiling boundary element library
BenchDirect: A Directed Language Model for Compiler Benchmarks
BenchFriend: Correlating the Performance of GPU Benchmarks
BENCHIP: Benchmarking Intelligence Processors
Benchmarking a Proof-of-Concept Performance Portable SYCL-based Fast Fourier Transformation Library
Benchmarking Across Platforms: European Option Pricing
Benchmarking and Dissecting the Nvidia Hopper GPU Architecture
Benchmarking and Implementation of Probability-Based Simulations on Programmable Graphics Cards
Benchmarking and modelling of POWER7, Westmere, BG/P, and GPUs: an industry case study
Benchmarking and Optimization of Gradient Boosted Decision Tree Algorithms
Benchmarking Data Analysis and Machine Learning Applications on the Intel KNL Many-Core Processor
Benchmarking Deep Learning Models on Jetson TX2
Benchmarking GPU and CPU codes for Heisenberg spin glass overrelaxation
Benchmarking GPU and TPU Performance with Graph Neural Networks
Benchmarking GPU Devices with N-Body Simulations
Benchmarking GPUs to tune dense linear algebra
Benchmarking Harp-DAAL: High Performance Hadoop on KNL Clusters
Benchmarking Intel Xeon Phi to Guide Kernel Design
Benchmarking Modern Edge Devices for AI Applications
Benchmarking Next Generation Hardware Platforms: An Experimental Approach
Benchmarking OpenCL, OpenACC, OpenMP, and CUDA: programming productivity, performance, and energy consumption
Benchmarking optimization algorithms for auto-tuning GPU kernels
Benchmarking Parallel Performance on Many-Core Processors
Benchmarking performance of a hybrid Xeon/Xeon Phi system for parallel computation of similarity measures between large vectors
Benchmarking State-of-the-Art Deep Learning Software Tools
Benchmarking the cost of thread divergence in CUDA
Benchmarking the Intel Xeon Phi Coprocessor
Benchmarking the Memory Hierarchy of Modern GPUs
Benchmarking the Nvidia GPU Lineage: From Early K80 to Modern A100 with Asynchronous Memory Transfers
Benchmarking Thread Block Cluster
Benchmarking TPU, GPU, and CPU Platforms for Deep Learning
Benchmarks Based on Anti-Parallel Patterns for the Evaluation of GPUs
Benchmarks for Intel MIC Architecture
BenchPress: A Deep Active Benchmark Generator
Best bang for your buck: GPU nodes for GROMACS biomolecular simulations
Best Practice Guide – Intel Xeon Phi
Best Practice Guide Intel Xeon Phi v2.0
Best-effort semantic document search on GPUs
Betatron tune measurement with the LHC damper using a GPU
Better speedups using simpler parallel programming for graph connectivity and biconnectivity
Betweenness Centrality on GPUs and Heterogeneous Architectures
Beyond 16GB: Out-of-Core Stencil Computations
Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising
Beyond Amdahl’s Law: An Objective Function That Links Multiprocessor Performance Gains To Delay and Energy
Beyond Desktop Computation: Challenges in Scaling a GPU Infrastructure
Beyond programmable shading (parts I and II)
Beyond Straightforward Vectorization of Lightweight Data Compression Algorithms for Larger Vector Sizes
BFROST: Binary Features from Robust Orientation Segment Tests accelerated on the GPU
Bi-directional Path Tracing on GPU
Bidimensional Median Filter for Parallel Computing Architectures
BIDMach: Large-scale Learning with Zero Memory Allocation
Bifrost: a Python/C++ Framework for High-Throughput Stream Processing in Astronomy
Big Integer Multiplication with CUDA FFT (cuFFT) Library
Bigger Buffer k-d Trees on Multi-Many-Core Systems
BigKernel — High Performance CPU-GPU Communication Pipelining for Big Data-style Applications
Billion-scale similarity search with GPUs
Binary Code Summarization: Benchmarking ChatGPT/GPT-4 and Other Large Language Models
Binary Interval Search (BITS): A Scalable Algorithm for Counting Interval Intersections
Binary Interval Search: a scalable algorithm for counting interval intersections
Binary Mesh Partitioning for Cache-Efficient Visualization
Binary Segmentation of Video Sequences in Real Time
BinaryNet: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1
Binaural Simulations Using Audio Rate FDTD Schemes and CUDA
Binomial American Option Pricing on CPU-GPU Heterogeneous System
Bio-inspired computer visual system using GPU and Visual Pattern Assessment Language (ViPAL): Application on breast cancer prognosis
Bio-Inspired Optimization of Ultra-Wideband Patch Antennas Using Graphics Processing Unit Acceleration
Bio-sequence database scanning on a GPU
Titles: 100
open PDFs: 97
packages: 29