Papers on hgpu.org (.txt-file)
Automatic Synthesis of Heterogeneous CPU-GPU Embedded Applications from a UML Profile
Automatic Termination Analysis for GPU Kernels
Automatic Test Case Reduction for OpenCL
Automatic test case reduction of randomly generated OpenCL kernels
Automatic transformation and optimization of applications on GPUs and GPU clusters
Automatic Translation of CUDA to OpenCL and Comparison of Performance Optimizations on GPUs
Automatic tuning matrix multiplication performance on graphics hardware
Automatic Tuning of Local Memory Use on GPGPUs
Automatic Virtualization of Accelerators
Automatically Exploiting the Memory Hierarchy of GPUs through Just-in-Time Compilation
Automatically generating and tuning GPU code for sparse matrix-vector multiplication from a high-level representation
Automatically Generating Efficient Simulation Codes on GPUs from Partial Differential Equations
Automatically Harnessing Sparse Acceleration
Automatically Selecting Profitable Thread Block Sizes Using Machine Learning
Automatically translating a general purpose C++ image processing library for GPUs
Automatically Tuned Dense Linear Algebra for Multicore+GPU
Automatically Tuning Sparse Matrix-Vector Multiplication for GPU Architectures
Automating a Labour Performance Measurement and Risk Assessment: An Evaluation of Methods for a Computer Vision based System
Automating elimination of idle functions by run-time reconfiguration
Automating GPU computing in MATLAB
Automating the Last-Mile for High Performance Dense Linear Algebra
AutOMP: An Automatic OpenMP Parallelization Generator for Variable-Oriented High-Performance Scientific Codes
AutoParBench: A Unified Test Framework for OpenMP-based Parallelizers
AutoPhase: Compiler Phase-Ordering for High Level Synthesis with Deep Reinforcement Learning
Autotuning CUDA Compiler Parameters for Heterogeneous Applications using the OpenTuner Framework
Autotuning CUDA: Applying NLP Techniques to LS-CAT
Autotuning for Automatic Parallelization on Heterogeneous Systems
Autotuning GPU Kernels via Static and Predictive Analysis
Autotuning of Pattern Runtimes for Accelerated Parallel Systems
Autotuning OpenACC Work Distribution via Direct Search
Autotuning OpenCL Workgroup Size for Stencil Patterns
Autotuning Programs with Algorithmic Choice
Autotuning Stencil-Based Computations on GPUs
Autotuning Stencils Codes with Algorithmic Skeletons
Autotuning Tensor Contraction Computations on GPUs
Autotuning Wavefront Abstractions for Heterogeneous Architectures
Autotuning Wavefront Patterns for Heterogeneous Architectures
Autotuning, Code Generation and Optimizing Compiler Technology for GPUs
Auxiliary Image Regularization for Deep CNNs with Noisy Labels
AvA: Accelerated Virtualization of Accelerators
AVEC: Accelerator Virtualization in Cloud-Edge Computing for Deep Learning Libraries
AVSS2011 demo session: GPU enabled Smart Video Node
AVX-512 extension to OpenQCD 1.6
AXC: A new format to perform the SpMV oriented to Intel Xeon Phi architecture in OpenCL
Axel: a heterogeneous cluster with FPGAs and GPUs
AZP: Automatic Specialization for Zero Values in Gaming Applications
b-Bit Minwise Hashing in Practice: Large-Scale Batch and Online Learning and Using GPUs for Fast Preprocessing with Simple Hash Functions
B-CALM: An open-source GPU-based 3D-FDTD with multi-pole dispersion for plasmonics
B-Calm: an Open-Source Multi-Gpu-Based 3D-FDTD with Multi-Pole Dispersion for Plasmonics
Back Ground Subtraction Algorithm For Moving Object Detection In FPGA
Backpropagation Training for Fisher Vectors within Neural Networks
BaCO: A Fast and Portable Bayesian Compiler Optimization Framework
Bacon: A GPU Programming System With Just in Time Specialization
Balancing locality and concurrency: solving sparse triangular systems on GPUs
Balancing Tracking Granularity and Parallelism in Many-Task Systems: The Horizons Approach
Bamboo: Automatic Translation of MPI Source into a Latency-Tolerant Form
Bandicoot: C++ Library for GPU Linear Algebra and Scientific Computing
Bandwidth intensive 3-D FFT kernel for GPUs using CUDA
Bandwidth Reduction Through Multithreaded Compression of Seismic Images
Bandwidth Requirements of GPU Architectures
BANG: Billion-Scale Approximate Nearest Neighbor Search using a Single GPU
Barra, a Modular Functional GPU Simulator for GPGPU
Barra: A Parallel Functional Simulator for GPGPU
BarraCUDA – a fast short read sequence aligner using graphics processing units
Barrier Invariants: A Shared State Abstraction for the Analysis of Data-Dependent GPU Kernels
Barycentric coordinates computation in homogeneous coordinates
BASEMENT v3: a modular freeware for river process modelling over multiple computational backends
Basker: A Threaded Sparse LU Factorization Utilizing Hierarchical Parallelism and Data Layouts
BAT: A Benchmark suite for AutoTuners
Batch Method for Efficient Resource Sharing in Real-time Multi-GPU Systems
Batch Records Insertion into Multidimensional Linear Dynamic Hashing Table on GPU
Batched Kronecker product for 2-D matrices and 3-D arrays on NVIDIA GPUs
Batched Linear Algebra Problems on GPU Accelerators
Batched Matrix Computations on Hardware Accelerators
Batched Matrix Computations on Hardware Accelerators Based on GPUs
Batched QR and SVD Algorithms on GPUs with Applications in Hierarchical Matrix Compression
Batched Shift Reduce Parsing with Lists of Vectors on CUDA
Bayesian Image Restoration Using A Large-scale Total Patch Variation Prior
Bayesian inference for artificial perception using OpenCL on FPGAs and GPUs
Bayesian model comparison via sequential Monte Carlo
Bayesian neural networks for detecting epistasis in genetic association studies
Bayesian Neural Networks for Genetic Association Studies of Complex Disease
Bayesian Neural Networks in Data-Intensive High Energy Physics Applications
Bayesian Optimization for auto-tuning GPU kernels
Bayesian real-time perception algorithms on GPU
Bayesian Sparse Unsupervised Learning for Probit Models of Binary Data
Bayesian Sparsity-Path-Analysis of Genetic Association Signal using Generalized t Priors
Bayesian State-Space Modelling on High-Performance Hardware Using LibBi
BbmTTP: Beat-based Parallel Simulated Annealing Algorithm on GPGPUs for the Mirrored Traveling Tournament Problem
BEAGLE: an Application Programming Interface and High-Performance Computing Library for Statistical Phylogenetics
Beam Dynamics Simulations Using GPUs
Beam Dynamics Simulations with a GPU-accelerated Version of ELEGANT
Beauty And The Beast: Exploiting GPUs In Haskell
Beehive SPIR-V Toolkit: A Composable and Functional API for Runtime SPIR-V Code Generation
Behavioral graph fraud detection in E-commerce
Behavioral Non-portability in Scientific Numeric Computing
Behavioral Spherical Harmonics for Long-Range Agents’ Interaction
Titles: 100
open PDFs: 96
packages: 34