Papers on hgpu.org (.txt-file)
K3 Moore’s Law in the Era of GPU Computing
KAdvice: inferring synchronization patterns from an existing codebase
KAISA: An Adaptive Second-order Optimizer Framework for Deep Neural Networks
Kalman Filter Tracking on Parallel Architectures
Kalman-Filter-Based Particle Tracking on Parallel Architectures at Hadron Colliders
kANN on the GPU with Shifted Sorting
Kapre: On-GPU Audio Preprocessing Layers for a Quick Implementation of Deep Neural Network Models with Keras
Kargus: a Highly-scalable Software-based Intrusion Detection System
KBLAS: An Optimized Library for Dense Matrix-Vector Multiplication on GPU Accelerators
Kd-Jump: a Path-Preserving Stackless Traversal for Faster Isosurface Raytracing on GPUs
KD-tree acceleration structures for a GPU raytracer
Kd-tree Based N-Body Simulations with Volume-Mass Heuristic on the GPU
kEDM: A Performance-portable Implementation of Empirical Dynamic Modeling using Kokkos
Keeneland: Bringing heterogeneous GPU computing to the computational science community
Kernel Fusion: An Effective Method for Better Power Efficiency on Multithreaded GPU
Kernel Launcher: C++ Library for Optimal-Performance Portable CUDA Applications
Kernel Specialization for Improved Adaptability and Performance on Graphics Processing Units (GPUs)
Kernel Tuner: A search-optimizing GPU code auto-tuner
Kernel Weaver: Automatically Fusing Database Primitives for Efficient GPU Computation
Kernel-as-a-Service: A Serverless Interface to GPUs
Kernel-Centric Optimizations for Deep Neural Networks on GPGPU
Kernelet: High-Throughput GPU Kernel Executions with Dynamic Slicing and Scheduling
KERNELGEN – A Toolchain for Automatic GPU-centric Applications Porting
KernelGen – the design and implementation of a next generation compiler platform for accelerating numerical models on GPUs
KernelInterceptor: automating GPU kernel verification by intercepting kernels and their parameters
Kernelized Renyi distance for speaker recognition
KeSCo: Compiler-based Kernel Scheduling for Multi-task GPU Applications
Key derivation functions and their GPU implementation
Key Reconciliation with Low-Density Parity-Check Codes for Long-Distance Quantum Cryptography
Keynote address: Immersive exploration of large datasets
KFusion: Obtaining Modularity and Performance with Regards to General Purpose GPU Computing and Co-processors
Kinematic Modelling of Disc Galaxies using Graphics Processing Units
Kinetics of liquid-solid phase transition in large nickel clusters
Kite: Braided Parallelism for Heterogeneous Systems
KLARAPTOR: A Tool for Dynamically Finding Optimal Kernel Launch Parameters Targeting CUDA Programs
kNN Query Processing in Metric Spaces Using GPUs
Kokkidio: Fast, expressive, portable code, based on Kokkos and Eigen
Kokkos: Enabling performance portability across manycore architectures
Krylov Subspace Accelerated Algebraic Multigrid for Mimetic Finite Differences on GPUs
KUDA: GPU Accelerated Split Race Checker
LAMDA: Learning-Assisted Multi-Stage Autotuning for FPGA Design Closure
LAMMPS’ PPPM Long-Range Solver for the Second Generation Xeon Phi
LAMMPScuda – a new GPU accelerated Molecular Dynamics Simulations Package and its Application to Ion-Conducting Glasses
Landau Gauge Fixing on GPUs and String Tension
Langevin dynamics simulations of biomolecules on graphics processors
Language Modeling with Gated Convolutional Networks
Language virtualization for heterogeneous parallel computing
Large calculation of the flow over a hypersonic vehicle using a GPU
Large data visualization on distributed memory multi-GPU clusters
Large Integer Arithmetic in GPU for Cryptography
Large neighborhood local search optimization on graphics processing units
Large scale 3D shape retrieval by exploiting multi-core and GPU
Large Scale Artificial Neural Network Training Using Multi-GPUs
Large Scale Bioinformatics Data Mining with Parallel Genetic Programming on Graphics Processing Units
Large Scale DNA Sequence Alignment and Kernel Method Implemented with GPUs
Large Scale Finite Element Analysis Using GPU Parallel Computing
Large Scale GPU Accelerated PPMLR-MHD Simulations for Space Weather Forecast
Large Scale GPU Based Simulations of Turbulent Bubbly Flow in a Square Duct
Large Scale Language Modeling: Converging on 40GB of Text in Four Hours
Large Scale Monte Carlo Tree Search on GPU
Large scale parallel state space search utilizing graphics processing units and solid state disks
Large Scale Physical Modeling Sound Synthesis
Large Scale Plane Wave Pseudopotential Density Functional Theory Calculations on GPU Clusters
Large Scale Simulations of the Euler Equations on GPU Clusters
Large Speed Increase Using Novel GPU Based Algorithms to Simulate Cardiac Excitation Waves in a Rabbit Ventricle
Large steps in GPU-based deformable bodies simulation
Large-eddy simulations with ClimateMachine: a new open-source code for atmospheric simulations on GPUs and CPUs
Large-Scale Compute-Intensive Analysis via a Combined In-Situ and Co-Scheduling Workflow Approach
Large-Scale Data Computing Performance Comparisons on SYCL Heterogeneous Parallel Processing Layer Implementations
Large-Scale Deep Learning on the YFCC100M Dataset
Large-scale deep unsupervised learning using graphics processors
Large-Scale DNS of Gas-Solid Flow on Mole-8.5
Large-scale ferrofluid simulations on graphics processing units
Large-scale FFT on GPU clusters
Large-Scale Geospatial Processing on Multi-Core and Many-Core Processors: Evaluations on CPUs, GPUs and MICs
Large-Scale High-Lundquist Number Reduced MHD Simulations of the Solar Corona Using GPU Accelerated Machines
Large-scale image analysis using docker sandboxing
Large-scale mixer simulations using massively parallel GPU architectures
Large-scale Monte Carlo simulation of two-dimensional classical XY model using multiple GPUs
Large-Scale Motion Modelling using a Graphical Processing Unit
Large-scale multi-dimensional document clustering on GPU clusters
Large-scale Nanostructure Simulations from X-ray Scattering Data On Graphics Processor Clusters
Large-scale network simulation over heterogeneous computing architecture
Large-Scale Paralleled Sparse Principal Component Analysis
Large-Scale Physics-Based Terrain Editing Using Adaptive Tiles on the GPU
Large-Scale Sound Field Rendering in Rectangular Room with Specular Reflection
Large-Scale Stereo Display Wall Using Programmable Graphics Hardware
Large-Scale Stochastic Learning using GPUs
Large-scale transient stability simulation on graphics processing units
Large-scale Virtual Acoustics Simulation at Audio Rates Using Three Dimensional Finite Difference Time Domain and Multiple GPUs
Large, Pruned or Continuous Space Language Models on a GPU for Statistical Machine Translation
Larrabee: a many-core x86 architecture for visual computing
Latency considerations of depth-first GPU ray tracing
Lattice Based Volumetric Global Illumination
Lattice Boltzmann based PDE solver on the GPU
Lattice Boltzmann Method for Simulating Turbulent Flows
Titles: 100
open PDFs: 92
packages: 26