Papers on hgpu.org (.txt-file)
Kernel Weaver: Automatically Fusing Database Primitives for Efficient GPU Computation
Kernel-as-a-Service: A Serverless Interface to GPUs
Kernel-Centric Optimizations for Deep Neural Networks on GPGPU
KernelBench: Can LLMs Write Efficient GPU Kernels?
Kernelet: High-Throughput GPU Kernel Executions with Dynamic Slicing and Scheduling
KERNELGEN – A Toolchain for Automatic GPU-centric Applications Porting
KernelGen – the design and implementation of a next generation compiler platform for accelerating numerical models on GPUs
KernelInterceptor: automating GPU kernel verification by intercepting kernels and their parameters
Kernelized Renyi distance for speaker recognition
KeSCo: Compiler-based Kernel Scheduling for Multi-task GPU Applications
Key derivation functions and their GPU implementation
Key Reconciliation with Low-Density Parity-Check Codes for Long-Distance Quantum Cryptography
Keynote address: Immersive exploration of large datasets
KFusion: Obtaining Modularity and Performance with Regards to General Purpose GPU Computing and Co-processors
Kinematic Modelling of Disc Galaxies using Graphics Processing Units
Kinetics of liquid-solid phase transition in large nickel clusters
Kite: Braided Parallelism for Heterogeneous Systems
KLARAPTOR: A Tool for Dynamically Finding Optimal Kernel Launch Parameters Targeting CUDA Programs
kNN Query Processing in Metric Spaces Using GPUs
Kokkidio: Fast, expressive, portable code, based on Kokkos and Eigen
Kokkos: Enabling performance portability across manycore architectures
Krylov Subspace Accelerated Algebraic Multigrid for Mimetic Finite Differences on GPUs
KUDA: GPU Accelerated Split Race Checker
LAMDA: Learning-Assisted Multi-Stage Autotuning for FPGA Design Closure
LAMMPS’ PPPM Long-Range Solver for the Second Generation Xeon Phi
LAMMPScuda – a new GPU accelerated Molecular Dynamics Simulations Package and its Application to Ion-Conducting Glasses
Landau Gauge Fixing on GPUs and String Tension
Langevin dynamics simulations of biomolecules on graphics processors
Language Modeling with Gated Convolutional Networks
Language virtualization for heterogeneous parallel computing
Large calculation of the flow over a hypersonic vehicle using a GPU
Large data visualization on distributed memory multi-GPU clusters
Large Integer Arithmetic in GPU for Cryptography
Large Language Model Powered C-to-CUDA Code Translation: A Novel Auto-Parallelization Framework
Large neighborhood local search optimization on graphics processing units
Large scale 3D shape retrieval by exploiting multi-core and GPU
Large Scale Artificial Neural Network Training Using Multi-GPUs
Large Scale Bioinformatics Data Mining with Parallel Genetic Programming on Graphics Processing Units
Large Scale DNA Sequence Alignment and Kernel Method Implemented with GPUs
Large Scale Finite Element Analysis Using GPU Parallel Computing
Large Scale GPU Accelerated PPMLR-MHD Simulations for Space Weather Forecast
Large Scale GPU Based Simulations of Turbulent Bubbly Flow in a Square Duct
Large Scale Language Modeling: Converging on 40GB of Text in Four Hours
Large Scale Monte Carlo Tree Search on GPU
Large scale parallel state space search utilizing graphics processing units and solid state disks
Large Scale Physical Modeling Sound Synthesis
Large Scale Plane Wave Pseudopotential Density Functional Theory Calculations on GPU Clusters
Large Scale Simulations of the Euler Equations on GPU Clusters
Large Speed Increase Using Novel GPU Based Algorithms to Simulate Cardiac Excitation Waves in a Rabbit Ventricle
Large steps in GPU-based deformable bodies simulation
Large-eddy simulations with ClimateMachine: a new open-source code for atmospheric simulations on GPUs and CPUs
Large-Scale Compute-Intensive Analysis via a Combined In-Situ and Co-Scheduling Workflow Approach
Large-Scale Data Computing Performance Comparisons on SYCL Heterogeneous Parallel Processing Layer Implementations
Large-Scale Deep Learning on the YFCC100M Dataset
Large-scale deep unsupervised learning using graphics processors
Large-Scale DNS of Gas-Solid Flow on Mole-8.5
Large-scale ferrofluid simulations on graphics processing units
Large-scale FFT on GPU clusters
Large-Scale Geospatial Processing on Multi-Core and Many-Core Processors: Evaluations on CPUs, GPUs and MICs
Large-Scale High-Lundquist Number Reduced MHD Simulations of the Solar Corona Using GPU Accelerated Machines
Large-scale image analysis using docker sandboxing
Large-scale mixer simulations using massively parallel GPU architectures
Large-scale Monte Carlo simulation of two-dimensional classical XY model using multiple GPUs
Large-Scale Motion Modelling using a Graphical Processing Unit
Large-scale multi-dimensional document clustering on GPU clusters
Large-scale Nanostructure Simulations from X-ray Scattering Data On Graphics Processor Clusters
Large-scale network simulation over heterogeneous computing architecture
Large-Scale Paralleled Sparse Principal Component Analysis
Large-Scale Physics-Based Terrain Editing Using Adaptive Tiles on the GPU
Large-Scale Sound Field Rendering in Rectangular Room with Specular Reflection
Large-Scale Stereo Display Wall Using Programmable Graphics Hardware
Large-Scale Stochastic Learning using GPUs
Large-scale transient stability simulation on graphics processing units
Large-scale Virtual Acoustics Simulation at Audio Rates Using Three Dimensional Finite Difference Time Domain and Multiple GPUs
Large, Pruned or Continuous Space Language Models on a GPU for Statistical Machine Translation
Larrabee: a many-core x86 architecture for visual computing
Latency considerations of depth-first GPU ray tracing
Lattice Based Volumetric Global Illumination
Lattice Boltzmann based PDE solver on the GPU
Lattice Boltzmann Method for Simulating Turbulent Flows
Lattice Boltzmann Simulation of Binary Mixture Diffusion Using Modern Graphics Processors
Lattice Boltzmann Simulations of Multiphase Flows
Lattice Boltzmann simulations of the permeability and capillary adsorption of cement model microstructures
Lattice Boltzmann Simulations on a GPU: An optimization approach using C++ AMP
Lattice Group Models: GPU Acceleration and Numerics
Lattice QCD on new chips: a community summary
Lattice QCD simulations using the OpenACC platform
Lattice QCD with Domain Decomposition on Intel Xeon Phi Co-Processors
Lattice Quantum Chromodynamics on Intel Xeon Phi based supercomputers
Lattice Simulations using OpenACC compilers
Lattice-based flow field modeling
Lattice-Boltzmann Simulation of the Shallow-Water Equations with Fluid-Structure Interaction on Multi- and Manycore Processors
Titles: 100
open PDFs: 95
packages: 23