Papers on hgpu.org (.txt-file)
Lattice QCD on new chips: a community summary

Lattice QCD simulations using the OpenACC platform

Lattice QCD with Domain Decomposition on Intel Xeon Phi Co-Processors

Lattice Quantum Chromodynamics on Intel Xeon Phi based supercomputers

Lattice Simulations using OpenACC compilers

Lattice-based flow field modeling

Lattice-Boltzmann Simulation of the Shallow-Water Equations with Fluid-Structure Interaction on Multi- and Manycore Processors

Lattice-Boltzmann simulation of the shallow-water equations with fluid-structure interaction on multi- and manycore processors

Launch-time Optimization of OpenCL Kernels

Layered Interpretation of Street View Images

LazyTensor: combining eager execution with domain-specific compilers

LBCL: multi-device automatic load balancing

LBM based flow simulation using GPU computing processor

LDetector: A Low Overhead Race Detector For GPU Programs

Leader Stochastic Gradient Descent for Distributed Training of Deep Learning Models

Learnergy: Energy-based Machine Learners

Learning a Metric Embedding for Face Recognition using the Multibatch Method

Learning Better Encoding for Approximate Nearest Neighbor Search with Dictionary Annealing

Learning Blood Management in Orthopedic Surgery through Gameplay

Learning hash codes for efficient content reuse detection

Learning Massive Graph Embeddings on a Single Machine

Learning Random Forests on the GPU

Learning Representation for Scene Understanding: Epitomes, CRFs, and CNNs

Learning Sparse Recurrent Neural Networks in Language Modeling

Learning Structured Sparsity in Deep Neural Networks

Learning to Detect Roads in High-Resolution Aerial Images

Learning to Optimize Tensor Programs

Learning Two-View Stereo Matching

Least Squares on GPUs in Multiple Double Precision

Lectures on Parallel Computing

LeetDecoding: A PyTorch Library for Exponentially Decaying Causal Linear Attention with CUDA Implementations

LeFlow: Enabling Flexible FPGA High-Level Synthesis of Tensorflow Deep Neural Networks

LeftoverLocals: Listening to LLM Responses Through Leaked GPU Local Memory

Legion: Programming Distributed Heterogeneous Architectures with Logical Regions

Legolizer: A Real-Time System for Modeling and Rendering LEGO Representations of Boundary Models

Lensed: a code for the forward reconstruction of lenses and sources from strong lensing observations

Leo: A Profile-Driven Dynamic Optimization Framework for GPU Applications

Lessons learned from contrasting a BLAS kernel implementations

Lessons learned in a decade of research software engineering GPU applications

Lessons Learned Migrating CUDA to SYCL: A HEP Case Study with ROOT RDataFrame

Let’s sort this out: GPGPU Verification of Radix Sort

Lettuce: PyTorch-based Lattice Boltzmann Framework

Level Sets and Voronoi based Feature Extraction from any Imagery

Level-of-Detail Triangle Strips for Deforming Meshes

Leveraging Binary Translation for Heterogeneous Profiling

Leveraging Computation Sharing and Parallel Processing in Location-Based Services

Leveraging Data-Flow Information for Efficient Scheduling of Task-Parallel Programs on Heterogeneous Systems

Leveraging LLVM OpenMP GPU Offload Optimizations for Kokkos Applications

Leveraging Memory Copy Overlap for Efficient Sparse Matrix-Vector Multiplication on GPUs

Leveraging on High-Performance Computing and Cloud Technologies in Digital Libraries: A Case Study

Leveraging Parallelism with CUDA and OpenCL

Leveraging the potential of task-based programming with OpenMP task graphs

Levy Flights for Particle Swarm Optimisation Algorithms on Graphical Processing Units

LeXInt: GPU-accelerated Exponential Integrators package

libcloudph++ 0.1: single-moment bulk, double-moment bulk, and particle-based warm-rain microphysics library in C++

libCudaOptimize: an Open Source Library of GPU-based Metaheuristics

libhclooc: Software Library Facilitating Out-of-core Implementations of Accelerator Kernels on Hybrid Computing Platforms

libmolgrid: GPU Accelerated Molecular Gridding for Deep Learning Applications

Libra: Synergizing CUDA and Tensor Cores for High-Performance Sparse Matrix Multiplication

libWater: Heterogeneous Distributed Computing Made Easy

LIFT: LLM-Based Pragma Insertion for HLS via GNN Supervised Fine-Tuning

Light Loss-Less Data Compression, with GPU Implementation

Light propagation for mixed polygonal and volumetric data

Light Propagation Maps on Parallel Graphics Architectures

Lighting Details Preserving Photon Density Estimation

LightNet: A Versatile, Standalone Matlab-based Environment for Deep Learning

Lightning: Scaling the GPU Programming Model Beyond a Single GPU

LightPlay: Efficient Replay with GPUs

LightRNN: Memory and Computation-Efficient Recurrent Neural Networks

LightScan: Faster Scan Primitive on CUDA Compatible Manycore Processors

Lightweight bleeding and smoke effect for surgical simulators

Lightweight Modular Staging and Embedded Compilers: Abstraction Without Regret for High-Level High-Performance Programming

Lightweight modular staging: a pragmatic approach to runtime code generation and compiled DSLs

Lina: a fast design optimisation tool for software-based FPGA programming

linalg: Matrix Computations in Apache Spark

Line-art Illustration of Dynamic and Specular Surfaces

Linear Algebra Algorithms for Hybrid Architectures with XKaapi

Linear algebra operators for GPU implementation of numerical algorithms

Linear Feature Detection on GPUs

Linear genetic programming GPGPU on Microsoft’s Xbox 360

Linear optimization on modern GPUs

Linear Performance-Breakdown Model: A Framework for GPU kernel programs performance analysis

Linear Solvers for Stable Fluids: GPU vs CPU

Linearised inversion with GPUs

Linpack evaluation on a supercomputer with heterogeneous accelerators

linus: Conveniently explore, share, and present large-scale biological trajectory data from a web browser

liquidSVM: A Fast and Versatile SVM package

Liszt: A Domain Specific Language for Building Portable Mesh-based PDE Solvers

LiteGD: Lightweight and dynamic GPU Dispatching for Large-scale Heterogeneous Clusters

Literature Review and Implementation Overview: High Performance Computing with Graphics Processing Units for Classroom and Research Use

Literature review: Build and Travel KD-Tree with CUDA

Titles: 100
open PDFs: 96
packages: 32
