Papers on hgpu.org (.txt-file)
Least Squares on GPUs in Multiple Double Precision
Lectures on Parallel Computing
LeetDecoding: A PyTorch Library for Exponentially Decaying Causal Linear Attention with CUDA Implementations
LeFlow: Enabling Flexible FPGA High-Level Synthesis of Tensorflow Deep Neural Networks
LeftoverLocals: Listening to LLM Responses Through Leaked GPU Local Memory
Legion: Programming Distributed Heterogeneous Architectures with Logical Regions
Legolizer: A Real-Time System for Modeling and Rendering LEGO Representations of Boundary Models
Lensed: a code for the forward reconstruction of lenses and sources from strong lensing observations
Leo: A Profile-Driven Dynamic Optimization Framework for GPU Applications
Lessons learned from contrasting a BLAS kernel implementations
Lessons learned in a decade of research software engineering GPU applications
Lessons Learned Migrating CUDA to SYCL: A HEP Case Study with ROOT RDataFrame
Let’s sort this out: GPGPU Verification of Radix Sort
Lettuce: PyTorch-based Lattice Boltzmann Framework
Level Sets and Voronoi based Feature Extraction from any Imagery
Level-of-Detail Triangle Strips for Deforming Meshes
Leveraging Binary Translation for Heterogeneous Profiling
Leveraging Computation Sharing and Parallel Processing in Location-Based Services
Leveraging Data-Flow Information for Efficient Scheduling of Task-Parallel Programs on Heterogeneous Systems
Leveraging LLVM OpenMP GPU Offload Optimizations for Kokkos Applications
Leveraging Memory Copy Overlap for Efficient Sparse Matrix-Vector Multiplication on GPUs
Leveraging on High-Performance Computing and Cloud Technologies in Digital Libraries: A Case Study
Leveraging Parallelism with CUDA and OpenCL
Leveraging the potential of task-based programming with OpenMP task graphs
Levy Flights for Particle Swarm Optimisation Algorithms on Graphical Processing Units
LeXInt: GPU-accelerated Exponential Integrators package
libcloudph++ 0.1: single-moment bulk, double-moment bulk, and particle-based warm-rain microphysics library in C++
libCudaOptimize: an Open Source Library of GPU-based Metaheuristics
libhclooc: Software Library Facilitating Out-of-core Implementations of Accelerator Kernels on Hybrid Computing Platforms
libmolgrid: GPU Accelerated Molecular Gridding for Deep Learning Applications
libWater: Heterogeneous Distributed Computing Made Easy
LIFT: LLM-Based Pragma Insertion for HLS via GNN Supervised Fine-Tuning
Light Loss-Less Data Compression, with GPU Implementation
Light propagation for mixed polygonal and volumetric data
Light Propagation Maps on Parallel Graphics Architectures
Lighting Details Preserving Photon Density Estimation
LightNet: A Versatile, Standalone Matlab-based Environment for Deep Learning
Lightning: Scaling the GPU Programming Model Beyond a Single GPU
LightPlay: Efficient Replay with GPUs
LightRNN: Memory and Computation-Efficient Recurrent Neural Networks
LightScan: Faster Scan Primitive on CUDA Compatible Manycore Processors
Lightweight bleeding and smoke effect for surgical simulators
Lightweight Modular Staging and Embedded Compilers: Abstraction Without Regret for High-Level High-Performance Programming
Lightweight modular staging: a pragmatic approach to runtime code generation and compiled DSLs
Lina: a fast design optimisation tool for software-based FPGA programming
linalg: Matrix Computations in Apache Spark
Line-art Illustration of Dynamic and Specular Surfaces
Linear Algebra Algorithms for Hybrid Architectures with XKaapi
Linear algebra operators for GPU implementation of numerical algorithms
Linear Feature Detection on GPUs
Linear genetic programming GPGPU on Microsoft’s Xbox 360
Linear optimization on modern GPUs
Linear Performance-Breakdown Model: A Framework for GPU kernel programs performance analysis
Linear Solvers for Stable Fluids: GPU vs CPU
Linearised inversion with GPUs
Linpack evaluation on a supercomputer with heterogeneous accelerators
linus: Conveniently explore, share, and present large-scale biological trajectory data from a web browser
liquidSVM: A Fast and Versatile SVM package
Liszt: A Domain Specific Language for Building Portable Mesh-based PDE Solvers
Literature Review and Implementation Overview: High Performance Computing with Graphics Processing Units for Classroom and Research Use
Literature review: Build and Travel KD-Tree with CUDA
Literature Review: Parallel Computing on linear equations of linear elastic FEM stimulation with CUDA
LithOS: An Operating System for Efficient Machine Learning on GPUs
Live Migration for OpenCL FPGA Accelerators
Live Migration of FPGA Applications
Living Flows: Enhanced Exploration of Edge-Bundled Graphs Based on GPU-Intensive Edge Rendering
LLload: An Easy-to-Use HPC Utilization Tool
LLM-Inference-Bench: Inference Benchmarking of Large Language Models on AI Accelerators
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
LLMPerf: GPU Performance Modeling meets Large Language Models
LLOR: Automated Repair of OpenMP Programs
LLVM-based automation of memory decoupling for OpenCL applications on FPGAs
LN-Annote: An Alternative Approach to Information Extraction from Emails using Locally-Customized Named-Entity Recognition
LNA: Fast Protein Classification Using A Laplacian Characterization of Tertiary Structure
LO-SpMM: Low-cost Search for High-performance SpMM Kernels on GPUs
Load Balanced Parallel GPU Out-of-Core for Continuous LOD Model Visualization
Load Balancing for Constraint Solving with GPUs
Load Balancing in a Changing World: Dealing with Heterogeneity and Performance Variability
Load Balancing in Data Warehouse – Evolution and Perspectives
Load Balancing Utilizing Data Redundancy in Distributed Volume Rendering
Load-Balanced Multi-GPU Ambient Occlusion for Direct Volume Rendering
Local Alignment Tool Based on Hadoop Framework and GPU Architecture
Local Histogram Modification Based Contrast Enhancement with GPU Acceleration
Local Laplacian Filters: Edge-aware Image Processing with a Laplacian Pyramid
Local Search Algorithms on Graphics Processing Units. A Case Study: The Permutation Perceptron Problem
Local Volatility FX Basket Option on CPU and GPU
Local vs. Global Optimization: Operator Placement Strategies in Heterogeneous Environments
Locality Analysis for Characterizing Applications Based on Sparse Matrices
Locality and parallelism optimization for dynamic programming algorithm in bioinformatics
Locality Aware Work-Stealing Based Scheduling in Hybrid CPU-GPU
Locality optimization on a NUMA architecture for hybrid LU factorization
Locality-Aware Automatic Parallelization for GPGPU with OpenHMPP Directives
Locality-Aware Mapping of Nested Parallel Patterns on GPUs
Locality-aware parallel block-sparse matrix-matrix multiplication using the Chunks and Tasks programming model
Locality-Aware Work Stealing on Multi-CPU and Multi-GPU Architectures