Papers on hgpu.org (.txt-file)
Leveraging Binary Translation for Heterogeneous Profiling

Leveraging Computation Sharing and Parallel Processing in Location-Based Services

Leveraging Data-Flow Information for Efficient Scheduling of Task-Parallel Programs on Heterogeneous Systems

Leveraging LLVM OpenMP GPU Offload Optimizations for Kokkos Applications

Leveraging Memory Copy Overlap for Efficient Sparse Matrix-Vector Multiplication on GPUs

Leveraging on High-Performance Computing and Cloud Technologies in Digital Libraries: A Case Study

Leveraging Parallelism with CUDA and OpenCL

Leveraging the potential of task-based programming with OpenMP task graphs

Levy Flights for Particle Swarm Optimisation Algorithms on Graphical Processing Units

LeXInt: GPU-accelerated Exponential Integrators package

libcloudph++ 0.1: single-moment bulk, double-moment bulk, and particle-based warm-rain microphysics library in C++

libCudaOptimize: an Open Source Library of GPU-based Metaheuristics

libhclooc: Software Library Facilitating Out-of-core Implementations of Accelerator Kernels on Hybrid Computing Platforms

libmolgrid: GPU Accelerated Molecular Gridding for Deep Learning Applications

Libra: Synergizing CUDA and Tensor Cores for High-Performance Sparse Matrix Multiplication

libWater: Heterogeneous Distributed Computing Made Easy

LIFT: LLM-Based Pragma Insertion for HLS via GNN Supervised Fine-Tuning

Light Loss-Less Data Compression, with GPU Implementation

Light propagation for mixed polygonal and volumetric data

Light Propagation Maps on Parallel Graphics Architectures

Lighting Details Preserving Photon Density Estimation

LightNet: A Versatile, Standalone Matlab-based Environment for Deep Learning

Lightning: Scaling the GPU Programming Model Beyond a Single GPU

LightPlay: Efficient Replay with GPUs

LightRNN: Memory and Computation-Efficient Recurrent Neural Networks

LightScan: Faster Scan Primitive on CUDA Compatible Manycore Processors

Lightweight bleeding and smoke effect for surgical simulators

Lightweight Modular Staging and Embedded Compilers: Abstraction Without Regret for High-Level High-Performance Programming

Lightweight modular staging: a pragmatic approach to runtime code generation and compiled DSLs

Lina: a fast design optimisation tool for software-based FPGA programming

linalg: Matrix Computations in Apache Spark

Line-art Illustration of Dynamic and Specular Surfaces

Linear Algebra Algorithms for Hybrid Architectures with XKaapi

Linear algebra operators for GPU implementation of numerical algorithms

Linear Feature Detection on GPUs

Linear genetic programming GPGPU on Microsoft’s Xbox 360

Linear optimization on modern GPUs

Linear Performance-Breakdown Model: A Framework for GPU kernel programs performance analysis

Linear Solvers for Stable Fluids: GPU vs CPU

Linearised inversion with GPUs

Linpack evaluation on a supercomputer with heterogeneous accelerators

linus: Conveniently explore, share, and present large-scale biological trajectory data from a web browser

liquidSVM: A Fast and Versatile SVM package

Liszt: A Domain Specific Language for Building Portable Mesh-based PDE Solvers

LiteGD: Lightweight and dynamic GPU Dispatching for Large-scale Heterogeneous Clusters

Literature Review and Implementation Overview: High Performance Computing with Graphics Processing Units for Classroom and Research Use

Literature review: Build and Travel KD-Tree with CUDA

Literature Review: Parallel Computing on linear equations of linear elastic FEM stimulation with CUDA

LithOS: An Operating System for Efficient Machine Learning on GPUs

Live Migration for OpenCL FPGA Accelerators

Live Migration of FPGA Applications

Living Flows: Enhanced Exploration of Edge-Bundled Graphs Based on GPU-Intensive Edge Rendering

LLload: An Easy-to-Use HPC Utilization Tool

LLM-Inference-Bench: Inference Benchmarking of Large Language Models on AI Accelerators

LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale

LLMPerf: GPU Performance Modeling meets Large Language Models

LLOR: Automated Repair of OpenMP Programs

LLVM-based automation of memory decoupling for OpenCL applications on FPGAs

LN-Annote: An Alternative Approach to Information Extraction from Emails using Locally-Customized Named-Entity Recognition

LNA: Fast Protein Classification Using A Laplacian Characterization of Tertiary Structure

LO-SpMM: Low-cost Search for High-performance SpMM Kernels on GPUs

Load Balanced Parallel GPU Out-of-Core for Continuous LOD Model Visualization

Load Balancing for Constraint Solving with GPUs

Load Balancing in a Changing World: Dealing with Heterogeneity and Performance Variability

Load Balancing in Data Warehouse – Evolution and Perspectives

Load Balancing Utilizing Data Redundancy in Distributed Volume Rendering

Load-Balanced Multi-GPU Ambient Occlusion for Direct Volume Rendering

Local Alignment Tool Based on Hadoop Framework and GPU Architecture

Local Histogram Modification Based Contrast Enhancement with GPU Acceleration

Local Laplacian Filters: Edge-aware Image Processing with a Laplacian Pyramid

Local Search Algorithms on Graphics Processing Units. A Case Study: The Permutation Perceptron Problem

Local Volatility FX Basket Option on CPU and GPU

Local vs. Global Optimization: Operator Placement Strategies in Heterogeneous Environments

Locality Analysis for Characterizing Applications Based on Sparse Matrices

Locality and parallelism optimization for dynamic programming algorithm in bioinformatics

Locality Aware Work-Stealing Based Scheduling in Hybrid CPU-GPU

Locality optimization on a NUMA architecture for hybrid LU factorization

Locality-Aware Automatic Parallelization for GPGPU with OpenHMPP Directives

Locality-Aware Mapping of Nested Parallel Patterns on GPUs

Locality-aware parallel block-sparse matrix-matrix multiplication using the Chunks and Tasks programming model

Locality-Aware Work Stealing on Multi-CPU and Multi-GPU Architectures

LocalityGuru: A PTX Analyzer for Extracting Thread Block-level Locality in GPGPUs

Locally-Oriented Programming: A Simple Programming Model for Stencil-Based Computations on Multi-Level Distributed Memory Architectures

Location-based Matching in Publish/Subscribe Revisited

LOD Terrain Rendering by Local Parallel Processing on GPU

Log File Regular Expression Pattern Matching And Capture With GPUs

LOGAN: High-Performance GPU-Based X-Drop Long-Read Alignment

LoGV: Low-overhead GPGPU Virtualization

Long time-scale simulations of in vivo diffusion using GPU hardware

Long Timestep Molecular Dynamics on the Graphical Processing Unit

Long-time Simulations with Complex Code Using Multiple Nodes of Intel Xeon Phi Knights Landing

Loo.py: From Fortran to performance via transformation and substitution rules

Loo.py: transformation-based code generation for GPUs and CPUs

Looking at the surprise: Bottom-up attentional control of an active camera system

Titles: 100
open PDFs: 95
packages: 29
