Papers on hgpu.org
LightNet: A Versatile, Standalone Matlab-based Environment for Deep Learning
Lightning: Scaling the GPU Programming Model Beyond a Single GPU
LightPlay: Efficient Replay with GPUs
LightRNN: Memory and Computation-Efficient Recurrent Neural Networks
LightScan: Faster Scan Primitive on CUDA Compatible Manycore Processors
Lightweight bleeding and smoke effect for surgical simulators
Lightweight Modular Staging and Embedded Compilers: Abstraction Without Regret for High-Level High-Performance Programming
Lightweight modular staging: a pragmatic approach to runtime code generation and compiled DSLs
Lina: a fast design optimisation tool for software-based FPGA programming
linalg: Matrix Computations in Apache Spark
Line-art Illustration of Dynamic and Specular Surfaces
Linear Algebra Algorithms for Hybrid Architectures with XKaapi
Linear algebra operators for GPU implementation of numerical algorithms
Linear Feature Detection on GPUs
Linear genetic programming GPGPU on Microsoft’s Xbox 360
Linear optimization on modern GPUs
Linear Performance-Breakdown Model: A Framework for GPU kernel programs performance analysis
Linear Solvers for Stable Fluids: GPU vs CPU
Linearised inversion with GPUs
Linpack evaluation on a supercomputer with heterogeneous accelerators
linus: Conveniently explore, share, and present large-scale biological trajectory data from a web browser
liquidSVM: A Fast and Versatile SVM package
Liszt: A Domain Specific Language for Building Portable Mesh-based PDE Solvers
LiteGD: Lightweight and dynamic GPU Dispatching for Large-scale Heterogeneous Clusters
Literature Review and Implementation Overview: High Performance Computing with Graphics Processing Units for Classroom and Research Use
Literature review: Build and Travel KD-Tree with CUDA
Literature Review: Parallel Computing on linear equations of linear elastic FEM stimulation with CUDA
LithOS: An Operating System for Efficient Machine Learning on GPUs
Live Migration for OpenCL FPGA Accelerators
Live Migration of FPGA Applications
Living Flows: Enhanced Exploration of Edge-Bundled Graphs Based on GPU-Intensive Edge Rendering
LLload: An Easy-to-Use HPC Utilization Tool
LLM-Inference-Bench: Inference Benchmarking of Large Language Models on AI Accelerators
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
LLMPerf: GPU Performance Modeling meets Large Language Models
LLOR: Automated Repair of OpenMP Programs
LLVM-based automation of memory decoupling for OpenCL applications on FPGAs
LN-Annote: An Alternative Approach to Information Extraction from Emails using Locally-Customized Named-Entity Recognition
LNA: Fast Protein Classification Using A Laplacian Characterization of Tertiary Structure
LO-SpMM: Low-cost Search for High-performance SpMM Kernels on GPUs
Load Balanced Parallel GPU Out-of-Core for Continuous LOD Model Visualization
Load Balancing for Constraint Solving with GPUs
Load Balancing in a Changing World: Dealing with Heterogeneity and Performance Variability
Load Balancing in Data Warehouse – Evolution and Perspectives
Load Balancing Utilizing Data Redundancy in Distributed Volume Rendering
Load-Balanced Multi-GPU Ambient Occlusion for Direct Volume Rendering
Local Alignment Tool Based on Hadoop Framework and GPU Architecture
Local Histogram Modification Based Contrast Enhancement with GPU Acceleration
Local Laplacian Filters: Edge-aware Image Processing with a Laplacian Pyramid
Local Search Algorithms on Graphics Processing Units. A Case Study: The Permutation Perceptron Problem
Local Volatility FX Basket Option on CPU and GPU
Local vs. Global Optimization: Operator Placement Strategies in Heterogeneous Environments
Locality Analysis for Characterizing Applications Based on Sparse Matrices
Locality and parallelism optimization for dynamic programming algorithm in bioinformatics
Locality Aware Work-Stealing Based Scheduling in Hybrid CPU-GPU
Locality optimization on a NUMA architecture for hybrid LU factorization
Locality-Aware Automatic Parallelization for GPGPU with OpenHMPP Directives
Locality-Aware Mapping of Nested Parallel Patterns on GPUs
Locality-aware parallel block-sparse matrix-matrix multiplication using the Chunks and Tasks programming model
Locality-Aware Work Stealing on Multi-CPU and Multi-GPU Architectures
LocalityGuru: A PTX Analyzer for Extracting Thread Block-level Locality in GPGPUs
Locally-Oriented Programming: A Simple Programming Model for Stencil-Based Computations on Multi-Level Distributed Memory Architectures
Location-based Matching in Publish/Subscribe Revisited
LOD Terrain Rendering by Local Parallel Processing on GPU
Log File Regular Expression Pattern Matching And Capture With GPUs
LOGAN: High-Performance GPU-Based X-Drop Long-Read Alignment
LoGV: Low-overhead GPGPU Virtualization
Long time-scale simulations of in vivo diffusion using GPU hardware
Long Timestep Molecular Dynamics on the Graphical Processing Unit
Long-time Simulations with Complex Code Using Multiple Nodes of Intel Xeon Phi Knights Landing
Loo.py: From Fortran to performance via transformation and substitution rules
Loo.py: transformation-based code generation for GPUs and CPUs
Looking at the surprise: Bottom-up attentional control of an active camera system
LookNN: Neural Network with No Multiplication
Loop Transformation Recipes for Code Generation and Auto-Tuning
LoopBench: An Evaluation of Loop Acceleration in Heterogeneous Systems
LOOPer: A Learned Automatic Code Optimizer For Polyhedral Compilers
Loose capacity-constrained representatives for the qualitative visual analysis in molecular dynamics
Lossless Acceleration for Seq2seq Generation with Aggressive Decoding
Lossless Compression of Variable-Precision Floating-Point Buffers on GPUs
Lossless data compression on GPGPU architectures
Lossless LZW Data Compression Algorithm on CUDA
Lost in Abstraction: Pitfalls of Analyzing GPUs at the Intermediate Language Level
Lost in Translation: Challenges in Automating CUDA-to-OpenCL Translation
Low Complexity Corner Detector Using CUDA for Multimedia Applications
Low cost approach to real-time vehicle to vehicle communication using parallel CPU and GPU processing
Low Latency Complex Event Processing on Parallel Hardware
Low latency photon mapping using block hashing
Low viscosity flow simulations for animation
Low-complexity Distributed Tomographic Backprojection for large datasets
Low-cost edge computing using upcycled smartphones
Low-cost, high-speed computer vision using NVIDIA’s CUDA architecture
Low-Frequency MLFMA on Graphics Processors
Titles: 100
open PDFs: 95
packages: 27