Papers on hgpu.org (.txt-file)
Live Migration for OpenCL FPGA Accelerators
Live Migration of FPGA Applications
Living Flows: Enhanced Exploration of Edge-Bundled Graphs Based on GPU-Intensive Edge Rendering
LLload: An Easy-to-Use HPC Utilization Tool
LLM-Inference-Bench: Inference Benchmarking of Large Language Models on AI Accelerators
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
LLVM-based automation of memory decoupling for OpenCL applications on FPGAs
LN-Annote: An Alternative Approach to Information Extraction from Emails using Locally-Customized Named-Entity Recognition
LNA: Fast Protein Classification Using A Laplacian Characterization of Tertiary Structure
LO-SpMM: Low-cost Search for High-performance SpMM Kernels on GPUs
Load Balanced Parallel GPU Out-of-Core for Continuous LOD Model Visualization
Load Balancing for Constraint Solving with GPUs
Load Balancing in a Changing World: Dealing with Heterogeneity and Performance Variability
Load Balancing in Data Warehouse – Evolution and Perspectives
Load Balancing Utilizing Data Redundancy in Distributed Volume Rendering
Load-Balanced Multi-GPU Ambient Occlusion for Direct Volume Rendering
Local Alignment Tool Based on Hadoop Framework and GPU Architecture
Local Histogram Modification Based Contrast Enhancement with GPU Acceleration
Local Laplacian Filters: Edge-aware Image Processing with a Laplacian Pyramid
Local Search Algorithms on Graphics Processing Units. A Case Study: The Permutation Perceptron Problem
Local Volatility FX Basket Option on CPU and GPU
Local vs. Global Optimization: Operator Placement Strategies in Heterogeneous Environments
Locality Analysis for Characterizing Applications Based on Sparse Matrices
Locality and parallelism optimization for dynamic programming algorithm in bioinformatics
Locality Aware Work-Stealing Based Scheduling in Hybrid CPU-GPU
Locality optimization on a NUMA architecture for hybrid LU factorization
Locality-Aware Automatic Parallelization for GPGPU with OpenHMPP Directives
Locality-Aware Mapping of Nested Parallel Patterns on GPUs
Locality-aware parallel block-sparse matrix-matrix multiplication using the Chunks and Tasks programming model
Locality-Aware Work Stealing on Multi-CPU and Multi-GPU Architectures
LocalityGuru: A PTX Analyzer for Extracting Thread Block-level Locality in GPGPUs
Locally-Oriented Programming: A Simple Programming Model for Stencil-Based Computations on Multi-Level Distributed Memory Architectures
Location-based Matching in Publish/Subscribe Revisited
LOD Terrain Rendering by Local Parallel Processing on GPU
Log File Regular Expression Pattern Matching And Capture With GPUs
LOGAN: High-Performance GPU-Based X-Drop Long-Read Alignment
LoGV: Low-overhead GPGPU Virtualization
Long time-scale simulations of in vivo diffusion using GPU hardware
Long Timestep Molecular Dynamics on the Graphical Processing Unit
Long-time Simulations with Complex Code Using Multiple Nodes of Intel Xeon Phi Knights Landing
Loo.py: From Fortran to performance via transformation and substitution rules
Loo.py: transformation-based code generation for GPUs and CPUs
Looking at the surprise: Bottom-up attentional control of an active camera system
LookNN: Neural Network with No Multiplication
Loop Transformation Recipes for Code Generation and Auto-Tuning
LoopBench: An Evaluation of Loop Acceleration in Heterogeneous Systems
LOOPer: A Learned Automatic Code Optimizer For Polyhedral Compilers
Loose capacity-constrained representatives for the qualitative visual analysis in molecular dynamics
Lossless Acceleration for Seq2seq Generation with Aggressive Decoding
Lossless Compression of Variable-Precision Floating-Point Buffers on GPUs
Lossless data compression on GPGPU architectures
Lossless LZW Data Compression Algorithm on CUDA
Lost in Abstraction: Pitfalls of Analyzing GPUs at the Intermediate Language Level
Lost in Translation: Challenges in Automating CUDA-to-OpenCL Translation
Low Complexity Corner Detector Using CUDA for Multimedia Applications
Low cost approach to real-time vehicle to vehicle communication using parallel CPU and GPU processing
Low Latency Complex Event Processing on Parallel Hardware
Low latency photon mapping using block hashing
Low viscosity flow simulations for animation
Low-complexity Distributed Tomographic Backprojection for large datasets
Low-cost, high-speed computer vision using NVIDIA’s CUDA architecture
Low-Frequency MLFMA on Graphics Processors
Low-Impact Profiling of Streaming, Heterogeneous Applications
Low-Latency Elliptic Curve Scalar Multiplication
Low-latency Image Recognition with GPU-accelerated Convolutional Networks for Web-based Services
Low-overhead diskless checkpoint for hybrid computing systems
Low-Overhead Trace Collection and Profiling on GPU Compute Kernels
Low-power System-on-Chip Processors for Energy Efficient High Performance Computing: The Texas Instruments Keystone II
Low-power Task Scheduling for GPU Energy Reduction
LS-CAT: A Large-Scale CUDA AutoTuning Dataset
LTE Physical Layer Implementation Using GPU Based High Performance Computing
LTTng CLUST: A system-wide unified CPU and GPU tracing tool for OpenCL applications
LU Factorization for Accelerator-based Systems
LU Factorization with Partial Pivoting for a Multi-CPU, Multi-GPU Shared Memory System
LU Factorization with Partial Pivoting for a Multicore System with Accelerators
LU-GPU: Efficient Algorithms for Solving Dense Linear Systems on Graphics Hardware
LU, QR, and Cholesky factorizations: Programming Model, Performance Analysis and Optimization Techniques for the Intel Knights Landing Xeon Phi
LUDA: Boost LSM Key Value Store Compactions with GPUs
Lynx: A Dynamic Instrumentation System for Data-Parallel Applications on GPGPU Architectures
Lyra2: Password Hashing Scheme with improved security against time-memory trade-offs
MACC: An OpenACC Transpiler for Automatic Multi-GPU Use
Machine Learning and Deep Learning frameworks and libraries for large-scale data mining: a survey
Machine Learning Based Auto-tuning for Enhanced OpenCL Performance Portability
Machine Learning Based Intrusion Detection in Controller Area Networks
Machine learning enhanced code optimization for high-level synthesis (ML-ECOHS)
Machine Learning for CUDA+MPI Design Rules
Machine Learning for Predictive Auto-Tuning with Boosted Regression Trees
Machine learning for ultrafast X-ray diffraction patterns on large-scale GPU clusters
Machine Learning from Streaming Data in Heterogeneous Computing Environments
Machine Learning in Compilers: Past, Present and Future
Machine Learning-Driven Adaptive OpenMP For Portable Performance on Heterogeneous Systems
Machine-Learning-based Performance Heuristics for Runtime CPU/GPU Selection
Titles: 100
open PDFs: 97
packages: 20