Papers on hgpu.org (.txt-file)
LookNN: Neural Network with No Multiplication

Loop Transformation Recipes for Code Generation and Auto-Tuning

LoopBench: An Evaluation of Loop Acceleration in Heterogeneous Systems

LOOPer: A Learned Automatic Code Optimizer For Polyhedral Compilers

Loose capacity-constrained representatives for the qualitative visual analysis in molecular dynamics

Lossless Acceleration for Seq2seq Generation with Aggressive Decoding

Lossless Compression of Variable-Precision Floating-Point Buffers on GPUs

Lossless data compression on GPGPU architectures

Lossless LZW Data Compression Algorithm on CUDA

Lost in Abstraction: Pitfalls of Analyzing GPUs at the Intermediate Language Level

Lost in Translation: Challenges in Automating CUDA-to-OpenCL Translation

Low Complexity Corner Detector Using CUDA for Multimedia Applications

Low cost approach to real-time vehicle to vehicle communication using parallel CPU and GPU processing

Low Latency Complex Event Processing on Parallel Hardware

Low latency photon mapping using block hashing

Low viscosity flow simulations for animation

Low-complexity Distributed Tomographic Backprojection for large datasets

Low-cost edge computing using upcycled smartphones

Low-cost, high-speed computer vision using NVIDIA’s CUDA architecture

Low-Frequency MLFMA on Graphics Processors

Low-Impact Profiling of Streaming, Heterogeneous Applications

Low-Latency Elliptic Curve Scalar Multiplication

Low-latency Image Recognition with GPU-accelerated Convolutional Networks for Web-based Services

Low-overhead diskless checkpoint for hybrid computing systems

Low-Overhead Trace Collection and Profiling on GPU Compute Kernels

Low-power System-on-Chip Processors for Energy Efficient High Performance Computing: The Texas Instruments Keystone II

Low-power Task Scheduling for GPU Energy Reduction

LS-CAT: A Large-Scale CUDA AutoTuning Dataset

LTE Physical Layer Implementation Using GPU Based High Performance Computing

LTTng CLUST: A system-wide unified CPU and GPU tracing tool for OpenCL applications

LU Factorization for Accelerator-based Systems

LU Factorization with Partial Pivoting for a Multi-CPU, Multi-GPU Shared Memory System

LU Factorization with Partial Pivoting for a Multicore System with Accelerators

LU-GPU: Efficient Algorithms for Solving Dense Linear Systems on Graphics Hardware

LU, QR, and Cholesky factorizations: Programming Model, Performance Analysis and Optimization Techniques for the Intel Knights Landing Xeon Phi

LUDA: Boost LSM Key Value Store Compactions with GPUs

Luthier: Bridging Auto-Tuning and Vendor Libraries for Efficient Deep Learning Inference

Lynx: A Dynamic Instrumentation System for Data-Parallel Applications on GPGPU Architectures

Lyra2: Password Hashing Scheme with improved security against time-memory trade-offs

MACC: An OpenACC Transpiler for Automatic Multi-GPU Use

Machine Learning and Deep Learning frameworks and libraries for large-scale data mining: a survey

Machine Learning Based Auto-tuning for Enhanced OpenCL Performance Portability

Machine Learning Based Intrusion Detection in Controller Area Networks

Machine learning enhanced code optimization for high-level synthesis (ML-ECOHS)

Machine Learning for CUDA+MPI Design Rules

Machine Learning for Predictive Auto-Tuning with Boosted Regression Trees

Machine learning for ultrafast X-ray diffraction patterns on large-scale GPU clusters

Machine Learning from Streaming Data in Heterogeneous Computing Environments

Machine Learning in Compilers: Past, Present and Future

Machine Learning-Driven Adaptive OpenMP For Portable Performance on Heterogeneous Systems

Machine-Learning-based Performance Heuristics for Runtime CPU/GPU Selection

MacroSS: macro-SIMDization of streaming applications

Maestro: Data Orchestration and Tuning for OpenCL Devices

MAGMA Batched: A Batched BLAS Approach for Small Matrix Factorizations and Applications on GPUs

MAGMA Embedded: Towards a Dense Linear Algebra Library for Energy Efficient Extreme Computing

MagmaDNN: Towards High-Performance Data Analytics and Machine Learning for Data-Driven Scientific Computing

Magneto-hydrodynamics simulation in astrophysics

Magnetohydrodynamics on Heterogeneous architectures: a performance comparison

Magnetohydrodynamics simulations on graphics processing units

Maintaining constant frame rates in 3D texture-based volume rendering

Makespan computation for GPU threads running on a single streaming multiprocessor

Making Human Connectome Faster: GPU Acceleration of Brain Network Analysis

Making the case of GPUs in courses on computational physics

MALBEC: a new CUDA-C ray-tracer in General Relativity

MambaCPU: Enhanced Correlation Mining with State Space Models for CPU Performance Prediction

Managing Extreme Heterogeneity in Next Generation HPC Systems

Managing heterogeneous device memory using C++17 memory resources

Managing Multi Instance GPUs for High Throughput and Energy Savings

Managing the Topology of Heterogeneous Cluster Nodes with Hardware Locality (hwloc)

Managing, Profiling, and Optimizing Heterogeneous GPU Workloads

Manas: Mining Software Repositories to Assist AutoML

ManiSkill2: A Unified Benchmark for Generalizable Manipulation Skills

Many Cores, Many Models: GPU Programming Model vs. Vendor Compatibility Overview

Many-body quantum chemistry on graphics processing units

Many-Core Algorithms for Combinatorial Optimization

Many-core algorithms for statistical phylogenetics

Many-core applications to online track reconstruction in HEP experiments

Many-Core Architectures: Hardware-Software Optimization and Modeling Techniques

Many-core GPU computing with NVIDIA CUDA

Many-core parallel computing – Can compilers and tools do the heavy lifting?

Many-Core vs. Many-Thread Machines: Stay Away From the Valley

Many-Thread Aware Prefetching Mechanisms for GPGPU Applications

Many-threaded Differential Evolution on the GPU

Many-threaded implementation of differential evolution for the CUDA platform

Manycore high-performance computing in bioinformatics

Manycore processing of repeated k-NN queries over massive moving objects observations

Manycore processing of repeated range queries over massive moving objects observations

MAP-based Brain Tissue Segmentation using Manifold Learning and Hierarchical Max-Flow regularization

Map-reduce as a Programming Model for Custom Computing Machines

MapCG: writing parallel program portable between CPU and GPU

MapGraph: A High Level API for Fast Development of High Performance Graph Analytics on GPUs

Mapping a Data-Flow Programming Model onto Heterogeneous Platforms

Mapping a Dataflow Programming Model onto Heterogeneous Architectures

Titles: 100
open PDFs: 98
packages: 23
