Papers on hgpu.org (.txt-file)
Long time-scale simulations of in vivo diffusion using GPU hardware
Long Timestep Molecular Dynamics on the Graphical Processing Unit
Long-time Simulations with Complex Code Using Multiple Nodes of Intel Xeon Phi Knights Landing
Loo.py: From Fortran to performance via transformation and substitution rules
Loo.py: transformation-based code generation for GPUs and CPUs
Looking at the surprise: Bottom-up attentional control of an active camera system
LookNN: Neural Network with No Multiplication
Loop Transformation Recipes for Code Generation and Auto-Tuning
LoopBench: An Evaluation of Loop Acceleration in Heterogeneous Systems
LOOPer: A Learned Automatic Code Optimizer For Polyhedral Compilers
Loose capacity-constrained representatives for the qualitative visual analysis in molecular dynamics
Lossless Acceleration for Seq2seq Generation with Aggressive Decoding
Lossless Compression of Variable-Precision Floating-Point Buffers on GPUs
Lossless data compression on GPGPU architectures
Lossless LZW Data Compression Algorithm on CUDA
Lost in Abstraction: Pitfalls of Analyzing GPUs at the Intermediate Language Level
Lost in Translation: Challenges in Automating CUDA-to-OpenCL Translation
Low Complexity Corner Detector Using CUDA for Multimedia Applications
Low cost approach to real-time vehicle to vehicle communication using parallel CPU and GPU processing
Low Latency Complex Event Processing on Parallel Hardware
Low latency photon mapping using block hashing
Low viscosity flow simulations for animation
Low-complexity Distributed Tomographic Backprojection for large datasets
Low-cost, high-speed computer vision using NVIDIA’s CUDA architecture
Low-Frequency MLFMA on Graphics Processors
Low-Impact Profiling of Streaming, Heterogeneous Applications
Low-Latency Elliptic Curve Scalar Multiplication
Low-latency Image Recognition with GPU-accelerated Convolutional Networks for Web-based Services
Low-overhead diskless checkpoint for hybrid computing systems
Low-Overhead Trace Collection and Profiling on GPU Compute Kernels
Low-power System-on-Chip Processors for Energy Efficient High Performance Computing: The Texas Instruments Keystone II
Low-power Task Scheduling for GPU Energy Reduction
LS-CAT: A Large-Scale CUDA AutoTuning Dataset
LTE Physical Layer Implementation Using GPU Based High Performance Computing
LTTng CLUST: A system-wide unified CPU and GPU tracing tool for OpenCL applications
LU Factorization for Accelerator-based Systems
LU Factorization with Partial Pivoting for a Multi-CPU, Multi-GPU Shared Memory System
LU Factorization with Partial Pivoting for a Multicore System with Accelerators
LU-GPU: Efficient Algorithms for Solving Dense Linear Systems on Graphics Hardware
LU, QR, and Cholesky factorizations: Programming Model, Performance Analysis and Optimization Techniques for the Intel Knights Landing Xeon Phi
LUDA: Boost LSM Key Value Store Compactions with GPUs
Lynx: A Dynamic Instrumentation System for Data-Parallel Applications on GPGPU Architectures
Lyra2: Password Hashing Scheme with improved security against time-memory trade-offs
MACC: An OpenACC Transpiler for Automatic Multi-GPU Use
Machine Learning and Deep Learning frameworks and libraries for large-scale data mining: a survey
Machine Learning Based Auto-tuning for Enhanced OpenCL Performance Portability
Machine Learning Based Intrusion Detection in Controller Area Networks
Machine learning enhanced code optimization for high-level synthesis (ML-ECOHS)
Machine Learning for CUDA+MPI Design Rules
Machine Learning for Predictive Auto-Tuning with Boosted Regression Trees
Machine learning for ultrafast X-ray diffraction patterns on large-scale GPU clusters
Machine Learning from Streaming Data in Heterogeneous Computing Environments
Machine Learning in Compilers: Past, Present and Future
Machine Learning-Driven Adaptive OpenMP For Portable Performance on Heterogeneous Systems
Machine-Learning-based Performance Heuristics for Runtime CPU/GPU Selection
MacroSS: macro-SIMDization of streaming applications
Maestro: Data Orchestration and Tuning for OpenCL Devices
MAGMA Batched: A Batched BLAS Approach for Small Matrix Factorizations and Applications on GPUs
MAGMA Embedded: Towards a Dense Linear Algebra Library for Energy Efficient Extreme Computing
MagmaDNN: Towards High-Performance Data Analytics and Machine Learning for Data-Driven Scientific Computing
Magneto-hydrodynamics simulation in astrophysics
Magnetohydrodynamics on Heterogeneous architectures: a performance comparison
Magnetohydrodynamics simulations on graphics processing units
Maintaining constant frame rates in 3D texture-based volume rendering
Makespan computation for GPU threads running on a single streaming multiprocessor
Making Human Connectome Faster: GPU Acceleration of Brain Network Analysis
Making the case of GPUs in courses on computational physics
MALBEC: a new CUDA-C ray-tracer in General Relativity
MambaCPU: Enhanced Correlation Mining with State Space Models for CPU Performance Prediction
Managing Extreme Heterogeneity in Next Generation HPC Systems
Managing heterogeneous device memory using C++17 memory resources
Managing the Topology of Heterogeneous Cluster Nodes with Hardware Locality (hwloc)
Managing, Profiling, and Optimizing Heterogeneous GPU Workloads
Manas: Mining Software Repositories to Assist AutoML
ManiSkill2: A Unified Benchmark for Generalizable Manipulation Skills
Many Cores, Many Models: GPU Programming Model vs. Vendor Compatibility Overview
Many-body quantum chemistry on graphics processing units
Many-Core Algorithms for Combinatorial Optimization
Many-core algorithms for statistical phylogenetics
Many-core applications to online track reconstruction in HEP experiments
Many-Core Architectures: Hardware-Software Optimization and Modeling Techniques
Many-core GPU computing with NVIDIA CUDA
Many-core parallel computing – Can compilers and tools do the heavy lifting?
Many-Core vs. Many-Thread Machines: Stay Away From the Valley
Many-Thread Aware Prefetching Mechanisms for GPGPU Applications
Many-threaded Differential Evolution on the GPU
Many-threaded implementation of differential evolution for the CUDA platform
Manycore high-performance computing in bioinformatics
Manycore processing of repeated k-NN queries over massive moving objects observations
Manycore processing of repeated range queries over massive moving objects observations
MAP-based Brain Tissue Segmentation using Manifold Learning and Hierarchical Max-Flow regularization
Map-reduce as a Programming Model for Custom Computing Machines
Titles: 100
open PDFs: 98
packages: 23