Papers on hgpu.org (.txt-file)
Java with Auto-Parallelization on Graphics Coprocessing Architecture
JAX, M.D.: End-to-End Differentiable, Hardware Accelerated, Molecular Dynamics in Pure Python
JCUDA: A Programmer-Friendly Interface for Accelerating Java Programs with CUDA
JIT-Compilation for Interactive Scientific Visualization
Jit4OpenCL: a compiler from Python to OpenCL
Job Parallelism using Graphical Processing Unit individual Multi-Processors and Highly Localised Memory
Job Parallelism using Graphical Processing Unit Individual Multi-Processors and Localised Memory
Join Algorithms on GPUs: A Revisit After Seven Years
Join Execution Using Fragmented Columnar Indices on GPU and MIC
Joint Forces: From Multithreaded Programming to GPU Computing
Joint-MAP Tomographic Reconstruction with Patch Similarity Based Mixture Prior Model
JPEG 2000 Wireless Image Transmission System using Encryption Domain Authentication
JPEG-GPU: a GPGPU Implementation of JPEG Core Coding Systems
JSDoop and TensorFlow.js: Volunteer Distributed Web Browser-Based Neural Network Training
Julia as a unifying end-to-end workflow language on the Frontier exascale system
Jump flooding in GPU with applications to Voronoi diagram and distance transform
Just-in-time Acceleration of JavaScript
Just-in-Time Compilation and Link-Time Optimization for OpenMP Target Offloading
K-Means on Commodity GPUs with CUDA
K-nearest neighbor search: Fast GPU-based implementations and application to high-dimensional feature matching
k+-buffer: Fragment Synchronized k-buffer
K3 Moore’s Law in the Era of GPU Computing
KAdvice: infering synchronization patterns from an existing codebase
KAISA: An Adaptive Second-order Optimizer Framework for Deep Neural Networks
Kalman Filter Tracking on Parallel Architectures
Kalman-Filter-Based Particle Tracking on Parallel Architectures at Hadron Colliders
kANN on the GPU with Shifted Sorting
Kapre: On-GPU Audio Preprocessing Layers for a Quick Implementation of Deep Neural Network Models with Keras
Kargus: a Highly-scalable Software-based Intrusion Detection System
KBLAS: An Optimized Library for Dense Matrix-Vector Multiplication on GPU Accelerators
Kd-Jump: a Path-Preserving Stackless Traversal for Faster Isosurface Raytracing on GPUs
KD-tree acceleration structures for a GPU raytracer
Kd-tree Based N-Body Simulations with Volume-Mass Heuristic on the GPU
kEDM: A Performance-portable Implementation of Empirical Dynamic Modeling using Kokkos
Keeneland: Bringing heterogeneous GPU computing to the computational science community
Kernel Fusion: An Effective Method for Better Power Efficiency on Multithreaded GPU
Kernel Launcher: C++ Library for Optimal-Performance Portable CUDA Applications
Kernel Specialization for Improved Adaptability and Performance on Graphics Processing Units (GPUs)
Kernel Tuner: A search-optimizing GPU code auto-tuner
Kernel Weaver: Automatically Fusing Database Primitives for Efficient GPU Computation
Kernel-as-a-Service: A Serverless Interface to GPUs
Kernelet: High-Throughput GPU Kernel Executions with Dynamic Slicing and Scheduling
KERNELGEN – A Toolchain for Automatic GPU-centric Applications Porting
KernelGen – the design and implementation of a next generation compiler platform for accelerating numerical models on GPUs
KernelInterceptor: automating GPU kernel verification by intercepting kernels and their parameters
Kernelized Renyi distance for speaker recognition
KeSCo: Compiler-based Kernel Scheduling for Multi-task GPU Applications
Key derivation functions and their GPU implementation
Key Reconciliation with Low-Density Parity-Check Codes for Long-Distance Quantum Cryptography
Keynote address: Immersive exploration of large datasets
KFusion: Obtaining Modularity and Performance with Regards to General Purpose GPU Computing and Co-processors
Kinematic Modelling of Disc Galaxies using Graphics Processing Units
Kinetics of liquid-solid phase transition in large nickel clusters
Kite: Braided Parallelism for Heterogeneous Systems
KLARAPTOR: A Tool for Dynamically Finding Optimal Kernel Launch Parameters Targeting CUDA Programs
kNN Query Processing in Metric Spaces Using GPUs
Kokkos: Enabling performance portability across manycore architectures
Krylov Subspace Accelerated Algebraic Multigrid for Mimetic Finite Differences on GPUs
KUDA: GPU Accelerated Split Race Checker
LAMDA: Learning-Assisted Multi-Stage Autotuning for FPGA Design Closure
LAMMPS’ PPPM Long-Range Solver for the Second Generation Xeon Phi
LAMMPScuda – a new GPU accelerated Molecular Dynamics Simulations Package and its Application to Ion-Conducting Glasses
Landau Gauge Fixing on GPUs and String Tension
Langevin dynamics simulations of biomolecules on graphics processors
Language Modeling with Gated Convolutional Networks
Language virtualization for heterogeneous parallel computing
Large calculation of the flow over a hypersonic vehicle using a GPU
Large data visualization on distributed memory multi-GPU clusters
Large Integer Arithmetic in GPU for Cryptography
Large neighborhood local search optimization on graphics processing units
Large scale 3D shape retrieval by exploiting multi-core and GPU
Large Scale Artificial Neural Network Training Using Multi-GPUs
Large Scale Bioinformatics Data Mining with Parallel Genetic Programming on Graphics Processing Units
Large Scale DNA Sequence Alignment and Kernel Method Implemented with GPUs
Large Scale Finite Element Analysis Using GPU Parallel Computing
Large Scale GPU Accelerated PPMLR-MHD Simulations for Space Weather Forecast
Large Scale GPU Based Simulations of Turbulent Bubbly Flow in a Square Duct
Large Scale Language Modeling: Converging on 40GB of Text in Four Hours
Large Scale Monte Carlo Tree Search on GPU
Large scale parallel state space search utilizing graphics processing units and solid state disks
Large Scale Physical Modeling Sound Synthesis
Large Scale Plane Wave Pseudopotential Density Functional Theory Calculations on GPU Clusters
Large Scale Simulations of the Euler Equations on GPU Clusters
Large Speed Increase Using Novel GPU Based Algorithms to Simulate Cardiac Excitation Waves in a Rabbit Ventricle
Large steps in GPU-based deformable bodies simulation
Large-eddy simulations with ClimateMachine: a new open-source code for atmospheric simulations on GPUs and CPUs
Large-Scale Compute-Intensive Analysis via a Combined In-Situ and Co-Scheduling Workflow Approach
Large-Scale Data Computing Performance Comparisons on SYCL Heterogeneous Parallel Processing Layer Implementations
Large-Scale Deep Learning on the YFCC100M Dataset
Large-scale deep unsupervised learning using graphics processors
Large-Scale DNS of Gas-Solid Flow on Mole-8.5
Large-scale ferrofluid simulations on graphics processing units
Large-scale FFT on GPU clusters
Titles: 100
open PDFs: 92
packages: 29