Papers on hgpu.org (.txt-file)
Development of Krylov and AMG linear solvers for large-scale sparse matrices on GPUs

Development of methods for the processing of mining images using genetic algorithms

Development of nonlinear filter bank system for real-time beautification of facial video using GPGPU
Development of Parallel Architectures for Radar/Video Signal Processing Applications

Development of Parallel Computation Tools

Development of Virtual Machine Tool for Simulation and Evaluation

Developmental Directions in Parallel Accelerators

Device Placement Optimization with Reinforcement Learning

Device specialization in heterogeneous multi-GPU environments

Devito: automated fast finite difference computation

DFG Implementation on Multi GPU Cluster with Computation-Communication Overlap

DGEMM on Integer Matrix Multiplication Unit

DGEMM without FP64 Arithmetic – using FP64 Emulation and FP8 Tensor Cores with Ozaki Scheme

Diagnosing Performance Bottlenecks in HPC Applications

Diagnosis, Tuning, and Redesign for Multicore Performance: A Case Study of the Fast Multipole Method

Diagrammatic Determinantal Quantum Monte Carlo Calculations on GPUs

DIANNE: Distributed Artificial Neural Networks for the Internet of Things

Diderot: A Parallel DSL for Image Analysis and Visualization

Different Optimization Strategies and Performance Evaluation of Reduction on Multicore CUDA Architecture

Differential evolution algorithm on the GPU with C-CUDA
Differential Evolution with parallelised objective functions using CUDA

Diffusion Curves: A Vector Representation for Smooth-Shaded Images
Digital beamforming using a GPU

Digital Marbling: a GPU Approach with Precomputed Velocity Field

Digital Signal Processing using Stream High Performance Computing: A 512-input Broadband Correlator for Radio Astronomy

Digitize Your Body and Action in 3-D at Over 10 FPS: Real Time Dense Voxel Reconstruction and Marker-less Motion Tracking via GPU Acceleration

Diplomat: Mapping of multi-kernel applications using a static dataflow abstraction

Direct Communication Methods for Distributed GPUs

Direct deconvolution of radio synthesis images using L1 minimisation

Direct evaluation of NURBS curves and surfaces on the GPU

Direct GPU Compilation and Execution for Host Applications with OpenMP Parallelism

Direct GPU/FPGA Communication Via PCI Express

Direct N-body code on low-power embedded ARM GPUs

Direct N-body Kernels for Multicore Platforms

Direct N-body simulations of globular clusters: (I) Palomar 14

Direct Numeric Simulation of Sheared Convective Boundary Layer Entrainment with GPUs

Direct Numerical Simulation and Large Eddy Simulation on a Turbulent Wall-Bounded Flow Using Lattice Boltzmann Method and Multiple GPUs

Direct numerical simulation of sub-grid structures in gas-solid flow — GPU implementation of macro-scale pseudo-particle modeling

Direct Numerical Simulation of Turbulence on Heterogenous Computer Systems: Architectures, Algorithms, and Applications

Direct Numerical Simulation of Turbulent Flows with Parallel Algorithms for Various Computing Architectures

Direct Self-Consistent Field Computations on GPU Clusters
Direct solution of the Boltzmann equation for a binary mixture on GPUs

Direct Visualization of Particle-Partition of Unity Data

Direct-to-indirect transfer for cinematic relighting

directCell: hybrid systems with tightly coupled accelerators

Directionally Unsplit Hydrodynamic Schemes with Hybrid MPI/OpenMP/GPU Parallelization in AMR

Directive-based Approach to Heterogeneous Computing

Directive-Based Compilers for GPUs

Directive-Based Data Partitioning and Pipelining and Auto-Tuning for High-Performance GPU Computing

Directive-Based Partitioning and Pipelining for Graphical Processing Units

Directive-Based, High-Level Programming and Optimizations for High-Performance Computing with FPGAs

Directives Based Programming of GPU Accelerated Systems

DISC: A Dynamic Shape Compiler for Machine Learning Workloads

Disc: Approximative Nearest Neighbor Search using Ellipsoids for Photon Mapping on GPUs

Discontinuous Galerkin Methods on Graphics Processing Units for Nonlinear Hyperbolic Conservation Laws

Discontinuous Galerkin Time Domain for Maxwell’s equations on GPUs
Discrete Fourier transform on multicore

Discrete Planning Unit Look-ahead Velocity Control Strategy and Parallelization Research based on GPU

Discrete Shearlet Transform on GPU with Applications in Anomaly Detection and Denoising

Discrete Wavelet Transform on Consumer-Level Graphics Hardware

Discrete-event Execution Alternatives on General Purpose Graphical Processing Units (GPGPUs)

Discriminative Convolutional Sum-Product Networks on GPU

Dispersion Simulation and Visualization For Urban Security

Displacement Mapping on the GPU – State of the Art

Dissecting CPU-GPU Unified Physical Memory on AMD MI300A APUs

Dissecting GPU Memory Hierarchy through Microbenchmarking

Dissecting Tensor Cores via Microbenchmarks: Latency, Throughput and Numerical Behaviors

Dissecting the NVIDIA Blackwell Architecture with Microbenchmarks

Dissecting the NVIDIA Hopper Architecture through Microbenchmarking and Multiple Level Analysis

Dissecting the NVidia Turing T4 GPU via Microbenchmarking

Dissecting the NVIDIA Volta GPU Architecture via Microbenchmarking

DISTAL: The Distributed Tensor Algebra Compiler

Distance field transform with an adaptive iteration method

Distance Fields Accelerated with OpenCL

Distance Threshold Similarity Searches on Spatiotemporal Trajectories using GPGPU

DistCL: A Framework for the Distributed Execution of OpenCL Kernels

Distortion correction algorithm for UAV remote sensing image based on CUDA

Distributed Calculations with Algorithmic Skeletons for Heterogeneous Computing Environments

Distributed computer emulation: Using OpenCL framework
Distributed Deep Learning Strategies For Automatic Speech Recognition

Distributed genetic programming on GPUs using CUDA

Distributed GPU Password Cracking Research Project

Distributed GPU Volume Rendering of ASKAP Spectral Data Cubes

Distributed learning of CNNs on heterogeneous CPU/GPU architectures

Distributed Massive Model Rendering

Distributed multi-node, multi-GPU, heterogeneous system for 3D image reconstruction in Electrical Capacitance Tomography – network performance and application analysis

Distributed OpenCL Distributing OpenCL Platform on Network Scale

Distributed OpenCL: a platform for distributed, heterogeneous computing for domain scientists

Distributed OpenMP Offloading of OpenMC on Intel GPU MAX Accelerators

Distributed Password Cracking Platform

Distributed Texture Memory in a Multi-GPU Environment

Distributed time, conservative parallel logic simulation on GPUs

Distributed Training Large-Scale Deep Architectures

Distributed Training of Deep Neuronal Networks: Theoretical and Practical Limits of Parallel Scalability

Distributed wideband software-defined radio receiver for heterogeneous systems

Distributed-Shared CUDA: Virtualization of Large-Scale GPU Systems for Programmability and Reliability

Distributed, combined CPU and GPU profiling within HPX using APEX

Titles: 100
open PDFs: 94
packages: 15
