Papers on hgpu.org (.txt-file)
Directive-based Approach to Heterogeneous Computing
Directive-Based Compilers for GPUs
Directive-Based Data Partitioning and Pipelining and Auto-Tuning for High-Performance GPU Computing
Directive-Based Partitioning and Pipelining for Graphical Processing Units
Directive-Based, High-Level Programming and Optimizations for High-Performance Computing with FPGAs
Directives Based Programming of GPU Accelerated Systems
DISC: A Dynamic Shape Compiler for Machine Learning Workloads
Disc: Approximative Nearest Neighbor Search using Ellipsoids for Photon Mapping on GPUs
Discontinuous Galerkin Methods on Graphics Processing Units for Nonlinear Hyperbolic Conservation Laws
Discontinuous Galerkin Time Domain for Maxwell’s equations on GPUs
Discrete Fourier Transform on Multicore
Discrete Planning Unit Look-ahead Velocity Control Strategy and Parallelization Research based on GPU
Discrete Shearlet Transform on GPU with Applications in Anomaly Detection and Denoising
Discrete Wavelet Transform on Consumer-Level Graphics Hardware
Discrete-event Execution Alternatives on General Purpose Graphical Processing Units (GPGPUs)
Discriminative Convolutional Sum-Product Networks on GPU
Dispersion Simulation and Visualization For Urban Security
Displacement Mapping on the GPU – State of the Art
Dissecting GPU Memory Hierarchy through Microbenchmarking
Dissecting Tensor Cores via Microbenchmarks: Latency, Throughput and Numerical Behaviors
Dissecting the NVIDIA Hopper Architecture through Microbenchmarking and Multiple Level Analysis
Dissecting the NVidia Turing T4 GPU via Microbenchmarking
Dissecting the NVIDIA Volta GPU Architecture via Microbenchmarking
DISTAL: The Distributed Tensor Algebra Compiler
Distance field transform with an adaptive iteration method
Distance Fields Accelerated with OpenCL
Distance Threshold Similarity Searches on Spatiotemporal Trajectories using GPGPU
DistCL: A Framework for the Distributed Execution of OpenCL Kernels
Distortion correction algorithm for UAV remote sensing image based on CUDA
Distributed Calculations with Algorithmic Skeletons for Heterogeneous Computing Environments
Distributed computer emulation: Using OpenCL framework
Distributed Deep Learning Strategies For Automatic Speech Recognition
Distributed genetic programming on GPUs using CUDA
Distributed GPU Password Cracking Research Project
Distributed GPU Volume Rendering of ASKAP Spectral Data Cubes
Distributed learning of CNNs on heterogeneous CPU/GPU architectures
Distributed Massive Model Rendering
Distributed multi-node, multi-GPU, heterogeneous system for 3D image reconstruction in Electrical Capacitance Tomography – network performance and application analysis
Distributed OpenCL Distributing OpenCL Platform on Network Scale
Distributed OpenCL: a platform for distributed, heterogeneous computing for domain scientists
Distributed OpenMP Offloading of OpenMC on Intel GPU MAX Accelerators
Distributed Password Cracking Platform
Distributed Texture Memory in a Multi-GPU Environment
Distributed time, conservative parallel logic simulation on GPUs
Distributed Training Large-Scale Deep Architectures
Distributed Training of Deep Neuronal Networks: Theoretical and Practical Limits of Parallel Scalability
Distributed wideband software-defined radio receiver for heterogeneous systems
Distributed-Shared CUDA: Virtualization of Large-Scale GPU Systems for Programmability and Reliability
Distributed, combined CPU and GPU profiling within HPX using APEX
Divergence Analysis and Optimizations
Divergence Analysis with Affine Constraints
Divide and Conquer G-Buffer Ray Tracing
Divide-and-Conquer 3D Convex Hulls on the GPU
DiVinE-CUDA – A Tool for GPU Accelerated LTL Model Checking
DjiNN and Tonic: DNN as a Service and Its Implications for Future Warehouse Scale Computers
DL: A data layout transformation system for heterogeneous computing
DLIO: A Data-Centric Benchmark for Scientific Deep Learning Applications
DLL: A Blazing Fast Deep Neural Network Library
DMA-Assisted, Intranode Communication in GPU Accelerated Systems
dMath: A Scalable Linear Algebra and Math Library for Heterogeneous GP-GPU Architectures
dMath: Distributed Linear Algebra for DL
DNA sequence alignment: An assignment for OpenMP, MPI, and CUDA/OpenCL
DNN is not all you need: Parallelizing Non-Neural ML Algorithms on Ultra-Low-Power IoT Processors
DNNVM: End-to-End Compiler Leveraging Heterogeneous Optimizations on FPGA-based CNN Accelerators
Doctor AI: Interpretable Deep Learning for Modeling Electronic Health Records
Document Classification Using KNN on GPU
Document Image Binarization Using Image Segmentation Algorithm in Parallel Environment
Document Stream Clustering using GPUs
Dogwild! – Distributed Hogwild for CPU & GPU
Domain Decomposition method on GPU cluster
Domain Specific Languages for High Performance Computing
Domain-Specific Acceleration and Auto-Parallelization of Legacy Scientific Code in FORTRAN 77 using Source-to-Source Compilation
Domain-Specific Code Language Models: Unraveling the Potential for HPC Codes and Tasks
Domain-Specific Languages for Heterogeneous Parallel Computing
Domain-Specific On-Device Object Detection Method
Domain-Specific Optimizations Supporting Real-Time Image Compression
DOPA: GPU-based protein alignment using database and memory access optimizations
dOpenCL – Evaluation of an API-Forwarding Implementation
Dopia: Online Parallelism Management for Integrated CPU/GPU Architectures
Double-Precision Floating-Point Data Visualizations Using Vulkan API
Double-precision FPUs in High-Performance Computing: an Embarrassment of Riches?
Dr.Jit: A Just-In-Time Compiler for Differentiable Rendering
Dragon-Alpha&cu32: A Java-based Tensor Computing Framework With its High-Performance CUDA Library
DRAM Scheduling Policy for GPGPU Architectures Based on a Potential Function
DRiVE: An Example of Distributed Rendering in Virtual Environments
Dropbear: Machine Learning Marketplaces made Trustworthy with Byzantine Model Agreement
Drug Drug Interaction Extraction from Biomedical Literature Using Syntax Convolutional Neural Network
DSDP: A Blind Docking Strategy Accelerated by GPUs
DSPSR: Digital Signal Processing Software for Pulsar Astronomy
DTAM: Dense tracking and mapping in real-time
Dual-RBF based surface reconstruction
Duality based optical flow algorithms with applications
DUODECIM – a structure for point scan compression and rendering
Dust-Dust Collisional Charging and Lightning in Protoplanetary Discs
Dwarfs on Accelerators: Enhancing OpenCL Benchmarking for Heterogeneous Computing Architectures
Dymaxion: Optimizing Memory Access Patterns for Heterogeneous Systems
Dymaxion++: A Directive-based API to Optimize Data Layout and Memory Mapping for Heterogeneous Systems
Titles: 100
open PDFs: 97
packages: 22