Papers on hgpu.org (.txt-file)
Deep Neural Networks to Enable Real-time Multimessenger Astrophysics

Deep Roots: Improving CNN Efficiency with Hierarchical Filter Groups

Deep Shadow Maps from Volumetric Data on the GPU

Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

Deep Tensor Convolution on Multicores

Deep Voice 3: 2000-Speaker Neural Text-to-Speech

Deep Voice: Real-time Neural Text-to-Speech

Deep-Edge: An Efficient Framework for Deep Learning Model Update on Heterogeneous Edge

Deep, Big, Simple Neural Nets for Handwritten Digit Recognition

Deep, Dense, and Low-Rank Gaussian Conditional Random Fields

DeepAxe: A Framework for Exploration of Approximation and Reliability Trade-offs in DNN Accelerators

DeepBach: a Steerable Model for Bach chorales generation

DeepBE: Learning Deep Binary Encoding for Multi-Label Classification

DeepCompile: A Compiler-Driven Approach to Optimizing Distributed Deep Learning Training

DeepDSL: A Compilation-based Domain-Specific Language for Deep Learning

DeeperLab: Single-Shot Image Parser

DeepfakeUCL: Deepfake Detection via Unsupervised Contrastive Learning

DeepFont: Identify Your Font from An Image

DeepLearningKit – an GPU Optimized Deep Learning Framework for Apple’s iOS, OS X and tvOS developed in Metal and Swift

DeepLearningKit – an Open Source Deep Learning Framework for Apple’s iOS, OS X and tvOS developed in Metal and Swift

DeepMetabolism: A Deep Learning System to Predict Phenotype from Genome Sequencing

DeepMon: Mobile GPU-based Deep Learning Framework for Continuous Vision Applications

DeepProf: Performance Analysis for Deep Learning Applications via Mining GPU Execution Patterns

DeepPy: Pythonic deep learning

DeepSeek-Coder: When the Large Language Model Meets Programming – The Rise of Code Intelligence

DeepSmith: Compiler Fuzzing through Deep Learning

DeepSpark: Spark-Based Deep Learning Supporting Asynchronous Updates and Caffe Compatibility

DeepSpeech: Scaling up end-to-end speech recognition

DeepSZ: A Novel Framework to Compress Deep Neural Networks by Using Error-Bounded Lossy Compression

DeepX: A Software Accelerator for Low-Power Deep Learning Inference on Mobile Devices

DEF-G: Declarative Framework for GPU Environment

Defocus Magnification with CUDA

Deformable model collision detection using A-buffer

Deformable object simulation in virtual environment

Deformation modeling using global medial representation structures and evaluation by biset mesh matching

Deformation of skeleton based implicit objects

Deforming a High-Resolution Mesh in Real-Time by Mapping onto a Low-Resolution Physical Model

Delaunay Triangulation in R3 on the GPU

Delivering Performance-Portable Stencil Computations on CPUs and GPUs Using Bricks

Delta-stepping: a parallelizable shortest path algorithm

DEM based simulation of concrete structures on GPU

Democratic Population Decisions Result in Robust Policy-Gradient Learning: A Parametric Study with GPU Simulations

Democratizing General Purpose GPU Programming through OpenCL and Scala

Demonstrating Self-Learning Algorithm Adaptivity in a Hardware-Oblivious Database Engine

Demystifying Cost-Efficiency in LLM Serving over Heterogeneous GPUs

Demystifying Dependency Bugs in Deep Learning Stack

Demystifying GPU microarchitecture through microbenchmarking

Demystifying NCCL: An In-depth Analysis of GPU Communication Protocols and Algorithms

Demystifying the MLPerf Benchmark Suite

Demystifying the Nvidia Ampere Architecture through Microbenchmarking and Instruction-level Analysis

Denoising Volumetric Data on GPU

Dense and sparse parallel linear algebra algorithms on graphics processing units

Dense Arithmetic over Finite Fields with the CUMODP Library

Dense Dynamic Programming on Multi GPU

Dense Linear Algebra on Distributed Heterogeneous Hardware with a Symbolic DAG Approach

Dense linear algebra solvers for multicore with GPU accelerators

Dense Matrix Algebra on the GPU

Dense Matrix Computation on a Heterogenous Architecture: A Block Synchronous Approach

Dense optical flow by iterative local window registration

Dense photometric stereo reconstruction on many core GPUs

Dense Photometric Stereo: A Markov Random Field Approach

Dense point trajectories by GPU-accelerated large displacement optical flow

Dense Real-Time Mapping of Object-Class Semantics from RGB-D Video

Dense Symmetric Indefinite Factorization on GPU Accelerated Architectures

DenseCut: Densely Connected CRFs for Realtime GrabCut

Density Estimations for Approximate Query Processing on SIMD Architectures

Density functional theory calculation on many-cores hybrid central processing unit-graphic processing unit architectures

Density Functional Theory calculation on many-cores hybrid CPU-GPU architectures

Density-based clustering using graphics processors

Density-based parallel skin lesion border detection with webCL

Deploying Graph Algorithms on GPUs: an Adaptive Solution

Deployment of CPU and GPU-based genetic programming on heterogeneous devices

Deployment of parallel linear genetic programming using GPUs on PC and video game console platforms

Depth Estimation using Open Compute Language (OpenCL)

Depth Images: Representations and Real-Time Rendering

Depth Map Based Superresolution Method in 3D Reconstruction

Depth map enhanced macroblock partitioning for H.264 video coding of computer graphics content

Depth-Dependent Halos: Illustrative Rendering of Dense Line Data

Depth-First Search versus Jurema Search on GPU Branch-and-Bound Algorithms: a case study

Depth-of-Field Blur Effects for First-Person Navigation in Virtual Environments

Deriving Shape Grammars on the GPU

Descend: A Safe GPU Systems Programming Language

Design and Analysis of Soft-Error Resilience Mechanisms for GPU Register File

Design and Development of an Efficient H. 264 Video Encoder for CPU/GPU using OpenCL

Design and Development of Optical Flow Based Obstacle Avoidance Using CUDA

Design and evaluation of a parallel k-nearest neighbor algorithm on CUDA-enabled GPU

Design and Evaluation of Scalable Concurrent Queues for Many-Core Architectures

Design and implementation of a high-performance stream-based computing platform on multigenerational GPUs

Design and Implementation of a PTX Emulation Library

Design and Implementation of Centrally-Coordinated Peer-to-Peer Live-streaming

Design and Implementation of CNN-FPGA accelerator based on Open Computing Language

Design and Implementation of GPU-Based Prim’s Algorithm

Design and implementation of MPEG audio layer III decoder using graphics processing units

Design and Implementation of ShenWei Universal C/C++

Design and implementation of software-managed caches for multicores with local memory

Design and Implementation of the Futhark Programming Language

Titles: 100
open PDFs: 92
packages: 26
