Papers on hgpu.org
TrimZero: A Torch Recurrent Module for Efficient Natural Language Processing
Triton: An Intermediate Language and Compiler for Tiled Neural Network Computations
True 4D Image Denoising on the GPU
TTC: A Tensor Transposition Compiler for Multiple Architectures
TuCCompi: A Multi-Layer Programing Model for Heterogeneous Systems with Auto-Tuning Capabilities
Tuned and asynchronous stencil kernels for CPU/GPU systems (thesis)
Tuned and GPU-accelerated parallel data mining from comparable corpora
Tuned and wildly asynchronous stencil kernels for hybrid CPU/GPU systems
Tuning a Finite Difference Computation for Parallel Vector Processors
Tuning Manifold Harmonics Filters
Tuning Stencil Codes in OpenCL for FPGAs
Tuning Streamed Applications on Intel Xeon Phi: A Machine Learning Based Approach
Turbo Bayesian Compressed Sensing
Tutorial 3: Methodologies and Performance Impacts of General Purpose Computing on GPUs
TVM: An Automated End-to-End Optimizing Compiler for Deep Learning
TVM: End-to-End Optimization Stack for Deep Learning
Two Algorithms for Sorting On Heterogeneous Clusters
Two Approaches to Particle Simulation: OpenMPI and CUDA
Two improved GPU acceleration strategies for force-directed graph layout
Two Level Approach to Efficient Visualization of Protein Dynamics
Two Simple Single-pass GPU methods for Multi-channel Surface Voxelization of Dynamic Scenes
Two Stage Data Mining Technique for Fast Monsoon Onset Prediction
Two-electron integral evaluation on the graphics processor unit
Two-fluid compressible simulations on GPU cluster
Two-Level Approach to Efficient Visualization of Protein Dynamics
Two-stage compression for fast volume rendering of time-varying scalar data
Two-way partitioning of a recursive Gaussian filter in CUDA
Two-Way Real Time Fluid Simulation Using a Heterogeneous Multicore CPU and GPU Architecture
Type-safe Runtime Code Generation: Accelerate to LLVM
U-Net: Convolutional Networks for Biomedical Image Segmentation
UAV Path Planning with Parallel Genetic Algorithms on CUDA Architecture
uBench: Performance Impact of CUDA Block Geometry
UberFlow: a GPU-based particle engine
Ubiquitous Parallel Computing from Berkeley, Illinois, and Stanford
UCHPC – UnConventional High Performance Computing for Finite Element Simulations
Ultra-Fast Detection of Higher-Order Epistatic Interactions on GPUs
Ultra-Fast Displaying Spectral Domain Optical Doppler Tomography System Using a Graphics Processing Unit
Ultra-fast FFT protein docking on graphics processors
Ultra-Fast Hybrid CPU-GPU Multiple Scatter Simulation for 3D PET
Ultra-fast treatment plan optimization for volumetric modulated arc therapy (VMAT)
Ultra-low latency recurrent neural network inference on FPGAs for physics applications with hls4ml
Ultrasound goes GPU: real-time simulation using CUDA
Ultrasound Image Simulation with GPU-based Ray Tracing
Uncertainty-Aware Guided Volume Segmentation
Uncluttering Graph Layouts Using Anisotropic Diffusion and Mass Transport
Uncontracted Rys Quadrature Implementation of up to G Functions on Graphical Processing Units
Under the Hood of SYCL – An Initial Performance Analysis With an Unstructured-mesh CFD Application
Understanding and Modeling the Synchronization Cost in the GPU Architecture
Understanding Data Movement in AMD Multi-GPU Systems with Infinity Fabric
Understanding GPU Programming for Statistical Computation: Studies in Massively Parallel Massive Mixtures
Understanding GPU Triggering APIs for MPI+X Communication
Understanding GPU-Based Lossy Compression for Extreme-Scale Cosmological Simulations
Understanding Latency Hiding on GPUs
Understanding Performance Portability of Bioinformatics Applications in SYCL on an NVIDIA GPU
Understanding Protein Dynamics with L1-Regularized Reversible Hidden Markov Models
Understanding software approaches for GPGPU reliability
Understanding the Costs of Many-Task Computing Workloads on Intel Xeon Phi Coprocessors
Understanding the design trade-offs among current multicore systems for numerical computations
Understanding the efficiency of GPU algorithms for matrix-matrix multiplication
Understanding the efficiency of ray traversal on GPUs
Understanding the impact of CUDA tuning techniques for Fermi
Understanding the Impact of Hybrid Programming on Software Energy Efficiency
Understanding the Impact of Input Entropy on FPU, CPU, and GPU Power
Understanding the ISA impact on GPU Architecture
Understanding the Performance of HPC Applications
Understanding the Power of Evolutionary Computation for GPU Code Optimization
Understanding the SIMD Efficiency of Graph Traversal on GPU
Understanding the Topics and Challenges of GPU Programming by Classifying and Analyzing Stack Overflow Posts
Unfolding and Shrinking Neural Machine Translation Ensembles
UNICORN: A Bulk Synchronous Programming Model, Framework and Runtime for Hybrid CPU-GPU Clusters
Unified – A Sharp Turn in the Latest Era of Graphic Processors
Unified Deep Learning with CPU, GPU, and FPGA Technologies
Unified Development for Mixed Multi-GPU and Multi-Coprocessor Environments using a Lightweight Runtime Environment
Unified Parallel C for GPU Clusters: Language Extensions and Compiler Implementation
Unified Particle Physics for Real-Time Applications
Unified Shader Programming in C++
Unified Shared Memory: Friend or Foe?
Unified system of code transformation and execution for heterogeneous multi-core architectures
Unified Tables for Exponential and Logarithm Families
UniFL: Accelerating Federated Learning Using Heterogeneous Hardware Under a Unified Framework
Uniform partitioning of Monte Carlo radiosity on GPUs
Unifying stream based and reconfigurable computing to design application accelerators
Unleashing the Power of Distributed CPU/GPU Architectures: Massive Astronomical Data Analysis and Visualization case study
Unlocking Bandwidth for GPUs in CC-NUMA Systems
Unsafe Floating-point to Unsigned Integer Casting Check for GPU Programs
Unstructured grid applications on GPU: performance analysis and improvement
Unsupervised Asset Cluster Analysis Implemented with Parallel Genetic Algorithms on the NVIDIA CUDA Platform
Unsupervised Deep Learning of Incompressible Fluid Dynamics
Unsupervised Markovian Segmentation on Graphics Hardware
Up to 700k GPU cores, Kepler, and the Exascale future for simulations of star clusters around black holes
UPC on MIC: Early Experiences with Native and Symmetric Modes
Urban Regional Seismic Damage Prediction Based On GPU-CPU Hybrid Computing
Usable assembly language for GPUs: a success story
Use NVIDIA CUDA technology to create genetic algorithms with extensive population
Titles: 100
Open PDFs: 91
Packages: 21