Papers on hgpu.org (.txt-file)
Towards Portable Performance for Explicit Hydrodynamics Codes
Towards Porting a Real-World Seismological Application to the Intel MIC Architecture
Towards Predictable Real-Time Performance on Multi-Core Platforms
Towards Rapid Prototyping of Parallel and HPC Applications (GPU Focus)
Towards real time 2D to 3D registration for ultrasound-guided endoscopic and laparoscopic procedures
Towards real time 3D tracking and reconstruction on a GPU using Monte Carlo simulations
Towards real time vision based UUV navigation using GPU technology
Towards real-time radiation therapy: GPU accelerated superposition/convolution
Towards real-time tomography: Fast reconstruction algorithms and GPU implementation
Towards reverse engineering the brain: Modeling abstractions and simulation frameworks
Towards robust automatic detection of vulnerable road users: monocular pedestrian tracking from a moving vehicle
Towards scalar synchronization in SIMT architectures
Towards shared memory consistency models for GPUs
Towards solving the Table Maker’s Dilemma on GPU
Towards systematic exploration of tradeoffs for medical image registration on heterogeneous platforms
Towards Understanding and Mitigating Memory-Access Challenges in Computing Systems
Towards Unified Analysis of GPU Consistency
Towards Unified INT8 Training for Convolutional Neural Network
Towards user transparent parallel multimedia computing on GPU-clusters
Towards Utilizing GPUs in Information Visualization: A Model and Implementation of Image-Space Operations
Towards Utilizing Remote GPUs for CUDA Program Execution
TPU-KNN: K Nearest Neighbor Search at Peak FLOP/s
Track finding in ATLAS using GPUs
Tracking 3d Pose of Rigid Object by Sparse Template Matching
Tracking and Clustering Salient Features in Image Sequences
Tracking humans interacting with the environment using efficient hierarchical sampling and layered observation models
Tracking Many Solution Paths of a Polynomial Homotopy on a Graphics Processing Unit
Tradeoff analysis and optimization of power delivery networks with on-chip voltage regulation
Tradeoffs in designing accelerator architectures for visual computing
Trainable Nonlinear Reaction Diffusion: A Flexible Framework for Fast and Effective Image Restoration
Training a Feedback Loop for Hand Pose Estimation
Training a Vision Transformer from scratch in less than 24 hours with 1 GPU
Training DNN Models over Heterogeneous Clusters with Optimal Performance
Training Logistic Regression and SVM on 200GB Data Using b-Bit Minwise Hashing and Comparisons with Vowpal Wabbit (VW)
Training Neural Networks Without Gradients: A Scalable ADMM Approach
Transformation of CPU-based Applications To Leverage on Graphics Processors using CUDA
TransAxx: Efficient Transformers with Approximate Computing
TransCAIP: A Live 3D TV System Using a Camera Array and an Integral Photography Display with Interactive Control of Viewing Parameters
Transfer Time Reduction of Data Transfers between CPU and GPU
Transform Coding for Hardware-accelerated Volume Rendering
Transformation of Scientific Algorithms to Parallel Computing Code: Single GPU and MPI multi GPU Backends with Subdomain Support
Transformations of High-Level Synthesis Codes for High-Performance Computing
Transforming and Optimizing Irregular Applications for Parallel Architectures
Transforming C OpenMP Programs for Verification in CIVL
Translating GPU binaries to tiered SIMD architectures with Ocelot
Translating OpenMP Device Constructs to OpenCL using Unnecessary Data Transfer Elimination
Translation-invariant two-dimensional discrete wavelet transform on graphics processing units
Transparent Acceleration for Heterogeneous Platforms With Compilation to OpenCL
Transparent Acceleration of Java-based Deep Learning Engines
Transparent Accelerator Migration in a Virtualized GPU Environment
Transparent Checkpoint-Restart for Hardware-Accelerated 3D Graphics
Transparent Checkpointing for OpenGL Applications on GPUs
Transparent Compiler and Runtime Specializations for Accelerating Managed Languages on FPGAs
Transparent CPU-GPU Collaboration for Data-Parallel Kernels on Heterogeneous Systems
Transparent FPGA Acceleration with TensorFlow
Transparent use of Java objects on the GPU in the JaMP/OpenMP framework
Trapping of giant-planet cores – I. vortex aided trapping at the outer dead zone edge
Tree Structured Analysis on GPU Power Study
Treecode and fast multipole method for N-body simulation with CUDA
TREES: A CPU/GPU Task-Parallel Runtime with Explicit Epoch Synchronization
Trellis: Portability Across Architectures with a High-level Framework
Tri-Hybrid Computational Fluid Dynamics on DOE’s Cray XK7, Titan
Triangular matrix inversion on Graphics Processing Unit
Triangular mesh simplification on the GPU
Tridiagonalization of a dense symmetric matrix on multiple GPUs and its application to symmetric eigenvalue problems
Trie Compression for GPU Accelerated Multi-Pattern Matching
TrimZero: A Torch Recurrent Module for Efficient Natural Language Processing
Triton: An Intermediate Language and Compiler for Tiled Neural Network Computations
True 4D Image Denoising on the GPU
TTC: A Tensor Transposition Compiler for Multiple Architectures
TuCCompi: A Multi-Layer Programming Model for Heterogeneous Systems with Auto-Tuning Capabilities
Tuned and asynchronous stencil kernels for CPU/GPU systems (thesis)
Tuned and GPU-accelerated parallel data mining from comparable corpora
Tuned and wildly asynchronous stencil kernels for hybrid CPU/GPU systems
Tuning a Finite Difference Computation for Parallel Vector Processors
Tuning Manifold Harmonics Filters
Tuning Stencil Codes in OpenCL for FPGAs
Tuning Streamed Applications on Intel Xeon Phi: A Machine Learning Based Approach
Turbo Bayesian Compressed Sensing
Tutorial 3: Methodologies and Performance Impacts of General Purpose Computing on GPUs
TVM: An Automated End-to-End Optimizing Compiler for Deep Learning
TVM: End-to-End Optimization Stack for Deep Learning
Two Algorithms for Sorting On Heterogeneous Clusters
Two Approaches to Particle Simulation: OpenMPI and CUDA
Two improved GPU acceleration strategies for force-directed graph layout
Two Level Approach to Efficient Visualization of Protein Dynamics
Two Simple Single-pass GPU methods for Multi-channel Surface Voxelization of Dynamic Scenes
Two Stage Data Mining Technique for Fast Monsoon Onset Prediction
Two-electron integral evaluation on the graphics processor unit
Two-fluid compressible simulations on GPU cluster
Two-Level Approach to Efficient Visualization of Protein Dynamics
Two-stage compression for fast volume rendering of time-varying scalar data
Two-way partitioning of a recursive Gaussian filter in CUDA
Two-Way Real Time Fluid Simulation Using a Heterogeneous Multicore CPU and GPU Architecture
Type-safe Runtime Code Generation: Accelerate to LLVM
Titles: 100
open PDFs: 89
packages: 20