Papers on hgpu.org (.txt-file)
Towards Efficient Risk Quantification-Using GPUs and Variance Reduction Technique

Towards energy efficiency and productivity for decision making in mobile robot navigation

Towards Enhancing Performance, Programmability, and Portability in Heterogeneous Computing

Towards fast and certified multiple-precision libraries

Towards Faster Cloth Simulation: Examining the Preconditioned Conjugate Gradient

Towards fully user transparent task and data parallel image processing

Towards global composition of performance-aware components for GPU-based systems

Towards Good Practices for Very Deep Two-Stream ConvNets

Towards GPGPU Assisted Computing in Virtualized Environments

Towards GPU Parallelism Abstractions in Rust: A Case Study with Linear Pipelines

Towards GPU-Accelerated Large-Scale Graph Processing in the Cloud

Towards Green Computing: A Survey of Performance and Energy Efficiency of Different Platforms using OpenCL

Towards High Performance Java-based Deep Learning Frameworks

Towards High Speed Aerial Tracking of Agile Targets

Towards High-Performance and Cost-Effective Distributed Storage Systems with Information Dispersal Algorithms

Towards Improving Programmability of Heterogeneous Parallel Architectures

Towards Intelligent Runtime Framework for Distributed Heterogeneous Systems

Towards Interactive Visual Exploration of Parallel Programs using a Domain-specific Language

Towards Large-Scale Molecular Dynamics Simulations on Graphics Processors

Towards large-scale network analytics

Towards Lattice Quantum Chromodynamics on FPGA devices

Towards making the most of NLP-based device mapping optimization for OpenCL kernels

Towards Memory-Efficient Answering of Tree-Shaped SPARQL Queries using GPUs

Towards metaprogramming for parallel systems on a chip

Towards microsecond biological molecular dynamics simulations on hybrid processors

Towards Modeling Energy Consumption of Xeon Phi

Towards multi-GPU support for visualization

Towards Multi-GPU Support in the Marrow Skeleton Framework

Towards On-Chip Optical FFTs for Convolutional Neural Networks

Towards On-Line Digital Doubles

Towards paradisEO-MO-GPU: a framework for GPU-based local search metaheuristics

Towards Parallel Programming Models for Predictability

Towards Performance Portable Programming for Distributed Heterogeneous Systems

Towards Performance-Aware Allocation for Accelerated Machine Learning on GPU-SSD Systems

Towards Performance-Portable, Scalable, and Convenient Linear Algebra

Towards Portable Performance for Explicit Hydrodynamics Codes

Towards Porting a Real-World Seismological Application to the Intel MIC Architecture

Towards Predictable Real-Time Performance on Multi-Core Platforms

Towards Rapid Prototyping of Parallel and HPC Applications (GPU Focus)

Towards real time 2D to 3D registration for ultrasound-guided endoscopic and laparoscopic procedures

Towards real time 3D tracking and reconstruction on a GPU using Monte Carlo simulations

Towards real time vision based UUV navigation using GPU technology

Towards real-time radiation therapy: GPU accelerated superposition/convolution

Towards real-time tomography: Fast reconstruction algorithms and GPU implementation

Towards reverse engineering the brain: Modeling abstractions and simulation frameworks

Towards Robust Agentic CUDA Kernel Benchmarking, Verification, and Optimization

Towards robust automatic detection of vulnerable road users: monocular pedestrian tracking from a moving vehicle

Towards scalar synchronization in SIMT architectures

Towards shared memory consistency models for GPUs

Towards solving the Table Maker’s Dilemma on GPU

Towards Studying the Effect of Compiler Optimizations and Software Randomization on GPU Reliability

Towards systematic exploration of tradeoffs for medical image registration on heterogeneous platforms

Towards Understanding and Mitigating Memory-Access Challenges in Computing Systems

Towards Unified Analysis of GPU Consistency

Towards Unified INT8 Training for Convolutional Neural Network

Towards user transparent parallel multimedia computing on GPU-clusters

Towards Utilizing GPUs in Information Visualization: A Model and Implementation of Image-Space Operations

Towards Utilizing Remote GPUs for CUDA Program Execution

TPU-KNN: K Nearest Neighbor Search at Peak FLOP/s

Track finding in ATLAS using GPUs

Tracking 3d Pose of Rigid Object by Sparse Template Matching

Tracking and Clustering Salient Features in Image Sequences

Tracking humans interacting with the environment using efficient hierarchical sampling and layered observation models

Tracking Many Solution Paths of a Polynomial Homotopy on a Graphics Processing Unit

Tradeoff analysis and optimization of power delivery networks with on-chip voltage regulation

Tradeoffs in designing accelerator architectures for visual computing

Trainable Nonlinear Reaction Diffusion: A Flexible Framework for Fast and Effective Image Restoration

Training a Feedback Loop for Hand Pose Estimation

Training a Vision Transformer from scratch in less than 24 hours with 1 GPU

Training DNN Models over Heterogeneous Clusters with Optimal Performance

Training Logistic Regression and SVM on 200GB Data Using b-Bit Minwise Hashing and Comparisons with Vowpal Wabbit (VW)

Training Neural Networks Without Gradients: A Scalable ADMM Approach

Tranformation of CPU-based Applications To Leverage on Graphics Processors using CUDA

TransAxx: Efficient Transformers with Approximate Computing

TransCAIP: A Live 3D TV System Using a Camera Array and an Integral Photography Display with Interactive Control of Viewing Parameters

TransCL: An Automatic CUDA-to-OpenCL Programs Transformation Framework

Transfer Time Reduction of Data Transfers between CPU and GPU

Transform Coding for Hardware-accelerated Volume Rendering

Transformation of Scientific Algorithms to Parallel Computing Code: Single GPU and MPI multi GPU Backends with Subdomain Support

Transformations of High-Level Synthesis Codes for High-Performance Computing

Transforming and Optimizing Irregular Applications for Parallel Architectures

Transforming C OpenMP Programs for Verification in CIVL

Translating GPU binaries to tiered SIMD architectures with Ocelot

Translating OpenMP Device Constructs to OpenCL using Unnecessary Data Transfer Elimination

Translation-invariant two-dimensional discrete wavelet transform on graphics processing units

Transparent Acceleration for Heterogeneous Platforms With Compilation to OpenCL

Transparent Acceleration of Java-based Deep Learning Engines

Transparent Accelerator Migration in a Virtualized GPU Environment

Transparent Checkpoint-Restart for Hardware-Accelerated 3D Graphics

Transparent Checkpointing for OpenGL Applications on GPUs

Transparent Compiler and Runtime Specializations for Accelerating Managed Languages on FPGAs

Transparent CPU-GPU Collaboration for Data-Parallel Kernels on Heterogeneous Systems

Transparent FPGA Acceleration with TensorFlow

Transparent use of Java objects on the GPU in the JaMP/OpenMP framework

Trapping of giant-planet cores – I. vortex aided trapping at the outer dead zone edge

Tree Structured Analysis on GPU Power Study

Treecode and fast multipole method for N-body simulation with CUDA

TREES: A CPU/GPU Task-Parallel Runtime with Explicit Epoch Synchronization

Titles: 100
open PDFs: 94
packages: 19
