## Papers on hgpu.org (.txt-file)

Topical perspective on massive threading and parallelism

TopicBERT for Energy Efficient Document Classification

Topology optimization design of 3D electrothermomechanical actuators by using GPU as a co-processor

Topology Optimization with Unstructured Meshes on Graphics Processing Units (GPUs)

Torch7: A Matlab-like Environment for Machine Learning

TorchAudio: Building Blocks for Audio and Speech Processing

TorchBench: Benchmarking PyTorch with High API Surface Coverage

Torchnet: An Open-Source Platform for (Deep) Learning Research

torchode: A Parallel ODE Solver for PyTorch

TorchOpt: An Efficient Library for Differentiable Optimization

Toward a Generic Hybrid CPU-GPU Parallelization of Divide-and-Conquer Algorithms

Toward a GPU-Accelerated Immersed Boundary Method for Wind Forecasting Over Complex Terrain

Toward a Multi-level Parallel Framework on GPU Cluster with PetSC-CUDA for PDE-based Optical Flow Computation

Toward a multicore architecture for real-time ray-tracing

Toward a Practical Implementation of Exemplar-Based Noise Robust ASR

Toward Accelerating the Matrix Inversion Computation of Symmetric Positive-Definite Matrices on Heterogeneous GPU-Based Systems

Toward Acceleration of RSA Using 3D Graphics Hardware

Toward Accurate Platform-Aware Performance Modeling for Deep Neural Networks

Toward Auto-tuned Krylov Basis Computations with minimized Communication on Clusters of Accelerators

Toward Automatic Translation: From OpenACC to OpenMP 4

Toward Better Computation Models for Modern Machines

Toward efficient GPU-accelerated N-body simulations

Toward GPU Accelerated Data Stream Processing

Toward GPU-accelerated Traffic Simulation and Its Real-Time Challenge

Toward GPUs being mainstream in analytic processing: An initial argument using simple scan-aggregate queries

Toward Harnessing DOACROSS Parallelism for Multi-GPGPUs

Toward improved aeromechanics simulations using recent advancements in scientific computing

Toward large-scale Hybrid Monte Carlo simulations of the Hubbard model on graphics processing units

Toward OpenCL Automatic Multi-Device Support

Toward optimised skeletons for heterogeneous parallel architecture with performance cost model

Toward Performance Portability for CPUs and GPUs Through Algorithmic Compositions

Toward Practical Real-Time Photon Mapping: Efficient GPU Density Estimation

Toward Real-Time Dense 3d Reconstruction using Stereo Vision

Toward real-time kernel density estimate display for instrumentation

Towards a Benchmarking Suite for Kernel Tuners

Towards a complete FEM-based simulation toolkit on GPUs: Geometric Multigrid solvers

Towards a Distributed GPU-Accelerated Matrix Inversion

Towards a functional run-time for dense NLA domain

Towards a GPU-based Implementation of Interaction Nets

Towards a GPU-Based Simulation Framework for Deformable Surface Meshes

Towards a GPU-Parallelization of the neXtSIM-DG Dynamical Core

Towards a More Efficient Use of GPUs

Towards a Performance-Portable FFT Library for Heterogeneous Computing

Towards a Portable and Future-proof Particle-in-Cell Plasma Physics Code

Towards a robust, real-time face processing system using CUDA-enabled GPUs

Towards a Software Transactional Memory for Graphics Processors

Towards a Tunable Multi-Backend Skeleton Programming Framework for Multi-GPU Systems

Towards a Unified CPU-GPU code hybridization: A GPU Based Optimization Strategy Efficient on Other Modern Architectures

Towards a unified framework for rapid 3D computed tomography on commodity GPUs

Towards a Unified Sentiment Lexicon (USL) based on Graphics Processing Units (GPUs)

Towards a Unified Sentiment Lexicon Based on Graphics Processing Units

Towards Accelerated Computation of Atmospheric Equations Using CUDA

Towards accelerating molecular modeling via multi-scale approximation on a GPU

Towards accelerating Smoothed Particle Hydrodynamics simulations for free-surface flows on multi-GPU clusters

Towards acceleration of fault simulation using graphics processing units

Towards ad-hoc GPU acceleration of parallel eigensystem computations

Towards Adaptive GPU Resource Management for Embedded Real-Time Systems

Towards Alignment of Parallelism in SYCL and ISO C++

Towards an automatic generation of dense linear algebra solvers on parallel architectures

Towards an Effective Unified Programming Model for Many-Cores

Towards an embedded biologically-inspired machine vision processor

Towards an interactive and automated script feature analysis of 3D scanned cuneiform tablets

Towards automated kernel selection in machine learning systems: A SYCL case study

Towards Automated Learning of Object Detectors

Towards Automatic C Programs Optimization and Parallelization using the PIPS-PoCC Integration

Towards automatic Digital Surface Model generation using a Graphics Processing Unit

Towards Automatic Learning of Heuristics for Mechanical Transformations of Procedural Code

Towards Automatic Transformation of Legacy Scientific Code into OpenCL for Optimal Performance on FPGAs

Towards Automating Multi-dimensional Data Decomposition for Executing a Single-GPU Code on a Multi-GPU System

Towards Building Error Resilient GPGPU Applications

Towards Chip-on-Chip Neuroscience: Fast Mining of Frequent Episodes Using Graphics Processors

Towards chip-on-chip neuroscience: fast mining of neuronal spike streams using graphics hardware

Towards Co-execution on Commodity Heterogeneous Systems: Optimizations for Time-Constrained Scenarios

Towards Code Generation from Design Models for Embedded Systems on Heterogeneous CPU-GPU Platforms

Towards Comprehensive Parametric Code Generation Targeting Graphics Processing Units in Support of Scientific Computation

Towards Dense Linear Algebra for Hybrid GPU Accelerated Manycore Systems

Towards Distortion-Predictable Embedding of Neural Networks

Towards Distributed Heterogenous High-Performance Computing with ViennaCL

Towards Domain-specific Computing for Stencil Codes in HPC

Towards dynamic reconfigurable load-balancing for hybrid desktop platforms

Towards Efficient and Scalable Acceleration of Online Decision Tree Learning on FPGA

Towards Efficient GPU Sharing on Multicore Processors

Towards Efficient Indexing of Spatiotemporal Trajectories on the GPU for Distance Threshold Similarity Searches

Towards Efficient Large-Scale Graph Neural Network Computing

Towards Efficient Risk Quantification - Using GPUs and Variance Reduction Technique

Towards energy efficiency and productivity for decision making in mobile robot navigation

Towards Enhancing Performance, Programmability, and Portability in Heterogeneous Computing

Towards fast and certified multiple-precision libraries

Towards Faster Cloth Simulation: Examining the Preconditioned Conjugate Gradient

Towards fully user transparent task and data parallel image processing

Towards global composition of performance-aware components for GPU-based systems

Towards Good Practices for Very Deep Two-Stream ConvNets

Towards GPGPU Assisted Computing in Virtualized Environments

Towards GPU-Accelerated Large-Scale Graph Processing in the Cloud

Towards Green Computing: A Survey of Performance and Energy Efficiency of Different Platforms using OpenCL

Towards High Performance Java-based Deep Learning Frameworks

Towards High Speed Aerial Tracking of Agile Targets

Towards High-Performance and Cost-Effective Distributed Storage Systems with Information Dispersal Algorithms

Towards Improving Programmability of Heterogeneous Parallel Architectures

Towards Intelligent Runtime Framework for Distributed Heterogeneous Systems

Titles: 100

open PDFs: 93

packages: 18