Papers on hgpu.org
Topical perspective on massive threading and parallelism
TopicBERT for Energy Efficient Document Classification
Topology optimization design of 3D electrothermomechanical actuators by using GPU as a co-processor
Topology Optimization with Unstructured Meshes on Graphics Processing Units (GPUs)
Torch7: A Matlab-like Environment for Machine Learning
TorchAudio: Building Blocks for Audio and Speech Processing
TorchBench: Benchmarking PyTorch with High API Surface Coverage
Torchnet: An Open-Source Platform for (Deep) Learning Research
torchode: A Parallel ODE Solver for PyTorch
TorchOpt: An Efficient Library for Differentiable Optimization
Toward a Generic Hybrid CPU-GPU Parallelization of Divide-and-Conquer Algorithms
Toward a GPU-Accelerated Immersed Boundary Method for Wind Forecasting Over Complex Terrain
Toward a Multi-level Parallel Framework on GPU Cluster with PetSC-CUDA for PDE-based Optical Flow Computation
Toward a multicore architecture for real-time ray-tracing
Toward a Practical Implementation of Exemplar-Based Noise Robust ASR
Toward Accelerating the Matrix Inversion Computation of Symmetric Positive-Definite Matrices on Heterogeneous GPU-Based Systems
Toward Acceleration of RSA Using 3D Graphics Hardware
Toward Accurate Platform-Aware Performance Modeling for Deep Neural Networks
Toward Auto-tuned Krylov Basis Computations with minimized Communication on Clusters of Accelerators
Toward Automatic Translation: From OpenACC to OpenMP 4
Toward Better Computation Models for Modern Machines
Toward efficient GPU-accelerated N-body simulations
Toward GPU Accelerated Data Stream Processing
Toward GPU-accelerated Traffic Simulation and Its Real-Time Challenge
Toward GPUs being mainstream in analytic processing: An initial argument using simple scan-aggregate queries
Toward Harnessing DOACROSS Parallelism for Multi-GPGPUs
Toward improved aeromechanics simulations using recent advancements in scientific computing
Toward large-scale Hybrid Monte Carlo simulations of the Hubbard model on graphics processing units
Toward OpenCL Automatic Multi-Device Support
Toward optimised skeletons for heterogeneous parallel architecture with performance cost model
Toward Performance Portability for CPUs and GPUs Through Algorithmic Compositions
Toward Practical Real-Time Photon Mapping: Efficient GPU Density Estimation
Toward Real-Time Dense 3d Reconstruction using Stereo Vision
Toward real-time kernel density estimate display for instrumentation
Towards a Benchmarking Suite for Kernel Tuners
Towards a complete FEM-based simulation toolkit on GPUs: Geometric Multigrid solvers
Towards a Distributed GPU-Accelerated Matrix Inversion
Towards a functional run-time for dense NLA domain
Towards a GPU-based Implementation of Interaction Nets
Towards a GPU-Based Simulation Framework for Deformable Surface Meshes
Towards a GPU-Parallelization of the neXtSIM-DG Dynamical Core
Towards a More Efficient Use of GPUs
Towards a Performance-Portable FFT Library for Heterogeneous Computing
Towards a Portable and Future-proof Particle-in-Cell Plasma Physics Code
Towards a robust, real-time face processing system using CUDA-enabled GPUs
Towards a Software Transactional Memory for Graphics Processors
Towards a Tunable Multi-Backend Skeleton Programming Framework for Multi-GPU Systems
Towards a Unified CPU-GPU code hybridization: A GPU Based Optimization Strategy Efficient on Other Modern Architectures
Towards a unified framework for rapid 3D computed tomography on commodity GPUs
Towards a Unified Sentiment Lexicon (USL) based on Graphics Processing Units (GPUs)
Towards a Unified Sentiment Lexicon Based on Graphics Processing Units
Towards Accelerated Computation of Atmospheric Equations Using CUDA
Towards accelerating molecular modeling via multi-scale approximation on a GPU
Towards accelerating Smoothed Particle Hydrodynamics simulations for free-surface flows on multi-GPU clusters
Towards acceleration of fault simulation using graphics processing units
Towards ad-hoc GPU acceleration of parallel eigensystem computations
Towards Adaptive GPU Resource Management for Embedded Real-Time Systems
Towards Alignment of Parallelism in SYCL and ISO C++
Towards an automatic generation of dense linear algebra solvers on parallel architectures
Towards an Effective Unified Programming Model for Many-Cores
Towards an embedded biologically-inspired machine vision processor
Towards an interactive and automated script feature analysis of 3D scanned cuneiform tablets
Towards automated kernel selection in machine learning systems: A SYCL case study
Towards Automated Learning of Object Detectors
Towards Automatic C Programs Optimization and Parallelization using the PIPS-PoCC Integration
Towards automatic Digital Surface Model generation using a Graphics Processing Unit
Towards Automatic Learning of Heuristics for Mechanical Transformations of Procedural Code
Towards Automatic Transformation of Legacy Scientific Code into OpenCL for Optimal Performance on FPGAs
Towards Automating Multi-dimensional Data Decomposition for Executing a Single-GPU Code on a Multi-GPU System
Towards Building Error Resilient GPGPU Applications
Towards Chip-on-Chip Neuroscience: Fast Mining of Frequent Episodes Using Graphics Processors
Towards chip-on-chip neuroscience: fast mining of neuronal spike streams using graphics hardware
Towards Co-execution on Commodity Heterogeneous Systems: Optimizations for Time-Constrained Scenarios
Towards Code Generation from Design Models for Embedded Systems on Heterogeneous CPU-GPU Platforms
Towards Comprehensive Parametric Code Generation Targeting Graphics Processing Units in Support of Scientific Computation
Towards Dense Linear Algebra for Hybrid GPU Accelerated Manycore Systems
Towards Distortion-Predictable Embedding of Neural Networks
Towards Distributed Heterogenous High-Performance Computing with ViennaCL
Towards Domain-specific Computing for Stencil Codes in HPC
Towards dynamic reconfigurable load-balancing for hybrid desktop platforms
Towards Efficient and Scalable Acceleration of Online Decision Tree Learning on FPGA
Towards Efficient GPU Sharing on Multicore Processors
Towards Efficient Indexing of Spatiotemporal Trajectories on the GPU for Distance Threshold Similarity Searches
Towards Efficient Large-Scale Graph Neural Network Computing
Towards Efficient Risk Quantification-Using GPUs and Variance Reduction Technique
Towards energy efficiency and productivity for decision making in mobile robot navigation
Towards Enhancing Performance, Programmability, and Portability in Heterogeneous Computing
Towards fast and certified multiple-precision libraries
Towards Faster Cloth Simulation: Examining the Preconditioned Conjugate Gradient
Towards fully user transparent task and data parallel image processing
Towards global composition of performance-aware components for GPU-based systems
Towards Good Practices for Very Deep Two-Stream ConvNets
Towards GPGPU Assisted Computing in Virtualized Environments
Towards GPU-Accelerated Large-Scale Graph Processing in the Cloud
Towards Green Computing: A Survey of Performance and Energy Efficiency of Different Platforms using OpenCL
Towards High Performance Java-based Deep Learning Frameworks
Towards High Speed Aerial Tracking of Agile Targets
Towards High-Performance and Cost-Effective Distributed Storage Systems with Information Dispersal Algorithms
Towards Improving Programmability of Heterogeneous Parallel Architectures
Towards Intelligent Runtime Framework for Distributed Heterogeneous Systems
Titles: 100
open PDFs: 93
packages: 18