Papers on hgpu.org (.txt-file)
Synthesizing Subdivision Meshes Using Real Time Tessellation

Synthetic Aperture Beamformation using the GPU

Synthetic Aperture Radar imaging on a CUDA-enabled mobile platform

Synthetic Aperture Radar Processing with GPGPU

Syntix: A Profiling Based Resource Estimator for CUDA Kernels

System Design Principles for Heterogeneous Resource Management and Scheduling in Accelerator-Based Systems

System integration of FastSPECT III, a dedicated SPECT rodent-brain imager based on BazookaSPECT detector technology

System-Level Optimization and Code Generation for Graphics Processors using a Domain-Specific Language

Systematic Approach in Optimizing Numerical Memory-Bound Kernels on GPU

Systematic construction, verification and implementation methodology for LDPC codes

Systematic Performance Optimization of Cone-Beam Back-Projection on the Kepler Architecture

Systematic Physics Constrained Parameter Estimation of Stochastic Differential Equations

SystemC simulation on GP-GPUs: CUDA vs. OpenCL

Systolic-CNN: An OpenCL-defined Scalable Run-time-flexible FPGA Accelerator Architecture for Accelerating Convolutional Neural Network Inference in Cloud/Edge Computing

SZx: an Ultra-fast Error-bounded Lossy Compressor for Scientific Datasets

TABLA: A Unified Template-based Framework for Accelerating Statistical Machine Learning

Tabu Search with two approaches to parallel flowshop evaluation on CUDA platform

Tackling Exascale Software Challenges in Molecular Dynamics Simulations with GROMACS

Tactics to Directly Map CNN graphs on Embedded FPGAs

Taichi: A Language for High-Performance Computation on Spatially Sparse Data Structures

Takagi Factorization on GPU using CUDA

Taking advantage of hybrid systems for sparse direct solvers via task-based runtimes

Taking the graphics processor beyond graphics

Taming irregular EDA applications on GPUs

Taming the complexities of the C11 and OpenCL memory models

Tamp: A Library for Compact Deep Neural Networks with Structured Matrices

Tangible video teleconference system using real-time image-based relighting

Tango: A Deep Neural Network Benchmark Suite for Various Accelerators

Tangram: a High-level Language for Performance Portable Code Synthesis

TAP: A TLP-Aware Cache Management Policy for a CPU-GPU Heterogeneous Architecture

Tapping the supercomputer under your desk: Solving dynamic equilibrium models with graphics processors

Tapping the supercomputer under your desk: solving dynamic equilibrium models with graphics processors?

Target Marker: A Visual Marker for Long Distances and Detection in Realtime on Mobile Devices

targetDP: an Abstraction of Lattice Based Parallelism with Portable Performance

Targeted Testing of Compiler Optimizations via Grammar-Level Composition Styles

Targeting GPUs with OpenMP Directives on Summit: A Simple and Effective Fortran Experience

Targeting heterogeneous architectures via macro data flow

Task and Data Distribution in Hybrid Parallel Systems

Task management for irregular-parallel workloads on the GPU

Task migration of DSP application specified with a DFG and implemented with the BSP computing model on a CPU-GPU cluster

Task Parallel Incomplete Cholesky Factorization using 2D Partitioned-Block Layout

Task Parallelism and Data Distribution: An Overview of Explicit Parallel Programming Languages

Task Parallelism and Synchronization: An Overview of Explicit Parallel Programming Languages

Task parallelism-based architectures on FPGA to optimize the energy efficiency of AI at the edge

Task Partition Comparison between Multi-core System and GPU

Task Performance with List-Mode Data

Task Scheduling for Heterogeneous Multicore Systems

Task scheduling in hybrid CPU-GPU systems

Task Scheduling of Parallel Processing in CPU-GPU Collaborative Environment

Task Superscalar: An Out-of-Order Task Pipeline

Task superscalar: using processors as functional units

Task-based Conjugate-Gradient for multi-GPUs platforms

Task-based FMM for heterogeneous architectures

Task-Based Parallel Strategies for CFD Application in Heterogeneous CPU/GPU Resources

Task-based, GPU-accelerated and Robust Library for Solving Dense Nonsymmetric Eigenvalue Problems

Taskflow: A Lightweight Parallel and Heterogeneous Task Graph Computing System

Tausch: A halo exchange library for large heterogeneous computing systems using MPI, OpenCL, and CUDA

TBD: Benchmarking and Analyzing Deep Neural Network Training

TC-CIM: Empowering Tensor Comprehensions for Computing-In-Memory

tcFFT: Accelerating Half-Precision FFT through Tensor Cores

TCUDB: Accelerating Database with Tensor Processors

TDDFT in massively parallel computer architectures: the OCTOPUS project

Teaching An Old Dog New Tricks: Porting Legacy Code to Heterogeneous Compute Architectures With Automated Code Translation

Teaching cardiac electrophysiology modeling to undergraduate students: laboratory exercises and GPU programming for the study of arrhythmias and spiral wave dynamics

Teaching graphics processing and architecture using a hardware prototyping approach

Teaching Parallel Programming in Containers: Virtualization of a Heterogeneous Local Infrastructure

Teaching Parallel Programming Models on a Shallow-Water Code

Teaching Parallel Programming Using Java

Technical aspects of the GPU accelerated surgical simulator

Technical Report about Tiramisu: a Three-Layered Abstraction for Hiding Hardware Complexity from DSL Compilers

Techniques for designing GPGPU games

Techniques for efficient DCT/IDCT implementation on generic GPU

Techniques for Mapping Synthetic Aperture Radar Processing Algorithms to Multi-GPU Clusters

Techniques to maximize memory bandwidth on the Rigel compute accelerator

TEDI: efficient shortest path query answering on graphs

TEG: GPU Performance Estimation Using a Timing Model

Telekine: Secure Computing with Cloud GPUs

Template Library for Multi-GPU Pseudorandom Number Recursion-based Generators

Temporal Blending for Adaptive SPH

Temporally Consistent Disparity and Optical Flow via Efficient Spatio-temporal Filtering

Temporospatial Epidemic Simulations Using Heterogeneous Computing

TENSILE: A Tensor granularity dynamic GPU memory scheduler method towards multiple dynamic workloads system

Tensor Computation Based on Heterogeneous Memory

Tensor Contractions with Extended BLAS Kernels on CPU and GPU

Tensor Processing Units for Financial Monte Carlo

Tensor Voting Accelerated by Graphics Processing Units (GPU)

TensorFlow: A system for large-scale machine learning

TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems

TensorFlow.js: Machine Learning for the Web and Beyond

TensorNetwork for Machine Learning

TensorNetwork: A Library for Physics and Machine Learning

Tera-scale Astronomical Data Analysis and Visualization

TeraFLOP computing on a desktop PC with GPUs for 3D CFD

Teraflop per second gravitational lensing ray-shooting using graphics processing units

Termination Analysis for GPU Kernels

TESLA GPUs versus MPI with OpenMP for the Forward Modeling of Gravity and Gravity Gradient of Large Prisms Ensemble

Tesla vs. Xeon Phi vs. Radeon: A Compiler Writer’s Perspective

Titles: 100
open PDFs: 94
packages: 28
