Papers on hgpu.org (.txt-file)

TAP: A TLP-Aware Cache Management Policy for a CPU-GPU Heterogeneous Architecture

Tapping the supercomputer under your desk: Solving dynamic equilibrium models with graphics processors

Tapping the supercomputer under your desk: solving dynamic equilibrium models with graphics processors?

Target Marker: A Visual Marker for Long Distances and Detection in Realtime on Mobile Devices

targetDP: an Abstraction of Lattice Based Parallelism with Portable Performance

Targeting GPUs with OpenMP Directives on Summit: A Simple and Effective Fortran Experience

Targeting heterogeneous architectures via macro data flow

Task and Data Distribution in Hybrid Parallel Systems

Task management for irregular-parallel workloads on the GPU

Task migration of DSP application specified with a DFG and implemented with the BSP computing model on a CPU-GPU cluster

Task Parallel Incomplete Cholesky Factorization using 2D Partitioned-Block Layout

Task Parallelism and Data Distribution: An Overview of Explicit Parallel Programming Languages

Task Parallelism and Synchronization: An Overview of Explicit Parallel Programming Languages

Task parallelism-based architectures on FPGA to optimize the energy efficiency of AI at the edge

Task Partition Comparison between Multi-core System and GPU

Task Performance with List-Mode Data

Task Scheduling for Heterogeneous Multicore Systems

Task scheduling in hybrid CPU-GPU systems

Task Scheduling of Parallel Processing in CPU-GPU Collaborative Environment

Task Superscalar: An Out-of-Order Task Pipeline

Task superscalar: using processors as functional units

Task-based Conjugate-Gradient for multi-GPUs platforms

Task-based FMM for heterogeneous architectures

Task-Based Parallel Strategies for CFD Application in Heterogeneous CPU/GPU Resources

Task-based, GPU-accelerated and Robust Library for Solving Dense Nonsymmetric Eigenvalue Problems

Taskflow: A Lightweight Parallel and Heterogeneous Task Graph Computing System

Tausch: A halo exchange library for large heterogeneous computing systems using MPI, OpenCL, and CUDA

TBD: Benchmarking and Analyzing Deep Neural Network Training

TC-CIM: Empowering Tensor Comprehensions for Computing-In-Memory

tcFFT: Accelerating Half-Precision FFT through Tensor Cores

TCUDB: Accelerating Database with Tensor Processors

TDDFT in massively parallel computer architectures: the OCTOPUS project

Teaching An Old Dog New Tricks: Porting Legacy Code to Heterogeneous Compute Architectures With Automated Code Translation

Teaching cardiac electrophysiology modeling to undergraduate students: laboratory exercises and GPU programming for the study of arrhythmias and spiral wave dynamics

Teaching graphics processing and architecture using a hardware prototyping approach

Teaching Parallel Programming in Containers: Virtualization of a Heterogeneous Local Infrastructure

Teaching Parallel Programming Models on a Shallow-Water Code

Teaching Parallel Programming Using Java

Technical aspects of the GPU accelerated surgical simulator

Technical Report about Tiramisu: a Three-Layered Abstraction for Hiding Hardware Complexity from DSL Compilers

Techniques for designing GPGPU games

Techniques for efficient DCT/IDCT implementation on generic GPU

Techniques for Mapping Synthetic Aperture Radar Processing Algorithms to Multi-GPU Clusters

Techniques to maximize memory bandwidth on the Rigel compute accelerator

TEDI: efficient shortest path query answering on graphs

TEG: GPU Performance Estimation Using a Timing Model

Telekine: Secure Computing with Cloud GPUs

Template Library for Multi-GPU Pseudorandom Number Recursion-based Generators

Temporal Blending for Adaptive SPH

Temporally Consistent Disparity and Optical Flow via Efficient Spatio-temporal Filtering

Temporospatial Epidemic Simulations Using Heterogeneous Computing

TENSILE: A Tensor granularity dynamic GPU memory scheduler method towards multiple dynamic workloads system

Tensor Computation Based on Heterogeneous Memory

Tensor Contractions with Extended BLAS Kernels on CPU and GPU

Tensor Processing Units for Financial Monte Carlo

Tensor Voting Accelerated by Graphics Processing Units (GPU)

TensorFlow: A system for large-scale machine learning

TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems

TensorFlow.js: Machine Learning for the Web and Beyond

TensorNetwork for Machine Learning

TensorNetwork: A Library for Physics and Machine Learning

Tera-scale Astronomical Data Analysis and Visualization

TeraFLOP computing on a desktop PC with GPUs for 3D CFD

Teraflop per second gravitational lensing ray-shooting using graphics processing units

Termination Analysis for GPU Kernels

TESLA GPUs versus MPI with OpenMP for the Forward Modeling of Gravity and Gravity Gradient of Large Prisms Ensemble

Tesla vs. Xeon Phi vs. Radeon A Compiler Writer’s Perspective

Testing and Exposing Weak Graphics Processing Unit Memory Models

Testing and Mutation Testing for GPU Kernels

Testing fine-grained parallelism for the ADMM on a factor-graph

Testing GPU Numerics: Finding Numerical Differences Between NVIDIA and AMD GPUs

Testing Tesla architecture for scientific computing: The performance of matrix-vector product

Tetrahedral Interpolation for Deformable Image Registration on GPUs

Text2Gestures: A Transformer-Based Network for Generating Emotive Body Gestures for Virtual Agents

Texture Cache Approximation on GPUs

Texture compression of light maps using smooth profile functions

Texture-based visualization of uncertainty in flow fields

Texture-Based Visualization of Unsteady 3D Flow by Real-Time Advection and Volumetric Illumination

Texturing and Modeling, Third Edition: A Procedural Approach (The Morgan Kaufmann Series in Computer Graphics)

TH-1: China’s first petaflop supercomputer

The ‘Chimera’: an off-the-shelf CPU/GPGPU/FPGA hybrid computing platform

The 3D Flow Field Around an Embedded Planet

The accelerating implementation of BLAST with stream processor

The Accelerator Wall: Limits of Chip Specialization

The AES Implantation Based on OpenCL for Multi/many Core Architecture

The AGILE library for image reconstruction in biomedical sciences using graphics card hardware acceleration

The AI CUDA Engineer: Agentic CUDA Kernel Discovery, Optimization and Composition

The AlexNet Moment for Homomorphic Encryption: HCNN, the First Homomorphic CNN on Encrypted Data with GPUs

The Anatomy of a Triton Attention Kernel

The Anatomy of High-Performance 2D Similarity Calculations

The ANTAREX Approach to Autotuning and Adaptivity for Energy Efficient HPC Systems

The ANTAREX Domain Specific Language for High Performance Computing

The Application of AI Technology in GPU Scheduling Algorithm Optimization

The Application of CUDA Architecture in Facial Expression Recognition

The application of GPU particle tracing to diffusion tensor field visualization

The Application Perspective: Seeking Productivity and Performance

Titles: 100
open PDFs: 90
packages: 29
