Papers on hgpu.org (.txt-file)
Task parallelism-based architectures on FPGA to optimize the energy efficiency of AI at the edge
Task Partition Comparison between Multi-core System and GPU
Task Performance with List-Mode Data
Task Scheduling for Heterogeneous Multicore Systems
Task scheduling in hybrid CPU-GPU systems
Task Scheduling of Parallel Processing in CPU-GPU Collaborative Environment
Task Superscalar: An Out-of-Order Task Pipeline
Task superscalar: using processors as functional units
Task-based Conjugate-Gradient for multi-GPUs platforms
Task-based FMM for heterogeneous architectures
Task-Based Parallel Strategies for CFD Application in Heterogeneous CPU/GPU Resources
Task-based, GPU-accelerated and Robust Library for Solving Dense Nonsymmetric Eigenvalue Problems
Taskflow: A Lightweight Parallel and Heterogeneous Task Graph Computing System
Tausch: A halo exchange library for large heterogeneous computing systems using MPI, OpenCL, and CUDA
TBD: Benchmarking and Analyzing Deep Neural Network Training
TC-CIM: Empowering Tensor Comprehensions for Computing-In-Memory
tcFFT: Accelerating Half-Precision FFT through Tensor Cores
TCUDB: Accelerating Database with Tensor Processors
TDDFT in massively parallel computer architectures: the OCTOPUS project
Teaching cardiac electrophysiology modeling to undergraduate students: laboratory exercises and GPU programming for the study of arrhythmias and spiral wave dynamics
Teaching graphics processing and architecture using a hardware prototyping approach
Teaching Parallel Programming in Containers: Virtualization of a Heterogeneous Local Infrastructure
Teaching Parallel Programming Models on a Shallow-Water Code
Teaching Parallel Programming Using Java
Technical aspects of the GPU accelerated surgical simulator
Technical Report about Tiramisu: a Three-Layered Abstraction for Hiding Hardware Complexity from DSL Compilers
Techniques for designing GPGPU games
Techniques for efficient DCT/IDCT implementation on generic GPU
Techniques for Mapping Synthetic Aperture Radar Processing Algorithms to Multi-GPU Clusters
Techniques to maximize memory bandwidth on the Rigel compute accelerator
TEDI: efficient shortest path query answering on graphs
TEG: GPU Performance Estimation Using a Timing Model
Telekine: Secure Computing with Cloud GPUs
Template Library for Multi-GPU Pseudorandom Number Recursion-based Generators
Temporal Blending for Adaptive SPH
Temporally Consistent Disparity and Optical Flow via Efficient Spatio-temporal Filtering
Temporospatial Epidemic Simulations Using Heterogeneous Computing
TENSILE: A Tensor granularity dynamic GPU memory scheduler method towards multiple dynamic workloads system
Tensor Computation Based on Heterogeneous Memory
Tensor Contractions with Extended BLAS Kernels on CPU and GPU
Tensor Processing Units for Financial Monte Carlo
Tensor Voting Accelerated by Graphics Processing Units (GPU)
TensorFlow: A system for large-scale machine learning
TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
TensorFlow.js: Machine Learning for the Web and Beyond
TensorNetwork for Machine Learning
TensorNetwork: A Library for Physics and Machine Learning
Tera-scale Astronomical Data Analysis and Visualization
TeraFLOP computing on a desktop PC with GPUs for 3D CFD
Teraflop per second gravitational lensing ray-shooting using graphics processing units
Termination Analysis for GPU Kernels
TESLA GPUs versus MPI with OpenMP for the Forward Modeling of Gravity and Gravity Gradient of Large Prisms Ensemble
Tesla vs. Xeon Phi vs. Radeon: A Compiler Writer’s Perspective
Testing and Exposing Weak Graphics Processing Unit Memory Models
Testing and Mutation Testing for GPU Kernels
Testing fine-grained parallelism for the ADMM on a factor-graph
Testing GPU Numerics: Finding Numerical Differences Between NVIDIA and AMD GPUs
Testing Tesla architecture for scientific computing: The performance of matrix-vector product
Tetrahedral Interpolation for Deformable Image Registration on GPUs
Text2Gestures: A Transformer-Based Network for Generating Emotive Body Gestures for Virtual Agents
Texture Cache Approximation on GPUs
Texture compression of light maps using smooth profile functions
Texture-based visualization of uncertainty in flow fields
Texture-Based Visualization of Unsteady 3D Flow by Real-Time Advection and Volumetric Illumination
Texturing and Modeling, Third Edition: A Procedural Approach (The Morgan Kaufmann Series in Computer Graphics)
TH-1: China’s first petaflop supercomputer
The ‘Chimera’: an off-the-shelf CPU/GPGPU/FPGA hybrid computing platform
The 3D Flow Field Around an Embedded Planet
The accelerating implementation of BLAST with stream processor
The Accelerator Wall: Limits of Chip Specialization
The AES Implantation Based on OpenCL for Multi/many Core Architecture
The AGILE library for image reconstruction in biomedical sciences using graphics card hardware acceleration
The AlexNet Moment for Homomorphic Encryption: HCNN, the First Homomorphic CNN on Encrypted Data with GPUs
The Anatomy of High-Performance 2D Similarity Calculations
The ANTAREX Approach to Autotuning and Adaptivity for Energy Efficient HPC Systems
The ANTAREX Domain Specific Language for High Performance Computing
The Application of AI Technology in GPU Scheduling Algorithm Optimization
The Application of CUDA Architecture in Facial Expression Recognition
The application of GPU particle tracing to diffusion tensor field visualization
The Application Perspective: Seeking Productivity and Performance
The Arcane development framework
The Architecture and Evolution of CPU-GPU Systems for General Purpose Computing
The architecture of the DecentVM: towards a decentralized virtual machine for many-core computing
The Art of Balance: A RateupDB Experience of Building a CPU/GPU Hybrid Database Product
The Astrophysical Multipurpose Software Environment
The battle of the giants: a case study of GPU vs FPGA optimisation for real-time image processing
The BiConjugate gradient method on GPUs
The Boat Hull Model: Adapting the Roofline Model to Enable Performance Prediction for Parallel Computing
The BondMachine toolkit: Enabling Machine Learning on FPGA
The Bones Source-to-Source Compiler Manual
The Case for Higher Computational Density in the Memory-Bound FDTD Method within Multicore Environments
The case for VOS: the vector operating system
The Celerity High-level API: C++20 for Accelerator Clusters
The Chamomile Scheme: An Optimized Algorithm for N-body simulations on Programmable Graphics Processing Units
The Comparisons of OpenCL and OpenMP Computing Paradigm
The Complete Rank Transform: A Tool for Accurate and Morphologically Invariant Matching of Structures
Titles: 100
open PDFs: 89
packages: 27