Papers on hgpu.org (.txt-file)
SYCL compute kernels for ExaHyPE
SYCL in the edge: performance and energy evaluation for heterogeneous acceleration
SYCL in the Edge: Performance Evaluation for Heterogeneous Acceleration
SYCL-Bench 2020: Benchmarking SYCL 2020 on AMD, Intel, and NVIDIA GPUs
SYCL-Bench: A Versatile Cross-Platform Benchmark Suite for Heterogeneous Computing
SYCL-Bench: A Versatile Single-Source Benchmark Suite for Heterogeneous Computing
SYCLops: A SYCL Specific LLVM to MLIR Converter
Sylkan: Towards a Vulkan Compute Target Platform for SYCL
Symbolic Crosschecking of Data-Parallel Floating Point Code
Symbolic crosschecking of floating-point and SIMD code
Symbolic Differentiation in GPU Shaders
Symbolic Testing of OpenCL Code
Symphony: A Scheduler for Client-Server Applications on Coprocessor-based Heterogeneous Clusters
Synchronization and Coordination in Heterogeneous Processors
Synchronization and Ordering Semantics in Hybrid MPI+GPU Programming
Synergia CUDA: GPU-accelerated accelerator modeling package
Synergistic CPU-FPGA Acceleration of Sparse Linear Algebra
Synergistic execution of stream programs on multicores with accelerators
SYnergy: Fine-grained Energy-Efficient Heterogeneous Computing for Scalable Energy Saving
Synkhronos: a Multi-GPU Theano Extension for Data Parallelism
Synthesis and rendering of bidirectional texture functions on arbitrary surfaces
Synthesis of Custom Networks of Heterogeneous Processing Elements for Complex Physical System Emulation
Synthesis of Embedded Software using Dataflow Schedule Graphs
Synthesis of GPU Programs from High-Level Models
Synthesis of Platform Architectures from OpenCL Programs
Synthesizing Benchmarks for Predictive Modeling
Synthesizing Software from a ForSyDe Model Targeting GPGPUs
Synthesizing Structured Traversals from Attribute Grammars
Synthesizing Subdivision Meshes Using Real Time Tessellation
Synthetic Aperture Beamformation using the GPU
Synthetic Aperture Radar imaging on a CUDA-enabled mobile platform
Synthetic Aperture Radar Processing with GPGPU
Syntix: A Profiling Based Resource Estimator for CUDA Kernels
System Design Principles for Heterogeneous Resource Management and Scheduling in Accelerator-Based Systems
System integration of FastSPECT III, a dedicated SPECT rodent-brain imager based on BazookaSPECT detector technology
System-Level Optimization and Code Generation for Graphics Processors using a Domain-Specific Language
Systematic Approach in Optimizing Numerical Memory-Bound Kernels on GPU
Systematic construction, verification and implementation methodology for LDPC codes
Systematic Performance Optimization of Cone-Beam Back-Projection on the Kepler Architecture
Systematic Physics Constrained Parameter Estimation of Stochastic Differential Equations
SystemC simulation on GP-GPUs: CUDA vs. OpenCL
Systolic-CNN: An OpenCL-defined Scalable Run-time-flexible FPGA Accelerator Architecture for Accelerating Convolutional Neural Network Inference in Cloud/Edge Computing
SZx: an Ultra-fast Error-bounded Lossy Compressor for Scientific Datasets
TABLA: A Unified Template-based Framework for Accelerating Statistical Machine Learning
Tabu Search with two approaches to parallel flowshop evaluation on CUDA platform
Tackling Exascale Software Challenges in Molecular Dynamics Simulations with GROMACS
Tactics to Directly Map CNN graphs on Embedded FPGAs
Taichi: A Language for High-Performance Computation on Spatially Sparse Data Structures
Takagi Factorization on GPU using CUDA
Taking advantage of hybrid systems for sparse direct solvers via task-based runtimes
Taking the graphics processor beyond graphics
Taming irregular EDA applications on GPUs
Taming the complexities of the C11 and OpenCL memory models
Tamp: A Library for Compact Deep Neural Networks with Structured Matrices
Tangible video teleconference system using real-time image-based relighting
Tango: A Deep Neural Network Benchmark Suite for Various Accelerators
Tangram: a High-level Language for Performance Portable Code Synthesis
TAP: A TLP-Aware Cache Management Policy for a CPU-GPU Heterogeneous Architecture
Tapping the supercomputer under your desk: Solving dynamic equilibrium models with graphics processors
Tapping the supercomputer under your desk: solving dynamic equilibrium models with graphics processors?
Target Marker: A Visual Marker for Long Distances and Detection in Realtime on Mobile Devices
targetDP: an Abstraction of Lattice Based Parallelism with Portable Performance
Targeting GPUs with OpenMP Directives on Summit: A Simple and Effective Fortran Experience
Targeting heterogeneous architectures via macro data flow
Task and Data Distribution in Hybrid Parallel Systems
Task management for irregular-parallel workloads on the GPU
Task migration of DSP application specified with a DFG and implemented with the BSP computing model on a CPU-GPU cluster
Task Parallel Incomplete Cholesky Factorization using 2D Partitioned-Block Layout
Task Parallelism and Data Distribution: An Overview of Explicit Parallel Programming Languages
Task Parallelism and Synchronization: An Overview of Explicit Parallel Programming Languages
Task parallelism-based architectures on FPGA to optimize the energy efficiency of AI at the edge
Task Partition Comparison between Multi-core System and GPU
Task Performance with List-Mode Data
Task Scheduling for Heterogeneous Multicore Systems
Task scheduling in hybrid CPU-GPU systems
Task Scheduling of Parallel Processing in CPU-GPU Collaborative Environment
Task Superscalar: An Out-of-Order Task Pipeline
Task superscalar: using processors as functional units
Task-based Conjugate-Gradient for multi-GPUs platforms
Task-based FMM for heterogeneous architectures
Task-Based Parallel Strategies for CFD Application in Heterogeneous CPU/GPU Resources
Task-based, GPU-accelerated and Robust Library for Solving Dense Nonsymmetric Eigenvalue Problems
Taskflow: A Lightweight Parallel and Heterogeneous Task Graph Computing System
Tausch: A halo exchange library for large heterogeneous computing systems using MPI, OpenCL, and CUDA
TBD: Benchmarking and Analyzing Deep Neural Network Training
TC-CIM: Empowering Tensor Comprehensions for Computing-In-Memory
tcFFT: Accelerating Half-Precision FFT through Tensor Cores
TCUDB: Accelerating Database with Tensor Processors
TDDFT in massively parallel computer architectures: the OCTOPUS project
Teaching cardiac electrophysiology modeling to undergraduate students: laboratory exercises and GPU programming for the study of arrhythmias and spiral wave dynamics
Teaching graphics processing and architecture using a hardware prototyping approach
Teaching Parallel Programming in Containers: Virtualization of a Heterogeneous Local Infrastructure
Teaching Parallel Programming Models on a Shallow-Water Code
Teaching Parallel Programming Using Java
Technical aspects of the GPU accelerated surgical simulator
Technical Report about Tiramisu: a Three-Layered Abstraction for Hiding Hardware Complexity from DSL Compilers
Techniques for designing GPGPU games
Techniques for efficient DCT/IDCT implementation on generic GPU
Titles: 100
open PDFs: 95
packages: 32