Papers on hgpu.org (.txt-file)
The Parallel Processing Based on CUDA for Convolution Filter FDK Reconstruction of CT
The PEPPHER Approach to Programmability and Performance Portability for Heterogeneous many-core Architectures
The PEPPHER Composition Tool: Performance-Aware Dynamic Composition of Applications for GPU-based Systems
The Performance Analysis Based on Heterogeneous Parallel Processors for Anisotropic Diffusion Filters
The performances of R GPU implementations of the GMRES method
The Physics of Singular Dislocation Structures in Continuum Dislocation Dynamics
The Plasma Simulation Code: A modern particle-in-cell code with load-balancing and GPU support
The Possibility of Fast Large-Scale Numerical Simulation Implemented with Graphics Processing Units
The Potential for a GPU-Like Overlay Architecture for FPGAs
The Potential of the Intel Xeon Phi for Supervised Deep Learning
The Power-Performance Tradeoffs of the Intel Xeon Phi on HPC Applications
The Promises of Hybrid Hexagonal/Classical Tiling for GPU
The Q Continuum Simulation: Harnessing the Power of GPU Accelerated Supercomputers
The Reconstruction Toolkit (RTK), an open-source cone-beam CT reconstruction toolkit based on the Insight Toolkit (ITK)
The Reduction Problem in CUDA and Its Simulation with P Systems
The Research of Large-Scale 3D Scenes Rendering Optimization
The Research of Real-Time Shadow Rendering Algorithm of Virtual Scenes
The Rewriting of DataRaceBench Benchmark for OpenCL Program Validations
The Rhombic Dodecahedron Map: An Efficient Scheme for Encoding Panoramic Video
The Risks of WebGL: Analysis, Evaluation and Detection
The Rodinia Benchmark Suite in SYCL
The role of GPU computing in medical image analysis and visualization
The role of multigrid algorithms for LQCD
The Saga of Landau-Gauge Propagators: Gathering New Ammo
The Scalable Heterogeneous Computing (SHOC) benchmark suite
The scoring sequences on profile Hidden Markov Models with delete states elimination by GPUs
The Security of Key Derivation Functions in WINRAR
The Sharing Tracker: Using Ideas from Cache Coherence Hardware to Reduce Off-Chip Memory Traffic with Non-Coherent Caches
The sparse matrix vector product on GPUs
The State of the Art in Interactive Global Illumination
The Stencil Processing Unit: GPGPU Done Right
The Study of the OpenCL Processing Models for the FPGA Devices
The system for visualization of synoptic objects
The Test and Evaluation Uses of Heterogeneous Computing: GPGPUs and Other Approaches
The TheLMA project: Multi-GPU implementation of the lattice Boltzmann method
The Tradeoffs of Fused Memory Hierarchies in Heterogeneous Computing Architectures
The Uintah Framework: A Unified Heterogeneous Task Scheduling and Runtime System
The Use of Automated Search in Deriving Software Testing Strategies
The Use of GPUs for Solving the Computed Tomography Problem
The use of overlapping subgrids to accelerate the FDTD on GPU devices
The Vectorization of the Tersoff Multi-Body Potential: An Exercise in Performance Portability
The VerCors Verifier: A Progress Report
The Virtual Marathon: Parallel Computing Supports Crowd Simulations
The Virtual OpenCL (VCL) Cluster Platform
The visible ear surgery simulator
The visual vulnerability spectrum: characterizing architectural vulnerability for graphics hardware
The VOLNA-OP2 Tsunami Code (Version 1.0)
The VRE volume rendering engine
The Yin and Yang of Processing Data Warehousing Queries on GPU Devices
Theano-based Large-Scale Visual Recognition with Multiple GPUs
Theano-MPI: a Theano-based Distributed Training Framework
Theano: A CPU and GPU Math Compiler in Python
Theano: A Python framework for fast computation of mathematical expressions
Theano: Deep Learning on GPUs with Python
TheanoLM – An Extensible Toolkit for Neural Network Language Modeling
Themis: Fair and Efficient GPU Cluster Scheduling for Machine Learning Workloads
Theoretical and Numerical Analysis of Three Approaches to the GPGPU Application of the Explicit FDTD Method
Theory of square, rectangular, and microband electrodes through explicit GPU simulation
Thermal and Athermal Swarms of Self-Propelled Particles
Thermal Safety and Real-Time Predictability on Heterogeneous Embedded SoC Platforms
Theseus: A Library for Differentiable Nonlinear Optimization
Thickness computation of trimmed B-Rep model using GPU ray tracing
THOR: A New and Flexible Global Circulation Model to Explore Planetary Atmospheres
THOR: A Transparent Heterogeneous Open Resource framework
Thorough Evaluation of GPU Shared Memory Load and Store Instructions
Thousand core chips: a technology perspective
Thread Block Compaction for Efficient SIMT Control Flow
Thread-safe lattice Boltzmann for high-performance computing on GPUs
Thread-Scalable Evaluation of Multi-Jet Observables
Three Contributions to the Theory and Practice of Optimizing Compilers
Three Dimensional Fast Fourier Transform CUDA Implementation
Three dimensional tracking of gold nanoparticles using digital holographic microscopy
Three storage formats for sparse matrices on GPGPUs
Three-Dimension Fountain Simulation Based on GPU and Particle System
Three-Dimensional Image Warping on Programmable Graphics Hardware
Three-dimensional LBM simulations of buoyancy-driven flow using Graphics processing units
Three-Dimensional Modeling of Long-Wave Runup: Simulation of Tsunami Inundation with GPU-SPHysics
Throughput-Effective On-Chip Networks for Manycore Accelerators
Throughput-Oriented Analytical Models for Performance Estimation on Programmable Hardware Accelerators
ThunderGBM: Fast GBDTs and Random Forests on GPUs
ThunderSVM: A Fast SVM Library on GPUs and CPUs
Thwarting Piracy: Anti-debugging Using GPU-assisted Self-healing Codes
Tight Binding Molecular Dynamics on CPU and GPU clusters
Tile Based Procedural Terrain Generation in Real-Time
Tile-based Lightweight Integer Compression in GPU
Tiled QR Decomposition and Its Optimization on CPU and GPU Computing System
Tiling for Performance Tuning on Different Models of GPUs
Tiling optimizations for stencil computations
Time dependent simulation of the Driven Lid Cavity at High Reynolds Number
Time Predictability of GPU Kernel on an HSA Compliant Platform
Time-dependent density-functional theory in massively parallel computer architectures: the OCTOPUS project
Time-stepping methods for the simulation of the self-assembly of nano-crystals in Matlab on a GPU
Time-varying clustering for local lighting and material design
TimeGraph: GPU scheduling for real-time multi-tasking environments
Tinker-HP: Accelerating Molecular Dynamics Simulations of Large Complex Systems with Advanced Point Dipole Polarizable Force Fields using GPUs and Multi-GPUs systems
TinyDL: Just-In-Time Deep Learning Solution For Constrained Embedded Systems
Tiramisu: A Code Optimization Framework for High Performance Systems
Titan: A Parallel Asynchronous Library for Multi-Agent and Soft-Body Robotics using NVIDIA CUDA
Titles: 100
open PDFs: 92
packages: 30