Papers on hgpu.org (.txt-file)
Towards Adaptive GPU Resource Management for Embedded Real-Time Systems
Towards Alignment of Parallelism in SYCL and ISO C++
Towards an automatic generation of dense linear algebra solvers on parallel architectures
Towards an Effective Unified Programming Model for Many-Cores
Towards an embedded biologically-inspired machine vision processor
Towards an interactive and automated script feature analysis of 3D scanned cuneiform tablets
Towards automated kernel selection in machine learning systems: A SYCL case study
Towards Automated Learning of Object Detectors
Towards Automatic C Programs Optimization and Parallelization using the PIPS-PoCC Integration
Towards automatic Digital Surface Model generation using a Graphics Processing Unit
Towards Automatic Learning of Heuristics for Mechanical Transformations of Procedural Code
Towards Automatic Transformation of Legacy Scientific Code into OpenCL for Optimal Performance on FPGAs
Towards Automating Multi-dimensional Data Decomposition for Executing a Single-GPU Code on a Multi-GPU System
Towards autonomous resource management: Deep learning prediction of CPU-GPU load balancing
Towards Building Error Resilient GPGPU Applications
Towards Chip-on-Chip Neuroscience: Fast Mining of Frequent Episodes Using Graphics Processors
Towards chip-on-chip neuroscience: fast mining of neuronal spike streams using graphics hardware
Towards Co-execution on Commodity Heterogeneous Systems: Optimizations for Time-Constrained Scenarios
Towards Code Generation from Design Models for Embedded Systems on Heterogeneous CPU-GPU Platforms
Towards Comprehensive Parametric Code Generation Targeting Graphics Processing Units in Support of Scientific Computation
Towards Dense Linear Algebra for Hybrid GPU Accelerated Manycore Systems
Towards Distortion-Predictable Embedding of Neural Networks
Towards Distributed Heterogenous High-Performance Computing with ViennaCL
Towards Domain-specific Computing for Stencil Codes in HPC
Towards dynamic reconfigurable load-balancing for hybrid desktop platforms
Towards Efficient and Scalable Acceleration of Online Decision Tree Learning on FPGA
Towards Efficient GPU Sharing on Multicore Processors
Towards Efficient Indexing of Spatiotemporal Trajectories on the GPU for Distance Threshold Similarity Searches
Towards Efficient Large-Scale Graph Neural Network Computing
Towards Efficient Risk Quantification-Using GPUs and Variance Reduction Technique
Towards energy efficiency and productivity for decision making in mobile robot navigation
Towards Enhancing Performance, Programmability, and Portability in Heterogeneous Computing
Towards fast and certified multiple-precision libraries
Towards Faster Cloth Simulation: Examining the Preconditioned Conjugate Gradient
Towards fully user transparent task and data parallel image processing
Towards global composition of performance-aware components for GPU-based systems
Towards Good Practices for Very Deep Two-Stream ConvNets
Towards GPGPU Assisted Computing in Virtualized Environments
Towards GPU-Accelerated Large-Scale Graph Processing in the Cloud
Towards Green Computing: A Survey of Performance and Energy Efficiency of Different Platforms using OpenCL
Towards High Performance Java-based Deep Learning Frameworks
Towards High Speed Aerial Tracking of Agile Targets
Towards High-Performance and Cost-Effective Distributed Storage Systems with Information Dispersal Algorithms
Towards Improving Programmability of Heterogeneous Parallel Architectures
Towards Intelligent Runtime Framework for Distributed Heterogeneous Systems
Towards Interactive Visual Exploration of Parallel Programs using a Domain-specific Language
Towards Large-Scale Molecular Dynamics Simulations on Graphics Processors
Towards large-scale network analytics
Towards Lattice Quantum Chromodynamics on FPGA devices
Towards making the most of NLP-based device mapping optimization for OpenCL kernels
Towards Memory-Efficient Answering of Tree-Shaped SPARQL Queries using GPUs
Towards metaprogramming for parallel systems on a chip
Towards microsecond biological molecular dynamics simulations on hybrid processors
Towards Modeling Energy Consumption of Xeon Phi
Towards multi-GPU support for visualization
Towards Multi-GPU Support in the Marrow Skeleton Framework
Towards On-Chip Optical FFTs for Convolutional Neural Networks
Towards On-Line Digital Doubles
Towards paradisEO-MO-GPU: a framework for GPU-based local search metaheuristics
Towards Parallel Programming Models for Predictability
Towards Performance Portable Programming for Distributed Heterogeneous Systems
Towards Performance-Aware Allocation for Accelerated Machine Learning on GPU-SSD Systems
Towards Performance-Portable, Scalable, and Convenient Linear Algebra
Towards Portable Performance for Explicit Hydrodynamics Codes
Towards Porting a Real-World Seismological Application to the Intel MIC Architecture
Towards Predictable Real-Time Performance on Multi-Core Platforms
Towards Rapid Prototyping of Parallel and HPC Applications (GPU Focus)
Towards real time 2D to 3D registration for ultrasound-guided endoscopic and laparoscopic procedures
Towards real time 3D tracking and reconstruction on a GPU using Monte Carlo simulations
Towards real time vision based UUV navigation using GPU technology
Towards real-time radiation therapy: GPU accelerated superposition/convolution
Towards real-time tomography: Fast reconstruction algorithms and GPU implementation
Towards reverse engineering the brain: Modeling abstractions and simulation frameworks
Towards robust automatic detection of vulnerable road users: monocular pedestrian tracking from a moving vehicle
Towards scalar synchronization in SIMT architectures
Towards shared memory consistency models for GPUs
Towards solving the Table Maker’s Dilemma on GPU
Towards Studying the Effect of Compiler Optimizations and Software Randomization on GPU Reliability
Towards systematic exploration of tradeoffs for medical image registration on heterogeneous platforms
Towards Understanding and Mitigating Memory-Access Challenges in Computing Systems
Towards Unified Analysis of GPU Consistency
Towards Unified INT8 Training for Convolutional Neural Network
Towards user transparent parallel multimedia computing on GPU-clusters
Towards Utilizing GPUs in Information Visualization: A Model and Implementation of Image-Space Operations
Towards Utilizing Remote GPUs for CUDA Program Execution
TPU-KNN: K Nearest Neighbor Search at Peak FLOP/s
Track finding in ATLAS using GPUs
Tracking 3d Pose of Rigid Object by Sparse Template Matching
Tracking and Clustering Salient Features in Image Sequences
Tracking humans interacting with the environment using efficient hierarchical sampling and layered observation models
Tracking Many Solution Paths of a Polynomial Homotopy on a Graphics Processing Unit
Tradeoff analysis and optimization of power delivery networks with on-chip voltage regulation
Tradeoffs in designing accelerator architectures for visual computing
Trainable Nonlinear Reaction Diffusion: A Flexible Framework for Fast and Effective Image Restoration
Training a Feedback Loop for Hand Pose Estimation
Training a Vision Transformer from scratch in less than 24 hours with 1 GPU
Training DNN Models over Heterogeneous Clusters with Optimal Performance
Training Logistic Regression and SVM on 200GB Data Using b-Bit Minwise Hashing and Comparisons with Vowpal Wabbit (VW)
Titles: 100
open PDFs: 93
packages: 18