Papers on hgpu.org (.txt-file)
An Embedded Stream Processor Core Based on Logarithmic Arithmetic for a Low-Power 3-D Graphics SoC
An Embedding Method for Interactive Simulation on Dynamic Surfaces
An emotionally biased ant colony algorithm for pathfinding in games
An Empirical Performance Evaluation of GPU-Enabled Graph-Processing Systems
An Empirical Study of Intel Xeon Phi
An Empirically Guided Optimization Framework for FPGA OpenCL
An Empirically Optimized Radix Sort for GPU
An End-to-End Programming Model for AI Engine Architectures
An End-to-End System for Unconstrained Face Verification with Deep Convolutional Neural Networks
An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition
An Energy Consumption Model for GPU Computing at Instruction Level
An Energy Efficient GPGPU Memory Hierarchy with Tiny Incoherent Caches
An energy model for graphics processing units
An Energy Optimization of a GPU Application by Grid Design Space Exploration
An Energy-Efficient Heterogeneous System for Embedded Learning and Classification
An Environment to Support GPU and Multicore Programming for Rapid, High Performance, Application Deployment
An EoS-meter of QCD transition from deep learning
An error correction solver for linear systems: Evaluation of mixed precision implementations
An Evaluation of Directive-Based Parallelization on the GPU Using a Parboil Benchmark
An evaluation of GPU acceleration for sparse reconstruction
An Evaluation of the GAMA/StarPU Frameworks for Heterogeneous Platforms: the Progressive Photon Mapping Algorithm
An Evaluative Comparison of Performance Portability across GPU Programming Models
An events based algorithm for distributing concurrent tasks on multi-core architectures
An Evolutionary Approach to Parallel Computing Using GPU
An Evolutionary Optimization Strategy Using Graphics Processing Units to Efficiently Investigate Gene-Gene Interactions in Genetic Association Studies
An Execution Model and Runtime For Heterogeneous Many-Core Systems
An execution model for adaptive load-balancing on multicore and multi-GPU systems
An Execution Model for OpenCL 2.0
An Experiment in Parallelizing the Fast Fourier Transform
An Experimental Distributed Visualization System for Petascale Computing
An experimental study of group-by and aggregation on CPU-GPU processors
An Experimental Study of SYCL Task Graph Parallelism for Large-Scale Machine Learning Workloads
An experimental study on performance portability of OpenCL kernels
An Explicit Algorithm for Porous Media Flow Simulation using GPUs
An exploration of CUDA and CBEA for a gravitational wave data-analysis application (Einstein@Home)
An exploration of CUDA and CBEA for a gravitational wave source-modelling application
An Exploration of OpenCL for a Numerical Relativity Application
An Exploration of OpenCL on Multiple Hardware Platforms for a Numerical Relativity Application
An Exploratory Study of High Performance Graphics Application Programming Interfaces
An extended GPU radiosity solver
An Extensible Component-based Approach to Simulation Systems on Heterogeneous Clusters
An Extensible Framework for Composing Stencils with Common Scientific Computing Patterns
An Extension of the StarSs Programming Model for Platforms with Multiple GPUs
An FPGA Accelerator for Molecular Dynamics Simulation Using OpenCL
An FPGA Implementation of Information Theoretic Visual-Saliency System and Its Optimization
An FPGA-based processing pipeline for high definition stereo video
An FPGA-based Torus Communication Network
An FPGA-specific algorithm for direct generation of multi-variate Gaussian random numbers
An hardware architecture for 3D object tracking and motion estimation
An hybrid AES-256-GCM implementation for NEON CPU & CUDA GPU
An Image Enhancing Pattern-based Sparsity for Real-time Inference on Mobile Devices
An image-warping VR-architecture: design, implementation and applications
An implementation and its evaluation of password cracking tool parallelized on GPGPU
An implementation for quad-tree based solid object coloring using CUDA
An implementation of a reordering approach for increasing the product of diagonal entries in a sparse matrix
An Implementation of Coincidence Algorithm on Graphic Processing Units
An Implementation of Conflict-Free Offline Permutation on the GPU
An Implementation of Differential Evolution for Independent Tasks Scheduling on GPU
An implementation of level set based topology optimization using GPU
An Implementation of Real-Time Phased Array Radar Fundamental Functions on a DSP-Focused, High-Performance, Embedded Computing Platform
An implementation of tensor product patch smoothers on GPU
An Implementation of the Discontinuous Galerkin Method on Graphics Processing Units
An Implementation of the Smooth Particle Mesh Ewald Method on GPU Hardware
An implementation of the tile QR factorization for a GPU and multiple CPUs
An implicit multigrid solver for high-order compressible flow simulations on GPUs
An implicit Tensor-Mass solver on the GPU for soft bodies simulation
An Improved CUDA-Based Implementation of Differential Evolution on GPU
An Improved Image Segmentation Algorithm Based on GPU Parallel Computing
An improved implementation of Preconditioned Conjugate Gradient Method on GPU
An Improved Magma Gemm For Fermi Graphics Processing Units
An Improved Monte Carlo Ray Tracing for Large-Scale Rendering in Hadoop
An Improved Parallel Algorithm using GPU for Siting Observers on Terrain
An improved parallel contrast-aware halftoning
An Improved Parallel Implementation of 3D DRIE Simulation on GPU
An improved scheme of an interactive finite element model for 3D soft-tissue cutting and deformation
An Improved Study of Physically Based Fluid Simulation on GPU
An improved study of real-time fluid simulation on GPU
An improved visual inspection system using visual servo
An in-depth performance analysis of irregular workloads on VLIW APU
An In-depth Performance Characterization of CPU- and GPU-based DNN Training on Modern Architectures
An Incompressible Navier-Stokes Equations Solver on the GPU Using CUDA
An initial performance review of software components for a heterogeneous computing platform
An innovative compilation tool-chain for embedded multi-core architectures
An instruction-systolic programmable shader architecture for multi-threaded 3D graphics processing
An Integrated Framework for Feature Extraction, Object Recognition and Stereo Vision with GPU support
An integrated GPU power and performance model
An intelligent semi-automatic application porting system for application accelerators
An Interest Point Based Illumination Condition Matching Approach to Photometric Registration Within Augmented Reality Worlds
An Interface for Halo Exchange Pattern
An Intermediate Library for Multi-GPUs Computing Skeletons
An Interrupt-Driven Work-Sharing For-Loop Scheduler
An Introduction to GPU Accelerated Surgical Simulation
An Introduction to High Performance Computing on AWS
An Introduction to the OpenCL Programming Model
An introductory tour of interactive rendering
An Investigation into Concurrent Expectation Propagation
An Investigation of Atomic Synchronization for Sort-Based Group-By Aggregation on GPUs
Titles: 100
open PDFs: 86
packages: 9