Papers on hgpu.org (.txt file)
An experimental study of group-by and aggregation on CPU-GPU processors
An Experimental Study of SYCL Task Graph Parallelism for Large-Scale Machine Learning Workloads
An experimental study on performance portability of OpenCL kernels
An Explicit Algorithm for Porous Media Flow Simulation using GPUs
An exploration of CUDA and CBEA for a gravitational wave data-analysis application (Einstein@Home)
An exploration of CUDA and CBEA for a gravitational wave source-modelling application
An Exploration of OpenCL for a Numerical Relativity Application
An Exploration of OpenCL on Multiple Hardware Platforms for a Numerical Relativity Application
An Exploratory Study of High Performance Graphics Application Programming Interfaces
An extended GPU radiosity solver
An Extensible Component-based Approach to Simulation Systems on Heterogeneous Clusters
An Extensible Framework for Composing Stencils with Common Scientific Computing Patterns
An Extension of the StarSs Programming Model for Platforms with Multiple GPUs
An FPGA Accelerator for Molecular Dynamics Simulation Using OpenCL
An FPGA Implementation of Information Theoretic Visual-Saliency System and Its Optimization
An FPGA-based processing pipeline for high definition stereo video
An FPGA-based Torus Communication Network
An FPGA-specific algorithm for direct generation of multi-variate Gaussian random numbers
An hardware architecture for 3D object tracking and motion estimation
An hybrid AES-256-GCM implementation for NEON CPU & CUDA GPU
An Image Enhancing Pattern-based Sparsity for Real-time Inference on Mobile Devices
An image-warping VR-architecture: design, implementation and applications
An implementation and its evaluation of password cracking tool parallelized on GPGPU
An implementation for quad-tree based solid object coloring using CUDA
An implementation of a reordering approach for increasing the product of diagonal entries in a sparse matrix
An Implementation of Coincidence Algorithm on Graphic Processing Units
An Implementation of Conflict-Free Offline Permutation on the GPU
An Implementation of Differential Evolution for Independent Tasks Scheduling on GPU
An implementation of level set based topology optimization using GPU
An Implementation of Real-Time Phased Array Radar Fundamental Functions on a DSP-Focused, High-Performance, Embedded Computing Platform
An Implementation of the Discontinuous Galerkin Method on Graphics Processing Units
An Implementation of the Smooth Particle Mesh Ewald Method on GPU Hardware
An implementation of the tile QR factorization for a GPU and multiple CPUs
An implicit multigrid solver for high-order compressible flow simulations on GPUs
An implicit Tensor-Mass solver on the GPU for soft bodies simulation
An Improved CUDA-Based Implementation of Differential Evolution on GPU
An Improved Image Segmentation Algorithm Based on GPU Parallel Computing
An improved implementation of Preconditioned Conjugate Gradient Method on GPU
An Improved Magma Gemm For Fermi Graphics Processing Units
An Improved Monte Carlo Ray Tracing for Large-Scale Rendering in Hadoop
An Improved Parallel Algorithm using GPU for Siting Observers on Terrain
An improved parallel contrast-aware halftoning
An Improved Parallel Implementation of 3D DRIE Simulation on GPU
An improved scheme of an interactive finite element model for 3D soft-tissue cutting and deformation
An Improved Study of Physically Based Fluid Simulation on GPU
An improved study of real-time fluid simulation on GPU
An improved visual inspection system using visual servo
An in-depth performance analysis of irregular workloads on VLIW APU
An In-depth Performance Characterization of CPU- and GPU-based DNN Training on Modern Architectures
An Incompressible Navier-Stokes Equations Solver on the GPU Using CUDA
An initial performance review of software components for a heterogeneous computing platform
An innovative compilation tool-chain for embedded multi-core architectures
An instruction-systolic programmable shader architecture for multi-threaded 3D graphics processing
An Integrated Framework for Feature Extraction, Object Recognition and Stereo Vision with GPU support
An integrated GPU power and performance model
An intelligent semi-automatic application porting system for application accelerators
An Interest Point Based Illumination Condition Matching Approach to Photometric Registration Within Augmented Reality Worlds
An Interface for Halo Exchange Pattern
An Intermediate Library for Multi-GPUs Computing Skeletons
An Interrupt-Driven Work-Sharing For-Loop Scheduler
An Introduction to GPU Accelerated Surgical Simulation
An Introduction to High Performance Computing on AWS
An Introduction to the OpenCL Programming Model
An introductory tour of interactive rendering
An Investigation into Concurrent Expectation Propagation
An Investigation of Atomic Synchronization for Sort-Based Group-By Aggregation on GPUs
An investigation of GPU-based stiff chemical kinetics integration methods
An Investigation of the Performance Portability of OpenCL
An Investigation of Unified Memory Access Performance in CUDA
An MDE Approach for Automatic Code Generation from MARTE to OpenCL
An MPI-Based Python Framework for Distributed Training with Keras
An MPI-CUDA Implementation and Optimization for Parallel Sparse Equations and Least Squares (LSQR)
An MPI-CUDA Implementation for Massively Parallel Incompressible Flow Computations on Multi-GPU Clusters
An MPI-CUDA Implementation for the Compression of DEM
An MPI-CUDA implementation of an improved Roe method for two-layer shallow water systems
An N log N Parallel Fast Direct Solver for Kernel Matrices
An octree-based proxy for collision detection in large-scale particle systems
An On-Demand Fast Parallel Pseudo Random Number Generator with Applications
An open framework for rapid prototyping of signal processing applications
An open source finite-difference time-domain solver for room acoustics using graphics processing units
An open source MATLAB program for fast numerical Feynman integral calculations for open quantum system dynamics on GPUs
An Open-source FPGA Library for Data Sorting
An Open-Source GPU-Accelerated Feature Extraction Tool
An OpenCL 3D FFT for Molecular Dynamics Simulations on Multiple FPGAs
An OpenCL design of the Bob Jenkins lookup3 hash function using the Xilinx SDAccel Development Environment
An OpenCL Fast Fourier Transformation
An OpenCL framework for heterogeneous multicores with local memory
An OpenCL implementation for the solution of TDSE on GPU and CPU architectures
An OpenCL implementation of a forward sampling algorithm for CP-logic
An OpenCL Method of Parallel Sorting Algorithms for GPU Architecture
An OpenCL Runtime and Scheduler for Embedded Multicore DSP Parallel Systems
An OpenCL-Based FPGA Accelerator for Faster R-CNN
An OpenCL-based Implementation of H.264 Encoder
An OpenCL-based Monte Carlo dose calculation engine (oclMC) for coupled photon-electron transport
An OpenCL(TM) Deep Learning Accelerator on Arria 10
An OpenMP Programming Environment on Mobile Devices
An optimal k-exclusion real-time locking protocol motivated by multi-GPU systems
An Optimal Offline Permutation Algorithm on the Hierarchical Memory Machine, with the GPU implementation
Titles: 100
Open PDFs: 90
Packages: 14