## Papers on hgpu.org (.txt-file)

A comparison of CPUs, GPUs, FPGAs, and massively parallel processor arrays for random number generation

A Comparison of GPU Execution Time Prediction using Machine Learning and Analytical Modeling

A Comparison of Gradient Estimation Methods for Volume Rendering on Unstructured Meshes

A Comparison of High-Level Design Tools for SoC-FPGA on Disparity Map Calculation Example

A Comparison of Many-threaded Differential Evolution and Genetic Algorithms on CUDA

A Comparison of Massively Parallel Programming Models Through Applications in Sound Propagation and Jitter Measurement

A Comparison of Modern GPU and CPU Architectures: And the Common Convergence of Both

A comparison of period finding algorithms

A Comparison of Potential Interfaces for Batched BLAS Computations

A Comparison of Sequential and GPU Implementations of Iterative Methods to Compute Reachability Probabilities

A Comparison of Serial & Parallel Particle Filters for Time Series Analysis

A Comparison of Statistical Techniques for Detecting Side-Channel Information Leakage in Cryptographic Devices

A Comparison of Support Vector Machines Training GPU-Accelerated Open Source Implementations

A Comparison of the performance of HPC Accelerators

A Comparison of Two Methods for Geometric Milling Simulation Accelerated by GPU

A Comparison of xPU Platforms Exemplified with Ray Tracing Algorithms

A Compile-Time Managed Multi-Level Register File Hierarchy

A Compiler and Runtime for Heterogeneous Computing

A compiler for high performance computing with many-core accelerators

A Compiler for Throughput Optimization of Graph Algorithms on GPUs

A compiler framework for optimization of affine loop nests for GPGPUs

A compiler toolkit for array-based languages targeting CPU/GPU hybrid systems

A Complete and Efficient CUDA-Sharing Solution for HPC Clusters

A Complete Description of the UnPython and Jit4GPU Framework

A complete modular resultant algorithm targeted for realization on graphics hardware

A comprehensive analysis and parallelization of an image retrieval algorithm

A Comprehensive Performance Analysis of HSA and OpenCL 2.0

A Comprehensive Performance Comparison of CUDA and OpenCL

A comprehensive study of Dynamic Memory Management in OpenCL kernels

A Comprehensive Survey on Various Evolutionary Algorithms on GPU

A Computational Comparison of Basis Updating Schemes for the Simplex Algorithm on a CPU-GPU System

A Computational Model of Afterimages

A Computational Realization of a Semi-Lagrangian Method for Solving the Advection Equation

A computationally efficient and scalable approach for privacy preserving kNN classification

A Computationally Efficient Approach for Exemplar-based Color Image Inpainting using GPU

A Computationally Efficient Parallel Kernel Regression for Image Reconstruction

A Compute Unified System Architecture for Graphics Clusters Incorporating Data Locality

A computing origami: Optimized code generation for emerging parallel platforms

A constant-space belief propagation algorithm for stereo matching

A Consumer Application for GPGPUs: Desktop Search

A Contour-Guided Deformable Image Registration Algorithm for Adaptive Radiotherapy

A control-structure splitting optimization for GPGPU

A convex formulation for color image segmentation in the context of passive emitter localization

A Convex Relaxation Approach to Space Time Multi-view 3D Reconstruction

A Convolutional Neural Network Cascade for Face Detection

A CPU and GPU Heterogeneous Processing of Multimedia Data by using OpenCL

A CPU-GPU Hybrid Runtime for the Aeminium Language

A Cross-Input Adaptive Framework for GPU Programs Optimization

A CUDA Back-End for the Equelle Compiler

A CUDA Based Implementation of an Image Authentication Algorithm

A CUDA based Solution to the Multidimensional Knapsack Problem Using the Ant Colony Optimization

A CUDA Implementation of Independent Component Analysis in the Time-Frequency Domain

A CUDA implementation of the High Performance Conjugate Gradient benchmark

A CUDA Kernel Scheduler Exploiting Static Data Dependencies

A CUDA Monte Carlo simulator for radiation therapy dosimetry based on Geant4

A CUDA SIMT Interpreter for Genetic Programming

A CUDA SIMT interpreter for genetic programming. Revised

A CUDA-Based Cooperative Evolutionary Multi-Swarm Optimization Applied to Engineering Problems

A CUDA-Based Implementation of Stable Fluids in 3D with Internal and Moving Boundaries

A CUDA-based parallel implementation of K-nearest neighbor algorithm

A CUDA-Based Real Parameter Optimization Benchmark

A CUDA-enabled Parallel Implementation of Collaborative Filtering

A curved-element unstructured discontinuous Galerkin method on GPUs for the Euler equations

A Customized 3D GPU Poisson Solver for Free BCs

A Data Communication Scheduler for Stream Programs on CPU-GPU Platform

A Data Parallel Algorithm for Seismic Raytracing

A data parallel approach to genetic programming using programmable graphics hardware

A data parallel view on polyhedral process networks

A Data-Driven Model for Anisotropic Heterogeneous Subsurface Scattering

A Data-oriented Method for Scheduling Dependent Tasks on High-density Multi-GPU Systems

A Data-Parallel Algorithmic Modelica Extension for Efficient Execution on Multi-Core Platforms

A Data-Parallel Extension to Ruby for GPGPU

A Data-Parallel Graphics Pipeline Implemented in OpenCL

A dataflow-like programming model for future hybrid clusters

A declarative API for particle systems

A decompression pipeline for accelerating out-of-core volume rendering of time-varying data

A Deep Generative Deconvolutional Image Model

A design case study: CPU vs. GPGPU vs. FPGA

A Design Framework for Mapping Dataflow Graphs onto Heterogeneous Multiprocessor Platforms

A Design Methodology for Efficient Implementation of Deconvolutional Neural Networks on an FPGA

A design pattern language for engineering (parallel) software: merging the PLPP and OPL projects

A design tool for efficient mapping of multimedia applications onto heterogeneous platforms

A Detailed GPU Cache Model Based on Reuse Distance Theory

A development of an accelerator board dedicated for multi-precision arithmetic operations and its application to Feynman loop integrals II

A directionally adaptive edge anti-aliasing filter

A Discussion of Selected Vienna-Libraries for Computational Science

A Distributed Approximation Algorithm for Mixed Packing-Covering Linear Programs

A distributed computing approach to improve the performance of the Parallel Ocean Program (v2.1)

A Distributed CPU-GPU Framework for Pairwise Alignments on Large-Scale Sequence Datasets

A Distributed Data Mining Framework Accelerated with Graphics Processing Units

A Distributed GPU-based Framework for real-time 3D Volume Rendering of Large Astronomical Data Cubes

A distributed multi-GPU system for high speed electron microscopic tomographic reconstruction

A Diversified Multi-Start Algorithm for Unconstrained Binary Quadratic Problems Leveraging the Graphics Processor Unit

A Domain Specific Approach to Heterogeneous Computing: From Availability to Accessibility

A Domain Specific Language for Performance Portable Molecular Dynamics Algorithms

A Domain-Specific Approach To Heterogeneous Parallelism

A Domain-Specific Language and Compiler for Stencil Computations on Short-Vector SIMD and GPU Architectures

Titles: 100

open PDFs: 90

packages: 14