Papers on hgpu.org (.txt-file)
A Comparative Study of Neighborhood Filters for Artifact Reduction in Iterative Low-Dose CT

A Comparative Study of OpenACC Implementations

A Comparative Study of Parallel Algorithms for the Girth Problem

A Comparative Study on ASIC, FPGAs, GPUs and General Purpose Processors in the O(N^2) Gravitational N-body Simulation

A Comparative Study on Exact Triangle Counting Algorithms on the GPU

A Comparison between GPU-based Volume Ray Casting Implementations: Fragment Shader, Compute Shader, OpenCL, and CUDA

A comparison between parallelization approaches in molecular dynamics simulations on GPUs

A Comparison of Algebraic Multigrid Preconditioners using Graphics Processing Units and Multi-Core Central Processing Units

A comparison of CPU and GPU performance for Fourier pseudospectral simulations of the Navier-Stokes, Cubic Nonlinear Schrodinger and Sine Gordon Equations

A Comparison of CPU and OpenCL Parallelization Methods for Correlation and Graph Layout Algorithms used in the Network Analysis of High Dimensional Data

A comparison of CPUs, GPUs, FPGAs, and massively parallel processor arrays for random number generation

A Comparison of GPU Execution Time Prediction using Machine Learning and Analytical Modeling

A Comparison of Gradient Estimation Methods for Volume Rendering on Unstructured Meshes

A Comparison of High-Level Design Tools for SoC-FPGA on Disparity Map Calculation Example

A comparison of HPC-based quantum computing simulators using Quantum Volume

A Comparison of Many-threaded Differential Evolution and Genetic Algorithms on CUDA

A Comparison of Massively Parallel Programming Models Through Applications in Sound Propagation and Jitter Measurement

A Comparison of Modern GPU and CPU Architectures: And the Common Convergence of Both

A Comparison of OpenCL, CUDA, and HIP as Compilation Targets for a Functional Array Language

A Comparison of Optimal Scanline Voxelization Algorithms

A comparison of period finding algorithms

A Comparison of Potential Interfaces for Batched BLAS Computations

A Comparison of Sequential and GPU Implementations of Iterative Methods to Compute Reachability Probabilities

A Comparison of Serial & Parallel Particle Filters for Time Series Analysis

A Comparison of Statistical Techniques for Detecting Side-Channel Information Leakage in Cryptographic Devices

A Comparison of Support Vector Machines Training GPU-Accelerated Open Source Implementations

A Comparison of the performance of HPC Accelerators

A Comparison of the Performance of the Molecular Dynamics Simulation Package GROMACS Implemented in the SYCL and CUDA Programming Models

A Comparison of Two Methods for Geometric Milling Simulation Accelerated by GPU

A Comparison of xPU Platforms Exemplified with Ray Tracing Algorithms

A Compile-Time Managed Multi-Level Register File Hierarchy

A Compiler and Runtime for Heterogeneous Computing

A compiler for high performance computing with many-core accelerators

A Compiler for Throughput Optimization of Graph Algorithms on GPUs

A compiler framework for optimization of affine loop nests for gpgpus

A Compiler Framework for Optimizing Dynamic Parallelism on GPUs

A Compiler Infrastructure for Accelerator Generators

A Compiler Infrastructure for Embedded Multicore SoCs

A compiler toolkit for array-based languages targeting CPU/GPU hybrid systems

A Complete and Efficient CUDA-Sharing Solution for HPC Clusters

A Complete Description of the UnPython and Jit4GPU Framework

A complete modular resultant algorithm targeted for realization on graphics hardware

A comprehensive analysis and parallelization of an image retrieval algorithm

A Comprehensive Benchmark of Deep Learning Libraries on Mobile Devices

A Comprehensive Deep Learning Library Benchmark and Optimal Library Selection

A Comprehensive Performance Analysis of HSA and OpenCL 2.0

A Comprehensive Performance Comparison of CUDA and OpenCL

A comprehensive study of Dynamic Memory Management in OpenCL kernels

A Comprehensive Survey on Various Evolutionary Algorithms on GPU

A Computational Comparison of Basis Updating Schemes for the Simplex Algorithm on a CPU-GPU System

A Computational Model of Afterimages

A Computational Realization of a Semi-Lagrangian Method for Solving the Advection Equation

A computationally efficient and scalable approach for privacy preserving kNN classification

A Computationally Efficient Approach for Exemplar-based Color Image Inpainting using GPU

A Computationally Efficient Parallel Kernel Regression for Image Reconstruction

A Compute Graph Simulation and Implementation Framework Targeting AMD Versal AI Engines

A Compute Unified System Architecture for Graphics Clusters Incorporating Data Locality

A Computing Kernel for Network Binarization on PyTorch

A computing origami: Optimized code generation for emerging parallel platforms

A constant-space belief propagation algorithm for stereo matching

A Consumer Application for GPGPUs: Desktop Search

A Container-Based Workflow for Distributed Training of Deep Learning Algorithms in HPC Clusters

A Contour-Guided Deformable Image Registration Algorithm for Adaptive Radiotherapy

A control-structure splitting optimization for GPGPU

A convex formulation for color image segmentation in the context of passive emitter localization

A Convex Relaxation Approach to Space Time Multi-view 3D Reconstruction

A Convolutional Neural Network Cascade for Face Detection

A CPU and GPU Heterogeneous Processing of Multimedia Data by using OpenCL

A CPU-GPU Hybrid Runtime for the Aeminium Language

A CPU+FPGA OpenCL Heterogeneous Computing Platform for Multi-Kernel Pipeline

A Cross-Input Adaptive Framework for GPU Programs Optimization

A Cross-platform Evaluation of Graphics Shader Compiler Optimization

A CUDA Back-End for the Equelle Compiler

A CUDA Based Implementation of an Image Authentication Algorithm

A CUDA based Solution to the Multidimensional Knapsack Problem Using the Ant Colony Optimization

A CUDA Implementation of Independent Component Analysis in the Time-Frequency Domain

A CUDA implementation of the High Performance Conjugate Gradient benchmark

A CUDA Kernel Scheduler Exploiting Static Data Dependencies

A CUDA Monte Carlo simulator for radiation therapy dosimetry based on Geant4

A CUDA SIMT Interpreter for Genetic Programming

A CUDA SIMT interpreter for genetic programming. Revised

A CUDA-Based Cooperative Evolutionary Multi-Swarm Optimization Applied to Engineering Problems

A CUDA-Based Implementation of Stable Fluids in 3D with Internal and Moving Boundaries

A CUDA-based parallel implementation of K-nearest neighbor algorithm

A CUDA-Based Real Parameter Optimization Benchmark

A CUDA-enabled Parallel Implementation of Collaborative Filtering

A curved-element unstructured discontinuous Galerkin method on GPUs for the Euler equations

A Customized 3D GPU Poisson Solver for Free BCs

A Data Communication Scheduler for Stream Programs on CPU-GPU Platform

A Data Parallel Algorithm for Seismic Raytracing

A data parallel approach to genetic programming using programmable graphics hardware

A data parallel view on polyhedral process networks

A Data-Driven Model for Anisotropic Heterogeneous Subsurface Scattering

A Data-oriented Method for Scheduling Dependent Tasks on High-density Multi-GPU Systems

A Data-Parallel Algorithmic Modelica Extension for Efficient Execution on Multi-Core Platforms

A Data-Parallel Extension to Ruby for GPGPU

A Data-Parallel Graphics Pipeline Implemented in OpenCL

A dataflow-like programming model for future hybrid clusters

Titles: 100
open PDFs: 92
packages: 21
