Papers on hgpu.org (.txt-file)

A declarative API for particle systems

A decompression pipeline for accelerating out-of-core volume rendering of time-varying data

A Deep Generative Deconvolutional Image Model

A Deep Learning Approach for Automatic Code Optimization in the Tiramisu Compiler

A deep learning approach to autonomous lunar landing

A Deep Learning Based Cost Model for Automatic Code Optimization

A Deep Learning Model for Loop Interchange

A design case study: CPU vs. GPGPU vs. FPGA

A Design Framework for Mapping Dataflow Graphs onto Heterogeneous Multiprocessor Platforms

A Design Methodology for Efficient Implementation of Deconvolutional Neural Networks on an FPGA

A design pattern language for engineering (parallel) software: merging the PLPP and OPL projects

A design tool for efficient mapping of multimedia applications onto heterogeneous platforms

A Detailed GPU Cache Model Based on Reuse Distance Theory

A development of an accelerator board dedicated for multi-precision arithmetic operations and its application to Feynman loop integrals II

A Development Platform for Embedded Domain-Specific Languages

A directionally adaptive edge anti-aliasing filter

A Discussion of Selected Vienna-Libraries for Computational Science

A Distributed Approximation Algorithm for Mixed Packing-Covering Linear Programs

A Distributed Architecture for Smart Recycling Using Machine Learning

A distributed computing approach to improve the performance of the Parallel Ocean Program (v2.1)

A Distributed CPU-GPU Framework for Pairwise Alignments on Large-Scale Sequence Datasets

A Distributed Data Mining Framework Accelerated with Graphics Processing Units

A Distributed GPU-based Framework for real-time 3D Volume Rendering of Large Astronomical Data Cubes

A distributed multi-GPU system for high speed electron microscopic tomographic reconstruction

A Distributed-memory Tridiagonal Solver Based on a Specialised Data Structure Optimised for CPU and GPU Architectures

A Diversified Multi-Start Algorithm for Unconstrained Binary Quadratic Problems Leveraging the Graphics Processor Unit

A Domain Specific Approach to Heterogeneous Computing: From Availability to Accessibility

A Domain Specific Language for Performance Portable Molecular Dynamics Algorithms

A Domain-Extensible Compiler with Controllable Automation of Optimisations

A Domain-Specific Approach To Heterogeneous Parallelism

A Domain-Specific Language and Compiler for Stencil Computations on Short-Vector SIMD and GPU Architectures

A domain-specific language for geospatial computations on the GPU

A Domain-specific Language to Facilitate Software Defined Radio Parallel Executable Patterns Deployment on Heterogeneous Architectures

A Duality Based Approach for Realtime TV-L1 Optical Flow

A Dynamic Approach to Weighted Suffix Tree Construction Algorithm

A Dynamic Hash Table for the GPU

A Dynamic IP Lookup Architecture using Parallel Multiple Hash in GPU-based Software Router

A Dynamic Offload Scheduler for spatial multitasking on Intel Xeon Phi Coprocessor

A Dynamic Programming Model To Solve Optimisation Problems Using GPUs

A Dynamic Resource Management System for Network-Attached Accelerator Clusters

A dynamic scheduling runtime and tuning system for heterogeneous multi and many-core desktop platforms

A dynamically configurable coprocessor for convolutional neural networks

A Fair Comparison of Modern CPUs and GPUs Running the Genetic Algorithm under the Knapsack Benchmark

A Fast 3D Spatial Analysis Technique Using Graphic Process Units

A Fast Algorithm for Constructing Inverted Files on Heterogeneous Platforms

A Fast and Accurate GHT Implementation on CUDA

A Fast and Efficient SIFT Detector Using the Mobile GPU

A Fast and Efficient Simulation Framework for Modeling Heat Transport

A Fast and Generic GPU-Based Parallel Reduction Implementation

A fast and intuitive visual programming language (VPL) for constructing Computer Vision and Image processing systems on GPUs

A Fast and Rigorously Parallel Surface Voxelization Technique for GPU-Accelerated CFD Simulations

A fast and robust seed flooding algorithm on GPU for Voronoi diagram generation

A Fast and Secure Way to Prevent SQL Injection Attacks using Bitslice Technique and GPU Support

A Fast and Simple Approach to Merge and Merge Sort using Wide Vector Instructions

A Fast Batched Cholesky Factorization on a GPU

A Fast GEMM Implementation On a Cypress GPU

A fast GEMM implementation on the cypress GPU

A fast GPU algorithm for graph connectivity

A Fast GPU Implementation for Solving Sparse Ill-Posed Linear Equation Systems

A fast GPU-based Monte Carlo simulation of proton transport with detailed modeling of non-elastic interactions

A Fast GPU-Based Motion Estimation Algorithm for H.264/AVC

A Fast GVF Snake Algorithm on the GPU

A Fast High Quality Pseudo Random Number Generator for Graphics Processing Units

A fast high quality pseudo random number generator for nVidia CUDA

A fast hybrid time-synchronous/event approach to parallel discrete event simulation of queuing networks

A Fast Implementation of Parallel Discrete-Event Simulation on GPGPU

A Fast Implementation of the Octagon Abstract Domain on Graphics Hardware

A Fast Jet Finder Algorithm Using Graphic Processing Unit

A fast marching method based back projection algorithm for photoacoustic tomography in heterogeneous media

A Fast Method For Computing Principal Curvatures From Range Images

A Fast Mixed-Band Lifting Wavelet Transform on the GPU

A Fast Parallel Implementation of Queue-based Morphological Reconstruction using GPUs

A Fast Poisson Solver with Periodic Boundary Conditions for GPU Clusters in Various Configurations

A Fast Similarity Join Algorithm Using Graphics Processing Units

A fast stereo matching algorithm suitable for embedded real-time systems

A fast Texture-by-numbers synthesis method based on texture optimization

A Fast, GPU based, Dictionary Attack to OpenPGP Secret Keyrings

A Feedback Approach to Task Partitioning in Heterogeneous Architectures

A Field Guide to Genetic Programming

A fight for performance and accuracy of the matrix multiplication routines: CUBLAS on Nvidia Tesla versus MKL and ATLAS on Intel Nehalem

A File System Using GPU-Accelerated File-wise Reliability Scheme

A Financial Benchmark for GPGPU Compilation

A Fine Grained Cycle Sharing System with Cooperative Multitasking on GPUs

A finite volume approach for the simulation of nonlinear dissipative acoustic wave propagation

A First Look at Bugs in LLM Inference Engines

A first look at integrated GPUs for green high-performance computing

A First Order Primal-Dual Algorithm for Nonconvex TV^q Regularization

A First Step Towards GPU-assisted Query Optimization

A Fixed-Complexity Sphere Decoder for MIMO Systems on Graphics Processing Units

A flexible algorithm for calculating pair interactions on SIMD architectures

A flexible high-performance Lattice Boltzmann GPU code for the simulations of fluid flows in complex geometries

A Flexible Kernel for Adaptive Mesh Refinement on GPU

A Flexible Multi-Volume Shader Framework for Arbitrarily Intersecting Multi-Resolution Datasets

A Flexible Patch-Based Lattice Boltzmann Parallelization Approach for Heterogeneous GPU-CPU Clusters

A flexible simulation framework for graphics architectures

A fluid simulation system based on the MPS method

A Foray into Efficient Mapping of Algorithms to Hardware Platforms on Heterogeneous Systems

A Framework for 3D Model-Based Visual Tracking Using a GPU-Accelerated Particle Filter

Titles: 100
open PDFs: 91
packages: 14
