Papers on hgpu.org (.txt-file)
A memory access model for highly-threaded many-core architectures

A Memory Bandwidth-Efficient Hybrid Radix Sort on GPUs

A Memory Centric Kernel Framework for Accelerating Short-Range, Interactive Particle Simulation
A Memory Efficient Algorithm for Adaptive Multidimensional Integration with Multiple GPUs

A Memory Efficient and Fast Sparse Matrix Vector Product on a GPU

A Memory Model for Scientific Algorithms on Graphics Processors

A memory optimization technique for software-managed scratchpad memory in GPUs

A Memory-Efficient Algorithm for Large-Scale Symmetric Tridiagonal Eigenvalue Problem on Multi-GPU Systems

A meshless hierarchical representation for light transport

A Metaprogramming and Autotuning Framework for Deploying Deep Learning Applications

A Method for Accelerating Bronchoscope Tracking Based on Image Registration by GPGPU

A method for decompilation of AMD GCN kernels to OpenCL

A Method for Large-Scale Terrain Rendering Based-on GPU
A method for speeding up beam-tracing simulation using thread-level parallelization

A Method to Improve Interest Point Detection and its GPU Implementation

A methodology for comparing optimization algorithms for auto-tuning

A Methodology for Translating C-Programs to OpenCL

A Metric for Performance Portability

A Micro-benchmark Suite for AMD GPUs

A Microbenchmark Framework for Performance Evaluation of OpenMP Target Offloading

A middleware for efficient stream processing in CUDA

A minimal model for acoustic forces on Brownian particles

A Mixed Hierarchical Algorithm for Nearest Neighbor Search

A mixed precision semi-Lagrangian algorithm and its performance on accelerators

A Mixed-Precision Algorithm for the Solution of Lyapunov Equations on Hybrid CPU-GPU Platforms

A ML-based resource utilization OpenCL GPU-kernel fusion model

A mobile robot navigation with use of CUDA parallel architecture

A Model Extraction Attack on Deep Neural Networks Running on GPUs

A Model for Real Time Ocean Breaking Waves Animation

A model of dynamic compilation for heterogeneous compute platforms

A model-driven partitioning and auto-tuning integrated framework for sparse matrix-vector multiplication on GPUs

A Modeling Approach based on UML/MARTE for GPU Architecture

A Modular Framework for Deformation and Fracture using GPU Shaders

A modular GPU raytracer using OpenCL for non-interactive graphics

A Modular System Architecture for Online Parallel Vision Pipelines

A molecular docking system using CUDA
A Monte Carlo Neutron Transport Code for Eigenvalue Calculations on a Dual-GPU System and CUDA Environment

A Moving Least Squares Based Approach for Contour Visualization of Multi-Dimensional Data

A MPI back-end for the OpenACC accULL. Exploiting OpenACC semantics in Message Passing Clusters

A Multi GPU Read Alignment Algorithm with Model-based Performance Optimization

A multi-agent architecture for scheduling of high performance services in a GPU cluster

A multi-GPU accelerated solver for the three-dimensional two-phase incompressible Navier-Stokes equations

A multi-GPU acceleration for 3D imaging of the prostate

A Multi-GPU Compute Solution for Optimized Genomic Selection Analysis

A Multi-GPU Programming Library for Real-Time Applications

A Multi-GPU Sources Reconstruction Method for Imaging Applications

A Multi-GPU Spectrometer System for Real-Time Wide Bandwidth Radio Signal Analysis

A multi-lane traffic simulation model via continuous cellular automata

A multi-platform linear algebra toolbox for finite element solvers on heterogeneous clusters
A Multi-Stage CUDA Kernel for Floyd-Warshall

A multi-Teraflop Constituency Parser using GPUs

A Multi-View Stereo Implementation on Massively Parallel Hardware

A multigrid solver for boundary value problems using programmable graphics hardware

A multiphysics and multiscale software environment for modeling astrophysical systems

A Mutable Hardware Abstraction to Replace Threads

A near real-time framework for extracting tip-sample forces in dynamic atomic force microscopy (dAFM)

A Nearest Neighbor Data Structure for Graphics Hardware

A Neighborhood Grid Data Structure for Massive 3D Crowd Simulation on GPU

A Network Intrusion Detection System Framework based on Hadoop and GPGPU

A Networked Dataflow Simulation Environment for Signal Processing and Data Mining Applications

A new adaptive model for real-time fluid simulation with complex boundaries

A New Approach for Color Character Extraction Based on Parallel Clustering
A new approach for sparse matrix vector product on NVIDIA GPUs

A New Approach of Performance Analysis of Certain Graph Algorithms

A new approach to the lattice Boltzmann method for graphics processing units
A New Architecture for Games and Simulations Using GPUs

A New Architecture for Optimization Modeling Frameworks

A New Class of Parallel Scheduling Algorithms

A New Compilation Path: From Python/NumPy to OpenCL

A New Cooperative Evolutionary Multi-Swarm Optimizer Algorithm Based on CUDA Parallel Architecture Applied to Solve Engineering Optimization Problems

A new CUDA-based GPU implementation of the two-dimensional Athena code

A New Data Layout For Set Intersection on GPUs

A New Digital Repository for Hyperspectral Imagery with Unmixing-Based Retrieval Functionality Implemented on GPUs

A New Era in Scientific Computing: Domain Decomposition Methods in Hybrid CPU-GPU Architectures
A new GPU-accelerated hydrodynamical code for numerical simulation of interacting galaxies

A New GPU-based Approach to the Shortest Path Problem

A New GPU-Based Neighbor Search Algorithm for Fluid Simulations
A new gravitational N-body simulation algorithm for investigation of cosmological chaotic advection

A New High Performance GPU-based Approach to Prime Numbers Generation

A new method for GPU based irregular reductions and its application to k-means clustering

A New method to Design Accurate Images with Tree Structural Transformations

A New Morphological Anomaly Detection Algorithm for Hyperspectral Images and its GPU Implementation

A New Non-Blocking Approach on GPU Dynamical Memory Management

A New Parallel Implementation of DSI Based Disparity Computation Using CUDA

A New Parallel Method of Smith-Waterman Algorithm on a Heterogeneous Platform

A new parallel tool for classification of remotely sensed imagery

A new parallel video understanding and retrieval system
A new parallelisation technique for heterogeneous CPUs

A new physics engine with automatic process distribution between CPU-GPU
A new ray-tracing scheme for 3D diffuse radiation transfer on highly parallel architectures

A new representation of intensity atlas for GPU-accelerated instance generation
A New Software Based GPU Framework

A New Sparse Matrix Vector Multiplication GPU Algorithm Designed for Finite Element Problems

A New Tool for Classification of Satellite Images Available from Google Maps: Efficient Implementation in Graphics Processing Units

A new way in few-body scattering calculations: discretized Faddeev equations solved on GPU

A Newcomer In The PGAS World – UPC++ vs UPC: A Comparative Study

A Non-linear GPU Thread Map for Triangular Domains

A Normalized Particle Swarm Optimization Algorithm to Price Complex Chooser Option and Accelerating its Performance with GPU

Titles: 100
open PDFs: 88
packages: 14
