Papers on hgpu.org (.txt-file)
The density matrix renormalization group algorithm on kilo-processor architectures: implementation and trade-offs

The Design and Implementation of a GPU-enabled Multi-objective Tabu-search Intended for Real World and High-dimensional Applications

The Design and Implementation of a Verification Technique for GPU Kernels

The design and verification of Mumax3

The development and expansion of HOOMD-blue through six years of GPU proliferation

The discrete dipole approximation code DDscat.C++: features, limitations and plans

The distributed diagonal force decomposition method for parallelizing molecular dynamics simulations

The Distribution of OpenCL Kernel Execution Across Multiple Devices

The Dual-Path Execution Model for Efficient GPU Control Flow

The Dynamical Kernel Scheduler – Part 1

The Ecological Footprint of Neural Machine Translation Systems

The effects of nutrient chemotaxis on bacterial aggregation patterns with non-linear degenerate cross diffusion

The Fast and Wideband MoM Based on GPU and Two-Path AFS Acceleration

The fast evaluation of hidden Markov models on GPU

The fast multipole method on parallel clusters, multicore processors, and graphics processing units

The Fast Multipole Method on the Cell processor

The Fat-Link Computation On Large GPU Clusters for Lattice QCD

The Feasibility of Using OpenCL Instead of OpenMP for Parallel CPU Programming

The Flocking Based and GPU Accelerated Internet Traffic Classification

The Framework and Compilation Techniques for Directive-based GPU Cluster Programming

The Fused Kernel Library: A C++ API to Develop Highly-Efficient GPU Libraries

The Future in Mobile Multicore Computing

The Future of Accelerator Programming: Abstraction, Performance or Can We Have Both?

The GASPI API specification and its implementation GPI 2.0

The Geant4 Visualisation System – a multi-driver graphics system

The GeForce 6 series GPU architecture

The Genetic Convolutional Neural Network Model Based on Random Sample

The GENGA Code: Gravitational Encounters in N-body simulations with GPU Acceleration

The GPU as a high performance computational resource

The GPU as numerical simulation engine

The GPU Computing Revolution: From Multi-Core CPUs To Many-Core Graphics Processors

The GPU Enhanced Parallel Computing for Large Scale Data Clustering

The GPU enters computing’s mainstream

The GPU on biomedical image processing for color and phenotype analysis

The GPU on irregular computing: performance issues and contributions

The GPU on the simulation of cellular computing models

The GPU vs Phi Debate: Risk Analytics Using Many-Core Computing

The GPU-based High-performance Pattern-matching Algorithm for Intrusion Detection

The GPU-based Parallel Ant Colony System

The GPU-based String Matching System in Advanced AC Algorithm

The gputools package enables GPU computing in R

The GPUVerify Method: a Tutorial Overview

The Graphics Card as a Streaming Computer

The Graphics Processor as a Mathematical Coprocessor in MATLAB

The Heisenberg spin glass model on GPU: myths and actual facts

The Hierarchical Memory Machine Model for GPUs

The Hitchhiker’s Guide to Cross-Platform OpenCL Application Development

The impact of accelerator processors for high-throughput molecular modeling and simulation

The impact of diverse memory architectures on multicore consumer software: an industrial perspective from the video games domain

The Impact of GPU DVFS on the Energy and Performance of Deep Learning: an Empirical Study

The impact of GPU/Multicore in Signal Processing: a quantitative approach

The Impact of Modern Consumer GPUs on Commonly Used Secure Password Standards

The Implement of Common Beam Forming Using GPU

The implementation and optimization of Bitonic sort algorithm based on CUDA

The Implementation of a Real-Time Polyphase Filter

The implementation of Multi-Scale Retinex image enhancement algorithm based on GPU via CUDA

The Infrared behavior of SU(3) Nf=12 gauge theory -about the existence of conformal fixed point-

The integrated implementation of surgical simulations through modeling by means of imaging, comprehension, visualization, deformation, and collision detection in virtual environments

The International Exascale Software Project roadmap

The K-Anonymity Approach in Preserving the Privacy of E-Services that Implement Data Mining

The Landscape of GPU-Centric Communication

The Lattice Boltzmann Equation Method for Complex Flows

The Lattice Boltzmann Simulation on Multi-GPU Systems

The lattice-Boltzmann method for simulating gaseous phenomena

The Linear Direct Sparse Solver on GPU for Bundle Adjustment Method

The Living Application: a Self-Organising System for Complex Grid Tasks

The magic volume lens: an interactive focus+context technique for volume rendering

The Memory Controller Wall: Benchmarking the Intel FPGA SDK for OpenCL Memory Interface

The method of improving performance of the GPU-accelerated 2D FDTD simulator

The Model of Computation of CUDA and its Formal Semantics

The MOPED framework: Object recognition and pose estimation for manipulation

The More We Share, The More We Have: Improving GPU performance through Register Sharing

The MOSIX Cluster Operating System for High-Performance Computing on Linux Clusters, Multi-Clusters, GPU Clusters and Clouds

The MOSIX Virtual OpenCL (VCL) Cluster Platform

The multi-GPU System with ExpEther

The Multi2Sim Simulation Framework: A CPU-GPU Model for Heterogeneous Computing

The multikernel: a new OS architecture for scalable multicore systems

The New Compiler Stack: A Survey on the Synergy of LLMs and Compilers

The nonequispaced FFT on graphics processing units

The OoO VLIW JIT Compiler for GPU Inference

The Open MatSci ML Toolkit: A Flexible Framework for Machine Learning in Materials Science

The openip open source image processing library

The OpenMP Cluster Programming Model

The Optimization of Algorithms in the Process of Temporal Data Mining Using the Compute Unified Device Architecture

The optimization of parallel Smith-Waterman sequence alignment using on-chip memory of GPGPU

The orthorectified technology for UAV aerial remote sensing image based on the Programmable GPU

The Parallel Bayesian Toolbox for High-performance Bayesian Filtering in Metrology

The Parallel Processing Based on CUDA for Convolution Filter FDK Reconstruction of CT

The PEPPHER Approach to Programmability and Performance Portability for Heterogeneous many-core Architectures

The PEPPHER Composition Tool: Performance-Aware Dynamic Composition of Applications for GPU-based Systems

The Performance Analysis Based on Heterogeneous Parallel Processors for Anisotropic Diffusion Filters

The performances of R GPU implementations of the GMRES method

The Physics of Singular Dislocation Structures in Continuum Dislocation Dynamics

The Plasma Simulation Code: A modern particle-in-cell code with load-balancing and GPU support

Titles: 100
open PDFs: 89
packages: 22
