Papers on hgpu.org (.txt-file)
ALICE HLT High Speed Tracking on GPU

Alignator: A GPU powered software package for robust fiducial-less alignment of cryo tilt-series
Alignment invariant image comparison implemented on the GPU

All You Need Is Binary Search! A Practical View on Lightweight Database Indexing on GPUs

All-pairs Shortest Path Algorithm based on MPI+CUDA Distributed Parallel Programming Model

All-Pairs Shortest Path Algorithms Using CUDA

All-pairs shortest-paths for large graphs on the GPU

Alpaka – An Abstraction Library for Parallel Kernel Acceleration

Alpha-Beta Divergences Discover Micro and Macro Structures in Data

ALPINIST: An Annotation-Aware GPU Program Optimizer

ALPyNA: Acceleration of Loops in Python for Novel Architectures

Alternating Maximization: Unifying Framework for 8 Sparse PCA Formulations and Efficient Parallel Codes

Ambient Occlusion and Edge Cueing for Enhancing Real Time Molecular Visualization

AMD MI300X GPU Performance Analysis

Ameliorating Memory Contention of OLAP operators on GPU Processors

American Basket Option Pricing on a multi GPU Cluster

American Options Based on Malliavin Calculus and Nonparametric Variance Reduction Methods

American Options Pricing on Multi-core Graphic Cards
AMGCL – A C++ library for efficient solution of large sparse linear systems

AMGCL: an Efficient, Flexible, and Extensible Algebraic Multigrid Implementation

An 8.6 mW 25 Mvertices/s 400-MFLOPS 800-MOPS 8.91 mm² Multimedia Stream Processor Core for Mobile Applications

An 80-Fold Speedup, 15.0 TFlops Full GPU Acceleration of Non-Hydrostatic Weather Model ASUCA Production Code

An abstract object oriented runtime system for heterogeneous parallel architecture

An Accelerated 3D Navier-Stokes Solver for Flows in Turbomachines

An Accelerated IHS Transform Fusion of Remote Sensing Image Data Based on GPU
An acceleration of the algorithm for the nurse rerostering problem on a graphics processing unit

An Accelerator based on the rho-VEX Processor: an Exploration using OpenCL

An adaptative game loop architecture with automatic distribution of tasks between CPU and GPU

An Adaptative Multi-GPU based Branch-and-Bound. A Case Study: the Flow-Shop Scheduling Problem

An adaptive Expectation-Maximization algorithm with GPU implementation for electron cryomicroscopy
An Adaptive Framework for Managing Heterogeneous Many-Core Clusters

An adaptive framework for visualizing unstructured grids with time-varying scalar fields

An Adaptive Hybrid Multiprocessor technique for bioinformatics sequence alignment

An Adaptive Multi-Spline Refinement Algorithm in Simulation Based Sailboat Trajectory Optimization Using Onboard Multi-Core Computer Systems

An Adaptive Multiresolution Mesh Representation for CPU-GPU Coupled Computation

An adaptive octree textures painting algorithm
An adaptive performance modeling tool for GPU architectures

An Adaptive Step Size GPU ODE Solver for Simulating the Electric Cardiac Activity

An algebraic parallel treecode in arbitrary dimensions

An Algorithm for Detecting Cycles in Undirected Graphs using CUDA Technology

An Algorithm for Fast Edit Distance Computation on GPUs

An algorithm-architecture co-design framework for gridding reconstruction using FPGAs

An Analysis of Conventional and Heterogeneous Workloads on Production Supercomputing Resources

An Analysis of OpenACC Programming Model: Image Processing Algorithms as a Case Study

An Analysis of Programmer Productivity versus Performance for High Level Data Parallel Programming

An Analysis of Variation Between Cores For Intel Xeon Phi Knights Corner And Xeon Phi Knights Landing

An Analytical Approach of Mars Rovers by Using GPU Technology and Genetic Algorithm

An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness

An application of graphical numerical accelerators in simulations of ion-transport through biological membranes

An Approach for Maximizing Performance on Heterogeneous Clusters of CPU and GPU

An approach for the effective utilization of GP-GPUs in parallel combined simulation
An Approach for Traffic Forecast with GPU Computing & Cellular Automata Model

An approach of tool paths generation for CNC machining based on CUDA
An Approach to Efficient FEM Simulations on Graphics Processing Units Using CUDA

An approach to performance portability through generic programming

An Architectural Journey into RISC Architectures for HPC Workloads

An architecture design of GPU-accelerated VoD streaming servers with network coding
An Architecture for Distributed Behavioral Models with GPUs

An architecture for real time fluid simulation using multiple GPUs

An asymmetric distributed shared memory model for heterogeneous parallel systems

An Asynchronous Dataflow-Driven Execution Model For Distributed Accelerator Computing

An Asynchronous Event Communication Technique for Soft Real-Time GPGPU Applications

An Auto-Programming Approach to Vulkan

An Auto-tuned Method for Solving Large Tridiagonal Systems on the GPU

An auto-tuning framework for parallel multicore stencil computations

An Auto-tuning Solution to Data Streams Clustering in OpenCL

An Automated Approach for SIMD Kernel Generation for GPU based Software Acceleration

An Automated Tool for Converting Directive Based C Code Into Parallel CUDA Code

An Automated Video Surveillance System Using Viewpoint Feature Histogram and CUDA-enabled GPUs

An Automatic Host and Device Memory Allocation Method for OpenMPC

An Automatic Input-Sensitive Approach for Heterogeneous Task Partitioning

An Automatic OpenCL Compute Kernel Generator for Basic Linear Algebra Operations

An Automatic Speech Recognition Application Framework for Highly Parallel Implementations on the GPU

An Autotuning Framework for Intel Xeon Phi Platforms

An effective GPU implementation of breadth-first search

An Effective Model of CPU/GPU Collaborative Computing in GPU Clusters

An Efficient Acceleration of Digital Forensics Search Using GPGPU

An Efficient Approach for Generating Pencil Filter and Its Implementation on GPU
An Efficient Block Cipher Implementation on Many-Core Graphics Processing Units

An Efficient Cell List Implementation for Monte Carlo Simulation on GPUs

An Efficient Common Substrings Algorithm for On-the-Fly Behavior-Based Malware Detection and Analysis

An Efficient Deterministic Parallel Algorithm for Adaptive Multidimensional Numerical Integration on GPUs

An Efficient Dispatcher for Large Scale Graph Processing on OpenCL-based FPGAs

An Efficient Fine-grained Parallel Genetic Algorithm Based on GPU-Accelerated
An efficient GPU acceptance-rejection algorithm for the selection of the next reaction to occur for Stochastic Simulation Algorithms

An efficient GPU algorithm for tetrahedron-based Brillouin-zone integration

An Efficient GPU Implementation of Modified Discrete Cosine Transform Using CUDA

An efficient GPU implementation of the revised simplex method
An efficient GPU-based approach for interactive global illumination

An efficient GPU-based time domain solver for the acoustic wave equation

An Efficient Hardware Accelerator for Structured Sparse Convolutional Neural Networks on FPGAs

An Efficient Implementation of Double Precision 1-D FFT for GPUs Using CUDA

An Efficient Implementation of GPU Virtualization in High Performance Clusters
An efficient implementation of Smith Waterman algorithm on GPU using CUDA, for massively parallel scanning of sequence databases

Titles: 100
open PDFs: 84
packages: 12
