Papers on hgpu.org (.txt-file)

Python Non-Uniform Fast Fourier Transform (PyNUFFT): An Accelerated Non-Cartesian MRI Package on a Heterogeneous Platform (CPU/GPU)

Python Workflows on HPC Systems

Python-Based Quantum Chemistry Calculations with GPU Acceleration

PyTorch Hyperparameter Tuning – A Tutorial for spotPython

PyTorch: An Imperative Style, High-Performance Deep Learning Library

PyTorchPipe: a framework for rapid prototyping of pipelines combining language and vision

PyTransit: Fast and Easy Exoplanet Transit Modelling in Python

q-state Potts model metastability study using optimized GPU-based Monte Carlo algorithms

QArray: a GPU-accelerated constant capacitance model simulator for large quantum dot arrays

QCD on GPUs: cost effective supercomputing

QCD simulations with staggered fermions on GPUs

QCDGPU: open-source package for Monte Carlo lattice simulations on OpenCL-compatible multi-GPU systems

qecGPT: decoding Quantum Error-correcting Codes with Generative Pre-trained Transformers

QGTC: Accelerating Quantized GNN via GPU Tensor Core

Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping

QiMeng-Kernel: Macro-Thinking Micro-Coding Paradigm for LLM-Based High-Performance GPU Kernel Generation

QMCPACK: An open source ab initio Quantum Monte Carlo package for the electronic structure of atoms, molecules, and solids

QP: A Heterogeneous Multi-Accelerator Cluster

QPACE 2 and Domain Decomposition on the Intel Xeon Phi

QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators

QSL Squasher: A Fast Quasi-Separatrix Layer Map Calculator

Quadratic Pseudo-Boolean Optimization for Scene Analysis using CUDA

Qualcomm Snapdragon Mobile Platform OpenCL General Programming and Optimization

Quality comparison and acceleration for digital hologram generation method based on segmentation

Quality-score guided error correction for short-read sequencing data using CUDA

Quantifying NUMA and contention effects in multi-GPU systems

Quantifying OpenMP: Statistical Insights into Usage and Adoption

Quantifying the Energy Efficiency of FFT on Heterogeneous Platforms

Quantifying the Energy Efficiency of Object Recognition and Optical Flow

Quantifying the Impact of GPUs on Performance and Energy Efficiency in HPC Clusters

Quantile Mechanics II: Changes of Variables in Monte Carlo methods and a GPU-Optimized Normal Quantile

Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations

Quantum Boolean Image Denoising

Quantum chemical many-body theory on heterogeneous nodes

Quantum Chemistry for Solvated Molecules on Graphical Processing Units (GPUs) using Polarizable Continuum Models

Quantum Chemistry on Graphical Processing Units. 1. Strategies for Two-Electron Integral Evaluation

Quantum Chemistry on Graphical Processing Units. 2. Direct Self-Consistent-Field Implementation

Quantum Chemistry on Graphical Processing Units. 3. Analytical Energy Gradients, Geometry Optimization, and First Principles Molecular Dynamics

Quantum computer simulation using the CUDA programming model

Quantum Monte Carlo on graphical processing units

Quantum.Ligand.Dock: protein-ligand docking with quantum entanglement refinement on a GPU system

Quartile and Outlier Detection on Heterogeneous Clusters Using Distributed Radix Sort

Quasars spectra classification with the help of GPU computing

Quasi-maximum Accuracy Floating-point Computations with GPGPU for Applications in Digital Signal Processing

Quasi-real-time analysis of dynamic near field scattering data using a graphics processing unit

QUDA programming for staggered quarks

Query Optimization in Heterogeneous CPU/GPU Environment for Time Series Databases

Query Processing on Tensor Computation Runtimes

Query-Driven Visualization of Time-Varying Adaptive Mesh Refinement Data

Quick-CULLIDE: fast inter- and intra-object collision culling using graphics hardware

QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference

QuickProbs – A Fast Multiple Sequence Alignment Algorithm Designed for Graphics Processors

Quine-McCluskey algorithm on GPGPU

QYMSYM: A GPU-Accelerated Hybrid Symplectic Integrator That Permits Close Encounters

R2GUESS: A Graphics Processing Unit-Based R Package for Bayesian Variable Selection Regression of Multivariate Responses

Radeon PRO Solid State Graphics (SSG) API User Manual

Radial Basis Function Networks GPU-Based Implementation

Radiation Modeling Using the Uintah Heterogeneous CPU/GPU Runtime System

Radiative Heat Transfer Simulation Using Programmable Graphics Hardware

Radio astronomy beam forming on GPUs

Radio Astronomy Beam Forming on Many-Core Architectures

Radiometric Compensation through Inverse Light Transport

Radionuclides migration modelling using artificial neural networks and parallel computing

RadixBoost: A Hardware Acceleration Structure for Scalable Radix Sort on Graphic Processors

Rain Scene Animation through Particle Systems and Surface Flow Simulation by SPH

Raising the Bar for Using GPUs in Software Packet Processing

Raising the level of many-core programming with compiler technology: meeting a grand challenge

Raising the Performance of the Tinker-HP Molecular Modeling Package on Intel’s HPC Architectures: a Living Review [Article v1.0]

Random Address Permute-Shift Technique for the Shared Memory on GPUs

Random Fields Generation on the GPU with the Spectral Turning Bands Method

Random Finite Set Based Bayesian Filtering with OpenCL in a Heterogeneous Platform

Random Forests of Very Fast Decision Trees on GPU for Mining Evolving Big Data Streams

Random number generators for massively parallel simulations on GPU

Random Walks based Multi-Image Segmentation: Quasiconvexity Results and GPU-based Solutions

Random Walks for Image Cosegmentation

Random Walks for Interactive Organ Segmentation in Two and Three Dimensions: Implementation and Validation

Random-access rendering of general vector graphics

Randomized selection on the GPU

Range Cell Migration Correction using texture mapping on GPU

Range query processing in a multi-GPU environment

Rank k Cholesky Up/Down-dating on the GPU: gpucholmodV0.2

RankBoost Acceleration on both NVIDIA CUDA and ATI Stream Platforms

Rapid Computation of Sodium Bioscales Using GPU-Accelerated Image Reconstruction

Rapid evaluation and evolution of neural models using graphics card hardware

Rapid Modelling of Interactive Geological Illustrations with Faults and Compaction

Rapid motion compensation for prostate biopsy using GPU

Rapid Multipole Graph Drawing on the GPU

Rapid Performance of a Generalized Distance Calculation

Rapid Rabbit: Highly Optimized GPU Accelerated Cone-Beam CT Reconstruction

Rapid RNA Folding: Analysis and Acceleration of the Zuker Recurrence

Rapid star map simulation based on GPU

Rapid Texture-based Volume Rendering

RapidMind: Portability across Architectures and its Limitations

RAPIDNN: In-Memory Deep Neural Network Acceleration Framework

RAR password decryption by utilizing GPU

Raspberry Pi based System for Visual Object Detection and Tracking

RASR/NN: The RWTH Neural Network Toolkit for Speech Recognition

Raster Time Series: Learning and Processing

Raster2Mesh: Rasterization based CVT meshing

Titles: 100
open PDFs: 90
packages: 27
