Papers on hgpu.org (.txt-file)
Numerical Model of Shallow Water: the Use of NVIDIA CUDA Graphics Processors

Numerical Modeling of Atmospheric Vortices

Numerical modeling of gravitational wave sources accelerated by OpenCL

Numerical Ocean Modeling and Simulation with CUDA

Numerical Parallel Processing Based on GPU with CUDA Architecture

Numerical Precision and Benchmarking Very-High-Order Integration of Particle Dynamics on GPU Accelerators

Numerical resolution of conservation laws with OpenCL

Numerical Simulation for the MHD System in 2D Using OpenCL

Numerical simulation of 3D particulate flows based on GPU technology

Numerical Simulation of Melting with Natural Convection Based on Lattice Boltzmann Method and Performed with CUDA Enabled GPU

Numerical Simulation of the Complex Ginzburg-Landau Equation on GPUs with CUDA

Numerical Simulation of the Frank-Kamenetskii PDE: GPU vs. CPU Computing

Numerical simulations of acoustic waves with the graphic acceleration GAMER code

Numerical solution of PDEs with hybrid and heterogeneous computing models

Numerical Solutions of Heat and Mass Transfer with the Third Kind Boundary and Initial Conditions in Capillary Porous Media Using Programmable Graphics Hardware

Numerical Study of Geometric Multigrid Methods on CPU–GPU Heterogeneous Computers

NUPAR: A Benchmark Suite for Modern GPU Architectures

NVIDIA CUDA software and GPU parallel computing architecture

NVIDIA SimNet: an AI-accelerated multi-physics simulation framework

NVIDIA Tensor Core Programmability, Performance & Precision

NVIDIA Tesla: A Unified Graphics and Computing Architecture

Object Detection Based Handwriting Localization

Object Oriented Framework for CUDA based Pyramidal Image Blending

Object oriented framework for real-time image processing on GPU

Object Space Based Collision Detection for Cloth Simulation on the GPU

Object support for OpenMP-style programming of GPU clusters in Java

Object-oriented stream programming using aspects

Object-oriented stream programming using Aspects: a high-productivity programming paradigm for hybrid platforms

Objective-Driven Workload Allocation in Heterogeneous Computing Systems

Obsidian: GPU Kernel Programming in Haskell (thesis)

Obsidian: GPU Programming in Haskell

Obtaining a 35x Speedup in 2D Phase Unwrapping Using Commodity Graphics Processors

OCCA: A unified approach to multi-threading languages

Ocean wave simulation in real-time using GPU

Ocelot: a dynamic optimization framework for bulk-synchronous applications in heterogeneous systems

Ocelot/HyPE: Optimized Data Processing on Heterogeneous Hardware

OCLoptimizer: An Iterative Optimization Tool for OpenCL

OCT on CUDA: Speeding up the image reconstruction algorithm for an Optical Coherence Tomography system using NVIDIA’s CUDA platform

Octree Light Propagation Volumes

Odeint – Solving ordinary differential equations in C++

Odyssey: A Public GPU-Based Code for General-Relativistic Radiative Transfer in Kerr Spacetime

Off-axis quantitative phase imaging processing using CUDA: toward real-time applications

Offload Annotations: Bringing Heterogeneous Computing to Existing Libraries and Workloads

Offload Compiler Runtime for the Intel Xeon Phi Coprocessor

Offloading Critical Security Operations to the GPU

Offloading IDS Computation to the GPU

Offloading Java to Graphics Processors

Offloading Region Matching of Data Distribution Management with CUDA

Offset, Bisector and Medial Axis Construction on NURBS Surface Based on GPU

OKL: A Unified Language for Parallel Architectures

OMB-Py: Python Micro-Benchmarks for Evaluating Performance of MPI Libraries on HPC Systems

OmniDB: Towards Portable and Efficient Query Processing on Parallel CPU/GPU Architectures

Omnivore: An Optimizer for Multi-device Deep Learning on CPUs and GPUs

Omniwise: Predicting GPU Kernels Performance with LLMs

OMP2HMPP: Compiler Framework for Energy-Performance Trade-off Analysis of Automatically Generated Codes

OMP2HMPP: HMPP Source Code Generation from Programs with Pragma Extensions

On a Simplified Approach to Achieve Parallel Performance and Portability Across CPU and GPU Architectures

On algorithmic reductions in task-parallel programming models

On Benchmarking the Matrix Multiplication Algorithm using OpenMP, MPI and CUDA Programming Languages

On Binaural Spatialization and the Use of GPGPU for Audio Processing

On continuous maximum flow image segmentation algorithm

On CUDA implementation of a multichannel room impulse response reshaping algorithm based on p-norm optimization

On Demand Solid Texture Synthesis Using Deep 3D Networks

On Development, Feasibility, and Limits of Highly Efficient CPU and GPU Programs in Several Fields

On Dynamic Load Balancing on Graphics Processors

On Efficient GPGPU Computing for Integrated Heterogeneous CPU-GPU Microprocessors

On Expressing Different Concurrency Paradigms on Virtual Execution Systems

On Expressing Different Concurrency Paradigms on Virtual Execution Systems (thesis)

On GPU Fourier Transformations

On GPU-Accelerated Fast Direct Solvers and Their Applications in Image Denoising

On GPU’s viability as a middleware accelerator

On Graphs, GPUs, and Blind Dating: A Workload to Processor Matchmaking Quest

On learning optimized reaction diffusion processes for effective image restoration

On Leveraging GPUs for Security: discussing k-anonymity and pattern matching

On Longest Repeat Queries Using GPU

On Migration and Consolidation of VMs in Hybrid CPU-GPU Environments

On modelling of anisotropic viscoelasticity for soft tissue simulation: numerical solution and GPU execution

On optimization of finite-difference time-domain (FDTD) computation on heterogeneous and GPU clusters

On optimization techniques for the matrix multiplication on hybrid CPU+GPU platforms

On Optimizing Complex Stencils on GPUs

On Parallel Software Verification using Boolean Equation Systems

On Password Guessing with GPUs and FPGAs

On Performance of GPU and DSP Architectures for Computationally Intensive Applications

On Pre-Trained Image Features and Synthetic Images for Deep Learning

On Reinforcement Learning for Full-length Game of StarCraft

On Runtime Systems for Task-based Programming on Heterogeneous Platforms

On Scheduling Ring-All-Reduce Learning Jobs in Multi-Tenant GPU Clusters with Communication Contention

On Simplifying and Optimizing Programs for Heterogeneous Computing Systems

On sorting and load balancing on GPUs

On Static Timing Analysis of GPU Kernels

On testing GPU memory for hard and soft errors

On the Accelerating of Two-dimensional Smart Laplacian Smoothing on the GPU

On the accuracy and performance of the lattice Boltzmann method with 64-bit, 32-bit and novel 16-bit number formats

On the Characterization of OpenCL Dwarfs on Fixed and Reconfigurable Platforms

On the Choice of Tensor Estimation for Corner Detection, Optical Flow and Denoising

