Papers on hgpu.org (.txt-file)
Object Space Based Collision Detection for Cloth Simulation on the GPU
Object support for OpenMP-style programming of GPU clusters in Java
Object-oriented stream programming using aspects
Object-oriented stream programming using Aspects: a high-productivity programming paradigm for hybrid platforms
Objective-Driven Workload Allocation in Heterogeneous Computing Systems
Obsidian: GPU Kernel Programming in Haskell (thesis)
Obsidian: GPU Programming in Haskell
Obtaining a 35x Speedup in 2D Phase Unwrapping Using Commodity Graphics Processors
OCCA: A unified approach to multi-threading languages
Ocean wave simulation in real-time using GPU
Ocelot: a dynamic optimization framework for bulk-synchronous applications in heterogeneous systems
Ocelot/HyPE: Optimized Data Processing on Heterogeneous Hardware
OCLoptimizer: An Iterative Optimization Tool for OpenCL
OCT on CUDA: Speeding up the image reconstruction algorithm for an Optical Coherence Tomography system using NVIDIA’s CUDA platform
Octree Light Propagation Volumes
Odeint – Solving ordinary differential equations in C++
Odyssey: A Public GPU-Based Code for General-Relativistic Radiative Transfer in Kerr Spacetime
Off-axis quantitative phase imaging processing using CUDA: toward real-time applications
Offload Annotations: Bringing Heterogeneous Computing to Existing Libraries and Workloads
Offload Compiler Runtime for the Intel Xeon Phi Coprocessor
Offloading Critical Security Operations to the GPU
Offloading IDS Computation to the GPU
Offloading Java to Graphics Processors
Offloading Region Matching of Data Distribution Management with CUDA
Offset, Bisector and Medial Axis Construction on NURBS Surface Based on GPU
OKL: A Unified Language for Parallel Architectures
OMB-Py: Python Micro-Benchmarks for Evaluating Performance of MPI Libraries on HPC Systems
OmniDB: Towards Portable and Efficient Query Processing on Parallel CPU/GPU Architectures
Omnivore: An Optimizer for Multi-device Deep Learning on CPUs and GPUs
OMP2HMPP: Compiler Framework for Energy-Performance Trade-off Analysis of Automatically Generated Codes
OMP2HMPP: HMPP Source Code Generation from Programs with Pragma Extensions
On a Simplified Approach to Achieve Parallel Performance and Portability Across CPU and GPU Architectures
On algorithmic reductions in task-parallel programming models
On Benchmarking the Matrix Multiplication Algorithm using OpenMP, MPI and CUDA Programming Languages
On Binaural Spatialization and the Use of GPGPU for Audio Processing
On continuous maximum flow image segmentation algorithm
On CUDA implementation of a multichannel room impulse response reshaping algorithm based on p-norm optimization
On Demand Solid Texture Synthesis Using Deep 3D Networks
On Development, Feasibility, and Limits of Highly Efficient CPU and GPU Programs in Several Fields
On Dynamic Load Balancing on Graphics Processors
On Efficient GPGPU Computing for Integrated Heterogeneous CPU-GPU Microprocessors
On Expressing Different Concurrency Paradigms on Virtual Execution Systems
On Expressing Different Concurrency Paradigms on Virtual Execution Systems (thesis)
On GPU Fourier Transformations
On GPU-Accelerated Fast Direct Solvers and Their Applications in Image Denoising
On GPU’s viability as a middleware accelerator
On Graphs, GPUs, and Blind Dating: A Workload to Processor Matchmaking Quest
On learning optimized reaction diffusion processes for effective image restoration
On Leveraging GPUs for Security: discussing k-anonymity and pattern matching
On Longest Repeat Queries Using GPU
On Migration and Consolidation of VMs in Hybrid CPU-GPU Environments
On modelling of anisotropic viscoelasticity for soft tissue simulation: numerical solution and GPU execution
On optimization of finite-difference time-domain (FDTD) computation on heterogeneous and GPU clusters
On optimization techniques for the matrix multiplication on hybrid CPU+GPU platforms
On Optimizing Complex Stencils on GPUs
On Parallel Software Verification using Boolean Equation Systems
On Password Guessing with GPUs and FPGAs
On Performance of GPU and DSP Architectures for Computationally Intensive Applications
On Pre-Trained Image Features and Synthetic Images for Deep Learning
On Reinforcement Learning for Full-length Game of StarCraft
On Runtime Systems for Task-based Programming on Heterogeneous Platforms
On Scheduling Ring-All-Reduce Learning Jobs in Multi-Tenant GPU Clusters with Communication Contention
On Simplifying and Optimizing Programs for Heterogeneous Computing Systems
On sorting and load balancing on GPUs
On Static Timing Analysis of GPU Kernels
On testing GPU memory for hard and soft errors
On the Accelerating of Two-dimensional Smart Laplacian Smoothing on the GPU
On the accuracy and performance of the lattice Boltzmann method with 64-bit, 32-bit and novel 16-bit number formats
On the Characterization of OpenCL Dwarfs on Fixed and Reconfigurable Platforms
On the Choice of Tensor Estimation for Corner Detection, Optical Flow and Denoising
On the Compilation Performance of Current SYCL Implementations
On the Correctness of the SIMT Execution Model of GPUs
On the Cryptanalysis of Public-Key Cryptography
On the design of architecture-aware algorithms for emerging applications
On the design of sparse hybrid linear solvers for modern parallel architectures
On the Development and Implementation of High-Order Flux Reconstruction Schemes for Computational Fluid Dynamics
On the Effect of Using Multiple GPUs in Solving QAPs with CUDA
On the Effectiveness of OpenMP teams for Programming Embedded Manycore Accelerators
On the Efficacy of a Fused CPU+GPU Processor (or APU) for Parallel Computing
On the Efficacy of GPU-Integrated MPI for Scientific Applications
On the Efficiency of CPU and Hybrid CPU-GPU Systems in Computational Biology Tasks
On the efficiency of iterative ordered subset reconstruction algorithms for acceleration on GPUs
On the energy efficiency of graphics processing units for scientific computing
On the evaluation of matrix polynomials using several GPGPUs
On the Fly Porn Video Blocking Using Distributed Multi-GPU and Data Mining Approach
On the GPGPU parallelization issues of finite element approximate inverse preconditioning
On the limits of GPU acceleration
On the numerical sensitivity of computer simulations on hybrid and parallel computing systems
On the numerical solution of chaotic dynamical systems using extend precision floating point arithmetic and very high order numerical methods
On the origin of yet another channel
On the Parallelization of Integer Polynomial Multiplication
On the Performance and Energy-efficiency of Multi-core SIMD CPUs and CUDA-enabled GPUs
On the performance of a highly-scalable Computational Fluid Dynamics code on AMD, ARM and Intel processors
On the performance of GPU public-key cryptography
On the Performance Portability of Structured Grid Codes on Many-Core Computer Architectures
On the Portability of CPU-Accelerated Applications via Automated Source-to-Source Translation
Titles: 100
Open PDFs: 94
Packages: 19