Papers on hgpu.org
On the Compilation Performance of Current SYCL Implementations

On the Correctness of the SIMT Execution Model of GPUs

On the Cryptanalysis of Public-Key Cryptography

On the design of architecture-aware algorithms for emerging applications

On the design of sparse hybrid linear solvers for modern parallel architectures

On the Development and Implementation of High-Order Flux Reconstruction Schemes for Computational Fluid Dynamics

On the Effect of Using Multiple GPUs in Solving QAPs with CUDA

On the Effectiveness of OpenMP teams for Programming Embedded Manycore Accelerators

On the Efficacy of a Fused CPU+GPU Processor (or APU) for Parallel Computing

On the Efficacy of GPU-Integrated MPI for Scientific Applications

On the Efficiency of CPU and Hybrid CPU-GPU Systems in Computational Biology Tasks

On the efficiency of iterative ordered subset reconstruction algorithms for acceleration on GPUs

On the energy efficiency of graphics processing units for scientific computing

On the evaluation of matrix polynomials using several GPGPUs

On the Fly Porn Video Blocking Using Distributed Multi-GPU and Data Mining Approach

On the GPGPU parallelization issues of finite element approximate inverse preconditioning

On the limits of GPU acceleration

On the numerical sensitivity of computer simulations on hybrid and parallel computing systems

On the numerical solution of chaotic dynamical systems using extend precision floating point arithmetic and very high order numerical methods

On the origin of yet another channel

On the Parallelization of Integer Polynomial Multiplication

On the Partitioning of GPU Power among Multi-Instances

On the Performance and Energy-efficiency of Multi-core SIMD CPUs and CUDA-enabled GPUs

On the performance of a highly-scalable Computational Fluid Dynamics code on AMD, ARM and Intel processors

On the performance of GPU public-key cryptography

On the Performance Portability of Structured Grid Codes on Many-Core Computer Architectures

On the Portability of GPU-Accelerated Applications via Automated Source-to-Source Translation

On the Portability of GPU-Accelerated Applications via Automated Source-to-Source Translation

On the Portability of the OpenCL Dwarfs on Fixed and Reconfigurable Parallel Platforms

On the Programmability and Performance of Heterogeneous Platforms

On the programmability of multi-GPU computing systems

On the Relation between Anisotropic Diffusion and Iterated Adaptive Filtering

On the Representation of Partially Specified Implementations and its Application to the Optimization of Linear Algebra Kernels on GPU

On the Robust Mapping of Dynamic Programming onto a Graphics Processing Unit

On the Simulations of Evolution-Communication P Systems with Energy without Antiport Rules for GPUs

On the technology roadmap of Free-Viewpoint 3DTV receivers

On the Three P’s of Parallel Programming for Heterogeneous Computing: Performance, Productivity, and Portability

On the type of the temperature phase transition in phi-4 model

On the Usage of GPUs for Efficient Motion Estimation in Medical Image Sequences

On the Use of a GPU-Accelerated Mobile Device Processor for Sound Source Localization

On the Use of an Algebraic Language Interface for Waveform Definition

On the use of deep Boltzmann machines for road signs classification

On the Use of GPUs in Realizing Cost-Effective Distributed RAID

On the Use of Graphic Processing Units for the Efficient Implementation of MIMO Detectors

On the Use of Graphics Processing Units (GPUs) for Molecular Dynamics Simulation of Spherical Particles

On the Use of Remote GPUs and Low-Power Processors for the Acceleration of Scientific Applications

On the Use of Small 2D Convolutions on GPUs

On the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods

On the Validation and Applications of a Parallel Flexible Multi-Body Dynamics Implementation

On the Visualization of Social and other Scale-Free Networks

On the Way to Future’s High Energy Particle Physics Transport Code

On Using GPU to Compute Options and Derivatives

On Vectorization of Deep Convolutional Neural Networks for Vision Tasks

On-Demand Generating and Scheduling Optimised Parallel Applications on Heterogeneous Platforms

On-Demand Source Code Generation & Scheduling Optimised Parallel Applications on Heterogeneous Platforms

On-line free-viewpoint video: From single to multiple view rendering

On-the-Fly Computing on Many-Core Processors in Nuclear Applications

On-the-fly elimination of dynamic irregularities for GPU computing

On-the-fly Generation and Rendering of Infinite Cities on the GPU

On-The-Fly Parallel Data Shuffling for Graph Processing on OpenCL-based FPGAs

Oncilla: A GAS Runtime for Efficient Resource Allocation and Data Movement in Accelerated Clusters

One machine, one minute, three billion tetrahedra

One Stone Two Birds: Synchronization Relaxation and Redundancy Removal in GPU-CPU Translation

One weird trick for parallelizing convolutional neural networks

One-shot tuner for deep learning compilers

oneDNN Graph Compiler: A Hybrid Approach for High-Performance Deep Learning Compilation

Onesweep: A Faster Least Significant Digit Radix Sort for GPUs

Online Adaptive Code Generation and Tuning

Online Energy Optimization in GPUs: A Multi-Armed Bandit Approach

Online Performance Projection for Clusters with Heterogeneous GPUs

Online rapid prototyping of 3D objects using GPU-based 3D cloud computing: Application to 3D face modelling

Online video synthesis for removing occluding objects using multiple uncalibrated cameras via plane sweep algorithm

OP2: An Active Library Framework for Solving Unstructured Mesh-based Applications on Multi-Core and Many-Core Architectures

Opal: A Modular Framework for Optimizing Performance using Analytics and LLMs

Open Source Face Recognition API

Open SYCL on heterogeneous GPU systems: A case of study

Open-source FPGA-ML codesign for the MLPerf Tiny Benchmark

OpenABLext: An automatic code generation framework for agent-based simulations on CPU-GPU-FPGA heterogeneous platforms

OpenACC – First Experiences with Real-World Applications

OpenACC cache Directive: Opportunities and Optimizations

OpenACC Implementations Comparison

OpenACC offloading of the MFC compressible multiphase flow solver on AMD and NVIDIA GPUs

OpenACC-based GPU Acceleration of a 3-D Unstructured Discontinuous Galerkin Method

OpenCL – An effective programming model for data parallel computations at the Cell Broadband Engine

OpenCL + OpenSHMEM Hybrid Programming Model for the Adapteva Epiphany Architecture

OpenCL 2.0 for FPGAs using OCLAcc

OpenCL Accelerated Multi-GPU Cone-Beam Reconstruction

OpenCL Acceleration for TensorFlow

OpenCL Actors – Adding Data Parallelism to Actor-based Programming with CAF

OpenCL and parallel primitives for digital TV applications

OpenCL and the 13 Dwarfs: A Work in Progress

OpenCL API Extensions to achieve Multi-level Parallelism for Efficient Implementation of Strassen’s Matrix Multiplication on GPUs

OpenCL Based Digital Image Projection Acceleration

OpenCL Based High-Quality HEVC Motion Estimation on GPU

OpenCL based machine learning labeling of biomedical datasets

Titles: 100
open PDFs: 96
packages: 24
