Papers on hgpu.org (.txt-file)
On the design of architecture-aware algorithms for emerging applications
On the design of sparse hybrid linear solvers for modern parallel architectures
On the Development and Implementation of High-Order Flux Reconstruction Schemes for Computational Fluid Dynamics
On the Effect of Using Multiple GPUs in Solving QAPs with CUDA
On the Effectiveness of OpenMP teams for Programming Embedded Manycore Accelerators
On the Efficacy of a Fused CPU+GPU Processor (or APU) for Parallel Computing
On the Efficacy of GPU-Integrated MPI for Scientific Applications
On the Efficiency of CPU and Hybrid CPU-GPU Systems in Computational Biology Tasks
On the efficiency of iterative ordered subset reconstruction algorithms for acceleration on GPUs
On the energy efficiency of graphics processing units for scientific computing
On the evaluation of matrix polynomials using several GPGPUs
On the Fly Porn Video Blocking Using Distributed Multi-GPU and Data Mining Approach
On the GPGPU parallelization issues of finite element approximate inverse preconditioning
On the limits of GPU acceleration
On the numerical sensitivity of computer simulations on hybrid and parallel computing systems
On the numerical solution of chaotic dynamical systems using extend precision floating point arithmetic and very high order numerical methods
On the origin of yet another channel
On the Parallelization of Integer Polynomial Multiplication
On the Partitioning of GPU Power among Multi-Instances
On the Performance and Energy-efficiency of Multi-core SIMD CPUs and CUDA-enabled GPUs
On the performance of a highly-scalable Computational Fluid Dynamics code on AMD, ARM and Intel processors
On the performance of GPU public-key cryptography
On the Performance Portability of Structured Grid Codes on Many-Core Computer Architectures
On the Portability of CPU-Accelerated Applications via Automated Source-to-Source Translation
On the Portability of GPU-Accelerated Applications via Automated Source-to-Source Translation
On the Portability of the OpenCL Dwarfs on Fixed and Reconfigurable Parallel Platforms
On the Programmability and Performance of Heterogeneous Platforms
On the programmability of multi-GPU computing systems
On the Relation between Anisotropic Diffusion and Iterated Adaptive Filtering
On the Representation of Partially Specified Implementations and its Application to the Optimization of Linear Algebra Kernels on GPU
On the Robust Mapping of Dynamic Programming onto a Graphics Processing Unit
On the Simulations of Evolution-Communication P Systems with Energy without Antiport Rules for GPUs
On the technology roadmap of Free-Viewpoint 3DTV receivers
On the Three P’s of Parallel Programming for Heterogeneous Computing: Performance, Productivity, and Portability
On the type of the temperature phase transition in phi-4 model
On the Usage of GPUs for Efficient Motion Estimation in Medical Image Sequences
On the Use of a GPU-Accelerated Mobile Device Processor for Sound Source Localization
On the Use of an Algebraic Language Interface for Waveform Definition
On the use of deep Boltzmann machines for road signs classification
On the Use of GPUs in Realizing Cost-Effective Distributed RAID
On the Use of Graphic Processing Units for the Efficient Implementation of MIMO Detectors
On the Use of Graphics Processing Units (GPUs) for Molecular Dynamics Simulation of Spherical Particles
On the Use of Remote GPUs and Low-Power Processors for the Acceleration of Scientific Applications
On the Use of Small 2D Convolutions on GPUs
On the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods
On the Validation and Applications of a Parallel Flexible Multi-Body Dynamics Implementation
On the Visualization of Social and other Scale-Free Networks
On the Way to Future’s High Energy Particle Physics Transport Code
On Using GPU to Compute Options and Derivatives
On Vectorization of Deep Convolutional Neural Networks for Vision Tasks
On-Demand Generating and Scheduling Optimised Parallel Applications on Heterogeneous Platforms
On-Demand Source Code Generation & Scheduling Optimised Parallel Applications on Heterogeneous Platforms
On-line free-viewpoint video: From single to multiple view rendering
On-the-Fly Computing on Many-Core Processors in Nuclear Applications
On-the-fly elimination of dynamic irregularities for GPU computing
On-the-fly Generation and Rendering of Infinite Cities on the GPU
On-The-Fly Parallel Data Shuffling for Graph Processing on OpenCL-based FPGAs
Oncilla: A GAS Runtime for Efficient Resource Allocation and Data Movement in Accelerated Clusters
One machine, one minute, three billion tetrahedra
One Stone Two Birds: Synchronization Relaxation and Redundancy Removal in GPU-CPU Translation
One weird trick for parallelizing convolutional neural networks
One-shot tuner for deep learning compilers
oneDNN Graph Compiler: A Hybrid Approach for High-Performance Deep Learning Compilation
Onesweep: A Faster Least Significant Digit Radix Sort for GPUs
Online Adaptive Code Generation and Tuning
Online Energy Optimization in GPUs: A Multi-Armed Bandit Approach
Online Performance Projection for Clusters with Heterogeneous GPUs
Online rapid prototyping of 3D objects using GPU-based 3D cloud computing: Application to 3D face modelling
Online video synthesis for removing occluding objects using multiple uncalibrated cameras via plane sweep algorithm
OP2: An Active Library Framework for Solving Unstructured Mesh-based Applications on Multi-Core and Many-Core Architectures
Open Source Face Recognition API
Open SYCL on heterogeneous GPU systems: A case of study
Open-source FPGA-ML codesign for the MLPerf Tiny Benchmark
OpenABLext: An automatic code generation framework for agent-based simulations on CPU-GPU-FPGA heterogeneous platforms
OpenACC – First Experiences with Real-World Applications
OpenACC cache Directive: Opportunities and Optimizations
OpenACC Implementations Comparison
OpenACC offloading of the MFC compressible multiphase flow solver on AMD and NVIDIA GPUs
OpenACC-based GPU Acceleration of a 3-D Unstructured Discontinuous Galerkin Method
OpenCL – An effective programming model for data parallel computations at the Cell Broadband Engine
OpenCL + OpenSHMEM Hybrid Programming Model for the Adapteva Epiphany Architecture
OpenCL 2.0 for FPGAs using OCLAcc
OpenCL Accelerated Multi-GPU Cone-Beam Reconstruction
OpenCL Acceleration for TensorFlow
OpenCL Actors – Adding Data Parallelism to Actor-based Programming with CAF
OpenCL and parallel primitives for digital TV applications
OpenCL and the 13 Dwarfs: A Work in Progress
OpenCL API Extensions to achieve Multi-level Parallelism for Efficient Implementation of Strassen’s Matrix Multiplication on GPUs
OpenCL Based Digital Image Projection Acceleration
OpenCL Based High-Quality HEVC Motion Estimation on GPU
OpenCL based machine learning labeling of biomedical datasets
OpenCL embedded profile prototype in mobile device
OpenCL Evaluation for Numerical Linear Algebra Library Development
Titles: 100
Open PDFs: 95
Packages: 24