Papers on hgpu.org (.txt-file)
Omniwise: Predicting GPU Kernels Performance with LLMs

OMP2HMPP: Compiler Framework for Energy-Performance Trade-off Analysis of Automatically Generated Codes

OMP2HMPP: HMPP Source Code Generation from Programs with Pragma Extensions

On a Simplified Approach to Achieve Parallel Performance and Portability Across CPU and GPU Architectures

On algorithmic reductions in task-parallel programming models

On Benchmarking the Matrix Multiplication Algorithm using OpenMP, MPI and CUDA Programming Languages

On Binaural Spatialization and the Use of GPGPU for Audio Processing

On continuous maximum flow image segmentation algorithm

On CUDA implementation of a multichannel room impulse response reshaping algorithm based on p-norm optimization

On Demand Solid Texture Synthesis Using Deep 3D Networks

On Development, Feasibility, and Limits of Highly Efficient CPU and GPU Programs in Several Fields

On Dynamic Load Balancing on Graphics Processors

On Efficient GPGPU Computing for Integrated Heterogeneous CPU-GPU Microprocessors

On Expressing Different Concurrency Paradigms on Virtual Execution Systems
On Expressing Different Concurrency Paradigms on Virtual Execution Systems (thesis)

On GPU Fourier Transformations

On GPU-Accelerated Fast Direct Solvers and Their Applications in Image Denoising

On GPU’s viability as a middleware accelerator

On Graphs, GPUs, and Blind Dating: A Workload to Processor Matchmaking Quest

On learning optimized reaction diffusion processes for effective image restoration

On Leveraging GPUs for Security: discussing k-anonymity and pattern matching

On Longest Repeat Queries Using GPU

On Migration and Consolidation of VMs in Hybrid CPU-GPU Environments

On modelling of anisotropic viscoelasticity for soft tissue simulation: numerical solution and GPU execution

On optimization of finite-difference time-domain (FDTD) computation on heterogeneous and GPU clusters

On optimization techniques for the matrix multiplication on hybrid CPU+GPU platforms

On Optimizing Complex Stencils on GPUs

On Parallel Software Verification using Boolean Equation Systems

On Password Guessing with GPUs and FPGAs

On Performance of GPU and DSP Architectures for Computationally Intensive Applications

On Pre-Trained Image Features and Synthetic Images for Deep Learning

On Reinforcement Learning for Full-length Game of StarCraft

On Runtime Systems for Task-based Programming on Heterogeneous Platforms

On Scheduling Ring-All-Reduce Learning Jobs in Multi-Tenant GPU Clusters with Communication Contention

On Simplifying and Optimizing Programs for Heterogeneous Computing Systems

On sorting and load balancing on GPUs

On Static Timing Analysis of GPU Kernels

On testing GPU memory for hard and soft errors

On the Accelerating of Two-dimensional Smart Laplacian Smoothing on the GPU

On the accuracy and performance of the lattice Boltzmann method with 64-bit, 32-bit and novel 16-bit number formats

On the Characterization of OpenCL Dwarfs on Fixed and Reconfigurable Platforms

On the Choice of Tensor Estimation for Corner Detection, Optical Flow and Denoising

On the Compilation Performance of Current SYCL Implementations

On the Correctness of the SIMT Execution Model of GPUs

On the Cryptanalysis of Public-Key Cryptography

On the design of architecture-aware algorithms for emerging applications

On the design of sparse hybrid linear solvers for modern parallel architectures

On the Development and Implementation of High-Order Flux Reconstruction Schemes for Computational Fluid Dynamics

On the Effect of Using Multiple GPUs in Solving QAPs with CUDA

On the Effectiveness of OpenMP teams for Programming Embedded Manycore Accelerators

On the Efficacy of a Fused CPU+GPU Processor (or APU) for Parallel Computing

On the Efficacy of GPU-Integrated MPI for Scientific Applications

On the Efficiency of CPU and Hybrid CPU-GPU Systems in Computational Biology Tasks

On the efficiency of iterative ordered subset reconstruction algorithms for acceleration on GPUs

On the energy efficiency of graphics processing units for scientific computing

On the evaluation of matrix polynomials using several GPGPUs

On the Fly Porn Video Blocking Using Distributed Multi-GPU and Data Mining Approach

On the GPGPU parallelization issues of finite element approximate inverse preconditioning

On the limits of GPU acceleration

On the numerical sensitivity of computer simulations on hybrid and parallel computing systems

On the numerical solution of chaotic dynamical systems using extend precision floating point arithmetic and very high order numerical methods

On the origin of yet another channel

On the Parallelization of Integer Polynomial Multiplication

On the Partitioning of GPU Power among Multi-Instances

On the Performance and Energy-efficiency of Multi-core SIMD CPUs and CUDA-enabled GPUs

On the performance of a highly-scalable Computational Fluid Dynamics code on AMD, ARM and Intel processors

On the performance of GPU public-key cryptography

On the Performance Portability of Structured Grid Codes on Many-Core Computer Architectures

On the Portability of CPU-Accelerated Applications via Automated Source-to-Source Translation

On the Portability of GPU-Accelerated Applications via Automated Source-to-Source Translation

On the Portability of the OpenCL Dwarfs on Fixed and Reconfigurable Parallel Platforms

On the Programmability and Performance of Heterogeneous Platforms

On the programmability of multi-GPU computing systems

On the Relation between Anisotropic Diffusion and Iterated Adaptive Filtering

On the Representation of Partially Specified Implementations and its Application to the Optimization of Linear Algebra Kernels on GPU

On the Robust Mapping of Dynamic Programming onto a Graphics Processing Unit

On the Simulations of Evolution-Communication P Systems with Energy without Antiport Rules for GPUs

On the technology roadmap of Free-Viewpoint 3DTV receivers

On the Three P’s of Parallel Programming for Heterogeneous Computing: Performance, Productivity, and Portability

On the type of the temperature phase transition in phi-4 model

On the Usage of GPUs for Efficient Motion Estimation in Medical Image Sequences

On the Use of a GPU-Accelerated Mobile Device Processor for Sound Source Localization

On the Use of an Algebraic Language Interface for Waveform Definition

On the use of deep Boltzmann machines for road signs classification

On the Use of GPUs in Realizing Cost-Effective Distributed RAID

On the Use of Graphic Processing Units for the Efficient Implementation of MIMO Detectors

On the Use of Graphics Processing Units (GPUs) for Molecular Dynamics Simulation of Spherical Particles

On the Use of Remote GPUs and Low-Power Processors for the Acceleration of Scientific Applications

On the Use of Small 2D Convolutions on GPUs

On the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods

On the Validation and Applications of a Parallel Flexible Multi-Body Dynamics Implementation

On the Visualization of Social and other Scale-Free Networks

On the Way to Future’s High Energy Particle Physics Transport Code

On Using GPU to Compute Options and Derivatives

On Vectorization of Deep Convolutional Neural Networks for Vision Tasks

On-Demand Generating and Scheduling Optimised Parallel Applications on Heterogeneous Platforms

On-Demand Source Code Generation & Scheduling Optimised Parallel Applications on Heterogeneous Platforms

On-line free-viewpoint video: From single to multiple view rendering

Titles: 100
open PDFs: 96
packages: 16
