high performance computing on graphics processing units: hgpu.org

Papers on hgpu.org (.txt-file)

Performance Evaluation of Python Parallel Programming Models: Charm4Py and mpi4py

Performance Evaluation of Query Processing Algorithms on GPGPUs

Performance Evaluation of Quicksort with GPU Dynamic Parallelism for Gene-Expression Quantile Normalization

Performance Evaluation of R with Intel Xeon Phi Coprocessor

Performance Evaluation of Sparse Matrix Multiplication Kernels on Intel Xeon Phi

Performance Evaluation of the Intel Many Integrated Core Architecture for 3D Image Reconstruction in Computed Tomography

Performance evaluation of the multi-device OpenCL FDTD solver

Performance Evaluation of the NVIDIA GeForce 8800 GTX GPU for Machine Learning

Performance Evaluation of the Ocean-Land-Atmosphere Model Using Graphics Processing Units

Performance Evaluations of Document-Oriented Databases using GPU and Cache Structure

Performance Evaluations of Graph Database using CUDA and OpenMP-Compatible Libraries

Performance Exploration of Selected Manually and Automatically Parallelized Codes on GPUs

Performance Gains in Conjugate Gradient Computation with Linearly Connected GPU Multiprocessors

Performance Impact of Data Layout on the GPU-accelerated IDW Interpolation

Performance impact of dynamic parallelism on different clustering algorithms

Performance Impact of Memory Channels on Sparse and Irregular Algorithms

Performance Improvement of Data Mining in Weka through GPU Acceleration

Performance Improvement of Multichannel Audio by Graphics Processing Units

Performance Improvement of Optical Algorithms on Multicore Platforms

Performance Improvement of TOUGH2 Simulation with Graphics Processing Unit

Performance improvements for iterative electron tomography reconstruction using graphics processing units (GPUs)

Performance improvements of real-time crowd simulations

Performance in GPU Architectures: Potentials and Distances

Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs

Performance Modeling and Evaluation of Distributed Deep Learning Frameworks on GPUs

Performance modeling of atomic additions on GPU scratchpad memory

Performance Modeling, Optimization, and Characterization on Heterogeneous Architectures

Performance Modelling and Traffic Characterisation of Optical Networks

Performance Modelling of Deep Learning on Intel Many Integrated Core Architectures

Performance models for CPU-GPU data transfers

Performance models for CUDA streams on NVIDIA GeForce series

Performance Models for Heterogeneous Iterative Programs

Performance Monitoring of Multi-FPGA Systems

Performance of a code migration for the simulation of supersonic ejector flow to SMP, MIC and GPU using OpenMP, OpenMP+LEO, and OpenACC directives

Performance of a GPU-based Direct Summation Algorithm for Computation of Small Angle Scattering Profile

Performance of Confidential Computing GPUs

Performance of CPU and GPU HPC Architectures for off-design aircraft simulation

Performance of FORTRAN and C GPU Extensions for a Benchmark Suite of Fourier Pseudospectral Algorithms

Performance of GPU for Pricing Financial Derivatives: Convertible Bonds

Performance of GTX Titan X GPUs and Code Optimization

Performance of Implicit Solver Strategies on GPUs

Performance of inverse atomistic scale fracture modeling on GPGPU architectures

Performance of Kepler GTX Titan GPUs and Xeon Phi System

Performance of OpenCL

Performance of Optical Flow Techniques on Graphics Hardware

Performance of PETSc GPU Implementation with Sparse Matrix Storage Schemes

Performance Optimisation of Smoothed Particle Hydrodynamics Algorithms for Multi/Many-Core Architectures

Performance Optimisations for Heterogeneous Managed Runtime Systems

Performance Optimization of 3-D Lattice Boltzmann Flow Solver on a GPU

Performance Optimization of Clustering On GPU

Performance Optimization of Deep Learning Sparse Matrix Kernels on Intel Max Series GPU

Performance Optimization of GPU ELF-Codes

Performance Optimization of Memory Intensive Applications on FPGA Accelerator

Performance Optimization of Vision Apps on Mobile Application Processor

Performance Optimization using Multimodal Modeling and Heterogeneous GNN

Performance Optimization Using Partitioned SpMV on GPUs and Multicore CPUs

Performance optimizations for scalable CFD applications on hybrid CPU+MIC heterogeneous computing system with millions of cores

Performance portability analysis of SYCL with a classical CG on CPU, GPU, and FPGA

Performance Portability and Evaluation of Heterogeneous Components of SeisSol Targeted to Upcoming Intel HPC GPUs

Performance Portability Challenges for Fortran Applications

Performance Portability Evaluation for OpenACC on Intel Knights Corner and Nvidia Kepler

Performance portability evaluation of blocked stencil computations on GPUs

Performance Portability in Accelerated Parallel Kernels

Performance Portability of a GPU Enabled Factorization with the DAGuE Framework

Performance Portability of the Aeras Atmosphere Model to Next Generation Architectures using Kokkos

Performance Portability Strategies for Computational Fluid Dynamics (CFD) Applications on HPC Systems

Performance portability study of epistasis detection using SYCL on NVIDIA GPU

Performance Portability Study of Linear Algebra Kernels in OpenCL

Performance portability through machine learning guided kernel selection in SYCL libraries

Performance portability via C++ PSTL, SYCL, OpenMP, and HIP: the Gaia AVU-GSR case study

Performance Portability with the Chapel Language

Performance Portable GPU Code Generation for Matrix Multiplication

Performance Portable Gradient Computations Using Source Transformation

Performance Portable Monte Carlo Particle Transport on Intel, NVIDIA, and AMD GPUs

Performance potential for simulating spin models on GPU

Performance prediction of deep learning applications training in GPU as a service systems

Performance Predictions for General-Purpose Computation on GPUs

Performance study of filtered back-projection algorithms implemented on GPUs

Performance study of interference on GPU and CPU resources with multiple applications

Performance Study of LU Decomposition on the Programmable GPU

Performance study of mapping irregular computations on GPUs

Performance Study of Satellite Image Processing on Graphics Processors Unit Using CUDA

Performance study of using the Direct Compute API for implementing Support vector machines on GPUs

Performance study on GPU offloading techniques using the Gauss matrix inverse algorithm

Performance Testing of GPU-Based Approximate Matching Algorithm on Network Traffic

Performance Tradeoff Spectrum of Integer and Floating Point Applications

Performance Tradeoff Spectrum of Integer and Floating Point Applications Kernels on Various GPUs

Performance Traps in OpenCL for CPUs

Performance Tuning for CUDA-Accelerated Neighborhood Denoising Filters

Performance Tuning for GPU-Embedded Systems: Machine-Learning-based and Analytical Model-driven Tuning Methodologies

Performance Upper Bound Analysis and Optimization of SGEMM on Fermi and Kepler GPUs

Performance-Analysis-Based Acceleration of Image Quality Assessment

Performance-aware component composition for GPU-based systems

Performance-Correctness Challenges in Emerging Heterogeneous Multicore Processors

Performance-efficient mechanisms for managing irregularity in throughput processors

Performance-Oriented Neural Architecture Search

Performance-Portable Many-Core Plasma Simulations: Porting PIConGPU to OpenPower and Beyond

Performance/power assessment of CNN packages on embedded automotive platforms

Performant Automatic BLAS Offloading on Unified Memory Architecture with OpenMP First-Touch Style Data Movement

Performant low-order matrix-free finite element kernels on GPU architectures

Brief statistics for this page

Titles: 100

Download open PDFs: 94

Package packages: 18

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

Analyzing the Impact of Kernel Fusion on GPU Tensor Operation Performance: A Systematic Performance Study

IntelliKit: Agent-first tooling for AMD hardware

Kerncap: Automated Kernel Extraction and Isolation for AMD GPUs

DITRON: Distributed Compiler based on Triton for Parallel Systems

DITRON: Distributed Multi-level Tiling Compiler for Parallel Tensor Programs

* * *

Recent source codes

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

Probe-and-Refine Tuning of Repository Guidance for AI Coding Agents

CUDAnalyst (CUDA + Analyst)

CodegenBench

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

IntelliKit: Agent-first tooling for AMD hardware

DITRON: Distributed Compiler based on Triton for Parallel Systems

Most viewed papers (last 30 days)