high performance computing on graphics processing units: hgpu.org

Papers on hgpu.org (.txt-file)

Performance Analysis of Join Algorithms on GPUs

Performance Analysis of kNN on large datasets using CUDA & Pthreads

Performance analysis of matrix-free conjugate gradient kernels using SYCL

Performance analysis of memory transfers and GEMM subroutines on NVIDIA Tesla GPU cluster

Performance analysis of multi-core CPUs and GPU computing on SF-FDTD scheme for third order nonlinear materials and periodic media

Performance Analysis of Open Source Machine Learning Frameworks for Various Parameters in Single-Threaded and Multi-Threaded Modes

Performance analysis of parallel gravitational N-body codes on large GPU cluster

Performance Analysis of Parallel Sorting Algorithms using GPU Computing

Performance Analysis of Roberts Edge Detection Using CUDA and OpenGL

Performance analysis of single-phase, multiphase, and multicomponent lattice-Boltzmann fluid flow simulations on GPU clusters

Performance Analysis of Sobel Edge Detection Filter on GPU using CUDA & OpenGL

Performance Analysis of Sobel Edge Filter on Heterogeneous System Using OpenCL

Performance Analysis of Sparse Matrix-Vector Multiplication (SpMV) on Graphics Processing Units (GPUs)

Performance analysis of SSE instructions in multi-core CPUs and GPU computing on FDTD scheme for solid and fluid vibration problems

Performance Analysis of the OP2 Framework on Many-core Architectures

Performance Analysis on Energy Efficient High-Performance Architectures

Performance Analysis on Several GPU Architectures of an Algorithm for Noise Removal

Performance and accuracy of hardware-oriented native-, emulated- and mixed-precision solvers in FEM simulations

Performance and accuracy of hardware-oriented native-, emulated- and mixed-precision solvers in FEM simulations (Part 2: Double Precision GPUs)

Performance and accuracy of Lattice-Boltzmann kernels on multi- and manycore architectures

Performance and Efficiency Analysis of Modern Accelerators: Fine-Grained Parallelism on the Intel Xeon Phi

Performance and energy footprint assessment of FPGAs and GPUs on HPC systems using Astrophysics application

Performance and energy optimization of the iterative solution of sparse linear systems on multicore processors

Performance and numerical accuracy evaluation of heterogeneous multicore systems for Krylov orthogonal basis computation

Performance and Numerical Aspects of Decompositional Factorizations with FP64 Floating-Point Emulation in INT8

Performance and Portability of Accelerated Lattice Boltzmann Applications with OpenACC

Performance and Power Analysis of ATI GPU: A Statistical Approach

Performance and Power Comparisons Between Fermi and Cypress GPUs

Performance and Power Comparisons Between Nvidia and ATI GPUs

Performance and power consumption investigation for execution of integer operations on CPU and GPU processors for multimedia applications

Performance and Power Efficiency Analysis of the Symmetric Cryptograph on Two Stream Processor Architectures

Performance and Power Evaluation of AI Accelerators for Training Deep Learning Models

Performance and Power Optimization of GPU Architectures for General-purpose Computing

Performance and Productivity of Parallel Python Programming: A study with a CFD Test Case

Performance and Quality of Random Number Generators

Performance and scalability of Fourier domain optical coherence tomography acceleration using graphics processing units

Performance and Scalability of GPU-Based Convolutional Neural Networks

Performance Assessment of A Multi-block Incompressible Navier-Stokes Solver using Directive-based GPU Programming in a Cluster Environment

Performance assessment of CUDA and OpenACC in large scale combustion simulations

Performance Assessment of OpenMP Compilers Targeting NVIDIA V100 GPUs

Performance Assessment of using OpenCL on FPGA Systems for ODE Solvers

Performance Aware Convolutional Neural Network Channel Pruning for Embedded GPUs

Performance benchmarking of deep learning framework on Intel Xeon Phi

Performance Characterization and Optimization of Atomic Operations on AMD GPUs

Performance characterization of data-intensive kernels on AMD Fusion architectures

Performance Characterization of Multi-threaded Graph Processing Applications on Intel Many-Integrated-Core Architecture

Performance Comparison Between Cg-based and CUDA-based Matrix Multiplications

Performance Comparison for Neuroscience Application Benchmarks

Performance comparison of CFD-DEM solver MFiX-Exa, on GPUs and CPUs

Performance Comparison of Cholesky Decomposition on GPUs and FPGAs

Performance Comparison of Different OpenCL Implementations of LBM Simulation on Commodity Computer Hardware

Performance comparison of FPGA, GPU and CPU in image processing

Performance comparison of Gauss-Jordan elimination method using OpenMP and CUDA

Performance comparison of GPU and FPGA architectures for the SVM training problem

Performance Comparison of GPU, DSP and FPGA implementations of image processing and computer vision algorithms in embedded systems

Performance Comparison of GPUs with a Genetic Algorithm based on CUDA

Performance Comparison of Graphics Processors to Reconfigurable Logic: A Case Study

Performance comparison of Lattice Boltzmann fluid flow simulation using OpenCL and CUDA frameworks

Performance comparison of single-precision SPICE Model-Evaluation on FPGA, GPU, Cell, and multi-core processors

Performance Comparison with OpenMP Parallelization for Multi-core Systems

Performance Considerations When Using a Dedicated Ray Traversal Engine

Performance Counters based Power Modeling of Mobile GPUs using Deep Learning

Performance Debugging Frameworks for FPGA High-Level Synthesis

Performance Debugging of GPGPU Applications with the Divergence Map

Performance Degradation Analysis of GPU Kernels

Performance Drawbacks for Matrix Multiplication using Set Associative Cache in GPU devices

Performance Efficient DNA Sequence Detection on GPU Using Parallel Pattern Matching Approach

Performance Engineering for a Medical Imaging Application on the Intel Xeon Phi Accelerator

Performance Engineering for a Tall & Skinny Matrix Multiplication Kernel on GPUs

Performance engineering for the Lattice Boltzmann method on GPGPUs: Architectural requirements and performance results

Performance Engineering of the Kernel Polynomial Method on Large-Scale CPU-GPU Systems

Performance enhancement of MAGIC FDTD-PIC plasma-wave simulations using GPU processing

Performance Evaluation and Analysis of Sparse Matrix and Graph Kernels on Heterogeneous Processors

Performance Evaluation and Improvements of the PoCL Open-Source OpenCL Implementation on Intel CPUs

Performance Evaluation and Optimization of HPCG benchmark on CPU + MIC platform

Performance evaluation and optimization of random memory access on multicores with high productivity

Performance Evaluation and Tuning of An OpenCL based Matrix Multiplier

Performance Evaluation of Advanced Features in CUDA Unified Memory

Performance Evaluation of Blocking and NonBlocking Concurrent Queues on GPUs

Performance Evaluation of Concurrent Lock-free Data Structures on GPUs

Performance Evaluation of Container-based Virtualization for High Performance Computing Environments

Performance Evaluation of CPU-GPU communication Depending on the Characteristic of Co-Located Workloads

Performance evaluation of CUDA programming for machining simulation

Performance evaluation of deep learning on smartphones

Performance Evaluation of Deep Learning Tools in Docker Containers

Performance Evaluation of Discrete Wavelet Transform Based on Image Compression Technique on Both CPU and GPU

Performance Evaluation of Edge Detection Techniques on GPU Using OpenCL

Performance Evaluation of Feature Extraction Algorithm on GPGPU

Performance evaluation of GPU memory hierarchy using the FFT

Performance evaluation of H.264/AVC decoding and visualization using the GPU

Performance Evaluation of Heterogeneous GPU Programming Frameworks for Hemodynamic Simulations

Performance evaluation of image processing algorithms on the GPU

Performance Evaluation of Intel Xeon Phi Coprocessor using XKaapi

Performance Evaluation of Mixed Precision Algorithms for Solving Sparse Linear Systems

Performance Evaluation of OpenMP’s Target Construct on GPUs – Exploring Compiler Optimizations

Performance Evaluation of OpenMP’s Target Construct on GPUs: Exploring Compiler Optimizations

Performance Evaluation of Optimized Implementations of Finite Difference Method for Wave Propagation Problems on GPU Architecture

Performance Evaluation of Parallel AES Implementations over CUDA GPU Framework

Performance Evaluation of Parallel Count Sort using GPU Computing with CUDA

Performance Evaluation of Particle Swarm Optimization Algorithms on GPU Using CUDA

Brief statistics for this page

Titles: 100

Download open PDFs: 92

Package packages: 10

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

Analyzing the Impact of Kernel Fusion on GPU Tensor Operation Performance: A Systematic Performance Study

IntelliKit: Agent-first tooling for AMD hardware

Kerncap: Automated Kernel Extraction and Isolation for AMD GPUs

DITRON: Distributed Compiler based on Triton for Parallel Systems

DITRON: Distributed Multi-level Tiling Compiler for Parallel Tensor Programs


