high performance computing on graphics processing units: hgpu.org

Papers on hgpu.org (.txt-file)

Patient-Specific Non-Linear Finite Element Modelling for Predicting Soft Organ Deformation in Real-Time; Application to Non-Rigid Neuroimage Registration

Pattern Matching in OpenCL: GPU vs CPU Energy Consumption on Two Mobile Chipsets

Pattern Recognition with Embedded Systems Technology: A Survey

Pattern Recognition with OpenCL Heterogeneous Platform

Pattern-based Programming Abstractions for Heterogeneous Parallel Computing

Patterns and Rewrite Rules for Systematic Code Generation (From High-Level Functional Patterns to High-Performance OpenCL Code)

Patterns of Inefficient Performance Behavior in GPU Applications

PATUS: A Code Generation and Auto-Tuning Framework For Parallel Stencil Computations

PATUS: A Code Generation and Autotuning Framework For Parallel Iterative Stencil Computations on Modern Microarchitectures

PCIeHLS: an OpenCL HLS framework

PConG: A novel platform available for pervasive computing based on GPU

PDAWL: Profile-based Iterative Dynamic Adaptive WorkLoad Balance on Heterogeneous Architectures

PEAK: A Performance Engineering AI-Assistant for GPU Kernels Powered by Natural Language Transformations

Pedestrian Detection at Warp Speed: Exceeding 500 Detections per Second

Pedestrian detection system based on stereo vision for mobile robot

Pegasus: coordinated scheduling for virtualized accelerator-based systems

PENCIL: A Platform-Neutral Compute Intermediate Language for Accelerator Programming

People detection method using graphics processing units for a mobile robot with an omnidirectional camera

PEPPHER: Efficient and Productive Usage of Hybrid Computing Systems

PEPSC: A Power-Efficient Processor for Scientific Computing

PErasure: a Parallel Cauchy Reed-Solomon Coding Library for GPUs

Perception of Acoustical Spatial Attributes and Impression in Virtually Rendered Sound Field

Perception-aware Depth Cueing for Illustrative Vascular Visualization

Perceptual enhancement of two-level volume rendering

Perceptually Optimized Real-Time Computer Graphics

PERCH 2.0: Fast and Accurate GPU-based Perception via Search for Object Pose Estimation

Percolation study of samples on 2D lattices using GPUs

perf4sight: A toolflow to model CNN training performance on Edge GPUs

Perfect Hashing Structures for Parallel Similarity Searches

Perfect spatial hashing

PerforatedCNNs: Acceleration through Elimination of Redundant Convolutions

Performance Acceleration of Kernel Polynomial Method Applying Graphics Processing Units

Performance Analysis and Automatic Tuning of Hash Aggregation on GPUs

Performance Analysis and Benchmarking of the Intel SCC

Performance Analysis and Efficient Execution on Systems with multi-core CPUs, GPUs and MICs

Performance Analysis and Improvement of Parallel Differential Evolution

Performance Analysis and Optimisation of the OP2 Framework on Many-core Architectures

Performance analysis and optimization of a CFD application

Performance Analysis and Optimization of a Distributed Processing Framework for Data Mining Accelerated with Graphics Processing Units

Performance Analysis and Optimization of Hermite Methods on NVIDIA GPUs Using CUDA

Performance analysis and optimization of highly diverging algorithms on GPUs

Performance analysis and optimization of the OP2 framework on many-core architectures

Performance analysis and optimization of three-dimensional FDTD on GPU using roofline model

Performance Analysis and Optimization Opportunities for NVIDIA Automotive GPUs

Performance analysis and optimization strategies for a D3Q19 lattice Boltzmann kernel on nVIDIA GPUs using CUDA

Performance Analysis and Tuning For: General-Purpose Graphics Processing Units (GPGPU)

Performance Analysis Cluster and GPU Computing Environment on Molecular Dynamic Simulation of BRV-1 and REM2 with GROMACS

Performance Analysis for GPU-based Ray-triangle Algorithms

Performance analysis of a 240 thread tournament level MCTS Go program on the Intel Xeon Phi

Performance Analysis of a High-level Abstractions-based Hydrocode on Future Computing Systems

Performance Analysis of a Hybrid MPI/CUDA Implementation of the NAS-LU Benchmark

Performance analysis of a hybrid MPI/CUDA implementation of the NASLU benchmark

Performance Analysis of a Large Memory Application on Multiple Architectures

Performance Analysis of a New Real-Time Elastographic Time Constant Estimator

Performance Analysis of a Novel GPU Computation-to-core Mapping Scheme for Robust Facet Image Modeling

Performance Analysis of a Particle-in-Cell Plasma Physics Code on Homogeneous and Heterogeneous HPC Systems

Performance Analysis of a Stereo Matching Implementation in OpenCL

Performance Analysis of a Symmetric Cryptographic Algorithm on Multicore Architectures

Performance Analysis of a Symmetric Cryptography Algorithm on GPU and GPU Cluster

Performance analysis of accelerated image registration using GPGPU

Performance Analysis of an Astrophysical Simulation Code on the Intel Xeon Phi Architecture

Performance Analysis of an Ultrasound Reconstruction Algorithm for Non Destructive Testing

Performance Analysis of CUDA and OpenCL By Implementation of Cryptographic Algorithms

Performance Analysis of Deep Learning Workloads on Leading-edge Systems

Performance Analysis of General-Purpose Computation on Commodity Graphics Hardware: A Case Study Using Bioinformatics

Performance analysis of GPGPU and CPU On AES Encryption

Performance Analysis of GPU Accelerators with Realizable Utilization of Computational Density

Performance Analysis of GPU compared to Single-core and Multi-core CPU for Natural Language Applications

Performance Analysis of GPU-Accelerated Filter-Based Source Finding for HI Spectral Line Image Data

Performance Analysis of GPU-based SAR and Interferometric SAR image processing

Performance Analysis of IBM Cell Broadband Engine on Sequence Alignment

Brief statistics for this page

Titles: 100

Open PDFs: 86

Packages: 9

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

Analyzing the Impact of Kernel Fusion on GPU Tensor Operation Performance: A Systematic Performance Study

IntelliKit: Agent-first tooling for AMD hardware

Kerncap: Automated Kernel Extraction and Isolation for AMD GPUs

DITRON: Distributed Compiler based on Triton for Parallel Systems

DITRON: Distributed Multi-level Tiling Compiler for Parallel Tensor Programs

* * *

Recent source codes

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

Probe-and-Refine Tuning of Repository Guidance for AI Coding Agents

CUDAnalyst (CUDA + Analyst)

CodegenBench

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

IntelliKit: Agent-first tooling for AMD hardware

DITRON: Distributed Compiler based on Triton for Parallel Systems
