high performance computing on graphics processing units: hgpu.org

Papers on hgpu.org (.txt-file)

Analysis and Optimization Techniques for Massively Parallel Processors

Analysis and Parameter Prediction of Compiler Transformation for Graphics Processors

Analysis and performance estimation of the conjugate gradient method on multiple GPUs

Analysis and Review of Sorting Algorithms

Analysis of 3-dimensional electromagnetic fields in dispersive media using cuda

Analysis of a Computational Biology Simulation Technique on Emerging Processing Architectures

Analysis of A Splitting Approach for the Parallel Solution of Linear Systems on GPU Cards

Analysis of Genetic Expression with Microarrays using GPU Implemented Algorithms

Analysis of GPGPU Platforms Efficiency in General-Purpose Computations

Analysis of GPU accelerated OpenCL applications on the Intel HD 4600 GPU

Analysis of GPU Parallel Computing based on Matlab

Analysis of GPU-based convolution for acoustic wave propagation modeling with finite differences: Fortran to CUDA-C step-by-step

Analysis of High Level implementations for Recursive Methods on GPUs

Analysis of illumination conditions at the lunar south pole using parallel computing techniques

Analysis of KECCAK Tree Hashing on GPU Architectures

Analysis of Metallic Nanostructures by a Discontinuous Galerkin Time-Domain Maxwell Solver on Graphics Processing Units

Analysis of Multicore CPU and GPU Toward Parallelization of Total Focusing Method Ultrasound Reconstruction

Analysis of Parallel Montgomery Multiplication in CUDA

Analysis of Parallel Sorting Algorithms on Heterogeneous Processors with OpenCL

Analysis of periodic anisotropic media by means of split-field FDTD method and GPU computing

Analysis of periodic structures with GPU accelerating

Analysis of Real-Time Stereo Vision Algorithms On GPU

Analysis of RSA algorithm using GPU programming

Analysis of Single Phase Fluid Flow and Heat Transfer in Slip Flow Regime by Parallel Implementation of Lattice Boltzmann Method on GPUs

Analysis of SuperLU Solvers on Intel MIC Architecture

Analysis of Surface Folding Patterns of DICCCOLS Using the GPU-Optimized Geodesic Field Estimate

Analysis of the Performance of the Fish School Search Algorithm Running in Graphic Processing Units

Analysis-Driven Design of Parallel Floating-Point Matrix Multiplication for Implementation in Reconfigurable Logic

Analysis-driven Engineering of Comparison-based Sorting Algorithms on GPUs

Analytic Anti-Aliasing of Linear Functions on Polytopes

Analytic Antialiasing for Selective High Fidelity Rendering

Analytic Visibility on the GPU

Analytical motion blur rasterization with compression

Analytical Performance Estimation during Code Generation on Modern GPUs

Analytical Study of Various High Performance Computing Paradigms

Analyzing and Improving the Performance of Spatial Database Processing

Analyzing CUDA workloads using a detailed GPU simulator

Analyzing CUDA’s Compiler through the Visualization of Decoded GPU Binaries

Analyzing GPU Performance in Virtualized Environments: A Case Study

Analyzing GPU Tensor Core Potential for Fast Reductions

Analyzing Locality of Memory References in GPU Architectures

Analyzing Memory Accesses for Performance and Correctness of Parallel Programs

Analyzing Modern NVIDIA GPU cores

Analyzing Optimization Techniques for Power Efficiency on Heterogeneous Platforms

Analyzing Password Strength and Efficient Password Cracking

Analyzing program flow within a many-kernel OpenCL application

Analyzing Resource Utilization in an HPC System: A Case Study of NERSC’s Perlmutter

Analyzing Soft-Error Vulnerability on GPGPU Microarchitecture

Analyzing the CUDA Applications with its Latency and Bandwidth Tolerance

Analyzing the Impact of Kernel Fusion on GPU Tensor Operation Performance: A Systematic Performance Study

Analyzing the Performance Portability of SYCL across CPUs, GPUs, and Hybrid Systems with Protein Database Search

Analyzing throughput of GPGPUs exploiting within-die core-to-core frequency variation

Analyzing Use of OpenCL on the Cell Broadband Engine and a Proposal for OpenCL Extensions

Anatomizing Deep Learning Inference in Web Browsers

Anatomy Of High-Performance Deep Learning Convolutions On SIMD Architectures

Anatomy of High-Performance GEMM with Online Fault Tolerance on GPUs

Anatomy of High-Performance Many-Threaded Matrix Multiplication

Android Malware Classification Using Parallelized Machine Learning Methods

ANGHABENCH: a Suite with One Million Compilable C Benchmarks for Code-Size Reduction

Animating physically based explosions in real-time

Animation of Orthogonal Texture Patterns for Vector Field Visualization

Anisotropic interfacial tension, contact angles, and line tensions: A graphics-processing-unit-based Monte Carlo study of the Ising model

Anisotropic Kuwahara Filtering on the GPU

Anisotropic mesh coarsening and refinement on GPU architecture

Anisotropic noise

AnnotationGym: A Generic Framework for Automatic Source Code Annotation

Anomalous behaviour detection using spatiotemporal oriented energies, subset inclusion histogram comparison and event-driven processing

Anomalous metastability in a temperature-driven transition

Anomalous Structure and Scaling of Ring Polymer Brushes

Anonymized Network Sensing using C++26 std::execution on GPUs

Ansor: Generating High-Performance Tensor Programs for Deep Learning

Anti-parallel Patterns in Fine-grain Data-parallel Programs

ANTS2 package: simulation and experimental data processing for Anger camera type detectors

AnyHLS: High-Level Synthesis with Partial Evaluation

AnySeq/GPU: A Novel Approach for Faster Sequence Alignment on GPUs

AnySL: efficient and portable shading for ray tracing

Anytime Algorithms for GPU Architectures

APACE: AlphaFold2 and advanced computing as a service for accelerated discovery in biophysics

APEnet+: a 3D toroidal network enabling Petaflops scale Lattice QCD simulations on commodity clusters

APEnet+: high bandwidth 3D torus direct network for petaflops scale commodity clusters

APHOG: A Framework for Fast Object Detection Using Histograms of Oriented Gradients

API-Compiling for Image Hardware Accelerators

APL on GPUs: A TAIL from the Past, Scribbled in Futhark

APNN-TC: Accelerating Arbitrary Precision Neural Networks on Ampere GPU Tensor Cores

APOGEE: adaptive prefetching on GPUs for energy efficiency

Apple Silicon Performance in Scientific Computing

Applicability of GPU Computing for Efficient Merge in In-Memory Databases

Application level energy measurements and models for hybrid platform with accelerators

Application of Assembly of Finite Element Methods on Graphics Processors for Real-Time Elastodynamics

Application of Deep-Learning to Compiler-Based Graphs

Application of GPGPU for Acceleration of Short DNA Sequence Alignment in Unipro UGENE Project

Application of GPU Computing to Some Urban Traffic Problems

Application of GPU Smooth Particle Hydrodynamics: Wave Runup and Overtopping on Composite Slopes

Application of GPUs for the Calculation of Two Point Correlation Functions in Cosmology

Application of Graphics Processing Units to Search Pipeline for Gravitational Waves from Coalescing Binaries of Compact Objects

Application of performance portability solutions for GPUs and many-core CPUs to track reconstruction kernels

Application of the Characteristic Basis Function Method using CUDA

Application of the Mean Field Methods to MRF Optimization in Computer Vision

Application of the OpenCL API for Implementation of the NIPALS Algorithm for Principal Component Analysis of Large Data Sets

Application Performance Profiling on Intel GPUs with Oneprof and Onetrace

Brief statistics for this page

Titles: 100

Download open PDFs: 96

Package packages: 23

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

Analyzing the Impact of Kernel Fusion on GPU Tensor Operation Performance: A Systematic Performance Study

IntelliKit: Agent-first tooling for AMD hardware

Kerncap: Automated Kernel Extraction and Isolation for AMD GPUs

DITRON: Distributed Compiler based on Triton for Parallel Systems

DITRON: Distributed Multi-level Tiling Compiler for Parallel Tensor Programs

Recent source codes

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

Probe-and-Refine Tuning of Repository Guidance for AI Coding Agents

CUDAnalyst (CUDA + Analyst)

CodegenBench

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

IntelliKit: Agent-first tooling for AMD hardware

DITRON: Distributed Compiler based on Triton for Parallel Systems

Most viewed papers (last 30 days)