high performance computing on graphics processing units: hgpu.org

Papers on hgpu.org (.txt-file)

Feasibility Analysis of Bilateral Filtering by General Purpose Graphical Processing Unit Computing

Feasibility Analysis of Low Cost Graphical Processing Units for Electromagnetic Field Simulations by Finite Difference Time Domain Method

FEAST – Realisation of hardware-oriented Numerics for HPC simulations with Finite Elements

Feature Aligned Volume Manipulation for Illustration and Visualization

Feature based terrain generation using diffusion equation

Feature Extraction and Visualization from Higher-Order CFD Data

Feature Generation for Quantification of Visual Similarity

Feature tracking and matching in video using programmable graphics hardware

Feature Tracking in Time-Varying Volumetric Data through Scale Invariant Feature Transform

Feature-based speed limit sign detection using a graphics processing unit

Feature-preserving triangular geometry images for level-of-detail representation of static and skinned meshes

FeCaffe: FPGA-enabled Caffe with OpenCL for Deep Learning Training and Inference on Intel Stratix 10

FELARE: Fair Scheduling of Machine Learning Applications on Heterogeneous Edge Systems

Fermi GF100 GPU Architecture

Ferrofluid Simulations with the Barnes-Hut Algorithm on Graphics Processing Units

Feynman Machine: The Universal Dynamical Systems Computer

FFT and Convolution Performance in Image Filtering on GPU

FFT Implementation on a Streaming Architecture

FFT Parallel Implementation for MRI Image Reconstruction

FFT-SPA Non-Binary LDPC Decoding on GPU

FIELA: A Fast Image Encryption with Lorenz Attractor using Hybrid Computing

Field modelling acceleration on ultrasonic systems using graphic hardware

FIESTA 4: optimized Feynman integral calculations with GPU support

FIKIT: Priority-Based Real-time GPU Multi-tasking Scheduling with Kernel Identification

File I/O on Intel Xeon Phi Coprocessors: RAM disks, VirtIO, NFS and Lustre

Filtered Blending: A new, minimal Reconstruction Filter for Ghosting-Free Projective Texturing with Multiple Images

Final Project Implementing Extremely Randomized Trees in CUDA

Financial Derivatives Modeling Using GPU’s

Financial modeling on the cell broadband engine

Finding Convex Hulls Using Quickhull on the GPU

Finding faint HI structure in and around galaxies: scraping the barrel

Finding Longest Common Subsequences by GPU-Based Parallel Ant Colony Optimization

Finding Missed Code Size Optimizations in Compilers using LLMs

Finding Next Best Views for Autonomous UAV Mapping through GPU-Accelerated Particle Simulation

Finding the Force – Consistent Particle Seeding for Satellite Aerodynamics

Finding, Measuring, and Reducing Inefficiencies in Contemporary Computer Systems

Fine-Grain Acceleration of Graph Algorithms on a Heterogeneous Chip

Fine-grain Parallelism using Multi-core, Cell/BE, and GPU Systems

Fine-grain Parallelism Using Multi-core, Cell/BE, and GPU Systems: Accelerating the Phylogenetic Likelihood Function

Fine-grain Task Aggregation and Coordination on GPUs

Fine-grained Parallel ILU Preconditioners with Fill-ins for Multi-core CPUs and GPUs

Fine-Grained Parallel Incomplete LU Factorization

Fine-grained parallelization of a Vlasov-Poisson application on GPU

Fine-Grained Resource Sharing for Concurrent GPGPU Kernels

Fine-Grained Synchronizations and Dataflow Programming on GPUs

Fine-Grained Treatment to Synchronizations in GPU-to-CPU Translation

Fine-Granular Parallel EBCOT and Optimization with CUDA for Digital Cinema Image Compression

Fine-sorting One-dimensional Particle-In-Cell Algorithm with Monte-Carlo Collisions on a Graphics Processing Unit

Fine-Tuning GPT-5 for GPU Kernel Generation

Fine-Tuning Vectorization and Memory Traffic on Intel Xeon Phi Coprocessors: LU Decomposition of Small Matrices

Fingerprint grid enhancement on GPU

Fingerprint Local Invariant Feature Extraction on GPU with CUDA

Finite Difference Time Domain (FDTD) Simulations Using Graphics Processors

Finite Difference Time-Domain Modelling of Metamaterials: GPU Implementation of Cylindrical Cloak

Finite differences numerical method for two-dimensional superlattice Boltzmann transport equation and case comparison of CPU(C) and GPGPU(CUDA) implementations

Finite element assembly strategies on multi-and many-core architectures

Finite Element Integration on GPUs

Finite Element Integration with Quadrature on the GPU

Finite Element Matrix Generation on a GPU

Finite Element Modelling of Prostate Deformation and Needle-Tissue Interactions

Finite element numerical integration for first order approximations on multi-core architectures

Finite Element Numerical Integration on Xeon Phi coprocessor

Finite Pointset Method for 2D Dam-Break Problem with GPU-Acceleration

Finite temperature lattice QCD with GPUs

Finite Volume Errors in B_K

Finite-difference time-domain simulations of metamaterials

Finite-difference time-domain solver for room acoustics using graphics processing units

Finite-size scaling method for the Berezinskii-Kosterlitz-Thouless transition

FIR filtering and AES encryption with OpenCL 2.0

Fireflies: New software for interactively exploring dynamical systems using GPU computing

Fireiron: A Scheduling Language for High-Performance Linear Algebra on GPUs

Firepile: Run-time Compilation for GPUs in Scala

First Application of Lattice QCD to Pezy-SC Processor

First Evaluation of the CPU, GPGPU and MIC Architectures for Real Time Particle Tracking based on Hough Transform at the LHC

First Experiences Optimizing Smith-Waterman on Intel’s Knights Landing Processor

First experiences with the Intel MIC architecture at LRZ

First Steps Towards More Numerical Reproducibility

Fitting Galaxies on GPUs

Fitting multi-planet transit models to photometric time-data series by evolution strategies

Fixing Performance Bugs: An Empirical Study of Open-Source GPGPU Programs

FLASH: Fast All-to-All Communication in GPU Clusters

FLASH: Randomized Algorithms Accelerated over CPU-GPU for Ultra-High Dimensional Similarity Search

Flashlight: Enabling Innovation in Tools for Machine Learning

FlexGrip: A Soft GPGPU for FPGAs

Flexible FPGA design for FDTD using OpenCL

Flexible Hardware Mapping for Finite Element Simulations on Hybrid CPU / GPU Clusters

Flexible Linear Algebra Development and Scheduling with Cholesky Factorization

Flexible N-Way MIMO Detector on GPU

Flexible neuronal network simulation framework using code generation for NVidia CUDA

Flexible OpenCL accelerated disparity estimation for video communication applications

Flexible Performant GEMM Kernels on GPUs

Flexible Pixel Compositor for Plug-and-Play Multi-Projector Displays

Flexible Software Profiling of GPU Architectures

Flexible, Fast and Accurate Sequence Alignment Profiling on GPGPU with PaSWAS

Flexible, high performance convolutional neural networks for image classification

FlexTensor: An Automatic Schedule Exploration and Optimization Framework for Tensor Computation on Heterogeneous System

FLIA: Architecture of Collaborated Mobile GPU and FPGA Heterogeneous Computing

Flip-Flop: Convex Hull Construction via Star-Shaped Polyhedron in 3D

Floating Point Arithmetic for Transport Triggered Architectures

Brief statistics for this page

Titles: 100

Download open PDFs: 94

Package packages: 16

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

Analyzing the Impact of Kernel Fusion on GPU Tensor Operation Performance: A Systematic Performance Study

IntelliKit: Agent-first tooling for AMD hardware

Kerncap: Automated Kernel Extraction and Isolation for AMD GPUs

DITRON: Distributed Compiler based on Triton for Parallel Systems

DITRON: Distributed Multi-level Tiling Compiler for Parallel Tensor Programs

Recent source codes

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

Probe-and-Refine Tuning of Repository Guidance for AI Coding Agents

CUDAnalyst (CUDA + Analyst)

CodegenBench

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

IntelliKit: Agent-first tooling for AMD hardware

DITRON: Distributed Compiler based on Triton for Parallel Systems

Most viewed papers (last 30 days)