high performance computing on graphics processing units: hgpu.org

Papers on hgpu.org (.txt-file)

Hybrid Use of OmpSs for a Shock Hydrodynamics Proxy Application

Hybrid Visualization for White Matter Tracts using Triangle Strips and Point Sprites

Hydra: a C++11 framework for data analysis in massively parallel platforms

Hydrodynamic Computation with Hybrid Programming on CPU-GPU Clusters

Hyper neural network on OpenCL

Hypercubic Storage Layout and Transforms in Arbitrary Dimensions using GPUs and CUDA

Hyperfast Parallel–Beam Backprojection

Hyperfast Perspective Cone–Beam Backprojection

Hyperspectral Unmixing on GPUs and Multi-Core Processors: A Comparison

HyPHI – task based hybrid execution C++ library for the Intel Xeon Phi coprocessor

I/O Lower Bounds for Auto-tuning of Convolutions in CNNs

I3DC: Interactive Three-Dimensional Cubes

IA-SpGEMM: An Input-aware Auto-tuning Framework for Parallel Sparse Matrix-Matrix Multiplication

IBM Deep Learning Service

Ice Simulation Using GPGPU

IceCubes GPGPU’s cluster for extensive MC production

ICNet for Real-Time Semantic Segmentation on High-Resolution Images

Identification and Elimination of Platform-Specific Code Smells in High Performance Computing Applications

Identifying scalar behavior in CUDA kernels

Identifying the Key Features of Intel Xeon Phi: A Comparative Approach

IgNet. A Super-precise Convolutional Neural Network

Ignite-GPU: a GPU-enabled in-memory computing architecture on clusters

iGniter: Interference-Aware GPU Resource Provisioning for Predictable DNN Inference in the Cloud

iGPU: Exception Support and Speculative Execution on GPUs

iGUARD: In-GPU Advanced Race Detection

Ikra-Cpp: A C++/CUDA DSL for Object-Oriented Programming with Structure-of-Arrays Layout

Ilargi: a GPU Compatible Factorized ML Model Training Framework

Illustrative Rendering of Particle Systems

Illustrative Stream Surfaces

Illustrative Volume Visualization Using GPU-Based Particle Systems

Image and Video Processing on CUDA: State of the Art and Future Directions

Image and Video Processing on GPU: Implementation Scheme, Applications and Future Directions

Image Classification with Pyramid Representation and Rotated Data Augmentation on Torch 7

Image Convolution Processing: a GPU versus FPGA Comparison

Image Denoising Using Wavelet Transform and CUDA

Image Encryption Using Parallel RSA Algorithm on CUDA

Image Noise Removal on Heterogeneous CPU-GPU Configurations

Image Object Tracking System Using Parallel Mean Shift Algorithm

Image parallel processing based on GPU

Image processing algorithm optimization with CUDA for Pure Data

Image processing applications on a low power highly parallel SIMD architecture

Image Processing on Graphical Processing Units for faster DNA Sequencing

Image Processing using Parallel Computing

Image Processing with CUDA

Image reconstruction in digital holographic microscopy on GPU

Image registration on GPU

Image representation by blob and its application in CT reconstruction from few projections

Image segmentation using CUDA implementations of the Runge-Kutta-Merson and GMRES methods

Image selection for improved Multi-View Stereo

Image Space Gathering

Image spatial diffusion on GPUs

Image super-resolution by vectorizing edges

Image Super-Resolution Using Deep Convolutional Networks

Image-based fast three-dimensional leaf modeling

Image-Based Material Restyling with Fast Non-local Means Filtering

Image-Based Proxy Accumulation for Real-Time Soft Global Illumination

Image-Space Caustics and Curvatures

Image-Space Collision Detection Through Alternate Surface Peeling

Image-Space GPU Metaballs for Time-Dependent Particle Data Sets

ImageCL: An Image Processing Language for Performance Portability on Heterogeneous Systems

ImageCL: Language and source-to-source compiler for performance portability, load balancing, and scalability prediction on heterogeneous systems

Impact of asynchronism on GPU accelerated parallel iterative computations

Impact of communication times on mixed CPU/GPU applications scheduling using KAAPI

Impact of data layouts on the efficiency of GPU-accelerated IDW interpolation

Impact of Floating-Point Precision on Boundary Layer Instabilities Modeled on Fermi GPU

Impact of GPU Memory Access Patterns on FDTD

Impact of Modern OpenGL on FPS

Impact of the channel count on the nonlinear tolerance in coherently-detected POLMUX-QPSK modulation

Impact of the Random Number generator quality on particle swarm optimization algorithm running on graphic processor units

Impact of Warp Formation on GPU Performance

Impacts of Parallel Programming on Limited-Resource Hardware

Implementability of shading models for current game engines

Implementation & Parallelisation of FDTD code for Electromagnetic Scattering

Implementation and Analysis of AES Encryption on GPU

Implementation and Evaluation of Recurrence Equation Solvers on GPGPU systems using Rearrangement of Array Configurations

Implementation and Evaluation of Scientific Simulations on High Performance Computing Architectures

Implementation and evaluation of various demons deformable image registration algorithms on GPU

Implementation and Experimental Evaluation of a CUDA Core under Single Event Effects

Implementation and Optimization of Image Processing Algorithms on Embedded GPU

Implementation and optimization of image processing algorithms on handheld GPU

Implementation and Performance Analysis of Many-body Quantum Chemical Methods on the Intel Xeon Phi Coprocessor and NVIDIA GPU Accelerator

Implementation and Performance Analysis of SEAL Encryption on FPGA, GPU and Multi-core Processors

Implementation and performance analysis of the AXPY, DOT, and SpMV functions on Intel Xeon Phi and NVIDIA Tesla using OpenCL

Implementation and Performance Comparison of the Motion Compensation Kernel of the AVS Video Decoder on FPGA, GPU and Multicore Processors

Implementation and performance evaluation of a GPU particle-in-cell code

Implementation and performance evaluation of reconstruction algorithms on graphics processors

Implementation Details of GPU-based Out-of-Core Many-Lights Rendering

Implementation of 2-D Discrete Cosine Transform Algorithm on GPU

Implementation of 3D FFTs Across Multiple GPUs in Shared Memory Environments

Implementation of 3D Monte Carlo PET reconstruction algorithm on GPU

Implementation of 802.11n on 128-CORE Processor

Implementation of a 3GPP LTE turbo decoder accelerator on GPU

Implementation of a distributed real-time video panorama pipeline for creating high quality virtual views

Implementation of a Fast Image Coding and Retrieval System Using a GPU

Implementation of a High Throughput 3GPP Turbo Decoder on GPU

Implementation of a High Throughput Soft MIMO Detector on GPU

Implementation of a Lattice Boltzmann kernel using the Compute Unified Device Architecture developed by nVIDIA

Implementation of a Lattice–Boltzmann method for numerical fluid mechanics using the nVIDIA CUDA technology

Implementation of a motion estimation algorithm for Intel FPGAs using OpenCL

Implementation of a Multi-User Detector for Satellite Return Links on a GPU Platform

Brief statistics for this page

Titles: 100

Download open PDFs: 86

Package packages: 12

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

Analyzing the Impact of Kernel Fusion on GPU Tensor Operation Performance: A Systematic Performance Study

IntelliKit: Agent-first tooling for AMD hardware

Kerncap: Automated Kernel Extraction and Isolation for AMD GPUs

DITRON: Distributed Compiler based on Triton for Parallel Systems

DITRON: Distributed Multi-level Tiling Compiler for Parallel Tensor Programs


