high performance computing on graphics processing units: hgpu.org

Papers on hgpu.org (.txt-file)

Investigation of General-Purpose Computing on Graphics Processing Units and its Application to the Finite Element Analysis of Electromagnetic Problems

Investigation of GPU-based Pattern Matching

Investigation of heterogeneous computing through novel parallel programming platforms

Investigation of Parallel Computation – MPI, CUDA and Parallel Visualization

Investigation of the OpenCL SYCL Programming Model

Investigation of the SYCL for OpenCL Programming Model

Investigation on the Use of GPGPU for Fast Sparse Matrix Factorization

Invitation to a Standard Programming Interface for Massively Parallel Computing Environment: OpenCL

Invited paper: Accelerating neuromorphic vision on FPGAs

IODA: an Input/Output Deep Architecture for image labeling

IP routing processing with graphic processors

IPMACC: Open Source OpenACC to CUDA/OpenCL Translator

IPMACC: Translating OpenACC API to OpenCL

Iris Matching Algorithm on Many-Core Platforms

Iris recognition on GPU with the usage of Non-Negative Matrix Factorization

Iris: First-Class Multi-GPU Programming Experience in Triton

IRIS: Illustrative Rendering for Integral Surfaces

Irradiation Instability at the Inner Edges of Accretion Disks

Irregular algorithms on the Xeon Phi

Irregularity Mitigation and Portability Abstractions for Accelerated Sparse Matrix Factorization

Is GPGPU CCL worth it? A performance comparison between some GPU and CPU algorithms for solving connected components labeling on binary images

Is OpenCL a suitable platform for algorithm development in health care systems?

Is the game worth the candle? Evaluation of OpenCL for object detection algorithm optimization

Is the GPU Half-Empty or Half-Full? Practical Scheduling Techniques for LLMs

ISM2: Optimizing Irregular-Shaped Matrix-Matrix Multiplication on GPUs

Isocube: Exploiting the Cubemap Hardware

Isolated Scheduling for Distributed Training Tasks in GPU Clusters

Isosurface Extraction and View-Dependent Filtering from Time-Varying Fields Using Persistent Time-Octree (PTOT)

Issues and challenges in compiling for graphics processors

Issues in Heterogeneous GPU Clusters

It’s all about data movement: Optimising FPGA data access to boost performance

Iterative and Predictive Ray-Traced Collision Detection for Multi-GPU Architectures

Iterative CT Reconstruction on the GPU

Iterative GPGPU Linear Solvers for Sparse Matrices

Iterative Hard Thresholding for Model Selection in Genome-Wide Association Studies

Iterative induced dipoles computation for molecular mechanics on GPUs

Iterative Krylov solution methods for geophysical electromagnetic simulations on throughput-oriented processing units

Iterative layer-based raytracing on CUDA

Iterative Methods for Visualization of Implicit Surfaces On GPU

Iterative optimization methods for efficient image restoration on multicore architectures

Iterative SLE Solvers over a CPU-GPU Platform

Iterative Solution of Linear Systems in Electromagnetics (and not only): Experiences with CUDA

Iterative Statistical Kernels on Contemporary GPUs

iTree: Exploring Time-Varying Data using Indexable Tree

Jacobian-free Newton-Krylov methods with GPU acceleration for computing nonlinear ship wave patterns

Jailbreaking LLM-Controlled Robots

Java on CUDA architecture

Java with Auto-Parallelization on Graphics Coprocessing Architecture

JAX, M.D.: End-to-End Differentiable, Hardware Accelerated, Molecular Dynamics in Pure Python

JCUDA: A Programmer-Friendly Interface for Accelerating Java Programs with CUDA

JCudaMP: OpenMP/Java on CUDA

JIT-Compilation for Interactive Scientific Visualization

Jit4OpenCL: a compiler from Python to OpenCL

Jitter analysis of PLL-generated clock propagation using Jitter Mitigation techniques with laser voltage probing

Job Parallelism using Graphical Processing Unit individual Multi-Processors and Highly Localised Memory

Job Parallelism using Graphical Processing Unit Individual Multi-Processors and Localised Memory

Join Algorithms on GPUs: A Revisit After Seven Years

Join Execution Using Fragmented Columnar Indices on GPU and MIC

Joint Forces: From Multithreaded Programming to GPU Computing

Joint Training on AMD and NVIDIA GPUs

Joint-MAP Tomographic Reconstruction with Patch Similarity Based Mixture Prior Model

JPEG 2000 Wireless Image Transmission System using Encryption Domain Authentication

JPEG-GPU: a GPGPU Implementation of JPEG Core Coding Systems

Brief statistics for this page

Titles: 100

Download open PDFs: 88

Package packages: 16

