high performance computing on graphics processing units: hgpu.org

Papers on hgpu.org (.txt-file)

Evaluation of Two Parallel Finite Element Implementations of the Time-Dependent Advection Diffusion Problem: GPU versus Cluster Considering Time and Energy Consumption

Evenly Spaced Streamlines for Surfaces: An Image-Based Approach

Event-Based OpenMP Tasks for Time-Sensitive GPU-Accelerated Systems

Event-driven gate-level simulation with GP-GPUs

EvoEngineer: Mastering Automated CUDA Kernel Code Evolution with Large Language Models

Evolution of a double-front Rayleigh-Taylor system using a GPU-based high resolution thermal Lattice-Boltzmann model

Evolution of image filters on graphics processor units using Cartesian Genetic Programming

Evolution of Kernels: Automated RISC-V Kernel Optimization with Large Language Models

Evolution of thread-level parallelism in desktop applications

Evolutionary Algorithm for Optimizing Parameters of GPGPU-based Image Segmentation

Evolutionary Clustering on CUDA

Evolutionary Computing on Consumer-Level Graphics Hardware

Evolutionary Quantum Logic Synthesis of Boolean Reversible Logic Circuits Embedded in Ternary Quantum Space using Heuristics

Evolutionary Simulation of Life Using CUDA

Evolving a CUDA kernel from an nVidia template

Evolving CUDA PTX programs by quantum inspired linear genetic programming

Evolving GeneChip correlation predictors on parallel graphics hardware

Evolving gzip matches Kernel from an nVidia CUDA Template

Evolving Neural Networks on GPUs

Evolving Soft Robotic Locomotion in PhysX

EvoScientist: Towards Multi-Agent Evolving AI Scientists for End-to-End Scientific Discovery

EvoTorch: Scalable Evolutionary Computation in Python

exa-AMD: An Exascale-Ready Framework for Accelerating the Discovery and Design of Functional Materials

EXA2PRO: A Framework for High Development Productivity on Heterogeneous Computing Systems

Exact and complete short read alignment to microbial genomes using GPU programming

Exact and complete short-read alignment to microbial genomes using Graphics Processing Unit programming

Exact calculation of disconnected loops

Exact diagonalization of quantum lattice models on coprocessors

Exact diagonalization of the Hubbard model on graphics processing units

Exact Selectivity Computation for Modern In-Memory Database Query Optimization

Exact Sparse Matrix-Vector Multiplication on GPU’s and Multicore Architectures

Exact Symbolic-Numeric Computation of Planar Algebraic Curves

Examining the Analytic Structure of Green’s Functions: Massive Parallel Complex Integration using GPUs

Example-based volume illustrations

ExaNBody: a HPC framework for N-Body applications

Exascale Deep Learning for Climate Analytics

Exascale Deep Learning for Scientific Inverse Problems

Executing Dynamic Data Rate Actor Networks on OpenCL Platforms

Executing Process Networks on Heterogeneous Platforms using OpenCL

Execution of Compound Multi-Kernel OpenCL Computations in Multi-CPU/Multi-GPU Environments

Execution-Centric Characterization of FP8 Matrix Cores, Asynchronous Execution, and Structured Sparsity on AMD MI300A

Exercising high-level parallel programming on streams: a systems biology use case

EXOCHI: architecture and programming environment for a heterogeneous multi-core multithreaded system

Expanding the boundaries of GPU computing

Expanding the VPE-qGM Environment Towards a Parallel Quantum Simulation of Quantum Processes Using GPUs

Expansion Techniques for Collisionless Stellar Dynamical Simulations

Experience Applying Fortran GPU Compilers to Numerical Weather Prediction

Experience Migrating OpenCL to SYCL: A Case Study on Searches for Potential Off-Target Sites of Cas9 RNA-Guided Endonucleases on AMD GPUs

Experience of Migrating a Parallel Graph Coloring Program from CUDA to SYCL

Experience of parallelizing cryo-EM 3D reconstruction on a CPU-GPU heterogeneous system

Experience Report: Writing A Portable GPU Runtime with OpenMP 5.1

Experience with Intel’s Many Integrated Core architecture in ATLAS software

Experiences Building an MLIR-based SYCL Compiler

Experiences Developing the OpenUH Compiler and Runtime Infrastructure

Experiences in Building a Composable and Functional API for Runtime SPIR-V Code Generation

Experiences in Data-Parallel Simulation and Analysis of Complex Systems with Irregular Graph Structures

Experiences in Speeding Up Computer Vision Applications on Mobile Computing Platforms

Experiences in Teaching a Specialty Multicore Computing Course

Experiences Migrating CUDA to SYCL: A Molecular Docking Case Study

Experiences Porting a Molecular Dynamics Code to GPUs on a Cray XK7

Experiences with Achieving Portability across Heterogeneous Architectures

Experiences with Cell-BE and GPU for Tomography

Experiences with High-Level Programming Directives for Porting Applications to GPUs

Experiences with hybrid clusters

Experiences with implementing Kokkos’ SYCL backend

Experiences with Mapping Non-linear Memory Access Patterns into GPUs

Experimental B+-tree for GPU

Experimental Evaluation of Multiprecision Strategies for GMRES on GPUs

Experimental Evaluation of Thread Distribution Effects on Multiple Output Errors in GPUs

Experimental Fault-Tolerant Synchronization for Reliable Computation on Graphics Processors

Experimentation Procedure for Offloaded Mini-Apps Executed on Cluster Architectures with Xeon Phi Accelerators

Experiments on Parallel Training of Deep Neural Network using Model Averaging

Experiments with Massively Parallel Matrix Multiplication

Experiments with Single Core, Multi-core, and GPU Based Computation of Cellular Automata

Explainable Deep Behavioral Sequence Clustering for Transaction Fraud Detection

Explicit Cache Management for Volume Ray-Casting on Parallel Architectures

Explicit caching HYB: a new high-performance SpMV framework on GPGPU

Explicit Control of Vector Field Based Shape Deformations

Explicit Fourth-Order Runge-Kutta Method on Intel Xeon Phi Coprocessor

Explicit Integration with GPU Acceleration for Large Kinetic Networks

Explicit platform descriptions for heterogeneous many-core architectures

Explicit Shallow Water Simulations on GPUs: Guidelines and Best Practices

Exploded Views for Volume Data

Exploitation of GPUs for the Parallelisation of Probably Parallel Legacy Code

Exploiting BSP Abstractions for Compiler Based Optimizations of GPU Applications on multi-GPU Systems

Exploiting co-execution with oneAPI: heterogeneity from a modern perspective

Exploiting Coarse-grained Parallelism in B+ Tree Searches on an APU

Exploiting Computational Resources in Distributed Heterogeneous Platforms

Exploiting Computing Power on Graphics Processing Unit

Exploiting Concurrency Patterns with Heterogeneous Task and Data Parallelism

Exploiting Concurrent GPU Operations for Efficient Work Stealing on Multi-GPUs

Exploiting concurrent kernel execution on graphic processing units

Exploiting contextual information for image re-ranking and rank aggregation

Brief statistics for this page

Titles: 100

Download open PDFs: 95

Package packages: 19

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

Analyzing the Impact of Kernel Fusion on GPU Tensor Operation Performance: A Systematic Performance Study

IntelliKit: Agent-first tooling for AMD hardware

Kerncap: Automated Kernel Extraction and Isolation for AMD GPUs

DITRON: Distributed Compiler based on Triton for Parallel Systems

DITRON: Distributed Multi-level Tiling Compiler for Parallel Tensor Programs

* * *

Recent source codes

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

Probe-and-Refine Tuning of Repository Guidance for AI Coding Agents

CUDAnalyst (CUDA + Analyst)

CodegenBench

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

IntelliKit: Agent-first tooling for AMD hardware

DITRON: Distributed Compiler based on Triton for Parallel Systems
