high performance computing on graphics processing units: hgpu.org

Papers on hgpu.org (.txt-file)

CAVE-CL: An OpenCL version of the package for detection and quantitative analysis of internal cavities in a system of overlapping balls: application to proteins

CBench: Analyzing Compute Performance for Modern NVIDIA and AMD GPUs

CBESW: sequence alignment on the Playstation 3

CBinfer: Change-Based Inference for Convolutional Neural Networks on Video Data

CDFC: Collision Detection Based on Fuzzy Clustering for Deformable Objects on GPU’s

Celeris: A GPU-accelerated open source software with a Boussinesq-type wave solver for real-time, interactive simulation and visualization

CELES: CUDA-accelerated simulation of electromagnetic scattering by large ensembles of spheres

Cell Charge Approximation for Accelerating Molecular Simulation on CUDA-Enabled GPU

cellGPU: massively parallel simulations of dynamic vertex models

Cellular automaton for ultra-fast watershed transform on GPU

Cellular genetic algorithms

Cellular Genetic Algorithms and Local Search for 3-SAT problem on Graphic Hardware

Cellular GPU Models to Euclidean Optimization Problems

Cellular Level Agent Based Modelling on the Graphics Processing Unit

Central Force Optimization on a GPU: A case study in high performance metaheuristics using multiple topologies

cf4ocl: a C framework for OpenCL

CFD code adaptation to the FPGA architecture

CFD Simulation of Jet Cooling and Implementation of Flow Solvers in GPU

CFD-based analysis and two-level aerodynamic optimization on Graphics Processing Units

CFMDS: CUDA-based fast multidimensional scaling for genome-scale data

CFU Playground: Full-Stack Open-Source Framework for Tiny Machine Learning (tinyML) Acceleration on FPGAs

Cg in Two Pages

Cg: a system for programming graphics hardware in a C-like language

CGiS, a new Language for Data-parallel GPU Programming

CGO: G: Intelligent Heuristic Construction with Active Learning

CGP-Tuning: Structure-Aware Soft Prompt Tuning for Code Vulnerability Detection

Chai: Collaborative Heterogeneous Applications for Integrated-architectures

ChainerMN: Scalable Distributed Deep Learning Framework

Challenge benchmarks that must be conquered to sustain the gpu revolution

Challenges Adapting CUDA PIC Codes to multiple GPUs

Challenges and Opportunities in C/C++ Source-To-Source Compilation

Challenges and opportunities of obtaining performance from multi-core CPUs and many-core GPUs

Challenges and Techniques for Transparent Acceleration of Unmodified Big Data Applications

Challenges for a GPU-Accelerated Dynamic Programming Approach for Join-Order Optimization

Challenges for compiler support for exascale computing

Challenges of mapping financial analytics to many-core architecture

Challenges of medical image processing

Challenging cloning related problems with GPU-based algorithms

Challenging Portability Paradigms: FPGA Acceleration Using SYCL and OpenCL

Chameleon: Virtualizing idle acceleration cores of a heterogeneous multicore processor for caching and prefetching

ChamNet: Towards Efficient Network Design through Platform-Aware Model Adaptation

CHAOS: A Parallelization Scheme for Training Convolutional Neural Networks on Intel Xeon Phi

Character-level Transformer-based Neural Machine Translation

Charactering and Detecting CUDA Program Bugs

Characterising Across-Stack Optimisations for Deep Convolutional Neural Networks

Characterising Bipartite Graph Matching Algorithms on GPUs

Characterization and Analysis of Dynamic Parallelism in Unstructured GPU Applications

Characterization and Exploitation of GPU Memory Systems

Characterization and Performance Analysis for 3D Benchmarks

Characterization and Transformation of Unstructured Control Flow in Bulk Synchronous GPU Applications

Characterization and Transformation of Unstructured Control Flow in GPU Applications

Characterization of FPGA-based High Performance Computers

Characterization of Lossy SIW Resonators Based on Multilayer Perceptron Neural Networks on Graphics Processing Unit

Characterization of OpenCL on a Scalable FPGA Architecture

Characterization of Speech Recognition Systems on GPU Architectures

Characterizing and Enhancing Global Memory Data Coalescing on GPUs

Characterizing and Evaluating a Key-value Store Application on Heterogeneous CPU-GPU Systems

Characterizing and Improving the Use of Demand-Fetched Caches in GPUs

Characterizing and Optimizing Irregular Applications on Graphics Processing Units

Characterizing and Predicting Scientific Workloads for Heterogeneous Computing Systems

Characterizing CUDA and OpenMP Synchronization Primitives

Characterizing Dataset Dependence for Sparse Matrix-Vector Multiplication on GPUs

Characterizing Deep Learning Training Workloads on Alibaba-PAI

Characterizing Optimizations to Memory Access Patterns using Architecture-Independent Program Features

Characterizing the Challenges and Evaluating the Efficacy of a CUDA-to-OpenCL Translator

Characterizing the Performance of Parallel Data-Compression Algorithms across Compilers and GPUs

Charged particles constrained to a curved surface

CHARM-SYCL: New Unified Programming Environment for Multiple Accelerator Types

Chat AI: A Seamless Slurm-Native Solution for HPC-Based Services

Chebyshev Filter Diagonalization on Modern Manycore Processors and GPGPUs

CheCL: Transparent Checkpointing and Process Migration of OpenCL Applications

CheCUDA: A Checkpoint/Restart Tool for CUDA Applications

chemtrain-deploy: A parallel and scalable framework for machine learning potentials in million-atom MD simulations

Chest CT automatic analysis for lung nodules detection implemented on a GPU computing system

Chestnut: A GPU Programming Language for Non-Experts

CHO: A Benchmark Suite for OpenCL-based FPGA Accelerators

Brief statistics for this page

Titles: 100

Download open PDFs: 89

Package packages: 32

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

Analyzing the Impact of Kernel Fusion on GPU Tensor Operation Performance: A Systematic Performance Study

IntelliKit: Agent-first tooling for AMD hardware

Kerncap: Automated Kernel Extraction and Isolation for AMD GPUs

DITRON: Distributed Compiler based on Triton for Parallel Systems

DITRON: Distributed Multi-level Tiling Compiler for Parallel Tensor Programs

Recent source codes

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

Probe-and-Refine Tuning of Repository Guidance for AI Coding Agents

CUDAnalyst (CUDA + Analyst)

CodegenBench

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

IntelliKit: Agent-first tooling for AMD hardware

DITRON: Distributed Compiler based on Triton for Parallel Systems
