high performance computing on graphics processing units: hgpu.org

Papers on hgpu.org (.txt-file)

A Hierarchical Thread Scheduler and Register File for Energy-efficient Throughput Processors

A hierarchically blocked Jacobi SVD algorithm for single and multiple graphics processing units

A High Memory Bandwidth FPGA Accelerator for Sparse Matrix-Vector Multiplication

A high performance agent based modelling framework on graphics card hardware with CUDA

A high performance computing for AOM stock trading order matching using GPU

A high performance computing framework for physics-based modeling and simulation of military ground vehicles

A High Performance Framework for Coupled Urban Microclimate Models

A High Performance Image Authentication Algorithm on GPU with CUDA

A High Performance Massively Parallel Approach for Real Time Deformable Body Physics Simulation

A High Performance Parallel FDTD Method Enhanced By Using SSE Instruction Set

A High Performance Parallel Sparse Linear Equation Solver Using CUDA

A High Performance Random Number Generator Using Heterogeneous Computing Platform

A High Quality Reflectance Model in Medical Image Visualization

A High-efficiency FPGA-based Accelerator for Convolutional Neural Networks using Winograd Algorithm

A High-Performance Brownian Bridge for GPUs: Lessons for Bandwidth Bound Applications

A High-Performance Computing Cluster for Distributed Deep Learning: A Practical Case of Weed Classification Using Convolutional Neural Network Models

A high-performance fault-tolerant software framework for memory on commodity GPUs

A High-Performance Multi-user Service System for Financial Analytics Based on Web Service and GPU Computation

A High-Performance Parallel FDTD Method Enhanced by Using SSE Instruction Set

A High-productivity Framework for Multi-GPU computation of Mesh-based applications

A High-resolution approach for Tsunami impact simulation on graphics processing units

A high-speed multi-GPU implementation of bottom-up attention using CUDA

A High-Throughput GPU Framework for Adaptive Lossless Compression of Floating-Point Data

A high-throughput screening approach to discovering good forms of biologically inspired visual representation

A Highly Efficient Distributed Deep Learning System For Automatic Speech Recognition

A Highly Efficient GPU-CPU Hybrid Parallel Implementation of Sparse LU Factorization

A Highly Extensible Framework for Molecule Dynamic Simulation on GPUs

A Highly Parallel Reuse Distance Analysis Algorithm on GPUs

A Highly Parameterizable Framework for Conditional Restricted Boltzmann Machine Based Workloads Accelerated With FPGAs and OpenCL

A Highly Scalable Solution of an NP-Complete Problem Using CUDA

A Highly-Efficient Memory-Compression Scheme for GPU-Accelerated Intrusion Detection Systems

A History-Based Performance Prediction Model with Profile Data Classification for Automatic Task Allocation in Heterogeneous Computing Systems

A Human–Machine Collaborative Tuning Framework for Triton Kernel Optimization on SIMD Platforms

A hybrid algorithm for parallel molecular dynamics simulations

A Hybrid Analytical DRAM Performance Model

A Hybrid Approach to Parallel Connected Component Labeling Using CUDA

A Hybrid Circular Queue Method for Iterative Stencil Computations on GPUs

A Hybrid Computational Grid Architecture for Comparative Genomics

A Hybrid Computing Platform Digital Wideband Receiver Design and Performance Measurement

A hybrid condensed finite element model with GPU acceleration for interactive 3D soft tissue cutting

A hybrid CPU-GPU parallelization scheme of variable neighborhood search for inventory optimization problems

A Hybrid CPU/GPU Cluster for Encryption and Decryption of Large Amounts of Data

A Hybrid CPU/GPU Pattern-Matching Algorithm for Deep Packet Inspection

A Hybrid Framework for Fast and Accurate GPU Performance Estimation through Source-Level Analysis and Trace-Based Simulation

A Hybrid GPU-FPGA-based Computing Platform for Machine Learning

A Hybrid GPU/CPU FFT Library for Large FFT Problems

A hybrid Hermitian general eigenvalue solver

A Hybrid Method for Computing Apparent Ridges

A Hybrid Multi-GPU Implementation of Simplex Algorithm with CPU Collaboration

A Hybrid Parallel Algorithm for Computing and Tracking Level Set Topology

A hybrid parallel framework for computational solid mechanics

A Hybrid Parallel Implementation of the Aho-Corasick and Wu-Manber Algorithms Using NVIDIA CUDA and MPI Evaluated on a Biological Sequence Database

A Hybrid Parallelization Approach for Distributed and Scalable Deep Learning

A Hybrid Programming Model for Compressible Gas Dynamics Using OpenCL

A Hybrid Software Framework for the GPU Acceleration of Multi-Threaded Monte Carlo Applications

A Hybrid-parallel Architecture for Applications in Bioinformatics

A Hyperelastic Finite-Element Model of Human Skin for Interactive Real-Time Surgical Simulation

A journey from single-GPU to optimized multi-GPU SPH with CUDA

A Kinetic Vlasov Model for Plasma Simulation Using Discontinuous Galerkin Method on Many-Core Architectures

A Language for Describing Optimization Strategies

A Language for Nested Data Parallel Design-space Exploration on GPUs

A Lattice Boltzmann Method Simulator for Microfluidics on GPU Cluster

A Lattice-Preserving Multigrid Method for Solving the Inhomogeneous Poisson Equations Used in Image Analysis

A Light-weight API for Portable Multicore Programming

A Light-Weight Approach to Dynamical Runtime Linking Supporting Heterogenous, Parallel, and Reconfigurable Architectures

A lighting model for fast rendering of forest ecosystems

A Lightweight Approach to Performance Portability with targetDP

A Lightweight, GPU-Based Software RAID System

A Linear Algebra Approach to Fast DNA Mixture Analysis Using GPUs

A linguistic approach to concurrent, distributed, and adaptive programming across heterogeneous platforms

A load balance multi-scheduling model for OpenCL kernel tasks in an integrated cluster

A local diffusion wavelet approach for scattered data registration based on GPU

A Locality-Aware Memory Hierarchy for Energy-Efficient GPU Architectures

A low-cost 3D human interface device using GPU-based optical flow algorithms

A Low-Cost Solution For Excavator Simulation With Realistic Visual Effect

A low-power handheld GPU using logarithmic arithmetic and triple DVFS power domains

A Low-Power Hybrid CPU-GPU Sort

A low-power integrated x86-64 and graphics processor for mobile computing devices

A Machine-Learning Framework for Design for Manufacturability

A Many Threaded CUDA Interpreter for Genetic Programming

A Many-core Machine Model for Designing Algorithms with Minimum Parallelism Overheads

A map reduce framework for programming graphics processors

A Map-Reduce-Like System for Programming and Optimizing Data-Intensive Computations on Emerging Parallel Architectures

A mapping path for multi-GPGPU accelerated computers from a portable high level programming abstraction

A MapReduce Framework for Heterogeneous Computing Architectures

A Markovian event-based framework for stochastic spiking neural networks

A Massive Data Parallel Computational Framework on Petascale/Exascale Hybrid Computer Systems

A massively multicore parallelization of the Kohn-Sham energy gradients

A Massively Parallel Adaptive Fast Multipole Method on Heterogeneous Architectures

A massively parallel adaptive fast-multipole method on heterogeneous architectures

A Massively Parallel Algorithm for Cell Classification Using CUDA

A massively parallel algorithm for constructing the BWT of large string sets

A Massively Parallel Approach for Nonlinear Interdependency Analysis of Multivariate Signals with GPGPU

A Massively Parallel Architecture for Bioinformatics

A Massively Parallel Associative Memory Based on Sparse Neural Networks

A massively parallel framework using P systems and GPUs

A massively parallel implementation of QC-LDPC decoder on GPU

A massively parallel program to solve the phase field formulation for crack propagation

A master-slave robotic simulator based on GPUDirect

A matrix approach to tomographic reconstruction and its implementation on GPUs

Brief statistics for this page

Titles: 100

Download open PDFs: 88

Packages: 10

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

Analyzing the Impact of Kernel Fusion on GPU Tensor Operation Performance: A Systematic Performance Study

IntelliKit: Agent-first tooling for AMD hardware

Kerncap: Automated Kernel Extraction and Isolation for AMD GPUs

DITRON: Distributed Compiler based on Triton for Parallel Systems

DITRON: Distributed Multi-level Tiling Compiler for Parallel Tensor Programs

* * *

Recent source codes

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

Probe-and-Refine Tuning of Repository Guidance for AI Coding Agents

CUDAnalyst (CUDA + Analyst)

CodegenBench

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

IntelliKit: Agent-first tooling for AMD hardware

DITRON: Distributed Compiler based on Triton for Parallel Systems

Most viewed papers (last 30 days)