high performance computing on graphics processing units: hgpu.org

Papers on hgpu.org (.txt-file)

Multi-grain Parallel Processing of Data-Clustering on Programmable Graphics Hardware

Multi-hetero Acceleration by GPU and FPGA for Astrophysics Simulation on oneAPI Environment

Multi-Kepler GPU vs. Multi-Intel MIC for spin systems simulations

Multi-kernel Data Partitioning with Channel on OpenCL-based FPGAs

Multi-layer depth peeling via fragment sort

Multi-level Debugging for Multi-stage, Parallelizing Compilers

Multi-Level Ewald: A Hybrid Multigrid/Fast Fourier Transform Approach to the Electrostatic Particle-Mesh Problem

Multi-Level Graph Layout on the GPU

Multi-level Parallelism for Incompressible Flow Computations on GPU Clusters

Multi-level Parallelism for Time- and Cost-efficient Parallel Discrete Event Simulation on GPUs

Multi-level Parallelism with MPI and OpenACC for CFD Applications

Multi-level parallelism, global arrays, GPGPU Programming: Unify programming paradigms on Grid computing with efficiency

Multi-level parallelization for hybrid ACO

Multi-level Parallelization of Advanced Video Coding on Hybrid CPU/GPU Platform

Multi-line AI-assisted Code Authoring

Multi-Lingual Speech Recognition with Low-Rank Multi-Task Deep Neural Networks

Multi-mass solvers for lattice QCD on GPUs

Multi-Moment Methods for PDEs and GPUs for Large-Scale Scientific Computations

Multi-Object Geodesic Active Contours (MOGAC): A Parallel Sparse-Field Algorithm for Image Segmentation

Multi-Pass and Frame Parallel Algorithms of Motion Estimation in H.264/AVC for Generic GPU

Multi-platform Linear Algebra

Multi-Platform LU-Decomposition Solution in OpenCL

Multi-scale modeling of nano scale phenomenon using CUDA based HPC setup

Multi-scale neural texture classification using the GPU as a stream processing engine

Multi-scale problems, high performance computing and hybrid numerical methods

Multi-Scale Scheduling Techniques for Signal Processing Systems

Multi-Scale, Multi-Level, Heterogeneous Features Extraction and Classification of Volumetric Medical Images

Multi-Science Applications with Single Codebase – GAMER – for Massively Parallel Architectures

Multi-swarm PSO algorithm for the Quadratic Assignment Problem: a massive parallel implementation on the OpenCL platform

Multi-target DPA attacks: Pushing DPA beyond the limits of a desktop computer

Multi-target vectorization with MTPS C++ generic library

Multi-Tasking Scheduling for Heterogeneous Systems

Multi-Tenant Virtual GPUs for Optimising Performance of a Financial Risk Application

Multi-thread implementations of the lattice Boltzmann method on non-uniform grids for CPUs and GPUs

Multi-Threaded Automatic Integration Using OpenMP and CUDA

Multi-threaded Geant4 on the Xeon-Phi with Complex High-Energy Physics Geometry

Multi-threaded Kernel Offloading to GPGPU Using Hyper-Q on Kepler Architecture

Multi-tier Dynamic Vectorization for Translating GPU Optimizations into CPU Performance

Multi-user real-time speech recognition with a GPU

Multi-view Rendering Approach for Cloud-based Gaming Services

Multi-walk Parallel Pattern Search Approach on a GPU Computing Platform

Multi2Sim: a simulation framework for CPU-GPU computing

Multicore and GPU Algorithms for Nussinov RNA Folding

Multicore and GPU Parallelization of Neural Networks for Face Recognition

Multicore and Manycore Algorithms for Octrees

Multicore architecture and cache optimization techniques for solving graph problems

Multicore bundle adjustment

Multicore Computing: Algorithms, Architectures, and Applications

Multicore performance optimization using partner cores

Multicore Processing for Classification and Clustering Algorithms

Multicore Processing for Clustering Algorithms

Multicore Scheduling of Parallel Real-Time Tasks with Multiple Parallelization Options

Multidimensional Costas Arrays and Their Enumeration Using GPUs and FPGAs

Multidimensional Dataflow Graph Modeling and Mapping for Efficient GPU Implementation

Multidimensional Parallelization for Streaming Text Processing Applications Based on Parabix Framework

Multidimensional upwind hydrodynamics on unstructured meshes using Graphics Processing Units I. Two-dimensional uniform meshes

Multifactor dimensionality reduction for graphics processing units enables genome-wide testing of epistasis in sporadic ALS

Multifold Acceleration of Neural Network Computations Using GPU

Multifrontal computations on GPUs and their multi-core hosts

Multifrontal Factorization of Sparse SPD Matrices on GPUs

Multifrontal Sparse Matrix Factorization on Graphics Processing Units

MultiGPU computing using MPI or OpenMP

Multigrid on GPU: Tackling Power Grid Analysis on parallel SIMT platforms

Multigrid Optimization Methods for High Performance Computing

Multikernel Data Partitioning With Channel on OpenCL-Based FPGAs

Multilayered Abstractions for Partial Differential Equations

Multilevel Granularity Parallelism Synthesis on FPGAs

Multilevel Multidimensional Scaling on the GPU

Multilevel summation of electrostatic potentials using graphics processing units

Multilevel Tile Load Map on Massive Terrain Visualization

Multimodal collaboration and human-computer interaction

Multimodal Image Registration Using GPU Parallel Computing Technology

Multimodality imaging and state-of-art GPU technology in discriminating benign from malignant breast lesions on real time decision support system

Multipattern String Matching On A GPU

Multiphase Flow Simulations in Inclined Tubes with Lattice Boltzmann Method on GPU

Multiphase Fluid Simulations on a Multiple GPGPU PC Using Unsplit Time Integration VSIAM3

Multiple Bounding Boxes Algorithm in Collision Detection and Its Performances in Sequential vs CUDA Parallel Processing

Multiple String Matching on a GPU using CUDAs

Multiple Time Scales Recurrent Neural Network for Complex Action Acquisition

Multiple-GPU Scalability of Phase-Field Simulation for Dendritic Solidification

Multiple-GPUs Algorithm for Lattice Boltzmann Method

Multiple-Tasks on Multiple-Devices (MTMD): Exploiting Concurrency in Heterogeneous Managed Runtimes

Multiprocessing Acceleration of H.264/AVC Motion Estimation Full Search Algorithm under CUDA Architecture

Multireduce and Multiscan on Modern GPUs

Multiresolution Flow Simulations on Multi/many-core Architectures

Multiresolution MIP Rendering of Large Volumetric Data Accelerated on Graphics Hardware

Multiscale Hemodynamics Using GPU Clusters

Multiscale texture synthesis

Multithread Content Based File Chunking System in CPU-GPGPU Heterogeneous Architecture

Multithreaded Dense Linear Algebra on Asymmetric Multi-core Processors

Multithreaded Transposition of Square Matrices with Common Code for Intel Xeon Processors and Intel Xeon Phi Coprocessors

Multithreading for Visual Effects

MuMax: a new high-performance micromagnetic simulation tool

MUPPET: Optimizing Performance in OpenMP via Mutation Testing

MusaCoder: Native GPU Kernel Generation with Full-Stack Training on Moore Threads GPU

Muscle pushing based skin deformation on GPU

Mutual information computation and maximization using GPU

Mutual-Supervised Learning for Sequential-to-Parallel Code Translation

MVAPICH2-GPU: optimized GPU to GPU communication for InfiniBand clusters

MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems

Brief statistics for this page

Titles: 100

Download open PDFs: 88

Packages: 11

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

Analyzing the Impact of Kernel Fusion on GPU Tensor Operation Performance: A Systematic Performance Study

IntelliKit: Agent-first tooling for AMD hardware

Kerncap: Automated Kernel Extraction and Isolation for AMD GPUs

DITRON: Distributed Compiler based on Triton for Parallel Systems

DITRON: Distributed Multi-level Tiling Compiler for Parallel Tensor Programs

* * *

Recent source codes

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

Probe-and-Refine Tuning of Repository Guidance for AI Coding Agents

CUDAnalyst (CUDA + Analyst)

CodegenBench

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

IntelliKit: Agent-first tooling for AMD hardware

DITRON: Distributed Compiler based on Triton for Parallel Systems

Most viewed papers (last 30 days)