high performance computing on graphics processing units: hgpu.org

Papers on hgpu.org (.txt-file)

Parallelization of binary and real-coded genetic algorithms on GPU using CUDA

Parallelization of BVH and BSP on the GPU

Parallelization of calculations using GPU in optimization approach for macromodels construction

Parallelization of cellular neural networks on GPU

Parallelization of Coherent Point Drift for patient registration

Parallelization of Data Intensive Code Using Computer Unified Device Architecture (CUDA)

Parallelization of DIRA and CTmod using OpenMP and OpenCL

Parallelization of DNA alignment algorithms using GPUs

Parallelization of Encryption and Hashing Algorithm Using GPU

Parallelization of Hierarchical Text Clustering on Multi-core CUDA Architecture

Parallelization of KMP String Matching Algorithm on Different SIMD architectures: Multi-Core and GPGPU’s

Parallelization of maximum likelihood fits with OpenMP and CUDA

Parallelization of Mesh Contraction and Fairing using OpenCL

Parallelization of Multipattern Matching on GPU

Parallelization of Myers Fast Bit-Vector Algorithm using GPGPU

Parallelization of PageRank on Multicore Processors

Parallelization of Particle Filter Algorithms

Parallelization of RSA Algorithm Based on Compute Unified Device Architecture

Parallelization of SAT Algorithms on GPUs

Parallelization of Shape Diameter Function Computation using OpenCL

Parallelization of Single Threaded Applications using OpenMP and CUDA/C

Parallelization of specialized fluid flow simulator based on lattice Boltzmann method on a multi GPU system

Parallelization of Synthetic Aperture Radar (SAR) Imaging Algorithms on GPU

Parallelization of tau-leap coarse-grained Monte Carlo simulations on GPUs

Parallelization of the Algorithm WHAM with NVIDIA CUDA

Parallelization of the Ant Colony Optimization for the Shortest Path Problem using OpenMP and CUDA

Parallelization of the Cuckoo Search using CUDA Architecture

Parallelization of the distinct lattice spring model

Parallelization of the Generalized Hough Transform on GPU

Parallelization of the Honeybee Search Algorithm for Object Tracking

Parallelization of the Local Threshold and Boolean Function Based Edge Detection Algorithm Using CUDA

Parallelization of the QR Decomposition with Column Pivoting Using Column Cyclic Distribution on Multicore and GPU Processors

Parallelization of the Symmetric Indefinite Factorization

Parallelization of the x264 encoder using OpenCL

Parallelization of Weighted Sequence Comparison by using EBWT

Parallelization Research of Circle Detection Based on Hough Transform

Parallelization Strategies for Ant Colony Optimisation on GPUs

Parallelization Strategies for Local Search Algorithms on Graphics Processing Units

Parallelization Strategies of the Canny Edge Detector for Multi-core CPUs and Many-core GPUs

Parallelization techniques of the x264 video encoder

Parallelization the Job-shop Problem on Distributed and Shared Memory Architectures

Parallelization with Different API on Multicore Architecture

Parallelization, Scalability, and Reproducibility in Next-Generation Sequencing Analysis

Parallelize L-BFGS-B on the GPU

Parallelized agent-based simulation on CPU and graphics hardware for spatial and stochastic models in biology

Parallelized computation for computer simulation of electrocardiograms using personal computers with multi-core CPU and general-purpose GPU

Parallelized generation of photon texture and real-time rendering on GPU

Parallelized Hierarchical Expected Matching Probability for Multiple Sequence Alignment

Parallelized Incomplete Poisson Preconditioner in Cloth Simulation

Parallelized Kendall’s Tau Coefficient Computation via SIMD Vectorized Sorting On Many-Integrated-Core Processors

Parallelized Local Volatility Estimation Using GP-GPU Hardware Acceleration

Parallelized Physical Optics computations for Scattering Center Models in radio channel simulations

Parallelized Seeded Region Growing using CUDA

Parallelized Segmentation of CT-Angiography datasets using CUDA

Parallelized Vlasov-Fokker-Planck solver for desktop personal computers

Parallelizing a high-order WENO scheme for complicated flow structures on GPU and MIC

Parallelizing AES on multicores and GPUs

Parallelizing Alternating Direction Implicit Solver on GPUs

Parallelizing compiler framework and API for power reduction and software productivity of real-time heterogeneous multicores

Parallelizing Exact and Approximate String Matching via Inclusive Scan on a GPU

Parallelizing flow-accumulation calculations on graphics processing units – From iterative DEM preprocessing algorithm to recursive multiple-flow-direction algorithm

Parallelizing FPGA Technology Mapping Using Graphics Processing Units (GPUs)

Parallelizing fuzzy rule generation using GPGPU

Parallelizing General Histogram Application for CUDA Architectures

Parallelizing Kernel Polynomial Method Applying Graphics Processing Units

Parallelizing LINQ Program for GPGPU

Parallelizing Map Projection of Raster Data on Multi-core CPU and GPU Parallel Programming Frameworks

Parallelizing Motion JPEG 2000 with CUDA

Parallelizing Multicore Cache Simulations using Heterogeneous Computing on General Purpose and Graphics Processors

Parallelizing Multiple Flow Accumulation Algorithm using CUDA and OpenACC

Parallelizing of digital signal processing with using GPU

Parallelizing Peptide-Spectrum scoring using modern graphics processing units

Parallelizing Simulated Annealing-Based Placement Using GPGPU

Parallelizing the cellular potts model on GPU and multi-core CPU: An OpenCL cross-platform study

Parallelizing the Cellular Potts Model on graphics processing units

Parallelizing the Edge application for GPU-based systems using the SkePU skeleton programming library

Parallelizing the QUDA Library for Multi-GPU Calculations in Lattice Quantum Chromodynamics

Parallelizing Word2Vec in Multi-Core and Many-Core Architectures

Parallelizing Word2Vec in Shared and Distributed Memory

ParallelKittens: Systematic and Practical Simplification of Multi-GPU AI Kernels

Parameter Selection and Pre-Conditioning for a Graph Form Solver

Parameter Tuning of a Hybrid Treecode-FMM on GPUs

Parameterized Verification of GPU Kernel Programs

Parametric Flows: Automated Behavior Equivalencing for Symbolic Analysis of Races in CUDA Programs

Parametric GPU Code Generation for Affine Loop Programs

Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing

ParEval-Repo: A Benchmark Suite for Evaluating LLMs with Repository-level HPC Translation Tasks

PARIS: A Parallel RSA-Prime Inspection Tool

Parle: parallelizing stochastic gradient descent

ParPaRaw: Massively Parallel Parsing of Delimiter-Separated Raw Data

PARRAY: A Unifying Array Representation for Heterogeneous Parallelism

Parsing in Parallel on Multiple Cores and GPUs

Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Recurrent Neural Network

ParTeCL: parallel testing using OpenCL

Partial Demosaicing for Stereo Matching of CFA Images on GPU and CPU

Partial Parallelization of the Successive Projections Algorithm using Compute Unified Device Architecture

Partial Volume Effect Correction using Anisotropic Backward Diffusion

Partial wave analysis at BES III harnessing the power of GPUs

Partial Wave Analysis using Graphics Cards

PartialRC: A Partial Recomputing Method for Efficient Fault Recovery on GPGPUs

Brief statistics for this page

Titles: 100

Download open PDFs: 89

Package packages: 18

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

Analyzing the Impact of Kernel Fusion on GPU Tensor Operation Performance: A Systematic Performance Study

IntelliKit: Agent-first tooling for AMD hardware

Kerncap: Automated Kernel Extraction and Isolation for AMD GPUs

DITRON: Distributed Compiler based on Triton for Parallel Systems

DITRON: Distributed Multi-level Tiling Compiler for Parallel Tensor Programs
