high performance computing on graphics processing units: hgpu.org

Papers on hgpu.org (.txt-file)

Implementation of a multigrid solver on GPU for Stokes equations with strongly variable viscosity based on Matlab and CUDA

Implementation of a Parallel Tree Method on a GPU

Implementation of a PIC simulation using WebGL

Implementation of a Power Efficient Synthetic Aperture Radar Back Projection Algorithm on FPGAs Using OpenCL

Implementation of a Practical Distributed Calculation System with Browsers and JavaScript, and Application to Distributed Deep Learning

Implementation of a programming environment with a multithread model for reconfigurable systems

Implementation of a Soft Morphological Filter Based on GPU Framework

Implementation of Advanced Encryption Standard for encryption and decryption of images and text on a GPU

Implementation of algorithms for relativistic hydrodynamics using graphics processing units in CUDA framework

Implementation of algorithms with a fine-grained parallelism on GPUs

Implementation of Ant Colony Algorithm Based on GPU

Implementation of association rule mining using CUDA

Implementation of Autoencoders with Systolic Arrays through OpenCL

Implementation Of Decoders for LDPC Block Codes and LDPC Convolutional Codes Based on GPUs

Implementation of Diamond Search Algorithm Using Parallel Processing Architecture

Implementation of digital down converter in GPU

Implementation of Fast Artificial Neural Network for Pattern Classification on Heterogeneous System

Implementation of FDTD-Compatible Green’s Function on Heterogeneous CPU-GPU Parallel Processing System

Implementation of Filtering Beamforming Algorithms for Sonar Devices Using GPU

Implementation of float-float operators on graphics hardware

Implementation of Frequency Domain Convolution for the Caffe-Framework

Implementation of high speed hash function Keccak on GPU

Implementation of Jacobi iterative method on graphics processor unit

Implementation of Just In Time Value Specialization for the Optimization of Data Parallel Kernels

Implementation of k-Means Clustering Algorithm in CUDA

Implementation of K-shortest Path Algorithm in GPU Using CUDA

Implementation of Kd-Trees on the GPU to Achieve Real Time Graphics Processing

Implementation of Keccak hash function in Tree hashing mode on Nvidia GPU

Implementation of Kernel Methods on the GPU

Implementation of Kirchhoff prestack depth migration on GPU

Implementation of large-scale FIR adaptive filters on NVIDIA GeForce graphics processing unit

Implementation of LTE Mini receiver on GPUs

Implementation of Massive Artificial Neural Networks with CUDA

Implementation of medical image segmentation in CUDA

Implementation of Motion Estimation Based on Heterogeneous Parallel Computing System with OpenCL

Implementation of Parallel Fast Hartley Transform (FHT) Using Cuda

Implementation of Parallel Genetic Algorithms on Graphics Processing Units

Implementation of Parallel Simplified Swarm Optimization in CUDA

Implementation of PDE models of cardiac dynamics on GPUs using OpenCL

Implementation of QR Updating Algorithms on the GPU

Implementation of random linear network coding on OpenGL-enabled graphics cards

Implementation of Sequential Importance Sampling in GPGPU

Implementation of Smith-Waterman Algorithm in OpenCL for GPUs

Implementation of Smith-Waterman algorithm in OpenCL for GPUs

Implementation of Spectral Angle Mapper (SAM) Algorithm on a Graphic processing unit (GPU)

Implementation of Stereo Matching Using High Level Compiler for Parallel Computing Acceleration

Implementation of stereophonic acoustic echo canceller on nVIDIA GeForce graphics processing unit

Implementation of the "Local Rank Differences" Image Feature Using SIMD Instructions of CPU

Implementation of the FDTD Method Based on Lorentz-Drude Dispersive Model on GPU for Plasmonics Applications

Implementation of the genetic algorithm by means of CUDA technology involved in travelling salesman problem

Implementation of the Lucas-Kanade image registration algorithm on a GPU for 3D computational platform stabilisation

Implementation of the Neuberger-Dirac operator on GPUs

Implementation of the optimization algorithms on GPGPU architecture and multi-cores

Implementation of the r.cuda.los module in the open source GRASS GIS by using parallel computation on the NVIDIA CUDA graphic cards

Implementation of the SYCL Heterogeneous Computing Library

Implementation of the twisted mass fermion operator in the QUDA library

Implementation of usual computerized tomography methods on GPU using the Compute Unified Device Architecture (CUDA)

Implementation of Variable Preconditioned GCR with mixed precision on GPU using CUDA

Implementation of Virtual Embryology using the Thrust library for CUDA

Implementation Techniques for SPMD Kernels on CPUs

Implementations of a Parallel Algorithm for Computing Euclidean Distance Map in Multicore Processors and GPUs

Implementations of hardware acceleration for MD4-family algorithms based on GPU

Implementations of Parallel Computation of Euclidean Distance Map in Multicore Processors and GPUs

Implementations of the FFT algorithm on GPU

Implementations of the Hough Transform on the Embedded Multicore Processors

Implementing a Code Generator for Fast Matrix Multiplication in OpenCL on the GPU

Implementing a Finite Difference-Based Real-time Sound Synthesizer using GPUs

Implementing a GPU Programming Model on a non-GPU Accelerator Architecture

Implementing a GPU-Enhanced Cluster for Large-Scale Simulations

Implementing a Photorealistic Rendering System using GLSL

Implementing a Preconditioned Iterative Linear Solver Using Massively Parallel Graphics Processing Units

Implementing a Sparse Matrix Vector Product for the SELL-C/SELL-C-sigma formats on NVIDIA GPUs

Implementing AES on GPU: Final Report

Implementing an architecture for efficient network traffic processing on modern graphics hardware

Implementing an efficient method of check-pointing on CPU-GPU

Implementing an embedded GPU language by combining translation and generation

Implementing an Interior Point Method for Linear Programs on a CPU-GPU System

Implementing and evaluating an heterogeneous, scalable, tridiagonal linear system solver with OpenCL to target FPGAs, GPUs, and CPUs

Implementing and Evaluating Candidate-Based Invariant Generation

Implementing cartesian genetic programming classifiers on graphics processing units using GPU.NET

Implementing CFD (Computational Fluid Dynamics) in OpenCL for Building Simulation

Implementing Computer Vision Functions with OpenCL on the Qualcomm Adreno 420

Implementing Continuous Integration Software in an Established Computational Chemistry Software Package

Implementing Decision Trees and Forests on a GPU

Implementing Deep Neural Networks for Financial Market Prediction on the Intel Xeon Phi

Implementing density functional theory (DFT) methods on many-core GPGPU accelerators

Implementing Domain-Specific Languages for Heterogeneous Parallel Computing

Implementing Efficient, Portable Computations for Machine Learning

Implementing general matrix-matrix multiplication algorithm on the Intel Xeon Phi Knights Landing Processor

Implementing Genetic Algorithms to CUDA Environment Using Data Parallelization

Implementing implicit OpenMP data sharing on GPUs

Implementing Independent Component Analysis in General-Purpose GPU Architectures

Implementing Interactive 3D Segmentation on CUDA Using Graph-Cuts and Watershed Transformation

Implementing Level-3 BLAS Routines in OpenCL on Different Processing Units

Implementing LNS using filtering units of GPUs

Implementing Machine Learning Algorithms on GPUs for Real-Time Traffic Sign Classification

Implementing mesh-based approaches for deformable objects on GPU

Implementing modular arithmetic using OpenCL

Implementing Molecular Dynamics on Hybrid High Performance Computers – Particle-Particle Particle-Mesh

Implementing molecular dynamics on hybrid high performance computers – short range forces

Brief statistics for this page

Titles: 100

Download open PDFs: 87

Package packages: 12

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

Analyzing the Impact of Kernel Fusion on GPU Tensor Operation Performance: A Systematic Performance Study

IntelliKit: Agent-first tooling for AMD hardware

Kerncap: Automated Kernel Extraction and Isolation for AMD GPUs

DITRON: Distributed Compiler based on Triton for Parallel Systems

DITRON: Distributed Multi-level Tiling Compiler for Parallel Tensor Programs

Recent source codes

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

Probe-and-Refine Tuning of Repository Guidance for AI Coding Agents

CUDAnalyst (CUDA + Analyst)

CodegenBench

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

IntelliKit: Agent-first tooling for AMD hardware

DITRON: Distributed Compiler based on Triton for Parallel Systems
