high performance computing on graphics processing units: hgpu.org

Papers on hgpu.org (.txt-file)

Improving processing time for visual measurements of displacements of IPMC actuators using CUDA

Improving programmability of heterogeneous many-core systems via explicit platform descriptions

Improving Resource Efficiency in Virtualized Datacenters

Improving Resource Utilization in Heterogeneous CPU-GPU Systems

Improving Scheduling Techniques in Heterogeneous Systems with Dynamic, On-Line Optimisations

Improving SIMD efficiency for parallel Monte Carlo light transport on the GPU

Improving SIMT Efficiency of Global Rendering Algorithms with Architectural Support for Dynamic Micro-Kernels

Improving SMT performance: an application of genetic algorithms to configure resizable caches

Improving Student Learning in Computer Science Courses by Using Virtual OpenCL Laboratory

Improving Synchronization and Data Access in Parallel Programming Models

Improving tasks throughput on accelerators using OpenCL command concurrency

Improving the Efficiency of GPU Clusters

Improving the Efficiency of OpenCL Kernels through Pipes

Improving the GPU space of computation under triangular domain problems

Improving the Mapping of Smith-Waterman Sequence Database Searches onto CUDA-Enabled GPUs

Improving the Neural GPU Architecture for Algorithm Learning

Improving the Performance of a Ray Tracing Algorithm Using a GPU

Improving the Performance of CA-GMRES on Multicores with Multiple GPUs

Improving the Performance of Fully Connected Neural Networks by Out-of-Place Matrix Transpose

Improving the Performance of Hyperspectral Image and Signal Processing Algorithms Using Parallel, Distributed and Specialized Hardware-Based Systems

Improving the Performance of OpenCL-based FPGA Accelerator for Convolutional Neural Network

Improving the performance of PIR Protocol in Outsourced Databases

Improving the performance of spatial raster analysis in GIS using GPU

Improving the Performance of the Contextual Spaces Re-Ranking Algorithm on Heterogeneous Systems

Improving the Performance of the Linear Systems Solvers Using CUDA

Improving the Performance of the Sparse Matrix Vector Product with GPUs

Improving the Performance, Portability, and Productivity of Hardware Accelerators

Improving the Programmability of GPU Architectures

Improving the scalability of modern applications by parallel multi-core and many-core programming

Improving the speed of neural networks on CPUs

Improving the Speed of Virtual Rear Projection: A GPU-Centric Architecture

Improving the usability of hierarchical representations for interactively labeling large image data sets

In Search of Self-Organization

In Situ Power Analysis of General Purpose Graphical Processing Units

In vivo interactive visualization of four-dimensional blood flow patterns

In-Datacenter Performance Analysis of a Tensor Processing Unit

In-Memory Data Analytics on Coupled CPU-GPU Architectures

In-memory database acceleration on FPGAs: a survey

In-memory grid files on graphics processors

In-Place Recursive Approach for All-Pairs Shortest Paths Problem Using OpenCL

In-process optical characterization method for sub-100-nm nanostructures

In-Situ Statistical Analysis of Autotune Simulation Data using Graphical Processing Units

In-Situ Techniques on GPU-Accelerated Data-Intensive Applications

Incoherent Ray tracing on GPU

Incomplete-LU and Cholesky Preconditioned Iterative Methods Using CUSPARSE and CUBLAS

Increased reliability on Intel GPUs via software diverse redundancy

Increasing Deep Neural Network Acoustic Model Size for Large Vocabulary Continuous Speech Recognition

Increasing GPU Throughput using Kernel Interleaved Thread Block Scheduling

Increasing Memory Miss Tolerance for SIMD Cores

Increasing precision of uniform pseudorandom number generators

Increasing predictability of GPU’s

Increasing programmability of an embedded domain specific language for GPGPU kernels using static analysis

Increasing Realism and Supporting Content Planning for Dynamic Scenes in a Mixed Reality System incorporating a Time-of-Flight Camera

Increasing the Accuracy of the Space-Sweeping Approach to Stereo Reconstruction, using Spherical Backprojection Surfaces

Increasing the performance of AllToAll variant of self-organizing migration algorithm using CUDA

Incremental Bounded Model Checking of Artificial Neural Networks in CUDA

Incremental Raycasting of Piecewise Quadratic Surfaces on the GPU

Indexing million of packets per second using GPUs

Indexing of Spatiotemporal Trajectories for Efficient Distance Threshold Similarity Searches on the GPU

Indigo: A Domain-Specific Language for Fast, Portable Image Reconstruction

Industrial Robot Collision Handling in Harsh Environments

Inertial Coupling Method for particles in an incompressible fluctuating fluid

Inertial-aided KLT feature tracking for a moving camera

Inexpensive Immersive Projection

iNFAnt: NFA pattern matching on GPGPU devices

Inferring the Scheduling Policies of an Embedded CUDA GPU

Infiniband-Verbs on GPU: A case study of controlling an Infiniband network device from the GPU

InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU

Influence of InfiniBand FDR on the Performance of Remote GPU Virtualization

Information Visualization of Multi-dimensional Cellular Automata using GPU Programming

Initial condition for efficient mapping of level set algorithms on many-core architectures

Initial Experiences Porting a Bioinformatics Application to a Graphics Processor

Initial Explorations of ARM Processors for Scientific Computing

Inline Vector Compression for Computational Physics

Innovative prospective of Antenna-Gain removing the pain of EMI engineers

Input Sensitivity of GPU Program Optimizations

Input Space Splitting for OpenCL

Input-Aware Auto-Tuning for Directive-based GPU Programming

Input-Aware Auto-Tuning of Compute-Bound HPC Kernels

Inside VOLT: Designing an Open-Source GPU Compiler

Inside VOLT: Designing an Open-Source GPU Compiler (Tool)

INSPIRE: an interactive image assisted non-photorealistic rendering system

INSTA-YOLO: Real-Time Instance Segmentation

Instructions’ Latencies Characterization for NVIDIA GPGPUs

Instruments of Productivity for High Performance Computing

INT v.s. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats

Integer sorting on multicores: some (experiments and) observations

Integrated Arrival and Departure Schedule Optimization Under Uncertainty

Integrated Framework for Heterogeneous Embedded Platforms Using OpenCL

Integrated GPUs: how useful are they in HPC?

Integrated Modelling of Hydrodynamic Processes, Faecal Indicator Organisms and Related Parameters with Improved Accuracy using Parallel (GPU) Computing

Integrating a large-scale testing campaign in the CK framework

Integrating Accelerators in Heterogeneous Systems

Integrating GPGPU computations with CPU coroutines in C++

Integrating GPUs as fast co-processors into the existing parallel FE package FEAST

Integrating Multi-GPU Execution in an OpenACC Compiler

Integrating multi-threading and accelerators into DUNE-ISTL

Integrating Object Detection with 3D Tracking Towards a Better Driver Assistance System

Integrating Occlusion Culling with Parallel LOD for Rendering Complex 3D Environments on GPU

Integrating Post-Newtonian Equations on Graphics Processing Units

Brief statistics for this page

Titles: 100

Download open PDFs: 92

Packages: 15

