high performance computing on graphics processing units: hgpu.org

Papers on hgpu.org (.txt-file)

Exploiting parallel features of modern computer architectures in bioinformatics: applications to genetics, structure comparison and large graph analysis

Exploiting Parallel Processing Power of GPU for High Speed Frequent Pattern Mining

Exploiting Parallelism in GPUs

Exploiting Parallelism in Iterative Irregular Maxflow Computations on GPU Accelerators

Exploiting Segmentation for Robust 3D Object Matching

Exploiting SIMD extensions for linear image processing with OpenCL

Exploiting Space and Time Coherence in Grid-based Sorting

Exploiting SPMD Horizontal Locality

Exploiting SPMD Horizontal Locality to Improve Memory Efficiency

Exploiting Task Parallelism with OpenCL: A Case Study

Exploiting Task-Parallelism on GPU Clusters via OmpSs and rCUDA Virtualization

Exploiting the Parallelism of Heterogeneous Systems using Dataflow Graphs on Top of OpenCL

Exploiting the Power of GPUs for Asymmetric Cryptography

Exploiting two-level parallelism by aggregating computing resources in task-based applications over accelerator-based machines

Exploiting Unexploited Computing Resources for Computational Logics

Exploiting Uniform Vector Instructions for GPGPU Performance, Energy Efficiency, and Opportunistic Reliability Enhancement

Exploration of Cryptocurrency Mining-Specific GPUs in AI Applications: A Case Study of CMP 170HX

Exploration of cyber-physical systems for GPGPU computer vision-based detection of biological viruses

Exploration of Low Numeric Precision Deep Learning Inference Using Intel FPGAs

Exploration of Multifrontal Method with GPU in Power Flow Computation

Exploration of Optimization Options for Increasing Performance of a GPU Implementation of a Three-Dimensional Bilateral Filter

Exploration of Parallelization Frameworks for Computational Finance

Explorations of the Viability of ARM and Xeon Phi for Physics Processing

Exploratory Data Analysis of Software Repositories via GPU Processing

Exploratory research on embedding CUDA code into heterogeneous MP-SOC architectures programmed with the Daedalus framework

Exploring 2D tensor fields using stress nets

Exploring Applications in CUDA

Exploring complex quantum systems with a hybrid CPU-GPU computing platform

Exploring computational capabilities of GPUs using H.264 prediction algorithms

Exploring Computer Vision and Image Processing Algorithms in Teaching Parallel Programming

Exploring CPU-GPU Coherence

Exploring data flow design and vectorization with oneAPI for streaming applications on CPU+GPU

Exploring Design Space of 3D NVM and eDRAM Caches Using DESTINY Tool (open-source code)

Exploring Different Automata Representations for Efficient Regular Expression Matching on GPUs

Exploring Fine-Grained Task-based Execution on Multi-GPU Systems

Exploring FPGA Optimizations to Compute Sparse Numerical Linear Algebra Kernels

Exploring FPGA-specific Optimizations for Irregular OpenCL Applications

Exploring GPGPU Acceleration of Process-Oriented Simulations

Exploring GPGPU workloads: Characterization methodology, analysis and microarchitecture evaluation implications

Exploring GPGPUs Workload Characteristics and Power Consumption

Exploring GPU Memory Performance Using Digital Image Processing Algorithms

Exploring GPU-to-GPU Communication: Insights into Supercomputer Interconnects

Exploring Graphics Processing Unit (GPU) Resource Sharing Efficiency for High Performance Computing

Exploring graphics processing units as parallel coprocessors for online aggregation

Exploring graphics processor performance for general purpose applications

Exploring Heterogeneous Scheduling using the Task-Centric Programming Model

Exploring High Performance SQL Databases with Graphics Processing Units

Exploring LLVM Infrastructure for Simplified Multi-GPU Programming

Exploring Many-Core Design Templates for FPGAs and ASICs

Exploring Microcontrollers in GPUs

Exploring Multi-level Parallelism for Large-Scale Spiking Neural Networks

Exploring Multiple Dimensions of Parallelism in Junction Tree Message Passing

Exploring Multiple Levels of Performance Modeling for Heterogeneous Systems

Exploring new architectures in accelerating CFD for Air Force applications

Exploring Novel Parallelization Technologies for 3-D Imaging Applications

Exploring Optimisations for the Local Assembly phase of Finite Element Methods on GPUs

Exploring Parallel Algorithms for Volumetric Mass-Spring-Damper Models in CUDA

Exploring Portability and Performance of OpenCL FPGA Kernels on Intel HARPv2

Exploring power efficiency and optimizations targeting heterogeneous applications

Exploring Programming Multi-GPUs using OpenMP & OpenACC-based Hybrid Model

Exploring reconfigurable architectures for explicit finite difference option pricing models

Exploring Reconfigurable Architectures for Tree-Based Option Pricing Models

Exploring Scalability in C++ Parallel STL Implementations

Exploring scalability of FIR filter realizations on Graphics Processing Units

Exploring SIMD for Molecular Dynamics, Using Intel Xeon Processors and Intel Xeon Phi Coprocessors

Exploring SYCL as a Portability Layer for High-Performance Computing on CPUs

Exploring SYCL for batched kernels with memory allocations

Exploring Task Parallelism for Heterogeneous Systems Using Multicore Task Management API

Exploring the acceleration of Nekbone on reconfigurable architectures

Exploring the Feasibility of Fully Homomorphic Encryption

Exploring The Latency and Bandwidth Tolerance of CUDA Applications

Exploring the Limits of Generic Code Execution on GPUs via Direct (OpenMP) Offload

Exploring the Limits of GPUs With Parallel Graph Algorithms

Exploring the Millennium Run – Scalable Rendering of Large-Scale Cosmological Datasets

Exploring the multiple-GPU design space

Exploring the Multitude of Real-Time Multi-GPU Configurations

Exploring the Optimization Space of Multi-Core Architectures with OpenCL Benchmarks

Exploring the power of GPU’s for training Deep Belief Networks

Brief statistics for this page

Titles: 100

Download open PDFs: 93

Package packages: 12

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

Analyzing the Impact of Kernel Fusion on GPU Tensor Operation Performance: A Systematic Performance Study

IntelliKit: Agent-first tooling for AMD hardware

Kerncap: Automated Kernel Extraction and Isolation for AMD GPUs

DITRON: Distributed Compiler based on Triton for Parallel Systems

DITRON: Distributed Multi-level Tiling Compiler for Parallel Tensor Programs

* * *

Recent source codes

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

Probe-and-Refine Tuning of Repository Guidance for AI Coding Agents

CUDAnalyst (CUDA + Analyst)

CodegenBench

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

IntelliKit: Agent-first tooling for AMD hardware

DITRON: Distributed Compiler based on Triton for Parallel Systems
