Papers on hgpu.org (.txt-file)
High-Speed Implementations of Block Cipher ARIA Using Graphics Processing Units
High-Speed Object Detection: Design, Study and Implementation of a Detection Framework using Channel Features and Boosting

High-Speed Private Information Retrieval Computation on GPU

High-Speed Stream-Centric Dense Stereo and View Synthesis on Graphics Hardware
High-Speed Turbo Equalization for GPP-based Software Defined Radios

High-speed volume ray casting with CUDA

High-Throughput All-Atom Molecular Dynamics Simulations Using Distributed Computing
High-throughput Analysis of Large Microscopy Image Datasets on CPU-GPU Cluster Platforms

High-throughput Bayesian computing machine with reconfigurable hardware

High-throughput Bayesian network learning using heterogeneous multicore computers

High-throughput Execution of Hierarchical Analysis Pipelines on Hybrid Cluster Platforms

High-Throughput parallel blind Virtual Screening using BINDSURF

High-Throughput Parallel Viterbi Decoder on GPU Tensor Cores

High-throughput protein crystallization on the World Community Grid and the GPU

High-throughput sequence alignment using Graphics Processing Units

High-Throughput Sequence Translation Using CUDA
High-throughput stream categorization and intrusion detection on GPU
High-Throughput Transaction Executions on Graphics Processors

Higher order FEM numerical integration on GPUs with OpenCL

Higher-order CFD and Interface Tracking Methods on Highly-Parallel MPI and GPU systems
Highly accelerated feature detection in proteomics data sets using modern graphics processing units

Highly accelerated simulations of glassy dynamics using GPUs: caveats on limited floating-point precision

Highly Efficient 8-bit Low Precision Inference of Convolutional Neural Networks with IntelCaffe

Highly Efficient Forward and Backward Propagation of Convolutional Neural Networks for Pixelwise Classification

Highly Efficient Lattice-Boltzmann Multiphase Simulations of Immiscible Fluids at High-Density Ratios on CPUs and GPUs through Code Generation

Highly efficient mapping of the Smith-Waterman algorithm on CUDA-compatible GPUs
Highly interactive computational steering for coupled 3D flow problems utilizing multiple GPUs
Highly Optimized Full GPU-Acceleration of Non-hydrostatic Weather Model SCALE-LES

Highly optimized simulations on single- and multi-GPU systems of 3D Ising spin glass

Highly parallel decoding of space-time codes on graphics processing units

Highly Parallel Rate-Distortion Optimized Intra-Mode Decision on Multicore Graphics Processors

Highly Scalable Multi Objective Test Suite Minimisation Using Graphics Cards

Highly Scalable Multiplication for Distributed Sparse Multivariate Polynomials on Many-core Systems

Hinomiyagura Infrastructure Competition TDP: Platform of rescue simulation using GPGPU

HIPAcc: A Domain-Specific Language and Compiler for Image Processing

HipBone: A performance-portable GPU-accelerated C++ version of the NekBone benchmark

HipKittens: Fast and Furious AMD Kernels

HIPRT: A Ray Tracing Framework in HIP

HiRace: Accurate and Fast Source-Level Race Checking of GPU Programs

HISQ inverter on Intel Xeon Phi and NVIDIA GPUs

Histogram Computations on GPUs Kernel using Global and Shared Memory Atomics

Historic Learning Approach for Auto-tuning OpenACC Accelerated Scientific Applications

Historygrams: Enabling Interactive Global Illumination in Direct Volume Rendering using Photon Mapping

HLS Portability from Intel to Xilinx: A Case Study

hls4ml: An Open-Source Codesign Workflow to Empower Scientific Low-Power Machine Learning Devices

HLSDataset: Open-Source Dataset for ML-Assisted FPGA Design using High Level Synthesis

hlslib: Software Engineering for Hardware Design

HOCL: A Family of Embedded Languages

Home-made Diffusion Model from Scratch to Hatch

Homomorphic-Encrypted Volume Rendering

Homunculus Warping: Conveying importance using self-intersection-free non-homogeneous mesh deformation

HONEI: A collection of libraries for numerical computations targeting multiple processor architectures

HORIZON: Accelerated General Relativistic Magnetohydrodynamics

Hotspot Analysis Based Partial CUDA Acceleration of HMMER 3.0 on GPGPUs

How a Single Chip Causes Massive Power Bills. GPUSimPow: A GPGPU Power Simulator

How GPUs Can Improve the Quality of Magnetic Resonance Imaging

How much can we gain from Tensor Kernel Fusion on GPUs?

How to Benefit from AMD, Intel and Nvidia Accelerator Technologies in Scilab

How to Correctly Deal With Pseudorandom Numbers in Manycore Environments – Application to GPU programming with Shoverand

How to distribute most efficiently a computation intensive calculation on an Android device to external compute units with an Android API

How to obtain efficient GPU kernels: an illustration using FMM & FGT algorithms

How to Render FDTD Computations More Effective Using a Graphics Accelerator
How to scale distributed deep learning?

How to Train BERT with an Academic Budget

How well do STARLAB and NBODY compare? II: Hardware and accuracy

HPAC-Offload: Accelerating HPC Applications with Portable Approximate Computing on the GPU

HPC acceleration of large (min, +) matrix products to compute domination-type parameters in graphs

HPC on the Intel Xeon Phi: Homomorphic Word Searching

HPC-Coder-V2: Studying Code LLMs Across Low-Resource Parallel Languages

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

HPerf: A Lightweight Profiler for Task Distribution on CPU+GPU Platforms

HPP-Controller: An intra-node controller designed for connecting heterogeneous CPUs
HPVM: A Portable Virtual Instruction Set for Heterogeneous Parallel Systems

HPVM: Heterogeneous Parallel Virtual Machine

HPX – The C++ Standard Library for Parallelism and Concurrency

HSApriori: High Speed Association Rule Mining using Apriori Based Algorithm for GPU

HSPA+/LTE-A Turbo Decoder on GPU and Multicore CPU

HSTREAM: A directive-based language extension for heterogeneous stream computing

HTML5 WebSocket protocol and its application to distributed computing

HUGO: Hierarchical mUlti-reference Genome cOmpression for aligned reads

Human Re-identification System On Highly Parallel GPU and CPU Architectures

Humanoid navigation planning using future perceptive capability

Hybrid Acceleration of a Molecular Dynamics Simulation Using Short-Ranged Potentials

Hybrid algorithms for efficient Cholesky decomposition and matrix inverse using multicore CPUs with GPU accelerators

Hybrid Algorithms for List Ranking and Graph Connected Components

Hybrid coherence for scalable multicore architectures

Hybrid computational voxelization using the graphics pipeline

Hybrid Core Acceleration of UWB SIRE Radar Signal Processing
Hybrid CPU and GPGPU Volunteer Computing Framework over the Extensible Messaging and Presence Protocol for Parallel Branch and Bound Optimization of Truss Structures

Hybrid CPU-GPU Distributed Framework for Large Scale Mobile Networks Simulation

Hybrid CPU-GPU execution support in the skeleton programming framework SkePU

Hybrid CPU-GPU Framework for Network Motifs

Hybrid CPU-GPU generation of the Hamiltonian and Overlap matrices in FLAPW methods

Hybrid CPU-GPU Implementation of Tracking-Learning-Detection Algorithm

Hybrid CPU-GPU Pipeline Framework

Hybrid CPU/GPU KD-Tree Construction for Versatile Ray Tracing

Titles: 100
open PDFs: 88
packages: 28
