Papers on hgpu.org (.txt-file)

Analysis of High Level implementations for Recursive Methods on GPUs

Analysis of illumination conditions at the lunar south pole using parallel computing techniques

Analysis of KECCAK Tree Hashing on GPU Architectures

Analysis of Metallic Nanostructures by a Discontinuous Galerkin Time-Domain Maxwell Solver on Graphics Processing Units

Analysis of Multicore CPU and GPU Toward Parallelization of Total Focusing Method Ultrasound Reconstruction

Analysis of Parallel Montgomery Multiplication in CUDA

Analysis of Parallel Sorting Algorithms on Heterogeneous Processors with OpenCL

Analysis of periodic anisotropic media by means of split-field FDTD method and GPU computing

Analysis of periodic structures with GPU accelerating

Analysis of Real-Time Stereo Vision Algorithms On GPU

Analysis of RSA algorithm using GPU programming

Analysis of Single Phase Fluid Flow and Heat Transfer in Slip Flow Regime by Parallel Implementation of Lattice Boltzmann Method on GPUs

Analysis of SuperLU Solvers on Intel MIC Architecture

Analysis of Surface Folding Patterns of DICCCOLS Using the GPU-Optimized Geodesic Field Estimate

Analysis of the Performance of the Fish School Search Algorithm Running in Graphic Processing Units

Analysis-Driven Design of Parallel Floating-Point Matrix Multiplication for Implementation in Reconfigurable Logic

Analysis-driven Engineering of Comparison-based Sorting Algorithms on GPUs

Analytic Anti-Aliasing of Linear Functions on Polytopes

Analytic Antialiasing for Selective High Fidelity Rendering

Analytic Visibility on the GPU

Analytical motion blur rasterization with compression

Analytical Performance Estimation during Code Generation on Modern GPUs

Analytical Study of Various High Performance Computing Paradigms

Analyzing and Improving the Performance of Spatial Database Processing

Analyzing CUDA workloads using a detailed GPU simulator

Analyzing CUDA’s Compiler through the Visualization of Decoded GPU Binaries

Analyzing GPU Performance in Virtualized Environments: A Case Study

Analyzing GPU Tensor Core Potential for Fast Reductions

Analyzing Locality of Memory References in GPU Architectures

Analyzing Memory Accesses for Performance and Correctness of Parallel Programs

Analyzing Modern NVIDIA GPU cores

Analyzing Optimization Techniques for Power Efficiency on Heterogeneous Platforms

Analyzing Password Strength and Efficient Password Cracking

Analyzing program flow within a many-kernel OpenCL application

Analyzing Resource Utilization in an HPC System: A Case Study of NERSC’s Perlmutter

Analyzing Soft-Error Vulnerability on GPGPU Microarchitecture

Analyzing the CUDA Applications with its Latency and Bandwidth Tolerance

Analyzing the Performance Portability of SYCL across CPUs, GPUs, and Hybrid Systems with Protein Database Search

Analyzing throughput of GPGPUs exploiting within-die core-to-core frequency variation

Analyzing Use of OpenCL on the Cell Broadband Engine and a Proposal for OpenCL Extensions

Anatomizing Deep Learning Inference in Web Browsers

Anatomy Of High-Performance Deep Learning Convolutions On SIMD Architectures

Anatomy of High-Performance GEMM with Online Fault Tolerance on GPUs

Anatomy of High-Performance Many-Threaded Matrix Multiplication

Android Malware Classification Using Parallelized Machine Learning Methods

ANGHABENCH: a Suite with One Million Compilable C Benchmarks for Code-Size Reduction

Animating physically based explosions in real-time

Animation of Orthogonal Texture Patterns for Vector Field Visualization

Anisotropic interfacial tension, contact angles, and line tensions: A graphics-processing-unit-based Monte Carlo study of the Ising model

Anisotropic Kuwahara Filtering on the GPU

Anisotropic mesh coarsening and refinement on GPU architecture

AnnotationGym: A Generic Framework for Automatic Source Code Annotation

Anomalous behaviour detection using spatiotemporal oriented energies, subset inclusion histogram comparison and event-driven processing

Anomalous metastability in a temperature-driven transition

Anomalous Structure and Scaling of Ring Polymer Brushes

Anonymized Network Sensing using C++26 std::execution on GPUs

Ansor: Generating High-Performance Tensor Programs for Deep Learning

Anti-parallel Patterns in Fine-grain Data-parallel Programs

ANTS2 package: simulation and experimental data processing for Anger camera type detectors

AnyHLS: High-Level Synthesis with Partial Evaluation

AnySeq/GPU: A Novel Approach for Faster Sequence Alignment on GPUs

AnySL: efficient and portable shading for ray tracing

Anytime Algorithms for GPU Architectures

APACE: AlphaFold2 and advanced computing as a service for accelerated discovery in biophysics

APEnet+: a 3D toroidal network enabling Petaflops scale Lattice QCD simulations on commodity clusters

APEnet+: high bandwidth 3D torus direct network for petaflops scale commodity clusters

APHOG: A Framework for Fast Object Detection Using Histograms of Oriented Gradients

API-Compiling for Image Hardware Accelerators

APL on GPUs: A TAIL from the Past, Scribbled in Futhark

APNN-TC: Accelerating Arbitrary Precision Neural Networks on Ampere GPU Tensor Cores

APOGEE: adaptive prefetching on GPUs for energy efficiency

Apple Silicon Performance in Scientific Computing

Applicability of GPU Computing for Efficient Merge in In-Memory Databases

Application level energy measurements and models for hybrid platform with accelerators

Application of Assembly of Finite Element Methods on Graphics Processors for Real-Time Elastodynamics

Application of Deep-Learning to Compiler-Based Graphs

Application of GPGPU for Acceleration of Short DNA Sequence Alignment in Unipro UGENE Project

Application of GPU Computing to Some Urban Traffic Problems

Application of GPU Smooth Particle Hydrodynamics: Wave Runup and Overtopping on Composite Slopes

Application of GPUs for the Calculation of Two Point Correlation Functions in Cosmology

Application of Graphics Processing Units to Search Pipeline for Gravitational Waves from Coalescing Binaries of Compact Objects

Application of performance portability solutions for GPUs and many-core CPUs to track reconstruction kernels

Application of the Characteristic Basis Function Method using CUDA

Application of the Mean Field Methods to MRF Optimization in Computer Vision

Application of the OpenCL API for Implementation of the NIPALS Algorithm for Principal Component Analysis of Large Data Sets

Application Performance Profiling on Intel GPUs with Oneprof and Onetrace

Application Synthesis and Optimization on Heterogeneous Parallel Processing Systems

Application-guided tool development for architecturally diverse computation

Application-independent accurate mouse placements on surfaces of arbitrary geometry

Applications of Deep Neural Networks

Applications of Linux-Based QT-CUDA Parallel Architecture

Applications of Many-Core Technologies to On-line Event Reconstruction in High Energy Physics Experiments

Applications Performance on GPGPUs with the Fermi Architecture

Applying Contact Angle to a Two-Dimensional Smoothed Particle Hydrodynamics (SPH) model on a Graphics Processing Unit (GPU) Platform

Applying Genetic Algorithms to Tune Heterogeneous Platform Configurations

Applying GPU Dynamic Parallelism to High-Performance Normalization of Gene Expressions

Applying graphics processor acceleration in a software defined radio prototyping environment

Applying Object Oriented Design Patterns to CUDA based Pyramidal Image Blending – An Experience

Applying OOC Techniques in the Reduction to Condensed Form for Very Large Symmetric Eigenproblems on GPUs

Titles: 100
open PDFs: 96
packages: 21
