Papers on hgpu.org (.txt-file)
Analyzing Locality of Memory References in GPU Architectures
Analyzing Memory Accesses for Performance and Correctness of Parallel Programs
Analyzing Optimization Techniques for Power Efficiency on Heterogeneous Platforms
Analyzing Password Strength and Efficient Password Cracking
Analyzing program flow within a many-kernel OpenCL application
Analyzing Resource Utilization in an HPC System: A Case Study of NERSC’s Perlmutter
Analyzing Soft-Error Vulnerability on GPGPU Microarchitecture
Analyzing the CUDA Applications with its Latency and Bandwidth Tolerance
Analyzing throughput of GPGPUs exploiting within-die core-to-core frequency variation
Analyzing Use of OpenCL on the Cell Broadband Engine and a Proposal for OpenCL Extensions
Anatomizing Deep Learning Inference in Web Browsers
Anatomy Of High-Performance Deep Learning Convolutions On SIMD Architectures
Anatomy of High-Performance GEMM with Online Fault Tolerance on GPUs
Anatomy of High-Performance Many-Threaded Matrix Multiplication
Android Malware Classification Using Parallelized Machine Learning Methods
ANGHABENCH: a Suite with One Million Compilable C Benchmarks for Code-Size Reduction
Animating physically based explosions in real-time
Animation of Orthogonal Texture Patterns for Vector Field Visualization
Anisotropic interfacial tension, contact angles, and line tensions: A graphics-processing-unit-based Monte Carlo study of the Ising model
Anisotropic Kuwahara Filtering on the GPU
Anisotropic mesh coarsening and refinement on GPU architecture
Anomalous behaviour detection using spatiotemporal oriented energies, subset inclusion histogram comparison and event-driven processing
Anomalous metastability in a temperature-driven transition
Anomalous Structure and Scaling of Ring Polymer Brushes
Ansor: Generating High-Performance Tensor Programs for Deep Learning
Anti-parallel Patterns in Fine-grain Data-parallel Programs
ANTS2 package: simulation and experimental data processing for Anger camera type detectors
AnyHLS: High-Level Synthesis with Partial Evaluation
AnySeq/GPU: A Novel Approach for Faster Sequence Alignment on GPUs
AnySL: efficient and portable shading for ray tracing
Anytime Algorithms for GPU Architectures
APACE: AlphaFold2 and advanced computing as a service for accelerated discovery in biophysics
APEnet+: a 3D toroidal network enabling Petaflops scale Lattice QCD simulations on commodity clusters
APEnet+: high bandwidth 3D torus direct network for petaflops scale commodity clusters
APHOG: A Framework for Fast Object Detection Using Histograms of Oriented Gradients
API-Compiling for Image Hardware Accelerators
APL on GPUs: A TAIL from the Past, Scribbled in Futhark
APNN-TC: Accelerating Arbitrary Precision Neural Networks on Ampere GPU Tensor Cores
APOGEE: adaptive prefetching on GPUs for energy efficiency
Apple Silicon Performance in Scientific Computing
Applicability of GPU Computing for Efficient Merge in In-Memory Databases
Application level energy measurements and models for hybrid platform with accelerators
Application of Assembly of Finite Element Methods on Graphics Processors for Real-Time Elastodynamics
Application of Deep-Learning to Compiler-Based Graphs
Application of GPGPU for Acceleration of Short DNA Sequence Alignment in Unipro UGENE Project
Application of GPU Computing to Some Urban Traffic Problems
Application of GPU Smooth Particle Hydrodynamics: Wave Runup and Overtopping on Composite Slopes
Application of GPUs for the Calculation of Two Point Correlation Functions in Cosmology
Application of Graphics Processing Units to Search Pipeline for Gravitational Waves from Coalescing Binaries of Compact Objects
Application of performance portability solutions for GPUs and many-core CPUs to track reconstruction kernels
Application of the Characteristic Basis Function Method using CUDA
Application of the Mean Field Methods to MRF Optimization in Computer Vision
Application of the OpenCL API for Implementation of the NIPALS Algorithm for Principal Component Analysis of Large Data Sets
Application Performance Profiling on Intel GPUs with Oneprof and Onetrace
Application Synthesis and Optimization on Heterogeneous Parallel Processing Systems
Application-guided tool development for architecturally diverse computation
Application-independent accurate mouse placements on surfaces of arbitrary geometry
Applications of Deep Neural Networks
Applications of Linux-Based QT-CUDA Parallel Architecture
Applications of Many-Core Technologies to On-line Event Reconstruction in High Energy Physics Experiments
Applications Performance on GPGPUs with the Fermi Architecture
Applying Contact Angle to a Two-Dimensional Smoothed Particle Hydrodynamics (SPH) model on a Graphics Processing Unit (GPU) Platform
Applying Genetic Algorithms to Tune Heterogeneous Platform Configurations
Applying GPU Dynamic Parallelism to High-Performance Normalization of Gene Expressions
Applying graphics processor acceleration in a software defined radio prototyping environment
Applying Object Oriented Design Patterns to CUDA based Pyramidal Image Blending – An Experience
Applying OOC Techniques in the Reduction to Condensed Form for Very Large Symmetric Eigenproblems on GPUs
Applying software-managed caching and CPU/GPU task scheduling for accelerating dynamic workloads
Applying Source Level Auto-Vectorization to Aparapi Java
Applying the “Simple Accelerator Modelling in MATLAB” (SAMM) Code to High Luminosity LHC Upgrade
Applying the Midas Touch of Reproducibility to High-Performance Computing
Applying the Parallel GPU Model to Radiation Therapy Treatment
Approaches for parallelizing reductions on modern GPUs
Approaches for the Parallelization of Software Implementation of Integer Multiplication
Approximate Belief Propagation by Hierarchical Averaging of Outgoing Messages
Approximate Dynamic Programming and Neural Networks on Game Hardware
Approximate dynamic programming with post-decision states as a solution method for dynamic economic models
Approximate Principal Direction Trees
Approximate Similarity Search for Online Multimedia Services on Distributed CPU-GPU Platforms
Approximate Subdivision Surface Evaluation in the Language of Linear Algebra
Approximation of BEM matrices using GPGPUs
Approximation of Loop Subdivision Surfaces for Fast Rendering
Approximative inference for multivariate functional data on massively parallel processors
APPy: Annotated Parallelism for Python on GPUs
APTCC: Auto Parallelizing Translator From C To CUDA
APUNet: Revitalizing GPU as Packet Processing Accelerator
AQsort: Scalable Multi-Array In-Place Sorting with OpenMP
AQUAgpusph, a free 3D SPH solver accelerated with OpenCL
Aquila 2.0: Software Architecture for Cognitive Robotics
Aquila: An Open-Source GPU-Accelerated Toolkit for Cognitive Robotics Research
Arax: a runtime framework for decoupling applications from heterogeneous accelerators
Arbitrarily large iterative tomographic reconstruction on multiple GPUs using the TIGRE toolbox
Arbitrary dimension Reed-Solomon coding and decoding for extended RAID on GPUs
Arbitrary-Precision Arithmetics on the GPU
ArborX: A Performance Portable Search Library
ARC: Adaptive Ray-tracing with CUDA, a New Ray Tracing Code for Parallel GPUs
ArchesWeather: An efficient AI weather forecasting model at 1.5° resolution
Architecting an LTE Base Station with Graphics Processing Units
Architecting graphics processors for non-graphics compute acceleration
Titles: 100
open PDFs: 95
packages: 24