Papers on hgpu.org (.txt-file)
U-Net: Convolutional Networks for Biomedical Image Segmentation

UAV Path Planning with Parallel Genetic Algorithms on CUDA Architecture

uBench: Performance Impact of CUDA Block Geometry

UberFlow: a GPU-based particle engine

Ubiquitous Parallel Computing from Berkeley, Illinois, and Stanford

UCHPC – UnConventional High Performance Computing for Finite Element Simulations

Ultra-Fast Detection of Higher-Order Epistatic Interactions on GPUs

Ultra-Fast Displaying Spectral Domain Optical Doppler Tomography System Using a Graphics Processing Unit

Ultra-fast FFT protein docking on graphics processors

Ultra-Fast Hybrid CPU-GPU Multiple Scatter Simulation for 3D PET

Ultra-fast treatment plan optimization for volumetric modulated arc therapy (VMAT)

Ultra-low latency recurrent neural network inference on FPGAs for physics applications with hls4ml

Ultrasound goes GPU: real-time simulation using CUDA

Ultrasound Image Simulation with GPU-based Ray Tracing

Uncertainty-Aware Guided Volume Segmentation

Uncluttering Graph Layouts Using Anisotropic Diffusion and Mass Transport

Uncontracted Rys Quadrature Implementation of up to G Functions on Graphical Processing Units

Under the Hood of SYCL – An Initial Performance Analysis With an Unstructured-mesh CFD Application

Understanding and Modeling the Synchronization Cost in the GPU Architecture

Understanding Data Movement in AMD Multi-GPU Systems with Infinity Fabric

Understanding GEMM Performance and Energy on NVIDIA Ada Lovelace: A Machine Learning-Based Analytical Approach

Understanding GPU Programming for Statistical Computation: Studies in Massively Parallel Massive Mixtures

Understanding GPU Triggering APIs for MPI+X Communication

Understanding GPU-Based Lossy Compression for Extreme-Scale Cosmological Simulations

Understanding Latency Hiding on GPUs

Understanding Performance Portability of Bioinformatics Applications in SYCL on an NVIDIA GPU

Understanding Protein Dynamics with L1-Regularized Reversible Hidden Markov Models

Understanding software approaches for GPGPU reliability

Understanding the Costs of Many-Task Computing Workloads on Intel Xeon Phi Coprocessors

Understanding the design trade-offs among current multicore systems for numerical computations

Understanding the efficiency of GPU algorithms for matrix-matrix multiplication

Understanding the efficiency of ray traversal on GPUs

Understanding the impact of CUDA tuning techniques for Fermi

Understanding the Impact of Hybrid Programming on Software Energy Efficiency

Understanding the Impact of Input Entropy on FPU, CPU, and GPU Power

Understanding the ISA impact on GPU Architecture

Understanding the Landscape of Ampere GPU Memory Errors

Understanding the Performance of HPC Applications

Understanding the Power of Evolutionary Computation for GPU Code Optimization

Understanding the SIMD Efficiency of Graph Traversal on GPU

Understanding the Topics and Challenges of GPU Programming by Classifying and Analyzing Stack Overflow Posts

Unfolding and Shrinking Neural Machine Translation Ensembles

UNICORN: A Bulk Synchronous Programming Model, Framework and Runtime for Hybrid CPU-GPU Clusters

Unified – A Sharp Turn in the Latest Era of Graphic Processors

Unified Deep Learning with CPU, GPU, and FPGA Technologies

Unified Development for Mixed Multi-GPU and Multi-Coprocessor Environments using a Lightweight Runtime Environment

Unified Parallel C for GPU Clusters: Language Extensions and Compiler Implementation

Unified Particle Physics for Real-Time Applications

Unified schemes for directive-based GPU offloading

Unified Shader Programming in C++

Unified Shared Memory: Friend or Foe?

Unified system of code transformation and execution for heterogeneous multi-core architectures

Unified Tables for Exponential and Logarithm Families

UniFL: Accelerating Federated Learning Using Heterogeneous Hardware Under a Unified Framework

Uniform partitioning of Monte Carlo radiosity on GPUs

Unifying stream based and reconfigurable computing to design application accelerators

Unleashing the Power of Distributed CPU/GPU Architectures: Massive Astronomical Data Analysis and Visualization case study

Unlocking Bandwidth for GPUs in CC-NUMA Systems

Unsafe Floating-point to Unsigned Integer Casting Check for GPU Programs

Unstructured grid applications on GPU: performance analysis and improvement

Unsupervised Asset Cluster Analysis Implemented with Parallel Genetic Algorithms on the NVIDIA CUDA Platform

Unsupervised Deep Learning of Incompressible Fluid Dynamics

Unsupervised Markovian Segmentation on Graphics Hardware

Up to 700k GPU cores, Kepler, and the Exascale future for simulations of star clusters around black holes

UPC on MIC: Early Experiences with Native and Symmetric Modes

Urban Regional Seismic Damage Prediction Based On GPU-CPU Hybrid Computing

Usable assembly language for GPUs: a success story

Use NVIDIA CUDA technology to create genetic algorithms with extensive population

Use of Checkpoint-Restart for Complex HEP Software on Traditional Architectures and Intel MIC

Use of CUDA for the Continuous Space Language Model

Use of CUDA Parallel Computing Technology in Modeling of Solid Mineral Deposits

Use of FPGA or GPU-based architectures for remotely sensed hyperspectral image processing

Use of modern GPUs in Design Optimization

Use of Multi-GPU Systems for Larger Than Device FFTs: With Applications in Ultrasound Simulations

Use of Multiple GPUs on Shared Memory Multiprocessors for Ultrasound Propagation Simulations

Use of Multiple GPUs to Speedup the Execution of a Three-Dimensional Computational Model of the Innate Immune System

User-Driven Online Kernel Fusion for SYCL

User’s needs influencing HPC technologies

Uses of GPU Powered Interval Optimization for Parameter Identification in the Context of SO Fuel Cells

Using a GPU to accelerate die and mold fabrication

Using a GPU-CPU architecture to speed up a GA-based real-time system for trading the stock market

Using a GPU, Online Diarization – Offline Diarization

Using AI libraries for Incompressible Computational Fluid Dynamics

Using an OpenCL Framework to Evaluate Interconnect Implementations on FPGAs

Using Artificial Intelligence in Computational Games

Using Butterfly-Patterned Partial Sums to Optimize GPU Memory Accesses for Drawing from Discrete Distributions

Using Commodity Coprocessors for Host Intrusion Detection

Using Commodity Graphics Hardware for Real-Time Digital Hologram View-Reconstruction

Using common graphics hardware for multi-agent traffic simulation with CUDA

Using Compiler Directives for Performance Portability in Scientific Computing: Kernels from Molecular Simulation

Using Compiler Snippets to Exploit Parallelism on Heterogeneous Hardware: A Java Reduction Case Study

Using Compute Unified Device Architecture (CUDA) in Parallelizing Different Digital Image Processing Techniques

Using CUDA architecture for computer simulations of thermomechanical phenomena

Using CUDA Architecture for the Computer Simulation of the Casting Solidification Process

Using CUDA for Exhaustive Password Recovery

Using CUDA GPU to Accelerate the Ant Colony Optimization Algorithm

Using Data Compression for Increasing Efficiency of Data Transfer Between Main Memory and Intel Xeon Phi Coprocessor or NVidia GPU in Parallel DBMS

Using Deep Convolutional Neural Networks in Monte Carlo Tree Search

Titles: 100
open PDFs: 96
packages: 17
