Papers on hgpu.org (.txt-file)
U-Net: Convolutional Networks for Biomedical Image Segmentation
UAV Path Planning with Parallel Genetic Algorithms on CUDA Architecture
uBench: Performance Impact of CUDA Block Geometry
UberFlow: a GPU-based particle engine
Ubiquitous Parallel Computing from Berkeley, Illinois, and Stanford
UCHPC – UnConventional High Performance Computing for Finite Element Simulations
Ultra-Fast Detection of Higher-Order Epistatic Interactions on GPUs
Ultra-Fast Displaying Spectral Domain Optical Doppler Tomography System Using a Graphics Processing Unit
Ultra-fast FFT protein docking on graphics processors
Ultra-Fast Hybrid CPU-GPU Multiple Scatter Simulation for 3D PET
Ultra-fast treatment plan optimization for volumetric modulated arc therapy (VMAT)
Ultra-low latency recurrent neural network inference on FPGAs for physics applications with hls4ml
Ultrasound goes GPU: real-time simulation using CUDA
Ultrasound Image Simulation with GPU-based Ray Tracing
Uncertainty-Aware Guided Volume Segmentation
Uncluttering Graph Layouts Using Anisotropic Diffusion and Mass Transport
Uncontracted Rys Quadrature Implementation of up to G Functions on Graphical Processing Units
Under the Hood of SYCL – An Initial Performance Analysis With an Unstructured-mesh CFD Application
Understanding and Modeling the Synchronization Cost in the GPU Architecture
Understanding GPU Programming for Statistical Computation: Studies in Massively Parallel Massive Mixtures
Understanding GPU Triggering APIs for MPI+X Communication
Understanding GPU-Based Lossy Compression for Extreme-Scale Cosmological Simulations
Understanding Latency Hiding on GPUs
Understanding Performance Portability of Bioinformatics Applications in SYCL on an NVIDIA GPU
Understanding Protein Dynamics with L1-Regularized Reversible Hidden Markov Models
Understanding software approaches for GPGPU reliability
Understanding the Costs of Many-Task Computing Workloads on Intel Xeon Phi Coprocessors
Understanding the design trade-offs among current multicore systems for numerical computations
Understanding the efficiency of GPU algorithms for matrix-matrix multiplication
Understanding the efficiency of ray traversal on GPUs
Understanding the impact of CUDA tuning techniques for Fermi
Understanding the Impact of Hybrid Programming on Software Energy Efficiency
Understanding the Impact of Input Entropy on FPU, CPU, and GPU Power
Understanding the ISA impact on GPU Architecture
Understanding the Performance of HPC Applications
Understanding the Power of Evolutionary Computation for GPU Code Optimization
Understanding the SIMD Efficiency of Graph Traversal on GPU
Understanding the Topics and Challenges of GPU Programming by Classifying and Analyzing Stack Overflow Posts
Unfolding and Shrinking Neural Machine Translation Ensembles
UNICORN: A Bulk Synchronous Programming Model, Framework and Runtime for Hybrid CPU-GPU Clusters
Unified – A Sharp Turn in the Latest Era of Graphic Processors
Unified Deep Learning with CPU, GPU, and FPGA Technologies
Unified Development for Mixed Multi-GPU and Multi-Coprocessor Environments using a Lightweight Runtime Environment
Unified Parallel C for GPU Clusters: Language Extensions and Compiler Implementation
Unified Particle Physics for Real-Time Applications
Unified Shader Programming in C++
Unified Shared Memory: Friend or Foe?
Unified system of code transformation and execution for heterogeneous multi-core architectures
Unified Tables for Exponential and Logarithm Families
UniFL: Accelerating Federated Learning Using Heterogeneous Hardware Under a Unified Framework
Uniform partitioning of Monte Carlo radiosity on GPUs
Unifying stream based and reconfigurable computing to design application accelerators
Unleashing the Power of Distributed CPU/GPU Architectures: Massive Astronomical Data Analysis and Visualization case study
Unlocking Bandwidth for GPUs in CC-NUMA Systems
Unsafe Floating-point to Unsigned Integer Casting Check for GPU Programs
Unstructured grid applications on GPU: performance analysis and improvement
Unsupervised Asset Cluster Analysis Implemented with Parallel Genetic Algorithms on the NVIDIA CUDA Platform
Unsupervised Deep Learning of Incompressible Fluid Dynamics
Unsupervised Markovian Segmentation on Graphics Hardware
Up to 700k GPU cores, Kepler, and the Exascale future for simulations of star clusters around black holes
UPC on MIC: Early Experiences with Native and Symmetric Modes
Urban Regional Seismic Damage Prediction Based On GPU-CPU Hybrid Computing
Usable assembly language for GPUs: a success story
Use NVIDIA CUDA technology to create genetic algorithms with extensive population
Use of Checkpoint-Restart for Complex HEP Software on Traditional Architectures and Intel MIC
Use of CUDA for the Continuous Space Language Model
Use of CUDA Parallel Computing Technology in Modeling of Solid Mineral Deposits
Use of FPGA or GPU-based architectures for remotely sensed hyperspectral image processing
Use of modern GPUs in Design Optimization
Use of Multi-GPU Systems for Larger Than Device FFTs: With Applications in Ultrasound Simulations
Use of Multiple GPUs on Shared Memory Multiprocessors for Ultrasound Propagation Simulations
Use of Multiple GPUs to Speedup the Execution of a Three-Dimensional Computational Model of the Innate Immune System
User-Driven Online Kernel Fusion for SYCL
User’s needs influencing HPC technologies
Uses of GPU Powered Interval Optimization for Parameter Identification in the Context of SO Fuel Cells
Using a GPU to accelerate die and mold fabrication
Using a GPU-CPU architecture to speed up a GA-based real-time system for trading the stock market
Using a GPU, Online Diarization – Offline Diarization
Using AI libraries for Incompressible Computational Fluid Dynamics
Using an OpenCL Framework to Evaluate Interconnect Implementations on FPGAs
Using Artificial Intelligence in Computational Games
Using Butterfly-Patterned Partial Sums to Optimize GPU Memory Accesses for Drawing from Discrete Distributions
Using Commodity Coprocessors for Host Intrusion Detection
Using Commodity Graphics Hardware for Real-Time Digital Hologram View-Reconstruction
Using common graphics hardware for multi-agent traffic simulation with CUDA
Using Compiler Directives for Performance Portability in Scientific Computing: Kernels from Molecular Simulation
Using Compiler Snippets to Exploit Parallelism on Heterogeneous Hardware: A Java Reduction Case Study
Using Compute Unified Device Architecture (CUDA) in Parallelizing Different Digital Image Processing Techniques
Using CUDA architecture for computer simulations of thermomechanical phenomena
Using CUDA Architecture for the Computer Simulation of the Casting Solidification Process
Using CUDA for Exhaustive Password Recovery
Using CUDA GPU to Accelerate the Ant Colony Optimization Algorithm
Using Data Compression for Increasing Efficiency of Data Transfer Between Main Memory and Intel Xeon Phi Coprocessor or NVidia GPU in Parallel DBMS
Using Deep Convolutional Neural Networks in Monte Carlo Tree Search
Using DRBL to Deploy MPICH2 and CUDA on Green Computing
Using efficient parallelization in Graphic Processing Units to parameterize stochastic fire propagation models
Using Fermi architecture knowledge to speed up CUDA and OpenCL programs
Using generalized ensemble simulations and Markov state models to identify conformational states
Titles: 100
open PDFs: 96
packages: 17