Papers on hgpu.org (.txt-file)
Optimization of Large-Scale Sparse Matrix-Vector Multiplication on Multi-GPU Systems

Optimization of Lattice Boltzmann Simulations on Heterogeneous Computers

Optimization of linked list prefix computations on multithreaded GPUs using CUDA

Optimization of mapped functions sequences using fusions on GPU

Optimization of massive data applications on heterogeneous architectures

Optimization of Molecular Dynamics Simulation Code and Applications to Biomolecular Systems

Optimization of OpenCL applications on FPGA

Optimization of parallel Genetic Algorithms for nVidia GPUs
Optimization of Pattern Matching Algorithms for Multi- and Many-Core Platforms

Optimization of Ported CFD Kernels on Intel Data Center GPU Max 1550 using oneAPI ESIMD

Optimization of power consumption in the iterative solution of sparse linear systems on graphics processors

Optimization of RAID Erasure Coding Algorithms for Intel Xeon Phi

Optimization of real-time ultrasound PCIe data streaming and OpenCL processing for SAFT imaging

Optimization of solver for gas flow modeling

Optimization of Spatial Convolution in ConvNets on Intel KNL

Optimization of tele-immersion codes

Optimization of the Brillouin operator on the KNL architecture

Optimization of the Gaussian Mixture Model Evaluation on GPU

Optimization of the HEFT algorithm for a CPU-GPU environment

Optimization of the Oktay-Kronfeld Action Conjugate Gradient Inverter

Optimization of the Particle-based Volume Rendering for GPUs with Hiding Data Transfer Latency

Optimization principles and application performance evaluation of a multithreaded GPU using CUDA

Optimization procedures during parallelization of specialized software for fluid flow simulations

Optimization Solutions for Improving the Performance of the Parallel Reduction Algorithm Using Graphics Processing Units

Optimization solutions for the segmented sum algorithmic function

Optimization strategies for parallel CPU and GPU implementations of a meshfree particle method

Optimization Techniques for CUDA Application

Optimization Techniques for GPU Programming

Optimization Techniques for Mapping Algorithms and Applications onto CUDA GPU Platforms and CPU-GPU Heterogeneous Platforms

Optimization Techniques on GPU: A Survey
Optimization, Specification and Verification of the Prefix Sum Program in an OpenCL Environment

Optimizations and Performance of a Robotics Grasping Algorithm Described in Geometric Algebra

Optimizations in Bioinformatics using GPU Processing on Binary Data
Optimize or Wait? Using llc Fast-Prototyping Tool to Evaluate CUDA Optimizations
Optimize Overall System Performance Through Workload Sequencing for GPUs Data Offloading

Optimized Broadcast for Deep Learning Workloads on Dense-GPU InfiniBand Clusters: MPI or NCCL?

Optimized Code Generation for Parallel and Polyhedral Loop Nests using MLIR

Optimized Composition: Generating Efficient Code for Heterogeneous Systems from Multi-Variant Components, Skeletons and Containers

Optimized Data Transfers Based on the OpenCL Event Management Mechanism

Optimized Deep Learning Architectures with Fast Matrix Operation Kernels on Parallel Platform

Optimized Event-Driven Runtime Systems for Programmability and Performance

Optimized GPU Framework for Pulsed Wave Doppler Ultrasound
Optimized GPU Framework for Speckle Reduction Using Histogram Matching and Region Growing
Optimized GPU Framework for Ultrasound B-Mode Imaging
Optimized GPU Framework for Ultrasound Color Flow Imaging
Optimized GPU Framework for Ultrasound Strain Imaging
Optimized GPU histograms for multi-modal registration
Optimized GPU Implementation and Performance Analysis of HC Series of Stream Ciphers

Optimized GPU simulation of continuous-spin glass models

Optimized HPL for AMD GPU and multi-core CPU usage
Optimized MFCC Feature Extraction on GPU

Optimized Parallel Implementation of Gillespie’s First Reaction Method on Graphics Processing Units

Optimized parallel implementation of pedestrian tracking using HOG features on GPU
Optimized Password Recovery for Encrypted RAR on GPUs

Optimized Pattern-Based Adaptive Mesh Refinement Using GPU

Optimized Private Information Retrieval Protocol Using Graphics Processing Unit With Reduced Accessibility

Optimized Strategies for Mapping Three-dimensional FFTs onto CUDA GPUs

Optimizing 3D Convolutions for Wavelet Transforms on CPUs with SSE Units and GPUs

Optimizing a Biomedical Imaging Orientation Score Framework

Optimizing a Hardware Network Stack to Realize an In-Network ML Inference Application

Optimizing a High Energy Physics (HEP) Toolkit on Heterogeneous Architectures

Optimizing a Near-duplicate Document Detection System with SIMD Technologies

Optimizing a Semantic Comparator using CUDA-enabled Graphics Hardware

Optimizing a shared virtual memory system for a heterogeneous CPU-accelerator platform
Optimizing All-to-All and Allgather Communications on GPGPU Clusters

Optimizing an OpenCL Application for Video Watermarking in FPGAs

Optimizing and Auto-tuning Belief Propagation on the GPU

Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures

Optimizing ASP.NET with C++ AMP on the GPU

Optimizing Block-Sparse Matrix Multiplications on CUDA with TVM

Optimizing Communication by Compression for Multi-GPU Scalable Breadth-First Searches

Optimizing Communication for Clusters of GPUs

Optimizing CUDA Code By Kernel Fusion – Application on BLAS

Optimizing CUDA Shared Memory Usage

Optimizing data intensive GPGPU computations for DNA sequence alignment

Optimizing Data Locality for Iterative Matrix Solvers on CUDA

Optimizing Data Warehousing Applications for GPUs Using Kernel Fusion/Fission

Optimizing dataflow applications on heterogeneous environments

Optimizing Deep CNN-Based Queries over Video Streams at Scale

Optimizing Deep Learning Models For Raspberry Pi

Optimizing exact computation of Betweenness Centrality for CUDA

Optimizing for a Many-Core Architecture without Compromising Ease-of-Programming

Optimizing Full Correlation Matrix Analysis of fMRI Data on Intel Xeon Phi Coprocessors

Optimizing GPU to GPU Communication on Cray XK7

Optimizing GPU Volume Rendering

Optimizing GPU-accelerated Group-By and Aggregation

Optimizing Hardware Resource Partitioning and Job Allocations on Modern GPUs under Power Caps

Optimizing High-Performance Linpack for Exascale Accelerated Architectures

Optimizing Huffman Decoding for Error-Bounded Lossy Compression on GPUs

Optimizing Krylov Subspace Solvers on Graphics Processing Units

Optimizing Lempel-Ziv Factorization for the GPU Architecture

Optimizing Linpack Benchmark on GPU-Accelerated Petascale Supercomputer

Optimizing LZSS Compression on GPGPUs

Optimizing MapReduce for GPUs with effective shared memory usage

Optimizing Memory Efficiency for Convolution Kernels on Kepler GPUs

Optimizing Memory Efficiency for Deep Convolutional Neural Networks on GPUs

Optimizing memory management on heterogeneous systems using polyhedral, compile-time techniques

Optimizing Memory-Bound Numerical Kernels on GPU Hardware Accelerators

Optimizing Monte Carlo radiosity on graphics hardware
Optimizing Network Performance for Distributed DNN Training on GPU Clusters: ImageNet/AlexNet Training in 1.5 Minutes

Titles: 100
open PDFs: 86
packages: 14
