Papers on hgpu.org (.txt-file)
Optimization of massive data applications on heterogeneous architectures
Optimization of Molecular Dynamics Simulation Code and Applications to Biomolecular Systems
Optimization of OpenCL applications on FPGA
Optimization of parallel Genetic Algorithms for nVidia GPUs
Optimization of Pattern Matching Algorithms for Multi- and Many-Core Platforms
Optimization of Ported CFD Kernels on Intel Data Center GPU Max 1550 using oneAPI ESIMD
Optimization of power consumption in the iterative solution of sparse linear systems on graphics processors
Optimization of RAID Erasure Coding Algorithms for Intel Xeon Phi
Optimization of real-time ultrasound PCIe data streaming and OpenCL processing for SAFT imaging
Optimization of solver for gas flow modeling
Optimization of Spatial Convolution in ConvNets on Intel KNL
Optimization of tele-immersion codes
Optimization of the Brillouin operator on the KNL architecture
Optimization of the Gaussian Mixture Model Evaluation on GPU
Optimization of the HEFT algorithm for a CPU-GPU environment
Optimization of the Oktay-Kronfeld Action Conjugate Gradient Inverter
Optimization of the Particle-based Volume Rendering for GPUs with Hiding Data Transfer Latency
Optimization principles and application performance evaluation of a multithreaded GPU using CUDA
Optimization procedures during parallelization of specialized software for fluid flow simulations
Optimization Solutions for Improving the Performance of the Parallel Reduction Algorithm Using Graphics Processing Units
Optimization solutions for the segmented sum algorithmic function
Optimization strategies for parallel CPU and GPU implementations of a meshfree particle method
Optimization Techniques for CUDA Application
Optimization Techniques for GPU Programming
Optimization Techniques for Mapping Algorithms and Applications onto CUDA GPU Platforms and CPU-GPU Heterogeneous Platforms
Optimization Techniques on GPU: A Survey
Optimization, Specification and Verification of the Prefix Sum Program in an OpenCL Environment
Optimizations and Performance of a Robotics Grasping Algorithm Described in Geometric Algebra
Optimizations in Bioinformatics using GPU Processing on Binary Data
Optimize or Wait? Using llc Fast-Prototyping Tool to Evaluate CUDA Optimizations
Optimize Overall System Performance Through Workload Sequencing for GPUs Data Offloading
Optimized Broadcast for Deep Learning Workloads on Dense-GPU InfiniBand Clusters: MPI or NCCL?
Optimized Code Generation for Parallel and Polyhedral Loop Nests using MLIR
Optimized Composition: Generating Efficient Code for Heterogeneous Systems from Multi-Variant Components, Skeletons and Containers
Optimized Data Transfers Based on the OpenCL Event Management Mechanism
Optimized Deep Learning Architectures with Fast Matrix Operation Kernels on Parallel Platform
Optimized Event-Driven Runtime Systems for Programmability and Performance
Optimized GPU Framework for Pulsed Wave Doppler Ultrasound
Optimized GPU Framework for Speckle Reduction Using Histogram Matching and Region Growing
Optimized GPU Framework for Ultrasound B-Mode Imaging
Optimized GPU Framework for Ultrasound Color Flow Imaging
Optimized GPU Framework for Ultrasound Strain Imaging
Optimized GPU histograms for multi-modal registration
Optimized GPU Implementation and Performance Analysis of HC Series of Stream Ciphers
Optimized GPU simulation of continuous-spin glass models
Optimized HPL for AMD GPU and multi-core CPU usage
Optimized MFCC Feature Extraction on GPU
Optimized Parallel Implementation of Gillespie’s First Reaction Method on Graphics Processing Units
Optimized parallel implementation of pedestrian tracking using HOG features on GPU
Optimized Password Recovery for Encrypted RAR on GPUs
Optimized Pattern-Based Adaptive Mesh Refinement Using GPU
Optimized Private Information Retrieval Protocol Using Graphics Processing Unit With Reduced Accessibility
Optimized Strategies for Mapping Three-dimensional FFTs onto CUDA GPUs
Optimizing 3D Convolutions for Wavelet Transforms on CPUs with SSE Units and GPUs
Optimizing a Biomedical Imaging Orientation Score Framework
Optimizing a Hardware Network Stack to Realize an In-Network ML Inference Application
Optimizing a High Energy Physics (HEP) Toolkit on Heterogeneous Architectures
Optimizing a Near-duplicate Document Detection System with SIMD Technologies
Optimizing a Semantic Comparator using CUDA-enabled Graphics Hardware
Optimizing a shared virtual memory system for a heterogeneous CPU-accelerator platform
Optimizing All-to-All and Allgather Communications on GPGPU Clusters
Optimizing an OpenCL Application for Video Watermarking in FPGAs
Optimizing and Auto-tuning Belief Propagation on the GPU
Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures
Optimizing ASP.NET with C++ AMP on the GPU
Optimizing Block-Sparse Matrix Multiplications on CUDA with TVM
Optimizing Communication by Compression for Multi-GPU Scalable Breadth-First Searches
Optimizing Communication for Clusters of GPUs
Optimizing CUDA Code By Kernel Fusion – Application on BLAS
Optimizing CUDA Shared Memory Usage
Optimizing data intensive GPGPU computations for DNA sequence alignment
Optimizing Data Locality for Iterative Matrix Solvers on CUDA
Optimizing Data Warehousing Applications for GPUs Using Kernel Fusion/Fission
Optimizing dataflow applications on heterogeneous environments
Optimizing Deep CNN-Based Queries over Video Streams at Scale
Optimizing Deep Learning Models For Raspberry Pi
Optimizing exact computation of Betweenness Centrality for CUDA
Optimizing for a Many-Core Architecture without Compromising Ease-of-Programming
Optimizing Full Correlation Matrix Analysis of fMRI Data on Intel Xeon Phi Coprocessors
Optimizing GPU to GPU Communication on Cray XK7
Optimizing GPU Volume Rendering
Optimizing GPU-accelerated Group-By and Aggregation
Optimizing Hardware Resource Partitioning and Job Allocations on Modern GPUs under Power Caps
Optimizing High-Performance Linpack for Exascale Accelerated Architectures
Optimizing Huffman Decoding for Error-Bounded Lossy Compression on GPUs
Optimizing Krylov Subspace Solvers on Graphics Processing Units
Optimizing Lempel-Ziv Factorization for the GPU Architecture
Optimizing Linpack Benchmark on GPU-Accelerated Petascale Supercomputer
Optimizing LZSS Compression on GPGPUs
Optimizing MapReduce for GPUs with effective shared memory usage
Optimizing Memory Efficiency for Convolution Kernels on Kepler GPUs
Optimizing Memory Efficiency for Deep Convolutional Neural Networks on GPUs
Optimizing memory management on heterogeneous systems using polyhedral, compile-time techniques
Optimizing Memory-Bound Numerical Kernels on GPU Hardware Accelerators
Optimizing Monte Carlo radiosity on graphics hardware
Optimizing Network Performance for Distributed DNN Training on GPU Clusters: ImageNet/AlexNet Training in 1.5 Minutes
Optimizing OpenCL Kernels for Iterative Statistical Applications on GPUs
Optimizing OpenCL Local Work Group Size With Machine Learning
Optimizing Performance and Energy Efficiency in Massively Parallel Systems
Optimizing Performance of Recurrent Neural Networks on GPUs
Titles: 100
open PDFs: 86
packages: 16