Papers on hgpu.org (.txt-file)
Optimized GPU Framework for Ultrasound B-Mode Imaging
Optimized GPU Framework for Ultrasound Color Flow Imaging
Optimized GPU Framework for Ultrasound Strain Imaging
Optimized GPU histograms for multi-modal registration
Optimized GPU Implementation and Performance Analysis of HC Series of Stream Ciphers

Optimized GPU simulation of continuous-spin glass models

Optimized HPL for AMD GPU and multi-core CPU usage
Optimized MFCC Feature Extraction on GPU

Optimized Parallel Implementation of Gillespie’s First Reaction Method on Graphics Processing Units

Optimized parallel implementation of pedestrian tracking using HOG features on GPU
Optimized Password Recovery for Encrypted RAR on GPUs

Optimized Pattern-Based Adaptive Mesh Refinement Using GPU

Optimized Private Information Retrieval Protocol Using Graphics Processing Unit With Reduced Accessibility

Optimized Strategies for Mapping Three-dimensional FFTs onto CUDA GPUs

Optimizing 3D Convolutions for Wavelet Transforms on CPUs with SSE Units and GPUs

Optimizing a Biomedical Imaging Orientation Score Framework

Optimizing a Hardware Network Stack to Realize an In-Network ML Inference Application

Optimizing a High Energy Physics (HEP) Toolkit on Heterogeneous Architectures

Optimizing a Near-duplicate Document Detection System with SIMD Technologies

Optimizing a Semantic Comparator using CUDA-enabled Graphics Hardware

Optimizing a shared virtual memory system for a heterogeneous CPU-accelerator platform
Optimizing All-to-All and Allgather Communications on GPGPU Clusters

Optimizing an OpenCL Application for Video Watermarking in FPGAs

Optimizing and Auto-tuning Belief Propagation on the GPU

Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures

Optimizing ASP.NET with C++ AMP on the GPU

Optimizing Block-Sparse Matrix Multiplications on CUDA with TVM

Optimizing Communication by Compression for Multi-GPU Scalable Breadth-First Searches

Optimizing Communication for Clusters of GPUs

Optimizing CUDA Code By Kernel Fusion – Application on BLAS

Optimizing CUDA Shared Memory Usage

Optimizing data intensive GPGPU computations for DNA sequence alignment

Optimizing Data Locality for Iterative Matrix Solvers on CUDA

Optimizing Data Warehousing Applications for GPUs Using Kernel Fusion/Fission

Optimizing dataflow applications on heterogeneous environments

Optimizing Deep CNN-Based Queries over Video Streams at Scale

Optimizing Deep Learning Models For Raspberry Pi

Optimizing exact computation of Betweenness Centrality for CUDA

Optimizing for a Many-Core Architecture without Compromising Ease-of-Programming

Optimizing Full Correlation Matrix Analysis of fMRI Data on Intel Xeon Phi Coprocessors

Optimizing GPU to GPU Communication on Cray XK7

Optimizing GPU Volume Rendering

Optimizing GPU-accelerated Group-By and Aggregation

Optimizing Hardware Resource Partitioning and Job Allocations on Modern GPUs under Power Caps

Optimizing High-Performance Linpack for Exascale Accelerated Architectures

Optimizing Huffman Decoding for Error-Bounded Lossy Compression on GPUs

Optimizing Krylov Subspace Solvers on Graphics Processing Units

Optimizing Lempel-Ziv Factorization for the GPU Architecture

Optimizing Linpack Benchmark on GPU-Accelerated Petascale Supercomputer

Optimizing LZSS Compression on GPGPUs

Optimizing MapReduce for GPUs with effective shared memory usage

Optimizing Memory Efficiency for Convolution Kernels on Kepler GPUs

Optimizing Memory Efficiency for Deep Convolutional Neural Networks on GPUs

Optimizing memory management on heterogeneous systems using polyhedral, compile-time techniques

Optimizing Memory-Bound Numerical Kernels on GPU Hardware Accelerators

Optimizing Monte Carlo radiosity on graphics hardware
Optimizing Network Performance for Distributed DNN Training on GPU Clusters: ImageNet/AlexNet Training in 1.5 Minutes

Optimizing OpenCL Kernels for Iterative Statistical Applications on GPUs

Optimizing OpenCL Local Work Group Size With Machine Learning

Optimizing Performance and Energy Efficiency in Massively Parallel Systems

Optimizing Performance of Recurrent Neural Networks on GPUs

Optimizing Performance of Stencil Code with SPL Conqueror

Optimizing performance per watt on GPUs in High Performance Computing: temperature, frequency and voltage effects

Optimizing RDF stores by coupling General-purpose Graphics Processing Units and Central Processing Units

Optimizing Real Time GPU Kernels Using Fuzzy Inference System

Optimizing Similarity Computations for Ontology Matching – Experiences from GOMMA

Optimizing simulated annealing on GPU: A case study with IC floorplanning
Optimizing Smith-Waterman algorithm on Graphics Processing Unit
Optimizing Sparse Matrix-Matrix Multiplication for the GPU

Optimizing Sparse Matrix-Vector Multiplication on Emerging Many-Core Architectures

Optimizing Stencil Computations for NVIDIA Kepler GPUs

Optimizing Strassen matrix multiply on GPUs

Optimizing Streaming Parallelism on Heterogeneous Many-Core Architectures

Optimizing Sweep3D for Graphic Processor Unit

Optimizing Symmetric Dense Matrix-Vector Multiplication on GPUs

Optimizing the Computation of Eigenvalues Using Graphics Processing Units

Optimizing the exploitation of multicore processors and GPUs with OpenMP and OpenCL
Optimizing the Linear Fascicle Evaluation Algorithm for Multi-Core and Many-Core Systems

Optimizing the MapReduce Framework on Intel Xeon Phi Coprocessor

Optimizing the multipole-to-local operator in the fast multipole method for graphical processing units

Optimizing the optimizer: increasing performance efficiency of modern compilers

Optimizing the Performance of Parallel and Concurrent Applications Based on Asynchronous Many-Task Runtimes

Optimizing the SUSAN corner detection algorithm for a high speed FPGA implementation
Optimizing the Weather Research and Forecasting Model with OpenMP Offload and Codee

Optimizing Urban Environmental Simulations using BOINC

Optimizing Web Virtual Reality

Optimizing Xeon Phi for Interactive Data Analysis

OptiML: An End-to-End Framework for Program Synthesis and CUDA Kernel Optimization

OptiML: An implicitly parallel domain-specific language for machine learning

Optimum Application Deployment Technology for Heterogeneous IaaS Cloud

Option pricing with COS method on graphics processing units
Option pricing with multi-dimensional quadrature architectures

OptiX: a general purpose ray tracing engine

Orca: FSS-based Secure Training with GPUs

Orchestrated Scheduling and Prefetching for GPGPUs

Orchestrating Multiple Data-Parallel Kernels on Multiple Devices

Orchestrating Thread Scheduling and Cache Management to Improve Memory System Throughput in Throughput Processors

Orchestration by approximation: mapping stream programs onto multicore architectures

Orders-of-magnitude performance increases in GPU-accelerated correlation of images from the International Space Station

Titles: 100
open PDFs: 86
packages: 20
