Papers on hgpu.org (.txt-file)
Optimizing Deep Learning Models For Raspberry Pi
Optimizing exact computation of Betweenness Centrality for CUDA
Optimizing for a Many-Core Architecture without Compromising Ease-of-Programming
Optimizing Full Correlation Matrix Analysis of fMRI Data on Intel Xeon Phi Coprocessors
Optimizing GPU to GPU Communication on Cray XK7
Optimizing GPU Volume Rendering
Optimizing GPU-accelerated Group-By and Aggregation
Optimizing High-Performance Linpack for Exascale Accelerated Architectures
Optimizing Huffman Decoding for Error-Bounded Lossy Compression on GPUs
Optimizing Krylov Subspace Solvers on Graphics Processing Units
Optimizing Lempel-Ziv Factorization for the GPU Architecture
Optimizing Linpack Benchmark on GPU-Accelerated Petascale Supercomputer
Optimizing LZSS Compression on GPGPUs
Optimizing MapReduce for GPUs with effective shared memory usage
Optimizing Memory Efficiency for Convolution Kernels on Kepler GPUs
Optimizing Memory Efficiency for Deep Convolutional Neural Networks on GPUs
Optimizing memory management on heterogeneous systems using polyhedral, compile-time techniques
Optimizing Memory-Bound Numerical Kernels on GPU Hardware Accelerators
Optimizing Monte Carlo radiosity on graphics hardware
Optimizing Network Performance for Distributed DNN Training on GPU Clusters: ImageNet/AlexNet Training in 1.5 Minutes
Optimizing OpenCL Kernels for Iterative Statistical Applications on GPUs
Optimizing OpenCL Local Work Group Size With Machine Learning
Optimizing Performance and Energy Efficiency in Massively Parallel Systems
Optimizing Performance of Recurrent Neural Networks on GPUs
Optimizing Performance of Stencil Code with SPL Conqueror
Optimizing performance per watt on GPUs in High Performance Computing: temperature, frequency and voltage effects
Optimizing RDF stores by coupling General-purpose Graphics Processing Units and Central Processing Units
Optimizing Real Time GPU Kernels Using Fuzzy Inference System
Optimizing Similarity Computations for Ontology Matching – Experiences from GOMMA
Optimizing simulated annealing on GPU: A case study with IC floorplanning
Optimizing Smith-Waterman algorithm on Graphics Processing Unit
Optimizing Sparse Matrix-Matrix Multiplication for the GPU
Optimizing Sparse Matrix-Vector Multiplication on Emerging Many-Core Architectures
Optimizing Stencil Computations for NVIDIA Kepler GPUs
Optimizing Strassen matrix multiply on GPUs
Optimizing Streaming Parallelism on Heterogeneous Many-Core Architectures
Optimizing Sweep3D for Graphic Processor Unit
Optimizing Symmetric Dense Matrix-Vector Multiplication on GPUs
Optimizing the Computation of Eigenvalues Using Graphics Processing Units
Optimizing the exploitation of multicore processors and GPUs with OpenMP and OpenCL
Optimizing the Linear Fascicle Evaluation Algorithm for Multi-Core and Many-Core Systems
Optimizing the MapReduce Framework on Intel Xeon Phi Coprocessor
Optimizing the multipole-to-local operator in the fast multipole method for graphical processing units
Optimizing the Performance of Parallel and Concurrent Applications Based on Asynchronous Many-Task Runtimes
Optimizing the SUSAN corner detection algorithm for a high speed FPGA implementation
Optimizing Urban Environmental Simulations using BOINC
Optimizing Web Virtual Reality
Optimizing Xeon Phi for Interactive Data Analysis
OptiML: An implicitly parallel domain-specific language for machine learning
Optimum Application Deployment Technology for Heterogeneous IaaS Cloud
Option pricing with COS method on graphics processing units
Option pricing with multi-dimensional quadrature architectures
OptiX: a general purpose ray tracing engine
Orca: FSS-based Secure Training with GPUs
Orchestrated Scheduling and Prefetching for GPGPUs
Orchestrating Multiple Data-Parallel Kernels on Multiple Devices
Orchestrating Thread Scheduling and Cache Management to Improve Memory System Throughput in Throughput Processors
Orchestration by approximation: mapping stream programs onto multicore architectures
Orders-of-magnitude performance increases in GPU-accelerated correlation of images from the International Space Station
Origami: A Convolutional Network Accelerator
Orion: Interference-aware, Fine-grained GPU Sharing for ML Applications
Orthogonalization on a General Purpose Graphics Processing Unit with Double Double and Quad Double Arithmetic
Orthogonalization on a general purpose graphics processing unit with double double and quad double arithmetic
Orthorectification by Using GPGPU Method
Out of kernel tuning and optimizations for portable large-scale docking experiments on GPUs
Out-of-core cone beam reconstruction using multiple GPUs
Out-of-core Implementation for Accelerator Kernels on Heterogeneous Clouds
Out-of-core singular value decomposition
Out-of-core Training for Extremely Large-Scale Neural Networks With Adaptive Window-Based Scheduling
Out-of-the-box library support for DBMS operations on GPUs
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
Overcoming the GPU memory limitation on FDTD through the use of overlapping subgrids
Overcomplete Dictionary Learning with Jacobi Atom Updates
Overdetermined Shooting Methods for Computing Standing Water Waves with Spectral Accuracy
Overhauling SC atomics in C11 and OpenCL
Overlapping Computation and Communication for Advection on Hybrid Parallel Computers
Overlapping computation and communication of three-dimensional FDTD on a GPU cluster
Overtaking CPU DBMSes with a GPU in Whole-Query Analytic Processing with Parallelism-Friendly Execution Plan Optimization
Overview of approaches for accelerating scale invariant feature detection algorithm
Overview of implementation of DARPA GPU program in SAIC
OWL: Cooperative Thread Array Aware Scheduling Techniques for Improving GPGPU Performance
P-HGRMS: A Parallel Hypergraph Based Root Mean Square Algorithm for Image Denoising
PacketShader: a GPU-accelerated software router
Padding Free Bank Conflict Resolution for CUDA-Based Matrix Transpose Algorithm
Pairwise Sequence Alignment for Very Long Sequences on GPUs
Pairwise Sequence Alignment with Gaps with GPU
PAKCK: Performance and Power Analysis of Key Computational Kernels on CPUs and GPUs
Panda: A Compiler Framework for Concurrent CPU-GPU Execution of 3D Stencil Computations on GPU-accelerated Supercomputers
PANDA: Extreme Scale Parallel K-Nearest Neighbor on Distributed Architectures
Pangaea: a tightly-coupled IA32 heterogeneous chip multiprocessor
Pangolin: An Efficient and Flexible Graph Mining System on CPU and GPU
PanJoin: A Partition-based Adaptive Stream Join
PANNA: Properties from Artificial Neural Network Architectures
Pannotia: Understanding Irregular GPGPU Graph Applications
PantaRay: fast ray-traced occlusion caching of massive scenes
PAPER – Accelerating parallel evaluations of ROCS
ParadisEO-MO-GPU: a Framework for Parallel GPU-based Local Search Metaheuristics
Paragon: Collaborative Speculative Loop Execution on GPU and CPU
Titles: 100
open PDFs: 89
packages: 21