Papers on hgpu.org (.txt-file)
Profiling based Out-of-core Hybrid Method for Large Neural Networks
Profiling Concurrent Vision Inference Workloads on NVIDIA Jetson – Extended
Profiling General Purpose GPU Applications
Profiling Heterogeneous Multi-GPU Systems to Accelerate Cortically Inspired Learning Algorithms
Profiling High Level Heterogeneous Programs: Using the SPOC GPGPU framework for OCaml
Profiling of Data-Parallel Processors
Program Acceleration in a Heterogeneous Computing Environment Using OpenCL, FPGA, and CPU
Program Analysis and Machine Learning based Approach to Predict Power Consumption of CUDA Kernel
Program optimization carving for GPU computing
Program Optimization of Array-Intensive SPEC2k Benchmarks on Multithreaded GPU Using CUDA and Brook+
Program Optimization of Stencil Based Application on the GPU-Accelerated System
Program optimization space pruning for a multithreaded GPU
Program Optimization Strategies for Data-Parallel Many-Core Processors
Program Optimization Study on a 128-Core GPU
ProGraML: A Graph-based Program Representation for Data Flow Analysis and Compiler Optimizations
ProGraML: Graph-based Deep Learning for Program Optimization and Analysis
Programmability and Performance Portability Aspects of Heterogeneous Multi-/Manycore Systems
Programmability: Design Costs and Payoffs using AMD GPU Streaming Languages and Traditional Multi-Core Libraries
Programmable and Scalable Architecture for Graphics Processing Units
Programmable shaders for deformation rendering
Programming Abstractions and Optimization Techniques for GPU-based Heterogeneous Systems
Programming and Performance of Graphics Processors in Shock Waves Simulation by Finite Volume Method
Programming and Scheduling Model for Supporting Heterogeneous Accelerators in Linux
Programming CUDA and OpenCL: A Case Study Using Modern C++ Libraries
Programming Dense Linear Algebra Kernels on Vectorized Architectures
Programming Embedded Manycore: Refinement and Optimizing Compilation of a Parallel Action Language for Hierarchical State Machines
Programming finite-difference time-domain for graphics processor units using compute unified device architecture
Programming for scientific computing on peta-scale heterogeneous parallel systems
Programming framework for clusters with heterogeneous accelerators
Programming Frameworks for Distributed Smartphone Computing
Programming Future Parallel Architectures with Haskell and Intel ArBB
Programming GPUs with C++14 and Just-In-Time Compilation
Programming Heterogeneous Systems from an Image Processing DSL
Programming Heterogeneous Systems with General and Domain-Specific Frameworks
Programming hybrid systems with implicit memory based synchronization
Programming in CUDA for Kepler and Maxwell Architecture
Programming issues for video analysis on Graphics Processing Units
Programming Massively Parallel Architectures using MARTE: a Case Study
Programming massively parallel processors: A Hands-on approach
Programming Massively Parallel Processors with CUDA (audio course)
Programming model for a heterogeneous x86 platform
Programming Models and Runtimes for Heterogeneous Systems
Programming Models and Scheduling Techniques for Heterogeneous Architectures
Programming Models and Tools for Many-Core Platforms
Programming NVIDIA cards by means of transitive closure based parallelization algorithms
Programming of shared memory GPUs shared memory systems
Programming on Parallel Machines: GPU, Multicore, Clusters and More
Programming video cards for computational electromagnetics applications
Programming with Explicit Dependencies. A Framework for Portable Parallel Programming
Programming-Model Centric Debugging for Multicore Embedded Systems
Progressive Clustering of Big Data with GPU Acceleration and Visualization
Progressive High-Quality Response Surfaces for Visually Guided Sensitivity Analysis
Progressive Photon Mapping on GPUs
Progressive Semantic Segmentation
Projected tetrahedra revisited: a barycentric formulation applied to digital radiograph reconstruction using higher-order attenuation functions
Projectile Monte-Carlo Trajectory Analysis Using a Graphics Processing Unit
Projecting Tetrahedra with a Simplified Basis Graph
PROJECTION Algorithm for Motif Finding on GPUs
Promise of embedded system with GPU in artificial leg control: Enabling time-frequency feature extraction from electromyography
Proposition for propagated occupation grids for non-rigid moving objects tracking
Prospects for scalable 3D FFTs on heterogeneous exascale systems
Prospects of GPGPU in the Auger Offline Software Framework
pROST: A Smoothed Lp-norm Robust Online Subspace Tracking Method for Realtime Background Subtraction in Video
PROST: Parallel robust online simple tracking
Protecting Real-Time GPU Applications on Integrated CPU-GPU SoC Platforms
Protein alignment algorithms with an efficient backtracking routine on multiple GPUs
Proteus: Efficient Resource Use in Heterogeneous Architectures
Proteus: Exploiting Numerical Precision Variability in Deep Neural Networks
Prototyping methodology of image processing applications on heterogeneous parallel systems
Provably Efficient GPU Algorithms
Providing performance portable numerics for Intel GPUs
Providing Source Code Level Portability Between CPU and GPU with MapCG
PSCToolkit: solving sparse linear systems with a large number of GPUs
Pseudo Random Number Generators on Graphics Processing Units, with Applications in Finance
Pseudo-random number generation for Brownian Dynamics and Dissipative Particle Dynamics simulations on GPU devices
Pseudo-Random Number Generation on GP-GPU
Pseudo-random number generators for Monte Carlo simulations on ATI Graphics Processing Units
Pseudo-random number generators for Monte Carlo simulations on Graphics Processing Units
Pseudorandom number generation on the GPU
Pseudorandom Numbers Generation for Monte Carlo Simulations on GPUs: OpenCL Approach
Pseudoscalar Meson in Two Flavors QCD with the Optimal Domain-Wall Fermion
pSTL-Bench: A Micro-Benchmark Suite for Assessing Scalability of C++ Parallel STL Implementations
PTask: Operating System Abstractions To Manage GPUs as Compute Devices
PTX2Kernel: Converting PTX Code into Compilable Kernels
PUGACE, a cellular Evolutionary Algorithm framework on GPUs
Pulsar Acceleration Searches on the GPU for the Square Kilometre Array
Pulsar search acceleration using FPGAs and OpenCL templates
Pulse-coupled neural network performance for real-time identification of vegetation during forced landing
Purine: A bi-graph based deep learning framework
Pushing the Envelope: Extreme Network Coding on the GPU
Pushing the limit of molecular dynamics with ab initio accuracy to 100 million atoms with machine learning
Pushing the limits for medical image reconstruction on recent standard multicore processors
Putting Automatic Polyhedral Compilation for GPGPU to Work
pVOCL: Power-Aware Dynamic Placement and Migration in Virtualized GPU Environments
PVR: Patch-to-Volume Reconstruction for Large Area Motion Correction of Fetal MRI
pyATF: Constraint-Based Auto-Tuning in Python
Titles: 100
open PDFs: 88
packages: 23