Papers on hgpu.org (.txt-file)
Implementing a Code Generator for Fast Matrix Multiplication in OpenCL on the GPU

Implementing a Finite Difference-Based Real-time Sound Synthesizer using GPUs

Implementing a GPU Programming Model on a non-GPU Accelerator Architecture

Implementing a GPU-Enhanced Cluster for Large-Scale Simulations

Implementing a Photorealistic Rendering System using GLSL

Implementing a Preconditioned Iterative Linear Solver Using Massively Parallel Graphics Processing Units

Implementing a Sparse Matrix Vector Product for the SELL-C/SELL-C-sigma formats on NVIDIA GPUs

Implementing AES on GPU: Final Report

Implementing an architecture for efficient network traffic processing on modern graphics hardware

Implementing an efficient method of check-pointing on CPU-GPU

Implementing an embedded GPU language by combining translation and generation

Implementing an Interior Point Method for Linear Programs on a CPU-GPU System

Implementing and evaluating an heterogeneous, scalable, tridiagonal linear system solver with OpenCL to target FPGAs, GPUs, and CPUs

Implementing and Evaluating Candidate-Based Invariant Generation

Implementing cartesian genetic programming classifiers on graphics processing units using GPU.NET

Implementing CFD (Computational Fluid Dynamics) in OpenCL for Building Simulation

Implementing Computer Vision Functions with OpenCL on the Qualcomm Adreno 420

Implementing Continuous Integration Software in an Established Computational Chemistry Software Package

Implementing Decision Trees and Forests on a GPU

Implementing Deep Neural Networks for Financial Market Prediction on the Intel Xeon Phi

Implementing density functional theory (DFT) methods on many-core GPGPU accelerators

Implementing Domain-Specific Languages for Heterogeneous Parallel Computing

Implementing Efficient, Portable Computations for Machine Learning

Implementing general matrix-matrix multiplication algorithm on the Intel Xeon Phi Knights Landing Processor

Implementing Genetic Algorithms to CUDA Environment Using Data Parallelization

Implementing implicit OpenMP data sharing on GPUs

Implementing Independent Component Analysis in General-Purpose GPU Architectures

Implementing Interactive 3D Segmentation on CUDA Using Graph-Cuts and Watershed Transformation

Implementing Level-3 BLAS Routines in OpenCL on Different Processing Units

Implementing LNS using filtering units of GPUs

Implementing Machine Learning Algorithms on GPUs for Real-Time Traffic Sign Classification

Implementing mesh-based approaches for deformable objects on GPU

Implementing modular arithmetic using OpenCL

Implementing Molecular Dynamics on Hybrid High Performance Computers – Particle-Particle Particle-Mesh

Implementing molecular dynamics on hybrid high performance computers – short range forces

Implementing Molecular Dynamics on Hybrid High Performance Computers – Three-Body Potentials

Implementing Neural Networks Efficiently

Implementing Open-Source CUDA Runtime

Implementing Parallel SMO to Train SVM on CUDA-Enabled Systems

Implementing Push-Pull Efficiently in GraphBLAS

Implementing QR Factorization Updating Algorithms on GPUs

Implementing sparse matrix-vector multiplication on throughput-oriented processors

Implementing Sparse Matrix-Vector multiplication using CUDA based on a hybrid sparse matrix format

Implementing Sparse Matrix-Vector Multiplication with QCSR on GPU

Implementing Stereo Vision of GPU-Accelerated Scientific Simulations using Commodity Hardware

Implementing Strassen’s Algorithm with CUTLASS on NVIDIA Volta GPUs

Implementing the Approximate Message Passing (AMP) Algorithm on a GPU

Implementing the Himeno benchmark with CUDA on GPU clusters

Implementing the PGI Accelerator model

Implementing the Projected Spatial Rich Features on a GPU

Implementing Ultrasound Beamforming on the GPU using CUDA

Implications of the Turing completeness of reaction-diffusion models, informed by GPGPU simulations on an XBox 360: cardiac arrhythmias, re-entry and the Halting problem

Implicit Adaptive Volume Ray Casting

Implicit and dynamic trees for high performance rendering

Implicit Boundary Control of Vector Field Based Shape Deformations

Implicit Feature-Based Alignment System for Radiotherapy

Implicit Methods for Real-Time simulation of Interactive Waves

Implicit Parallel Time Integrators

Implicit Skinning: Real-Time Skin Deformation with Contact Modeling

Importance of Data Loading Pipeline in Training Deep Neural Networks

Importance of Explicit Vectorization for CPU and GPU Software Performance

Importance Point Projection for GPU-based Final Gathering

Importance sampling algorithms for first passage time probabilities in the infinite server queue

Importance Sampling of Realistic Light Sources

Importance-driven compositing window management

Importance-Driven Isosurface Decimation for Visualization of Large Simulation Data Based on OpenCL

Importance-Driven Particle Techniques for Flow Visualization

Impostors and pseudo-instancing for GPU crowd rendering

Impostors, Pseudo-instancing and Image Maps for GPU Crowd Rendering

Improved automated lattice perturbation theory in background field gauge

Improved Distance Weighted GPU-based 3D Ultrasound Reconstruction Methods

Improved FCM algorithm for Clustering on Web Usage Mining

Improved Finite Difference Schemes for a 3-D Viscothermal Wave Equation on a GPU

Improved GPU Co-processor Sorting Algorithm with Barrier Synchronization

Improved Implementation of Simulation for Membrane Computing on the Graphic Processing Unit

Improved Integral Histogram Algorithm for Big Sized Images in CUDA Environment

Improved Lossless Image Compression Model Using Coefficient Based Discrete Wavelet Transform

Improved OpenCL-based Implementation of Social Field Pedestrian Model

Improved Performance of CaFE and IRIS Model Fitting Using CUDA

Improved Poisson Matting for a Real Time Tele-presence System Using GPU

Improved Programming of GPU Architectures through Automated Data Allocation and Loop Restructuring

Improved Real-Time Stereo on Commodity Graphics Hardware

Improved Row-Grouped CSR Format for Storing of Sparse Matrices on GPU

Improved Sequential & Parallel Designs and Implementations of the Eight Direction Prewitt Edge Detection

Improvement of the fused CUDA kernels performance prediction

Improvement Study of EEMD Decomposition Efficiency Based on CUDA Architecture

Improvements to Physically Based Cloth Simulation

Improving 3D Lattice Boltzmann Method stencil with asynchronous transfers on many-core processors

Improving accuracy for matrix multiplications on GPUs

Improving Atmospheric Model Performance on a Multi-Core Cluster System

Improving Automatic Parallel Training via Balanced Memory Workload Optimization

Improving Cache Locality for GPU-based Volume Rendering

Improving Cache Locality for Ray Casting with CUDA

Improving Communication Performance and Scalability of Native Applications on Intel Xeon Phi Coprocessor Clusters

Improving Communication Performance in GPU-Accelerated HPC Clusters

Improving CUDA DNA Analysis Software with Genetic Programming

Improving CUDASW++, a Parallelization of Smith-Waterman for CUDA Enabled Devices

Improving energy and power efficiency using NComputing and approaches for predicting reliability of complex computing systems

Improving Energy Efficiency of Basic Linear Algebra Routines on Heterogeneous Systems with Multiple GPUs

Improving Energy Efficiency of GPU based General-Purpose Scientific Computing through Automated Selection of Near Optimal Configurations

Titles: 100
open PDFs: 89
packages: 13
