Papers on hgpu.org (.txt-file)
An efficient midpoint-radius representation format to deal with symmetric fuzzy numbers

An efficient mixed-precision, hybrid CPU-GPU implementation of a fully implicit particle-in-cell algorithm

An efficient MPI/OpenMP parallelization of the Hartree-Fock method for the second generation of Intel Xeon Phi processor

An Efficient Multiway Mergesort for GPU Architectures

An efficient numerical method for solving the Boltzmann equation in multidimensions

An efficient out-of-core volume rendering method based on ray casting and GPU acceleration
An efficient parallel algorithm for accelerating computational protein design

An Efficient Parallel Algorithm for Graph Isomorphism on GPU using CUDA

An Efficient Parallel Data Clustering Algorithm Using Isoperimetric Number of Trees

An Efficient Parallel GPU Evaluation of Small Angle X-Ray Scattering Profiles

An Efficient Parallel ISODATA Algorithm Based on Kepler GPUs

An Efficient Parallel Motion Estimation Algorithm and X264 Parallelization in CUDA

An Efficient SAR Processor Based on GPU via CUDA
An efficient scheduling scheme using estimated execution time for heterogeneous computing systems

An Efficient Signal Processor of Synthetic Aperture Radar Based on GPU
An Efficient Simulation Environment for Modeling Large-Scale Cortical Processing

An efficient solution for hazardous geophysical flows simulation using GPUs

An efficient stochastic approach to groupwise non-rigid image registration

An Efficient Stream Buffer Mechanism for Dataflow Execution on Heterogeneous Platforms with GPUs

An Efficient Work-Distribution Strategy for Gridding Radio-Telescope Data on GPUs

An Efficient WSN Simulator for GPU-Based Node Performance

An Efficient, Automatic Approach to High Performance Heterogeneous Computing

An efficient, model-based CPU-GPU heterogeneous FFT library
An Embedded Stream Processor Core Based on Logarithmic Arithmetic for a Low-Power 3-D Graphics SoC

An Embedding Method for Interactive Simulation on Dynamic Surfaces

An emotionally biased ant colony algorithm for pathfinding in games
An Empirical Performance Evaluation of GPU-Enabled Graph-Processing Systems

An Empirical Study of Intel Xeon Phi

An Empirically Guided Optimization Framework for FPGA OpenCL

An Empirically Optimized Radix Sort for GPU

An End-to-End Programming Model for AI Engine Architectures

An End-to-End System for Unconstrained Face Verification with Deep Convolutional Neural Networks

An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition

An Energy Consumption Model for GPU Computing at Instruction Level

An Energy Efficient GPGPU Memory Hierarchy with Tiny Incoherent Caches

An energy model for graphics processing units
An Energy Optimization of a GPU Application by Grid Design Space Exploration

An Energy-Efficient Heterogeneous System for Embedded Learning and Classification

An Environment to Support GPU and Multicore Programming for Rapid, High Performance, Application Deployment

An EoS-meter of QCD transition from deep learning

An error correction solver for linear systems: Evaluation of mixed precision implementations

An Evaluation of Directive-Based Parallelization on the GPU Using a Parboil Benchmark

An evaluation of GPU acceleration for sparse reconstruction
An Evaluation of the GAMA/StarPU Frameworks for Heterogeneous Platforms: the Progressive Photon Mapping Algorithm

An Evaluative Comparison of Performance Portability across GPU Programming Models

An events based algorithm for distributing concurrent tasks on multi-core architectures

An Evolutionary Approach to Parallel Computing Using GPU

An Evolutionary Optimization Strategy Using Graphics Processing Units to Efficiently Investigate Gene-Gene Interactions in Genetic Association Studies

An Execution Model and Runtime For Heterogeneous Many-Core Systems

An execution model for adaptive load-balancing on multicore and multi-GPU systems

An Execution Model for OpenCL 2.0

An Experiment in Parallelizing the Fast Fourier Transform

An Experimental Distributed Visualization System for Petascale Computing
An experimental study of group-by and aggregation on CPU-GPU processors

An Experimental Study of SYCL Task Graph Parallelism for Large-Scale Machine Learning Workloads

An experimental study on performance portability of OpenCL kernels

An Explicit Algorithm for Porous Media Flow Simulation using GPUs

An exploration of CUDA and CBEA for a gravitational wave data-analysis application (Einstein@Home)

An exploration of CUDA and CBEA for a gravitational wave source-modelling application

An Exploration of OpenCL for a Numerical Relativity Application

An Exploration of OpenCL on Multiple Hardware Platforms for a Numerical Relativity Application

An Exploratory Study of High Performance Graphics Application Programming Interfaces

An extended GPU radiosity solver

An Extensible Component-based Approach to Simulation Systems on Heterogeneous Clusters

An Extensible Framework for Composing Stencils with Common Scientific Computing Patterns

An Extension of the StarSs Programming Model for Platforms with Multiple GPUs

An FPGA Accelerator for Molecular Dynamics Simulation Using OpenCL

An FPGA Implementation of Information Theoretic Visual-Saliency System and Its Optimization
An FPGA-based processing pipeline for high definition stereo video

An FPGA-based Torus Communication Network

An FPGA-specific algorithm for direct generation of multi-variate Gaussian random numbers
An hardware architecture for 3D object tracking and motion estimation

An HPC Benchmark Survey and Taxonomy for Characterization

An hybrid AES-256-GCM implementation for NEON CPU & CUDA GPU

An Image Enhancing Pattern-based Sparsity for Real-time Inference on Mobile Devices

An image-warping VR-architecture: design, implementation and applications

An implementation and its evaluation of password cracking tool parallelized on GPGPU
An implementation for quad-tree based solid object coloring using CUDA

An implementation of a reordering approach for increasing the product of diagonal entries in a sparse matrix

An Implementation of Coincidence Algorithm on Graphic Processing Units

An Implementation of Conflict-Free Offline Permutation on the GPU

An Implementation of Differential Evolution for Independent Tasks Scheduling on GPU

An implementation of level set based topology optimization using GPU

An Implementation of Real-Time Phased Array Radar Fundamental Functions on a DSP-Focused, High-Performance, Embedded Computing Platform

An implementation of tensor product patch smoothers on GPU

An Implementation of the Discontinuous Galerkin Method on Graphics Processing Units

An Implementation of the Smooth Particle Mesh Ewald Method on GPU Hardware
An implementation of the tile QR factorization for a GPU and multiple CPUs

An implicit multigrid solver for high-order compressible flow simulations on GPUs

An implicit Tensor-Mass solver on the GPU for soft bodies simulation

An Improved CUDA-Based Implementation of Differential Evolution on GPU

An Improved Image Segmentation Algorithm Based on GPU Parallel Computing

An improved implementation of Preconditioned Conjugate Gradient Method on GPU

An Improved Magma Gemm For Fermi Graphics Processing Units

An Improved Monte Carlo Ray Tracing for Large-Scale Rendering in Hadoop

An Improved Parallel Algorithm using GPU for Siting Observers on Terrain

An improved parallel contrast-aware halftoning

An Improved Parallel Implementation of 3D DRIE Simulation on GPU

An improved scheme of an interactive finite element model for 3D soft-tissue cutting and deformation

Titles: 100
open PDFs: 87
packages: 12
