Papers on hgpu.org (.txt-file)
Efficient SIMD Vectorization for Hashing in OpenCL
Efficient similarity search on multimedia databases
Efficient simulation of agent-based models on multi-GPU and multi-core clusters
Efficient Simulation of Fluid Flow and Transport in Heterogeneous Media Using Graphics Processing Units (GPUs)
Efficient simulation of large-scale spiking neural networks using CUDA graphics processors
Efficient Simulation of Ocean and Land Scenes Based on Digital Earth
Efficient Simulation Techniques for Large-Scale Applications
Efficient simulations of long wave propagation and runup using a LBM approach on GPGPU hardware
Efficient softmax approximation for GPUs
Efficient Sparse Matrix-Vector Multiplication on CUDA
Efficient Sparse Matrix-Vector Multiplication on GPUs using the CSR Storage Format
Efficient Sparse Matrix-Vector Multiplication on x86-Based Many-Core Processors
Efficient sparse voxel octrees
Efficient Sparse Voxel Octrees – Analysis, Extensions, and Implementation
Efficient Sparse-Dense Matrix-Matrix Multiplication on GPUs Using the Customized Sparse Storage Format
Efficient Spatial Anti-Aliasing Rendering for Line Joins on Vector Maps
Efficient Spatial Binning on the GPU
Efficient spectral and pseudospectral algorithms for 3D simulations of whistler-mode waves in a plasma
Efficient Stack-less BVH Traversal for Ray Tracing
Efficient Static and Dynamic Memory Management Techniques for Multi-GPU Systems
Efficient stream reduction on the GPU
Efficient Support for Matrix Computations on Heterogeneous Multi-core and Multi-GPU Architectures
Efficient Surface Reconstruction From Noisy Data Using Regularized Membrane Potentials
Efficient SVM Training Using Parallel Primal-Dual Interior Point Method on GPU
Efficient Synchronization Primitives for GPUs
Efficient Target and Application Specific Selection and Ordering of Compiler Passes
Efficient Triangle and Quadrilateral Clipping within Shaders
Efficient Two-Level Preconditioned Conjugate Gradient Method on the GPU
Efficient Use of In-Game Ray-Tracing Techniques
Efficient Video Compression via Content-Adaptive Super-Resolution
Efficient Virtual Shadow Maps for Many Lights
Efficient visual hull computation for real-time 3D reconstruction using CUDA
Efficient Volume Rendering in CUDA Path Tracer
Efficient Wave Propagation in Discontinuous Media and Complex Geometry for Many-core Architectures
Efficient Weighted Histogramming on GPUs with CUDA
Efficient Workload Balancing on Heterogeneous GPUs using Mixed-Integer Non-Linear Programming
Efficient XML Path Filtering Using GPUs
Efficient, High-Quality Bayer Demosaic Filtering on GPUs
EfficientBioAI: Making Bioimaging AI Models Efficient in Energy, Latency and Representation
Efficiently Computing Tensor Eigenvalues on a GPU
Efficiently GPU-accelerating long kernel convolutions in 3-D DIRECT TOF PET reconstruction via a kernel decomposition scheme
Efficiently Mapping the AES Encryption Algorithm on GPUs
Efficiently Processing Large Relational Joins on GPUs
Efficiently Training 7B LLM with 1 Million Sequence Length on 8 GPUs
Efficiently Using a CUDA-enabled GPU as Shared Resource
eGPU: A 750 MHz Class Soft GPGPU for FPGA
EIE: Efficient Inference Engine on Compressed Deep Neural Network
EigenCFA: accelerating flow analysis with GPUs
Eigentransport for efficient and accurate all-frequency relighting
Elastic deep learning in multi-tenant GPU cluster
Elastic pipeline: addressing GPU on-chip shared memory bank conflicts
Elastic stream cloud (ESC): A stream-oriented cloud computing platform for Rich Internet Application
Elastically Deformable Models based on the Finite Element Method Accelerated on Graphics Hardware using CUDA
ElastiFace: Matching and Blending Textured Faces
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
Electric polarizability of hadrons with overlap fermions on multi-GPUs
Electric potential and field calculation of charged BEM triangles and rectangles by Gaussian cubature
Electrical distribution grid visualization using programmable GPUs
Electrical-Level Attacks on CPUs, FPGAs, and GPUs: Survey and Implications in the Heterogeneous Era
Electromagnetic Computation and Visualization of Transmission Particle Model and its Simulation Based on GPU
Electromagnetic effects in capacitively coupled plasma simulated with a PIC-MCC Darwin code
Electromagnetic transient simulation of large-scale electrical power networks using graphics processing units
Elementary functions: towards automatically generated, efficient, and vectorizable implementations
Elevation-based MRF stereo implemented in real-time on a GPU
EM+TV for Reconstruction of Cone-beam CT with Curved Detectors using GPU
Embedded Ensemble Propagation for Improving Performance, Portability and Scalability of Uncertainty Quantification on Emerging Computational Architectures
Embedded real-time stereo estimation via Semi-Global Matching on the GPU
Embedded Software Synthesis using Heterogeneous Dataflow Models
Embedding GPU Computations in Hadoop
Embedding OpenCL in C++ for Expressive GPU Programming
Embedding OpenCL in GHC Haskell
Embracing Heterogeneity: Parallel Programming for Changing Hardware
Emerging technology about GPGPU
EMMA: an AMR cosmological simulation code with radiative transfer
EmoNets: Multimodal deep learning approaches for emotion recognition in video
Empirical analysis of a parallel data mining algorithm on a graphic processor
Empirical performance modeling of GPU kernels using active learning
Employ Bump Mapping to Enrich the 3D NPR Image
Employing Directive Based Compression Solutions on Accelerators Global Memory under OpenACC
Employing GPU Accelerators for Efficient Enforcement of Data Integrity in Outsourced Data
Employing OpenCL as a Standard Hardware Abstraction in a Distributed Embedded System: A Case Study
Empower Sequence Labeling with Task-Aware Neural Language Model
Empowering Visual Categorization With the GPU
Empty Space Skipping and Occlusion Clipping for Texture-based Volume Rendering
Enabling a High Throughput Real Time Data Pipeline for a Large Radio Telescope Array with GPUs
Enabling active storage on parallel I/O software stacks
Enabling and Scaling Matrix Computations on Heterogeneous Multi-Core and Multi-GPU Systems
Enabling Computational Dynamics in Distributed Computing Environments Using a Heterogeneous Computing Template
Enabling CP2K Application for Exascale Computing with Accelerators using OpenACC and OpenCL
Enabling Data Movement and Computation Pipelining in Deep Learning Compiler
Enabling Development of OpenCL Applications on FPGA platforms
Enabling Efficient Online Profiling of Homogeneous and Heterogeneous Multicore Systems
Enabling Efficient Use of MPI and PGAS Programming Models on Heterogeneous Clusters with High Performance Interconnects
Enabling Energy-Efficient Analysis of Massive Neural Signals Using GPGPU
Enabling Energy-Efficient DNN Training on Hybrid GPU-FPGA Accelerators
Enabling Fast, Noncontiguous GPU Data Movement in Hybrid MPI+GPU Environments
Enabling full-speed random access to the entire memory on the A100 GPU
Enabling High Performance Computing in Cloud Infrastructure using rCUDA
Enabling High Performance Computing in Cloud Infrastructure using Virtualized GPUs
Enabling Inter-Machine Parallelism in High-Level Languages with SEJITS and MapReduce
Titles: 100
open PDFs: 96
packages: 14