Papers on hgpu.org (.txt-file)

Efficient Spatial Anti-Aliasing Rendering for Line Joins on Vector Maps

Efficient Spatial Binning on the GPU

Efficient spectral and pseudospectral algorithms for 3D simulations of whistler-mode waves in a plasma

Efficient Stack-less BVH Traversal for Ray Tracing

Efficient Static and Dynamic Memory Management Techniques for Multi-GPU Systems

Efficient stream reduction on the GPU

Efficient Support for Matrix Computations on Heterogeneous Multi-core and Multi-GPU Architectures

Efficient Surface Reconstruction From Noisy Data Using Regularized Membrane Potentials

Efficient SVM Training Using Parallel Primal-Dual Interior Point Method on GPU

Efficient Synchronization Primitives for GPUs

Efficient Target and Application Specific Selection and Ordering of Compiler Passes

Efficient Triangle and Quadrilateral Clipping within Shaders

Efficient Two-Level Preconditioned Conjugate Gradient Method on the GPU

Efficient Use of In-Game Ray-Tracing Techniques

Efficient Video Compression via Content-Adaptive Super-Resolution

Efficient Virtual Shadow Maps for Many Lights

Efficient visual hull computation for real-time 3D reconstruction using CUDA

Efficient Volume Rendering in CUDA Path Tracer

Efficient Wave Propagation in Discontinuous Media and Complex Geometry for Many-core Architectures

Efficient Weighted Histogramming on GPUs with CUDA

Efficient Workload Balancing on Heterogeneous GPUs using Mixed-Integer Non-Linear Programming

Efficient XML Path Filtering Using GPUs

Efficient, High-Quality Bayer Demosaic Filtering on GPUs

EfficientBioAI: Making Bioimaging AI Models Efficient in Energy, Latency and Representation

Efficiently Computing Tensor Eigenvalues on a GPU

Efficiently GPU-accelerating long kernel convolutions in 3-D DIRECT TOF PET reconstruction via a kernel decomposition scheme

Efficiently Mapping the AES Encryption Algorithm on GPUs

Efficiently Processing Large Relational Joins on GPUs

Efficiently Training 7B LLM with 1 Million Sequence Length on 8 GPUs

Efficiently Using a CUDA-enabled GPU as Shared Resource

eGPU: A 750 MHz Class Soft GPGPU for FPGA

EIE: Efficient Inference Engine on Compressed Deep Neural Network

EigenCFA: accelerating flow analysis with GPUs

Eigentransport for efficient and accurate all-frequency relighting

Elastic deep learning in multi-tenant GPU cluster

Elastic pipeline: addressing GPU on-chip shared memory bank conflicts

Elastic stream cloud (ESC): A stream-oriented cloud computing platform for Rich Internet Application

Elastically Deformable Models based on the Finite Element Method Accelerated on Graphics Hardware using CUDA

ElastiFace: Matching and Blending Textured Faces

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

Electric polarizability of hadrons with overlap fermions on multi-GPUs

Electric potential and field calculation of charged BEM triangles and rectangles by Gaussian cubature

Electrical distribution grid visualization using programmable GPUs

Electrical-Level Attacks on CPUs, FPGAs, and GPUs: Survey and Implications in the Heterogeneous Era

Electromagnetic Computation and Visualization of Transmission Particle Model and its Simulation Based on GPU

Electromagnetic effects in capacitively coupled plasma simulated with a PIC-MCC darwin code

Electromagnetic transient simulation of large-scale electrical power networks using graphics processing units

Elementary functions: towards automatically generated, efficient, and vectorizable implementations

Elevation-based MRF stereo implemented in real-time on a GPU

EM+TV for Reconstruction of Cone-beam CT with Curved Detectors using GPU

Embedded Ensemble Propagation for Improving Performance, Portability and Scalability of Uncertainty Quantification on Emerging Computational Architectures

Embedded real-time stereo estimation via Semi-Global Matching on the GPU

Embedded Software Synthesis using Heterogeneous Dataflow Models

Embedding GPU Computations in Hadoop

Embedding OpenCL in C++ for Expressive GPU Programming

Embedding OpenCL in GHC Haskell

Embracing Heterogeneity: Parallel Programming for Changing Hardware

Emerging technology about GPGPU

EMMA: an AMR cosmological simulation code with radiative transfer

EmoNets: Multimodal deep learning approaches for emotion recognition in video

Empirical analysis of a parallel data mining algorithm on a graphic processor

Empirical performance modeling of GPU kernels using active learning

Employ Bump Mapping to Enrich the 3D NPR Image

Employing Directive Based Compression Solutions on Accelerators Global Memory under OpenACC

Employing GPU Accelerators for Efficient Enforcement of Data Integrity in Outsourced Data

Employing OpenCL as a Standard Hardware Abstraction in a Distributed Embedded System: A Case Study

Empower Sequence Labeling with Task-Aware Neural Language Model

Empowering Visual Categorization With the GPU

Empty Space Skipping and Occlusion Clipping for Texture-based Volume Rendering

Enabling a High Throughput Real Time Data Pipeline for a Large Radio Telescope Array with GPUs

Enabling active storage on parallel I/O software stacks

Enabling and Scaling Matrix Computations on Heterogeneous Multi-Core and Multi-GPU Systems

Enabling Computational Dynamics in Distributed Computing Environments Using a Heterogeneous Computing Template

Enabling CP2K Application for Exascale Computing with Accelerators using OpenACC and OpenCL

Enabling Data Movement and Computation Pipelining in Deep Learning Compiler

Enabling Development of OpenCL Applications on FPGA platforms

Enabling Efficient Online Profiling of Homogeneous and Heterogeneous Multicore Systems

Enabling Efficient Use of MPI and PGAS Programming Models on Heterogeneous Clusters with High Performance Interconnects

Enabling Energy-Efficient Analysis of Massive Neural Signals Using GPGPU

Enabling Energy-Efficient DNN Training on Hybrid GPU-FPGA Accelerators

Enabling Fast, Noncontiguous GPU Data Movement in Hybrid MPI+GPU Environments

Enabling full-speed random access to the entire memory on the A100 GPU

Enabling High Performance Computing in Cloud Infrastructure using rCUDA

Enabling High Performance Computing in Cloud Infrastructure using Virtualized GPUs

Enabling Inter-Machine Parallelism in High-Level Languages with SEJITS and MapReduce

Enabling multiple accelerator acceleration for Java/OpenMP

Enabling On-Device Smartphone GPU based Training: Lessons Learned

Enabling OpenCL on a Configurable, VLIW Chip-Multiprocessor

Enabling OpenMP Task Parallelism on Multi-FPGAs

Enabling OS Research by Inferring Interactions in the Black-Box GPU Stack

Enabling Profile Guided Optimizations (PGO) for Graphics

Enabling Quantum Computer Simulations on AMD GPUs: a HIP Backend for Google’s qsim

Enabling task-level scheduling on heterogeneous platforms

Enabling the use of Heterogeneous Computing for Bioinformatics

Enabling Traceability in an MDE Approach to Improve Performance of GPU Applications

Enabling Traceability in MDE to Improve Performance of GPU Applications

Encapsulated synchronization and load-balance in heterogeneous programming

Encrypting video and image streams using OpenCL code on-demand

Encrypting video streams using OpenCL code on-demand

Titles: 100
open PDFs: 96
packages: 11
