Papers on hgpu.org (.txt-file)

Dynamic warp formation: Efficient MIMD control flow on SIMD graphics hardware

Dynamic Warp Resizing in High-Performance SIMT

Dynamic Workload Division in GPU-CPU Heterogeneous Systems

Dynamical heterogeneities as fingerprints of a backbone structure in Potts models

Dynamical simulations of extrasolar planetary systems with debris disks using a GPU accelerated N-body code

Dynamically Finding Optimal Kernel Launch Parameters for CUDA Programs

Dynamically Managed Data for CPU-GPU Architectures

Dynamically scheduled Cholesky factorization on multicore architectures with GPU accelerators

Dynamically tuned push-relabel algorithm for the maximum flow problem on CPU-GPU-Hybrid platforms

DynaProg for Scala: A Scala DSL for Dynamic Programming on CPU and GPU

DySel: Lightweight Dynamic Selection for Kernel-based Data-parallel Programming Model

E-MOGA: A General Purpose Platform for Multi Objective Genetic Algorithm running on CUDA

E(A+M)PEC – An OpenCL Atomic and Molecular Plasma Emission Code For Interstellar Medium Simulations

E2C: A Visual Simulator to Reinforce Education of Heterogeneous Computing Systems

Early Application Experiences on a Modern GPU-Accelerated Arm-based HPC Platform

Early evaluation of directive-based GPU programming models for productive exascale computing

Early Experiences in Running Many-Task Computing Workloads on GPGPUs

Early Experiences Migrating CUDA codes to oneAPI

Early Experiences Running the 3D Stencil Jacobi Method in Intel Xeon Phi

Early experiences with the Intel Many Integrated Cores accelerated computing technology

Early Experiences With The OpenMP Accelerator Model

Early Results of Deep Learning on the Stampede2 Supercomputer

EASEA parallelization of tree-based Genetic Programming

EASEA: A Generic Optimization Tool for GPU Machines in Asynchronous Island Model

EASEA: specification and execution of evolutionary algorithms on GPGPU

Easy and Efficient Agent-based Simulations with the OpenABL Language and Compiler

Easy and Efficient Transformer: Scalable Inference Solution For large NLP model

Easy-to-Use On-the-Fly Binary Program Acceleration on Many-Cores

EASYPAP: a Framework for Learning Parallel Programming

EasyPBR: A Lightweight Physically-Based Renderer

Ebb: A DSL for Physical Simulation on CPUs and GPUs

eccCL: parallelized GPU implementation of Ensemble Classifier Chains

ECM modeling and performance tuning of SpMV and Lattice QCD on A64FX

EcoG: A Power-Efficient GPU Cluster Architecture for Scientific Computing

Edge AI for Internet of Energy: Challenges and Perspectives

Edge coloring in unstructured CFD codes

Edge Stream Oriented LDPC Decoding

Edify 3D: Scalable High-Quality 3D Asset Generation

EDSSA: An Encoder-Decoder Semantic Segmentation Networks Accelerator on OpenCL-Based FPGA Platform

Effect And Analysis of Elastic Fidelity Computing On GPUs

Effect of GPU Communication-Hiding for SpMV Using OpenACC

Effective Dynamic Scheduling on Heterogeneous Multi/Manycore Desktop Platforms

Effective Extensible Programming: Unleashing Julia on GPUs

Effective GPU Sharing Under Compiler Guidance

Effective GPU Strategies for LU Decomposition

Effective Mapping of Grammatical Evolution to CUDA Hardware Model

Effective Multi-Modal Retrieval based on Stacked Auto-Encoders

Effective Parallelization of Non-bonded Interactions Kernel for Virtual Screening on GPUs

Effective Sparse Matrix Representation for the GPU Architectures

Effectiveness of GPGPU for Solving the Magnetohydrodynamics Equations Using the CIP-MOCCT Method

Effectiveness of program transformations and compilers for directive-based GPU programming models

Effects of Compiler Optimizations in OpenMP to CUDA Translation

Effects of compression on data intensive algorithms

Effects of Concurrency Techniques and Algorithm Performance: A Comparative Analysis of Single-Threaded, Multi-Threaded, and GPGPU Programming Techniques

Effects of Dynamic Voltage and Frequency Scaling on a K20 GPU

Effects of Easy Hybrid Parallelization with CUDA for Numerical-Atomic-Orbital Density Functional Theory Calculation

Effects of GPU and CPU Loads on Performance of CUDA Applications

Effects of OpenCL-Based Parallelization Methods on Explicit Numerical Methods to Solve the Heat Equation

EFFEX: an embedded processor for computer vision based feature extraction

Efficacy of Images Versus Data Buffers: Optimizing Interactive Applications Utilizing OpenCL for Scientific Visualization

Efficient multiple pass, multiple output algorithms on the GPU

Efficiency analysis of a physical problem: Different parallel computational approaches for a dynamical integrator evolution

Efficiency Considerations of Cauchy Reed-Solomon Implementations on Accelerator and Multi-Core Platforms

Efficiency of general Krylov methods on GPUs – An experimental study

Efficiency of Parallelization of Neural Network Algorithm on Graphic Cards

Efficiency of the energy transfer in the Fenna-Matthews-Olson complex using hierarchical equations on graphics processing units

Efficiency without Tears: Securing Multilingual Programs with TRINITY

Efficient 2D Software Rendering

Efficient 3D Isotropic Volume Reconstruction Based On 2D Localized Ultrasound Images

Efficient 3D reconstruction of large-scale urban environments from street-level video

Efficient Acceleration of Mutual Information Computation for Nonrigid Registration using CUDA

Efficient Algorithm for RSA Text Encryption Using CUDA-C

Efficient Algorithms for Sorting on GPUs

Efficient algorithms for the realistic simulation of fluids

Efficient all-against-all protein similarity matrix computation using OpenCL

Efficient allocation of image recognition and LLM tasks on multi-GPU system

Efficient and Accurate Sound Propagation Using Adaptive Rectangular Decomposition

Efficient and Cryptographically Secure Generation of Chaotic Pseudorandom Numbers on GPU

Efficient and Good Delaunay Meshes From Random Points

Efficient and High-quality Sparse Graph Coloring on the GPU

Efficient and portable acceleration of quantum chemical many-body methods in mixed floating point precision using OpenACC compiler directives

Efficient and portable multi-tasking for heterogeneous systems

Efficient and Quality Contouring Algorithms on the GPU

Efficient and Scalable k-Means on GPUs

Efficient and Scalable Parallel Zonal Statistics on Large-Scale Species Occurrence Data on GPUs

Efficient Approaches for GEMM Acceleration on Leading AI-Optimized FPGAs

Efficient Approximate Visibility of Point Sets on the GPU

Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores

Efficient Bayesian inference in stochastic chemical kinetic models using graphical processing units

Efficient Bayesian multi-view deconvolution

Efficient Calculation of Pairwise Nonbonded Forces

Efficient Canny Edge Detection Using a GPU

Efficient code generation for hardware accelerators by refining partially specified implementation

Efficient Collision Detection and Physics-Based Deformation for Haptic Simulation with Local Spherical Hash

Efficient Communications in Training Large Scale Neural Networks

Efficient compilation of fine-grained SPMD-threaded programs for multicore CPUs

Efficient computation of condition estimates for linear least squares problems

Titles: 100
open PDFs: 92
packages: 27
