Papers on hgpu.org (.txt-file)
Dynamic loop vectorization for executing OpenCL kernels on CPUs
Dynamic Memory Allocation for OpenCL
Dynamic Orchestration of Massively Data Parallel Execution
Dynamic Overset Grid Computations for CFD Applications on Graphics Processing Units
Dynamic Parallelism in GPU Optimized Barnes Hut Trees for Molecular Dynamics Simulations
Dynamic particle coupling for gpu-based fluid simulation
Dynamic Partitioning-based JPEG Decompression on Heterogeneous Multicore Architectures
Dynamic Programming with CUDA – Part II
Dynamic real-time 4D cardiac MDCT image display using GPU-accelerated volume rendering
Dynamic Sampling and Rendering of Algebraic Point Set Surfaces
Dynamic Scheduling for Large-Scale Distributed-Memory Ray Tracing
Dynamic Scheduling for Work Agglomeration on Heterogeneous Clusters
Dynamic scheduling Monte-Carlo framework for multi-accelerator heterogeneous clusters
Dynamic Scheduling of Parallel Code for Heterogeneous Systems
Dynamic Self-Rescheduling of Tasks over a Heterogeneous Platform
Dynamic Shader Generation for Flexible Multi-Volume Visualization
Dynamic Sparse-Matrix Allocation on GPUs
Dynamic Task Parallelism with a GPU Work-Stealing Runtime System
Dynamic Task-Scheduling and Resource Management for GPU Accelerators in Medical Imaging
Dynamic Translation of Runtime Environments for Heterogeneous Computing
Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow
Dynamic warp formation: Efficient MIMD control flow on SIMD graphics hardware
Dynamic Warp Resizing in High-Performance SIMT
Dynamic Workload Division in GPU-CPU Heterogeneous Systems
Dynamical heterogeneities as fingerprints of a backbone structure in Potts models
Dynamical simulations of extrasolar planetary systems with debris disks using a GPU accelerated N-body code
Dynamically Finding Optimal Kernel Launch Parameters for CUDA Programs
Dynamically Managed Data for CPU-GPU Architectures
Dynamically scheduled Cholesky factorization on multicore architectures with GPU accelerators
Dynamically tuned push-relabel algorithm for the maximum flow problem on CPU-GPU-Hybrid platforms
DynaProg for Scala: A Scala DSL for Dynamic Programming on CPU and GPU
DySel: Lightweight Dynamic Selection for Kernel-based Data-parallel Programming Model
E-MOGA: A General Purpose Platform for Multi Objective Genetic Algorithm running on CUDA
E(A+M)PEC – An OpenCL Atomic and Molecular Plasma Emission Code For Interstellar Medium Simulations
E2C: A Visual Simulator to Reinforce Education of Heterogeneous Computing Systems
Early Application Experiences on a Modern GPU-Accelerated Arm-based HPC Platform
Early evaluation of directive-based GPU programming models for productive exascale computing
Early Experiences in Running Many-Task Computing Workloads on GPGPUs
Early Experiences Migrating CUDA codes to oneAPI
Early Experiences Running the 3D Stencil Jacobi Method in Intel Xeon Phi
Early experiences with the intel many integrated cores accelerated computing technology
Early Experiences With The OpenMP Accelerator Model
Early Results of Deep Learning on the Stampede2 Supercomputer
EASEA parallelization of tree-based Genetic Programming
EASEA: A Generic Optimization Tool for GPU Machines in Asynchronous Island Model
EASEA: specification and execution of evolutionary algorithms on GPGPU
Easy and Efficient Agent-based Simulations with the OpenABL Language and Compiler
Easy and Efficient Transformer: Scalable Inference Solution For large NLP model
Easy-to-Use On-the-Fly Binary Program Acceleration on Many-Cores
EASYPAP: a Framework for Learning Parallel Programming
EasyPBR: A Lightweight Physically-Based Renderer
Ebb: A DSL for Physical Simulation on CPUs and GPUs
eccCL: parallelized GPU implementation of Ensemble Classifier Chains
ECM modeling and performance tuning of SpMV and Lattice QCD on A64FX
EcoG: A Power-Efficient GPU Cluster Architecture for Scientific Computing
Edge AI for Internet of Energy: Challenges and Perspectives
Edge coloring in unstructured CFD codes
Edge Stream Oriented LDPC Decoding
EDSSA: An Encoder-Decoder Semantic Segmentation Networks Accelerator on OpenCL-Based FPGA Platform
Effect And Analysis of Elastic Fidelity Computing On GPUs
Effect of GPU Communication-Hiding for SpMV Using OpenACC
Effective Dynamic Scheduling on Heterogeneous Multi/Manycore Desktop Platforms
Effective Extensible Programming: Unleashing Julia on GPUs
Effective GPU Sharing Under Compiler Guidance
Effective GPU Strategies for LU Decomposition
Effective Mapping of Grammatical Evolution to CUDA Hardware Model
Effective Multi-Modal Retrieval based on Stacked Auto-Encoders
Effective Parallelization of Non-bonded Interactions Kernel for Virtual Screening on GPUs
Effective Sparse Matrix Representation for the GPU Architectures
Effectiveness of GPGPU for Solving the Magnetohydrodynamics Equations Using the CIP-MOCCT Method
Effectiveness of program transformations and compilers for directive-based GPU programming models
Effects of Compiler Optimizations in OpenMP to CUDA Translation
Effects of compression on data intensive algorithms
Effects of Concurrency Techniques and Algorithm Performance: A Comparative Analysis of Single-Threaded, Multi-Threaded, and GPGPU Programming Techniques
Effects of Dynamic Voltage and Frequency Scaling on a K20 GPU
Effects of Easy Hybrid Parallelization with CUDA for Numerical-Atomic-Orbital Density Functional Theory Calculation
Effects of GPU and CPU Loads on Performance of CUDA Applications
Effects of OpenCL-Based Parallelization Methods on Explicit Numerical Methods to Solve the Heat Equation
EFFEX: an embedded processor for computer vision based feature extraction
Efficacy of Images Versus Data Buffers: Optimizing Interactive Applications Utilizing OpenCL for Scientific Visualization
Efficient multiple pass, multiple output algorithms on the GPU
Efficiency analysis of a physical problem: Different parallel computational approaches for a dynamical integrator evolution
Efficiency Considerations of Cauchy Reed-Solomon Implementations on Accelerator and Multi-Core Platforms
Efficiency of general Krylov methods on GPUs – An experimental study
Efficiency of Parallelization of Neural Network Algorithm on Graphic Cards
Efficiency of the energy transfer in the Fenna-Matthews-Olson complex using hierarchical equations on graphics processing units
Efficiency without Tears: Securing Multilingual Programs with TRINITY
Efficient 2D Software Rendering
Efficient 3D Isotropic Volume Reconstruction Based On 2D Localized Ultrasound Images
Efficient 3D reconstruction of large-scale urban environments from street-level video
Efficient Acceleration of Mutual Information Computation for Nonrigid Registration using CUDA
Efficient Algorithm for RSA Text Encryption Using CUDA-C
Efficient Algorithms for Sorting on GPUs
Efficient algorithms for the realistic simulation of fluids
Efficient all-against-all protein similarity matrix computation using OpenCL
Efficient and Accurate Sound Propagation Using Adaptive Rectangular Decomposition
Efficient and Cryptographically Secure Generation of Chaotic Pseudorandom Numbers on GPU
Efficient and Good Delaunay Meshes From Random Points
Titles: 100
open PDFs: 94
packages: 24