high performance computing on graphics processing units: hgpu.org

Papers on hgpu.org (.txt-file)

Embedded Ensemble Propagation for Improving Performance, Portability and Scalability of Uncertainty Quantification on Emerging Computational Architectures

Embedded real-time stereo estimation via Semi-Global Matching on the GPU

Embedded Software Synthesis using Heterogeneous Dataflow Models

Embedding GPU Computations in Hadoop

Embedding OpenCL in C++ for Expressive GPU Programming

Embedding OpenCL in GHC Haskell

Embracing Heterogeneity: Parallel Programming for Changing Hardware

Emerging technology about GPGPU

EMMA: an AMR cosmological simulation code with radiative transfer

EmoNets: Multimodal deep learning approaches for emotion recognition in video

Empirical analysis of a parallel data mining algorithm on a graphic processor

Empirical performance modeling of GPU kernels using active learning

Employ Bump Mapping to Enrich the 3D NPR Image

Employing Directive Based Compression Solutions on Accelerators Global Memory under OpenACC

Employing GPU Accelerators for Efficient Enforcement of Data Integrity in Outsourced Data

Employing OpenCL as a Standard Hardware Abstraction in a Distributed Embedded System: A Case Study

Empower Sequence Labeling with Task-Aware Neural Language Model

Empowering Visual Categorization With the GPU

Empty Space Skipping and Occlusion Clipping for Texture-based Volume Rendering

Enabling a High Throughput Real Time Data Pipeline for a Large Radio Telescope Array with GPUs

Enabling active storage on parallel I/O software stacks

Enabling and Scaling Matrix Computations on Heterogeneous Multi-Core and Multi-GPU Systems

Enabling Computational Dynamics in Distributed Computing Environments Using a Heterogeneous Computing Template

Enabling CP2K Application for Exascale Computing with Accelerators using OpenACC and OpenCL

Enabling Data Movement and Computation Pipelining in Deep Learning Compiler

Enabling Development of OpenCL Applications on FPGA platforms

Enabling Efficient Online Profiling of Homogeneous and Heterogeneous Multicore Systems

Enabling Efficient Use of MPI and PGAS Programming Models on Heterogeneous Clusters with High Performance Interconnects

Enabling Energy-Efficient Analysis of Massive Neural Signals Using GPGPU

Enabling Energy-Efficient DNN Training on Hybrid GPU-FPGA Accelerators

Enabling Fast, Noncontiguous GPU Data Movement in Hybrid MPI+GPU Environments

Enabling full-speed random access to the entire memory on the A100 GPU

Enabling High Performance Computing in Cloud Infrastructure using rCUDA

Enabling High Performance Computing in Cloud Infrastructure using Virtualized GPUs

Enabling Inter-Machine Parallelism in High-Level Languages with SEJITS and MapReduce

Enabling multiple accelerator acceleration for Java/OpenMP

Enabling New Uses for GPUs

Enabling On-Device Smartphone GPU based Training: Lessons Learned

Enabling OpenCL on a Configurable, VLIW Chip-Multiprocessor

Enabling OpenMP Task Parallelism on Multi-FPGAs

Enabling OS Research by Inferring Interactions in the Black-Box GPU Stack

Enabling Quantum Computer Simulations on AMD GPUs: a HIP Backend for Google’s qsim

Enabling task-level scheduling on heterogeneous platforms

Enabling the use of Heterogeneous Computing for Bioinformatics

Enabling Traceability in an MDE Approach to Improve Performance of GPU Applications

Enabling Traceability in MDE to Improve Performance of GPU Applications

Encapsulated synchronization and load-balance in heterogeneous programming

Encrypting video and image streams using OpenCL code on-demand

Encrypting video streams using OpenCL code on-demand

End-to-end data reduction and hardware accelerated rendering techniques for visualizing time-varying non-uniform grid volume data

End-to-end Deep Learning of Optimization Heuristics

End-to-end Mapping in Heterogeneous Systems Using Graph Representation Learning

End-to-end Optimization of Machine Learning Prediction Queries

EnergonAI: An Inference System for 10-100 Billion Parameter Transformer Models

Energy Auto-tuning using the Polyhedral Approach

Energy conservation techniques for GPU computing

Energy Consumption of Algorithms for Solving the Compressible Navier-Stokes Equations on CPU’s, GPU’s and KNL’s

Energy consumption of Graphic Processing Units with respect to automotive use-cases

Energy Efficiency Analysis of GPUs

Energy Efficiency Benefits of Reducing the Voltage Guardband on the Kepler GPU Architecture

Energy efficiency of finite difference algorithms on multicore CPUs, GPUs, and Intel Xeon Phi processors

Energy efficiency of mixed precision iterative refinement methods using hybrid hardware platforms

Energy Efficiency Studies of Mont Blanc Applications

Energy efficiency vs. performance of the numerical solution of PDEs: an application study on a low-power ARM-based cluster

Energy efficient biomolecular simulations with FPGA-based reconfigurable computing

Energy Efficient Computing on Multi-core Processors: Vectorization and Compression Techniques

Energy Efficient Parallel K-Means Clustering for an Intel Hybrid Multi-Chip Package

Energy Evaluation for Applications with Different Thread Affinities on the Intel Xeon Phi

Energy Transfer Ray Tracing with OptiX

Energy- and cost-efficient Lattice-QCD computations using graphics processing units

Energy-aware metrics for benchmarking heterogeneous systems

Energy-aware Task Scheduling with Deadline Constraint in DVFS-enabled Heterogeneous Clusters

Energy-based Tuning of Convolutional Neural Networks on Multi-GPUs

Energy-efficient algorithms

Energy-Efficient Collective Reduce and Allreduce Operations on Distributed GPUs

Energy-efficient computing for extreme-scale science

Energy-efficient Computing on Distributed GPUs using Dynamic Parallelism and GPU-controlled Communication

Energy-Efficient Execution of Data-Parallel Applications on Heterogeneous Mobile Platforms

Energy-Efficient FPGA Implementation for Binomial Option Pricing Using OpenCL

Energy-efficient FPGA Implementation of the k-Nearest Neighbors Algorithm Using OpenCL

Energy-Efficient GPU Clusters Scheduling for Deep Learning

Energy-efficient mechanisms for managing thread context in throughput processors

Energy-optimized mapping of application to smartphone platform – A case study of mobile face recognition

Energy-saving techniques for low-power graphics processing unit

EngineCL: Usability and Performance in Heterogeneous Computing

Engineering a static verification tool for GPU kernels

Engineering Concurrent Software Guided by Statistical Performance Analysis

Engineering of Computer Vision Algorithms Using Evolutionary Algorithms

Enhanced implementation of the NTRUEncrypt algorithm using graphics cards

Enhanced molecular dynamics performance with a programmable graphics processor

Enhanced Parallel ILU(p)-based Preconditioners for Multi-core CPUs and GPUs – The Power(g)-pattern Method

Enhanced Parallel NegaMax Tree Search Algorithm on GPU

Brief statistics for this page

Titles: 100

Download open PDFs: 94

Packages: 10

