high performance computing on graphics processing units: hgpu.org

Posts

Jul, 12

GE-SpMM: General-purpose Sparse Matrix-Matrix Multiplication on GPUs for Graph Neural Networks

Graph Neural Networks (GNNs) have achieved significant improvements in various domains. Sparse Matrix-Matrix multiplication (SpMM) is a fundamental operator in GNNs, which performs a multiplication between a sparse matrix and a dense matrix. Accelerating SpMM on parallel hardware like GPUs can face the following challenges: From the GNN application perspective, the compatibility needs to be […]

CUDA

Jul, 5

SYCL-Bench: A Versatile Cross-Platform Benchmark Suite for Heterogeneous Computing

The SYCL standard promises to enable high productivity in heterogeneous programming of a broad range of parallel devices, including multicore CPUs, GPUs, and FPGAs. Its modern and expressive C++ API design, as well as flexible task graph execution model give rise to ample optimization opportunities at run-time, such as the overlapping of data transfers and […]

OpenCL

Jul, 5

Studies on CUDA Offloading for Real-Time Simulation and Visualization

The Graphics Processing Unit (GPU) is a co-processor designed to aid the Central Processing Unit (CPU) for rendering 3D graphics. The prompt development of these graphics chips due to the popularity of games and media design helped the GPU to evolve its ubiquitous parallel architecture. The programmability of these devices increased with the introduction of […]

CUDA

Jul, 5

Efficient Matrix Factorization on Heterogeneous CPU-GPU Systems

Matrix Factorization (MF) has been widely applied in machine learning and data mining. A large number of algorithms have been studied to factorize matrices. Among them, stochastic gradient descent (SGD) is a commonly used method. Heterogeneous systems with multi-core CPUs and GPUs have become more and more promising recently due to the prevalence of GPUs […]

CUDA

Jul, 5

Adaptive SpMV/SpMSpV on GPUs for Input Vectors of Varied Sparsity

Despite numerous efforts for optimizing the performance of Sparse Matrix and Vector Multiplication (SpMV) on modern hardware architectures, few works are done to its sparse counterpart, Sparse Matrix and Sparse Vector Multiplication (SpMSpV), not to mention dealing with input vectors of varied sparsity. The key challenge is that depending on the sparsity levels, distribution of […]

CUDA

Jul, 5

Ginkgo: A Modern Linear Operator Algebra Framework for High Performance Computing

In this paper, we present Ginkgo, a modern C++ math library for scientific high performance computing. While classical linear algebra libraries act on matrix and vector objects, Ginkgo’s design principle abstracts all functionality as "linear operators", motivating the notation of a "linear operator algebra library". Ginkgo’s current focus is oriented towards providing sparse linear algebra […]

CUDA

Jun, 30

The Fifth International Workshop on GPU Computing and AI (GCA), 2020

==================================================== The Fifth International Workshop on GPU Computing and AI (GCA’20) to be held in conjunction with The Eighth International Symposium on Computing and Networking (CANDAR’20),Naha, Okinawa, Japan, November 24-27, 2020 ==================================================== Special announcement regarding COVID-19 situation– Although we are still working with the possibility of having physical meetings for CANDAR 2020 as planned, the […]

Jun, 30

2nd International Conference on Frontiers of Intelligent Manufacturing and Automation (CFIMA’21), 2021

★ 2021 2nd International Conference on Frontiers of Intelligent Manufacturing and Automation (CFIMA 2021) — Ei Compendex & Scopus — Call for paperJanuary 13-15, 2021｜Shenzhen, China ★ Venue: Hilton Garden Inn Shenzhen Bao’an, Shenzhen, ChinaJust off G107, the Hilton Garden Inn Shenzhen Bao’an is inside the Xihuicheng Shopping Plaza in the center of the Bao’an […]

Jun, 30

International Conference on Cyber Physical Systems and IoT (CPSIOT’20), 2020

★ 2020 International Conference on Cyber Physical Systems and IoT(CPSIOT 2020) — Ei Compendex & Scopus — Call for papersDecember 14-16, 2020｜Bangkok, Thailand ★ CPSIOT 2020 presents researchers, engineers, and academics with an unprecedented opportunity to associate and interact with some of the foremost experts in the field of Cyber Physical Systems and IoT from […]

Jun, 30

International Conference on Wireless Networks and Embedded Systems (ICWNES’20), 2020

★ 2020 International Conference on Wireless Networks and Embedded Systems (ICWNES 2020) — Ei Compendex & Scopus — Call for paperDecember 14-16, 2020｜Bangkok, Thailand ★ Researchers, scientists, engineers and industry professionals will join together this year at ICWNES 2020, where the latest research will be unveiled and groundbreaking research projects will be presented. The field […]

Jun, 30

International Joint Conference on Signals, Systems and Computers (CSSC’20), 2020

★ 2020 International Joint Conference on Signals, Systems and Computers (CSSC 2020) — Ei Compendex & Scopus — Call for papersDecember 14-16, 2020｜Bangkok, Thailand ★ 2020 International Joint Conference on Signals, Systems and Computers (CSSC 2020) will be held in Bangkok, Thailand. From keynote lectures by internationally recognized academics and leading experts, to a forum […]

Jun, 28

Performance benchmarking of deep learning framework on Intel Xeon Phi

With the success of deep learning (DL) methods in diverse application domains, several deep learning software frameworks have been proposed to facilitate the usage of these methods. By knowing the frameworks which are employed in big data analysis, the analysis process will be more efficient in terms of time and accuracy. Thus, benchmarking DL software […]

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

* * *

high performance computing on graphics processing units: hgpu.org

Posts

GE-SpMM: General-purpose Sparse Matrix-Matrix Multiplication on GPUs for Graph Neural Networks

SYCL-Bench: A Versatile Cross-Platform Benchmark Suite for Heterogeneous Computing

Studies on CUDA Offloading for Real-Time Simulation and Visualization

Efficient Matrix Factorization on Heterogeneous CPU-GPU Systems

Adaptive SpMV/SpMSpV on GPUs for Input Vectors of Varied Sparsity

Ginkgo: A Modern Linear Operator Algebra Framework for High Performance Computing

The Fifth International Workshop on GPU Computing and AI (GCA), 2020

2nd International Conference on Frontiers of Intelligent Manufacturing and Automation (CFIMA’21), 2021

International Conference on Cyber Physical Systems and IoT (CPSIOT’20), 2020

International Conference on Wireless Networks and Embedded Systems (ICWNES’20), 2020

International Joint Conference on Signals, Systems and Computers (CSSC’20), 2020

Performance benchmarking of deep learning framework on Intel Xeon Phi

Recent source codes

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluser of smartphones for edge computing application using TensorFlow

SYCL Container

Most viewed papers (last 30 days)