Posts

Feb, 6

SZx: an Ultra-fast Error-bounded Lossy Compressor for Scientific Datasets

Today’s scientific high-performance computing (HPC) applications and advanced instruments are producing vast volumes of data across a wide range of domains, which places a serious burden on data transfer and storage. Error-bounded lossy compression has been developed and widely used in the scientific community, because not only can it significantly reduce the data volumes but […]
Feb, 6

Porting OpenACC to OpenMP on heterogeneous systems

This documentation is designed for beginners in Graphics Processing Unit (GPU) programming who want to get familiar with the OpenACC and OpenMP offloading models. Here we present an overview of these two programming models as well as of GPU architectures. Specifically, we provide some insights into the functionality of these models and perform experiments involving different […]
Feb, 6

GC3: An Optimizing Compiler for GPU Collective Communication

Machine learning models made up of millions or billions of parameters are often trained and served on large multi-GPU systems. As models grow in size and execute on more GPUs, the collective communications used in these applications become a bottleneck. Custom collective algorithms optimized for both particular network topologies and application-specific communication patterns can […]
Jan, 30

Teaching Parallel Programming in Containers: Virtualization of a Heterogeneous Local Infrastructure

Providing parallel programming education is an emerging challenge that requires teaching approaches to further the learning process and a complex infrastructure to provide a suitable environment for practical laboratory classes. Failing to prioritize parallel programming requirements in the education of future computing professionals can lead to a significant training gap, negatively impacting the efficient use of current […]
Jan, 30

Performance prediction of deep learning applications training in GPU as a service systems

Data analysts predict that the GPU as a Service (GPUaaS) market will grow from US$700 million in 2019 to US$7 billion in 2025, with a compound annual growth rate of over 38%, to support 3D models, animated video processing, and gaming. GPUaaS adoption will also be boosted by the use of graphics processing units (GPUs) […]
Jan, 30

GenGNN: A Generic FPGA Framework for Graph Neural Network Acceleration

Graph neural networks (GNNs) have recently exploded in popularity thanks to their broad applicability to ubiquitous graph-related problems such as quantum chemistry, drug discovery, and high energy physics. However, meeting demand for novel GNN models and fast inference simultaneously is challenging because of the gap between the difficulty in developing efficient FPGA accelerators and the […]
Jan, 30

Bit-GraphBLAS: Bit-Level Optimizations of Matrix-Centric Graph Processing on GPU

In a general graph data structure like an adjacency matrix, when edges are homogeneous, the connectivity of two nodes can be sufficiently represented using a single bit. This insight has, however, not yet been adequately exploited by the existing matrix-centric graph processing frameworks. This work fills the void by systematically exploring the bit-level representation of […]
Jan, 30

Optimizing Huffman Decoding for Error-Bounded Lossy Compression on GPUs

More and more HPC applications require fast and effective compression techniques to handle large volumes of data in storage and transmission. Not only do these applications need to compress the data effectively during simulation, but they also need to perform decompression efficiently for post hoc analysis. SZ is an error-bounded lossy compressor for scientific data, […]
Jan, 23

A tool set for random number generation on GPUs in R

We introduce the R package clrng, which leverages the gpuR package and can generate random numbers in parallel on a Graphics Processing Unit (GPU) with the clRNG (OpenCL) library. Parallel processing with GPUs can speed up computationally intensive tasks, which, when combined with R, can largely mitigate R’s downsides in terms of […]
Jan, 23

Reusing Auto-Schedules for Efficient DNN Compilation

Auto-scheduling is a process where a search algorithm automatically explores candidate schedules (program transformations) for a given tensor program on a given hardware platform to improve its performance. However, this can be a very time-consuming process, depending on the complexity of the tensor program and the capacity of the target device, with often many thousands […]
Jan, 23

Multi-hetero Acceleration by GPU and FPGA for Astrophysics Simulation on oneAPI Environment

GPU (Graphics Processing Unit) computing is one of the most popular acceleration methods for various high-performance computing applications. For scientific computations based on multi-physical phenomena, however, a single-device solution on a GPU is insufficient, because a single timescale or degree of parallelism cannot simply be supported by a GPU-only solution. We have been […]
Jan, 23

NNP/MM: Fast molecular dynamics simulations with machine learning potentials and molecular mechanics

Parametric and non-parametric machine learning potentials have emerged recently as a way to improve the accuracy of bio-molecular simulations. Here, we present NNP/MM, a hybrid method integrating neural network potentials (NNPs) and molecular mechanics (MM). It allows part of a molecular system to be simulated with an NNP, while the rest is simulated with MM for efficiency. […]

* * *


HGPU group © 2010-2022 hgpu.org

All rights belong to the respective authors
