
Posts

Nov, 13

A Study on Neural-based Code Summarization in Low-resource Settings

Automated software engineering with deep learning techniques has been explored extensively, driven by breakthroughs in code representation learning. Many code intelligence approaches have been proposed for the field's downstream tasks in recent years, contributing to significant performance progress. Code summarization has been the central research topic among these downstream tasks because of […]
Nov, 13

pyGSL: A Graph Structure Learning Toolkit

We introduce pyGSL, a Python library that provides efficient implementations of state-of-the-art graph structure learning models along with diverse datasets to evaluate them on. The implementations are written in GPU-friendly ways, allowing one to scale to much larger network tasks. A common interface is introduced for algorithm unrolling methods, unifying implementations of recent state-of-the-art techniques […]
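
As an illustrative companion to the idea of a "common interface" for graph structure learning, here is a minimal Python sketch. The class and method names are hypothetical, not pyGSL's actual API; the estimator is a simple correlation-threshold baseline, not a state-of-the-art unrolling method:

    # Hypothetical common interface for graph structure learners:
    # estimate an adjacency matrix W from node observations X.
    # (Names are illustrative, not pyGSL's actual API.)
    import numpy as np

    class GraphLearner:
        """Estimate adjacency W from observations X (n_nodes x n_samples)."""
        def fit(self, X: np.ndarray) -> np.ndarray:
            raise NotImplementedError

    class CorrelationThreshold(GraphLearner):
        """Baseline: connect pairs whose |sample correlation| exceeds tau."""
        def __init__(self, tau: float = 0.5):
            self.tau = tau

        def fit(self, X: np.ndarray) -> np.ndarray:
            C = np.corrcoef(X)                  # n_nodes x n_nodes correlations
            W = (np.abs(C) > self.tau).astype(float)
            np.fill_diagonal(W, 0.0)            # no self-loops
            return W

    X = np.random.randn(8, 200)                 # 8 nodes, 200 signal samples
    W = CorrelationThreshold(tau=0.3).fit(X)
    print(W.sum(), "edges (directed count)")

A shared fit-style interface like this is what lets one benchmark many learners against the same datasets without per-model glue code.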
Nov, 13

iGniter: Interference-Aware GPU Resource Provisioning for Predictable DNN Inference in the Cloud

GPUs are essential for accelerating latency-sensitive deep neural network (DNN) inference workloads in cloud datacenters. To fully utilize GPU resources, spatial sharing of GPUs among co-located DNN inference workloads becomes increasingly compelling. However, GPU sharing inevitably brings severe performance interference among co-located inference workloads, as demonstrated by an empirical measurement study of DNN inference […]
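
To make "interference-aware provisioning" concrete, here is a toy Python sketch of the general idea: greedily pack workloads onto GPUs while inflating each GPU's predicted load by a pairwise co-location penalty. This is not iGniter's actual algorithm; the penalty constant and workload names are invented for illustration:

    # Toy interference-aware placement (NOT iGniter's method): predicted GPU
    # load is the summed utilization plus a penalty per co-located pair.
    ALPHA = 0.05   # assumed per-pair interference penalty (fraction of a GPU)
    CAP = 1.0      # each GPU's capacity

    def predicted_load(utils):
        pairs = len(utils) * (len(utils) - 1) // 2
        return sum(utils) + ALPHA * pairs   # interference grows with co-location

    def place(workloads, n_gpus):
        gpus = [[] for _ in range(n_gpus)]
        for name, util in sorted(workloads.items(), key=lambda kv: -kv[1]):
            for g in gpus:                  # first GPU that stays under the cap
                if predicted_load([u for _, u in g] + [util]) <= CAP:
                    g.append((name, util))
                    break
            else:
                raise RuntimeError(f"no GPU can host {name} within the cap")
        return gpus

    demands = {"resnet": 0.45, "bert": 0.5, "yolo": 0.3, "gpt": 0.6}
    for i, g in enumerate(place(demands, n_gpus=2)):
        print(f"GPU {i}: {g}")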
Nov, 13

Multi-GPU thermal lattice Boltzmann simulations using OpenACC and MPI

We assess the performance of the hybrid Open Accelerator (OpenACC) and Message Passing Interface (MPI) approach for multi-graphics-processing-unit (GPU) accelerated thermal lattice Boltzmann (LB) simulation. OpenACC accelerates computation on a single GPU, and MPI synchronizes information between multiple GPUs. With a single GPU, the two-dimensional (2D) simulation achieved 1.93 billion […]
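
The MPI half of that hybrid scheme is a classic halo exchange, sketched below with mpi4py (assumed installed; the paper's code is presumably C or Fortran with OpenACC pragmas, which Python cannot show). Each rank owns a horizontal slab of the 2D lattice and swaps boundary rows with its neighbours every time step:

    # Minimal mpi4py sketch of the halo-exchange pattern in multi-GPU LB codes.
    # Run with: mpirun -n 4 python halo.py
    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    nx, local_ny = 64, 16
    # local slab plus one ghost row above and below
    f = np.full((local_ny + 2, nx), float(rank))

    up   = rank - 1 if rank > 0 else MPI.PROC_NULL
    down = rank + 1 if rank < size - 1 else MPI.PROC_NULL

    # send my top interior row up, receive my bottom ghost row from below
    comm.Sendrecv(sendbuf=f[1].copy(),  dest=up,   recvbuf=f[-1], source=down)
    # send my bottom interior row down, receive my top ghost row from above
    comm.Sendrecv(sendbuf=f[-2].copy(), dest=down, recvbuf=f[0],  source=up)

    print(f"rank {rank}: top ghost = {f[0, 0]}, bottom ghost = {f[-1, 0]}")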
Nov, 6

Enabling Data Movement and Computation Pipelining in Deep Learning Compiler

Pipelining between data loading and computation is a critical tensor program optimization for GPUs. Multi-stage pipelining across the multi-level buffer hierarchy of the GPU is particularly indispensable on the latest NVIDIA Ampere GPUs to reduce resource idleness and guarantee kernel performance. Currently, developers rely on expert-written libraries such as cuBLAS to access the pipelining […]
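
The scheduling idea behind that pipelining can be illustrated in plain Python: while the consumer computes on tile i, the producer is already loading tile i+1 through a bounded buffer. This is only a conceptual analogy; on Ampere the overlap is realized with asynchronous copies into shared memory, not host threads:

    # Double buffering via a bounded queue: two tiles in flight at once.
    import threading, queue, time

    tiles = range(8)
    buf = queue.Queue(maxsize=2)   # two in-flight tiles = double buffering

    def loader():
        for i in tiles:
            time.sleep(0.01)       # stand-in for a global->shared memory copy
            buf.put(i)
        buf.put(None)              # sentinel: no more tiles

    def consumer():
        while (i := buf.get()) is not None:
            time.sleep(0.01)       # stand-in for compute on the tile
            print(f"computed tile {i}")

    t = threading.Thread(target=loader)
    t.start()
    consumer()
    t.join()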
Nov, 6

Using scheduling entropy amplification in CUDA/OpenMP code to exhibit non-reproducibility issues

Rounding errors and cancellations occur with every floating-point operation; combined with the lack of control over execution order in parallel code, they lead to numerical issues such as non-reproducible results. To increase the chances of discovering such numerical issues, in this article we propose a simple solution based on an index interposer and […]
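
The root cause is easy to demonstrate: floating-point addition is not associative, so the order in which parallel threads combine partial sums changes the result. In this minimal Python sketch, shuffling the summation order stands in for scheduling entropy:

    # Two "schedules" of the same reduction give different answers.
    import random

    values = [1e16, 1.0, -1e16] * 1000 + [0.1] * 1000

    def reduce_in_order(xs):
        total = 0.0
        for x in xs:
            total += x          # each step rounds to the nearest double
        return total

    print(reduce_in_order(values))      # one execution order
    random.seed(42)
    random.shuffle(values)
    print(reduce_in_order(values))      # another order, a different answer

In the ordered pass, each 1.0 is absorbed by the huge 1e16 before it cancels; after shuffling, some 1.0s survive, so the two totals differ.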
Nov, 6

An Open-source FPGA Library for Data Sorting

Field-programmable gate arrays (FPGAs) have garnered significant interest in high-performance computing research because their flexibility enables the building of application-specific computation pipelines and data supply systems. In addition to this flexibility, FPGA vendors have developed and offered OpenCL toolchains that reduce the programming effort required for FPGA development. However, […]
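
Hardware sorters are commonly built from sorting networks, whose fixed compare-and-swap pattern maps naturally onto a pipeline of comparators. The abstract does not name the algorithm this library uses, so as a generic illustration, here is a software sketch of the bitonic network (input length must be a power of two):

    # Bitonic sorting network in software form.
    def bitonic_sort(xs, ascending=True):
        if len(xs) <= 1:
            return xs
        half = len(xs) // 2
        first = bitonic_sort(xs[:half], True)    # ascending half
        second = bitonic_sort(xs[half:], False)  # descending half
        return bitonic_merge(first + second, ascending)

    def bitonic_merge(xs, ascending):
        if len(xs) <= 1:
            return xs
        half = len(xs) // 2
        for i in range(half):                    # the compare-and-swap stage
            if (xs[i] > xs[i + half]) == ascending:
                xs[i], xs[i + half] = xs[i + half], xs[i]
        return (bitonic_merge(xs[:half], ascending) +
                bitonic_merge(xs[half:], ascending))

    print(bitonic_sort([7, 3, 6, 2, 8, 5, 1, 4]))   # [1, 2, ..., 8]

Because every compare-and-swap position is fixed in advance, the same structure unrolls into a fully pipelined circuit, which is exactly what makes sorting networks attractive on FPGAs.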
Nov, 6

Apple Silicon Performance in Scientific Computing

With the release of the Apple Silicon System-on-a-Chip processors, and the impressive performance shown in general use by both the M1 and M1 Ultra, we explore the potential of Apple Silicon processors for scientific computing. Both the M1 and M1 Ultra are compared to current state-of-the-art data-center GPUs, including an NVIDIA V100 with PCIe, […]
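
A minimal sketch of the kind of measurement such comparisons rest on: timing a large single-precision matrix multiply and converting it to GFLOP/s. On Apple Silicon, numpy builds linked against the Accelerate framework can route this through the chip's matrix hardware; the numbers are illustrative, not a reproduction of the paper's benchmarks:

    # Crude GEMM throughput probe.
    import time
    import numpy as np

    n = 2048
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)

    a @ b                                   # warm-up run
    reps = 10
    t0 = time.perf_counter()
    for _ in range(reps):
        a @ b
    dt = (time.perf_counter() - t0) / reps

    flops = 2 * n**3                        # multiply-adds in an n x n GEMM
    print(f"{flops / dt / 1e9:.1f} GFLOP/s")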
Nov, 6

The Open MatSci ML Toolkit: A Flexible Framework for Machine Learning in Materials Science

We present the Open MatSci ML Toolkit: a flexible, self-contained, and scalable Python-based framework to apply deep learning models and methods on scientific data with a specific focus on materials science and the OpenCatalyst Dataset. Our toolkit provides: 1. A scalable machine learning workflow for materials science leveraging PyTorch Lightning, which enables seamless scaling across […]
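
To show the shape of a PyTorch Lightning workflow like the one the toolkit builds on, here is a minimal sketch (illustrative only; these are not the Open MatSci ML Toolkit's actual classes, and it assumes torch and pytorch_lightning are installed):

    # A LightningModule regressing a material property from feature vectors.
    import torch
    import pytorch_lightning as pl

    class PropertyRegressor(pl.LightningModule):
        def __init__(self, n_features: int = 64):
            super().__init__()
            self.net = torch.nn.Sequential(
                torch.nn.Linear(n_features, 128), torch.nn.SiLU(),
                torch.nn.Linear(128, 1),
            )

        def training_step(self, batch, batch_idx):
            x, y = batch
            loss = torch.nn.functional.mse_loss(self.net(x).squeeze(-1), y)
            self.log("train_loss", loss)
            return loss

        def configure_optimizers(self):
            return torch.optim.Adam(self.parameters(), lr=1e-3)

    # Scaling across devices is then just a Trainer argument, e.g.:
    # pl.Trainer(accelerator="gpu", devices=4).fit(model, train_dataloader)

This is the "seamless scaling" pitch in practice: the model code stays the same while the Trainer handles device placement and distribution.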
Oct, 30

A systematic performance study of the parallel programming framework SkePU 3 using HPC-benchmarks

With hardware performance no longer following Moore’s law, software optimization becomes more important. In this paper, we discuss parallel programming, which is one way to optimize software. However, writing parallel code is considered more difficult than writing sequential code. There is often a specific framework to be used to write parallel code for each type […]
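
SkePU itself is a C++ framework, but the core idea of skeleton programming is language-agnostic: the user supplies a pure element function and the skeleton decides how to execute it. A Python sketch of a Map skeleton with two interchangeable backends (not SkePU's API):

    # A Map skeleton: same user function, swappable execution backend.
    from multiprocessing import Pool

    def square(x):
        return x * x

    class Map:
        def __init__(self, fn, backend="sequential"):
            self.fn, self.backend = fn, backend

        def __call__(self, xs):
            if self.backend == "multiprocessing":
                with Pool() as pool:            # parallel backend
                    return pool.map(self.fn, xs)
            return [self.fn(x) for x in xs]     # sequential fallback

    if __name__ == "__main__":                  # required for multiprocessing
        sq = Map(square, backend="multiprocessing")
        print(sq(range(8)))                     # [0, 1, 4, ..., 49]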
Oct, 30

Benchmarking GPU and TPU Performance with Graph Neural Networks

Many artificial intelligence (AI) devices have been developed to accelerate the training and inference of neural network models. The most common ones are the Graphics Processing Unit (GPU) and the Tensor Processing Unit (TPU). They are highly optimized for dense data representations. However, sparse representations such as graphs are prevalent in many domains, including science. It […]
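
Why sparsity matters here: the core GNN operation is a neighbourhood aggregation, i.e. a sparse-matrix times dense-matrix product (SpMM). Densifying the adjacency computes the same result while spending memory and FLOPs on zeros, which is what dense-optimized hardware would otherwise prefer. A small Python sketch (assumes numpy and scipy):

    # SpMM aggregation vs. its densified equivalent.
    import numpy as np
    from scipy.sparse import random as sparse_random

    n, d = 2_000, 64
    A = sparse_random(n, n, density=1e-3, format="csr")   # sparse adjacency
    X = np.random.rand(n, d).astype(np.float32)           # node features

    H_sparse = A @ X                 # SpMM touches only the nonzeros
    H_dense = A.toarray() @ X        # same result, ~1000x more multiplies here
    print(np.allclose(H_sparse, H_dense, atol=1e-4))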
Oct, 30

gSuite: A Flexible and Framework Independent Benchmark Suite for Graph Neural Network Inference on GPUs

As interest in Graph Neural Networks (GNNs) grows, the importance of benchmarking and performance characterization studies of GNNs is increasing. So far, many studies have investigated and presented the performance and computational efficiency of GNNs. However, the work done so far has been carried out using a few high-level GNN […]
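
In miniature, the measurement discipline such a suite standardizes looks like the framework-independent harness below: it takes any zero-argument inference callable, performs warm-up runs, and reports latency statistics. This is an illustrative sketch, not gSuite's actual interface:

    # Framework-agnostic inference micro-benchmark.
    import statistics
    import time

    def benchmark(infer, warmup=5, reps=30):
        for _ in range(warmup):
            infer()                          # discard warm-up (JIT, caches)
        times = []
        for _ in range(reps):
            t0 = time.perf_counter()
            infer()
            times.append((time.perf_counter() - t0) * 1e3)
        return statistics.median(times), statistics.stdev(times)

    # Usage: wrap any model call in a zero-argument lambda. With real GPU
    # frameworks the callable must also synchronize the device (e.g. via
    # torch.cuda.synchronize()) so the timer sees the full kernel time.
    median_ms, std_ms = benchmark(lambda: sum(i * i for i in range(100_000)))
    print(f"median {median_ms:.2f} ms, std {std_ms:.2f} ms")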

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
