24659

Posts

Feb, 28

Multi-GPU performance optimization of a computational fluid dynamics code using OpenACC

This article investigates the multi-GPU performance of a 3D buoyancy driven cavity solver using MPI and OpenACC directives on multiple platforms. The article shows that decomposing the total problem in different dimensions affects the strong scaling performance significantly for the GPU.Without proper performance optimizations, it is shown that 1D domain decomposition scales poorly on multiple […]
Feb, 28

BASEMENT v3: a modular freeware for river process modelling over multiple computational backends

Modelling river physical processes is of critical importance for flood protection, river management and restoration of riverine environments. Developments in algorithms and computational power have led to a wider spread of river simulation tools. However, the use of two-dimensional models can still be hindered by complexity in the setup and the high computational costs. Here […]
Feb, 28

GPU-aware Communication with UCX in Parallel Programming Models: Charm++, MPI, and Python

As an increasing number of leadership-class systems embrace GPU accelerators in the race towards exascale, efficient communication of GPU data is becoming one of the most critical components of high-performance computing. For developers of parallel programming models, implementing support for GPU-aware communication using native APIs for GPUs such as CUDA can be a daunting task […]
Feb, 23

Using hardware performance counters to speed up autotuning convergence on GPUs

Nowadays, GPU accelerators are commonly used to speed up general-purpose computing tasks on a variety of hardware. However, due to the diversity of GPU architectures and processed data, optimization of codes for a particular type of hardware and specific data characteristics can be extremely challenging. The autotuning of performance-relevant source-code parameters allows for automatic optimization […]
Feb, 21

Optimal program variant generation for hybrid manycore systems

Field Programmable Gate Arrays promise to deliver superior energy efficiency in heterogeneous high performance computing, as compared to multicore CPUs and GPUs. The rate of adoption is however hampered by the relative difficulty of programming FPGAs. High-level synthesis tools such as Xilinx Vivado, Altera OpenCL or Intel’s HLS address a large part of the programmability […]
Feb, 21

A Newcomer In The PGAS World – UPC++ vs UPC: A Comparative Study

A newcomer in the Partitioned Global Address Space (PGAS) ‘world’ has arrived in its version 1.0: Unified Parallel C++ (UPC++). UPC++ targets distributed data structures where communication is irregular or fine-grained. The key abstractions are global pointers, asynchronous programming via RPC, futures and promises. UPC++ API for moving non-contiguous data and handling memories with different […]
Feb, 21

INSTA-YOLO: Real-Time Instance Segmentation

Instance segmentation has gained recently huge attention in various computer vision applications. It aims at providing different IDs to different objects of the scene, even if they belong to the same class. Instance segmentation is usually performed as a two-stage pipeline. First, an object is detected, then semantic segmentation within the detected box area is […]
Feb, 21

cuFINUFFT: a load-balanced GPU library for general-purpose nonuniform FFTs

Nonuniform fast Fourier transforms dominate the computational cost in many applications including image reconstruction and signal processing. We thus present a general-purpose GPU-based CUDA library for type 1 (nonuniform to uniform) and type 2 (uniform to nonuniform) transforms in dimensions 2 and 3, in single or double precision. It achieves high performance for a given […]
Feb, 21

A Survey of Machine Learning for Computer Architecture and Systems

It has been a long time that computer architecture and systems are optimized to enable efficient execution of machine learning (ML) algorithms or models. Now, it is time to reconsider the relationship between ML and systems, and let ML transform the way that computer architecture and systems are designed. This embraces a twofold meaning: the […]
Feb, 14

Serverless Computing Strategies on Cloud Platforms

With the development of Cloud Computing, the delivery of virtualized resources over the Internet has greatly grown in recent years. Functions as a Service (FaaS), one of the newest service models within Cloud Computing, allows the development and implementation of event-based applications that cover managed services in public and on-premises Clouds. Public Cloud Computing providers […]
Feb, 14

Optimization of Data Assignment for Parallel Processing in a Hybrid Heterogeneous Environment Using Integer Linear Programming

In the paper we investigate a practical approach to application of integer linear programming for optimization of data assignment to compute units in a multi-level heterogeneous environment with various compute devices, including CPUs, GPUs and Intel Xeon Phis. The model considers an application that processes a large number of data chunks in parallel on various […]
Feb, 14

Searching CUDA code autotuning spaces with hardware performance counters: data from benchmarks running on various GPU architectures

We have developed several autotuning benchmarks in CUDA that take into account performance-relevant source-code parameters and reach near peak-performance on various GPU architectures. We have used them during the development and evaluation of a novel search method for tuning space proposed in [1]. With our framework Kernel Tuning Toolkit, freely available at Github, we measured […]

* * *

* * *

HGPU group © 2010-2021 hgpu.org

All rights belong to the respective authors

Contact us: