24498

Posts

Feb, 7

Dependable Embedded Systems

This Open Access book introduces readers to many new techniques for enhancing and optimizing reliability in embedded systems, which have emerged particularly within the last five years. This book introduces the most prominent reliability concerns from today’s points of view and roughly recapitulates the progress in the community so far. Unlike other books that focus […]
Feb, 7

triSYCL for Xilinx FPGA

Khronos SYCL is a C++ based open-source specification that aims to increase the programmability of heterogeneous architectures. Several SYCL implementations exist, with variations both in terms of conformance to the specification; as well as in the range of hardware they target. Intel recently contributed the first open-source feature-complete SYCL implementation to the LLVM compiler project. […]
Feb, 7

Computational Performance Predictions for Deep Neural Network Training: A Runtime-Based Approach

Deep learning researchers and practitioners usually leverage GPUs to help train their deep neural networks (DNNs) faster. However, choosing which GPU to use is challenging both because (i) there are many options, and (ii) users grapple with competing concerns: maximizing compute performance while minimizing costs. In this work, we present a new practical technique to […]
Feb, 7

AutoFreeze: Automatically Freezing Model Blocks to Accelerate Fine-tuning

With the rapid adoption of machine learning (ML), a number of domains now use the approach of fine-tuning models pre-trained on a large corpus of data. However, our experiments show that even fine-tuning on models like BERT can take many hours when using GPUs. While prior work proposes limiting the number of layers that are […]
Feb, 7

Why is FPGA-GPU Heterogeneity the Best Option for Embedded Deep Neural Networks?

Graphics Processing Units (GPUs) are currently the dominating programmable architecture for Deep Learning (DL) accelerators. The adoption of Field Programmable Gate Arrays (FPGAs) in DL accelerators is however getting momentum. In this paper, we demonstrate that Direct Hardware Mapping (DHM) of a Convolutional Neural Network (CNN) on an embedded FPGA substantially outperforms a GPU implementation […]
Jan, 31

C-for-Metal: High Performance SIMD Programming on Intel GPUs

The SIMT execution model is commonly used for general GPU development. CUDA and OpenCL developers write scalar code that is implicitly parallelized by compiler and hardware. On Intel GPUs, however, this abstraction has profound performance implications as the underlying ISA is SIMD and important hardware capabilities cannot be fully utilized. To close this performance gap […]
Jan, 31

Performance of CPU and GPU HPC Architectures for off-design aircraft simulation

This paper presents a detailed analysis of the relative performance and cost of GPU and CPU architectures for a full aircraft RANS simulation using the CFD code zCFD. Using Amazon Web Services as the platform, several generations of NVIDIA GPUs are assessed (T4, V100, and A100) and compared to x86 Intel Broadwell and Skylake CPUs. […]
Jan, 31

Efficient MPI-based Communication for GPU-Accelerated Dask Applications

Dask is a popular parallel and distributed computing framework, which rivals Apache Spark to enable task-based scalable processing of big data. The Dask Distributed library forms the basis of this computing engine and provides support for adding new communication devices. It currently has two communication devices: one for TCP and the other for high-speed networks […]
Jan, 31

Text2Gestures: A Transformer-Based Network for Generating Emotive Body Gestures for Virtual Agents

We present Text2Gestures, a transformer-based learning method to interactively generate emotive full-body gestures for virtual agents aligned with natural language text inputs. Our method generates emotionally expressive gestures by utilizing the relevant biomechanical features for body expressions, also known as affective features. We also consider the intended task corresponding to the text and the target […]
Jan, 31

CPU/GPU Code Acceleration on Heterogeneous Systems and Code Verification for CFD Applications

Computational Fluid Dynamics (CFD) applications usually involve intensive computations, which can be accelerated through using open accelerators, especially GPUs due to their common use in the scientific computing community. In addition to code acceleration, it is important to ensure that the code and algorithm are implemented numerically correctly, which is called code verification. This dissertation […]
Jan, 24

Easy and Efficient Agent-based Simulations with the OpenABL Language and Compiler

Agent-based simulations represent an effective scientific tool, with numerous applications from social sciences to biology, which aims to emulate or predict complex phenomena through a set of simple rules performed by multiple agents. To simulate a large number of agents with complex models, practitioners have developed high-performance parallel implementations, often specialized for particular scenarios and […]
Jan, 24

Performance Analysis and Improvement of Parallel Differential Evolution

Differential evolution (DE) is an effective global evolutionary optimization algorithm using to solve global optimization problems mainly in a continuous domain. In this field, researchers pay more attention to improving the capability of DE to find better global solutions, however, the computational performance of DE is also a very interesting aspect especially when the problem […]

* * *

* * *

* * *

HGPU group © 2010-2022 hgpu.org

All rights belong to the respective authors

Contact us: