
Posts

Feb, 7

Computational Performance Predictions for Deep Neural Network Training: A Runtime-Based Approach

Deep learning researchers and practitioners usually leverage GPUs to help train their deep neural networks (DNNs) faster. However, choosing which GPU to use is challenging both because (i) there are many options, and (ii) users grapple with competing concerns: maximizing compute performance while minimizing costs. In this work, we present a new practical technique to […]
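
The excerpt above is about predicting training performance across GPUs from measured runtimes. As a rough illustration only, and not the paper's actual technique, the sketch below scales a kernel's measured runtime from a reference GPU to a target GPU with a roofline-style rule; the function name, workload numbers, and GPU spec dictionaries are illustrative assumptions.

# A minimal roofline-style sketch (not the paper's exact method): predict a
# kernel's runtime on a target GPU by scaling its measured runtime on a
# reference GPU by whichever resource bounds it (compute or memory bandwidth).

def predict_runtime(measured_s, flops, bytes_moved, ref_gpu, target_gpu):
    """ref_gpu/target_gpu are dicts with 'peak_flops' (FLOP/s) and 'mem_bw' (bytes/s)."""
    # Arithmetic intensity decides whether the kernel is compute- or memory-bound.
    intensity = flops / max(bytes_moved, 1)
    compute_bound = intensity > ref_gpu["peak_flops"] / ref_gpu["mem_bw"]
    if compute_bound:
        scale = ref_gpu["peak_flops"] / target_gpu["peak_flops"]
    else:
        scale = ref_gpu["mem_bw"] / target_gpu["mem_bw"]
    return measured_s * scale

# Example with published peak specs for two NVIDIA GPUs (FP32 FLOP/s, bytes/s).
p4000 = {"peak_flops": 5.3e12, "mem_bw": 243e9}
v100 = {"peak_flops": 15.7e12, "mem_bw": 900e9}
print(predict_runtime(0.012, 2.0e10, 4.0e8, p4000, v100))
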
Feb, 7

AutoFreeze: Automatically Freezing Model Blocks to Accelerate Fine-tuning

With the rapid adoption of machine learning (ML), a number of domains now use the approach of fine-tuning models pre-trained on a large corpus of data. However, our experiments show that even fine-tuning on models like BERT can take many hours when using GPUs. While prior work proposes limiting the number of layers that are […]
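
For readers unfamiliar with block freezing, the sketch below shows the basic mechanism such an approach builds on: disabling gradients for the earliest encoder blocks of a pre-trained BERT so they are skipped during backpropagation. It assumes the Hugging Face transformers package and is not AutoFreeze's adaptive freezing policy itself; the helper name and the choice of eight frozen blocks are illustrative.

# A minimal sketch of the underlying mechanism (not AutoFreeze's adaptive
# policy): freeze the earliest encoder blocks of a pre-trained BERT so their
# gradients are never computed during fine-tuning.
import torch
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")

def freeze_first_blocks(model, n_frozen):
    # Embeddings plus the first n_frozen encoder layers stop receiving updates.
    for param in model.embeddings.parameters():
        param.requires_grad = False
    for layer in model.encoder.layer[:n_frozen]:
        for param in layer.parameters():
            param.requires_grad = False

freeze_first_blocks(model, n_frozen=8)
# Only the remaining trainable parameters are handed to the optimizer.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=2e-5
)
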
Feb, 7

Why is FPGA-GPU Heterogeneity the Best Option for Embedded Deep Neural Networks?

Graphics Processing Units (GPUs) are currently the dominant programmable architecture for Deep Learning (DL) accelerators. The adoption of Field Programmable Gate Arrays (FPGAs) in DL accelerators is, however, gaining momentum. In this paper, we demonstrate that Direct Hardware Mapping (DHM) of a Convolutional Neural Network (CNN) on an embedded FPGA substantially outperforms a GPU implementation […]
Jan, 31

C-for-Metal: High Performance SIMD Programming on Intel GPUs

The SIMT execution model is commonly used for general GPU development. CUDA and OpenCL developers write scalar code that is implicitly parallelized by the compiler and hardware. On Intel GPUs, however, this abstraction has profound performance implications, as the underlying ISA is SIMD and important hardware capabilities cannot be fully utilized. To close this performance gap […]
Jan, 31

Performance of CPU and GPU HPC Architectures for Off-Design Aircraft Simulation

This paper presents a detailed analysis of the relative performance and cost of GPU and CPU architectures for a full aircraft RANS simulation using the CFD code zCFD. Using Amazon Web Services as the platform, several generations of NVIDIA GPUs are assessed (T4, V100, and A100) and compared to x86 Intel Broadwell and Skylake CPUs. […]
Jan, 31

Efficient MPI-based Communication for GPU-Accelerated Dask Applications

Dask is a popular parallel and distributed computing framework, which rivals Apache Spark in enabling task-based, scalable processing of big data. The Dask Distributed library forms the basis of this computing engine and provides support for adding new communication devices. It currently has two communication devices: one for TCP and the other for high-speed networks […]
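
As context for the excerpt, the sketch below shows how a Dask application selects its communication device when creating a cluster today: TCP by default, or UCX for high-speed networks. The MPI-based device proposed in the paper is not shown, and the worker counts here are illustrative.

# A minimal sketch of choosing a Dask communication device: the protocol
# prefix selects TCP or UCX when the cluster is created. (The MPI-based
# device described in the paper would plug in as another such device; its
# exact name and options are not shown here.)
from dask.distributed import Client, LocalCluster

# Default TCP communication device.
tcp_cluster = LocalCluster(protocol="tcp", n_workers=2)
tcp_client = Client(tcp_cluster)

# UCX device for high-speed networks (requires ucx-py to be installed).
# ucx_cluster = LocalCluster(protocol="ucx", n_workers=2)
# ucx_client = Client(ucx_cluster)

print(tcp_client.scheduler_info()["address"])  # e.g. 'tcp://127.0.0.1:...'
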
Jan, 31

CPU/GPU Code Acceleration on Heterogeneous Systems and Code Verification for CFD Applications

Computational Fluid Dynamics (CFD) applications usually involve intensive computations, which can be accelerated through the use of open accelerators, especially GPUs, due to their common use in the scientific computing community. In addition to code acceleration, it is important to ensure that the code and algorithm are implemented numerically correctly; this is called code verification. This dissertation […]
Jan, 31

Text2Gestures: A Transformer-Based Network for Generating Emotive Body Gestures for Virtual Agents

We present Text2Gestures, a transformer-based learning method to interactively generate emotive full-body gestures for virtual agents aligned with natural language text inputs. Our method generates emotionally expressive gestures by utilizing the relevant biomechanical features for body expressions, also known as affective features. We also consider the intended task corresponding to the text and the target […]
Jan, 24

Easy and Efficient Agent-based Simulations with the OpenABL Language and Compiler

Agent-based simulations are an effective scientific tool, with numerous applications from the social sciences to biology, that aim to emulate or predict complex phenomena through a set of simple rules performed by multiple agents. To simulate a large number of agents with complex models, practitioners have developed high-performance parallel implementations, often specialized for particular scenarios and […]
Jan, 24

Performance Analysis and Improvement of Parallel Differential Evolution

Differential evolution (DE) is an effective global evolutionary optimization algorithm used to solve global optimization problems, mainly in continuous domains. In this field, researchers pay more attention to improving the capability of DE to find better global solutions; however, the computational performance of DE is also a very interesting aspect, especially when the problem […]
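
For reference, a minimal serial DE/rand/1/bin loop is sketched below in NumPy; it illustrates the baseline algorithm only and makes no attempt to reproduce the paper's parallel implementation. The parameter values and the sphere-function example are illustrative.

# A minimal serial DE/rand/1/bin sketch (mutation, binomial crossover,
# greedy selection), not the paper's parallel version.
import numpy as np

def differential_evolution(f, bounds, pop_size=40, F=0.8, CR=0.9, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    dim = len(bounds)
    lo, hi = np.array(bounds).T
    pop = rng.uniform(lo, hi, size=(pop_size, dim))
    fitness = np.array([f(x) for x in pop])
    for _ in range(iters):
        for i in range(pop_size):
            # Mutation: combine three distinct individuals other than i.
            a, b, c = pop[rng.choice([j for j in range(pop_size) if j != i], 3, replace=False)]
            mutant = np.clip(a + F * (b - c), lo, hi)
            # Binomial crossover, forcing at least one mutant component.
            mask = rng.random(dim) < CR
            mask[rng.integers(dim)] = True
            trial = np.where(mask, mutant, pop[i])
            # Greedy selection: keep the trial only if it improves fitness.
            ft = f(trial)
            if ft < fitness[i]:
                pop[i], fitness[i] = trial, ft
    best = np.argmin(fitness)
    return pop[best], fitness[best]

# Example: minimize the sphere function in 5 dimensions.
x, fx = differential_evolution(lambda v: float(np.sum(v * v)), [(-5, 5)] * 5)
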
Jan, 24

Non-Parametric Adaptive Network Pruning

Popular network pruning algorithms reduce redundant information by optimizing hand-crafted parametric models, which may cause suboptimal performance and long filter-selection times. We innovatively introduce non-parametric modeling to simplify the algorithm design, resulting in an automatic and efficient pruning approach called EPruner. Inspired by the face recognition community, we use a message passing algorithm […]
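
As a rough illustration of the general idea, and not EPruner's exact procedure, the sketch below flattens a convolutional layer's filters and runs Affinity Propagation, a message passing algorithm, so that the exemplar filters it selects are the ones retained after pruning. It assumes PyTorch and scikit-learn, and the layer dimensions are illustrative.

# A minimal sketch: treat each convolutional filter as a flattened vector
# and let a message passing clustering algorithm pick exemplar filters to keep.
import torch
from sklearn.cluster import AffinityPropagation

conv = torch.nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3)
filters = conv.weight.detach().reshape(conv.out_channels, -1).numpy()

ap = AffinityPropagation(random_state=0).fit(filters)
keep = ap.cluster_centers_indices_  # indices of exemplar filters to retain
print(f"keeping {len(keep)} of {conv.out_channels} filters")
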
Jan, 24

Learning Massive Graph Embeddings on a Single Machine

We propose a new framework for computing the embeddings of large-scale graphs on a single machine. A graph embedding is a fixed-length vector representation for each node (and/or edge type) in a graph and has emerged as the de facto approach to applying modern machine learning to graphs. We identify that current systems for learning the […]
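
For readers new to the topic, the sketch below illustrates what learning such an embedding can look like at its simplest: one vector per node, optimized so that observed edges score higher than randomly sampled negative pairs. It is a generic PyTorch illustration, not the paper's out-of-core training system; the graph size, dimension, and loss are illustrative.

# A minimal sketch of node embedding training: an embedding table optimized
# with a dot-product score and random negative sampling.
import torch

num_nodes, dim = 10_000, 64
emb = torch.nn.Embedding(num_nodes, dim)
opt = torch.optim.Adam(emb.parameters(), lr=0.01)

def step(edges):
    # edges: LongTensor of shape (batch, 2) with (source, destination) node ids.
    src, dst = edges[:, 0], edges[:, 1]
    neg = torch.randint(0, num_nodes, dst.shape)  # random negative destinations
    pos_score = (emb(src) * emb(dst)).sum(dim=1)
    neg_score = (emb(src) * emb(neg)).sum(dim=1)
    # Observed edges should score higher than the sampled negatives.
    loss = torch.nn.functional.softplus(neg_score - pos_score).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

print(step(torch.randint(0, num_nodes, (256, 2))))
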
