
Posts

Jan, 31

Performance of CPU and GPU HPC Architectures for off-design aircraft simulation

This paper presents a detailed analysis of the relative performance and cost of GPU and CPU architectures for a full aircraft RANS simulation using the CFD code zCFD. Using Amazon Web Services as the platform, several generations of NVIDIA GPUs are assessed (T4, V100, and A100) and compared to x86 Intel Broadwell and Skylake CPUs. […]

Jan, 31

Efficient MPI-based Communication for GPU-Accelerated Dask Applications

Dask is a popular parallel and distributed computing framework, which rivals Apache Spark to enable task-based scalable processing of big data. The Dask Distributed library forms the basis of this computing engine and provides support for adding new communication devices. It currently has two communication devices: one for TCP and the other for high-speed networks […]
Jan, 31

CPU/GPU Code Acceleration on Heterogeneous Systems and Code Verification for CFD Applications

Computational Fluid Dynamics (CFD) applications usually involve intensive computations, which can be accelerated through using open accelerators, especially GPUs due to their common use in the scientific computing community. In addition to code acceleration, it is important to ensure that the code and algorithm are implemented numerically correctly, which is called code verification. This dissertation […]
Jan, 31

Text2Gestures: A Transformer-Based Network for Generating Emotive Body Gestures for Virtual Agents

We present Text2Gestures, a transformer-based learning method to interactively generate emotive full-body gestures for virtual agents aligned with natural language text inputs. Our method generates emotionally expressive gestures by utilizing the relevant biomechanical features for body expressions, also known as affective features. We also consider the intended task corresponding to the text and the target […]
Jan, 24

Easy and Efficient Agent-based Simulations with the OpenABL Language and Compiler

Agent-based simulations represent an effective scientific tool, with numerous applications from social sciences to biology, which aims to emulate or predict complex phenomena through a set of simple rules performed by multiple agents. To simulate a large number of agents with complex models, practitioners have developed high-performance parallel implementations, often specialized for particular scenarios and […]
Jan, 24

Performance Analysis and Improvement of Parallel Differential Evolution

Differential evolution (DE) is an effective evolutionary algorithm used to solve global optimization problems, mainly in continuous domains. In this field, researchers pay more attention to improving the capability of DE to find better global solutions; however, the computational performance of DE is also a very interesting aspect, especially when the problem […]
Jan, 24

Non-Parametric Adaptive Network Pruning

Popular network pruning algorithms reduce redundant information by optimizing hand-crafted parametric models, which may lead to suboptimal performance and long filter-selection times. We introduce non-parametric modeling to simplify the algorithm design, resulting in an automatic and efficient pruning approach called EPruner. Inspired by the face recognition community, we use a message passing algorithm […]
Jan, 24

Learning Massive Graph Embeddings on a Single Machine

We propose a new framework for computing the embeddings of large-scale graphs on a single machine. A graph embedding is a fixed-length vector representation for each node (and/or edge-type) in a graph and has emerged as the de facto approach to applying modern machine learning on graphs. We identify that current systems for learning the […]
Jan, 24

StencilFlow: Mapping Large Stencil Programs to Distributed Spatial Computing Systems

Spatial computing devices have been shown to significantly accelerate stencil computations, but have so far relied on unrolling the iterative dimension of a single stencil operation to increase temporal locality. This work considers the general case of mapping directed acyclic graphs of heterogeneous stencil computations to spatial computing systems, assuming large input programs without an […]
Jan, 17

Instruments of Productivity for High Performance Computing

High performance computing (HPC) is now well established as the cornerstone for building and conducting software simulations in numerous scientific and industrial fields. The hardware complexity of supercomputers is steadily increasing, however, to deliver ever improved computing performance, causing the complexity of HPC application development to increase as well. As a result, the need for […]
Jan, 17

Implementation of Autoencoders with Systolic Arrays through OpenCL

In the world of algorithm acceleration and the implementation of deep neural networks’ recall phase, OpenCL-based solutions have a clear tendency to produce perfectly adapted kernels in graphics processing unit (GPU) architectures. However, they fail to obtain the same results when applied to field-programmable gate array (FPGA) based architectures. This situation, along with an […]
Jan, 17

CFD code adaptation to the FPGA architecture

In recent years, we have observed the intensive development of accelerated computing platforms. Although current trends indicate a well-established position of GPU devices in the HPC environment, the FPGA (Field-Programmable Gate Array) aspires to be an alternative solution for offloading CPU computation. This paper presents a systematic adaptation of four different CFD (Computational Fluid Dynamics) […]


HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us:

contact@hpgu.org