29215

Posts

May, 20

Assessing Intel OneAPI capabilities and cloud-performance for heterogeneous computing

This work presents a performance-oriented study of a heterogeneous application developed with Intel OneAPI to solve two well-known diffusion problems: heat diffusion and image denoising. We have explored CPU+iGPU and CPU+FPGA schemes, applying dynamic load balancing and conducting experiments on Intel DevCloud. The results demonstrate that the CPU+iGPU scheme outperforms the execution times achieved by […]
May, 20

From GPUs to AI and quantum: three waves of acceleration in bioinformatics

The enormous growth in the amount of data generated by the life sciences is continuously shifting the field from model-driven science towards data-driven science. The need for efficient processing has led to the adoption of massively parallel accelerators such as graphics processing units (GPUs). Consequently, the development of bioinformatics methods nowadays often heavily depends on […]
May, 20

Hierarchical Resource Partitioning on Modern GPUs: A Reinforcement Learning Approach

GPU-based heterogeneous architectures are now commonly used in HPC clusters. Due to their architectural simplicity specialized for data-level parallelism, GPUs can offer much higher computational throughput and memory bandwidth than CPUs in the same generation do. However, as the available resources in GPUs have increased exponentially over the past decades, it has become increasingly difficult […]
May, 20

Predicting NVIDIA’s Next-Day Stock Price: A Comparative Analysis of LSTM, MLP, ARIMA, and ARIMA-GARCH Models

Forecasting stock prices remains a considerable challenge in financial markets, bearing significant implications for investors, traders, and financial institutions. Amid the ongoing AI revolution, NVIDIA has emerged as a key player driving innovation across various sectors. Given its prominence, we chose NVIDIA as the subject of our study.
May, 20

Workload Scheduling on Heterogeneous Devices

Hardware accelerators have become the backbone of many cloud and HPC workloads, but workloads tend to statically choose accelerators leaving devices unused while others are oversubscribed. We propose a holistic framework that allows a computational kernel to span across multiple devices on a node, as well as multiple applications being scheduled on the same node. […]
May, 12

Direct Numerical Simulation of Turbulence on Heterogenous Computer Systems: Architectures, Algorithms, and Applications

Direct numerical simulations (DNS) of turbulence have a virtually unbounded need for computing power. To carry out these simulations, software, computer architectures, and algorithms must operate as efficiently as possible to amortize the large computational cost. However, in a computing landscape increasingly incorporating heterogeneous computer systems, changes are necessary. In this thesis, we consider how […]
May, 12

Automated Deep Learning Optimization via DSL-Based Source Code Transformation

As deep learning models become increasingly bigger and more complex, it is critical to improve model training and inference efficiency. Though a variety of highly optimized libraries and packages (known as DL kernels) have been developed, it is tedious and time-consuming to figure out which kernel to use, where to use, and how to use […]
May, 12

Deep Learning Inference on Heterogeneous Mobile Processors: Potentials and Pitfalls

There is a growing demand to deploy computation-intensive deep learning (DL) models on resource-constrained mobile devices for real-time intelligent applications. Equipped with a variety of processing units such as CPUs, GPUs, and NPUs, the mobile devices hold potential to accelerate DL inference via parallel execution across heterogeneous processors. Various efficient parallel methods have been explored […]
May, 12

Optimizing Hardware Resource Partitioning and Job Allocations on Modern GPUs under Power Caps

CPU-GPU heterogeneous systems are now commonly used in HPC (High-Performance Computing). However, improving the utilization and energy-efficiency of such systems is still one of the most critical issues. As one single program typically cannot fully utilize all resources within a node/chip, co-scheduling (or co-locating) multiple programs with complementary resource requirements is a promising solution. Meanwhile, […]
May, 12

CuPBoP: Making CUDA a Portable Language

CUDA is designed speciically for NVIDIA GPUs and is not compatible with non-NVIDIA devices. Enabling CUDA execution on alternative backends could greatly beneit the hardware community by fostering a more diverse software ecosystem. To address the need for portability, our objective is to develop a framework that meets key requirements, such as extensive coverage, comprehensive […]
May, 5

A Survey of Deep Learning Library Testing Methods

In recent years, software systems powered by deep learning (DL) techniques have significantly facilitated people’s lives in many aspects. As the backbone of these DL systems, various DL libraries undertake the underlying optimization and computation. However, like traditional software, DL libraries are not immune to bugs, which can pose serious threats to users’ personal property […]
May, 5

Porting HPC Applications to AMD Instinct MI300A Using Unified Memory and OpenMP

AMD Instinct MI300A is the world’s first data center accelerated processing unit (APU) with memory shared between the AMD "Zen 4" EPYC cores and third generation CDNA compute units. A single memory space offers several advantages: i) it eliminates the need for data replication and costly data transfers, ii) it substantially simplifies application development and […]

* * *

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors

Contact us: