21297

Posts

May, 24

SYCL-Bench: A Versatile Single-Source Benchmark Suite for Heterogeneous Computing

SYCL is a royalty-free open standard from the Khronos group that enables heterogeneous programming for a broad range of parallel devices, including multicore CPUs, GPUs, and FPGAs. SYCL relies on pure C++, without any additional attributes or pragmas. While SYCL kernels follow a data-parallel model, they are implicitly organized in a task graph built by […]
May, 24

PDAWL: Profile-based Iterative Dynamic Adaptive WorkLoad Balance on Heterogeneous Architectures

While High Performance Computing systems are increasingly based on heterogeneous cores, their effectiveness depends on how well the scheduler can allocate workloads onto appropriate computing devices and how communication and computation can be overlapped. With different types of resources integrated into one system, the complexity of the scheduler correspondingly increases. Moreover, for applications with varying […]
May, 24

Literature Review and Implementation Overview: High Performance Computing with Graphics Processing Units for Classroom and Research Use

In this report, I discuss the history and current state of GPU HPC systems. Although high-power GPUs have only existed a short time, they have found rapid adoption in deep learning applications. I also discuss an implementation of a commodity-hardware NVIDIA GPU HPC cluster for deep learning research and academic teaching use.
May, 24

BlaBla: Linguistic Feature Extraction for Clinical Analysis in Multiple Languages

We introduce BlaBla, an open-source Python library for extracting linguistic features with proven clinical relevance to neurological and psychiatric diseases across many languages. BlaBla is a unifying framework for accelerating and simplifying clinical linguistic research. The library is built on state-of-the-art NLP frameworks and supports multithreaded/GPU-enabled feature extraction via both native Python calls and a […]
May, 17

Deep Learning with GO

Current research in deep learning is primarily focused on using Python as a support language. Go, an emerging language, that has many benefits including native support for concurrency has seen a rise in adoption over the past few years. However, this language is not widely used to develop learning models due to the lack of […]
May, 17

Employing OpenCL as a Standard Hardware Abstraction in a Distributed Embedded System: A Case Study

The open computing language (OpenCL) is a standard open source specification for parallel computing on heterogeneous architectures. OpenCL offers a set of abstract models for substantial acceleration in parallel computing and is supported by most of the leading hardware vendors. In this paper, we present a systematic approach for employing OpenCL as a hardware abstraction […]
May, 17

Heterogeneous CPU/GPU co-execution of CFD simulations on the POWER9 architecture: Application to airplane aerodynamics

High fidelity Computational Fluid Dynamics simulations are generally associated with large computing requirements, which are progressively acute with each new generation of supercomputers. However, significant research efforts are required to unlock the computing power of leading-edge systems, currently referred to as pre-Exascale systems, based on increasingly complex architectures. In this paper, we present the approach […]
May, 17

Task-Based Parallel Strategies for CFD Application in Heterogeneous CPU/GPU Resources

Parallel applications executing in contemporary heterogeneous clusters are complex to code and optimize. The task-based programming model is an alternative to handle the coding complexity. This model consists of splitting the problem domain into tasks with dependencies through a directed acyclic graph, and submit the set of tasks to a runtime scheduler that maps each […]
May, 17

Parallel Programming Models for Heterogeneous Many-Cores: A Survey

Heterogeneous many-cores are now an integral part of modern computing systems ranging from embedding systems to supercomputers. While heterogeneous many-core design offers the potential for energy-efficient high-performance, such potential can only be unlocked if the application programs are suitably parallel and can be made to match the underlying heterogeneous platform. In this article, we provide […]
May, 10

From Constraint Programming to Heterogeneous Parallelism

The scaling limitations of multi-core processor development have led to a diversification of the processor cores used within individual computers. Heterogeneous computing has become widespread, involving the cooperation of several structurally different processor cores. Central processor (CPU) cores are most frequently complemented with graphics processors (GPUs), which despite their name are suitable for many highly […]
May, 10

Synergistic CPU-FPGA Acceleration of Sparse Linear Algebra

This paper describes REAP, a software-hardware approach that enables high performance sparse linear algebra computations on a cooperative CPU-FPGA platform. REAP carefully separates the task of organizing the matrix elements from the computation phase. It uses the CPU to provide a first-pass re-organization of the matrix elements, allowing the FPGA to focus on the computation. […]
May, 10

Importance of Data Loading Pipeline in Training Deep Neural Networks

Training large-scale deep neural networks is a long, time-consuming operation, often requiring many GPUs to accelerate. In large models, the time spent loading data takes a significant portion of model training time. As GPU servers are typically expensive, tricks that can save training time are valuable.Slow training is observed especially on real-world applications where exhaustive […]

* * *

* * *

HGPU group © 2010-2021 hgpu.org

All rights belong to the respective authors

Contact us: