Posts
May, 17
Task-Based Parallel Strategies for CFD Application in Heterogeneous CPU/GPU Resources
Parallel applications executing in contemporary heterogeneous clusters are complex to code and optimize. The task-based programming model is an alternative to handle the coding complexity. This model consists of splitting the problem domain into tasks with dependencies through a directed acyclic graph, and submitting the set of tasks to a runtime scheduler that maps each […]
May, 17
Parallel Programming Models for Heterogeneous Many-Cores: A Survey
Heterogeneous many-cores are now an integral part of modern computing systems ranging from embedded systems to supercomputers. While heterogeneous many-core design offers the potential for energy-efficient high performance, such potential can only be unlocked if the application programs are suitably parallel and can be made to match the underlying heterogeneous platform. In this article, we provide […]
May, 10
From Constraint Programming to Heterogeneous Parallelism
The scaling limitations of multi-core processor development have led to a diversification of the processor cores used within individual computers. Heterogeneous computing has become widespread, involving the cooperation of several structurally different processor cores. Central processor (CPU) cores are most frequently complemented with graphics processors (GPUs), which despite their name are suitable for many highly […]
May, 10
Synergistic CPU-FPGA Acceleration of Sparse Linear Algebra
This paper describes REAP, a software-hardware approach that enables high performance sparse linear algebra computations on a cooperative CPU-FPGA platform. REAP carefully separates the task of organizing the matrix elements from the computation phase. It uses the CPU to provide a first-pass re-organization of the matrix elements, allowing the FPGA to focus on the computation. […]
May, 10
Importance of Data Loading Pipeline in Training Deep Neural Networks
Training large-scale deep neural networks is a long, time-consuming operation, often requiring many GPUs to accelerate. In large models, the time spent loading data takes a significant portion of model training time. As GPU servers are typically expensive, tricks that can save training time are valuable. Slow training is observed especially on real-world applications where exhaustive […]
May, 10
Accurate Energy and Performance Prediction for Frequency-Scaled GPU Kernels
Energy optimization is an increasingly important aspect of today’s high-performance computing applications. In particular, dynamic voltage and frequency scaling (DVFS) has become a widely adopted solution to balance performance and energy consumption, and hardware vendors provide management libraries that allow the programmer to change both memory and core frequencies manually to minimize energy consumption while […]
May, 10
Pushing the limit of molecular dynamics with ab initio accuracy to 100 million atoms with machine learning
For 35 years, ab initio molecular dynamics (AIMD) has been the method of choice for understanding complex materials and molecules at the atomic scale from first principles. However, most applications of AIMD are limited to systems with thousands of atoms due to the high computational complexity. We report that a machine learning-based molecular simulation protocol […]
May, 4
An Overview on the Latest Nature-Inspired and Metaheuristics-Based Image Registration Algorithms
The development of automated image registration (IR) methods is a well-known issue within the computer vision (CV) field and it has been largely addressed from multiple viewpoints. IR has been applied to a high number of real-world scenarios ranging from remote sensing to medical imaging, artificial vision, and computer-aided design. In the last two decades, […]
May, 3
Tools for GPU Computing – Debugging and Performance Analysis of Heterogenous HPC Applications
General-purpose GPUs are now ubiquitous in high-end supercomputing. All but one (the Japanese Fugaku system, which is based on ARM processors) of the announced (pre-)exascale systems contain large numbers of GPUs that deliver the majority of the performance of these systems. Thus, GPU programming will be a necessity for application developers using high-end HPC […]
May, 3
AutoParBench: A Unified Test Framework for OpenMP-based Parallelizers
This paper describes AutoParBench, a framework to test OpenMP-based automatic parallelization tools. The core idea of this framework is a common representation, called a "JSON snapshot", that normalizes the output produced by auto-parallelizers. By converting—automatically—this output to the common representation, AutoParBench lets us compare auto-parallelizers among themselves, and compare them semantically against a reference collection. […]
May, 3
Leveraging Data-Flow Information for Efficient Scheduling of Task-Parallel Programs on Heterogeneous Systems
Writing efficient programs for heterogeneous platforms is challenging: programmers must deal with multiple programming models, partition work for CPUs and accelerators with different compute capabilities, requiring different amounts of parallelism, and manage memory in multiple distinct address spaces. Consequently, programming models which only require expressing parallelism and data dependences can not only unburden the programmer […]
May, 3
Tools for Reduced Precision Computation: A Survey
The use of reduced precision to improve performance metrics such as computation latency and power consumption is a common practice in the embedded systems field. This practice is emerging as a new trend in High Performance Computing (HPC), especially when new error-tolerant applications are considered. However, standard compiler frameworks do not support automated precision customization, […]