Posts
Jul, 17
Heterogeneous Energy-aware Load Balancing for Industry 4.0 and IoT Environments
With the improvement of global infrastructure, Cyber-Physical Systems (CPS) have become an important component of Industry 4.0. Both the application and the machine work together to improve the handling of task interdependencies. Machine learning methods in CPS require the monitoring of computational algorithms, including adopting optimizations, fine-tuning cyber systems, improving resource utilization, as well […]
Jul, 17
Just-in-Time Compilation and Link-Time Optimization for OpenMP Target Offloading
Following the mass adoption of external accelerators for high performance computing, the overall performance of many applications has become increasingly dependent on relatively small accelerated kernels. As static analysis is fundamentally limited by dynamic values and external definitions, standard ahead-of-time compilation is not always sufficient to achieve the best performance. Furthermore, many users looking to […]
Jul, 17
The OpenMP Cluster Programming Model
Despite the various research initiatives and proposed programming models, efficient solutions for parallel programming in HPC clusters still rely on a complex combination of different programming models (e.g., OpenMP and MPI), languages (e.g., C++ and CUDA), and specialized runtimes (e.g., Charm++ and Legion). On the other hand, task parallelism has been shown to be an efficient […]
Jul, 17
High Performance Simulation for Scalable Multi-Agent Reinforcement Learning
Multi-agent reinforcement learning experiments and open-source training environments are typically limited in scale, supporting tens or sometimes up to hundreds of interacting agents. In this paper, we demonstrate the use of Vogue, a high-performance agent-based model (ABM) framework. Vogue serves as a multi-agent training environment, supporting thousands to tens of thousands of interacting […]
Jul, 10
Design and Implementation of CNN-FPGA accelerator based on Open Computing Language
Convolutional neural networks (CNNs) are widely used in a broad range of applications, including face and speech recognition, picture retrieval and classification, and automated driving. As a result, CNN accelerators have become a popular topic of discourse. Graphics processing units (GPUs) are often employed in CNN accelerators, and they are referred to […]
Jul, 10
Ultra-low latency recurrent neural network inference on FPGAs for physics applications with hls4ml
Recurrent neural networks have been shown to be effective architectures for many tasks in high energy physics, and thus have been widely adopted. Their use in low-latency environments has, however, been limited as a result of the difficulties of implementing recurrent architectures on field-programmable gate arrays (FPGAs). In this paper we present an implementation of […]
Jul, 10
High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs
While parallelism remains the main source of performance, architectural implementations and programming models change with each new hardware generation, often leading to costly application re-engineering. Most tools for performance portability require manual and costly application porting to yet another programming model. We propose an alternative approach that automatically translates programs written in one programming model […]
Jul, 10
DarKnight: An Accelerated Framework for Privacy and Integrity Preserving Deep Learning Using Trusted Hardware
Privacy and security-related concerns are growing as machine learning reaches diverse application domains. The data holders want to train or infer with private data while exploiting accelerators, such as GPUs, that are hosted in the cloud. Cloud systems are vulnerable to attackers that compromise the privacy of data and integrity of computations. Tackling such a […]
Jul, 10
FPGA Implementation of Bluetooth Low Energy Physical Layer with OpenCL
This dissertation presents the design of Digital Signal Processing (DSP) for transmission in the Bluetooth Low Energy Physical Layer (BLE PHY), and its implementation on a Field Programmable Gate Array (FPGA) device with Open Computing Language (OpenCL). The DSP design is based on the In-Phase/Quadrature-Phase (IQ) architecture to construct the modulation […]
Jul, 3
Novel Parallel Approaches to Efficiently Solve Spatial Problems on Heterogeneous CPU-GPU Systems
In recent years, approaches that seek to extract valuable information from large datasets have become particularly relevant. In this category, we can highlight problems that involve analyzing data distributed across two-dimensional scenarios, known as spatial problems. These usually involve processing (i) a series of features distributed across a given plane or (ii) […]
Jul, 3
Evaluation of Intel’s DPC++ Compatibility Tool in heterogeneous computing
The Intel DPC++ Compatibility Tool is a component of the Intel oneAPI Base Toolkit. This tool automatically transforms CUDA code into Data Parallel C++ (DPC++), thus assisting in the migration process. DPC++ is an implementation of the programming standard for heterogeneous computing known as SYCL, which unifies the development of parallel applications on CPUs, GPUs […]
Jul, 3
Optimizing the Performance of Parallel and Concurrent Applications Based on Asynchronous Many-Task Runtimes
High-performance computing (HPC) scientific applications often face performance, scalability, portability, and efficiency challenges when running on heterogeneous supercomputers. For years, supercomputer architectures have been rapidly changing and becoming more complex, and this challenge will become even more complicated as we enter the exascale era, where computers will exceed one quintillion calculations […]