Posts
Mar, 15
Abstracting OpenCL for Multi-Application Workloads on CPU-FPGA Clusters
Field-programmable gate arrays (FPGAs) continue to see integration in data centres, where customized hardware accelerators provide improved performance for cloud workloads. However, existing programming models for such environments typically require a manual assignment of application tasks between CPUs and FPGA-based accelerators. Furthermore, coordinating the execution of tasks from multiple applications necessitates the use of a […]
Mar, 15
Automated test generation for OpenCL kernels using fuzzing and constraint solving
Graphics Processing Units (GPUs) are massively parallel processors offering performance acceleration and energy efficiency unmatched by current central processors (CPUs). These advantages, along with recent advances in the programmability of GPUs, have made them attractive for general-purpose computations. Despite the advances in programmability, GPU kernels are hard to code and analyse due to the […]
Mar, 15
Data Movement Optimization for High-Performance Computing
Tuning codes to make efficient use of high-performance computing systems is known to be hard. Programmers have to schedule their computations across thousands of compute cores while keeping compute and data movement costs in mind. The necessary code transformations (for example, to overlap computation and inter-node communication) are well known. But the complex […]
Mar, 15
Towards Green Computing: A Survey of Performance and Energy Efficiency of Different Platforms using OpenCL
When considering different hardware platforms, the energy needed to reach a solution can matter as much as the time-to-solution. This applies not only to battery-powered and mobile devices but also to high-performance parallel cluster systems, owing to financial and practical limits on power consumption and cooling. Recent developments in hard- […]
Mar, 15
Performance and energy footprint assessment of FPGAs and GPUs on HPC systems using Astrophysics application
New challenges in Astronomy and Astrophysics (AA) are driving the need for a large number of exceptionally computationally intensive simulations. "Exascale" (and beyond) computational facilities are mandatory to address the size of theoretical problems and data coming from the new generation of observational facilities in AA. Currently, the High Performance Computing (HPC) sector is undergoing […]
Mar, 8
Solving convex optimization problems on FPGA using OpenCL
The use of accelerators in HPC applications has grown enormously in the last decade, and demands on throughput in HPC are growing steadily. Not all of the algorithms used have a single hardware architecture that clearly performs best. Our work explores the performance of different hardware architectures in solving a convex optimization […]
Mar, 8
Portable and Performant GPU/Heterogeneous Asynchronous Many-Task Runtime System
Asynchronous many-task (AMT) runtimes are maturing as a model for computing simulations on a diverse range of architectures at large scale. The Uintah AMT framework is driven by a philosophy of maintaining an application layer distinct from the underlying runtime while operating on an adaptive mesh grid. This model has enabled task developers to focus on […]
Mar, 8
ADWPNAS: Architecture-Driven Weight Prediction for Neural Architecture Search
Discovering and evaluating the true strength of models quickly and accurately is one of the key challenges in Neural Architecture Search (NAS). To address this problem, we propose an Architecture-Driven Weight Prediction (ADWP) approach for NAS. In our approach, we first design an architecture-intensive search space and then train […]
Mar, 8
Fast Gunrock Subgraph Matching (GSM) on GPUs
In this paper, we propose a novel method, GSM (Gunrock Subgraph Matching), to compute graph matching (subgraph isomorphism) on GPUs. In contrast to previous approaches, GSM is BFS-based: possible matches are explored simultaneously in a breadth-first strategy and thus can be mapped onto GPUs in a massively parallel fashion. Our implementation on the Gunrock graph […]
Mar, 8
Inline Vector Compression for Computational Physics
A novel inline data compression method is presented for single-precision vectors in three dimensions. The primary application of the method is for accelerating computational physics calculations where the throughput is bound by memory bandwidth. The scheme employs spherical polar coordinates, angle quantisation, and a bespoke floating-point representation of the magnitude to achieve a fixed compression […]
Mar, 1
AvA: Accelerated Virtualization of Accelerators
Applications are migrating en masse to the cloud, while accelerators such as GPUs, TPUs, and FPGAs proliferate in the wake of Moore’s Law. These trends are in conflict: cloud applications run on virtual platforms, but existing virtualization techniques have not provided production-ready solutions for accelerators. As a result, cloud providers expose accelerators by dedicating physical […]
Mar, 1
Accelerating CNN on FPGA: An Implementation of MobileNet on FPGA
The Convolutional Neural Network (CNN) is a deep learning algorithm that has had a revolutionary impact on computer vision. One of its applications is image classification. However, the algorithm involves a huge number of operations and parameters, which limits its use in time- and resource-constrained embedded applications. MobileNet, a neural network that uses […]