Posts
Sep, 13
Accelerating High-Order Stencils on GPUs
Stencil computations are widely used in HPC applications. Today, many HPC platforms use GPUs as accelerators. As a result, understanding how to perform stencil computations fast on GPUs is important. While implementation strategies for low-order stencils on GPUs have been well-studied in the literature, not all of proposed enhancements work well for high-order stencils, such […]
Sep, 13
HPX – The C++ Standard Library for Parallelism and Concurrency
The new challenges presented by exascale system architectures have resulted in difficulty achieving the desired scalability using traditional distributed-memory runtimes. Asynchronous many-task systems (AMT) are based on a new paradigm showing promise in addressing these challenges, providing application developers with a productive and performant approach to programming on next generation systems. HPX is a C++ […]
Sep, 13
GPA: A GPU Performance Advisor Based on Instruction Sampling
Developing efficient GPU kernels can be difficult because of the complexity of GPU architectures and programming models. Existing performance tools only provide coarse-grained suggestions at the kernel level, if any. In this paper, we describe GPA, a performance advisor for NVIDIA GPUs that suggests potential code optimization opportunities at a hierarchy of levels, including individual […]
Sep, 13
Hierarchical Roofline Analysis: How to Collect Data using Performance Tools on Intel CPUs and NVIDIA GPUs
This paper surveys a range of methods to collect necessary performance data on Intel CPUs and NVIDIA GPUs for hierarchical Roofline analysis. As of mid-2020, two vendor performance tools, Intel Advisor and NVIDIA Nsight Compute, have integrated Roofline analysis into their supported feature set. This paper fills the gap for when these tools are not […]
Sep, 6
Performance portability through machine learning guided kernel selection in SYCL libraries
Automatically tuning parallel compute kernels allows libraries and frameworks to achieve performance on a wide range of hardware, however these techniques are typically focused on finding optimal kernel parameters for particular input sizes and parameters. General purpose compute libraries must be able to cater to all inputs and parameters provided by a user, and so […]
Sep, 6
Hash-Based Authentication Revisited in the Age of High-Performance Computers
Hash-based authentication is a widespread technique for protecting passwords in many modern software systems including databases. A hashing function is a one-way mathematical function that is used in various security contexts in this domain. In this paper, we revisit three popular hashing algorithms (MD5, SHA-1, and NTLM), that are considered weak or insecure. More specifically, […]
Sep, 6
Array-Oriented Languages and Polyhedral Compilation
Collection-oriented languages are characterised by their provision of a set of primitives which work on data in aggregate, and are intrinsically suited to parallelisation. Polyhedral compilation systems aim at optimising aggregate computations using an abstract, mathematical representation. However, they are typically run as optimisation passes not over high-level programmes representing aggregate computations, but over the […]
Sep, 6
A Distributed Architecture for Smart Recycling Using Machine Learning
Recycling is vital for a sustainable and clean environment. Developed and developing countries are both facing the problem of solid management waste and recycling issues. Waste classification is a good solution to separate the waste from the recycle materials. In this work, we propose a cloud based classification algorithm for automated machines in recycling factories […]
Sep, 6
Towards Efficient and Scalable Acceleration of Online Decision Tree Learning on FPGA
Decision trees are machine learning models commonly used in various application scenarios. In the era of big data, traditional decision tree induction algorithms are not suitable for learning large-scale datasets due to their stringent data storage requirement. Online decision tree learning algorithms have been devised to tackle this problem by concurrently training with incoming samples […]
Aug, 30
Montage: A Neural Network Language Model-Guided JavaScript Engine Fuzzer
JavaScript (JS) engine vulnerabilities pose significant security threats affecting billions of web browsers. While fuzzing is a prevalent technique for finding such vulnerabilities, there have been few studies that leverage the recent advances in neural network language models (NNLMs). In this paper, we present Montage, the first NNLM-guided fuzzer for finding JS engine vulnerabilities. The […]
Aug, 30
Machine Learning in Compilers: Past, Present and Future
Writing optimising compilers is difficult. The range of programs that may be presented to the compiler is huge and the systems on which they run are complex, heterogeneous, non-deterministic, and constantly changing. The space of possible optimisations is also vast, making it very hard for compiler writers to design heuristics that take all of these […]
Aug, 30
Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads
Specialized accelerators such as GPUs, TPUs, FPGAs, and custom ASICs have been increasingly deployed to train deep learning models. These accelerators exhibit heterogeneous performance behavior across model architectures. Existing schedulers for clusters of accelerators, which are used to arbitrate these expensive training resources across many users, have shown how to optimize for various multi-job, multi-user […]