Posts
Jun, 14
The Rodinia Benchmark Suite in SYCL
We apply the SYCL programming model to the Rodinia benchmark suite, describe the transformations from the OpenCL implementations to the SYCL implementations, and evaluate the benchmarks on microprocessors with a CPU and an integrated GPU. The publicly available implementations of the benchmark suite will track the development of the SYCL compilers, and provide programs for […]
Jun, 7
OpenABLext: An automatic code generation framework for agent-based simulations on CPU-GPU-FPGA heterogeneous platforms
The execution of agent-based simulations (ABSs) on hardware accelerator devices such as graphics processing units (GPUs) has been shown to offer great performance potentials. However, in heterogeneous hardware environments, it can become increasingly difficult to find viable partitions of the simulation and provide implementations for different hardware devices. To automate this process, we present OpenABLext, […]
Jun, 7
Investigating Single Precision Floating General Matrix Multiply in Heterogeneous
The fundamental operation of matrix multiplication is ubiquitous across a myriad of disciplines. Yet, the identification of new optimizations for matrix multiplication remains relevant for emerging hardware architectures and heterogeneous systems. Frameworks such as OpenCL enable computation orchestration on existing systems, and its availability using the Intel High Level Synthesis compiler allows users to architect […]
May, 31
Evaluating the performance of HPC-style SYCL applications
SYCL is a parallel programming model for developing single-source programs for running on heterogeneous platforms. To this end, it allows for one code to be written which can run on a different architectures. For this study, we develop applications in SYCL which are representative of those often used in High-Performance Computing. Their performance is benchmarked […]
May, 24
HaoCL: Harnessing Large-scale Heterogeneous Processors Made Easy
The pervasive adoption of Deep Learning (DL) and Graph Processing (GP) makes it a de facto requirement to build large-scale clusters of heterogeneous accelerators including GPUs and FPGAs. The OpenCL programming framework can be used on the individual nodes of such clusters but is not intended for deployment in a distributed manner. Fortunately, the original […]
May, 10
Accurate Energy and Performance Prediction for Frequency-Scaled GPU Kernels
Energy optimization is an increasingly important aspect of today’s high-performance computing applications. In particular, dynamic voltage and frequency scaling (DVFS) has become a widely adopted solution to balance performance and energy consumption, and hardware vendors provide management libraries that allow the programmer to change both memory and core frequencies manually to minimize energy consumption while […]
Apr, 26
Automatic Parallelization for Heterogeneous Embedded Systems
Recent years have seen an increase of heterogeneous architectures combining multi-core CPUs with accelerators such as GPU, FPGA, and Intel Xeon Phi. GPU can achieve significant performance for certain categories of application. Nevertheless, achieving this performance with low-level APIs (e.g. CUDA, OpenCL) requires to rewrite the sequential code, to have a good knowledge of GPU […]
Apr, 19
FlexTensor: An Automatic Schedule Exploration and Optimization Framework for Tensor Computation on Heterogeneous System
Tensor computation plays a paramount role in a broad range of domains, including machine learning, data analytics, and scientific computing. The wide adoption of tensor computation and its huge computation cost has led to high demand for flexible, portable, and high-performance library implementation on heterogeneous hardware accelerators such as GPUs and FPGAs. However, the current […]
Apr, 5
Deep Learning for Compilers
Constructing compilers is hard. Optimising compilers are multi-million dollar projects spanning years of development, yet remain unable to fully exploit the available performance, and are prone to bugs. The rapid transition to heterogeneous parallelism and diverse architectures has raised demand for aggressively-optimising compilers to an all time high, leaving compiler developers struggling to keep up. […]
Mar, 29
Large-Scale Data Computing Performance Comparisons on SYCL Heterogeneous Parallel Processing Layer Implementations
Today, many big data applications require massively parallel tasks to compute complicated mathematical operations. To perform parallel tasks, platforms like CUDA (Compute Unified Device Architecture) and OpenCL (Open Computing Language) are widely used and developed to enhance the throughput of massively parallel tasks. There is also a need for high-level abstractions and platform-independence over those […]
Mar, 29
Characterizing Optimizations to Memory Access Patterns using Architecture-Independent Program Features
High-performance computing developers are faced with the challenge of optimizing the performance of OpenCL workloads on diverse architectures. The Architecture-Independent Workload Characterization (AIWC) tool is a plugin for the Oclgrind OpenCL simulator that gathers metrics of OpenCL programs that can be used to understand and predict program performance on an arbitrary given hardware architecture. However, […]
Feb, 23
Performance Counters based Power Modeling of Mobile GPUs using Deep Learning
GPUs have recently become important computational units on mobile devices, resulting in heterogeneous devices that can run a variety of parallel processing applications. While developing and optimizing such applications, estimating power consumption is of immense importance as energy efficiency has become the key design constraint to optimize for on these platforms. In this work, we […]