high performance computing on graphics processing units: hgpu.org

Energy Efficient Computing on Multi-core Processors: Vectorization and Compression Techniques

Abdullah Al Hasib

Norwegian University of Science and Technology, Faculty of Information Technology and Electrical Engineering, Department of Computer Science

Norwegian University of Science and Technology, 2018

Over the past few years, energy consumption has become the main limiting factor for computing in general. This has led CPU vendors to aggressively promote parallel computing on multiple cores without significantly increasing the thermal design power of the processor. However, achieving maximum performance and energy efficiency from the resources available on multi-core and many-core platforms requires applications to exploit existing and emerging architectural features efficiently. This thesis studies a number of these technologies to identify their potential for achieving high performance and energy efficiency in a set of smart grid applications on Intel multi-core and many-core platforms. The first part of the thesis explores how different multi-core programming techniques affect energy efficiency for a selected set of benchmarks and smart grid applications on Intel Sandy Bridge and Haswell multi-core processors. These techniques include thread-level parallelism using OpenMP, task-based parallelism using OmpSs, data parallelism using SIMD (Single Instruction Multiple Data) instruction sets, code optimizations, and the use of existing optimized math libraries. In our initial case studies, SIMD vectorization proved very effective at delivering both high performance and energy efficiency. For some applications, however, such as the Powel time-series kernel, SIMD vectorization also puts pressure on the available memory bandwidth, which leaves computing resources under-utilized and makes execution energy-inefficient. In the second part of this research, we investigate opportunities to improve the performance of SIMD vectorization for memory-bound applications using SIMD data compression, SIMD software prefetching, SIMD shuffling, code blocking, and other code transformation techniques.
The key idea is to reduce data movement across the memory hierarchy by putting otherwise idle CPU time to use. We show that integrating data compression is feasible on Intel multi-core platforms, provided that compression and decompression can be performed in a reasonable amount of time. We present a comprehensive discussion of SIMD compression techniques and of the code transformations required for efficient SIMD computation in memory- and cache-bound applications, using the Powel time-series kernel as a demonstrator application. Finally, we perform a feasibility study of the SIMD optimization and compression techniques in other application domains, using the k-means clustering algorithm and the full-search motion estimation algorithm. We also extend our experiments to Intel's many-core architecture using the Intel Xeon Phi coprocessor.
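The abstract does not describe the compression scheme itself. The following is therefore only a minimal sketch of the general idea of spending spare CPU cycles to cut memory traffic. It assumes a simple lossy 16-bit fixed-point encoding of float samples with one shared scale factor. The names `compress_q15`, `sum_q15` and `sum_f32` are hypothetical, and the encoding is a stand-in, not the technique used in the thesis.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lossy scheme (an illustration, not the thesis's method):
 * quantize float samples to int16 with one shared scale factor, so each
 * sample occupies 2 bytes in memory instead of 4. Returns the scale
 * factor needed to decompress. */
float compress_q15(const float *in, int16_t *out, size_t n)
{
    float maxabs = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float a = in[i] < 0.0f ? -in[i] : in[i];
        if (a > maxabs)
            maxabs = a;
    }
    float scale = maxabs > 0.0f ? maxabs / 32767.0f : 1.0f;

    for (size_t i = 0; i < n; ++i) {
        float q = in[i] / scale;
        /* Round to nearest, then clamp to the symmetric range +/-32767. */
        q = q >= 0.0f ? q + 0.5f : q - 0.5f;
        if (q > 32767.0f)
            q = 32767.0f;
        if (q < -32767.0f)
            q = -32767.0f;
        out[i] = (int16_t)q; /* truncation after the +/-0.5 offset rounds */
    }
    return scale;
}

/* Reduction over the compressed stream: each int16 is widened in a register
 * and accumulated exactly in an integer, and the scale factor is applied
 * once at the end. The branch-free loop is a typical candidate for compiler
 * auto-vectorization (for example GCC or Clang at -O3). */
double sum_q15(const int16_t *x, size_t n, float scale)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; ++i)
        acc += x[i];
    return (double)acc * scale;
}

/* Uncompressed baseline for comparison: reads 4 bytes per sample. */
double sum_f32(const float *x, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i)
        acc += x[i];
    return acc;
}
```

In this sketch, decompression costs one widening conversion per sample plus a single multiply per reduction, while the data stream read from memory is halved. Whether such a trade pays off depends on the kernel actually being bandwidth-bound and on the precision loss being acceptable for the application.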

Tags: Compression, Computer science, Data parallelism, Intel Xeon Phi, Prefetch, Programming techniques, Thesis

June 20, 2018 by hgpu
