high performance computing on graphics processing units: hgpu.org


Energy Efficiency Studies of Mont Blanc Applications

Mads Holden

Faculty of Information Technology, Mathematics and Electrical Engineering, Department of Computer and Information Science, Norwegian University of Science and Technology

Norwegian University of Science and Technology, 2013

@article{holden2013energy,
  title={Energy Efficiency Studies of Mont Blanc Applications},
  author={Holden, Mads},
  year={2013},
  publisher={Institutt for datateknikk og informasjonsvitenskap}
}


In this thesis, the performance and energy efficiency of four different implementations of matrix multiplication, written in OmpSs and OpenCL, are tested and evaluated. The benchmarking is done on an Intel Ivy Bridge Core i7 3770K. The results are evaluated and discussed with regard to different optimization configurations, such as vectorization and multi-threading. Energy measurements are taken using PAPI, which in turn reads energy values through the Running Average Power Limit (RAPL) interface of the Intel processor. Performance is reported in MFLOPS, while energy efficiency is compared using MFLOPS/W, power draw in watts, the energy-delay product, and the energy-delay-squared product. The OpenCL versions are compared with and without vectorization. One of the OmpSs applications is also measured with and without vectorization, and with varying numbers of threads. The last OmpSs version uses the BLAS implementation ATLAS, which is already vectorized, so it is compared only across thread counts. SSE and AVX vectorization are shown to significantly improve performance while using little to no extra energy per second for all implementations. Multi-threading also gives higher performance, but it consumes more energy. With ATLAS, running with eight threads was shown to consume more energy while performing worse. The OmpSs version using ATLAS was both the fastest and the most energy efficient, peaking at 125 GFLOPS and 2.7 GFLOPS/W while running with four threads and using AVX.
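The metrics named in the abstract can be derived from three quantities per run: the operation count, the elapsed time, and the energy consumed. The following is a minimal sketch of that arithmetic, not the thesis's own code. It assumes the conventional 2n³ operation count for a dense n × n matrix multiplication and the usual definitions of the energy-delay product (energy × time) and energy-delay-squared product (energy × time²). The names `matmul_flops`, `energy_metrics`, and `EnergyMetrics` are hypothetical, and the inputs in the usage lines are made-up values, not measurements from the thesis.

```python
from dataclasses import dataclass


@dataclass
class EnergyMetrics:
    mflops: float           # millions of floating-point operations per second
    watts: float            # average power over the run (joules / seconds)
    mflops_per_watt: float  # energy efficiency
    edp: float              # energy-delay product, in joule-seconds
    ed2p: float             # energy-delay-squared product, in joule-seconds^2


def matmul_flops(n: int) -> int:
    """Conventional operation count for a dense n x n matrix multiply (assumed 2*n^3)."""
    return 2 * n ** 3


def energy_metrics(flops: float, seconds: float, joules: float) -> EnergyMetrics:
    """Derive performance and efficiency metrics from one measured run."""
    if seconds <= 0 or joules <= 0:
        raise ValueError("seconds and joules must be positive")
    watts = joules / seconds
    mflops = flops / seconds / 1e6
    return EnergyMetrics(
        mflops=mflops,
        watts=watts,
        mflops_per_watt=mflops / watts,
        edp=joules * seconds,
        ed2p=joules * seconds ** 2,
    )


if __name__ == "__main__":
    # Hypothetical run: a 1000 x 1000 multiply taking 2 s and consuming 100 J.
    m = energy_metrics(matmul_flops(1000), seconds=2.0, joules=100.0)
    print(m)
```

Because MFLOPS/W is performance divided by average power, the figure also equals operations per joule scaled by 10⁶, so it does not depend on run time once the operation count and energy are fixed. The energy-delay products, by contrast, penalize slow runs: the squared variant weights delay more heavily than energy.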

Tags: ARM, Computer science, Energy-efficient computing, Matrix multiplication, OpenCL, Package, Thesis

October 21, 2013 by hgpu
