high performance computing on graphics processing units: hgpu.org


OpenCL Acceleration for TensorFlow

Mehdi Goli, Luke Iwanski, John Lawson, Uwe Dolinsky, Andrew Richards

Codeplay Software Ltd.

SysML Conference, 2018


There is huge demand for targeting complex, large-scale machine learning applications, particularly those based on popular, actively maintained frameworks such as TensorFlow and Caffe, to a variety of platforms with accelerators ranging from high-end desktop GPUs to resource-constrained embedded or mobile GPUs, FPGAs, and DSPs. However, to deliver good performance, different platforms may require different algorithms or data structures, yet code should be easily portable and reused as much as possible across devices. The open SYCL standard addresses this by providing parallel processing through a single-source programming model, which enables the same standard C++ code to be used on the CPU and the accelerator. This allows high-level C++ abstractions and templates to be used to quickly configure device and host code for specific features of the platform. By targeting OpenCL, SYCL enables C++ applications such as TensorFlow to run efficiently on OpenCL devices without the developer having to write OpenCL code.
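As a rough illustration of the template-based configuration the abstract describes, the sketch below uses only standard C++ and no SYCL or OpenCL calls. It is not taken from the paper: the traits types `DesktopGpuTraits` and `EmbeddedGpuTraits`, their tile sizes, and the function `tiled_matmul` are all hypothetical names chosen for this example. The idea shown is that one generic source can be specialised at compile time for platforms that prefer different blocking factors.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical per-platform tuning traits (names and values are assumptions
// made for illustration, not taken from the paper or from any SYCL API).
struct DesktopGpuTraits  { static constexpr std::size_t tile = 16; };
struct EmbeddedGpuTraits { static constexpr std::size_t tile = 4;  };

// A single generic routine: multiplies two row-major n x n matrices.
// The tile size is fixed at compile time by the Traits type, so each
// platform gets its own specialised loop nest from the same source.
template <typename Traits>
std::vector<float> tiled_matmul(const std::vector<float>& a,
                                const std::vector<float>& b,
                                std::size_t n) {
  constexpr std::size_t T = Traits::tile;
  std::vector<float> c(n * n, 0.0f);
  for (std::size_t ii = 0; ii < n; ii += T)
    for (std::size_t kk = 0; kk < n; kk += T)
      for (std::size_t jj = 0; jj < n; jj += T)
        // Inner loops stay within one tile; std::min handles the case
        // where n is not a multiple of the tile size.
        for (std::size_t i = ii; i < std::min(ii + T, n); ++i)
          for (std::size_t k = kk; k < std::min(kk + T, n); ++k)
            for (std::size_t j = jj; j < std::min(jj + T, n); ++j)
              c[i * n + j] += a[i * n + k] * b[k * n + j];
  return c;
}
```

A caller would pick the configuration by type, for example `tiled_matmul<EmbeddedGpuTraits>(a, b, n)`. In a real SYCL program the loop body would instead be submitted as a kernel to a device queue; that part is deliberately left out here.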

Tags: ARM, Computer science, Deep learning, Machine learning, OpenCL, Performance, SYCL, TensorFlow

March 3, 2018 by hgpu
