high performance computing on graphics processing units: hgpu.org


Transparent Acceleration of Java-based Deep Learning Engines

Athanasios Stratikopoulos, Mihai-Cristian Olteanu, Ian Vaughan, Zoran Sevarac, Nikos Foutris, Juan Fumero, Christos Kotselidis

The University of Manchester, Manchester, United Kingdom

International Conference on Managed Programming Languages & Runtimes (MPLR ’20), 2020

DOI:10.1145/3426182.3426188


Package: visrec-api: specification of standard Visual Recognition API for Java (JSR381)


The advent of modern cloud services, along with the huge volume of data produced on a daily basis, has increased the demand for fast and efficient data processing. This demand is common among numerous application domains, such as deep learning, data mining, and computer vision. In recent years, hardware accelerators have been employed as a means to meet this demand, due to the high parallelism that these applications exhibit. Although this approach can yield high performance, the development of new deep learning neural networks on heterogeneous hardware involves a steep learning curve. The main reason is that existing deep learning engines support the static compilation of the accelerated code, which can be accessed via wrapper calls from a wide range of managed programming languages (e.g., Java, Python, Scala). As a result, the development of high-performance neural network architectures is fragmented between programming models, forcing developers to manually specialize the code for heterogeneous execution. This specialization is not a trivial task, as it requires developers to have hardware expertise and to use low-level programming languages or tools, such as OpenCL, CUDA, or High Level Synthesis (HLS). In this paper we showcase how we have employed TornadoVM, a state-of-the-art heterogeneous programming framework, to transparently accelerate Deep Netts on heterogeneous hardware. Our work shows how a pure Java-based deep learning neural network engine can be dynamically compiled at runtime and specialized for particular hardware accelerators, without requiring developers to employ any low-level programming framework typically used for such devices. Our preliminary results show up to 6.45x end-to-end performance speedup and up to 88.5x kernel performance speedup when executing the feed-forward process of the network’s training on GPUs, compared with the sequential execution of the original Deep Netts framework.
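To make the idea concrete, the following is a minimal sketch of the kind of plain Java loop nest that a feed-forward step consists of. It is not taken from Deep Netts or from the paper: the class name `DenseLayerSketch`, the method `forward`, the row-major weight layout, and the ReLU activation are all assumptions chosen for illustration. The sketch runs sequentially on the JVM; in TornadoVM, a loop like the outer one would typically be marked with the `@Parallel` annotation so that the runtime can compile it for an accelerator, which is left out here so the code has no external dependency.

```java
import java.util.Arrays;

// Hypothetical illustration, not code from Deep Netts or TornadoVM.
final class DenseLayerSketch {

    // Forward pass of one fully connected layer (assumed layout and activation):
    //   output[j] = relu(bias[j] + sum_i input[i] * weights[i * outputs + j])
    // Each output neuron is computed independently, so the outer loop is the
    // natural candidate for parallel execution on a GPU. With TornadoVM the
    // outer loop index would carry @Parallel (omitted to stay dependency-free).
    static void forward(float[] input, float[] weights, float[] bias,
                        float[] output, int inputs, int outputs) {
        for (int j = 0; j < outputs; j++) {
            float sum = bias[j];
            for (int i = 0; i < inputs; i++) {
                sum += input[i] * weights[i * outputs + j];
            }
            output[j] = Math.max(0.0f, sum); // ReLU, an assumed activation
        }
    }

    public static void main(String[] args) {
        float[] input = {1.0f, 2.0f};
        float[] weights = {1.0f, -1.0f, 0.5f, 0.5f}; // 2 inputs x 2 outputs
        float[] bias = {0.0f, -1.0f};
        float[] output = new float[2];
        forward(input, weights, bias, output, 2, 2);
        System.out.println(Arrays.toString(output)); // prints [2.0, 0.0]
    }
}
```

The method operates only on primitive arrays and has no dependencies between iterations of the outer loop, which is the property that makes this style of code amenable to runtime compilation for accelerators without rewriting it in OpenCL or CUDA.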

Tags: AMD RX Vega 64, ATI, Computer science, Computer vision, Deep learning, Java, nVidia, nVidia GeForce GTX 1050 Ti, OpenCL, Package

October 4, 2020 by hgpu
