high performance computing on graphics processing units: hgpu.org

Caffe con Troll: Shallow Ideas to Speed Up Deep Learning

Firas Abuzaid, Stefan Hadjis, Ce Zhang, Christopher Re

Stanford University

arXiv:1504.04343 [cs.LG] (16 Apr 2015)

We present Caffe con Troll (CcT), a fully compatible end-to-end version of the popular framework Caffe with rebuilt internals. We built CcT to examine the performance characteristics of training and deploying general-purpose convolutional neural networks across different hardware architectures. We find that, by employing standard batching optimizations for CPU training, we achieve up to one order of magnitude throughput improvement on convolutional layers over Caffe. Moreover, with these improvements, the end-to-end training time for CNNs is directly proportional to the FLOPS delivered by the CPU, which enables us to efficiently train hybrid CPU-GPU systems for CNNs.
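The abstract does not spell out what the batching optimizations are. As a rough illustration only, the sketch below assumes they mean lowering the whole mini-batch into one patch matrix so that a convolutional layer becomes a single large matrix multiply instead of one small multiply per image. The function names (`im2col`, `conv_batched`, `conv_per_image`) are hypothetical and are not taken from CcT or Caffe; stride 1 and no padding are assumed for brevity.

```python
import numpy as np

def im2col(x, k):
    """Lower images x of shape (b, c, h, w) into a patch matrix.

    Returns an array of shape (b * oh * ow, c * k * k), assuming
    stride 1 and no padding (an assumption made for this sketch).
    """
    b, c, h, w = x.shape
    oh, ow = h - k + 1, w - k + 1
    # windows has shape (b, c, oh, ow, k, k)
    windows = np.lib.stride_tricks.sliding_window_view(x, (k, k), axis=(2, 3))
    # Reorder to (b, oh, ow, c, k, k) so each row holds one receptive field.
    return windows.transpose(0, 2, 3, 1, 4, 5).reshape(b * oh * ow, c * k * k)

def conv_batched(x, weights):
    """Convolve the whole batch with one large matrix multiply."""
    b, c, h, w = x.shape
    o, _, k, _ = weights.shape
    oh, ow = h - k + 1, w - k + 1
    lowered = im2col(x, k)                     # (b*oh*ow, c*k*k)
    out = lowered @ weights.reshape(o, -1).T   # (b*oh*ow, o)
    return out.reshape(b, oh, ow, o).transpose(0, 3, 1, 2)

def conv_per_image(x, weights):
    """Same result, but with one small matrix multiply per image."""
    return np.concatenate(
        [conv_batched(x[i:i + 1], weights) for i in range(x.shape[0])]
    )
```

Both functions compute the same output; the batched form simply hands the matrix-multiply routine one larger problem, which is where a throughput gain on CPUs would plausibly come from under the assumption stated above.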

Tags: Computer science, CUDA, Deep learning, Machine learning, Neural networks, nVidia, nVidia GRID K520, Tesla K40

April 20, 2015 by hgpu
