high performance computing on graphics processing units: hgpu.org

Performance and Scalability of GPU-Based Convolutional Neural Networks

Daniel Strigl, Klaus Kofler, Stefan Podlipnig

Distrib. & Parallel Syst. Group, Univ. of Innsbruck, Innsbruck, Austria

2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, 2010

DOI:10.1109/PDP.2010.43

In this paper we present the implementation of a framework for accelerating training and classification of arbitrary Convolutional Neural Networks (CNNs) on the GPU. CNNs are a derivative of standard Multilayer Perceptron (MLP) neural networks optimized for two-dimensional pattern recognition problems such as Optical Character Recognition (OCR) or face detection. We describe the basic parts of a CNN and demonstrate the performance and scalability improvement that can be achieved by shifting the computation-intensive tasks of a CNN to the GPU. Depending on the network topology, training and classification on the GPU run 2 to 24 times faster than on the CPU. Furthermore, the GPU version scales much better than the CPU implementation with respect to the network size.
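To make the "computation-intensive tasks" concrete, here is a minimal sketch of the forward pass of a single convolutional feature map, written in plain Python. It is not taken from the paper: the function name `conv2d_valid`, the "valid" border handling, and the cross-correlation form are assumptions chosen for illustration, and the authors' framework may organize the computation differently.

```python
def conv2d_valid(image, kernel, bias=0.0):
    """Slide `kernel` over `image` (lists of lists of numbers) without padding.

    Hypothetical helper for illustration only; not the paper's API.
    Returns an output map of size (ih - kh + 1) x (iw - kw + 1).
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = ih - kh + 1, iw - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for y in range(oh):          # each (y, x) output value is independent
        for x in range(ow):      # of every other one
            acc = bias
            for i in range(kh):
                for j in range(kw):
                    acc += image[y + i][x + j] * kernel[i][j]
            out[y][x] = acc
    return out


if __name__ == "__main__":
    image = [[1, 2, 3],
             [4, 5, 6],
             [7, 8, 9]]
    kernel = [[1, 0],
              [0, 1]]
    print(conv2d_valid(image, kernel))
```

Because no output value depends on another, the two outer loops can in principle be distributed across many GPU threads (for example, one thread per output element). That independence is the general reason convolutional layers tend to benefit from GPU execution; the specific mapping used in the paper is not described in this abstract.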

Tags: Computer science, CUDA, Neural networks, nVidia

March 8, 2011 by hgpu
