high performance computing on graphics processing units: hgpu.org


BIDMach: Large-scale Learning with Zero Memory Allocation

John Canny, Huasha Zhao

Computer Science Division, UC Berkeley, Berkeley, CA 94720

Big Learning: Advances in Algorithms and Data Management, 2013

@article{canny2013bidmach,
  title={BIDMach: Large-scale Learning with Zero Memory Allocation},
  author={Canny, John and Zhao, Huasha},
  year={2013}
}


Package: BIDMach: CPU and GPU-accelerated Machine Learning Library


This paper describes recent work on the BIDMach toolkit for large-scale machine learning. BIDMach has demonstrated single-node performance that exceeds that of published cluster systems for many common machine-learning tasks. BIDMach makes full use of both CPU and GPU acceleration (through a sister library, BIDMat) and requires only modest hardware (commodity GPUs). One of the challenges of reaching this level of performance is the allocation barrier: while it is simple and expedient to allocate and recycle matrix (or graph) objects in expressions, this approach is too slow to match the arithmetic throughput possible on either GPUs or CPUs. In this paper we describe a caching approach that allows code with complex matrix (graph) expressions to run at massive scale, i.e., on multi-terabyte data, with zero memory allocation after initial start-up. We present a number of new benchmarks that leverage this approach.
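The abstract describes the caching idea only at a high level. The sketch below illustrates one way such a cache could work, under stated assumptions. Each operation requests its output buffer using a key built from the operation name, the identities of its operands, and the result shape and dtype. The first request allocates, and later requests return the same buffer. A minibatch loop that refills its input buffers in place therefore allocates no new expression outputs after the first iteration. The names `ResultCache`, `matmul`, `matmul_tn`, `sub`, `scale`, `sigmoid`, and `train` are hypothetical and are not BIDMach or BIDMat APIs. Keying on operand identity is likewise an assumption made for illustration. The example is written in Python with NumPy, and the toy logistic-regression workload is not one of the paper's benchmarks.

```python
import numpy as np


class ResultCache:
    """Hypothetical output-buffer cache (illustrative, not BIDMat's API).

    Buffers are keyed on (operation name, operand identities, shape, dtype).
    The first request for a key allocates; later requests reuse the buffer.
    """

    def __init__(self):
        self._entries = {}
        self.allocations = 0  # fresh output arrays created so far

    def out(self, op, operands, shape, dtype):
        key = (op, tuple(id(x) for x in operands), tuple(shape), np.dtype(dtype).str)
        entry = self._entries.get(key)
        if entry is None:
            # Holding the operands keeps them alive, so their id() values
            # cannot be recycled by unrelated objects while the cache exists.
            entry = (operands, np.empty(shape, dtype=dtype))
            self._entries[key] = entry
            self.allocations += 1
        return entry[1]


def matmul(c, a, b):
    return np.matmul(a, b, out=c.out("matmul", (a, b), (a.shape[0], b.shape[1]), a.dtype))


def matmul_tn(c, a, b):
    # Computes a.T @ b. The key uses `a` itself because a.T is a new view object on every call.
    return np.matmul(a.T, b, out=c.out("matmul_tn", (a, b), (a.shape[1], b.shape[1]), a.dtype))


def sub(c, a, b):
    return np.subtract(a, b, out=c.out("sub", (a, b), a.shape, a.dtype))


def scale(c, a, s):
    return np.multiply(a, s, out=c.out("scale", (a,), a.shape, a.dtype))


def sigmoid(c, z):
    # 1 / (1 + exp(-z)), computed entirely inside one cached buffer.
    out = c.out("sigmoid", (z,), z.shape, z.dtype)
    np.negative(z, out=out)
    np.exp(out, out=out)
    out += 1.0
    return np.reciprocal(out, out=out)


def train(steps, n=256, d=16, lr=0.1, seed=0):
    """Toy logistic-regression SGD whose expression outputs all come from the cache."""
    rng = np.random.default_rng(seed)
    X = np.empty((n, d), dtype=np.float32)  # minibatch buffer, refilled in place
    y = np.empty((n, 1), dtype=np.float32)  # label buffer, refilled in place
    w = np.zeros((d, 1), dtype=np.float32)
    c = ResultCache()
    history = []  # cache allocation count after each step
    for _ in range(steps):
        rng.standard_normal(dtype=np.float32, out=X)
        np.greater(X[:, 0:1], 0.0, out=y)       # toy labels: 1 where feature 0 > 0
        p = sigmoid(c, matmul(c, X, w))         # predicted probabilities
        g = matmul_tn(c, X, sub(c, p, y))       # summed log-loss gradient X^T (p - y)
        np.subtract(w, scale(c, g, lr / n), out=w)  # in-place mean-gradient step
        history.append(c.allocations)
    return w, history
```

Calling `train(50)` creates five cached buffers on the first step (one each for `matmul`, `sigmoid`, `sub`, `matmul_tn`, and `scale`), and the count stays at five for every later step. The sketch counts only expression outputs. It does not track NumPy's internal temporaries or the small view objects created by slicing and transposing.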

Tags: Benchmarking, Computer science, CUDA, Machine learning, nVidia, nVidia GeForce GTX 690

December 25, 2013 by hgpu




