MASCOT: Fast and Highly Scalable SVM Cross-validation using GPUs and SSDs

Zeyi Wen, Rui Zhang, Kotagiri Ramamohanarao, Jianzhong Qi, Kerry Taylor
University of Melbourne, Australia
The IEEE International Conference on Data Mining (ICDM), 2014

@inproceedings{wen2014mascot,
   title={MASCOT: Fast and Highly Scalable SVM Cross-validation using GPUs and SSDs},
   author={Wen, Zeyi and Zhang, Rui and Ramamohanarao, Kotagiri and Qi, Jianzhong and Taylor, Kerry},
   booktitle={The IEEE International Conference on Data Mining (ICDM)},
   year={2014}
}

Cross-validation is a commonly used method for evaluating the effectiveness of Support Vector Machines (SVMs). However, existing SVM cross-validation algorithms are not scalable to large datasets because they have to (i) hold the whole dataset in memory and/or (ii) perform a very large number of kernel value computations. In this paper, we propose a scheme to dramatically improve the scalability and efficiency of SVM cross-validation through the following key ideas. (i) To avoid holding the whole dataset in memory and to avoid repeated kernel value computations, we precompute the kernel values and reuse them. (ii) We store the precomputed kernel values in a high-speed storage framework, consisting of CPU memory extended by solid state drives (SSDs) and GPU memory as a cache, so that reusing (i.e., reading) kernel values takes much less time than computing them on-the-fly. (iii) To further improve the efficiency of SVM training, we apply a number of optimization techniques to the extreme example search algorithm, design a parallel kernel value read algorithm, propose a caching strategy well-suited to the characteristics of the storage framework, and parallelize the tasks on the GPU and the CPU. For datasets of sizes that existing algorithms can handle, our scheme achieves speedups of several orders of magnitude. More importantly, our scheme enables SVM cross-validation on very large datasets that existing algorithms are unable to handle.
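The central idea of precomputing kernel values once and then reusing them from a storage hierarchy can be illustrated with a small CUDA sketch. The snippet below computes one row of an RBF kernel matrix on the GPU and copies it back to host memory, where the paper's scheme would stream it to the SSD-backed store and cache it in GPU/CPU memory. The RBF kernel choice, the row-wise layout, and all names here are illustrative assumptions, not MASCOT's actual implementation.

```cuda
// Minimal sketch (assumptions: RBF kernel, row-major layout, toy data).
// Precompute one row of the kernel matrix on the GPU, then copy it to the
// host so it could be appended to an SSD-backed kernel-value store.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void rbf_kernel_row(const float *instances, // n x d, row-major
                               int n, int d, int i,     // i = row to compute
                               float gamma, float *row) // output: n kernel values
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j >= n) return;
    float dist2 = 0.0f;
    for (int k = 0; k < d; ++k) {
        float diff = instances[i * d + k] - instances[j * d + k];
        dist2 += diff * diff;
    }
    row[j] = expf(-gamma * dist2); // K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)
}

int main() {
    const int n = 1024, d = 16;
    const float gamma = 0.5f;

    // Toy dataset; a real run would load the training instances here.
    float *h_instances = new float[n * d];
    for (int t = 0; t < n * d; ++t) h_instances[t] = (t % 7) * 0.1f;

    float *d_instances, *d_row;
    cudaMalloc(&d_instances, n * d * sizeof(float));
    cudaMalloc(&d_row, n * sizeof(float));
    cudaMemcpy(d_instances, h_instances, n * d * sizeof(float),
               cudaMemcpyHostToDevice);

    float *h_row = new float[n];
    for (int i = 0; i < n; ++i) {
        rbf_kernel_row<<<(n + 255) / 256, 256>>>(d_instances, n, d, i, gamma, d_row);
        cudaMemcpy(h_row, d_row, n * sizeof(float), cudaMemcpyDeviceToHost);
        // In the paper's scheme this row would now be written to the SSD-backed
        // store and kept hot in the GPU/CPU cache; here we just print one value.
        if (i == 0) printf("K(x_0, x_1) = %f\n", h_row[1]);
    }

    delete[] h_instances; delete[] h_row;
    cudaFree(d_instances); cudaFree(d_row);
    return 0;
}
```

During cross-validation, later training folds would read such precomputed rows back from the storage framework instead of recomputing them, which is where the reported speedup comes from.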