high performance computing on graphics processing units: hgpu.org

Combining approximate inference methods for efficient learning on large computer clusters

Zhenwen Dai, Jacquelyn A. Shelton, Jorg Bornschein, Abdul Saboor Sheikh, Jorg Lucke

Frankfurt Institute for Advanced Studies, Germany

Workshop on Big Learning: Algorithms, Systems, and Tools for Learning at Scale (NIPS’11), 2011

An important challenge in machine learning is to develop learning algorithms that can handle large amounts of data at a realistically large scale. This entails not only developing algorithms that can be trained efficiently to infer the parameters of a model from a given dataset, but also careful thought about the tools (both software and hardware) used to implement them. We extend the previously developed framework for parallel Expectation Maximization (EM) learning [1] to different models, each with a corresponding parallelization technique. To further tackle problems of computational complexity and to utilize the capability of parallel computing hardware (CPU/GPU clusters), we developed a set of techniques that can be tailored to specific large-scale learning problems. For instance, we design a dynamic data repartition technique for "Gaussian sparse coding" (Sec. 3.2), use specialized GPU kernels for translation invariant learning (Sec. 3.3), and show how sampling can be used to further scale learning on very high dimensional data (Sec. 3.4). We propose these as examples of a parallelization toolbox whose parts can be creatively combined and exploited in model- and task-driven ways. The framework is a lightweight and easy-to-use implementation in Python that facilitates the development of massively parallel machine learning algorithms, using the Message Passing Interface (MPI) for communication between the compute nodes. Once algorithms are integrated into the framework, they can be executed on large numbers of processor cores and applied to large sets of data. Some of the numerical experiments we performed ran on InfiniBand interconnected clusters and used up to 5000 parallel processor cores with more than 10^17 floating point operations. For reasonably balanced meta-parameters (number of data points vs. number of latent variables vs. number of model parameters to be inferred), we observe close to linear runtime scaling behavior with respect to the number of cores in use.
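The data-parallel EM pattern the abstract refers to can be illustrated with a minimal sketch. It is not the authors' framework: it assumes a toy one-dimensional, two-component Gaussian mixture rather than the models from the paper, and it simulates the compute nodes with a plain Python loop in place of MPI. All function names (`local_stats`, `reduce_stats`, `m_step`, `parallel_em`, `make_data`) are hypothetical. Each simulated worker computes partial sufficient statistics on its own chunk of data, the partial results are summed (the step an MPI all-reduce would perform on a cluster), and every worker can then run the same M-step.

```python
import math
import random


def local_stats(chunk, weights, means, variances):
    """E-step on one worker's chunk: return partial sufficient statistics."""
    k = len(means)
    # Per component: sum of r, sum of r*x, sum of r*x^2 (r = responsibility).
    stats = [[0.0, 0.0, 0.0] for _ in range(k)]
    for x in chunk:
        dens = [
            weights[j]
            * math.exp(-0.5 * (x - means[j]) ** 2 / variances[j])
            / math.sqrt(2.0 * math.pi * variances[j])
            for j in range(k)
        ]
        total = sum(dens)
        for j in range(k):
            r = dens[j] / total
            stats[j][0] += r
            stats[j][1] += r * x
            stats[j][2] += r * x * x
    return stats


def reduce_stats(all_stats):
    """Sum partial statistics elementwise (stand-in for an MPI all-reduce)."""
    k = len(all_stats[0])
    return [[sum(s[j][i] for s in all_stats) for i in range(3)] for j in range(k)]


def m_step(stats, n):
    """Update mixture parameters from the globally summed statistics."""
    weights = [s[0] / n for s in stats]
    means = [s[1] / s[0] for s in stats]
    # Floor the variance to keep the toy example numerically safe.
    variances = [max(s[2] / s[0] - (s[1] / s[0]) ** 2, 1e-6) for s in stats]
    return weights, means, variances


def parallel_em(data, n_workers, n_iter=50):
    """Run EM with the data split across n_workers simulated workers."""
    chunks = [data[i::n_workers] for i in range(n_workers)]
    weights, means, variances = [0.5, 0.5], [min(data), max(data)], [1.0, 1.0]
    for _ in range(n_iter):
        partial = [local_stats(c, weights, means, variances) for c in chunks]
        weights, means, variances = m_step(reduce_stats(partial), len(data))
    return weights, means, variances


def make_data(n=400, seed=0):
    """Toy data: two Gaussian clusters centered at -2 and 3 (assumed values)."""
    rng = random.Random(seed)
    half = n // 2
    return [rng.gauss(-2.0, 0.5) for _ in range(half)] + [
        rng.gauss(3.0, 0.5) for _ in range(half)
    ]
```

Because only the small per-component statistics cross worker boundaries, the communication volume in this sketch does not grow with the number of data points, which is one reason this pattern can scale well. Splitting the data over more workers changes only the order of summation, so `parallel_em(data, 1)` and `parallel_em(data, 4)` should agree up to floating-point rounding.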

Tags: Algorithms, Computational Complexity, Computer science, GPU cluster, Machine learning, MPI, nVidia, nVidia GeForce GTX 480, OpenCL, Python

January 19, 2012 by hgpu
