high performance computing on graphics processing units: hgpu.org

SaberLDA: Sparsity-Aware Learning of Topic Models on GPUs

Kaiwei Li, Jianfei Chen, Wenguang Chen, Jun Zhu

Tsinghua University

arXiv:1610.02496 [cs.DC], (8 Oct 2016)

@article{li2016saberlda,

title={SaberLDA: Sparsity-Aware Learning of Topic Models on GPUs},

author={Li, Kaiwei and Chen, Jianfei and Chen, Wenguang and Zhu, Jun},

year={2016},

month={oct},

archivePrefix={"arXiv"},

primaryClass={cs.DC}

}

Latent Dirichlet Allocation (LDA) is a popular tool for analyzing discrete count data such as text and images. Applications require LDA to handle both large datasets and a large number of topics. Though distributed CPU systems have been used, GPU-based systems have emerged as a promising alternative because of the high computational power and memory bandwidth of GPUs. However, existing GPU-based LDA systems cannot support a large number of topics because they use algorithms on dense data structures whose time and space complexity is linear in the number of topics. In this paper, we propose SaberLDA, a GPU-based LDA system that implements a sparsity-aware algorithm to achieve sublinear time complexity and scales well to learn a large number of topics. To address the challenges introduced by sparsity, we propose a novel data layout, a new warp-based sampling kernel, and an efficient sparse count matrix updating algorithm that improve locality, make efficient use of GPU warps, and reduce memory consumption. Experiments show that SaberLDA can learn from billion-token-scale data with up to 10,000 topics, almost two orders of magnitude more than previous GPU-based systems. With a single GPU card, SaberLDA can learn 10,000 topics from a dataset of billions of tokens in a few hours, a task that previously required clusters of tens of machines.
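The key idea in the abstract is that the per-token sampling cost can depend on the number of topics actually present in a document rather than on the total number of topics K. The following is a minimal CPU sketch of one generic way to exploit that sparsity, assuming the common factorization p(z = k) ∝ (C_dk + α) · φ_wk with a fixed topic-word distribution φ. It is not SaberLDA's warp-based GPU kernel, and the function names `dense_prefix` and `sample_topic` are hypothetical. The distribution is split into a sparse bucket C_dk · φ_wk (nonzero only for topics already in the document) and a dense bucket α · φ_wk (shared by every token of word w, so its prefix sums can be precomputed once).

```python
import bisect
import itertools
import random
from typing import Dict, List


def dense_prefix(phi_w: List[float], alpha: float) -> List[float]:
    """Prefix sums of alpha * phi_w[k] for one word w.

    Built once in O(K) per word (assumption: phi is held fixed while
    sampling), then reused for every token of that word.
    """
    return list(itertools.accumulate(alpha * p for p in phi_w))


def sample_topic(doc_counts: Dict[int, int], phi_w: List[float],
                 prefix: List[float], rng: random.Random) -> int:
    """Draw z with p(z = k) proportional to (C_dk + alpha) * phi_w[k].

    doc_counts holds only the nonzero document-topic counts C_dk, so the
    sparse bucket costs O(K_d) and the dense bucket costs O(log K).
    """
    # Sparse bucket: only topics with C_dk > 0 contribute.
    sparse = [(k, c * phi_w[k]) for k, c in doc_counts.items()]
    s_mass = sum(w for _, w in sparse)
    q_mass = prefix[-1]
    u = rng.random() * (s_mass + q_mass)
    if u < s_mass:
        for k, w in sparse:
            u -= w
            if u < 0:
                return k
        return sparse[-1][0]  # guard against floating-point round-off
    # Dense bucket: binary search over the precomputed prefix sums.
    idx = bisect.bisect_right(prefix, u - s_mass)
    return min(idx, len(prefix) - 1)  # clamp in case of round-off
```

For example, with `phi_w = [0.5, 0.5]`, `alpha = 1.0`, and a document whose only nonzero count is `C_d0 = 5`, topic 0 has weight (5 + 1) · 0.5 = 3 and topic 1 has weight 1 · 0.5 = 0.5, so topic 0 should be drawn about 6/7 of the time. On a GPU the same split would be organized very differently, which is where the data layout and warp-based kernel described in the abstract come in.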

Tags: Algorithms, Computer science, CUDA, Information Retrieval, Latent Dirichlet allocation, Machine learning, nVidia, nVidia GeForce GTX 1080, nVidia GeForce GTX Titan X

October 12, 2016 by hgpu
