
A Unified Optimization Approach for Sparse Tensor Operations on GPUs

Bangtian Liu, Chengyao Wen, Anand D. Sarwate, Maryam Mehri Dehnavi
Rutgers, The State University of New Jersey
arXiv:1705.09905 [cs.MS] (28 May 2017)

@article{liu2017unified,
   title={A Unified Optimization Approach for Sparse Tensor Operations on GPUs},
   author={Liu, Bangtian and Wen, Chengyao and Sarwate, Anand D. and Dehnavi, Maryam Mehri},
   year={2017},
   month={may},
   eprint={1705.09905},
   archivePrefix={arXiv},
   primaryClass={cs.MS}
}


Sparse tensors appear in many large-scale applications with multidimensional and sparse data. While multidimensional sparse data often need to be processed on manycore processors, attempts to develop highly-optimized GPU-based implementations of sparse tensor operations are rare. The irregular computation patterns and sparsity structures as well as the large memory footprints of sparse tensor operations make such implementations challenging. We leverage the fact that sparse tensor operations share similar computation patterns to propose a unified tensor representation called F-COO. Combined with GPU-specific optimizations, F-COO provides highly-optimized implementations of sparse tensor computations on GPUs. The performance of the proposed unified approach is demonstrated for tensor-based kernels such as the Sparse Matricized Tensor-Times-Khatri-Rao Product (SpMTTKRP) and the Sparse Tensor-Times-Matrix Multiply (SpTTM) and is used in tensor decomposition algorithms. Compared to state-of-the-art work, we improve the performance of SpTTM and SpMTTKRP by up to 3.7 and 30.6 times, respectively, on NVIDIA Titan-X GPUs. We implement a CANDECOMP/PARAFAC (CP) decomposition and achieve up to 14.9 times speedup using the unified method over state-of-the-art libraries on NVIDIA Titan-X GPUs.
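For reference, the sketch below is a minimal illustration of the SpMTTKRP operation the abstract refers to; it is not the paper's F-COO representation or its GPU-specific optimizations. It assumes a 3-way sparse tensor stored in plain COO format (one index array per mode plus a value array), row-major factor matrices B (J x R) and C (K x R), maps one nonzero to one thread, and resolves concurrent updates to the same output row with atomicAdd. The kernel name and the tiny example tensor in main() are purely illustrative.

// A minimal sketch (NOT the paper's F-COO layout or kernels): mode-1 SpMTTKRP
// for a 3-way sparse tensor in plain COO format, one nonzero per thread,
// with write conflicts on shared output rows resolved by atomicAdd.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void spmttkrp_coo(int nnz, int rank,
                             const int *mi, const int *mj, const int *mk,
                             const float *vals,
                             const float *B,   /* J x R, row-major */
                             const float *C,   /* K x R, row-major */
                             float *A)         /* I x R, row-major, output */
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    if (x >= nnz) return;
    float v = vals[x];
    const float *brow = B + mj[x] * rank;
    const float *crow = C + mk[x] * rank;
    float *arow = A + mi[x] * rank;
    // A(i,:) += val(i,j,k) * (B(j,:) .* C(k,:)); different nonzeros may
    // target the same output row i, hence the atomics.
    for (int r = 0; r < rank; ++r)
        atomicAdd(&arow[r], v * brow[r] * crow[r]);
}

int main()
{
    // Illustrative 2x2x2 tensor with 3 nonzeros and rank-2 factors.
    const int I = 2, J = 2, K = 2, R = 2, NNZ = 3;
    int   hi[NNZ] = {0, 0, 1}, hj[NNZ] = {0, 1, 1}, hk[NNZ] = {0, 1, 0};
    float hv[NNZ] = {1.0f, 2.0f, 3.0f};
    float hB[J * R] = {1, 2, 3, 4};
    float hC[K * R] = {5, 6, 7, 8};
    float hA[I * R] = {0};

    int *di, *dj, *dk; float *dv, *dB, *dC, *dA;
    cudaMalloc(&di, NNZ * sizeof(int));   cudaMalloc(&dj, NNZ * sizeof(int));
    cudaMalloc(&dk, NNZ * sizeof(int));   cudaMalloc(&dv, NNZ * sizeof(float));
    cudaMalloc(&dB, J * R * sizeof(float));
    cudaMalloc(&dC, K * R * sizeof(float));
    cudaMalloc(&dA, I * R * sizeof(float));
    cudaMemcpy(di, hi, NNZ * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dj, hj, NNZ * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dk, hk, NNZ * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dv, hv, NNZ * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, J * R * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dC, hC, K * R * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(dA, 0, I * R * sizeof(float));

    spmttkrp_coo<<<(NNZ + 255) / 256, 256>>>(NNZ, R, di, dj, dk, dv, dB, dC, dA);
    cudaMemcpy(hA, dA, I * R * sizeof(float), cudaMemcpyDeviceToHost);

    for (int i = 0; i < I; ++i)
        printf("A[%d] = (%.1f, %.1f)\n", i, hA[i * R], hA[i * R + 1]);
    return 0;
}

Mapping one nonzero per thread keeps the per-thread work uniform regardless of the sparsity pattern; the remaining cost is the synchronization on shared output rows, which is the kind of overhead the unified F-COO approach targets with its GPU-specific optimizations.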
