CSTF: Large-scale sparse tensor factorizations on distributed platforms

Zachary Blanco, Bangtian Liu

Research output: Chapter in Book/Report/Conference proceedingConference contribution

4 Scopus citations

Abstract

Tensors, or N-dimensional arrays, are increasingly used to represent multi-dimensional data. Sparse tensor decomposition algorithms are of particular interest in analyzing and compressing big datasets due to the fact that most of real-world data is sparse and multidimensional. However, state-of-the-art tensor decomposition algorithms are not scalable for overwhelmingly large and higher-order sparse tensors on distributed platforms. In this paper, we use the MapReduce model and the Spark engine to implement tensor factorizations on distributed platforms. The proposed CSTF algorithm, Cloud-based Sparse Tensor Factorization, is a scalable distributed algorithm for tensor decompositions for large data. It uses the coordinate storage format (COO) to operate on the tensor nonzeros directly, thus, eliminating the need for tensor unfolding and the storage of intermediate data. Also, a novel queuing strategy (QCOO) is proposed to exploit the dependency and data reuse between a sequence of tensor operations in tensor decomposition algorithms. Details on the key-value storage paradigm and Spark features used to implement the algorithm and the data reuse strategies are also provided. The queuing strategy reduces data communication costs by 35% for 3rd-order tensors and by 31% for 4th-order tensors over the COO-based implementation respectively. Compared to the state-of-the-art work, BIGtensor, CSTF achieves 2.2× to 6.9× speedup for 3rd-order tensor decompositions.

Original languageEnglish (US)
Title of host publicationProceedings of the 47th International Conference on Parallel Processing, ICPP 2018
PublisherAssociation for Computing Machinery
ISBN (Print)9781450365109
DOIs
StatePublished - Aug 13 2018
Event47th International Conference on Parallel Processing, ICPP 2018 - Eugene, United States
Duration: Aug 14 2018Aug 16 2018

Publication series

NameACM International Conference Proceeding Series

Other

Other47th International Conference on Parallel Processing, ICPP 2018
CountryUnited States
CityEugene
Period8/14/188/16/18

All Science Journal Classification (ASJC) codes

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications

Fingerprint Dive into the research topics of 'CSTF: Large-scale sparse tensor factorizations on distributed platforms'. Together they form a unique fingerprint.

  • Cite this

    Blanco, Z., & Liu, B. (2018). CSTF: Large-scale sparse tensor factorizations on distributed platforms. In Proceedings of the 47th International Conference on Parallel Processing, ICPP 2018 [a21] (ACM International Conference Proceeding Series). Association for Computing Machinery. https://doi.org/10.1145/3225058.3225133