IR · Nov 14, 2020

RecTen: A Recursive Hierarchical Low Rank Tensor Factorization Method to Discover Hierarchical Patterns in Multi-modal Data

arXiv:2011.07363v1
AI Analysis

This work addresses the need for hierarchical analysis of multi-modal data in domains such as online security. The contribution is incremental, since it builds on existing tensor decomposition methods.

The authors tackle the problem of discovering hierarchical patterns in multi-modal data by proposing RecTen, a recursive hierarchical tensor factorization method. RecTen recursively decomposes the clusters found at each step and identifies when to terminate the process. Applied to online forums, it revealed clusters of users whose activity enabled early detection of events such as ransomware outbreaks.

How can we expand tensor decomposition to reveal the hierarchical structure of multi-modal data in a self-adaptive way? Current tensor decomposition provides only a single layer of clusters. We argue that, with the abundance of multi-modal data and time-evolving networks nowadays, the ability to identify emerging hierarchies is important. To this effect, we propose RecTen, a recursive hierarchical soft clustering approach based on tensor decomposition. Our approach enables us to: (a) recursively decompose clusters identified in the previous step, and (b) identify the right conditions for terminating this process. In the absence of proper ground truth, we evaluate our approach with synthetic data and test its sensitivity to different parameters. We also apply RecTen to five real datasets that capture the activities of users in online discussion platforms, such as security forums. This analysis helps us reveal clusters of users with interesting behaviors, including but not limited to early detection of real events such as ransomware outbreaks, the emergence of a black market for decryption tools, and romance scamming. To maximize the usefulness of our approach, we develop a tool that can help data analysts and the wider research community identify hierarchical structures. RecTen is an unsupervised approach that can be used to take the pulse of large multi-modal data and let the data reveal its own hidden structures.
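
The two steps named in the abstract, recursive decomposition of clusters and a termination test, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. Everything here is an assumption made for illustration: the function names `cp_als` and `rec_cluster`, the plain alternating-least-squares CP decomposition, the rule that assigns a member to a component when its factor magnitude reaches a fraction `thresh` of the component's maximum, and the stopping rule based on `max_depth` and `min_size`. The abstract does not spell out RecTen's actual termination conditions, and this sketch splits along the first mode only.

```python
import numpy as np

def cp_als(X, rank, n_iter=50, seed=0):
    """Plain CP decomposition of a 3-mode tensor via alternating least squares."""
    rng = np.random.default_rng(seed)
    factors = [rng.random((dim, rank)) for dim in X.shape]
    for _ in range(n_iter):
        for mode in range(3):
            others = [factors[m] for m in range(3) if m != mode]
            # Khatri-Rao product of the two other factor matrices.
            kr = np.einsum('ir,jr->ijr', others[0], others[1]).reshape(-1, rank)
            gram = (others[0].T @ others[0]) * (others[1].T @ others[1])
            # Unfold X along `mode`; column order matches the rows of `kr`.
            unfolded = np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)
            factors[mode] = unfolded @ kr @ np.linalg.pinv(gram)
    return factors

def rec_cluster(X, idx, rank=2, depth=0, max_depth=2, min_size=4, thresh=0.5):
    """Recursively split the mode-0 entities in `idx` into soft clusters."""
    node = {"members": idx.tolist(), "children": []}
    # Assumed termination rule (hypothetical): depth limit or too few members.
    if depth >= max_depth or len(idx) < min_size:
        return node
    A = np.abs(cp_als(X[idx], rank)[0])  # mode-0 factor magnitudes
    for r in range(rank):
        col = A[:, r]
        if col.max() == 0:
            continue
        # Soft membership: a member may pass the threshold in several components.
        keep = idx[col >= thresh * col.max()]
        # Recurse only if the component actually narrows the parent cluster.
        if 0 < len(keep) < len(idx):
            node["children"].append(
                rec_cluster(X, keep, rank, depth + 1, max_depth, min_size, thresh)
            )
    return node
```

As a usage sketch, a toy (user, thread, week) tensor with two disjoint activity blocks can be built with `X = np.zeros((8, 6, 5)); X[:4, :3, :2] = 1.0; X[4:, 3:, 2:] = 1.0` and passed to `rec_cluster(X, np.arange(8))`. The result is a nested dictionary in which each child's members are a strict subset of its parent's. Because membership is thresholded per component, the same user can appear in more than one child, which is what makes the clustering soft.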
