IRMay 3, 2021
Improving Community Detection Performance in Heterogeneous Music Network by Learning Edge-type Usefulness DistributionZheng Gao, Chun Guo, Shutian Ma et al.
With music becoming an essential part of daily life, there is an urgent need to develop recommendation systems to assist people targeting better songs with fewer efforts. As the interactions between users and songs naturally construct a complex network, community detection approaches can be applied to reveal users' potential interests on songs by grouping relevant users & songs to the same community. However, as the types of interaction could be heterogeneous, it challenges conventional community detection methods designed originally for homogeneous networks. Although there are existing works on heterogeneous community detection, they are mostly task-driven approaches and not feasible for specific music recommendation. In this paper, we propose a genetic based approach to learn an edge-type usefulness distribution (ETUD) for all edge-types in heterogeneous music networks. ETUD can be regarded as a linear function to project all edges to the same latent space and make them comparable. Therefore a heterogeneous network can be converted to a homogeneous one where those conventional methods are eligible to use. We validate the proposed model on a heterogeneous music network constructed from an online music streaming service. Results show that for conventional methods, ETUD can help to detect communities significantly improving music recommendation accuracy while simultaneously reducing user searching cost.
IRSep 6, 2020
Efficient Personalized Community Detection via Genetic EvolutionZheng Gao, Chun Guo, Xiaozhong Liu
Personalized community detection aims to generate communities associated with user need on graphs, which benefits many downstream tasks such as node recommendation and link prediction for users, etc. It is of great importance but lack of enough attention in previous studies which are on topics of user-independent, semi-supervised, or top-K user-centric community detection. Meanwhile, most of their models are time consuming due to the complex graph structure. Different from these topics, personalized community detection requires to provide higher-resolution partition on nodes that are more relevant to user need while coarser manner partition on the remaining less relevant nodes. In this paper, to solve this task in an efficient way, we propose a genetic model including an offline and an online step. In the offline step, the user-independent community structure is encoded as a binary tree. And subsequently an online genetic pruning step is applied to partition the tree into communities. To accelerate the speed, we also deploy a distributed version of our model to run under parallel environment. Extensive experiments on multiple datasets show that our model outperforms the state-of-arts with significantly reduced running time.
IRJul 19, 2020
Counterfactual Learning to Rank using Heterogeneous Treatment Effect EstimationMucun Tian, Chun Guo, Vito Ostuni et al.
Learning-to-Rank (LTR) models trained from implicit feedback (e.g. clicks) suffer from inherent biases. A well-known one is the position bias -- documents in top positions are more likely to receive clicks due in part to their position advantages. To unbiasedly learn to rank, existing counterfactual frameworks first estimate the propensity (probability) of missing clicks with intervention data from a small portion of search traffic, and then use inverse propensity score (IPS) to debias LTR algorithms on the whole data set. These approaches often assume the propensity only depends on the position of the document, which may cause high estimation variance in applications where the search context (e.g. query, user) varies frequently. While context-dependent propensity models reduce variance, accurate estimations may require randomization or intervention on a large amount of traffic, which may not be realistic in real-world systems, especially for long tail queries. In this work, we employ heterogeneous treatment effect estimation techniques to estimate position bias when intervention click data is limited. We then use such estimations to debias the observed click distribution and re-draw a new de-biased data set, which can be used for any LTR algorithms. We conduct simulations with varying experiment conditions and show the effectiveness of the proposed method in regimes with long tail queries and sparse clicks.
CROct 17, 2018
Understanding the Related-Key Security of Feistel Ciphers from a Provable PerspectiveChun Guo
We initiate the provable related-key security treatment for models of practical Feistel ciphers. In detail, we consider Feistel networks with four whitening keys $w_i(k)$ ($i=0,1,2,3$) and round-functions of the form $f(γ_i(k)\oplus X)$, where $k$ is the main-key, $w_i$ and $γ_i$ are efficient transformations, and $f$ is a public ideal function or permutation that the adversary is allowed to query. We investigate conditions on the key-schedules that are sufficient for security against XOR-induced related-key attacks up to $2^{n/2}$ adversarial queries. When the key-schedules are non-linear, we prove security for 4 rounds. When only affine key-schedules are used, we prove security for 6 rounds. These also imply secure tweakable Feistel ciphers in the Random Oracle model. By shuffling the key-schedules, our model unifies both the DES-like structure (known as Feistel-2 scheme in the cryptanalytic community, a.k.a. key-alternating Feistel due to Lampe and Seurin, FSE 2014) and the Lucifer-like model (previously analyzed by Guo and Lin, TCC 2015). This allows us to derive concrete implications on these two (more common) models, and helps understanding their differences---and further understanding the related-key security of Feistel ciphers.