SIJan 28, 2023
Unveiling Environmental Sensitivity of Individual Gains in Influence MaximizationXinyan Su, Zhiheng Zhang, Jiyan Qiu
Influence Maximization (IM) is to identify the seed set to maximize information dissemination in a network. Elegant IM algorithms could naturally extend to cases where each node is equipped with a specific weight, reflecting individual gains to measure the node's importance. Prevailing literature typically assumes such individual gains remain constant throughout the cascade process and are solvable through explicit formulas based on the node's characteristics and network topology. However, this assumption is not always feasible for two reasons: 1)Unobservability: The individual gains of each node are primarily evaluated by the difference between the outputs in the activated and non-activated states. In practice, we can only observe one of these states, with the other remaining unobservable post-propagation. 2)Environmental sensitivity: In addition to the node's inherent properties, individual gains are also sensitive to the activation status of surrounding nodes, which is dynamic during iteration even when the network topology remains static. To address these challenges, we extend the consideration of IM to a broader scenario with dynamic node individual gains, leveraging causality techniques. In our paper, we introduce a Causal Influence Maximization (CauIM) framework and develop two algorithms, G-CauIM and A-CauIM, where the latter incorporates a novel acceleration technique. Theoretically, we establish the generalized lower bound of influence spread and provide robustness analysis. Empirically, in synthetic and real-world experiments, we demonstrate the effectiveness and robustness of our algorithms.
MEFeb 23
Orthogonal Uplift Learning with Permutation-Invariant Representations for Combinatorial TreatmentsXinyan Su, Jiacan Gao, Mingyuan Ma et al.
We study uplift estimation for combinatorial treatments. Uplift measures the pure incremental causal effect of an intervention (e.g., sending a coupon or a marketing message) on user behavior, modeled as a conditional individual treatment effect. Many real-world interventions are combinatorial: a treatment is a policy that specifies context-dependent action distributions rather than a single atomic label. Although recent work considers structured treatments, most methods rely on categorical or opaque encodings, limiting robustness and generalization to rare or newly deployed policies. We propose an uplift estimation framework that aligns treatment representation with causal semantics. Each policy is represented by the mixture it induces over contextaction components and embedded via a permutation-invariant aggregation. This representation is integrated into an orthogonalized low-rank uplift model, extending Robinson-style decompositions to learned, vector-valued treatments. We show that the resulting estimator is expressive for policy-induced causal effects, orthogonally robust to nuisance estimation errors, and stable under small policy perturbations. Experiments on large-scale randomized platform data demonstrate improved uplift accuracy and stability in long-tailed policy regimes
LGOct 30, 2023
Adversarial Batch Inverse Reinforcement Learning: Learn to Reward from Imperfect Demonstration for Interactive RecommendationJialin Liu, Xinyan Su, Zeyu He et al.
Rewards serve as a measure of user satisfaction and act as a limiting factor in interactive recommender systems. In this research, we focus on the problem of learning to reward (LTR), which is fundamental to reinforcement learning. Previous approaches either introduce additional procedures for learning to reward, thereby increasing the complexity of optimization, or assume that user-agent interactions provide perfect demonstrations, which is not feasible in practice. Ideally, we aim to employ a unified approach that optimizes both the reward and policy using compositional demonstrations. However, this requirement presents a challenge since rewards inherently quantify user feedback on-policy, while recommender agents approximate off-policy future cumulative valuation. To tackle this challenge, we propose a novel batch inverse reinforcement learning paradigm that achieves the desired properties. Our method utilizes discounted stationary distribution correction to combine LTR and recommender agent evaluation. To fulfill the compositional requirement, we incorporate the concept of pessimism through conservation. Specifically, we modify the vanilla correction using Bellman transformation and enforce KL regularization to constrain consecutive policy updates. We use two real-world datasets which represent two compositional coverage to conduct empirical studies, the results also show that the proposed method relatively improves both effectiveness (2.3\%) and efficiency (11.53\%)
LGOct 30, 2023
A General Neural Causal Model for Interactive RecommendationJialin Liu, Xinyan Su, Peng Zhou et al.
Survivor bias in observational data leads the optimization of recommender systems towards local optima. Currently most solutions re-mines existing human-system collaboration patterns to maximize longer-term satisfaction by reinforcement learning. However, from the causal perspective, mitigating survivor effects requires answering a counterfactual problem, which is generally unidentifiable and inestimable. In this work, we propose a neural causal model to achieve counterfactual inference. Specifically, we first build a learnable structural causal model based on its available graphical representations which qualitatively characterizes the preference transitions. Mitigation of the survivor bias is achieved though counterfactual consistency. To identify the consistency, we use the Gumbel-max function as structural constrains. To estimate the consistency, we apply reinforcement optimizations, and use Gumbel-Softmax as a trade-off to get a differentiable function. Both theoretical and empirical studies demonstrate the effectiveness of our solution.