SISep 6, 2021
Online Learning of Independent Cascade Models with Node-level FeedbackShuoguang Yang, Van-Anh Truong
We propose a detailed analysis of the online-learning problem for Independent Cascade (IC) models under node-level feedback. These models have widespread applications in modern social networks. Existing works for IC models have only shed light on edge-level feedback models, where the agent knows the explicit outcome of every observed edge. Little is known about node-level feedback models, where only combined outcomes for sets of edges are observed; in other words, the realization of each edge is censored. This censored information, together with the nonlinear form of the aggregated influence probability, make both parameter estimation and algorithm design challenging. We establish the first confidence-region result under this setting. We also develop an online algorithm achieving a cumulative regret of $\mathcal{O}( \sqrt{T})$, matching the theoretical regret bound for IC models with edge-level feedback.
LGApr 24, 2020
Online Learning with Cumulative Oversampling: Application to Budgeted Influence MaximizationShatian Wang, Shuoguang Yang, Zhen Xu et al.
We propose a cumulative oversampling (CO) method for online learning. Our key idea is to sample parameter estimations from the updated belief space once in each round (similar to Thompson Sampling), and utilize the cumulative samples up to the current round to construct optimistic parameter estimations that asymptotically concentrate around the true parameters as tighter upper confidence bounds compared to the ones constructed with standard UCB methods. We apply CO to a novel budgeted variant of the Influence Maximization (IM) semi-bandits with linear generalization of edge weights, whose offline problem is NP-hard. Combining CO with the oracle we design for the offline problem, our online learning algorithm simultaneously tackles budget allocation, parameter learning, and reward maximization. We show that for IM semi-bandits, our CO-based algorithm achieves a scaled regret comparable to that of the UCB-based algorithms in theory, and performs on par with Thompson Sampling in numerical experiments.
SINov 8, 2019
Online Learning and Optimization Under a New Linear-Threshold Model with Negative InfluenceShuoguang Yang, Shatian Wang, Van-Anh Truong
Problem definition: Corporate brands, grassroots activists, and ordinary citizens all routinely employ Word-of-mouth (WoM) diffusion to promote products and instigate social change. Our work models the formation and spread of negative attitudes via WoM on a social network represented by a directed graph. In an online learning setting, we examine how an agent could simultaneously learn diffusion parameters and choose sets of seed users to initiate diffusions and maximize positive influence. In contrast to edge-level feedback, in which an agent observes the relationship (edge) through which a user (node) is influenced, we more realistically assume node-level feedback, where an agent only observes when a user is influenced and whether that influence is positive or negative. Methodology/results: We propose a new class of negativity-aware Linear Threshold Models. We show that in these models, the expected positive influence spread is a monotone submodular function of the seed set. Therefore, when maximizing positive influence by selecting a seed set of fixed size, a greedy algorithm can guarantee a solution with a constant approximation ratio. For the online learning setting, we propose an algorithm that runs in epochs of growing lengths, each consisting of a fixed number of exploration rounds followed by an increasing number of exploitation rounds controlled by a hyperparameter. Under mild assumptions, we show that our algorithm achieves asymptotic expected average scaled regret that is inversely related to any fractional constant power of the number of rounds. Managerial implications: During seed selection, our negativity-aware models and algorithms allow WoM campaigns to discover and best account for characteristics of local users and propagated content. We also give the first algorithms with regret guarantees for influence maximization under node-level feedback.
LGNov 8, 2019
Beyond Adaptive Submodularity: Adaptive Influence Maximization with Intermediary ConstraintsShatian Wang, Zhen Xu, Van-Anh Truong
We consider a brand with a given budget that wants to promote a product over multiple rounds of influencer marketing. In each round, it commissions an influencer to promote the product over a social network, and then observes the subsequent diffusion of the product before adaptively choosing the next influencer to commission. This process terminates when the budget is exhausted. We assume that the diffusion process follows the popular Independent Cascade model. We also consider an online learning setting, where the brand initially does not know the diffusion parameters associated with the model, and has to gradually learn the parameters over time. Unlike in existing models, the rounds in our model are correlated through an intermediary constraint: each user can be commissioned for an unlimited number of times. However, each user will spread influence without commission at most once. Due to this added constraint, the order in which the influencers are chosen can change the influence spread, making obsolete existing analysis techniques that based on the notion of adaptive submodularity. We devise a sample path analysis to prove that a greedy policy that knows the diffusion parameters achieves at least $1-1/e - ε$ times the expected reward of the optimal policy. In the online-learning setting, we are the first to consider a truly adaptive decision making framework, rather than assuming independent epochs, and adaptivity only within epochs. Under mild assumptions, we derive a regret bound for our algorithm. In our numerical experiments, we simulate information diffusions on four Twitter sub-networks, and compare our UCB-based learning algorithms with several baseline adaptive seeding strategies. Our learning algorithm consistently outperforms the baselines and achieves rewards close to the greedy policy that knows the true diffusion parameters.