Shuai Ma

CL
h-index65
56papers
4,095citations
Novelty51%
AI Score59

56 Papers

CLSep 12, 2023Code
Re-Reading Improves Reasoning in Large Language Models

Xiaohan Xu, Chongyang Tao, Tao Shen et al. · microsoft-research

To enhance the reasoning capabilities of off-the-shelf Large Language Models (LLMs), we introduce a simple, yet general and effective prompting method, Re2, i.e., \textbf{Re}-\textbf{Re}ading the question as input. Unlike most thought-eliciting prompting methods, such as Chain-of-Thought (CoT), which aim to elicit the reasoning process in the output, Re2 shifts the focus to the input by processing questions twice, thereby enhancing the understanding process. Consequently, Re2 demonstrates strong generality and compatibility with most thought-eliciting prompting methods, including CoT. Crucially, Re2 facilitates a "bidirectional" encoding in unidirectional decoder-only LLMs because the first pass could provide global information for the second pass. We begin with a preliminary empirical study as the foundation of Re2, illustrating its potential to enable "bidirectional" attention mechanisms. We then evaluate Re2 on extensive reasoning benchmarks across 14 datasets, spanning 112 experiments, to validate its effectiveness and generality. Our findings indicate that, with the exception of a few scenarios on vanilla ChatGPT, Re2 consistently enhances the reasoning performance of LLMs through a simple re-reading strategy. Further analyses reveal Re2's adaptability, showing how it can be effectively integrated with different LLMs, thought-eliciting prompting, and ensemble strategies. Our code is available at \url{https://github.com/Tebmer/Rereading-LLM-Reasoning/}

ITSep 21, 2022
Robust Information Bottleneck for Task-Oriented Communication with Digital Modulation

Songjie Xie, Shuai Ma, Ming Ding et al.

Task-oriented communications, mostly using learning-based joint source-channel coding (JSCC), aim to design a communication-efficient edge inference system by transmitting task-relevant information to the receiver. However, only transmitting task-relevant information without introducing any redundancy may cause robustness issues in learning due to the channel variations, and the JSCC which directly maps the source data into continuous channel input symbols poses compatibility issues on existing digital communication systems. In this paper, we address these two issues by first investigating the inherent tradeoff between the informativeness of the encoded representations and the robustness to information distortion in the received representations, and then propose a task-oriented communication scheme with digital modulation, named discrete task-oriented JSCC (DT-JSCC), where the transmitter encodes the features into a discrete representation and transmits it to the receiver with the digital modulation scheme. In the DT-JSCC scheme, we develop a robust encoding framework, named robust information bottleneck (RIB), to improve the communication robustness to the channel variations, and derive a tractable variational upper bound of the RIB objective function using the variational approximation to overcome the computational intractability of mutual information. The experimental results demonstrate that the proposed DT-JSCC achieves better inference performance than the baseline methods with low communication latency, and exhibits robustness to channel variations due to the applied RIB framework.

HCJan 14, 2023
Who Should I Trust: AI or Myself? Leveraging Human and AI Correctness Likelihood to Promote Appropriate Trust in AI-Assisted Decision-Making

Shuai Ma, Ying Lei, Xinru Wang et al.

In AI-assisted decision-making, it is critical for human decision-makers to know when to trust AI and when to trust themselves. However, prior studies calibrated human trust only based on AI confidence indicating AI's correctness likelihood (CL) but ignored humans' CL, hindering optimal team decision-making. To mitigate this gap, we proposed to promote humans' appropriate trust based on the CL of both sides at a task-instance level. We first modeled humans' CL by approximating their decision-making models and computing their potential performance in similar instances. We demonstrated the feasibility and effectiveness of our model via two preliminary studies. Then, we proposed three CL exploitation strategies to calibrate users' trust explicitly/implicitly in the AI-assisted decision-making process. Results from a between-subjects experiment (N=293) showed that our CL exploitation strategies promoted more appropriate human trust in AI, compared with only using AI confidence. We further provided practical implications for more human-compatible AI-assisted decision-making.

LGJun 15, 2022
Mean-Semivariance Policy Optimization via Risk-Averse Reinforcement Learning

Xiaoteng Ma, Shuai Ma, Li Xia et al. · tsinghua

Keeping risk under control is often more crucial than maximizing expected rewards in real-world decision-making situations, such as finance, robotics, autonomous driving, etc. The most natural choice of risk measures is variance, which penalizes the upside volatility as much as the downside part. Instead, the (downside) semivariance, which captures the negative deviation of a random variable under its mean, is more suitable for risk-averse proposes. This paper aims at optimizing the mean-semivariance (MSV) criterion in reinforcement learning w.r.t. steady reward distribution. Since semivariance is time-inconsistent and does not satisfy the standard Bellman equation, the traditional dynamic programming methods are inapplicable to MSV problems directly. To tackle this challenge, we resort to Perturbation Analysis (PA) theory and establish the performance difference formula for MSV. We reveal that the MSV problem can be solved by iteratively solving a sequence of RL problems with a policy-dependent reward function. Further, we propose two on-policy algorithms based on the policy gradient theory and the trust region method. Finally, we conduct diverse experiments from simple bandit problems to continuous control tasks in MuJoCo, which demonstrate the effectiveness of our proposed methods.

CVOct 10, 2022
HORIZON: High-Resolution Semantically Controlled Panorama Synthesis

Kun Yan, Lei Ji, Chenfei Wu et al.

Panorama synthesis endeavors to craft captivating 360-degree visual landscapes, immersing users in the heart of virtual worlds. Nevertheless, contemporary panoramic synthesis techniques grapple with the challenge of semantically guiding the content generation process. Although recent breakthroughs in visual synthesis have unlocked the potential for semantic control in 2D flat images, a direct application of these methods to panorama synthesis yields distorted content. In this study, we unveil an innovative framework for generating high-resolution panoramas, adeptly addressing the issues of spherical distortion and edge discontinuity through sophisticated spherical modeling. Our pioneering approach empowers users with semantic control, harnessing both image and text inputs, while concurrently streamlining the generation of high-resolution panoramas using parallel decoding. We rigorously evaluate our methodology on a diverse array of indoor and outdoor datasets, establishing its superiority over recent related work, in terms of both quantitative and qualitative performance metrics. Our research elevates the controllability, efficiency, and fidelity of panorama synthesis to new levels.

HCFeb 17, 2023
Competent but Rigid: Identifying the Gap in Empowering AI to Participate Equally in Group Decision-Making

Chengbo Zheng, Yuheng Wu, Chuhan Shi et al.

Existing research on human-AI collaborative decision-making focuses mainly on the interaction between AI and individual decision-makers. There is a limited understanding of how AI may perform in group decision-making. This paper presents a wizard-of-oz study in which two participants and an AI form a committee to rank three English essays. One novelty of our study is that we adopt a speculative design by endowing AI equal power to humans in group decision-making.We enable the AI to discuss and vote equally with other human members. We find that although the voice of AI is considered valuable, AI still plays a secondary role in the group because it cannot fully follow the dynamics of the discussion and make progressive contributions. Moreover, the divergent opinions of our participants regarding an "equal AI" shed light on the possible future of human-AI relations.

CLAug 18, 2022Code
Ered: Enhanced Text Representations with Entities and Descriptions

Qinghua Zhao, Shuai Ma, Yuxuan Lei

External knowledge,e.g., entities and entity descriptions, can help humans understand texts. Many works have been explored to include external knowledge in the pre-trained models. These methods, generally, design pre-training tasks and implicitly introduce knowledge by updating model weights, alternatively, use it straightforwardly together with the original text. Though effective, there are some limitations. On the one hand, it is implicit and only model weights are paid attention to, the pre-trained entity embeddings are ignored. On the other hand, entity descriptions may be lengthy, and inputting into the model together with the original text may distract the model's attention. This paper aims to explicitly include both entities and entity descriptions in the fine-tuning stage. First, the pre-trained entity embeddings are fused with the original text representation and updated by the backbone model layer by layer. Second, descriptions are represented by the knowledge module outside the backbone model, and each knowledge layer is selectively connected to one backbone layer for fusing. Third, two knowledge-related auxiliary tasks, i.e., entity/description enhancement and entity enhancement/pollution task, are designed to smooth the semantic gaps among evolved representations. We conducted experiments on four knowledge-oriented tasks and two common tasks, and the results achieved new state-of-the-art on several datasets. Besides, we conduct an ablation study to show that each module in our method is necessary. The code is available at https://github.com/lshowway/Ered.

LGMar 3
Heterogeneous Agent Collaborative Reinforcement Learning

Zhixia Zhang, Zixuan Huang, Xin Xia et al.

We introduce Heterogeneous Agent Collaborative Reinforcement Learning (HACRL), a new learning paradigm that addresses the inefficiencies of isolated on-policy optimization. HACRL enables collaborative optimization with independent execution: heterogeneous agents share verified rollouts during training to mutually improve, while operating independently at inference time. Unlike LLM-based multi-agent reinforcement learning (MARL), HACRL does not require coordinated deployment, and unlike on-/off-policy distillation, it enables bidirectional mutual learning among heterogeneous agents rather than one-directional teacher-to-student transfer. Building on this paradigm, we propose HACPO, a collaborative RL algorithm that enables principled rollout sharing to maximize sample utilization and cross-agent knowledge transfer. To mitigate capability discrepancies and policy distribution shifts, HACPO introduces four tailored mechanisms with theoretical guarantees on unbiased advantage estimation and optimization correctness. Extensive experiments across diverse heterogeneous model combinations and reasoning benchmarks show that HACPO consistently improves all participating agents, outperforming GSPO by an average of 3.3\% while using only half the rollout cost.

IVApr 20
Optimally Bridging Semantics and Data: Generative Semantic Communication via Schrödinger Bridge

Dahua Gao, Ruichao Liu, Minxi Yang et al.

Generative Semantic Communication (GSC) is a promising solution for image transmission over narrow-band and high-noise channels. However, existing GSC methods rely on long, indirect transport trajectories from a Gaussian to an image distribution guided by semantics, causing severe hallucination and high computational cost. To address this, we propose a general framework named Schrödinger Bridge-based GSC (SBGSC). By leveraging the Schrödinger Bridge (SB) to construct optimal transport trajectories between arbitrary distributions, SBGSC breaks Gaussian limitations and enables direct generative decoding from semantics to images. Within this framework, we design Diffusion SB-based GSC (DSBGSC). DSBGSC reconstructs the nonlinear drift term of diffusion models using Schrödinger potentials, achieving direct optimal distribution transport to reduce hallucinations and computational overhead. To further accelerate generation, we propose a self-consistency-based objective guiding the model to learn a nonlinear velocity field pointing directly toward the image, bypassing Markovian noise prediction to significantly reduce sampling steps. Simulation results demonstrate that DSBGSC outperforms state-of-the-art GSC methods, improving FID by at least 38% and SSIM by 49.3%, while accelerating inference speed by over 8 times.

LGJan 13
Your Group-Relative Advantage Is Biased

Fengkai Yang, Zherui Chen, Xiaohan Wang et al.

Reinforcement Learning from Verifier Rewards (RLVR) has emerged as a widely used approach for post-training large language models on reasoning tasks, with group-based methods such as GRPO and its variants gaining broad adoption. These methods rely on group-relative advantage estimation to avoid learned critics, yet its theoretical properties remain poorly understood. In this work, we uncover a fundamental issue of group-based RL: the group-relative advantage estimator is inherently biased relative to the true (expected) advantage. We provide the first theoretical analysis showing that it systematically underestimates advantages for hard prompts and overestimates them for easy prompts, leading to imbalanced exploration and exploitation. To address this issue, we propose History-Aware Adaptive Difficulty Weighting (HA-DW), an adaptive reweighting scheme that adjusts advantage estimates based on an evolving difficulty anchor and training dynamics. Both theoretical analysis and experiments on five mathematical reasoning benchmarks demonstrate that HA-DW consistently improves performance when integrated into GRPO and its variants. Our results suggest that correcting biased advantage estimation is critical for robust and efficient RLVR training.

DBMay 22
BCTuner: LLM-Guided Monte Carlo Tree Search for Efficient Blockchain Knob Tuning

Yaoyi Deng, Chongyang Tao, Mingxuan Li et al.

Knob tuning plays a critical role in improving the performance of permissioned blockchains. However, efficient tuning remains challenging due to the architectural complexity of blockchains and the semantic gap between knob-specific logic and the numerical optimization requirements of tuning tools. In addition, configuration changes are often coupled across different stages of the transaction pipeline, making their performance impact difficult to isolate and predict. Since each trial requires deployment and distributed benchmarking, ineffective exploration incurs substantial cost. These challenges motivate BCTuner, a Large Language Model (LLM)-guided framework that combines knowledge-guided reasoning with structured search. BCTuner organizes multi-source tuning knowledge to support LLM-based reasoning over knob semantics, constraints, and deployment context. It formulates tuning as a Monte Carlo Tree Search (MCTS) process over structured action trajectories, where configurations are incrementally constructed, validated, evaluated, and refined rather than generated in one step. BCTuner further applies adaptive pruning to discard infeasible or low-potential branches before system evaluation. We evaluate BCTuner on Hyperledger Fabric and ChainMaker under diverse workloads and network settings. Experimental results show that BCTuner achieves up to 211.38% throughput improvement over default configurations and outperforms the state-of-the-art blockchain tuning method by up to 20% in performance, while requiring up to 8x fewer interactions with the blockchain system.

CVNov 13, 2025Code
A Style is Worth One Code: Unlocking Code-to-Style Image Generation with Discrete Style Space

Huijie Liu, Shuhao Cui, Haoxiang Cao et al.

Innovative visual stylization is a cornerstone of artistic creation, yet generating novel and consistent visual styles remains a significant challenge. Existing generative approaches typically rely on lengthy textual prompts, reference images, or parameter-efficient fine-tuning to guide style-aware image generation, but often struggle with style consistency, limited creativity, and complex style representations. In this paper, we affirm that a style is worth one numerical code by introducing the novel task, code-to-style image generation, which produces images with novel, consistent visual styles conditioned solely on a numerical style code. To date, this field has only been primarily explored by the industry (e.g., Midjourney), with no open-source research from the academic community. To fill this gap, we propose CoTyle, the first open-source method for this task. Specifically, we first train a discrete style codebook from a collection of images to extract style embeddings. These embeddings serve as conditions for a text-to-image diffusion model (T2I-DM) to generate stylistic images. Subsequently, we train an autoregressive style generator on the discrete style embeddings to model their distribution, allowing the synthesis of novel style embeddings. During inference, a numerical style code is mapped to a unique style embedding by the style generator, and this embedding guides the T2I-DM to generate images in the corresponding style. Unlike existing methods, our method offers unparalleled simplicity and diversity, unlocking a vast space of reproducible styles from minimal input. Extensive experiments validate that CoTyle effectively turns a numerical code into a style controller, demonstrating a style is worth one code.

LGSep 9, 2022
Self-supervised Learning for Heterogeneous Graph via Structure Information based on Metapath

Shuai Ma, Jian-wei Liu, Xin Zuo

graph neural networks (GNNs) are the dominant paradigm for modeling and handling graph structure data by learning universal node representation. The traditional way of training GNNs depends on a great many labeled data, which results in high requirements on cost and time. In some special scene, it is even unavailable and impracticable. Self-supervised representation learning, which can generate labels by graph structure data itself, is a potential approach to tackle this problem. And turning to research on self-supervised learning problem for heterogeneous graphs is more challenging than dealing with homogeneous graphs, also there are fewer studies about it. In this paper, we propose a SElfsupervised learning method for heterogeneous graph via Structure Information based on Metapath (SESIM). The proposed model can construct pretext tasks by predicting jump number between nodes in each metapath to improve the representation ability of primary task. In order to predict jump number, SESIM uses data itself to generate labels, avoiding time-consuming manual labeling. Moreover, predicting jump number in each metapath can effectively utilize graph structure information, which is the essential property between nodes. Therefore, SESIM deepens the understanding of models for graph structure. At last, we train primary task and pretext tasks jointly, and use meta-learning to balance the contribution of pretext tasks for primary task. Empirical results validate the performance of SESIM method and demonstrate that this method can improve the representation ability of traditional neural networks on link prediction task and node classification task.

LGNov 4, 2023
MATA*: Combining Learnable Node Matching with A* Algorithm for Approximate Graph Edit Distance Computation

Junfeng Liu, Min Zhou, Shuai Ma et al.

Graph Edit Distance (GED) is a general and domain-agnostic metric to measure graph similarity, widely used in graph search or retrieving tasks. However, the exact GED computation is known to be NP-complete. For instance, the widely used A* algorithms explore the entire search space to find the optimal solution which inevitably suffers scalability issues. Learning-based methods apply graph representation techniques to learn the GED by formulating a regression task, which can not recover the edit path and lead to inaccurate GED approximation (i.e., the predicted GED is smaller than the exact). To this end, in this work, we present a data-driven hybrid approach MATA* for approximate GED computation based on Graph Neural Networks (GNNs) and A* algorithms, which models from the perspective of learning to match nodes instead of directly regressing GED. Specifically, aware of the structure-dominant operations (i.e.,node and edge insertion/deletion) property in GED computation, a structure-enhanced GNN is firstly designed to jointly learn local and high-order structural information for node embeddings for node matchings. Second, top-k candidate nodes are produced via a differentiable top-k operation to enable the training for node matchings, which is adhering to another property of GED, i.e., multiple optimal node matchings. Third, benefiting from the candidate nodes, MATA* only performs on the promising search directions, reaching the solution efficiently. Finally, extensive experiments show the superiority of MATA* as it significantly outperforms the combinatorial search-based, learning-based and hybrid methods and scales well to large-size graphs.

OCFeb 27, 2023
Global Algorithms for Mean-Variance Optimization in Markov Decision Processes

Li Xia, Shuai Ma

Dynamic optimization of mean and variance in Markov decision processes (MDPs) is a long-standing challenge caused by the failure of dynamic programming. In this paper, we propose a new approach to find the globally optimal policy for combined metrics of steady-state mean and variance in an infinite-horizon undiscounted MDP. By introducing the concepts of pseudo mean and pseudo variance, we convert the original problem to a bilevel MDP problem, where the inner one is a standard MDP optimizing pseudo mean-variance and the outer one is a single parameter selection problem optimizing pseudo mean. We use the sensitivity analysis of MDPs to derive the properties of this bilevel problem. By solving inner standard MDPs for pseudo mean-variance optimization, we can identify worse policy spaces dominated by optimal policies of the pseudo problems. We propose an optimization algorithm which can find the globally optimal policy by repeatedly removing worse policy spaces. The convergence and complexity of the algorithm are studied. Another policy dominance property is also proposed to further improve the algorithm efficiency. Numerical experiments demonstrate the performance and efficiency of our algorithms. To the best of our knowledge, our algorithm is the first that efficiently finds the globally optimal policy of mean-variance optimization in MDPs. These results are also valid for solely minimizing the variance metrics in MDPs.

HCMar 4
Echoes of Norms: Investigating Counterspeech Bots' Influence on Bystanders in Online Communities

Mengyao Wang, Shuai Ma, Nuo Li et al.

Counterspeech offers a non-repressive approach to moderate hate speech in online communities. Research has examined how counterspeech chatbots restrain hate speakers and support targets, but their impact on bystanders remains unclear. Therefore, we developed a counterspeech strategy framework and built \textit{Civilbot} for a mixed-method within-subjects study. Bystanders generally viewed Civilbot as credible and normative, though its shallow reasoning limited persuasiveness. Its behavioural effects were subtle: when performing well, it could guide participation or act as a stand-in; when performing poorly, it could discourage bystanders or motivate them to step in. Strategy proved critical: cognitive strategies that appeal to reason, especially when paired with a positive tone, were relatively effective, while mismatch of contexts and strategies could weaken impact. Based on these findings, we offer design insights for mobilizing bystanders and shaping online discourse, highlighting when to intervene and how to do so through reasoning-driven and context-aware strategies.

HCApr 5
Exploring a Gamified Personality Assessment Method through Interaction with LLM Agents Embodying Different Personalities

Baiqiao Zhang, Xiangxian Li, Chao Zhou et al.

The low-intrusion and automated personality assessment is receiving increasing attention in psychology and human-computer interaction fields. This study explores an interactive approach for personality assessment, focusing on the multiplicity of personality representation. We propose a framework of Gamified Personality Assessment through Multi-Personality Representations (Multi-PR GPA). The framework leverages Large Language Models to empower virtual agents with different personalities. These agents elicit multifaceted human personality representations through engaging in interactive games. Drawing upon the multi-type textual data generated throughout the interaction, it achieves personality assessments with interpretable insights. Grounded in the classic Big Five personality theory, we developed a prototype system and conducted a user study to evaluate the efficacy of Multi-PR GPA. The results affirm the effectiveness of our approach in personality assessment and demonstrate its superior performance when considering the multiplicity of personality representation. Error structure analysis further revealed systematic assessment biases in LLMs, which multi-context aggregation partially mitigated.

SPApr 28
SPAT: A Semantic Port-Aware Adaptive-Rate Transmission Protocol for Semantic Communication

Yunhao Wang, Shuai Ma, Bin Shen et al.

With the evolution of 6G, semantic communication has emerged as a promising paradigm by prioritizing the delivery of task-relevant meaning over strict bit-level correctness. However, existing transport mechanisms still rely on explicit port headers and bit-level validation, making them vulnerable to header corruption and the resulting packet loss. To address this issue, this paper proposes a Semantic Port-Aware Adaptive-Rate Transmission Protocol (SPAT) for semantic communication. The proposed framework jointly embeds source and destination port information into semantic representations, thereby reducing dependence on explicit port headers while enabling robust port-aware transmission. Furthermore, a differentiated semantic processing mechanism is developed for uplink and downlink scenarios, where port identification is introduced for uplink service recognition and destination-aware conditional gating is designed for downlink selective decoding. In addition, an adaptive-rate controller is incorporated to dynamically adjust the number of transmitted semantic channels according to channel conditions and feature importance, thereby improving both robustness and transmission efficiency. Experimental results on the AFHQ and ImageNet-10 datasets, together with real-world experimental measurements, demonstrate that SPAT consistently outperforms TCP, UDP, and SITP in reconstruction quality across different SNRs while maintaining low-latency transmission.

LGMay 6
Unsat Core Prediction through Polarity-Aware Representation Learning over Clause-Literal Hypergraphs

Zhenchao Sun, Shuai Ma, Ping Lu et al.

Graph neural networks have been widely used in Boolean satisfiability (SAT) tasks to learn structural information from SAT formulas. The goal of these studies is to solve SAT instances or to enhance SAT solvers, including tasks such as unsat-core prediction. However, most existing approaches model a SAT formula as a bipartite graph or a directed acyclic graph, which are less expressive in capturing higher-order interactions among literals and clauses. Moreover, these approaches are limited in modeling intrinsic polarity-related properties of SAT, such as the complementary relationship between the positive and negative literals of a variable. To address these limitations, we propose a polarity-aware representation learning framework over clause-literal hypergraphs. We model SAT formulas as clause-literal hypergraphs augmented with a clause incidence graph to capture higher-order structural interactions. We then introduce a polarity-aware decomposed mechanism that separates variable representations into polarity invariant and equivariant components, explicitly modeling the relationship between positive and negative literals, with the resulting literal representations propagated along the hypergraph structure. We further incorporate a polarity-inversion consistency regularization to reinforce polarity-consistent representations during training. Experimental results on multiple SAT datasets demonstrate the effectiveness of the proposed approach.

CLFeb 24, 2022Code
KESA: A Knowledge Enhanced Approach For Sentiment Analysis

Qinghua Zhao, Shuai Ma, Shuo Ren

Though some recent works focus on injecting sentiment knowledge into pre-trained language models, they usually design mask and reconstruction tasks in the post-training phase. In this paper, we aim to benefit from sentiment knowledge in a lighter way. To achieve this goal, we study sentence-level sentiment analysis and, correspondingly, propose two sentiment-aware auxiliary tasks named sentiment word cloze and conditional sentiment prediction. The first task learns to select the correct sentiment words within the input, given the overall sentiment polarity as prior knowledge. On the contrary, the second task predicts the overall sentiment polarity given the sentiment polarity of the word as prior knowledge. In addition, two kinds of label combination methods are investigated to unify multiple types of labels in each task. We argue that more information can promote the models to learn more profound semantic representation. We implement it in a straightforward way to verify this hypothesis. The experimental results demonstrate that our approach consistently outperforms pre-trained models and is additive to existing knowledge-enhanced post-trained models. The code and data are released at https://github.com/lshowway/KESA.

HCFeb 4
Adaptive Prompt Elicitation for Text-to-Image Generation

Xinyi Wen, Lena Hegemann, Xiaofu Jin et al.

Aligning text-to-image generation with user intent remains challenging, for users who provide ambiguous inputs and struggle with model idiosyncrasies. We propose Adaptive Prompt Elicitation (APE), a technique that adaptively asks visual queries to help users refine prompts without extensive writing. Our technical contribution is a formulation of interactive intent inference under an information-theoretic framework. APE represents latent intent as interpretable feature requirements using language model priors, adaptively generates visual queries, and compiles elicited requirements into effective prompts. Evaluation on IDEA-Bench and DesignBench shows that APE achieves stronger alignment with improved efficiency. A user study with challenging user-defined tasks demonstrates 19.8% higher alignment without workload overhead. Our work contributes a principled approach to prompting that, for general users, offers an effective and efficient complement to the prevailing prompt-based interaction paradigm with text-to-image models.

CVMar 25
Knowledge-Refined Dual Context-Aware Network for Partially Relevant Video Retrieval

Junkai Yang, Qirui Wang, Yaoqing Jin et al.

Retrieving partially relevant segments from untrimmed videos remains difficult due to two persistent challenges: the mismatch in information density between text and video segments, and limited attention mechanisms that overlook semantic focus and event correlations. We present KDC-Net, a Knowledge-Refined Dual Context-Aware Network that tackles these issues from both textual and visual perspectives. On the text side, a Hierarchical Semantic Aggregation module captures and adaptively fuses multi-scale phrase cues to enrich query semantics. On the video side, a Dynamic Temporal Attention mechanism employs relative positional encoding and adaptive temporal windows to highlight key events with local temporal coherence. Additionally, a dynamic CLIP-based distillation strategy, enhanced with temporal-continuity-aware refinement, ensures segment-aware and objective-aligned knowledge transfer. Experiments on PRVR benchmarks show that KDC-Net consistently outperforms state-of-the-art methods, especially under low moment-to-video ratios.

HCMar 19
Tracing Generative AI in Digital Art: A Longitudinal Study of Chinese Painters' Attitudes, Practices, and Identity Negotiation

Yibo Meng, Ruiqi Chen, Zhuoran Lu et al.

This study presents a five-year longitudinal mixed-methods study of 17 Chinese digital painters, examining how their attitudes and practices evolved in response to generative AI. Our findings reveal a trajectory from resistance and defensiveness, to pragmatic adoption, and ultimately to reflective reconstruction, shaped by strong peer pressures and shifting emotional experiences. Persistent concerns around copyright and creative labor highlight the ongoing negotiation of identity and values. This work contributes by offering rare longitudinal empirical data, advancing a theoretical lens of "identity and value negotiation," and providing design implications for future human-AI collaborative systems.

HCMar 25, 2024
Towards Human-AI Deliberation: Design and Evaluation of LLM-Empowered Deliberative AI for AI-Assisted Decision-Making

Shuai Ma, Qiaoyi Chen, Xinru Wang et al.

In AI-assisted decision-making, humans often passively review AI's suggestion and decide whether to accept or reject it as a whole. In such a paradigm, humans are found to rarely trigger analytical thinking and face difficulties in communicating the nuances of conflicting opinions to the AI when disagreements occur. To tackle this challenge, we propose Human-AI Deliberation, a novel framework to promote human reflection and discussion on conflicting human-AI opinions in decision-making. Based on theories in human deliberation, this framework engages humans and AI in dimension-level opinion elicitation, deliberative discussion, and decision updates. To empower AI with deliberative capabilities, we designed Deliberative AI, which leverages large language models (LLMs) as a bridge between humans and domain-specific models to enable flexible conversational interactions and faithful information provision. An exploratory evaluation on a graduate admissions task shows that Deliberative AI outperforms conventional explainable AI (XAI) assistants in improving humans' appropriate reliance and task performance. Based on a mixed-methods analysis of participant behavior, perception, user experience, and open-ended feedback, we draw implications for future AI-assisted decision tool design.

CLDec 15, 2025
AIR: Post-training Data Selection for Reasoning via Attention Head Influence

Jinrui Liu, Jeff Wu, Xuanguang Pan et al.

LLMs achieve remarkable multi-step reasoning capabilities, yet effectively transferring these skills via post-training distillation remains challenging. Existing data selection methods, ranging from manual curation to heuristics based on length, entropy, or overall loss, fail to capture the causal importance of individual reasoning steps, limiting distillation efficiency. To address this, we propose Attention Influence for Reasoning (AIR), a principled, unsupervised and training-free framework that leverages mechanistic insights of the retrieval head to select high-value post-training data. AIR first identifies reasoning-critical attention heads of an off-the-shelf model, then constructs a weakened reference model with disabled head influence, and finally quantifies the resulting loss divergence as the Attention Influence Score. This score enables fine-grained assessment at both the step and sample levels, supporting step-level weighted fine-tuning and global sample selection. Experiments across multiple reasoning benchmarks show that AIR consistently improves reasoning accuracy, surpassing heuristic baselines and effectively isolating the most critical steps and samples. Our work establishes a mechanism-driven, data-efficient approach for reasoning distillation in LLMs.

LGNov 26, 2025
Dynamic Stratified Contrastive Learning with Upstream Augmentation for MILP Branching

Tongkai Lu, Shuai Ma, Chongyang Tao

Mixed Integer Linear Programming (MILP) is a fundamental class of NP-hard problems that has garnered significant attention from both academia and industry. The Branch-and-Bound (B\&B) method is the dominant approach for solving MILPs and the branching plays an important role in B\&B methods. Neural-based learning frameworks have recently been developed to enhance branching policies and the efficiency of solving MILPs. However, these methods still struggle with semantic variation across depths, the scarcity of upstream nodes, and the costly collection of strong branching samples. To address these issues, we propose \ours, a Dynamic \underline{\textbf{S}}tratified \underline{\textbf{C}}ontrastive Training Framework for \underline{\textbf{MILP}} Branching. It groups branch-and-bound nodes based on their feature distributions and trains a GCNN-based discriminative model to progressively separate nodes across groups, learning finer-grained node representations throughout the tree. To address data scarcity and imbalance at upstream nodes, we introduce an upstream-augmented MILP derivation procedure that generates both theoretically equivalent and perturbed instances. \ours~effectively models subtle semantic differences between nodes, significantly enhancing branching accuracy and solving efficiency, particularly for upstream nodes. Extensive experiments on standard MILP benchmarks demonstrate that our method enhances branching accuracy, reduces solving time, and generalizes effectively to unseen instances.

CLJan 8
EvolSQL: Structure-Aware Evolution for Scalable Text-to-SQL Data Synthesis

Xuanguang Pan, Chongyang Tao, Jiayuan Bai et al.

Training effective Text-to-SQL models remains challenging due to the scarcity of high-quality, diverse, and structurally complex datasets. Existing methods either rely on limited human-annotated corpora, or synthesize datasets directly by simply prompting LLMs without explicit control over SQL structures, often resulting in limited structural diversity and complexity. To address this, we introduce EvolSQL, a structure-aware data synthesis framework that evolves SQL queries from seed data into richer and more semantically diverse forms. EvolSQL starts with an exploratory Query-SQL expansion to broaden question diversity and improve schema coverage, and then applies an adaptive directional evolution strategy using six atomic transformation operators derived from the SQL Abstract Syntax Tree to progressively increase query complexity across relational, predicate, aggregation, and nesting dimensions. An execution-grounded SQL refinement module and schema-aware deduplication further ensure the creation of high-quality, structurally diverse mapping pairs. Experimental results show that a 7B model fine-tuned on our data outperforms one trained on the much larger SynSQL dataset using only 1/18 of the data.

CLJan 13, 2024
Leveraging Large Language Models for NLG Evaluation: Advances and Challenges

Zhen Li, Xiaohan Xu, Tao Shen et al.

In the rapidly evolving domain of Natural Language Generation (NLG) evaluation, introducing Large Language Models (LLMs) has opened new avenues for assessing generated content quality, e.g., coherence, creativity, and context relevance. This paper aims to provide a thorough overview of leveraging LLMs for NLG evaluation, a burgeoning area that lacks a systematic analysis. We propose a coherent taxonomy for organizing existing LLM-based evaluation metrics, offering a structured framework to understand and compare these methods. Our detailed exploration includes critically assessing various LLM-based methodologies, as well as comparing their strengths and limitations in evaluating NLG outputs. By discussing unresolved challenges, including bias, robustness, domain-specificity, and unified evaluation, this paper seeks to offer insights to researchers and advocate for fairer and more advanced NLG evaluation techniques.

CLDec 17, 2024
LLMs are Also Effective Embedding Models: An In-depth Overview

Chongyang Tao, Tao Shen, Shen Gao et al.

Large language models (LLMs) have revolutionized natural language processing by achieving state-of-the-art performance across various tasks. Recently, their effectiveness as embedding models has gained attention, marking a paradigm shift from traditional encoder-only models like ELMo and BERT to decoder-only, large-scale LLMs such as GPT, LLaMA, and Mistral. This survey provides an in-depth overview of this transition, beginning with foundational techniques before the LLM era, followed by LLM-based embedding models through two main strategies to derive embeddings from LLMs. 1) Direct prompting: We mainly discuss the prompt designs and the underlying rationale for deriving competitive embeddings. 2) Data-centric tuning: We cover extensive aspects that affect tuning an embedding model, including model architecture, training objectives, data constructions, etc. Upon the above, we also cover advanced methods for producing embeddings from longer texts, multilingual, code, cross-modal data, as well as reasoning-aware and other domain-specific scenarios. Furthermore, we discuss factors affecting choices of embedding models, such as performance/efficiency comparisons, dense vs sparse embeddings, pooling strategies, and scaling law. Lastly, the survey highlights the limitations and challenges in adapting LLMs for embeddings, including cross-task embedding quality, trade-offs between efficiency and accuracy, low-resource, long-context, data bias, robustness, etc. This survey serves as a valuable resource for researchers and practitioners by synthesizing current advancements, highlighting key challenges, and offering a comprehensive framework for future work aimed at enhancing the effectiveness and efficiency of LLMs as embedding models.

CRJan 9, 2025
TAPFed: Threshold Secure Aggregation for Privacy-Preserving Federated Learning

Runhua Xu, Bo Li, Chao Li et al.

Federated learning is a computing paradigm that enhances privacy by enabling multiple parties to collaboratively train a machine learning model without revealing personal data. However, current research indicates that traditional federated learning platforms are unable to ensure privacy due to privacy leaks caused by the interchange of gradients. To achieve privacy-preserving federated learning, integrating secure aggregation mechanisms is essential. Unfortunately, existing solutions are vulnerable to recently demonstrated inference attacks such as the disaggregation attack. This paper proposes TAPFed, an approach for achieving privacy-preserving federated learning in the context of multiple decentralized aggregators with malicious actors. TAPFed uses a proposed threshold functional encryption scheme and allows for a certain number of malicious aggregators while maintaining security and privacy. We provide formal security and privacy analyses of TAPFed and compare it to various baselines through experimental evaluation. Our results show that TAPFed offers equivalent performance in terms of model quality compared to state-of-the-art approaches while reducing transmission overhead by 29%-45% across different model training scenarios. Most importantly, TAPFed can defend against recently demonstrated inference attacks caused by curious aggregators, which the majority of existing approaches are susceptible to.

CVDec 22, 2023
Voila-A: Aligning Vision-Language Models with User's Gaze Attention

Kun Yan, Lei Ji, Zeyu Wang et al.

In recent years, the integration of vision and language understanding has led to significant advancements in artificial intelligence, particularly through Vision-Language Models (VLMs). However, existing VLMs face challenges in handling real-world applications with complex scenes and multiple objects, as well as aligning their focus with the diverse attention patterns of human users. In this paper, we introduce gaze information, feasibly collected by AR or VR devices, as a proxy for human attention to guide VLMs and propose a novel approach, Voila-A, for gaze alignment to enhance the interpretability and effectiveness of these models in real-world applications. First, we collect hundreds of minutes of gaze data to demonstrate that we can mimic human gaze modalities using localized narratives. We then design an automatic data annotation pipeline utilizing GPT-4 to generate the VOILA-COCO dataset. Additionally, we innovate the Voila Perceiver modules to integrate gaze information into VLMs while preserving their pretrained knowledge. We evaluate Voila-A using a hold-out validation set and a newly collected VOILA-GAZE Testset, which features real-life scenarios captured with a gaze-tracking device. Our experimental results demonstrate that Voila-A significantly outperforms several baseline models. By aligning model attention with human gaze patterns, Voila-A paves the way for more intuitive, user-centric VLMs and fosters engaging human-AI interaction across a wide range of applications.

HCMar 4, 2024
Beyond Recommender: An Exploratory Study of the Effects of Different AI Roles in AI-Assisted Decision Making

Shuai Ma, Chenyi Zhang, Xinru Wang et al.

Artificial Intelligence (AI) is increasingly employed in various decision-making tasks, typically as a Recommender, providing recommendations that the AI deems correct. However, recent studies suggest this may diminish human analytical thinking and lead to humans' inappropriate reliance on AI, impairing the synergy in human-AI teams. In contrast, human advisors in group decision-making perform various roles, such as analyzing alternative options or criticizing decision-makers to encourage their critical thinking. This diversity of roles has not yet been empirically explored in AI assistance. In this paper, we examine three AI roles: Recommender, Analyzer, and Devil's Advocate, and evaluate their effects across two AI performance levels. Our results show each role's distinct strengths and limitations in task performance, reliance appropriateness, and user experience. Notably, the Recommender role is not always the most effective, especially if the AI performance level is low, the Analyzer role may be preferable. These insights offer valuable implications for designing AI assistants with adaptive functional roles according to different situations.

CVJan 28, 2025
Separate Motion from Appearance: Customizing Motion via Customizing Text-to-Video Diffusion Models

Huijie Liu, Jingyun Wang, Shuai Ma et al.

Motion customization aims to adapt the diffusion model (DM) to generate videos with the motion specified by a set of video clips with the same motion concept. To realize this goal, the adaptation of DM should be possible to model the specified motion concept, without compromising the ability to generate diverse appearances. Thus, the key to solving this problem lies in how to separate the motion concept from the appearance in the adaptation process of DM. Typical previous works explore different ways to represent and insert a motion concept into large-scale pretrained text-to-video diffusion models, e.g., learning a motion LoRA, using latent noise residuals, etc. While those methods can encode the motion concept, they also inevitably encode the appearance in the reference videos, resulting in weakened appearance generation capability. In this paper, we follow the typical way to learn a motion LoRA to encode the motion concept, but propose two novel strategies to enhance motion-appearance separation, including temporal attention purification (TAP) and appearance highway (AH). Specifically, we assume that in the temporal attention module, the pretrained Value embeddings are sufficient to serve as basic components needed by producing a new motion. Thus, in TAP, we choose only to reshape the temporal attention with motion LoRAs so that Value embeddings can be reorganized to produce a new motion. Further, in AH, we alter the starting point of each skip connection in U-Net from the output of each temporal attention module to the output of each spatial attention module. Extensive experiments demonstrate that compared to previous works, our method can generate videos with appearance more aligned with the text descriptions and motion more consistent with the reference videos.

CVFeb 17, 2025
Leveraging Labelled Data Knowledge: A Cooperative Rectification Learning Network for Semi-supervised 3D Medical Image Segmentation

Yanyan Wang, Kechen Song, Yuyuan Liu et al.

Semi-supervised 3D medical image segmentation aims to achieve accurate segmentation using few labelled data and numerous unlabelled data. The main challenge in the design of semi-supervised learning methods consists in the effective use of the unlabelled data for training. A promising solution consists of ensuring consistent predictions across different views of the data, where the efficacy of this strategy depends on the accuracy of the pseudo-labels generated by the model for this consistency learning strategy. In this paper, we introduce a new methodology to produce high-quality pseudo-labels for a consistency learning strategy to address semi-supervised 3D medical image segmentation. The methodology has three important contributions. The first contribution is the Cooperative Rectification Learning Network (CRLN) that learns multiple prototypes per class to be used as external knowledge priors to adaptively rectify pseudo-labels at the voxel level. The second contribution consists of the Dynamic Interaction Module (DIM) to facilitate pairwise and cross-class interactions between prototypes and multi-resolution image features, enabling the production of accurate voxel-level clues for pseudo-label rectification. The third contribution is the Cooperative Positive Supervision (CPS), which optimises uncertain representations to align with unassertive representations of their class distributions, improving the model's accuracy in classifying uncertain regions. Extensive experiments on three public 3D medical segmentation datasets demonstrate the effectiveness and superiority of our semi-supervised learning method.

LGApr 29, 2025
Modeling and Performance Analysis for Semantic Communications Based on Empirical Results

Shuai Ma, Bin Shen, Chuanhui Zhang et al.

Due to the black-box characteristics of deep learning based semantic encoders and decoders, finding a tractable method for the performance analysis of semantic communications is a challenging problem. In this paper, we propose an Alpha-Beta-Gamma (ABG) formula to model the relationship between the end-to-end measurement and SNR, which can be applied for both image reconstruction tasks and inference tasks. Specifically, for image reconstruction tasks, the proposed ABG formula can well fit the commonly used DL networks, such as SCUNet, and Vision Transformer, for semantic encoding with the multi scale-structural similarity index measure (MS-SSIM) measurement. Furthermore, we find that the upper bound of the MS-SSIM depends on the number of quantized output bits of semantic encoders, and we also propose a closed-form expression to fit the relationship between the MS-SSIM and quantized output bits. To the best of our knowledge, this is the first theoretical expression between end-to-end performance metrics and SNR for semantic communications. Based on the proposed ABG formula, we investigate an adaptive power control scheme for semantic communications over random fading channels, which can effectively guarantee quality of service (QoS) for semantic communications, and then design the optimal power allocation scheme to maximize the energy efficiency of the semantic communication system. Furthermore, by exploiting the bisection algorithm, we develop the power allocation scheme to maximize the minimum QoS of multiple users for OFDMA downlink semantic communication Extensive simulations verify the effectiveness and superiority of the proposed ABG formula and power allocation schemes.

LGNov 26, 2025
SetAD: Semi-Supervised Anomaly Learning in Contextual Sets

Jianling Gao, Chongyang Tao, Xuelian Lin et al.

Semi-supervised anomaly detection (AD) has shown great promise by effectively leveraging limited labeled data. However, existing methods are typically structured around scoring individual points or simple pairs. Such {point- or pair-centric} view not only overlooks the contextual nature of anomalies, which are defined by their deviation from a collective group, but also fails to exploit the rich supervisory signals that can be generated from the combinatorial composition of sets. Consequently, such models struggle to exploit the high-order interactions within the data, which are critical for learning discriminative representations. To address these limitations, we propose SetAD, a novel framework that reframes semi-supervised AD as a Set-level Anomaly Detection task. SetAD employs an attention-based set encoder trained via a graded learning objective, where the model learns to quantify the degree of anomalousness within an entire set. This approach directly models the complex group-level interactions that define anomalies. Furthermore, to enhance robustness and score calibration, we propose a context-calibrated anomaly scoring mechanism, which assesses a point's anomaly score by aggregating its normalized deviations from peer behavior across multiple, diverse contextual sets. Extensive experiments on 10 real-world datasets demonstrate that SetAD significantly outperforms state-of-the-art models. Notably, we show that our model's performance consistently improves with increasing set size, providing strong empirical support for the set-based formulation of anomaly detection.

LGOct 23, 2025
Empowering Targeted Neighborhood Search via Hyper Tour for Large-Scale TSP

Tongkai Lu, Shuai Ma, Chongyang Tao

Traveling Salesman Problem (TSP) is a classic NP-hard problem that has garnered significant attention from both academia and industry. While neural-based methods have shown promise for solving TSPs, they still face challenges in scaling to larger instances, particularly in memory constraints associated with global heatmaps, edge weights, or access matrices, as well as in generating high-quality initial solutions and insufficient global guidance for efficiently navigating vast search spaces. To address these challenges, we propose a Hyper Tour Guided Neighborhood Search (HyperNS) method for large-scale TSP instances. Inspired by the ``clustering first, route second" strategy, our approach initially divides the TSP instance into clusters using a sparse heatmap graph and abstracts them as supernodes, followed by the generation of a hyper tour to guide both the initialization and optimization processes. This method reduces the search space by focusing on edges relevant to the hyper tour, leading to more efficient and effective optimization. Experimental results on both synthetic and real-world datasets demonstrate that our approach outperforms existing neural-based methods, particularly in handling larger-scale instances, offering a significant reduction in the gap to the optimal solution.

AIOct 17, 2025
JudgeSQL: Reasoning over SQL Candidates with Weighted Consensus Tournament

Jiayuan Bai, Xuan-guang Pan, Chongyang Tao et al.

Text-to-SQL is a pivotal task that bridges natural language understanding and structured data access, yet it remains fundamentally challenging due to semantic ambiguity and complex compositional reasoning. While large language models (LLMs) have greatly advanced SQL generation though prompting, supervised finetuning and reinforced tuning, the shift toward test-time scaling exposes a new bottleneck: selecting the correct query from a diverse candidate pool. Existing selection approaches, such as self-consistency or best-of-$N$ decoding, provide only shallow signals, making them prone to inconsistent scoring, fragile reasoning chains, and a failure to capture fine-grained semantic distinctions between closely related SQL candidates. To this end, we introduce JudgeSQL, a principled framework that redefines SQL candidate selection through structured reasoning and weighted consensus tournament mechanism. JudgeSQL develops a reasoning-based SQL judge model that distills reasoning traces with reinforcement learning guided by verifiable rewards, enabling accurate and interpretable judgments. Building on this, a weighted consensus tournament integrates explicit reasoning preferences with implicit generator confidence, yielding selections that are both more reliable and more efficient. Extensive experiments on the BIRD benchmark demonstrate that JudgeSQL exhibits superior SQL judgment capabilities and good cross-scale generalization and robustness to generator capacity.

AIAug 31, 2025
Sharpe Ratio Optimization in Markov Decision Processes

Shuai Ma, Guangwu Liu, Li Xia

Sharpe ratio (also known as reward-to-variability ratio) is a widely-used metric in finance, which measures the additional return at the cost of per unit of increased risk (standard deviation of return). However, the optimization of Sharpe ratio in Markov decision processes (MDPs) is challenging, because there exist two difficulties hindering the application of dynamic programming. One is that dynamic programming does not work for fractional objectives, and the other is that dynamic programming is invalid for risk metrics. In this paper, we study the Sharpe ratio optimization in infinite-horizon MDPs, considering both the long-run average and discounted settings. We address the first challenge with the Dinkelbachs transform, which converts the Sharpe ratio objective to a mean-squared-variance (M2V) objective. It is shown that the M2V optimization and the original Sharpe ratio optimization share the same optimal policy when the risk-sensitive parameter is equal to the optimal Sharpe ratio. For the second challenge, we develop an iterative algorithm to solve the M2V optimization which is similar to a mean-variance optimization in MDPs. We iteratively solve the M2V problem and obtain the associated Sharpe ratio that is used to update the risk-sensitive parameter in the next iteration of M2V problems. We show that such a sequence of Sharpe ratios derived is monotonically increasing and converges to the optimal Sharpe ratio. For both average and discounted MDP settings, we develop a policy iteration procedure and prove its convergence to the optimum. Numerical experiments are conducted for validation. To the best of our knowledge, our approach is the first that solves the Sharpe ratio optimization in MDPs with dynamic programming type algorithms. We believe that the proposed algorithm can shed light on solving MDPs with other fractional objectives.

HCJan 26, 2024
Charting the Future of AI in Project-Based Learning: A Co-Design Exploration with Students

Chengbo Zheng, Kangyu Yuan, Bingcan Guo et al.

The increasing use of Artificial Intelligence (AI) by students in learning presents new challenges for assessing their learning outcomes in project-based learning (PBL). This paper introduces a co-design study to explore the potential of students' AI usage data as a novel material for PBL assessment. We conducted workshops with 18 college students, encouraging them to speculate an alternative world where they could freely employ AI in PBL while needing to report this process to assess their skills and contributions. Our workshops yielded various scenarios of students' use of AI in PBL and ways of analyzing these uses grounded by students' vision of education goal transformation. We also found students with different attitudes toward AI exhibited distinct preferences in how to analyze and understand the use of AI. Based on these findings, we discuss future research opportunities on student-AI interactions and understanding AI-enhanced learning.

CLFeb 28, 2022
TraceNet: Tracing and Locating the Key Elements in Sentiment Analysis

Qinghua Zhao, Shuai Ma

In this paper, we study sentiment analysis task where the outcomes are mainly contributed by a few key elements of the inputs. Motivated by the two-streams hypothesis, we propose a neural architecture, named TraceNet, to address this type of task. It not only learns discriminative representations for the target task via its encoders, but also traces key elements at the same time via its locators. In TraceNet, both encoders and locators are organized in a layer-wise manner, and a smoothness regularization is employed between adjacent encoder-locator combinations. Moreover, a sparsity constraints are enforced on locators for tracing purposes and items are proactively masked according to the item weights output by locators.A major advantage of TraceNet is that the outcomes are easier to understand, since the most responsible parts of inputs are identified. Also, under the guidance of locators, it is more robust to attacks due to its focus on key elements and the proactive masking training strategy. Experimental results show its effectiveness for sentiment classification. Moreover, we provide several case studies to demonstrate its robustness and interpretability.

OCJan 15, 2022
A unified algorithm framework for mean-variance optimization in discounted Markov decision processes

Shuai Ma, Xiaoteng Ma, Li Xia

This paper studies the risk-averse mean-variance optimization in infinite-horizon discounted Markov decision processes (MDPs). The involved variance metric concerns reward variability during the whole process, and future deviations are discounted to their present values. This discounted mean-variance optimization yields a reward function dependent on a discounted mean, and this dependency renders traditional dynamic programming methods inapplicable since it suppresses a crucial property -- time consistency. To deal with this unorthodox problem, we introduce a pseudo mean to transform the untreatable MDP to a standard one with a redefined reward function in standard form and derive a discounted mean-variance performance difference formula. With the pseudo mean, we propose a unified algorithm framework with a bilevel optimization structure for the discounted mean-variance optimization. The framework unifies a variety of algorithms for several variance-related problems including, but not limited to, risk-averse variance and mean-variance optimizations in discounted and average MDPs. Furthermore, the convergence analyses missing from the literature can be complemented with the proposed framework as well. Taking the value iteration as an example, we develop a discounted mean-variance value iteration algorithm and prove its convergence to a local optimum with the aid of a Bellman local-optimality equation. Finally, we conduct a numerical experiment on portfolio management to validate the proposed algorithm.

IRFeb 12, 2021
SceneRec: Scene-Based Graph Neural Networks for Recommender Systems

Gang Wang, Ziyi Guo, Xiang Li et al.

Collaborative filtering has been largely used to advance modern recommender systems to predict user preference. A key component in collaborative filtering is representation learning, which aims to project users and items into a low dimensional space to capture collaborative signals. However, the scene information, which has effectively guided many recommendation tasks, is rarely considered in existing collaborative filtering methods. To bridge this gap, we focus on scene-based collaborative recommendation and propose a novel representation model SceneRec. SceneRec formally defines a scene as a set of pre-defined item categories that occur simultaneously in real-life situations and creatively designs an item-category-scene hierarchical structure to build a scene-based graph. In the scene-based graph, we adopt graph neural networks to learn scene-specific representation on each item node, which is further aggregated with latent representation learned from collaborative interactions to make recommendations. We perform extensive experiments on real-world E-commerce datasets and the results demonstrate the effectiveness of the proposed method.

HCJan 4, 2021
CASS: Towards Building a Social-Support Chatbot for Online Health Community

Liuping Wang, Dakuo Wang, Feng Tian et al.

Chatbots systems, despite their popularity in today's HCI and CSCW research, fall short for one of the two reasons: 1) many of the systems use a rule-based dialog flow, thus they can only respond to a limited number of pre-defined inputs with pre-scripted responses; or 2) they are designed with a focus on single-user scenarios, thus it is unclear how these systems may affect other users or the community. In this paper, we develop a generalizable chatbot architecture (CASS) to provide social support for community members in an online health community. The CASS architecture is based on advanced neural network algorithms, thus it can handle new inputs from users and generate a variety of responses to them. CASS is also generalizable as it can be easily migrate to other online communities. With a follow-up field experiment, CASS is proven useful in supporting individual members who seek emotional support. Our work also contributes to fill the research gap on how a chatbot may influence the whole community's engagement.

SESep 22, 2020
CodeBLEU: a Method for Automatic Evaluation of Code Synthesis

Shuo Ren, Daya Guo, Shuai Lu et al.

Evaluation metrics play a vital role in the growth of an area as it defines the standard of distinguishing between good and bad models. In the area of code synthesis, the commonly used evaluation metric is BLEU or perfect accuracy, but they are not suitable enough to evaluate codes, because BLEU is originally designed to evaluate the natural language, neglecting important syntactic and semantic features of codes, and perfect accuracy is too strict thus it underestimates different outputs with the same semantic logic. To remedy this, we introduce a new automatic evaluation metric, dubbed CodeBLEU. It absorbs the strength of BLEU in the n-gram match and further injects code syntax via abstract syntax trees (AST) and code semantics via data-flow. We conduct experiments by evaluating the correlation coefficient between CodeBLEU and quality scores assigned by the programmers on three code synthesis tasks, i.e., text-to-code, code translation, and code refinement. Experimental results show that our proposed CodeBLEU can achieve a better correlation with programmer assigned scores compared with BLEU and accuracy.

CLAug 31, 2019
Explicit Cross-lingual Pre-training for Unsupervised Machine Translation

Shuo Ren, Yu Wu, Shujie Liu et al.

Pre-training has proven to be effective in unsupervised machine translation due to its ability to model deep context information in cross-lingual scenarios. However, the cross-lingual information obtained from shared BPE spaces is inexplicit and limited. In this paper, we propose a novel cross-lingual pre-training method for unsupervised machine translation by incorporating explicit cross-lingual training signals. Specifically, we first calculate cross-lingual n-gram embeddings and infer an n-gram translation table from them. With those n-gram translation pairs, we propose a new pre-training model called Cross-lingual Masked Language Model (CMLM), which randomly chooses source n-grams in the input text stream and predicts their translation candidates at each time step. Experiments show that our method can incorporate beneficial cross-lingual information into pre-trained models. Taking pre-trained CMLM models as the encoder and decoder, we significantly improve the performance of unsupervised machine translation.

CYJul 9, 2019
The Mass, Fake News, and Cognition Security

Bin Guo, Yasan Ding, Yueheng Sun et al.

The wide spread of fake news in social networks is posing threats to social stability, economic development and political democracy etc. Numerous studies have explored the effective detection approaches of online fake news, while few works study the intrinsic propagation and cognition mechanisms of fake news. Since the development of cognitive science paves a promising way for the prevention of fake news, we present a new research area called Cognition Security (CogSec), which studies the potential impacts of fake news to human cognition, ranging from misperception, untrusted knowledge acquisition, targeted opinion/attitude formation, to biased decision making, and investigates the effective ways for fake news debunking. CogSec is a multidisciplinary research field that leverages knowledge from social science, psychology, cognition science, neuroscience, AI and computer science. We first propose related definitions to characterize CogSec and review the literature history. We further investigate the key research challenges and techniques of CogSec, including human-content cognition mechanism, social influence and opinion diffusion, fake news detection and malicious bot detection. Finally, we summarize the open issues and future research directions, such as early detection of fake news, explainable fake news debunking, social contagion and diffusion models of fake news, and so on.

AIJul 9, 2019
A Scheme for Dynamic Risk-Sensitive Sequential Decision Making

Shuai Ma, Jia Yuan Yu, Ahmet Satir

We present a scheme for sequential decision making with a risk-sensitive objective and constraints in a dynamic environment. A neural network is trained as an approximator of the mapping from parameter space to space of risk and policy with risk-sensitive constraints. For a given risk-sensitive problem, in which the objective and constraints are, or can be estimated by, functions of the mean and variance of return, we generate a synthetic dataset as training data. Parameters defining a targeted process might be dynamic, i.e., they might vary over time, so we sample them within specified intervals to deal with these dynamics. We show that: i). Most risk measures can be estimated using return variance; ii). By virtue of the state-augmentation transformation, practical problems modeled by Markov decision processes with stochastic rewards can be solved in a risk-sensitive scenario; and iii). The proposed scheme is validated by a numerical experiment.

LGJul 9, 2019
Variance-Based Risk Estimations in Markov Processes via Transformation with State Lumping

Shuai Ma, Jia Yuan Yu

Variance plays a crucial role in risk-sensitive reinforcement learning, and most risk measures can be analyzed via variance. In this paper, we consider two law-invariant risks as examples: mean-variance risk and exponential utility risk. With the aid of the state-augmentation transformation (SAT), we show that, the two risks can be estimated in Markov decision processes (MDPs) with a stochastic transition-based reward and a randomized policy. To relieve the enlarged state space, a novel definition of isotopic states is proposed for state lumping, considering the special structure of the transformed transition probability. In the numerical experiment, we illustrate state lumping in the SAT, errors from a naive reward simplification, and the validity of the SAT for the two risk estimations.

SPMar 13, 2019
Signal Demodulation with Machine Learning Methods for Physical Layer Visible Light Communications: Prototype Platform, Open Dataset and Algorithms

Shuai Ma, Jiahui Dai, Songtao Lu et al.

In this paper, we investigate the design and implementation of machine learning (ML) based demodulation methods in the physical layer of visible light communication (VLC) systems. We build a flexible hardware prototype of an end-to-end VLC system, from which the received signals are collected as the real data. The dataset is available online, which contains eight types of modulated signals. Then, we propose three ML demodulators based on convolutional neural network (CNN), deep belief network (DBN), and adaptive boosting (AdaBoost), respectively. Specifically, the CNN based demodulator converts the modulated signals to images and recognizes the signals by the image classification. The proposed DBN based demodulator contains three restricted Boltzmann machines (RBMs) to extract the modulation features. The AdaBoost method includes a strong classifier that is constructed by the weak classifiers with the k-nearest neighbor (KNN) algorithm. These three demodulators are trained and tested by our online open dataset. Experimental results show that the demodulation accuracy of the three data-driven demodulators drops as the transmission distance increases. A higher modulation order negatively influences the accuracy for a given transmission distance. Among the three ML methods, the AdaBoost modulator achieves the best performance.