Kexin Tang

CL
h-index: 13 | Papers: 4 | Citations: 65 | Novelty: 55% | AI Score: 48

4 Papers

39.8 | MM | Mar 18 | Code
Beyond Forced Modality Balance: Intrinsic Information Budgets for Multimodal Learning

Zechang Xiong, Da Li, Kexin Tang et al.

Multimodal models often converge to a dominant-modality solution, in which a stronger, faster-converging modality overshadows weaker ones. This modality imbalance causes suboptimal performance. Existing methods attempt to balance different modalities by reweighting gradients or losses. However, they overlook the fact that each modality has finite information capacity. In this work, we propose IIBalance, a multimodal learning framework that aligns the modality contributions with Intrinsic Information Budgets (IIB). We propose a task-grounded estimator of each modality's IIB, transforming its capacity into a global prior over modality contributions. Anchored by the highest-budget modality, we design a prototype-based relative alignment mechanism that corrects semantic drift only when weaker modalities deviate from their budgeted potential, rather than forcing imitation. During inference, we propose a probabilistic gating module that integrates the global budgets with sample-level uncertainty to generate calibrated fusion weights. Experiments on three representative benchmarks demonstrate that IIBalance consistently outperforms state-of-the-art balancing methods and achieves better utilization of complementary modality cues. Our code is available at: https://github.com/XiongZechang/IIBalance.
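The abstract describes the gating module only at a high level, as combining global per-modality budgets with sample-level uncertainty to produce fusion weights. The sketch below is one plausible way such a combination could look. It is not IIBalance's actual formulation. The function name `fusion_weights`, the `temperature` parameter, and the rule "weight ∝ normalized budget × exp(−uncertainty / temperature)" are all hypothetical assumptions.

```python
import math

def fusion_weights(budgets, uncertainties, temperature=1.0):
    """Hypothetical fusion-weight rule (not the paper's method).

    budgets: modality -> positive global budget (assumed IIB-style estimate)
    uncertainties: modality -> non-negative per-sample uncertainty (assumed)
    Returns weights proportional to (budget / total) * exp(-uncertainty / temperature).
    """
    total = sum(budgets.values())
    # log of the normalized budget (global prior) minus the scaled sample uncertainty
    scores = {m: math.log(b / total) - uncertainties[m] / temperature
              for m, b in budgets.items()}
    # numerically stable softmax over modalities
    peak = max(scores.values())
    exps = {m: math.exp(s - peak) for m, s in scores.items()}
    z = sum(exps.values())
    return {m: e / z for m, e in exps.items()}
```

When every modality has the same uncertainty, the weights reduce to the normalized budgets. For example, `fusion_weights({"audio": 0.7, "video": 0.3}, {"audio": 0.2, "video": 0.2})` returns 0.7 and 0.3. If the high-budget modality is very uncertain on a given sample, as in `fusion_weights({"audio": 0.7, "video": 0.3}, {"audio": 2.0, "video": 0.0})`, most of the weight moves to video (roughly 0.76). The global prior therefore sets the default, and per-sample evidence can override it.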

CL | Jan 24, 2024 | Code
SEER: Facilitating Structured Reasoning and Explanation via Reinforcement Learning

Guoxin Chen, Kexin Tang, Chao Yang et al.

Elucidating the reasoning process with structured explanations from question to answer is crucial, as it significantly enhances the interpretability, traceability, and trustworthiness of question-answering (QA) systems. However, structured explanations require models to perform intricate structured reasoning, which poses great challenges. Most existing methods focus on single-step reasoning through supervised learning, ignoring logical dependencies between steps. Moreover, existing reinforcement learning (RL)-based methods overlook the structured relationships between steps, underutilizing the potential of RL in structured reasoning. In this paper, we propose SEER, a novel method that maximizes a structure-based return to facilitate structured reasoning and explanation. Our proposed structure-based return precisely describes the hierarchical and branching structure inherent in structured reasoning, effectively capturing the intricate relationships between different reasoning steps. In addition, we introduce a fine-grained reward function to meticulously delineate diverse reasoning steps. Extensive experiments show that SEER significantly outperforms state-of-the-art methods, achieving an absolute improvement of 6.9% over RL-based methods on EntailmentBank and a 4.4% average improvement on the STREET benchmark, while exhibiting outstanding efficiency and cross-dataset generalization performance. Our code is available at https://github.com/Chen-GX/SEER.
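To illustrate why a return that follows the reasoning structure differs from an ordinary chain return, the sketch below compares the two on a toy entailment tree. It is a hypothetical illustration and not SEER's actual return or reward definition. The helpers `structure_returns` and `chain_returns` and the `consumer` mapping (each step points to the later step that uses its conclusion) are assumptions made for this example.

```python
def structure_returns(rewards, consumer, gamma=0.9):
    """Hypothetical tree-shaped discounted return (not SEER's exact definition).

    rewards: step id -> immediate reward for that reasoning step
    consumer: step id -> id of the later step that uses this step's conclusion,
              or None for the step that proves the final answer
    """
    memo = {}

    def ret(step):
        if step not in memo:
            nxt = consumer[step]
            # credit flows only along the dependency edge, not along generation order
            future = gamma * ret(nxt) if nxt is not None else 0.0
            memo[step] = rewards[step] + future
        return memo[step]

    return {step: ret(step) for step in rewards}


def chain_returns(rewards_in_order, gamma=0.9):
    """Ordinary discounted return over a linear trajectory, for comparison."""
    out, g = [], 0.0
    for r in reversed(rewards_in_order):
        g = r + gamma * g
        out.append(g)
    return out[::-1]


rewards = {"s1": 1.0, "s2": -1.0, "s3": 1.0}
consumer = {"s1": "s3", "s2": "s3", "s3": None}
print(structure_returns(rewards, consumer, gamma=0.5))  # {'s1': 1.5, 's2': -0.5, 's3': 1.0}
print(chain_returns([1.0, -1.0, 1.0], gamma=0.5))       # [0.75, -0.5, 1.0]
```

In this toy example, steps `s1` and `s2` are independent branches that both feed `s3`. The chain return penalizes `s1` for the unrelated mistake in `s2` simply because `s2` was generated next. The tree-shaped return credits each step only through the steps that actually depend on it, which is the kind of logical dependency between steps that the abstract says single-step and chain-based methods ignore.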

CL | Apr 3, 2024
Measuring Social Norms of Large Language Models

Ye Yuan, Kexin Tang, Jianhao Shen et al.

We present a new challenge to examine whether large language models understand social norms. In contrast to existing datasets, our dataset requires a fundamental understanding of social norms to solve. Our dataset features the largest set of social norm skills, consisting of 402 skills and 12,383 questions covering a wide set of social norms ranging from opinions and arguments to culture and laws. We design our dataset according to the K-12 curriculum, which enables a direct comparison of the social understanding of large language models with that of humans, more specifically, elementary students. While prior methods achieve near-random accuracy on our benchmark, recent large language models such as GPT3.5-Turbo and LLaMA2-Chat improve performance significantly, falling only slightly below human performance. We then propose a multi-agent framework based on large language models to improve the models' ability to understand social norms. This method further brings large language models on par with humans. Given the increasing adoption of large language models in real-world applications, our finding is particularly important and presents a unique direction for future improvements.

MM | Feb 2, 2019
Multiuser Video Streaming Rate Adaptation: A Physical Layer Resource-Aware Deep Reinforcement Learning Approach

Kexin Tang, Nuowen Kan, Junni Zou et al.

We consider a multi-user video streaming service optimization problem over a time-varying, mutually interfering multi-cell wireless network. The key research challenge is to adapt each user's video streaming rate to the radio frequency environment (e.g., channel fading and interference level) and to service demands (e.g., play requests), so that the users' long-term video viewing experience is optimized. To address this challenge, we propose a novel two-level cross-layer optimization framework for multiuser adaptive video streaming over wireless networks. The key idea is to jointly design a physical-layer, optimization-based beamforming scheme (performed at the base stations) and an application-layer Deep Reinforcement Learning (DRL)-based scheme (performed at the user terminals), so that a highly complex multi-user, cross-layer, time-varying video streaming problem can be decomposed into relatively simple problems and solved effectively. Our strategy represents a significant departure from existing schemes, which consider either short-term user experience optimization or only single-user, point-to-point long-term optimization. Extensive simulations based on real datasets show that the proposed cross-layer design is effective and promising.
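The cross-layer split can be illustrated with a small sketch: the physical layer determines how fast each user's link can be under inter-cell interference, and the application layer chooses a video bitrate that the link can sustain. This is a simplified, hypothetical illustration and not the paper's scheme. It assumes a scalar transmit power per base station in place of beamforming vectors and the Shannon formula for the achievable rate. The helper names `achievable_rates` and `pick_bitrate`, and the bitrate ladder, are also assumptions. In particular, the greedy `pick_bitrate` rule replaces the DRL agent and ignores the long-term experience that the DRL agent is meant to optimize.

```python
import math

def achievable_rates(gains, powers, serving, noise, bandwidth_hz):
    """Per-user Shannon rate (bits/s) under inter-cell interference.

    gains[k][b]: channel power gain from base station b to user k (hypothetical input)
    powers[b]: scalar transmit power of base station b (stand-in for beamforming)
    serving[k]: index of the base station serving user k
    """
    rates = []
    for k, row in enumerate(gains):
        s = serving[k]
        signal = row[s] * powers[s]
        # transmissions from every other base station count as interference at user k
        interference = sum(row[b] * powers[b] for b in range(len(powers)) if b != s)
        sinr = signal / (interference + noise)
        rates.append(bandwidth_hz * math.log2(1.0 + sinr))
    return rates


def pick_bitrate(rate, ladder):
    """Greedy stand-in for the DRL agent: highest ladder rung not above the rate."""
    feasible = [b for b in sorted(ladder) if b <= rate]
    return feasible[-1] if feasible else min(ladder)


rates = achievable_rates([[1.0, 0.1], [0.2, 1.0]], [1.0, 1.0], [0, 1],
                         noise=0.1, bandwidth_hz=1e6)
ladder = [0.5e6, 1e6, 2.5e6, 4e6]
print([pick_bitrate(r, ladder) for r in rates])  # [2500000.0, 1000000.0]
```

Here user 0 sees an SINR of 5 (about 2.58 Mbit/s), and user 1 sees stronger interference (SINR about 3.3, about 2.12 Mbit/s), so the two users settle on different rungs of the ladder. A learned agent could instead trade some of the immediate bitrate for fewer future stalls when the channel is expected to fade.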