92.5 SD May 27
ChronosAudio: A Comprehensive Long-Audio Benchmark for Evaluating Audio-Large Language Models
Kaiwen Luo, Liang Lin, Yibo Zhang et al.
Although Audio Large Language Models (ALLMs) have witnessed substantial advancements, their long audio understanding capabilities remain unexplored. While a plethora of benchmarks have been proposed for general audio tasks, they predominantly focus on short-form clips, leaving no consensus on how to evaluate ALLMs over extended durations. This paper proposes ChronosAudio, the first multi-task benchmark tailored for long-audio understanding in ALLMs. It encompasses six major task categories and comprises 36,000 test instances totaling over 200 hours of audio, stratified into short, middle, and long-form categories to comprehensively evaluate length generalization. Extensive experiments on 16 state-of-the-art models using ChronosAudio yield three critical findings: 1. Precipitous Long-Context Collapse: ALLMs exhibit a severe inability to sustain performance, with the transition from short to long contexts triggering a staggering performance degradation of over 90% in specific tasks. 2. Structural Attention Dilution: Performance degradation stems from a fundamental failure in maintaining temporal locality; attention mechanisms suffer from significant diffusion in later sequences. 3. Restorative Ceiling of Mitigation: Current strategies offer only 50% recovery. These findings reveal significant challenges in long-audio understanding, underscoring the urgent need for approaches that achieve robust, document-level audio reasoning.
CV Nov 22, 2022
PointCA: Evaluating the Robustness of 3D Point Cloud Completion Models Against Adversarial Examples
Shengshan Hu, Junwei Zhang, Wei Liu et al.
Point cloud completion, as the upstream procedure of 3D recognition and segmentation, has become an essential part of many tasks such as navigation and scene understanding. While various point cloud completion models have demonstrated their powerful capabilities, their robustness against adversarial attacks, which have been proven to be fatally malicious towards deep neural networks, remains unknown. In addition, existing attack approaches towards point cloud classifiers cannot be applied to the completion models due to different output forms and attack purposes. In order to evaluate the robustness of the completion models, we propose PointCA, the first adversarial attack against 3D point cloud completion models. PointCA can generate adversarial point clouds that maintain high similarity with the original ones, while being completed as another object with totally different semantic information. Specifically, we minimize the representation discrepancy between the adversarial example and the target point set to jointly explore the adversarial point clouds in the geometry space and the feature space. Furthermore, to launch a stealthier attack, we innovatively employ the neighbourhood density information to tailor the perturbation constraint, leading to geometry-aware and distribution-adaptive modifications for each point. Extensive experiments against different premier point cloud completion networks show that PointCA can cause a performance degradation from 77.9% to 16.7%, with the structure chamfer distance kept below 0.01. We conclude that existing completion models are severely vulnerable to adversarial examples, and state-of-the-art defenses for point cloud classification will be partially invalid when applied to incomplete and uneven point cloud data.
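As a rough illustration of the similarity constraint mentioned in the abstract, the following is a minimal sketch of one common symmetric Chamfer distance between two point sets. It assumes the squared-distance, mean-per-direction variant; the exact "structure chamfer distance" used by PointCA may be defined differently, and the function names here are hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two non-empty point sets.

    Assumed variant: for each direction, average the squared distance from
    every point to its nearest neighbour in the other set, then add the two
    directional terms. Brute force, O(len(a) * len(b)).
    """
    if not a or not b:
        raise ValueError("both point sets must be non-empty")

    def one_way(src, dst):
        return sum(min(squared_distance(p, q) for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


if __name__ == "__main__":
    original = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    perturbed = [(0.0, 0.01, 0.0), (1.0, 0.0, 0.01)]
    print(chamfer_distance(original, perturbed))
```

A small value indicates that the perturbed cloud stays geometrically close to the original, which is the sense in which the abstract reports a distance kept below 0.01.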
75.3 CL Apr 7
BaseCal: Unsupervised Confidence Calibration via Base Model Signals
Hexiang Tan, Wanli Yang, Junwei Zhang et al.
Reliable confidence is essential for trusting the outputs of LLMs, yet widely deployed post-trained LLMs (PoLLMs) typically compromise this trust with severe overconfidence. In contrast, we observe that their corresponding base LLMs often remain well-calibrated. This naturally motivates us to calibrate PoLLM confidence using the base LLM as a reference. This work proposes two ways to achieve this. A straightforward solution, BaseCal-ReEval, evaluates PoLLM's responses by feeding them into the base LLM to get average probabilities as confidence. While effective, this approach introduces additional inference overhead. To address this, we propose BaseCal-Proj, which trains a lightweight projection to map the final-layer hidden states of PoLLMs back to those of their base LLMs. These projected states are then processed by the base LLM's output layer to derive base-calibrated confidence for PoLLM's responses. Notably, BaseCal is an unsupervised, plug-and-play solution that operates without human labels or LLM modifications. Experiments across five datasets and three LLM families demonstrate the effectiveness of BaseCal, reducing Expected Calibration Error (ECE) by an average of 42.90% compared to the best unsupervised baselines.
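For readers unfamiliar with the metric, the sketch below shows one standard way to compute Expected Calibration Error with equal-width confidence bins. It is a generic illustration, not the evaluation code of BaseCal; the bin count of 10 and the function name are assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE.

    ECE = sum over bins of (bin size / N) * |bin accuracy - bin mean confidence|.
    `confidences` are floats in [0, 1]; `correct` are booleans (or 0/1).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one prediction is required")

    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 goes into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, bool(ok)))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece


if __name__ == "__main__":
    # An overconfident model: 90% stated confidence, 75% actual accuracy.
    print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0]))
```

An overconfident model, as the abstract describes for post-trained LLMs, shows up as bins whose mean confidence exceeds their accuracy.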
87.7 CL May 8
Beyond Reasoning: Reinforcement Learning Unlocks Parametric Knowledge in LLMs
Wanli Yang, Hongyu Zang, Junwei Zhang et al.
Reinforcement learning (RL) has achieved remarkable success in LLM reasoning, but whether it can also improve direct recall of parametric knowledge remains an open question. We study this question in a controlled zero-shot, one-hop, closed-book QA setting with no chain-of-thought, training only on binary correctness rewards and applying fact-level train-test deduplication to ensure gains reflect improved recall rather than reasoning or memorization. Across three model families and multiple factual QA benchmarks, RL yields ~27% average relative gains, surpassing both training- and inference-time baselines alike. Mechanistically, RL primarily redistributes probability mass over existing knowledge rather than acquiring new facts, moving correct answers from the low-probability tail into reliable greedy generations. Our data-attribution study reveals that the hardest examples are the most informative: those whose answers never appear in 128 pre-RL samples (only ~18% of training data) drive ~83% of the gain, since rare correct rollouts still emerge during training and get reinforced. Together, these findings broaden the role of RL beyond reasoning, repositioning it as a tool for unlocking rather than acquiring latent parametric knowledge.
LG Jan 30, 2024
Accelerated Cloud for Artificial Intelligence (ACAI)
Dachi Chen, Weitian Ding, Chen Liang et al.
Training an effective machine learning (ML) model is an iterative process that requires effort in multiple dimensions. Vertically, a single pipeline typically includes an initial ETL (Extract, Transform, Load) of raw datasets, a model training stage, and an evaluation stage where the practitioners obtain statistics of the model performance. Horizontally, many such pipelines may be required to find the best model within a search space of model configurations. Many practitioners resort to maintaining logs manually and writing simple glue code to automate the workflow. However, carrying out this process on the cloud is not a trivial task in terms of resource provisioning, data management, and bookkeeping of job histories to make sure the results are reproducible. We propose an end-to-end cloud-based machine learning platform, Accelerated Cloud for AI (ACAI), to help improve the productivity of ML practitioners. ACAI achieves this goal by enabling cloud-based storage of indexed, labeled, and searchable data, as well as automatic resource provisioning, job scheduling, and experiment tracking. Specifically, ACAI provides practitioners (1) a data lake for storing versioned datasets and their corresponding metadata, and (2) an execution engine for executing ML jobs on the cloud with automatic resource provisioning (auto-provision), logging, and provenance tracking. To evaluate ACAI, we test the efficacy of our auto-provisioner on the MNIST handwritten digit classification task, and we study the usability of our system using experiments and interviews. We show that our auto-provisioner produces a 1.7x speed-up and 39% cost reduction, and our system reduces experiment time for ML scientists by 20% on typical ML use cases.
CL Sep 15, 2025
A Dynamic Knowledge Update-Driven Model with Large Language Models for Fake News Detection
Di Jin, Jun Yang, Xiaobao Wang et al.
As the Internet and social media evolve rapidly, distinguishing credible news from a vast amount of complex information poses a significant challenge. Due to the suddenness and instability of news events, the authenticity labels of news can potentially shift as events develop, making it crucial for fake news detection to obtain the latest event updates. Existing methods employ retrieval-augmented generation to fill knowledge gaps, but they suffer from issues such as insufficient credibility of retrieved content and interference from noisy information. We propose a dynamic knowledge update-driven model for fake news detection (DYNAMO), which leverages knowledge graphs to achieve continuous updating of new knowledge and integrates with large language models to fulfill dual functions: news authenticity detection and verification of new knowledge correctness, solving the two key problems of ensuring the authenticity of new knowledge and deeply mining news semantics. Specifically, we first construct a news-domain-specific knowledge graph. Then, we use Monte Carlo Tree Search to decompose complex news and verify them step by step. Finally, we extract and update new knowledge from verified real news texts and reasoning paths. Experimental results demonstrate that DYNAMO achieves the best performance on two real-world datasets.
AI Aug 14, 2025
STEP: Stepwise Curriculum Learning for Context-Knowledge Fusion in Conversational Recommendation
Zhenye Yang, Jinpeng Chen, Huan Li et al.
Conversational recommender systems (CRSs) aim to proactively capture user preferences through natural language dialogue and recommend high-quality items. To achieve this, a CRS gathers user preferences via a dialogue module and builds user profiles through a recommendation module to generate appropriate recommendations. However, existing CRSs face challenges in capturing the deep semantics of user preferences and dialogue context. In particular, the efficient integration of external knowledge graph (KG) information into dialogue generation and recommendation remains a pressing issue. Traditional approaches typically combine KG information directly with dialogue content, which often struggles with complex semantic relationships, resulting in recommendations that may not align with user expectations. To address these challenges, we introduce STEP, a conversational recommender centered on pre-trained language models that combines curriculum-guided context-knowledge fusion with lightweight task-specific prompt tuning. At its heart, an F-Former progressively aligns the dialogue context with knowledge-graph entities through a three-stage curriculum, thus resolving fine-grained semantic mismatches. The fused representation is then injected into the frozen language model via two minimal yet adaptive prefix prompts: a conversation prefix that steers response generation toward user intent and a recommendation prefix that biases item ranking toward knowledge-consistent candidates. This dual-prompt scheme allows the model to share cross-task semantics while respecting the distinct objectives of dialogue and recommendation. Experimental results show that STEP outperforms mainstream methods in recommendation precision and dialogue quality on two public datasets.
IR Sep 9, 2021
Double-Scale Self-Supervised Hypergraph Learning for Group Recommendation
Junwei Zhang, Min Gao, Junliang Yu et al.
With the prevalence of social media, there has recently been a proliferation of recommenders that shift their focus from individual modeling to group recommendation. Since the group preference is a mixture of various predilections from group members, the fundamental challenge of group recommendation is to model the correlations among members. Existing methods mostly adopt heuristic or attention-based preference aggregation strategies to synthesize group preferences. However, these models mainly focus on the pairwise connections of users and ignore the complex high-order interactions within and beyond groups. Besides, group recommendation suffers seriously from the problem of data sparsity due to severely sparse group-item interactions. In this paper, we propose a self-supervised hypergraph learning framework for group recommendation to achieve two goals: (1) capturing the intra- and inter-group interactions among users; (2) alleviating the data sparsity issue with the raw data itself. Technically, for (1), a hierarchical hypergraph convolutional network based on the user- and group-level hypergraphs is developed to model the complex tuplewise correlations among users within and beyond groups. For (2), we design a double-scale node dropout strategy to create self-supervision signals that can regularize user representations with different granularities against the sparsity issue. The experimental analysis on multiple benchmark datasets demonstrates the superiority of the proposed model and also elucidates the rationality of the hypergraph modeling and the double-scale self-supervision.
LG Jan 27, 2021
Evolutionary Generative Adversarial Networks with Crossover Based Knowledge Distillation
Junjie Li, Junwei Zhang, Xiaoyu Gong et al.
Generative adversarial networks (GANs) are adversarial models that have been demonstrated to be effective for various generative tasks. However, GANs and their variants also suffer from many training problems, such as mode collapse and vanishing gradients. In this paper, we first propose a general crossover operator, which can be widely applied to GANs using evolutionary strategies. Then we design an evolutionary GAN framework, C-GAN, based on it. We also combine the crossover operator with evolutionary generative adversarial networks (EGAN) to implement evolutionary generative adversarial networks with crossover (CE-GAN). Under the premise that a variety of loss functions are used as mutation operators to generate mutation individuals, we evaluate the generated samples and allow the mutation individuals to learn from the output in a knowledge distillation manner, imitating the best output and resulting in better offspring. Then, we greedily select the best offspring as parents for subsequent training, using the discriminator as the evaluator. Experiments on real datasets demonstrate the effectiveness of CE-GAN and show that our method is competitive in terms of generated image quality and time efficiency.
LG Nov 11, 2020
Proximal Policy Optimization via Enhanced Exploration Efficiency
Junwei Zhang, Zhenghao Zhang, Shuai Han et al.
The proximal policy optimization (PPO) algorithm is a deep reinforcement learning algorithm with outstanding performance, especially in continuous control tasks. However, its performance is still affected by its exploration ability. For classical reinforcement learning, there are schemes that make exploration more thorough and better balanced with data exploitation, but they cannot be applied in complex environments because of their algorithmic complexity. Focusing on continuous control tasks with dense rewards, this paper analyzes the assumption of the original Gaussian action exploration mechanism in PPO and clarifies the influence of exploration ability on performance. To address the exploration problem, we then design an exploration enhancement mechanism based on uncertainty estimation. We apply this exploration enhancement theory to PPO and propose the proximal policy optimization algorithm with an intrinsic exploration module (IEM-PPO), which can be used in complex environments. In the experiments, we evaluate our method on multiple tasks in the MuJoCo physics simulator and compare IEM-PPO with a curiosity-driven exploration algorithm (ICM-PPO) and the original algorithm (PPO). The experimental results demonstrate that IEM-PPO needs longer training time but performs better in terms of sample efficiency and cumulative reward, and is stable and robust.
IR Aug 10, 2020
Path-Based Reasoning over Heterogeneous Networks for Recommendation via Bidirectional Modeling
Junwei Zhang, Min Gao, Junliang Yu et al.
Heterogeneous Information Network (HIN) is a natural and general representation of data in recommender systems. Combining HIN and recommender systems can not only help model user behaviors but also make the recommendation results explainable by aligning the users/items with various types of entities in the network. Over the past few years, path-based reasoning models have shown great capacity in HIN-based recommendation. The basic idea of these models is to explore HIN with predefined path schemes. Despite their effectiveness, these models are often confronted with the following limitations: (1) Most prior path-based reasoning models only consider the influence of the predecessors on the subsequent nodes when modeling the sequences, and ignore the reciprocity between the nodes in a path; (2) The weights of nodes in the same path instance are usually assumed to be constant, whereas varied weights of nodes can bring more flexibility and lead to expressive modeling; (3) User-item interactions are noisy, but they are often indiscriminately exploited. To overcome the aforementioned issues, in this paper, we propose a novel path-based reasoning approach for recommendation over HIN. Concretely, we use a bidirectional LSTM to enable the two-way modeling of paths and capture the reciprocity between nodes. Then an attention mechanism is employed to learn the dynamical influence of nodes in different contexts. Finally, the adversarial regularization terms are imposed on the loss function of the model to mitigate the effects of noise and enhance HIN-based recommendation. Extensive experiments conducted on three public datasets show that our model outperforms the state-of-the-art baselines. The case study further demonstrates the feasibility of our model on the explainable recommendation task.
IR Mar 5, 2020
Recommender Systems Based on Generative Adversarial Networks: A Problem-Driven Perspective
Min Gao, Junwei Zhang, Junliang Yu et al.
Recommender systems (RSs) now play a very important role in the online lives of people as they serve as personalized filters for users to find relevant items from an array of options. Owing to their effectiveness, RSs have been widely employed in consumer-oriented e-commerce platforms. However, despite their empirical successes, these systems still suffer from two limitations: data noise and data sparsity. In recent years, generative adversarial networks (GANs) have garnered increased interest in many fields, owing to their strong capacity to learn complex real data distributions; their abilities to enhance RSs by tackling the challenges these systems exhibit have also been demonstrated in numerous studies. In general, two lines of research have been conducted, and their common ideas can be summarized as follows: (1) for the data noise issue, adversarial perturbations and adversarial sampling-based training often serve as a solution; (2) for the data sparsity issue, data augmentation--implemented by capturing the distribution of real data under the minimax framework--is the primary coping strategy. To gain a comprehensive understanding of these research efforts, we review the corresponding studies and models, organizing them from a problem-driven perspective. More specifically, we propose a taxonomy of these models, along with their detailed descriptions and advantages. Finally, we elaborate on several open issues and current trends in GAN-based RSs.
LG Dec 13, 2019
Recruitment-imitation Mechanism for Evolutionary Reinforcement Learning
Shuai Lü, Shuai Han, Wenbo Zhou et al.
Reinforcement learning, evolutionary algorithms, and imitation learning are three principal methods for continuous control tasks. Reinforcement learning is sample efficient, yet it is sensitive to hyperparameter settings and needs efficient exploration; evolutionary algorithms are stable but have low sample efficiency; imitation learning is both sample efficient and stable, but it requires the guidance of expert data. In this paper, we propose the Recruitment-imitation Mechanism (RIM) for evolutionary reinforcement learning, a scalable framework that combines the advantages of the three methods mentioned above. The core of this framework is a dual-actor, single-critic reinforcement learning agent. This agent can recruit high-fitness actors from the population of evolutionary algorithms, which guide the agent's learning from the experience replay buffer. At the same time, low-fitness actors in the evolutionary population can imitate the behavior patterns of the reinforcement learning agent and improve their adaptability. The reinforcement and imitation learners in this framework can be replaced with any off-policy actor-critic reinforcement learner or data-driven imitation learner. We evaluate RIM on a series of continuous control benchmarks in MuJoCo. The experimental results show that RIM outperforms prior evolutionary or reinforcement learning methods. The components of RIM perform significantly better than the components of the previous evolutionary reinforcement learning algorithm, and recruitment using soft updates enables the reinforcement learning agent to learn faster than recruitment using hard updates.
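The abstract contrasts soft and hard updates without defining them. The sketch below shows the generic distinction as it is commonly used in actor-critic methods: a soft update interpolates parameters with a small coefficient, while a hard update copies them outright. How RIM applies this to recruitment is not stated in the abstract, so the function names, the flat parameter lists, and the coefficient `tau` are all illustrative assumptions.

```python
def soft_update(target, source, tau):
    """Return new parameters interpolated toward `source`.

    new = (1 - tau) * target + tau * source, applied element-wise.
    `tau` in [0, 1] controls how far the parameters move per update.
    """
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    if len(target) != len(source):
        raise ValueError("parameter lists must have the same length")
    return [(1.0 - tau) * t + tau * s for t, s in zip(target, source)]


def hard_update(target, source):
    """Return a copy of `source`, discarding the old target parameters."""
    if len(target) != len(source):
        raise ValueError("parameter lists must have the same length")
    return list(source)


if __name__ == "__main__":
    agent_actor = [0.0, 0.0]      # hypothetical flattened actor weights
    recruited_actor = [1.0, 2.0]  # hypothetical high-fitness actor weights
    print(soft_update(agent_actor, recruited_actor, tau=0.1))
    print(hard_update(agent_actor, recruited_actor))
```

With `tau = 1.0` the soft update reduces to the hard update, so the two can be viewed as ends of one spectrum.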
IR Jul 12, 2019
ScenarioSA: A Large Scale Conversational Database for Interactive Sentiment Analysis
Yazhou Zhang, Lingling Song, Dawei Song et al.
Interactive sentiment analysis is an emerging, yet challenging, subtask of the sentiment analysis problem. It aims to discover the affective state and sentimental change of each person in a conversation. Existing sentiment analysis approaches are insufficient in modelling the interactions among people. However, the development of new approaches is critically limited by the lack of labelled interactive sentiment datasets. In this paper, we present a new conversational emotion database that we have created and made publicly available, namely ScenarioSA. We manually label 2,214 multi-turn English conversations collected from natural contexts. In comparison with existing sentiment datasets, ScenarioSA (1) covers a wide range of scenarios; (2) describes the interactions between two speakers; and (3) reflects the sentimental evolution of each speaker over the course of a conversation. Finally, we evaluate various state-of-the-art algorithms on ScenarioSA, demonstrating the need for novel interactive sentiment analysis models and the potential of ScenarioSA to facilitate the development of such models.