Qi Shen

IR
h-index5
15papers
318citations
Novelty51%
AI Score46

15 Papers

CVJan 4Code
BARE: Towards Bias-Aware and Reasoning-Enhanced One-Tower Visual Grounding

Hongbing Li, Linhui Xiao, Zihan Zhao et al.

Visual Grounding (VG), which aims to locate a specific region referred to by expressions, is a fundamental yet challenging task in the multimodal understanding fields. While recent grounding transfer works have advanced the field through one-tower architectures, they still suffer from two primary limitations: (1) over-entangled multimodal representations that exacerbate deceptive modality biases, and (2) insufficient semantic reasoning that hinders the comprehension of referential cues. In this paper, we propose BARE, a bias-aware and reasoning-enhanced framework for one-tower visual grounding. BARE introduces a mechanism that preserves modality-specific features and constructs referential semantics through three novel modules: (i) language salience modulator, (ii) visual bias correction and (iii) referential relationship enhancement, which jointly mitigate multimodal distractions and enhance referential comprehension. Extensive experimental results on five benchmarks demonstrate that BARE not only achieves state-of-the-art performance but also delivers superior computational efficiency compared to existing approaches. The code is publicly accessible at https://github.com/Marloweeee/BARE.

CLDec 31, 2025
BEDA: Belief Estimation as Probabilistic Constraints for Performing Strategic Dialogue Acts

Hengli Li, Zhaoxin Yu, Qi Shen et al.

Strategic dialogue requires agents to execute distinct dialogue acts, for which belief estimation is essential. While prior work often estimates beliefs accurately, it lacks a principled mechanism to use those beliefs during generation. We bridge this gap by first formalizing two core acts Adversarial and Alignment, and by operationalizing them via probabilistic constraints on what an agent may generate. We instantiate this idea in BEDA, a framework that consists of the world set, the belief estimator for belief estimation, and the conditional generator that selects acts and realizes utterances consistent with the inferred beliefs. Across three settings, Conditional Keeper Burglar (CKBG, adversarial), Mutual Friends (MF, cooperative), and CaSiNo (negotiation), BEDA consistently outperforms strong baselines: on CKBG it improves success rate by at least 5.0 points across backbones and by 20.6 points with GPT-4.1-nano; on Mutual Friends it achieves an average improvement of 9.3 points; and on CaSiNo it achieves the optimal deal relative to all baselines. These results indicate that casting belief estimation as constraints provides a simple, general mechanism for reliable strategic dialogue.

SDMay 15, 2024
Towards Evaluating the Robustness of Automatic Speech Recognition Systems via Audio Style Transfer

Weifei Jin, Yuxin Cao, Junjie Su et al.

In light of the widespread application of Automatic Speech Recognition (ASR) systems, their security concerns have received much more attention than ever before, primarily due to the susceptibility of Deep Neural Networks. Previous studies have illustrated that surreptitiously crafting adversarial perturbations enables the manipulation of speech recognition systems, resulting in the production of malicious commands. These attack methods mostly require adding noise perturbations under $\ell_p$ norm constraints, inevitably leaving behind artifacts of manual modifications. Recent research has alleviated this limitation by manipulating style vectors to synthesize adversarial examples based on Text-to-Speech (TTS) synthesis audio. However, style modifications based on optimization objectives significantly reduce the controllability and editability of audio styles. In this paper, we propose an attack on ASR systems based on user-customized style transfer. We first test the effect of Style Transfer Attack (STA) which combines style transfer and adversarial attack in sequential order. And then, as an improvement, we propose an iterative Style Code Attack (SCA) to maintain audio quality. Experimental results show that our method can meet the need for user-customized styles and achieve a success rate of 82% in attacks, while keeping sound naturalness due to our user study.

LGSep 29, 2025
BiHDTrans: binary hyperdimensional transformer for efficient multivariate time series classification

Jingtao Zhang, Yi Liu, Qi Shen et al.

The proliferation of Internet-of-Things (IoT) devices has led to an unprecedented volume of multivariate time series (MTS) data, requiring efficient and accurate processing for timely decision-making in resource-constrained edge environments. Hyperdimensional (HD) computing, with its inherent efficiency and parallelizability, has shown promise in classification tasks but struggles to capture complex temporal patterns, while Transformers excel at sequence modeling but incur high computational and memory overhead. We introduce BiHDTrans, an efficient neurosymbolic binary hyperdimensional Transformer that integrates self-attention into the HD computing paradigm, unifying the representational efficiency of HD computing with the temporal modeling power of Transformers. Empirically, BiHDTrans outperforms state-of-the-art (SOTA) HD computing models by at least 14.47% and achieves 6.67% higher accuracy on average than SOTA binary Transformers. With hardware acceleration on FPGA, our pipelined implementation leverages the independent and identically distributed properties of high-dimensional representations, delivering 39.4 times lower inference latency than SOTA binary Transformers. Theoretical analysis shows that binarizing in holographic high-dimensional space incurs significantly less information distortion than directly binarizing neural networks, explaining BiHDTrans's superior accuracy. Furthermore, dimensionality experiments confirm that BiHDTrans remains competitive even with a 64% reduction in hyperspace dimensionality, surpassing SOTA binary Transformers by 1-2% in accuracy with 4.4 times less model size, as well as further reducing the latency by 49.8% compare to the full-dimensional baseline. Together, these contributions bridge the gap between the expressiveness of Transformers and the efficiency of HD computing, enabling accurate, scalable, and low-latency MTS classification.

IRDec 31, 2021
Intention Adaptive Graph Neural Network for Category-aware Session-based Recommendation

Chuan Cui, Qi Shen, Shixuan Zhu et al.

Session-based recommendation (SBR) is proposed to recommend items within short sessions given that user profiles are invisible in various scenarios nowadays, such as e-commerce and short video recommendation. There is a common scenario that user specifies a target category of items as a global filter, however previous SBR settings mainly consider the item sequence and overlook the rich target category information. Therefore, we define a new task called Category-aware Session-Based Recommendation (CSBR), focusing on the above scenario, in which the user-specified category can be efficiently utilized by the recommendation system. To address the challenges of the proposed task, we develop a novel method called Intention Adaptive Graph Neural Network (IAGNN), which takes advantage of relationship between items and their categories to achieve an accurate recommendation result. Specifically, we construct a category-aware graph with both item and category nodes to represent the complex transition information in the session. An intention-adaptive graph neural network on the category-aware graph is utilized to capture user intention by transferring the historical interaction information to the user-specified category domain. Extensive experiments on three real-world datasets are conducted to show our IAGNN outperforms the state-of-the-art baselines in the new task.

IRDec 31, 2021
Temporal aware Multi-Interest Graph Neural Network For Session-based Recommendation

Qi Shen, Shixuan Zhu, Yitong Pang et al.

Session-based recommendation (SBR) is a challenging task, which aims at recommending next items based on anonymous interaction sequences. Despite the superior performance of existing methods for SBR, there are still several limitations: (i) Almost all existing works concentrate on single interest extraction and fail to disentangle multiple interests of user, which easily results in suboptimal representations for SBR. (ii) Furthermore, previous methods also ignore the multi-form temporal information, which is significant signal to obtain current intention for SBR. To address the limitations mentioned above, we propose a novel method, called \emph{Temporal aware Multi-Interest Graph Neural Network} (TMI-GNN) to disentangle multi-interest and yield refined intention representations with the injection of two level temporal information. Specifically, by appending multiple interest nodes, we construct a multi-interest graph for current session, and adopt the GNNs to model the item-item relation to capture adjacent item transitions, item-interest relation to disentangle the multi-interests, and interest-item relation to refine the item representation. Meanwhile, we incorporate item-level time interval signals to guide the item information propagation, and interest-level time distribution information to assist the scattering of interest information. Experiments on three benchmark datasets demonstrate that TMI-GNN outperforms other state-of-the-art methods consistently.

IRDec 22, 2021
Multiple Choice Questions based Multi-Interest Policy Learning for Conversational Recommendation

Yiming Zhang, Lingfei Wu, Qi Shen et al.

Conversational recommendation system (CRS) is able to obtain fine-grained and dynamic user preferences based on interactive dialogue. Previous CRS assumes that the user has a clear target item. However, for many users who resort to CRS, they might not have a clear idea about what they really like. Specifically, the user may have a clear single preference for some attribute types (e.g. color) of items, while for other attribute types, the user may have multiple preferences or even no clear preferences, which leads to multiple acceptable attribute instances (e.g. black and red) of one attribute type. Therefore, the users could show their preferences over items under multiple combinations of attribute instances rather than a single item with unique combination of all attribute instances. As a result, we first propose a more realistic CRS learning setting, namely Multi-Interest Multi-round Conversational Recommendation, where users may have multiple interests in attribute instance combinations and accept multiple items with partially overlapped combinations of attribute instances. To effectively cope with the new CRS learning setting, in this paper, we propose a novel learning framework namely, Multi-Choice questions based Multi-Interest Policy Learning . In order to obtain user preferences more efficiently, the agent generates multi-choice questions rather than binary yes/no ones on specific attribute instance. Besides, we propose a union set strategy to select candidate items instead of existing intersection set strategy in order to overcome over-filtering items during the conversation. Finally, we design a Multi-Interest Policy Learning module, which utilizes captured multiple interests of the user to decide next action, either asking attribute instances or recommending items. Extensive experimental results on four datasets verify the superiority of our method for the proposed setting.

IRSep 24, 2021
Multi-behavior Graph Contextual Aware Network for Session-based Recommendation

Qi Shen, Lingfei Wu, Yitong Pang et al.

Predicting the next interaction of a short-term sequence is a challenging task in session-based recommendation (SBR).Multi-behavior session recommendation considers session sequence with multiple interaction types, such as click and purchase, to capture more effective user intention representation sufficiently.Despite the superior performance of existing multi-behavior based methods for SBR, there are still several severe limitations:(i) Almost all existing works concentrate on single target type of next behavior and fail to model multiplex behavior sessions uniformly.(ii) Previous methods also ignore the semantic relations between various next behavior and historical behavior sequence, which are significant signals to obtain current latent intention for SBR.(iii) The global cross-session item-item graph established by some existing models may incorporate semantics and context level noise for multi-behavior session-based recommendation. To overcome the limitations (i) and (ii), we propose two novel tasks for SBR, which require the incorporation of both historical behaviors and next behaviors into unified multi-behavior recommendation modeling. To this end, we design a Multi-behavior Graph Contextual Aware Network (MGCNet) for multi-behavior session-based recommendation for the two proposed tasks. Specifically, we build a multi-behavior global item transition graph based on all sessions involving all interaction types. Based on the global graph, MGCNet attaches the global interest representation to final item representation based on local contextual intention to address the limitation (iii). In the end, we utilize the next behavior information explicitly to guide the learning of general interest and current intention for SBR. Experiments on three public benchmark datasets show that MGCNet can outperform state-of-the-art models for multi-behavior session-based recommendation.

IRSep 24, 2021
Graph Learning Augmented Heterogeneous Graph Neural Network for Social Recommendation

Yiming Zhang, Lingfei Wu, Qi Shen et al.

Social recommendation based on social network has achieved great success in improving the performance of recommendation system. Since social network (user-user relations) and user-item interactions are both naturally represented as graph-structured data, Graph Neural Networks (GNNs) have thus been widely applied for social recommendation. In this work, we propose an end-to-end heterogeneous global graph learning framework, namely Graph Learning Augmented Heterogeneous Graph Neural Network (GL-HGNN) for social recommendation. GL-HGNN aims to learn a heterogeneous global graph that makes full use of user-user relations, user-item interactions and item-item similarities in a unified perspective. To this end, we design a Graph Learner (GL) method to learn and optimize user-user and item-item connections separately. Moreover, we employ a Heterogeneous Graph Neural Network (HGNN) to capture the high-order complex semantic relations from our learned heterogeneous global graph. To scale up the computation of graph learning, we further present the Anchor-based Graph Learner (AGL) to reduce computational complexity. Extensive experiments on four real-world datasets demonstrate the effectiveness of our model.

IRJul 8, 2021
Heterogeneous Global Graph Neural Networks for Personalized Session-based Recommendation

Yitong Pang, Lingfei Wu, Qi Shen et al.

Predicting the next interaction of a short-term interaction session is a challenging task in session-based recommendation. Almost all existing works rely on item transition patterns, and neglect the impact of user historical sessions while modeling user preference, which often leads to non-personalized recommendation. Additionally, existing personalized session-based recommenders capture user preference only based on the sessions of the current user, but ignore the useful item-transition patterns from other user's historical sessions. To address these issues, we propose a novel Heterogeneous Global Graph Neural Networks (HG-GNN) to exploit the item transitions over all sessions in a subtle manner for better inferring user preference from the current and historical sessions. To effectively exploit the item transitions over all sessions from users, we propose a novel heterogeneous global graph that contains item transitions of sessions, user-item interactions and global co-occurrence items. Moreover, to capture user preference from sessions comprehensively, we propose to learn two levels of user representations from the global graph via two graph augmented preference encoders. Specifically, we design a novel heterogeneous graph neural network (HGNN) on the heterogeneous global graph to learn the long-term user preference and item representations with rich semantics. Based on the HGNN, we propose the Current Preference Encoder and the Historical Preference Encoder to capture the different levels of user preference from the current and historical sessions, respectively. To achieve personalized recommendation, we integrate the representations of the user current preference and historical interests to generate the final user preference representation. Extensive experimental results on three real-world datasets show that our model outperforms other state-of-the-art methods.

SEMar 22, 2021
Comprehensive Integration of API Usage Patterns

Qi Shen, Shijun Wu, Yanzhen Zou et al.

Nowadays, developers often reuse existing APIs to implement their programming tasks. A lot of API usage patterns are mined to help developers learn API usage rules. However, there are still many missing variables to be synthesized when developers integrate the patterns into their programming context. To deal with this issue, we propose a comprehensive approach to integrate API usage patterns in this paper. We first perform an empirical study by analyzing how API usage patterns are integrated in real-world projects. We find the expressions for variable synthesis is often non-trivial and can be divided into 5 syntax types. Based on the observation, we promote an approach to help developers interactively complete API usage patterns. Compared to the existing code completion techniques, our approach can recommend infrequent expressions accompanied with their real-world usage examples according to the user intent. The evaluation shows that our approach could assist users to integrate APIs more efficiently and complete the programming tasks faster than existing works.

CVOct 13, 2020
Robust Two-Stream Multi-Feature Network for Driver Drowsiness Detection

Qi Shen, Shengjie Zhao, Rongqing Zhang et al.

Drowsiness driving is a major cause of traffic accidents and thus numerous previous researches have focused on driver drowsiness detection. Many drive relevant factors have been taken into consideration for fatigue detection and can lead to high precision, but there are still several serious constraints, such as most existing models are environmentally susceptible. In this paper, fatigue detection is considered as temporal action detection problem instead of image classification. The proposed detection system can be divided into four parts: (1) Localize the key patches of the detected driver picture which are critical for fatigue detection and calculate the corresponding optical flow. (2) Contrast Limited Adaptive Histogram Equalization (CLAHE) is used in our system to reduce the impact of different light conditions. (3) Three individual two-stream networks combined with attention mechanism are designed for each feature to extract temporal information. (4) The outputs of the three sub-networks will be concatenated and sent to the fully-connected network, which judges the status of the driver. The drowsiness detection system is trained and evaluated on the famous Nation Tsing Hua University Driver Drowsiness Detection (NTHU-DDD) dataset and we obtain an accuracy of 94.46%, which outperforms most existing fatigue detection models.

SEJul 7, 2020
From API to NLI: A New Interface for Library Reuse

Qi Shen, Shijun Wu, Yanzhen Zou et al.

Developers frequently reuse APIs from existing libraries to implement certain functionality. However, learning APIs is difficult due to their large scale and complexity. In this paper, we design an abstract framework NLI2Code to ease the reuse process. Under the framework, users can reuse library functionalities with a high-level, automatically-generated NLI (Natural Language Interface) instead of the detailed API elements. The framework consists of three components: a functional feature extractor to summarize the frequently-used library functions in natural language form, a code pattern miner to give a code template for each functional feature, and a synthesizer to complete code patterns into well-typed snippets. From the perspective of a user, a reuse task under NLI2Code starts from choosing a functional feature and our framework will guide the user to synthesize the desired solution. We instantiated the framework as a tool to reuse Java libraries. The evaluation shows our tool can generate a high-quality natural language interface and save half of the coding time for newcomers to solve real-world programming tasks.

CPSep 22, 2018
Deep Learning-Based BSDE Solver for Libor Market Model with Application to Bermudan Swaption Pricing and Hedging

Haojie Wang, Han Chen, Agus Sudjianto et al.

The Libor market model is a mainstay term structure model of interest rates for derivatives pricing, especially for Bermudan swaptions, and other exotic Libor callable derivatives. For numerical implementation the pricing of derivatives with Libor market models is mainly carried out with Monte Carlo simulation. The PDE grid approach is not particularly feasible due to Curse of Dimensionality. The standard Monte Carlo method for American/Bermudan swaption pricing more or less uses regression to estimate expected value as a linear combination of basis functions (Longstaff and Schwartz). However, Monte Carlo method only provides the lower bound for American option price. Another complexity is the computation of the sensitivities of the option, the so-called Greeks, which are fundamental for a trader's hedging activity. Recently, an alternative numerical method based on deep learning and backward stochastic differential equations appeared in quite a few researches. For European style options the feedforward deep neural networks (DNN) show not only feasibility but also efficiency to obtain both prices and numerical Greeks. In this paper, a new backward DNN solver is proposed for Bermudan swaptions. Our approach is representing financial pricing problems in the form of high dimensional stochastic optimal control problems, FBSDEs, or equivalent PDEs. We demonstrate that using backward DNN the high-dimension Bermudan swaption pricing and hedging can be solved effectively and efficiently. A comparison between Monte Carlo simulation and the new method for pricing vanilla interest rate options manifests the superior performance of the new method. We then use this method to calculate prices and Greeks of Bermudan swaptions as a prelude for other Libor callable derivatives.

GRMay 20, 2018
Object-Aware Guidance for Autonomous Scene Reconstruction

Ligang Liu, Xi Xia, Han Sun et al.

To carry out autonomous 3D scanning and online reconstruction of unknown indoor scenes, one has to find a balance between global exploration of the entire scene and local scanning of the objects within it. In this work, we propose a novel approach, which provides object-aware guidance for autoscanning, for exploring, reconstructing, and understanding an unknown scene within one navigation pass. Our approach interleaves between object analysis to identify the next best object (NBO) for global exploration, and object-aware information gain analysis to plan the next best view (NBV) for local scanning. First, an objectness-based segmentation method is introduced to extract semantic objects from the current scene surface via a multi-class graph cuts minimization. Then, an object of interest (OOI) is identified as the NBO which the robot aims to visit and scan. The robot then conducts fine scanning on the OOI with views determined by the NBV strategy. When the OOI is recognized as a full object, it can be replaced by its most similar 3D model in a shape database. The algorithm iterates until all of the objects are recognized and reconstructed in the scene. Various experiments and comparisons have shown the feasibility of our proposed approach.