CL Mar 27
Switch Attention: Towards Dynamic and Fine-grained Hybrid Transformers
Yusheng Zhao, Hourun Li, Bohan Wu et al.
The attention mechanism has been the core component in modern transformer architectures. However, the computation of standard full attention scales quadratically with the sequence length, making it a major bottleneck in long-context language modeling. Sliding window attention restricts the context length for better efficiency at the cost of narrower receptive fields. Existing efforts attempt to combine the benefits of both by building hybrid models, but they often resort to static, heuristically designed alternating patterns that limit the efficient allocation of computation across scenarios. In this paper, we propose Switch Attention (SwiAttn), a novel hybrid transformer that enables dynamic and fine-grained routing between full attention and sliding window attention. For each token at each transformer layer, SwiAttn dynamically routes the computation to either a full-attention branch for global information aggregation or a sliding-window branch for efficient local pattern matching. An adaptive regularization objective is designed to encourage the model towards efficiency. Moreover, we adopt continual pretraining to optimize the model, transferring the full attention architecture to the hybrid one. Extensive experiments are conducted on twenty-three benchmark datasets across both regular (4K) and long (32K) context lengths, demonstrating the effectiveness of the proposed method.
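The per-token routing described in this abstract can be illustrated with a small sketch. It only shows which key positions each query token may attend to under a causal mask once a routing decision has been made. The routing decisions are hard-coded here, whereas the abstract describes them as learned; the names `visible_keys`, `routed_pattern`, and `attended_pairs`, and the window size, are hypothetical and not taken from the paper.

```python
def visible_keys(i, use_full, window):
    """Key positions that query token i may attend to (causal).

    use_full=True  -> full-attention branch: all positions 0..i
    use_full=False -> sliding-window branch: the last `window` positions up to i
    """
    start = 0 if use_full else max(0, i - window + 1)
    return list(range(start, i + 1))


def routed_pattern(route_full, window):
    """Build the attention pattern for one layer.

    route_full[i] is True when token i is sent to the full-attention branch.
    In this sketch the decisions are given, not produced by a learned router.
    """
    return [visible_keys(i, full, window) for i, full in enumerate(route_full)]


def attended_pairs(pattern):
    """Number of (query, key) pairs, a rough proxy for attention cost."""
    return sum(len(row) for row in pattern)


# Hypothetical routing for a 6-token sequence: tokens 2 and 5 go to full attention.
route_full = [False, False, True, False, False, True]
pattern = routed_pattern(route_full, window=2)

all_full = routed_pattern([True] * 6, window=2)
all_window = routed_pattern([False] * 6, window=2)
print(attended_pairs(all_window), attended_pairs(pattern), attended_pairs(all_full))
```

In this toy setting the mixed routing falls between the two extremes in cost, which is the trade-off the regularization objective in the abstract is meant to steer.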
LG Mar 7, 2024
A Survey of Graph Neural Networks in Real world: Imbalance, Noise, Privacy and OOD Challenges
Wei Ju, Siyu Yi, Yifan Wang et al.
Graph-structured data exhibits universality and widespread applicability across diverse domains, such as social network analysis, biochemistry, financial fraud detection, and network security. Significant strides have been made in leveraging Graph Neural Networks (GNNs) to achieve remarkable success in these areas. However, in real-world scenarios, the training environment for models is often far from ideal, leading to substantial performance degradation of GNN models due to various unfavorable factors, including imbalance in data distribution, the presence of noise in erroneous data, privacy protection of sensitive information, and generalization capability for out-of-distribution (OOD) scenarios. To tackle these issues, substantial efforts have been devoted to improving the performance of GNN models in practical real-world scenarios, as well as enhancing their reliability and robustness. In this paper, we present a comprehensive survey that systematically reviews existing GNN models, focusing on solutions to the four mentioned real-world challenges including imbalance, noise, privacy, and OOD in practical scenarios that many existing reviews have not considered. Specifically, we first highlight the four key challenges faced by existing GNNs, paving the way for our exploration of real-world GNN models. Subsequently, we provide detailed discussions on these four aspects, dissecting how these solutions contribute to enhancing the reliability and robustness of GNN models. Last but not least, we outline promising directions and offer future perspectives in the field.
IR Dec 19, 2024
DisCo: Graph-Based Disentangled Contrastive Learning for Cold-Start Cross-Domain Recommendation
Hourun Li, Yifan Wang, Zhiping Xiao et al.
Recommender systems are widely used in various real-world applications, but they often encounter the persistent challenge of the user cold-start problem. Cross-domain recommendation (CDR), which leverages user interactions from one domain to improve prediction performance in another, has emerged as a promising solution. However, users with similar preferences in the source domain may exhibit different interests in the target domain. Therefore, directly transferring embeddings may introduce irrelevant source-domain collaborative information. In this paper, we propose a novel graph-based disentangled contrastive learning framework to capture fine-grained user intent and filter out irrelevant collaborative information, thereby avoiding negative transfer. Specifically, for each domain, we use a multi-channel graph encoder to capture diverse user intents. We then construct the affinity graph in the embedding space and perform multi-step random walks to capture high-order user similarity relationships. Treating one domain as the target, we propose a disentangled intent-wise contrastive learning approach, guided by user similarity, to refine the bridging of user intents across domains. Extensive experiments on four benchmark CDR datasets demonstrate that DisCo consistently outperforms existing state-of-the-art baselines, thereby validating the effectiveness of both DisCo and its components.
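The affinity-graph and multi-step random-walk step mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the abstract does not say how the affinity graph is built, so cosine similarity with top-k neighbors is assumed here, and the function names and toy embeddings are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def affinity_transition(embeddings, top_k):
    """Row-stochastic transition matrix over a top-k cosine affinity graph.

    Assumption: each user keeps edges to its top_k most similar users,
    with negative similarities clipped to zero.
    """
    n = len(embeddings)
    matrix = []
    for i in range(n):
        sims = [(max(cosine(embeddings[i], embeddings[j]), 0.0), j)
                for j in range(n) if j != i]
        sims.sort(reverse=True)
        row = [0.0] * n
        for s, j in sims[:top_k]:
            row[j] = s
        total = sum(row)
        if total == 0.0:
            # Isolated user: fall back to a self-loop so the row stays stochastic.
            row[i] = 1.0
            total = 1.0
        matrix.append([x / total for x in row])
    return matrix


def matmul(a, b):
    """Plain square-matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def random_walk(transition, steps):
    """Probability of reaching each user after `steps` random-walk steps."""
    result = transition
    for _ in range(steps - 1):
        result = matmul(result, transition)
    return result


# Toy embeddings: users 0-2 share one preference direction, users 3-4 another.
embeddings = [[1.0, 0.1], [0.9, 0.2], [1.0, 0.0], [0.1, 1.0], [0.0, 0.9]]
transition = affinity_transition(embeddings, top_k=2)
walk = random_walk(transition, steps=2)
print([round(p, 3) for p in walk[0]])
```

The multi-step matrix assigns non-zero similarity to users who are connected only through intermediate neighbors, which is one way to read "high-order user similarity relationships".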
LG Jan 1, 2024
Graph Neural Networks in Intelligent Transportation Systems: Advances, Applications and Trends
Hourun Li, Yusheng Zhao, Zhengyang Mao et al.
Intelligent Transportation Systems (ITS) are crucial for alleviating traffic congestion, reducing accidents, optimizing urban planning, and more. However, the complexity of traffic networks has rendered traditional machine learning and statistical methods less effective. With the advent of artificial intelligence, deep learning frameworks have achieved remarkable progress across various fields and are now considered highly effective in many areas. Since 2019, Graph Neural Networks (GNNs) have emerged as a particularly promising deep learning approach within the ITS domain, owing to their robust ability to model graph-structured data and address complex problems. Consequently, applications of GNNs in transportation have attracted increasing scholarly attention and have demonstrated excellent performance. Nevertheless, current research predominantly focuses on traffic forecasting, with other ITS domains, such as autonomous vehicles and demand prediction, receiving less attention. This paper reviews the applications of GNNs across six representative and emerging ITS research areas: traffic forecasting, vehicle control systems, traffic signal control, transportation safety, demand prediction, and parking management. We have examined a wide range of graph-related studies from 2018 to 2023, summarizing their methodologies, features, and contributions in detailed tables and lists. Additionally, we identify the challenges of applying GNNs in ITS and propose potential future research directions.
CL May 13, 2025
ALOHA: Empowering Multilingual Agent for University Orientation with Hierarchical Retrieval
Mingxu Tao, Bowen Tang, Mingxuan Ma et al.
The rise of Large Language Models (LLMs) has revolutionized information retrieval, allowing users to obtain the answers they need through complex instructions within conversations. However, publicly available services remain inadequate for faculty and students searching for campus-specific information. This is primarily due to the LLMs' lack of domain-specific knowledge and the limitations of search engines in supporting multilingual and time-sensitive scenarios. To tackle these challenges, we introduce ALOHA, a multilingual agent enhanced by hierarchical retrieval for university orientation. We also integrate external APIs into the front-end interface to provide interactive service. The human evaluation and case study show that our proposed system is capable of producing correct, timely, and user-friendly responses to queries in multiple languages, surpassing commercial chatbots and search engines. The system has been deployed and has provided service for more than 12,000 people.