Guilong Li

AI
3 papers
23 citations
Novelty: 53%
AI Score: 41

3 Papers

27.5 CL May 28
Reasoning-preserved Efficient Distillation of Large Language Models via Activation-aware Initialization

Junlin He, Yihong Tang, Tong Nie et al.

Efficient Distillation (EDistill) compresses large language models (LLMs) by structurally pruning parameters and tuning lightweight modules with high training efficiency. Although these EDistilled LLMs achieve state-of-the-art (SOTA) performance on general ability benchmarks relative to similarly sized LLMs, we identify a severe degradation in their multi-step reasoning ability, which we term reasoning collapse. We systematically analyze the geometric origins of reasoning collapse and show that the SOTA EDistill method based on width-reducing projection matrices suffers from eRank collapse, in which the effective rank (eRank) of hidden representations drops. We theoretically explain how singular values of randomly initialized projection matrices become unevenly distributed, leading to eRank collapse and thus token indistinguishability. To address this issue, we propose RED (Reasoning-preserved Efficient Distillation) for LLMs, which introduces activation-aware initialization to initialize projection matrices as channel-selection matrices, thus theoretically mitigating eRank collapse. Experiments on the Llama and Qwen series demonstrate that RED substantially recovers reasoning while maintaining high training efficiency and SOTA general ability.
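To make the eRank idea in the abstract above concrete, here is a minimal sketch. It assumes the common definition of effective rank as the exponential of the Shannon entropy of the normalized singular values; the abstract does not state which definition the paper uses. The helper `channel_selection_matrix` and its scoring rule (mean absolute activation per channel) are hypothetical illustrations, not the authors' RED procedure.

```python
import math


def effective_rank(singular_values):
    """Effective rank, assumed here to be exp(Shannon entropy) of the
    singular values after normalizing them to sum to 1."""
    positive = [s for s in singular_values if s > 0]
    total = sum(positive)
    if total == 0:
        return 0.0
    probs = [s / total for s in positive]
    entropy = -sum(p * math.log(p) for p in probs)
    return math.exp(entropy)


def channel_selection_matrix(activations, k):
    """Hypothetical helper: build a d x k 0/1 matrix that keeps the k channels
    with the largest mean absolute activation (an assumed scoring rule)."""
    d = len(activations[0])
    n = len(activations)
    scores = [sum(abs(row[j]) for row in activations) / n for j in range(d)]
    top = sorted(range(d), key=lambda j: -scores[j])[:k]
    keep = sorted(top)  # preserve the original channel order
    return [[1.0 if keep[c] == j else 0.0 for c in range(k)] for j in range(d)]


# Evenly spread singular values give an effective rank equal to their count.
flat = effective_rank([1.0, 1.0, 1.0, 1.0])
# One dominant singular value pulls the effective rank toward 1.
skewed = effective_rank([10.0, 0.1, 0.1, 0.1])

# Toy activations: 2 tokens x 4 channels; channels 1 and 3 are the strongest.
toy_activations = [[0.1, 2.0, 0.0, -3.0], [0.2, -1.0, 0.1, 4.0]]
selection = channel_selection_matrix(toy_activations, k=2)
```

The columns of a selection matrix are distinct standard basis vectors, so all of its nonzero singular values equal 1 and its own spectrum is perfectly even. That matches the abstract's contrast with the uneven singular values of randomly initialized projections.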

LG Aug 30, 2024
Joint Estimation and Prediction of City-wide Delivery Demand: A Large Language Model Empowered Graph-based Learning Approach

Tong Nie, Junlin He, Yuewen Mei et al.

The proliferation of e-commerce and urbanization has significantly intensified delivery operations in urban areas, boosting the volume and complexity of delivery demand. Data-driven predictive methods, especially those utilizing machine learning techniques, have emerged to handle these complexities in urban delivery demand management problems. One particularly pressing issue that has yet to be sufficiently addressed is the joint estimation and prediction of city-wide delivery demand, as well as the generalization of the model to new cities. To this end, we formulate this problem as a transferable graph-based spatiotemporal learning task. First, an individual-collective message-passing neural network model is formalized to capture the interaction between demand patterns of associated regions. Second, by exploiting recent advances in large language models (LLMs), we extract general geospatial knowledge encodings from the unstructured locational data using the embedding generated by LLMs. Last, to encourage the cross-city generalization of the model, we integrate the encoding into the demand predictor in a transferable way. Comprehensive empirical evaluation results on two real-world delivery datasets, including eight cities in China and the US, demonstrate that our model significantly outperforms state-of-the-art baselines in accuracy, efficiency, and transferability.

AI Jan 30, 2022
Potential destination discovery for low predictability individuals based on knowledge graph

Guilong Li, Yixian Chen, Qionghua Liao et al.

Travelers may travel to locations they have never visited, which we call their potential destinations. Especially under very limited observation, travelers tend to show random movement patterns and usually have a large number of potential destinations, which makes them difficult to handle in mobility prediction (e.g., destination prediction). In this paper, we develop a new knowledge graph-based framework (PDPFKG) for potential destination discovery of low-predictability travelers by considering trip association relationships between them. We first construct a trip knowledge graph (TKG) to model the trip scenario by entities (e.g., travelers, destinations, and time information) and their relationships, in which we introduce the concept of private relationship for complexity reduction. Then a modified knowledge graph embedding algorithm is implemented to optimize the overall graph representation. Based on the trip knowledge graph embedding model (TKGEM), a ranking of the unobserved destinations an individual may choose in the future can be obtained by calculating triples' distances. Empirically, PDPFKG is tested using an anonymous vehicular dataset from 138 intersections equipped with video-based vehicle detection systems in Xuancheng city, China. The results show that (i) the proposed method significantly outperforms baseline methods, and (ii) the results show strong consistency with traveler behavior in choosing potential destinations. Finally, we provide a comprehensive discussion of the innovative points of the methodology.
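The ranking-by-triple-distance step mentioned in the abstract above can be sketched as follows. This is a minimal illustration that assumes a translation-based (TransE-style) distance, ||h + r - t||. The paper describes a modified embedding algorithm whose details are not given here, so the distance function, the embeddings, and all names below are hypothetical.

```python
import math


def triple_distance(head, relation, tail):
    """L2 distance ||h + r - t|| of a (head, relation, tail) triple,
    as used by translation-based embedding models (assumed here)."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def rank_unvisited(traveler, relation, destinations, visited):
    """Rank destinations the traveler has not visited, closest triple first."""
    candidates = {name: vec for name, vec in destinations.items() if name not in visited}
    return sorted(
        candidates,
        key=lambda name: triple_distance(traveler, relation, candidates[name]),
    )


# Toy 2-dimensional embeddings, invented purely for illustration.
traveler = [0.0, 0.0]
chooses = [1.0, 0.0]  # hypothetical "chooses destination" relation
destinations = {
    "A": [1.0, 0.1],
    "B": [0.0, 2.0],
    "C": [1.5, 0.0],
    "D": [1.0, 0.0],  # already visited, so excluded from the ranking
}

ranking = rank_unvisited(traveler, chooses, destinations, visited={"D"})
```

Here a smaller distance means the triple is more plausible under the embedding, so the first entries of `ranking` are the most likely potential destinations.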