RO · Jun 2
BotDirector: Robot Storytelling Across the Symmetrical Reality with Multi-modal Interactions
Zhe Sun, Meng Wang, Lei Wang et al.
Robot storytelling offers a unique blend of technological innovation and creative expression that engages children in unprecedented ways. However, the technical aspects are often too complicated for children. We propose an interactive system that facilitates robot storytelling with tangible and natural language interactions. Children arrange the playground with their own stuff and create narratives with an LLM agent. The created narratives are transformed into a motion sequence based on the map and characters, and the motions are executed by self-navigating swarm robots. This system enhances robot storytelling with flexible scenarios, enabling young children to create robot dramas with everyday objects.
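The narrative-to-motion step described above can be illustrated with a minimal sketch. The abstract does not say which planner BotDirector uses, so this assumes the playground is discretized into an occupancy grid and that each story beat, a (character, destination) pair, becomes a breadth-first-search path for that character's robot. `GRID`, `LANDMARKS`, `shortest_path`, and `story_to_motions` are hypothetical names, and the sketch ignores collisions between robots.

```python
from collections import deque

# Hypothetical playground map built from the children's arrangement:
# '.' is open floor, '#' is an everyday object the robots must avoid.
GRID = [
    "....#",
    ".##.#",
    ".....",
]
# Hypothetical named landmarks that the story can refer to.
LANDMARKS = {"castle": (0, 0), "forest": (2, 4)}


def shortest_path(grid, start, goal):
    """Breadth-first search over 4-connected free cells; returns a cell list or None."""
    rows, cols = len(grid), len(grid[0])
    prev = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            break
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#" and nxt not in prev:
                prev[nxt] = cell
                queue.append(nxt)
    if goal not in prev:
        return None
    # Walk the predecessor links back from the goal, then reverse.
    path, node = [], goal
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]


def story_to_motions(events, positions, grid=GRID, landmarks=LANDMARKS):
    """Turn (character, destination) story beats into per-character waypoint lists.

    `positions` maps each character's robot to its current cell and is updated in place.
    """
    motions = []
    for character, destination in events:
        path = shortest_path(grid, positions[character], landmarks[destination])
        if path is None:
            raise ValueError(f"{destination!r} is unreachable for {character!r}")
        motions.append((character, path))
        positions[character] = path[-1]
    return motions
```

Because BFS expands cells in order of distance, each returned path is a shortest 4-connected route; a real swarm would also need to schedule robots so that their paths do not overlap in time.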
CR · Oct 23, 2023
B^2SFL: A Bi-level Blockchained Architecture for Secure Federated Learning-based Traffic Prediction
Hao Guo, Collin Meese, Wanxin Li et al.
Federated Learning (FL) is a privacy-preserving machine learning (ML) technology that enables collaborative training of a global ML model by aggregating distributed local model updates. However, security and privacy guarantees can be compromised by malicious participants and by the centralized FL server. This article proposes a bi-level blockchained architecture for secure federated learning-based traffic prediction. The bottom-layer and top-layer blockchains store the local models and the global aggregated parameters, respectively, and the distributed homomorphic-encrypted federated averaging (DHFA) scheme addresses the secure computation problem. We propose a partial private key distribution protocol and a partially homomorphic encryption/decryption scheme to achieve a distributed, privacy-preserving federated averaging model. We conduct extensive experiments to measure the running time of DHFA operations, quantify the read and write performance of the blockchain network, and elucidate the impacts of varying regional group sizes and model complexities on prediction accuracy for the online traffic flow prediction task. The results indicate that the proposed system can facilitate secure and decentralized federated learning for real-world traffic prediction tasks.
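The core idea behind homomorphic-encrypted federated averaging can be sketched with an additively homomorphic cryptosystem: clients encrypt their updates, the aggregator multiplies ciphertexts (which adds the underlying plaintexts), and only the aggregate is ever decrypted. This is a toy illustration under stated assumptions, not DHFA itself. It uses textbook Paillier with tiny hard-coded primes and a single private key, whereas the abstract describes a partial private key distribution protocol that this sketch does not reproduce. All names and the fixed-point `SCALE` are hypothetical.

```python
import math
import secrets

# Toy parameters: two small primes. Real deployments use primes of 1024+ bits.
P, Q = 1_000_003, 1_000_033
N = P * Q
N2 = N * N
LAM = math.lcm(P - 1, Q - 1)  # Carmichael lambda(n); math.lcm needs Python 3.9+
MU = pow(LAM, -1, N)          # valid because we use the generator g = n + 1
SCALE = 10_000                # fixed-point scale for real-valued weights


def encrypt(m: int) -> int:
    """Paillier encryption with g = n + 1: c = (1 + m*n) * r^n mod n^2."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return ((1 + m * N) % N2) * pow(r, N, N2) % N2


def decrypt(c: int) -> int:
    """m = L(c^lambda mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    return ((pow(c, LAM, N2) - 1) // N) * MU % N


def encode(x: float) -> int:
    # Map a signed real to Z_n; negative values wrap around modulo n.
    return round(x * SCALE) % N


def decode(m: int) -> float:
    # Residues above n/2 are interpreted as negative numbers.
    if m > N // 2:
        m -= N
    return m / SCALE


def encrypted_average(client_updates):
    """Average per-parameter updates while the aggregator only touches ciphertexts."""
    k = len(client_updates)
    dim = len(client_updates[0])
    # Each client encrypts its own update vector element-wise.
    ciphertexts = [[encrypt(encode(w)) for w in update] for update in client_updates]
    # Aggregator: multiplying ciphertexts mod n^2 adds the plaintexts.
    summed = []
    for j in range(dim):
        acc = 1
        for ct in ciphertexts:
            acc = acc * ct[j] % N2
        summed.append(acc)
    # Key holder decrypts only the aggregate, then divides by the client count.
    return [decode(decrypt(c)) / k for c in summed]
```

For example, `encrypted_average([[0.5, -1.2], [1.5, 0.2], [1.0, 0.4]])` yields approximately `[1.0, -0.2]`, while no individual client's update is decrypted along the way.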
CL · Mar 9, 2024
KG-Rank: Enhancing Large Language Models for Medical QA with Knowledge Graphs and Ranking Techniques
Rui Yang, Haoran Liu, Edison Marrese-Taylor et al.
Large language models (LLMs) have demonstrated impressive generative capabilities with the potential to innovate in medicine. However, the application of LLMs in real clinical settings remains challenging due to the lack of factual consistency in the generated content. In this work, we develop an augmented LLM framework, KG-Rank, which leverages a medical knowledge graph (KG) along with ranking and re-ranking techniques, to improve the factuality of long-form question answering (QA) in the medical domain. Specifically, when receiving a question, KG-Rank automatically identifies medical entities within the question and retrieves the related triples from the medical KG to gather factual information. Subsequently, KG-Rank innovatively applies multiple ranking techniques to refine the ordering of these triples, providing more relevant and precise information for LLM inference. To the best of our knowledge, KG-Rank is the first application of KG combined with ranking models in medical QA specifically for generating long answers. Evaluation on four selected medical QA datasets demonstrates that KG-Rank achieves an improvement of over 18% in ROUGE-L score. Additionally, we extend KG-Rank to open domains, including law, business, music, and history, where it realizes a 14% improvement in ROUGE-L score, indicating the effectiveness and great potential of KG-Rank.
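The retrieve-then-rank step can be sketched as follows. KG-Rank applies ranking and re-ranking models whose details the abstract does not give; this sketch substitutes a simple bag-of-words cosine similarity between the question and each triple. The triples, `rank_triples`, and `build_prompt` are hypothetical, and the prompt template is an assumption.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    # Lowercased alphanumeric tokens with their counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_triples(question, triples, top_k=3):
    """Order (head, relation, tail) triples by similarity to the question."""
    q = bag_of_words(question)
    scored = [(cosine(q, bag_of_words(" ".join(t))), t) for t in triples]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [t for _, t in scored[:top_k]]


def build_prompt(question, triples):
    # Prepend the ranked facts so the LLM answers with them in context.
    facts = "\n".join(f"- {h} {r} {t}" for h, r, t in triples)
    return f"Relevant medical facts:\n{facts}\n\nQuestion: {question}\nAnswer:"
```

For the question "What are the side effects of metformin?", a hypothetical triple `("metformin", "has side effect", "lactic acidosis")` outranks `("metformin", "treats", "type 2 diabetes")` because it shares more tokens with the question; a learned re-ranker would replace `cosine` in practice.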
CL · Mar 9
Scalable Identification and Prioritization of Requisition-Specific Personal Competencies Using Large Language Models
Wanxin Li, Denver McNeney, Nivedita Prabhu et al.
AI-powered recruitment tools are increasingly adopted in personnel selection, yet they struggle to capture the requisition (req)-specific personal competencies (PCs) that distinguish successful candidates beyond job categories. We propose a large language model (LLM)-based approach to identify and prioritize req-specific PCs from reqs. Our approach integrates dynamic few-shot prompting, reflection-based self-improvement, similarity-based filtering, and multi-stage validation. Applied to a dataset of Program Manager reqs, our approach correctly identifies the highest-priority req-specific PCs with an average accuracy of 0.76, approaching human expert inter-rater reliability, and maintains a low out-of-scope rate of 0.07.
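As one possible reading of the similarity-based filtering step, the sketch below drops candidate competencies that are near-duplicates of ones already kept, assuming candidates arrive in priority order so that the higher-priority phrasing survives. The abstract does not name the similarity measure; `difflib.SequenceMatcher.ratio()` and the 0.8 threshold are assumptions, and `filter_competencies` is a hypothetical name.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    # Case-insensitive character-level similarity in [0, 1].
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def filter_competencies(candidates, threshold=0.8):
    """Keep each candidate only if it is not too similar to any already-kept one."""
    kept = []
    for cand in candidates:
        if all(similarity(cand, k) < threshold for k in kept):
            kept.append(cand)
    return kept
```

For instance, "stakeholder communications" is dropped after "Stakeholder communication" (ratio about 0.98), while "Risk management" is kept. An embedding-based similarity would catch paraphrases that character matching misses.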
LG · Oct 5, 2025
Wasserstein projection distance for fairness testing of regression models
Wanxin Li, Yongjin P. Park, Khanh Dao Duc
Fairness in machine learning is a critical concern, yet most research has focused on classification tasks, leaving regression models underexplored. This paper introduces a Wasserstein projection-based framework for fairness testing in regression models, focusing on expectation-based criteria. We propose a hypothesis-testing approach and an optimal data perturbation method to improve fairness while balancing accuracy. Theoretical results include a detailed categorization of fairness criteria for regression, a dual reformulation of the Wasserstein projection test statistic, and the derivation of asymptotic bounds and limiting distributions. Experiments on synthetic and real-world datasets demonstrate that the proposed method offers higher specificity compared to permutation-based tests, and effectively detects and mitigates biases in real applications such as student performance and housing price prediction.
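For context on the baseline the abstract compares against, here is a minimal permutation test for one expectation-based criterion, assumed here to be equal mean prediction across two groups. It is not the proposed Wasserstein projection test; `mean_gap`, `permutation_test`, and the 0/1 group encoding are hypothetical.

```python
import random
from statistics import mean


def mean_gap(preds, groups):
    """Absolute difference in mean prediction between group 0 and group 1."""
    a = [p for p, g in zip(preds, groups) if g == 0]
    b = [p for p, g in zip(preds, groups) if g == 1]
    return abs(mean(a) - mean(b))


def permutation_test(preds, groups, n_perm=2000, seed=0):
    """Return (observed gap, Monte Carlo p-value) under random relabeling of groups."""
    rng = random.Random(seed)
    observed = mean_gap(preds, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between group label and prediction
        if mean_gap(preds, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

The `(hits + 1) / (n_perm + 1)` correction keeps the estimated p-value strictly positive, a common convention for Monte Carlo permutation tests.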
CL · May 24, 2025
The Pragmatic Mind of Machines: Tracing the Emergence of Pragmatic Competence in Large Language Models
Kefan Yu, Qingcheng Zeng, Weihao Xuan et al.
Current large language models (LLMs) have demonstrated emerging capabilities in social intelligence tasks, including implicature resolution and theory-of-mind reasoning, both of which require substantial pragmatic understanding. However, how LLMs acquire this pragmatic competence throughout the training process remains poorly understood. In this work, we introduce ALTPRAG, a dataset grounded in the pragmatic concept of alternatives, to evaluate whether LLMs at different training stages can accurately infer nuanced speaker intentions. Each instance pairs two equally plausible yet pragmatically divergent continuations and requires the model to (i) infer the speaker's intended meaning and (ii) explain when and why a speaker would choose one utterance over its alternative, thus directly probing pragmatic competence through contrastive reasoning. We systematically evaluate 22 LLMs across 3 key training stages: after pre-training, supervised fine-tuning (SFT), and preference optimization, to examine the development of pragmatic competence. Our results show that even base models exhibit notable sensitivity to pragmatic cues, which improves consistently with increases in model and data scale. Additionally, SFT and RLHF contribute further gains, particularly in cognitive-pragmatic scenarios. These findings highlight pragmatic competence as an emergent and compositional property of LLM training and offer new insights for aligning models with human communicative norms.
LG · May 31, 2023
Traffic Prediction using Artificial Intelligence: Review of Recent Advances and Emerging Opportunities
Maryam Shaygan, Collin Meese, Wanxin Li et al.
Traffic prediction plays a crucial role in alleviating traffic congestion, a critical problem globally that results in negative consequences such as lost hours of additional travel time and increased fuel consumption. Integrating emerging technologies into transportation systems offers opportunities to improve traffic prediction significantly and raises new research problems. In order to lay the foundation for understanding the open research challenges in traffic prediction, this survey aims to provide a comprehensive overview of traffic prediction methodologies. Specifically, we focus on the recent advances and emerging research opportunities in Artificial Intelligence (AI)-based traffic prediction methods, due to their recent success and potential in traffic prediction, with an emphasis on multivariate traffic time series modeling. We first provide a list and explanation of the various data types and resources used in the literature. Next, the essential data preprocessing methods within the traffic prediction context are categorized, and the prediction methods and applications are subsequently summarized. Lastly, we present primary research challenges in traffic prediction and discuss some directions for future research.
CR · Oct 27, 2020
Blockchain-enabled Identity Verification for Safe Ridesharing Leveraging Zero-Knowledge Proof
Wanxin Li, Collin Meese, Hao Guo et al.
The on-demand mobility market, including ridesharing, is becoming increasingly important, with e-hailing fares growing at a rate of approximately 130% per annum since 2013. By increasing the utilization of existing vehicles and empty seats, ridesharing can provide many benefits, including reduced traffic congestion and a smaller environmental impact from vehicle usage and production. However, the safety of riders and drivers has become a paramount concern, and a method for privacy-preserving identity verification between untrusted parties is essential for protecting users. To this end, we propose a novel privacy-preserving identity verification system, extending zero-knowledge proof (ZKP) and blockchain for use in ridesharing applications. We design a permissioned blockchain network to perform the ZKP verification of a driver's identity, which also acts as an immutable ledger to store ride logs and ZKP records. For the ZKP module, we design a protocol to facilitate user verification without requiring the exchange of any private information. We prototype the proposed system on the Hyperledger Fabric platform, with the Hyperledger Ursa cryptography library, and conduct extensive experimentation. To measure the prototype's performance, we use the Hyperledger Caliper benchmark tool, and the results show that our system is suitable for use in real-world ridesharing applications.
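To illustrate what verifying an identity "without requiring the exchange of any private information" can look like, the sketch below implements the classic Schnorr identification protocol, a standard honest-verifier zero-knowledge proof of knowledge of a discrete logarithm. The abstract does not specify the protocol built on Hyperledger Ursa, so Schnorr is a stand-in: the group parameters are deliberately tiny and insecure, and the permissioned blockchain that performs verification is not modeled. `keygen`, `Prover`, `verify`, and `run_protocol` are hypothetical names.

```python
import secrets

# Toy group: P = 2Q + 1 is a safe prime, and G = 4 generates the subgroup of prime order Q.
P, Q, G = 2039, 1019, 4


def keygen():
    """Driver's secret x and public key y = G^x mod P (y could be registered on-chain)."""
    x = secrets.randbelow(Q - 1) + 1  # secret in [1, Q-1]
    return x, pow(G, x, P)


class Prover:
    def __init__(self, secret):
        self.x = secret
        self.k = None

    def commit(self):
        # Fresh random nonce per run; reusing k would leak x.
        self.k = secrets.randbelow(Q)
        return pow(G, self.k, P)

    def respond(self, c):
        return (self.k + c * self.x) % Q


def verify(public, t, c, s):
    # Accept iff G^s == t * y^c (mod P); the secret x is never transmitted.
    return pow(G, s, P) == (t * pow(public, c, P)) % P


def run_protocol(prover, public):
    t = prover.commit()
    c = secrets.randbelow(Q - 1) + 1  # verifier's nonzero random challenge
    s = prover.respond(c)
    return verify(public, t, c, s)
```

An honest prover always passes because G^(k + c·x) = G^k · (G^x)^c; a prover using any other secret modulo Q fails for every nonzero challenge.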
CR · Feb 25, 2020
Attribute-based Multi-Signature and Encryption for EHR Management: A Blockchain-based Solution
Hao Guo, Wanxin Li, Ehsan Meamari et al.
The global Electronic Health Record (EHR) market is growing dramatically and had already reached $31.5 billion in 2018. To safeguard the security of EHR data and the privacy of patients, fine-grained information access and sharing mechanisms are essential for EHR management. This paper proposes a hybrid architecture of blockchain and edge nodes to facilitate EHR management. In this architecture, we use an attribute-based multi-signature (ABMS) scheme to authenticate users' signatures without revealing sensitive information, and a multi-authority attribute-based encryption (ABE) scheme to encrypt EHR data stored on the edge nodes. We develop the blockchain module on the Hyperledger Fabric platform and the ABMS module on the Hyperledger Ursa library. We measure the signing and verification time of the ABMS scheme under different settings, and experiment with authentication events and access activities that are logged as transactions in the blockchain.
NI · Jun 6, 2019
A Blockchain-Based Architecture for Traffic Signal Control Systems
Wanxin Li, Mark Nejad, Rui Zhang
The ever-growing incorporation of connected vehicle (CV) technologies into intelligent traffic signal control systems brings about significant data security issues in connected vehicular networks. This paper presents a novel decentralized and secure-by-design architecture for connected vehicle data security, based on the emerging blockchain paradigm. In a simulation study, we applied this architecture to defend the Intelligent Traffic Signal System (I-SIG), a USDOT-approved CV pilot program, against congestion attacks. The results show the performance of the proposed architecture for the traffic signal control system.
CR · Jun 4, 2019
Access Control for Electronic Health Records with Hybrid Blockchain-Edge Architecture
Hao Guo, Wanxin Li, Mark Nejad et al.
The global Electronic Health Record (EHR) market is growing dramatically and is expected to reach $39.7 billion by 2022. To safeguard the security and privacy of EHR, access control is an essential mechanism for managing EHR data. This paper proposes a hybrid architecture that facilitates access control of EHR data by using both blockchain and edge nodes. Within the architecture, a blockchain-based controller manages identity and access control policies and serves as a tamper-proof log of access events. In addition, off-chain edge nodes store the EHR data and apply policies specified in the Abbreviated Language For Authorization (ALFA) to enforce attribute-based access control on EHR data, in collaboration with the blockchain-based access control logs. We evaluate the proposed hybrid architecture using a Hyperledger Composer Fabric blockchain, measuring the performance of executing smart contracts and ACL policies in terms of transaction processing time and response time against unauthorized data retrieval.
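The interplay between attribute-based policy decisions and a tamper-proof access log can be sketched in a few lines. The paper expresses policies in ALFA on edge nodes and logs events through a Hyperledger Composer/Fabric blockchain; this sketch swaps in plain Python predicates and an in-memory SHA-256 hash chain, which is only tamper-evident and is not a replicated ledger. The attribute names, rules, and classes are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder hash before the first log entry


# Hypothetical attribute-based rules: each returns True if it permits the request.
def physician_reads_same_department(subject, resource, action):
    return (subject.get("role") == "physician" and action == "read"
            and subject.get("department") == resource.get("department"))


def patient_reads_own_record(subject, resource, action):
    return (subject.get("role") == "patient" and action == "read"
            and subject.get("id") == resource.get("owner"))


RULES = [physician_reads_same_department, patient_reads_own_record]


def decide(subject, resource, action):
    """Permit if any rule permits; otherwise deny by default."""
    return "Permit" if any(rule(subject, resource, action) for rule in RULES) else "Deny"


def _digest(prev_hash, event):
    # Canonical JSON so the same event always hashes the same way.
    body = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


@dataclass
class AccessLog:
    """In-memory hash chain: editing any past entry breaks verification."""
    entries: list = field(default_factory=list)

    def append(self, event):
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        self.entries.append({"event": event, "prev": prev, "hash": _digest(prev, event)})

    def verify(self):
        prev = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["event"]):
                return False
            prev = entry["hash"]
        return True


def request_access(log, subject, resource, action):
    """Evaluate the policy and record the decision, whether permitted or denied."""
    decision = decide(subject, resource, action)
    log.append({"subject": subject.get("id"), "resource": resource.get("id"),
                "action": action, "decision": decision})
    return decision
```

Logging denied requests as well as permitted ones mirrors the abstract's emphasis on responding to unauthorized data retrieval: a later audit can see every attempt, and rewriting a past decision invalidates every subsequent hash.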