Lan Luo

h-index20

6papers

679citations

Novelty35%

AI Score38

Ranked #109,497 of 201,326 authors (top 54%)#7,192 in AI (top 51%)

6 Papers

CLMar 14, 2022

Hyperlink-induced Pre-training for Passage Retrieval in Open-domain Question Answering

Jiawei Zhou, Xiaoguang Li, Lifeng Shang et al.

To alleviate the data scarcity problem in training question answering systems, recent works propose additional intermediate pre-training for dense passage retrieval (DPR). However, there still remains a large discrepancy between the provided upstream signals and the downstream question-passage relevance, which leads to less improvement. To bridge this gap, we propose the HyperLink-induced Pre-training (HLP), a method to pre-train the dense retriever with the text relevance induced by hyperlink-based topology within Web documents. We demonstrate that the hyperlink-based structures of dual-link and co-mention can provide effective relevance signals for large-scale pre-training that better facilitate downstream passage retrieval. We investigate the effectiveness of our approach across a wide range of open-domain QA datasets under zero-shot, few-shot, multi-hop, and out-of-domain scenarios. The experiments show our HLP outperforms the BM25 by up to 7 points as well as other pre-training methods by more than 10 points in terms of top-20 retrieval accuracy under the zero-shot scenario. Furthermore, HLP significantly outperforms other pre-training methods under the other scenarios.

AIMar 11

Emulating Clinician Cognition via Self-Evolving Deep Clinical Research

Ruiyang Ren, Yuhao Wang, Yunsen Liang et al.

Clinical diagnosis is a complex cognitive process, grounded in dynamic cue acquisition and continuous expertise accumulation. Yet most current artificial intelligence (AI) systems are misaligned with this reality, treating diagnosis as single-pass retrospective prediction while lacking auditable mechanisms for governed improvement. We developed DxEvolve, a self-evolving diagnostic agent that bridges these gaps through an interactive deep clinical research workflow. The framework autonomously requisitions examinations and continually externalizes clinical experience from increasing encounter exposure as diagnostic cognition primitives. On the MIMIC-CDM benchmark, DxEvolve improved diagnostic accuracy by 11.2% on average over backbone models and reached 90.4% on a reader-study subset, comparable to the clinician reference (88.8%). DxEvolve improved accuracy on an independent external cohort by 10.2% (categories covered by the source cohort) and 17.1% (uncovered categories) compared to the competitive method. By transforming experience into a governable learning asset, DxEvolve supports an accountable pathway for the continual evolution of clinical AI.

CVJan 10, 2024

Latency-aware Road Anomaly Segmentation in Videos: A Photorealistic Dataset and New Metrics

Beiwen Tian, Huan-ang Gao, Leiyao Cui et al.

In the past several years, road anomaly segmentation is actively explored in the academia and drawing growing attention in the industry. The rationale behind is straightforward: if the autonomous car can brake before hitting an anomalous object, safety is promoted. However, this rationale naturally calls for a temporally informed setting while existing methods and benchmarks are designed in an unrealistic frame-wise manner. To bridge this gap, we contribute the first video anomaly segmentation dataset for autonomous driving. Since placing various anomalous objects on busy roads and annotating them in every frame are dangerous and expensive, we resort to synthetic data. To improve the relevance of this synthetic dataset to real-world applications, we train a generative adversarial network conditioned on rendering G-buffers for photorealism enhancement. Our dataset consists of 120,000 high-resolution frames at a 60 FPS framerate, as recorded in 7 different towns. As an initial benchmarking, we provide baselines using latest supervised and unsupervised road anomaly segmentation methods. Apart from conventional ones, we focus on two new metrics: temporal consistency and latencyaware streaming accuracy. We believe the latter is valuable as it measures whether an anomaly segmentation algorithm can truly prevent a car from crashing in a temporally informed setting.

CVAug 10, 2021

DVM-CAR: A large-scale automotive dataset for visual marketing research and applications

Jingmin Huang, Bowei Chen, Lan Luo et al.

There is a growing interest in product aesthetics analytics and design. However, the lack of available large-scale data that covers various variables and information is one of the biggest challenges faced by analysts and researchers. In this paper, we present our multidisciplinary initiative of developing a comprehensive automotive dataset from different online sources and formats. Specifically, the created dataset contains 1.4 million images from 899 car models and their corresponding model specifications and sales information over more than ten years in the UK market. Our work makes significant contributions to: (i) research and applications in the automotive industry; (ii) big data creation and sharing; (iii) database design; and (iv) data fusion. Apart from our motivation, technical details and data structure, we further present three simple examples to demonstrate how our data can be used in business research and applications.

AISep 3, 2020

Learning to Infer User Hidden States for Online Sequential Advertising

Zhaoqing Peng, Junqi Jin, Lan Luo et al.

To drive purchase in online advertising, it is of the advertiser's great interest to optimize the sequential advertising strategy whose performance and interpretability are both important. The lack of interpretability in existing deep reinforcement learning methods makes it not easy to understand, diagnose and further optimize the strategy. In this paper, we propose our Deep Intents Sequential Advertising (DISA) method to address these issues. The key part of interpretability is to understand a consumer's purchase intent which is, however, unobservable (called hidden states). In this paper, we model this intention as a latent variable and formulate the problem as a Partially Observable Markov Decision Process (POMDP) where the underlying intents are inferred based on the observable behaviors. Large-scale industrial offline and online experiments demonstrate our method's superior performance over several baselines. The inferred hidden states are analyzed, and the results prove the rationality of our inference.

CRJul 12, 2020

On Runtime Software Security of TrustZone-M based IoT Devices

Lan Luo, Yue Zhang, Cliff C. Zou et al.

Internet of Things (IoT) devices have been increasingly integrated into our daily life. However, such smart devices suffer a broad attack surface. Particularly, attacks targeting the device software at runtime are challenging to defend against if IoT devices use resource-constrained microcontrollers (MCUs). TrustZone-M, a TrustZone extension for MCUs, is an emerging security technique fortifying MCU based IoT devices. This paper presents the first security analysis of potential software security issues in TrustZone-M enabled MCUs. We explore the stack-based buffer overflow (BOF) attack for code injection, return-oriented programming (ROP) attack, heap-based BOF attack, format string attack, and attacks against Non-secure Callable (NSC) functions in the context of TrustZone-M. We validate these attacks using the TrustZone-M enabled SAM L11 MCU. Strategies to mitigate these software attacks are also discussed.