CL · May 24, 2022
D4: a Chinese Dialogue Dataset for Depression-Diagnosis-Oriented Chat
Binwei Yao, Chao Shi, Likai Zou et al.
In a depression-diagnosis-directed clinical session, doctors initiate a conversation with ample emotional support that guides patients to disclose their symptoms according to clinical diagnostic criteria. Such a dialogue system differs from existing single-purpose human-machine dialogue systems because it combines task-oriented dialogue and chit-chat, with unique dialogue topics and procedures. However, due to the social stigma associated with mental illness, dialogue data related to depression consultation and diagnosis are rarely disclosed. Based on the clinical depression diagnostic criteria ICD-11 and DSM-5, we designed a three-phase procedure to construct D$^4$: a Chinese Dialogue Dataset for Depression-Diagnosis-Oriented Chat, which simulates the dialogue between doctors and patients during the diagnosis of depression and includes, for each conversation, a diagnosis result and symptom summary given by professional psychiatrists. On the newly constructed dataset, we establish four tasks mirroring the depression diagnosis process: response generation, topic prediction, dialogue summary, and severity classification of depressive episodes and suicide risk. Multi-scale evaluation results demonstrate that a consultation dialogue system trained on our dataset can be more empathy-driven and more diagnostically accurate than rule-based bots.
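To make the response-generation task concrete, here is a minimal sketch of one common way to turn a multi-turn consultation into (context, response) training pairs, where a model learns to produce the doctor's next turn from the preceding dialogue history. The turn format (`speaker`/`text` keys), the function name `response_generation_pairs`, the `max_history` window, and the English example turns are all hypothetical; they are not D$^4$'s actual schema, and the real dataset is in Chinese.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical turn format (not the dataset's real schema):
# {"speaker": "doctor" | "patient", "text": "..."}
Turn = Dict[str, str]


def response_generation_pairs(
    dialogue: List[Turn],
    target_speaker: str = "doctor",
    max_history: Optional[int] = None,
) -> List[Tuple[str, str]]:
    """Turn one session into (context, response) pairs, one per target-speaker turn."""
    pairs: List[Tuple[str, str]] = []
    for idx, turn in enumerate(dialogue):
        # Skip turns by the other speaker and turns with no preceding history.
        if turn["speaker"] != target_speaker or idx == 0:
            continue
        # Optionally keep only the last `max_history` turns as context.
        start = 0 if max_history is None else max(0, idx - max_history)
        context = "\n".join(f"{t['speaker']}: {t['text']}" for t in dialogue[start:idx])
        pairs.append((context, turn["text"]))
    return pairs


session = [
    {"speaker": "doctor", "text": "How have you been sleeping lately?"},
    {"speaker": "patient", "text": "Badly, I wake up at 4 a.m. every day."},
    {"speaker": "doctor", "text": "How long has this been going on?"},
    {"speaker": "patient", "text": "About two weeks."},
]
for context, response in response_generation_pairs(session):
    print(repr(context), "->", repr(response))
```

The opening doctor turn is skipped because it has no history to condition on; whether the real task setup includes such turns is not stated in the abstract.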
LG · Aug 22, 2023
Federated Learning in Big Model Era: Domain-Specific Multimodal Large Models
Zengxiang Li, Zhaoxiang Hou, Hui Liu et al.
Multimodal data, which allow models to comprehensively perceive and recognize the physical world, have become an essential path toward general artificial intelligence. However, multimodal large models trained on public datasets often underperform in specific industrial domains. This paper proposes a multimodal federated learning framework that enables multiple enterprises to use private domain data to collaboratively train large models for vertical domains, providing intelligent services across scenarios. The authors discuss in depth the strategic transformation of federated learning in the era of big models, in terms of its intelligence foundation and objectives, as well as the new challenges it faces in heterogeneous data, model aggregation, the performance-cost trade-off, data privacy, and incentive mechanisms. The paper presents a case study in which leading enterprises contribute multimodal data and expert knowledge to city safety operation management, covering distributed deployment and efficient coordination of the federated learning platform, technical innovations that improve data quality using large-model capabilities, and efficient joint fine-tuning approaches. Preliminary experiments show that enterprises can enhance and accumulate intelligent capabilities through multimodal federated learning, jointly creating a smart city model that provides high-quality intelligent services covering energy infrastructure safety, residential community security, and urban operation management. The resulting federated learning cooperation ecosystem is expected to further aggregate industry, academia, and research resources, realize large models in multiple vertical domains, and promote both the large-scale industrial application of artificial intelligence and cutting-edge research on multimodal federated learning.
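Model aggregation is one of the challenges listed above. As a point of reference only, the sketch below shows the classic FedAvg rule, in which the server averages client parameters weighted by each client's number of training samples. The abstract does not say which aggregation rule the framework uses; `fedavg`, the flattened parameter lists, and the sample counts are illustrative assumptions. For a large model, federated fine-tuning often aggregates only the fine-tuned parameters (for example, adapter weights) rather than the full model.

```python
from typing import List, Sequence


def fedavg(client_params: Sequence[Sequence[float]], client_sizes: Sequence[int]) -> List[float]:
    """Average flattened client parameter vectors, weighting each client by its sample count."""
    total = sum(client_sizes)
    if total <= 0 or len(client_params) != len(client_sizes):
        raise ValueError("need one positive sample count per client")
    dim = len(client_params[0])
    global_params = [0.0] * dim
    for params, n in zip(client_params, client_sizes):
        if len(params) != dim:
            raise ValueError("all clients must share the same parameter shape")
        weight = n / total  # this client's share of all training samples
        for i, value in enumerate(params):
            global_params[i] += weight * value
    return global_params


# Two hypothetical enterprises: the second holds three times as much private data.
print(fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))  # [2.5, 3.5]
```

Only parameters leave each enterprise, never raw data; heterogeneous data and incentive mechanisms, which the abstract also raises, are outside what this rule addresses.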
CV · Apr 13
MMR-AD: A Large-Scale Multimodal Dataset for Benchmarking General Anomaly Detection with Multimodal Large Language Models
Xincheng Yao, Zefeng Qian, Chao Shi et al.
In industrial anomaly detection (AD), general anomaly detection (GAD) is an emerging trend and the ultimate goal. Unlike conventional single- and multi-class AD, general AD aims to train a single model that can directly detect anomalies in diverse novel classes without any retraining or fine-tuning on the target data. Recently, Multimodal Large Language Models (MLLMs) have shown great promise for general anomaly detection owing to their strong visual understanding and language reasoning capabilities. However, the general AD ability of MLLMs remains underexplored for two reasons. (1) MLLMs are pretrained on large amounts of web-sourced data, which still differ significantly from data in AD scenarios; moreover, the image-text pairs used in pretraining are not designed for AD tasks. (2) Current mainstream AD datasets are image-based and not yet suitable for post-training MLLMs. To facilitate MLLM-based general AD research, we present MMR-AD, a comprehensive benchmark for both training and evaluating MLLM-based AD models. With MMR-AD, we show that the AD performance of current state-of-the-art generalist MLLMs still falls far short of industrial requirements. Based on MMR-AD, we also propose a baseline model, Anomaly-R1, a reasoning-based AD model that learns from the chain-of-thought (CoT) data in MMR-AD and is further enhanced by reinforcement learning. Extensive experiments show that Anomaly-R1 achieves remarkable improvements over generalist MLLMs in both anomaly detection and localization.
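The abstract says Anomaly-R1 is enhanced by reinforcement learning but gives neither the reward nor the RL algorithm. R1-style post-training commonly pairs a rule-based, verifiable reward with group-relative advantages (as in GRPO), so the sketch below assumes that recipe purely for illustration. The `<answer>yes|no</answer>` output format, the binary image-level label, and the names `rule_reward` and `group_relative_advantages` are hypothetical, not Anomaly-R1's actual design.

```python
import re
from statistics import mean, pstdev
from typing import List

# Hypothetical output format: free-form reasoning followed by <answer>yes|no</answer>.
ANSWER_RE = re.compile(r"<answer>\s*(yes|no)\s*</answer>", re.IGNORECASE)


def rule_reward(completion: str, is_anomalous: bool) -> float:
    """1.0 if the parsed yes/no answer matches the ground-truth label, else 0.0."""
    match = ANSWER_RE.search(completion)
    if match is None:
        return 0.0  # unparsable output earns no reward
    predicted_anomalous = match.group(1).lower() == "yes"
    return 1.0 if predicted_anomalous == is_anomalous else 0.0


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: standardize rewards within one group of sampled completions."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled completions for one (hypothetical) defective image.
completions = [
    "<think>scratch near the edge</think><answer>yes</answer>",
    "<think>surface looks uniform</think><answer>no</answer>",
    "<answer>Yes</answer>",
    "no answer tag at all",
]
rewards = [rule_reward(c, is_anomalous=True) for c in completions]
print(rewards)  # [1.0, 0.0, 1.0, 0.0]
print([round(a, 3) for a in group_relative_advantages(rewards)])  # [1.0, -1.0, 1.0, -1.0]
```

Completions that answer correctly get positive advantages and are reinforced; if every completion in a group earns the same reward, all advantages are zero and the group contributes no learning signal. A localization reward, which the abstract's evaluation would suggest, is not modeled here.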
CV · Sep 28, 2025
ResAD++: Towards Class Agnostic Anomaly Detection via Residual Feature Learning
Xincheng Yao, Chao Shi, Muming Zhao et al.
This paper explores class-agnostic anomaly detection (AD), where the objective is to train one class-agnostic AD model that can generalize to detect anomalies in diverse new classes from different domains without any retraining or fine-tuning on the target data. When applied to new classes, current single- and multi-class AD methods still perform unsatisfactorily. One fundamental reason is that representation learning in existing methods remains class-related, a problem we call feature correlation. To address this issue, we propose residual features and construct a simple but effective framework, termed ResAD. Our core insight is to learn the distribution of residual features rather than that of the initial features. Residual features are formed by matching each feature to normal reference features and then subtracting the matched reference. In this way, we can effectively decorrelate features: even in new classes, the distribution of normal residual features does not shift significantly from the learned distribution. However, residual features still suffer from one issue: scale correlation. To this end, we propose a feature hypersphere constraining approach, which learns to constrain initial normal residual features into a spatial hypersphere so that the feature scales of different classes are as consistent as possible. Furthermore, we propose a novel log-barrier bidirectional contraction OCC loss and a feature distribution matching module based on vector quantization to enhance ResAD, yielding an improved version, ResAD++. Comprehensive experiments on eight real-world AD datasets demonstrate that ResAD++ achieves remarkable AD results when directly applied to new classes, outperforming state-of-the-art competing methods and also surpassing ResAD. The code is available at https://github.com/xcyao00/ResAD.
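The residual-feature idea can be sketched in a few lines: each query feature is matched to its nearest normal reference feature and the reference is subtracted, so normal features from very different classes all land near the origin. This is a minimal sketch assuming Euclidean nearest-neighbor matching on toy 2-D vectors; the function names are hypothetical, and the residual norm used as a score here is only a crude stand-in, since ResAD learns the distribution of normal residuals (the hypersphere constraint and, in ResAD++, the OCC loss and vector-quantization module are not shown).

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def nearest_reference(query: Vector, references: Sequence[Vector]) -> Vector:
    """Match a query feature to its closest normal reference feature (Euclidean distance)."""
    return min(references, key=lambda ref: math.dist(query, ref))


def residual_features(queries: Sequence[Vector], references: Sequence[Vector]) -> List[List[float]]:
    """Residual = query feature minus its matched normal reference feature."""
    residuals = []
    for q in queries:
        ref = nearest_reference(q, references)
        residuals.append([qi - ri for qi, ri in zip(q, ref)])
    return residuals


def residual_norm(residual: Vector) -> float:
    # Crude stand-in score; the paper instead learns the distribution of normal residuals.
    return math.hypot(*residual)


# Two hypothetical classes whose raw features live in very different regions.
refs_a = [[10.0, 10.0], [11.0, 10.0]]
refs_b = [[-5.0, 3.0], [-4.0, 3.0]]
normal_a = residual_features([[10.1, 9.9]], refs_a)[0]
normal_b = residual_features([[-4.9, 3.1]], refs_b)[0]
defect_a = residual_features([[13.0, 14.0]], refs_a)[0]
print(residual_norm(normal_a), residual_norm(normal_b), residual_norm(defect_a))
```

Both normal queries get residual norms of about 0.14 even though their raw features sit far apart, while the defective query's residual norm is about 4.47. This class-independence of the residual distribution is what the abstract calls feature decorrelation.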
RO · Jul 25, 2021
An Internal Arc Fixation Channel and Automatic Planning Algorithm for Pelvic Fracture
Qing Yang, Jian Song, Chang Cheng et al.
Fixing fractured pelvic fragments with a sacroiliac screw is a common treatment for unstable pelvic fractures. Because of the complex shape of the pelvis, traditional methods sometimes cannot find a suitable straight screw fixation channel, which makes pelvic fracture fixation more difficult. A new screw fixation method is therefore urgently needed to improve the feasibility of pelvic fracture fixation. In this study, we propose a new arc nail fixation method for treating pelvic fractures, together with an algorithm that verifies the feasibility of an internal arc fixation channel (IAFC) in the pelvis and computes a relatively optimal IAFC. Furthermore, we compared the advantages and disadvantages of arc and straight channels through experiments. This study verified the feasibility of the IAFC, and the experimental comparison shows that arc channel fixation offers better adaptability and safety than the traditional straight sacroiliac screw.
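The abstract does not describe how the IAFC algorithm tests feasibility, so the following is only a toy 2-D illustration of why an arc can succeed where a straight channel fails: a channel is treated as feasible when every sampled point along it stays inside the bone. The annular "corridor", the sampling density, and all function names are hypothetical; the real problem is three-dimensional and also involves choosing a relatively optimal arc, which this sketch does not attempt.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def arc_points(center: Point, radius: float, theta0: float, theta1: float, n: int = 200) -> List[Point]:
    """Sample n+1 points along a circular arc from angle theta0 to theta1 (radians)."""
    cx, cy = center
    return [
        (cx + radius * math.cos(theta0 + (theta1 - theta0) * k / n),
         cy + radius * math.sin(theta0 + (theta1 - theta0) * k / n))
        for k in range(n + 1)
    ]


def segment_points(a: Point, b: Point, n: int = 200) -> List[Point]:
    """Sample n+1 points along the straight segment from a to b."""
    return [(a[0] + (b[0] - a[0]) * k / n, a[1] + (b[1] - a[1]) * k / n) for k in range(n + 1)]


def channel_is_feasible(points: List[Point], inside_bone: Callable[[Point], bool]) -> bool:
    """A channel is feasible only if every sampled point stays inside the bone region."""
    return all(inside_bone(p) for p in points)


def in_curved_corridor(p: Point, r_inner: float = 8.0, r_outer: float = 12.0) -> bool:
    # Toy 2D stand-in for a curved bone corridor: an annulus around the origin.
    return r_inner <= math.hypot(*p) <= r_outer


entry, exit_ = (10.0, 0.0), (0.0, 10.0)
print(channel_is_feasible(segment_points(entry, exit_), in_curved_corridor))  # False
print(channel_is_feasible(arc_points((0.0, 0.0), 10.0, 0.0, math.pi / 2), in_curved_corridor))  # True
```

The straight channel between the same entry and exit points cuts through the inner boundary (its midpoint (5, 5) lies about 7.07 from the origin), while the arc follows the curved corridor.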
HC · Jun 28, 2019
Non-user Inclusive Design for Maintaining Harmony of Real-Virtual Human Interaction in Augmented Reality
Chao Shi
Augmented reality creates the illusion that virtual content, such as objects and humans, coexists with users in the real world. However, non-users who are unaware of the virtual world and move dynamically nearby might either cause a conflict by walking straight into the space where a user is talking to a Virtual Human (VH), or be inconvenienced when trying to avoid disturbing the user. To maintain harmony and preserve the comfort of both the user and non-users, we propose a method that makes the VH adjust its own position to avoid such potential conflicts. The difficulty is that the agent must avoid potential conflicts naturally, so that its behavior does not feel unnatural to the user. Our idea is to endow the VH with three capabilities: anticipating non-users walking around, understanding how to establish and maintain a proper formation that adapts to the environment, and planning to avoid conflicts by shifting formation in advance. Drawing on theory from the literature, we develop a non-user inclusive spatial formation model that shifts the arrangement naturally according to the environment. We implemented the proposed model in a VH behavior planning system to achieve natural conflict avoidance. Evaluation experiments showed that it successfully reduces potential conflicts caused by non-users.
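The first capability, anticipating non-users, could be approximated by extrapolating each non-user's trajectory and checking whether it will enter the shared space between the user and the VH. The sketch below assumes constant-velocity motion and models that shared space as a circle (loosely, the o-space of an F-formation); `predicts_intrusion`, the 1 m radius, the 0.1 s step, and the 10 s horizon are hypothetical choices, not the paper's model.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def predicts_intrusion(position: Point, velocity: Point, o_space_center: Point,
                       o_space_radius: float, horizon_s: float, dt: float = 0.1) -> bool:
    """Extrapolate a non-user at constant velocity; True if they enter the o-space within the horizon."""
    steps = int(round(horizon_s / dt))
    for k in range(steps + 1):
        t = k * dt
        predicted = (position[0] + velocity[0] * t, position[1] + velocity[1] * t)
        if math.dist(predicted, o_space_center) <= o_space_radius:
            return True
    return False


# Shared space between the user and the VH, modeled as a 1 m circle at the origin.
walker_crossing = predicts_intrusion((-5.0, 0.0), (1.0, 0.0), (0.0, 0.0), 1.0, horizon_s=10.0)
walker_passing = predicts_intrusion((-5.0, 0.0), (0.0, 1.0), (0.0, 0.0), 1.0, horizon_s=10.0)
print(walker_crossing, walker_passing)  # True False
```

When the check returns True, the VH would have roughly four seconds in this example to shift its formation before the non-user arrives; how the paper's model selects the new formation is not captured here.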
ML · Jun 27, 2018
Dynamic Assortment Selection under the Nested Logit Models
Xi Chen, Chao Shi, Yining Wang et al.
We study a stylized dynamic assortment planning problem during a selling season of finite length $T$. In each time period, the seller offers an arriving customer an assortment of substitutable products, and the customer purchases among the offered products according to a discrete choice model. The seller's goal is to maximize the expected revenue, or equivalently, to minimize the worst-case expected regret. One key challenge is that the products' utilities are unknown to the seller and must be learned. Although dynamic assortment planning has received increasing attention in revenue management, most existing work is based on the multinomial logit (MNL) choice model. In this paper, we study dynamic assortment planning under a more general choice model, the nested logit model, which models hierarchical choice behavior and is "the most widely used member of the GEV (generalized extreme value) family". By leveraging the revenue-ordered structure of the optimal assortment within each nest, we develop a novel upper confidence bound (UCB) policy with an aggregated estimation scheme. Our policy simultaneously learns customers' choice behavior and makes dynamic assortment decisions based on current knowledge. It achieves cumulative regret of order $\tilde{O}(\sqrt{MNT})$, where $M$ is the number of nests and $N$ is the number of products in each nest. We further provide a lower bound of $\Omega(\sqrt{MT})$, which shows the near-optimality of the upper bound when $T$ is much larger than $M$ and $N$. When the number of items per nest $N$ is large, we further provide a discretization heuristic that improves the performance of our algorithm. Numerical results demonstrate the empirical performance of the proposed algorithms.
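For reference, the nested logit choice probabilities and a search over revenue-ordered assortments can be sketched as follows. The sketch assumes a common parameterization with known preference weights v_ij, revenues r_ij, and nest dissimilarity parameters γ_i in (0, 1], where a customer first picks a nest (or leaves without buying) and then a product within it; with γ_i = 1 for every nest it reduces to MNL. It only evaluates revenue-ordered (top-k by revenue) sets in each nest and brute-forces their combination across nests; it does not implement the paper's UCB policy or aggregated estimation scheme, and all function names are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple

Assortment = List[List[int]]  # product indices offered in each nest


def nest_attraction(S: Sequence[int], v_i: Sequence[float], gamma_i: float) -> float:
    """V_i(S_i) = sum over offered j of v_ij ** (1 / gamma_i)."""
    return sum(v_i[j] ** (1.0 / gamma_i) for j in S)


def nest_choice_probs(assortment: Assortment, v, gamma) -> List[float]:
    """P(nest i) = V_i^gamma_i / (1 + sum_k V_k^gamma_k); the '1' is the no-purchase option."""
    powered = [nest_attraction(S, v[i], gamma[i]) ** gamma[i] for i, S in enumerate(assortment)]
    denom = 1.0 + sum(powered)
    return [p / denom for p in powered]


def expected_revenue(assortment: Assortment, v, r, gamma) -> float:
    total = 0.0
    for i, p_nest in enumerate(nest_choice_probs(assortment, v, gamma)):
        S = assortment[i]
        V = nest_attraction(S, v[i], gamma[i])
        if V > 0.0:
            # Within nest i, P(j | i) = v_ij ** (1 / gamma_i) / V_i.
            total += p_nest * sum(r[i][j] * v[i][j] ** (1.0 / gamma[i]) for j in S) / V
    return total


def best_revenue_ordered(v, r, gamma) -> Tuple[Assortment, float]:
    """Search only revenue-ordered sets (top-k by revenue) in each nest, brute force across nests."""
    candidates = []
    for i in range(len(v)):
        order = sorted(range(len(v[i])), key=lambda j: r[i][j], reverse=True)
        candidates.append([order[:k] for k in range(len(order) + 1)])
    best = max(product(*candidates), key=lambda a: expected_revenue(list(a), v, r, gamma))
    return list(best), expected_revenue(list(best), v, r, gamma)


# One nest with gamma = 1 reduces to MNL: offering only the high-revenue item wins here.
print(best_revenue_ordered([[1.0, 1.0]], [[3.0, 1.0]], [1.0]))  # ([[0]], 1.5)
```

Restricting each nest to revenue-ordered sets shrinks the per-nest choice from 2^N subsets to N + 1 candidates; the remaining brute force across nests is exponential in M and is kept only for readability.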