AIJul 11, 2024
Establishing Rigorous and Cost-effective Clinical Trials for Artificial Intelligence ModelsWanling Gao, Yunyou Huang, Dandan Cui et al.
A profound gap persists between artificial intelligence (AI) and clinical practice in medicine, primarily due to the lack of rigorous and cost-effective evaluation methodologies. State-of-the-art and state-of-the-practice AI model evaluations are limited to laboratory studies on medical datasets or direct clinical trials with no or solely patient-centered controls. Moreover, the crucial role of clinicians in collaborating with AI, pivotal for determining its impact on clinical practice, is often overlooked. For the first time, we emphasize the critical necessity for rigorous and cost-effective evaluation methodologies for AI models in clinical practice, featuring patient/clinician-centered (dual-centered) AI randomized controlled trials (DC-AI RCTs) and virtual clinician-based in-silico trials (VC-MedAI) as an effective proxy for DC-AI RCTs. Leveraging 7500 diagnosis records from two-step inaugural DC-AI RCTs across 14 medical centers with 125 clinicians, our results demonstrate the necessity of DC-AI RCTs and the effectiveness of VC-MedAI. Notably, VC-MedAI performs comparably to human clinicians, replicating insights and conclusions from prospective DC-AI RCTs. We envision DC-AI RCTs and VC-MedAI as pivotal advancements, presenting innovative and transformative evaluation methodologies for AI models in clinical practice, offering a preclinical-like setting mirroring conventional medicine, and reshaping development paradigms in a cost-effective and fast-iterative manner. Chinese Clinical Trial Registration: ChiCTR2400086816.
LGJul 3, 2023
OpenClinicalAI: An Open and Dynamic Model for Alzheimer's Disease DiagnosisYunyou Huang, Xiaoshuang Liang, Xiangjiang Lu et al.
Although Alzheimer's disease (AD) cannot be reversed or cured, timely diagnosis can significantly reduce the burden of treatment and care. Current research on AD diagnosis models usually regards the diagnosis task as a typical classification task with two primary assumptions: 1) All target categories are known a priori; 2) The diagnostic strategy for each patient is consistent, that is, the number and type of model input data for each patient are the same. However, real-world clinical settings are open, with complexity and uncertainty in terms of both subjects and the resources of the medical institutions. This means that diagnostic models may encounter unseen disease categories and need to dynamically develop diagnostic strategies based on the subject's specific circumstances and available medical resources. Thus, the AD diagnosis task is tangled and coupled with the diagnosis strategy formulation. To promote the application of diagnostic systems in real-world clinical settings, we propose OpenClinicalAI for direct AD diagnosis in complex and uncertain clinical settings. This is the first powerful end-to-end model to dynamically formulate diagnostic strategies and provide diagnostic results based on the subject's conditions and available medical resources. OpenClinicalAI combines reciprocally coupled deep multiaction reinforcement learning (DMARL) for diagnostic strategy formulation and multicenter meta-learning (MCML) for open-set recognition. The experimental results show that OpenClinicalAI achieves better performance and fewer clinical examinations than the state-of-the-art model. Our method provides an opportunity to embed the AD diagnostic system into the current health care system to cooperate with clinicians to improve current health care.
LGJul 3, 2023
OpenAPMax: Abnormal Patterns-based Model for Real-World Alzheimer's Disease DiagnosisYunyou Huang, Xianglong Guan, Xiangjiang Lu et al.
Alzheimer's disease (AD) cannot be reversed, but early diagnosis will significantly benefit patients' medical treatment and care. In recent works, AD diagnosis has the primary assumption that all categories are known a prior -- a closed-set classification problem, which contrasts with the open-set recognition problem. This assumption hinders the application of the model in natural clinical settings. Although many open-set recognition technologies have been proposed in other fields, they are challenging to use for AD diagnosis directly since 1) AD is a degenerative disease of the nervous system with similar symptoms at each stage, and it is difficult to distinguish from its pre-state, and 2) diversified strategies for AD diagnosis are challenging to model uniformly. In this work, inspired by the concerns of clinicians during diagnosis, we propose an open-set recognition model, OpenAPMax, based on the anomaly pattern to address AD diagnosis in real-world settings. OpenAPMax first obtains the abnormal pattern of each patient relative to each known category through statistics or a literature search, clusters the patients' abnormal pattern, and finally, uses extreme value theory (EVT) to model the distance between each patient's abnormal pattern and the center of their category and modify the classification probability. We evaluate the performance of the proposed method with recent open-set recognition, where we obtain state-of-the-art results.
LGAug 17, 2020Code
FLBench: A Benchmark Suite for Federated LearningYuan Liang, Yange Guo, Yanxia Gong et al.
Federated learning is a new machine learning paradigm. The goal is to build a machine learning model from the data sets distributed on multiple devices so-called an isolated data island, while keeping their data secure and private. Most existing federated learning benchmarks work manually splits commonly used public datasets into partitions to simulate real world isolated data island scenarios. Still, this simulation fails to capture real world isolated data island intrinsic characteristics. This paper presents a federated learning (FL) benchmark suite named FLBench. FLBench contains three domains: medical, financial, and AIoT. By configuring various domains, FLBench is qualified to evaluate federated learning systems and algorithms essential aspects, like communication, scenario transformation, privacy-preserving, data distribution heterogeneity, and cooperation strategy. Hence, it becomes a promising platform for developing novel federated learning algorithms. Currently, FLBench is open sourced and in fast evolution. We package it as an automated deployment tool. The benchmark suite is available from https://www.benchcouncil.org/flbench.html.
LGJun 20, 2024
Younger: The First Dataset for Artificial Intelligence-Generated Neural Network ArchitectureZhengxin Yang, Wanling Gao, Luzhou Peng et al.
Designing and optimizing neural network architectures typically requires extensive expertise, starting with handcrafted designs and then manual or automated refinement. This dependency presents a significant barrier to rapid innovation. Recognizing the complexity of automatically generating neural network architecture from scratch, we introduce Younger, a pioneering dataset to advance this ambitious goal. Derived from over 174K real-world models across more than 30 tasks from various public model hubs, Younger includes 7,629 unique architectures, and each is represented as a directed acyclic graph with detailed operator-level information. The dataset facilitates two primary design paradigms: global, for creating complete architectures from scratch, and local, for detailed architecture component refinement. By establishing these capabilities, Younger contributes to a new frontier, Artificial Intelligence-Generated Neural Network Architecture (AIGNNA). Our experiments explore the potential and effectiveness of Younger for automated architecture generation and, as a secondary benefit, demonstrate that Younger can serve as a benchmark dataset, advancing the development of graph neural networks. We release the dataset and code publicly to lower the entry barriers and encourage further research in this challenging area.
AISep 9, 2021
OpenClinicalAI: enabling AI to diagnose diseases in real-world clinical settingsYunyou Huang, Nana Wang, Suqin Tang et al.
This paper quantitatively reveals the state-of-the-art and state-of-the-practice AI systems only achieve acceptable performance on the stringent conditions that all categories of subjects are known, which we call closed clinical settings, but fail to work in real-world clinical settings. Compared to the diagnosis task in the closed setting, real-world clinical settings pose severe challenges, and we must treat them differently. We build a clinical AI benchmark named Clinical AIBench to set up real-world clinical settings to facilitate researches. We propose an open, dynamic machine learning framework and develop an AI system named OpenClinicalAI to diagnose diseases in real-world clinical settings. The first versions of Clinical AIBench and OpenClinicalAI target Alzheimer's disease. In the real-world clinical setting, OpenClinicalAI significantly outperforms the state-of-the-art AI system. In addition, OpenClinicalAI develops personalized diagnosis strategies to avoid unnecessary testing and seamlessly collaborates with clinicians. It is promising to be embedded in the current medical systems to improve medical services.
AIApr 30, 2020
AIBench Training: Balanced Industry-Standard AI Training BenchmarkingFei Tang, Wanling Gao, Jianfeng Zhan et al.
Earlier-stage evaluations of a new AI architecture/system need affordable benchmarks. Only using a few AI component benchmarks like MLPerfalone in the other stages may lead to misleading conclusions. Moreover, the learning dynamics are not well understood, and the benchmarks' shelf-life is short. This paper proposes a balanced benchmarking methodology. We use real-world benchmarks to cover the factors space that impacts the learning dynamics to the most considerable extent. After performing an exhaustive survey on Internet service AI domains, we identify and implement nineteen representative AI tasks with state-of-the-art models. For repeatable performance ranking (RPR subset) and workload characterization (WC subset), we keep two subsets to a minimum for affordability. We contribute by far the most comprehensive AI training benchmark suite. The evaluations show: (1) AIBench Training (v1.1) outperforms MLPerfTraining (v0.7) in terms of diversity and representativeness of model complexity, computational cost, convergent rate, computation, and memory access patterns, and hotspot functions; (2) Against the AIBench full benchmarks, its RPR subset shortens the benchmarking cost by 64%, while maintaining the primary workload characteristics; (3) The performance ranking shows the single-purpose AI accelerator like TPU with the optimized TensorFlowframework performs better than that of GPUs while losing the latter's general support for various AI models. The specification, source code, and performance numbers are available from the AIBench homepage https://www.benchcouncil.org/aibench-training/index.html.
PFFeb 17, 2020
AIBench: An Agile Domain-specific Benchmarking Methodology and an AI Benchmark SuiteWanling Gao, Fei Tang, Jianfeng Zhan et al.
Domain-specific software and hardware co-design is encouraging as it is much easier to achieve efficiency for fewer tasks. Agile domain-specific benchmarking speeds up the process as it provides not only relevant design inputs but also relevant metrics, and tools. Unfortunately, modern workloads like Big data, AI, and Internet services dwarf the traditional one in terms of code size, deployment scale, and execution path, and hence raise serious benchmarking challenges. This paper proposes an agile domain-specific benchmarking methodology. Together with seventeen industry partners, we identify ten important end-to-end application scenarios, among which sixteen representative AI tasks are distilled as the AI component benchmarks. We propose the permutations of essential AI and non-AI component benchmarks as end-to-end benchmarks. An end-to-end benchmark is a distillation of the essential attributes of an industry-scale application. We design and implement a highly extensible, configurable, and flexible benchmark framework, on the basis of which, we propose the guideline for building end-to-end benchmarks, and present the first end-to-end Internet service AI benchmark. The preliminary evaluation shows the value of our benchmark suite---AIBench against MLPerf and TailBench for hardware and software designers, micro-architectural researchers, and code developers. The specifications, source code, testbed, and results are publicly available from the web site \url{http://www.benchcouncil.org/AIBench/index.html}.
CVAug 13, 2019
AIBench: An Industry Standard Internet Service AI Benchmark SuiteWanling Gao, Fei Tang, Lei Wang et al.
Today's Internet Services are undergoing fundamental changes and shifting to an intelligent computing era where AI is widely employed to augment services. In this context, many innovative AI algorithms, systems, and architectures are proposed, and thus the importance of benchmarking and evaluating them rises. However, modern Internet services adopt a microservice-based architecture and consist of various modules. The diversity of these modules and complexity of execution paths, the massive scale and complex hierarchy of datacenter infrastructure, the confidential issues of data sets and workloads pose great challenges to benchmarking. In this paper, we present the first industry-standard Internet service AI benchmark suite---AIBench with seventeen industry partners, including several top Internet service providers. AIBench provides a highly extensible, configurable, and flexible benchmark framework that contains loosely coupled modules. We identify sixteen prominent AI problem domains like learning to rank, each of which forms an AI component benchmark, from three most important Internet service domains: search engine, social network, and e-commerce, which is by far the most comprehensive AI benchmarking effort. On the basis of the AIBench framework, abstracting the real-world data sets and workloads from one of the top e-commerce providers, we design and implement the first end-to-end Internet service AI benchmark, which contains the primary modules in the critical paths of an industry scale application and is scalable to deploy on different cluster scales. The specifications, source code, and performance numbers are publicly available from the benchmark council web site http://www.benchcouncil.org/AIBench/index.html.
SPAug 1, 2019
LoadCNN: A Low Training Cost Deep Learning Model for Day-Ahead Individual Residential Load ForecastingYunyou Huang, Nana Wang, Wanling Gao et al.
Accurate day-ahead individual residential load forecasting is of great importance to various applications of smart grid on day-ahead market. Deep learning, as a powerful machine learning technology, has shown great advantages and promising application in load forecasting tasks. However, deep learning is a computationally-hungry method, and requires high costs (e.g., time, energy and CO2 emission) to train a deep learning model, which aggravates the energy crisis and incurs a substantial burden to the environment. As a consequence, the deep learning methods are difficult to be popularized and applied in the real smart grid environment. In this paper, we propose a low training cost model based on convolutional neural network, namely LoadCNN, for next-day load forecasting of individual resident with reduced training cost. The experiments show that the training time of LoadCNN is only approximately 1/54 of the one of other state-of-the-art models, and energy consumption and CO2 emissions are only approximate 1/45 of those of other state-of-the-art models based on the same indicators. Meanwhile, the prediction accuracy of our model is equal to that of current state-of-the-art models, making LoadCNN the first load forecasting model simultaneously achieving high prediction accuracy and low training costs. LoadCNN is an efficient green model that is able to be quickly, cost-effectively and environmentally-friendly deployed in a realistic smart grid environment.
IVJun 1, 2019
A Semantic-based Medical Image Fusion ApproachFanda Fan, Yunyou Huang, Lei Wang et al.
It is necessary for clinicians to comprehensively analyze patient information from different sources. Medical image fusion is a promising approach to providing overall information from medical images of different modalities. However, existing medical image fusion approaches ignore the semantics of images, making the fused image difficult to understand. In this work, we propose a new evaluation index to measure the semantic loss of fused image, and put forward a Fusion W-Net (FW-Net) for multimodal medical image fusion. The experimental results are promising: the fused image generated by our approach greatly reduces the semantic information loss, and has better visual effects in contrast to five state-of-art approaches. Our approach and tool have great potential to be applied in the clinical setting.
AIMay 8, 2019
A new direction to promote the implementation of artificial intelligence in natural clinical settingsYunyou Huang, Zhifei Zhang, Nana Wang et al.
Artificial intelligence (AI) researchers claim that they have made great `achievements' in clinical realms. However, clinicians point out the so-called `achievements' have no ability to implement into natural clinical settings. The root cause for this huge gap is that many essential features of natural clinical tasks are overlooked by AI system developers without medical background. In this paper, we propose that the clinical benchmark suite is a novel and promising direction to capture the essential features of the real-world clinical tasks, hence qualifies itself for guiding the development of AI systems, promoting the implementation of AI in real-world clinical practice.