Shaoshan Liu

RO
h-index: 13
50 papers
741 citations
Novelty: 33%
AI Score: 51

50 Papers

IV | Nov 21, 2022 | Code
AICOM-MP: an AI-based Monkeypox Detector for Resource-Constrained Environments

Tim Tianyi Yang, Tom Tianze Yang, Andrew Liu et al.

Under the Autonomous Mobile Clinics (AMCs) initiative, we are developing, open sourcing, and standardizing health AI technologies to enable healthcare access in least developed countries (LDCs). We regard AMCs as the next generation of healthcare delivery platforms, and health AI engines as applications on these platforms, similar to how various applications expand the usage scenarios of smartphones. In response to the recent global monkeypox outbreak, this article introduces AICOM-MP, an AI-based monkeypox detector designed specifically to handle images taken with resource-constrained devices. Compared to existing AI-based monkeypox detectors, AICOM-MP has achieved state-of-the-art (SOTA) performance. We have hosted AICOM-MP as a web service to allow universal access to monkeypox screening technology. We have also open sourced both the source code and the dataset of AICOM-MP to allow health AI professionals to integrate AICOM-MP into their services. Finally, through the AICOM-MP project, we have generalized a methodology for developing health AI technologies for AMCs to allow universal access even in resource-constrained environments.

AR | Dec 5, 2022
Thales: Formulating and Estimating Architectural Vulnerability Factors for DNN Accelerators

Abhishek Tyagi, Yiming Gan, Shaoshan Liu et al.

As Deep Neural Networks (DNNs) are increasingly deployed in safety-critical and privacy-sensitive applications such as autonomous driving and biometric authentication, it is critical to understand the fault tolerance of DNNs. Prior work primarily focuses on metrics such as the Failures In Time (FIT) rate and the Silent Data Corruption (SDC) rate, which quantify how often a device fails. Instead, this paper focuses on quantifying the DNN accuracy given that a transient error has occurred, which tells us how well a network behaves when a transient error occurs. We call this metric Resiliency Accuracy (RA). We show that the existing RA formulation is fundamentally inaccurate, because it incorrectly assumes that software variables (model weights/activations) have equal faulty probability under hardware transient faults. We present an algorithm that captures the faulty probabilities of DNN variables under transient faults and, thus, provides correct RA estimations validated by hardware. To accelerate RA estimation, we reformulate RA calculation as a Monte Carlo integration problem, and solve it using importance sampling driven by DNN-specific heuristics. Using our lightweight RA estimation method, we show that transient faults lead to far greater accuracy degradation than what today's DNN resiliency tools estimate. We show how our RA estimation tool can help design more resilient DNNs by integrating it with a Network Architecture Search framework.
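
The Monte Carlo reformulation mentioned in this abstract can be illustrated with a minimal sketch. It is not the Thales algorithm: it assumes, purely for illustration, that the quantity of interest is the sum over variables of (probability the fault hits variable i) times (accuracy when variable i is faulty), and all numbers and function names below are hypothetical. It shows only the generic importance-sampling estimator, where draws from a proposal distribution are reweighted by the ratio of target to proposal probability.

```python
import random


def exact_ra(fault_prob, acc_given_fault):
    """Exact expectation: sum_i P(fault hits variable i) * accuracy_i."""
    return sum(p * a for p, a in zip(fault_prob, acc_given_fault))


def importance_sampling_ra(fault_prob, proposal, acc_given_fault, n_samples, rng):
    """Estimate the same expectation by sampling variables from `proposal`
    and reweighting each draw by fault_prob[i] / proposal[i]."""
    indices = list(range(len(fault_prob)))
    draws = rng.choices(indices, weights=proposal, k=n_samples)
    total = sum(fault_prob[i] / proposal[i] * acc_given_fault[i] for i in draws)
    return total / n_samples


# Hypothetical toy numbers: four variables with unequal fault probabilities.
fault_prob = [0.5, 0.3, 0.15, 0.05]
acc_given_fault = [0.9, 0.6, 0.2, 0.1]
uniform = [0.25] * 4

rng = random.Random(0)
ra_exact = exact_ra(fault_prob, acc_given_fault)
ra_uniform = importance_sampling_ra(fault_prob, uniform, acc_given_fault, 20000, rng)

# A proposal proportional to fault_prob * accuracy makes every weighted draw
# equal to the exact value, so the estimator has zero variance.
ideal = [p * a / ra_exact for p, a in zip(fault_prob, acc_given_fault)]
ra_ideal = importance_sampling_ra(fault_prob, ideal, acc_given_fault, 100, rng)

print(ra_exact, ra_uniform, ra_ideal)
```

In practice the ideal proposal is unknown, since it requires the answer itself; a heuristic proposal that roughly tracks where the integrand is large is what reduces the number of samples needed.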

CY | Apr 11, 2022
Autonomous Mobile Clinics: Empowering Affordable Anywhere Anytime Healthcare Access

Shaoshan Liu, Yuzhang Huang, Leiyu Shi

We are facing a global healthcare crisis today: healthcare costs keep climbing while, with an aging population, government fiscal revenue keeps dropping. To create a more efficient and effective healthcare system, three technical challenges immediately present themselves: healthcare access, healthcare equity, and healthcare efficiency. An autonomous mobile clinic solves the healthcare access problem by bringing healthcare services to the patient at the touch of the patient's fingertips. Nevertheless, to enable a universal autonomous mobile clinic network, a three-stage technical roadmap needs to be achieved. In stage one, we focus on solving the inequity challenge in the existing healthcare system by combining autonomous mobility and telemedicine. In stage two, we develop an AI doctor for primary care, which we foster from infancy to adulthood with clean healthcare data. With the AI doctor, we can solve the inefficiency problem. In stage three, after we have proven that the autonomous mobile clinic network can truly solve the target clinical use cases, we shall open up the platform for all medical verticals, thus enabling universal healthcare through this whole new system.

AI | Jun 17, 2023 | Code
AI Clinics on Mobile (AICOM): Universal AI Doctors for the Underserved and Hard-to-Reach

Tim Tianyi Yang, Tom Tianze Yang, Na An et al.

This paper introduces Artificial Intelligence Clinics on Mobile (AICOM), an open-source project devoted to answering the United Nations Sustainable Development Goal 3 (SDG3) on health, which represents a universal recognition that health is fundamental to human capital and social and economic development. The core motivation for the AICOM project is the fact that over 80% of the people in the least developed countries (LDCs) own a mobile phone, even though less than 40% of these people have internet access. Hence, enabling AI-based disease diagnostics and screening capability on affordable mobile phones without connectivity will be a critical first step to addressing healthcare access problems. The technologies developed in the AICOM project achieve exactly this goal, and we have demonstrated the effectiveness of AICOM on monkeypox screening tasks. We plan to continue expanding and open-sourcing the AICOM platform, aiming for it to evolve into a universal AI doctor for the Underserved and Hard-to-Reach.

RO | Mar 17 | Code
DySL-VLA: Efficient Vision-Language-Action Model Inference via Dynamic-Static Layer-Skipping for Robot Manipulation

Zebin Yang, Yijiahao Qi, Tong Xie et al.

Vision-Language-Action (VLA) models have shown remarkable success in robotic tasks like manipulation by fusing a language model's reasoning with a vision model's 3D understanding. However, their high computational cost remains a major obstacle for real-world applications that require real-time performance. We observe that the actions within a task have varying levels of importance: critical steps demand high precision, while less important ones can tolerate more variance. Leveraging this insight, we propose DySL-VLA, a novel framework that addresses computational cost by dynamically skipping VLA layers based on each action's importance. DySL-VLA categorizes its layers into two types: informative layers, which are consistently executed, and incremental layers, which can be selectively skipped. To intelligently skip layers without sacrificing accuracy, we invent a prior-post skipping guidance mechanism to determine when to initiate layer-skipping. We also propose a skip-aware two-stage knowledge distillation algorithm to efficiently train a standard VLA into a DySL-VLA. Our experiments indicate that DySL-VLA achieves 2.1% improvement in success length over Deer-VLA on the Calvin dataset, while simultaneously reducing trainable parameters by a factor of 85.7 and providing a 3.75x speedup relative to the RoboFlamingo baseline at iso-accuracy. Our code is available on https://github.com/PKU-SEC-Lab/DYSL_VLA.

RO | Jul 8, 2023
Autonomy 2.0: The Quest for Economies of Scale

Shuang Wu, Bo Yu, Shaoshan Liu et al.

With the advancement of robotics and AI technologies in the past decade, we have now entered the age of autonomous machines. In this new age of information technology, autonomous machines, such as service robots, autonomous drones, delivery robots, and autonomous vehicles, rather than humans, will provide services. In this article, through examining the technical challenges and economic impact of the digital economy, we argue that scalability is both highly necessary from a technical perspective and significantly advantageous from an economic perspective, and is thus the key for the autonomy industry to achieve its full potential. Nonetheless, the current development paradigm, dubbed Autonomy 1.0, scales with the number of engineers, instead of with the amount of data or compute resources, hence preventing the autonomy industry from fully benefiting from economies of scale, especially the exponentially cheapening compute cost and the explosion of available data. We further analyze the key scalability blockers and explain how a new development paradigm, dubbed Autonomy 2.0, can address these problems to greatly boost the autonomy industry.

CY | Jul 23, 2023
A Comprehensive Review and Systematic Analysis of Artificial Intelligence Regulation Policies

Weiyue Wu, Shaoshan Liu

Due to the cultural and governance differences of countries around the world, there currently exists a wide spectrum of AI regulation policy proposals that have created chaos in the global AI regulatory space. Properly regulating AI technologies is extremely challenging, as it requires a delicate balance between legal restrictions and technological developments. In this article, we first present a comprehensive review of AI regulation proposals from different geographical locations and cultural backgrounds. Then, drawing from historical lessons, we develop a framework to facilitate a thorough analysis of AI regulation proposals. Finally, we perform a systematic analysis of these AI regulation proposals to understand how each proposal may fail. This study, containing historical lessons and analysis methods, aims to help governing bodies untangle the AI regulatory chaos in a divide-and-conquer manner.

CY | Apr 8 | Code
Infrastructure First: Enabling Embodied AI for Science in the Global South

Shaoshan Liu, Jie Tang, Marwa S. Hassan et al.

Embodied AI for Science (EAI4S) brings intelligence into the laboratory by uniting perception, reasoning, and robotic action to autonomously run experiments in the physical world. For the Global South, this shift is not about adopting advanced automation for its own sake, but about overcoming a fundamental capacity constraint: too few hands to run too many experiments. By enabling continuous, reliable experimentation under limits of manpower, power, and connectivity, EAI4S turns automation from a luxury into essential scientific infrastructure. The main obstacle, however, is not algorithmic capability. It is infrastructure. Open-source AI and foundation models have narrowed the knowledge gap, but EAI4S depends on dependable edge compute, energy-efficient hardware, modular robotic systems, localized data pipelines, and open standards. Without these foundations, even the most capable models remain trapped in well-resourced laboratories. This article argues for an infrastructure-first approach to EAI4S and outlines the practical requirements for deploying embodied intelligence at scale, offering a concrete pathway for Global South institutions to translate AI advances into sustained scientific capacity and competitive research output.

CY | Jan 31, 2023
Compliance Costs of AI Technology Commercialization: A Field Deployment Perspective

Weiyue Wu, Shaoshan Liu

While Artificial Intelligence (AI) technologies are progressing fast, compliance costs have become a huge financial burden for AI startups, which are already constrained on research & development budgets. This situation creates a compliance trap, as many AI startups are not financially prepared to cope with a broad spectrum of regulatory requirements. Particularly, the complex and varying regulatory processes across the globe subtly give advantages to well-established and resourceful technology firms over resource-constrained AI startups [1]. The continuation of this trend may phase out the majority of AI startups and lead to giant technology firms' monopolies of AI technologies. To demonstrate the reality of the compliance trap, from a field deployment perspective, we delve into the details of compliance costs of AI commercial operations.

RO | Oct 21, 2025 | Code
EfficientNav: Towards On-Device Object-Goal Navigation with Navigation Map Caching and Retrieval

Zebin Yang, Sunjian Zheng, Tong Xie et al.

Object-goal navigation (ObjNav) tasks an agent with navigating to the location of a specific object in an unseen environment. Embodied agents equipped with large language models (LLMs) and online-constructed navigation maps can perform ObjNav in a zero-shot manner. However, existing agents rely heavily on giant LLMs in the cloud, e.g., GPT-4, while directly switching to small LLMs, e.g., LLaMA3.2-11b, suffers from significant success rate drops due to limited model capacity for understanding complex navigation maps, which prevents deploying ObjNav on local devices. At the same time, the long prompt introduced by the navigation map description causes high planning latency on local devices. In this paper, we propose EfficientNav to enable on-device efficient LLM-based zero-shot ObjNav. To help the smaller LLMs better understand the environment, we propose semantics-aware memory retrieval to prune redundant information in navigation maps. To reduce planning latency, we propose discrete memory caching and attention-based memory clustering to efficiently save and re-use the KV cache. Extensive experimental results demonstrate that EfficientNav achieves 11.1% improvement in success rate on the HM3D benchmark over GPT-4-based baselines, and demonstrates 6.7x real-time latency reduction and 4.7x end-to-end latency reduction over the GPT-4 planner. Our code will be released soon.

AI | Jun 4, 2025 | Code
Training Cross-Morphology Embodied AI Agents: From Practical Challenges to Theoretical Foundations

Shaoshan Liu, Fan Wang, Hongjun Zhou et al.

While theory and practice are often seen as separate domains, this article shows that theoretical insight is essential for overcoming real-world engineering barriers. We begin with a practical challenge: training a cross-morphology embodied AI policy that generalizes across diverse robot morphologies. We formalize this as the Heterogeneous Embodied Agent Training (HEAT) problem and prove it reduces to a structured Partially Observable Markov Decision Process (POMDP) that is PSPACE-complete. This result explains why current reinforcement learning pipelines break down under morphological diversity, due to sequential training constraints, memory-policy coupling, and data incompatibility. We further explore Collective Adaptation, a distributed learning alternative inspired by biological systems. Though NEXP-complete in theory, it offers meaningful scalability and deployment benefits in practice. This work illustrates how computational theory can illuminate system design trade-offs and guide the development of more robust, scalable embodied AI. For practitioners and researchers to explore this problem, the implementation code of this work has been made publicly available at https://github.com/airs-admin/HEAT

LG | Mar 24
StateLinFormer: Stateful Training Enhancing Long-term Memory in Navigation

Zhiyuan Chen, Yuxuan Zhong, Fan Wang et al.

Effective navigation intelligence relies on long-term memory to support both immediate generalization and sustained adaptation. However, existing approaches face a dilemma: modular systems rely on explicit mapping but lack flexibility, while Transformer-based end-to-end models are constrained by fixed context windows, limiting persistent memory across extended interactions. We introduce StateLinFormer, a linear-attention navigation model trained with a stateful memory mechanism that preserves recurrent memory states across consecutive training segments instead of reinitializing them at each batch boundary. This training paradigm effectively approximates learning on infinitely long sequences, enabling the model to achieve long-horizon memory retention. Experiments across both MAZE and ProcTHOR environments demonstrate that StateLinFormer significantly outperforms its stateless linear-attention counterpart and standard Transformer baselines with fixed context windows. Notably, as interaction length increases, persistent stateful training substantially improves context-dependent adaptation, suggesting an enhancement in the model's In-Context Learning (ICL) capabilities for navigation tasks.
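
The idea of preserving recurrent memory states across consecutive segments can be sketched with a toy example. This is not the StateLinFormer model or its training loop: it assumes a simplified, unnormalized linear attention whose memory is a running sum of key-value outer products, and the function name and inputs are hypothetical. The sketch shows only the forward pass, where carrying the state across a segment boundary reproduces the full-sequence outputs while resetting it does not.

```python
def linear_attention_segment(queries, keys, values, state=None):
    """Run unnormalized linear attention over one segment.

    `state` is a d_k x d_v matrix holding the sum of outer(k, v) seen so far.
    Returns the per-step outputs and the updated state.
    """
    d_k, d_v = len(keys[0]), len(values[0])
    if state is None:
        state = [[0.0] * d_v for _ in range(d_k)]  # fresh (stateless) start
    else:
        state = [row[:] for row in state]  # copy so the caller's state is kept
    outputs = []
    for q, k, v in zip(queries, keys, values):
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] += k[i] * v[j]  # memory update: S += outer(k, v)
        outputs.append(
            [sum(q[i] * state[i][j] for i in range(d_k)) for j in range(d_v)]
        )
    return outputs, state


# Hypothetical toy sequence of length 4 (queries reuse the keys for brevity).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

full, _ = linear_attention_segment(keys, keys, values)

# Stateful: process two segments, passing the memory state across the boundary.
first, carried_state = linear_attention_segment(keys[:2], keys[:2], values[:2])
second, _ = linear_attention_segment(keys[2:], keys[2:], values[2:], carried_state)
stateful = first + second

# Stateless: the second segment starts from an empty memory.
second_reset, _ = linear_attention_segment(keys[2:], keys[2:], values[2:])
stateless = first + second_reset

print(stateful == full, stateless == full)
```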

LG | Feb 28, 2024
ICE-SEARCH: A Language Model-Driven Feature Selection Approach

Tianze Yang, Tianyi Yang, Fuyuan Lyu et al.

This study unveils the In-Context Evolutionary Search (ICE-SEARCH) method, which is among the first works that meld large language models (LLMs) with evolutionary algorithms for feature selection (FS) tasks, and demonstrates its effectiveness in Medical Predictive Analytics (MPA) applications. ICE-SEARCH harnesses the crossover and mutation capabilities inherent in LLMs within an evolutionary framework, significantly improving FS through the model's comprehensive world knowledge and its adaptability to a variety of roles. Our evaluation of this methodology spans three crucial MPA tasks: stroke, cardiovascular disease, and diabetes, where ICE-SEARCH outperforms traditional FS methods in pinpointing essential features for medical applications. ICE-SEARCH achieves State-of-the-Art (SOTA) performance in stroke prediction and diabetes prediction; the Decision-Randomized ICE-SEARCH ranks as SOTA in cardiovascular disease prediction. The study emphasizes the critical role of incorporating domain-specific insights, illustrating ICE-SEARCH's robustness, generalizability, and convergence. This opens avenues for further research into comprehensive and intricate FS landscapes, marking a significant stride in the application of artificial intelligence in medical predictive analytics.

LG | Feb 5, 2025
Towards Large-Scale In-Context Reinforcement Learning by Meta-Training in Randomized Worlds

Fan Wang, Pengtao Shao, Yiming Zhang et al.

In-Context Reinforcement Learning (ICRL) enables agents to learn automatically and on-the-fly from their interactive experiences. However, a major challenge in scaling up ICRL is the lack of scalable task collections. To address this, we propose the procedurally generated tabular Markov Decision Processes, named AnyMDP. Through a carefully designed randomization process, AnyMDP is capable of generating high-quality tasks on a large scale while maintaining relatively low structural biases. To facilitate efficient meta-training at scale, we further introduce decoupled policy distillation and induce prior information in the ICRL framework. Our results demonstrate that, with a sufficiently large scale of AnyMDP tasks, the proposed model can generalize to tasks that were not considered in the training set through versatile in-context learning paradigms. The scalable task set provided by AnyMDP also enables a more thorough empirical investigation of the relationship between data distribution and ICRL performance. We further show that the generalization of ICRL potentially comes at the cost of increased task diversity and longer adaptation periods. This finding carries critical implications for scaling robust ICRL capabilities, highlighting the necessity of diverse and extensive task design, and prioritizing asymptotic performance over few-shot adaptation.
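
To make "procedurally generated tabular Markov Decision Processes" concrete, here is a minimal sketch. It is not the AnyMDP generator: it simply draws random transition distributions and rewards and solves the resulting task with standard value iteration. The function names, the uniform sampling scheme, and the sizes are all assumptions chosen for illustration.

```python
import random


def random_tabular_mdp(n_states, n_actions, rng):
    """Sample a tabular MDP: transitions[s][a] is a distribution over next
    states, rewards[s][a] is a reward in [0, 1)."""
    transitions, rewards = [], []
    for _ in range(n_states):
        t_row, r_row = [], []
        for _ in range(n_actions):
            weights = [rng.random() for _ in range(n_states)]
            total = sum(weights)
            t_row.append([w / total for w in weights])  # normalize to sum to 1
            r_row.append(rng.random())
        transitions.append(t_row)
        rewards.append(r_row)
    return transitions, rewards


def value_iteration(transitions, rewards, gamma=0.9, tol=1e-8):
    """Compute optimal state values by repeated Bellman optimality backups."""
    n_states = len(transitions)
    values = [0.0] * n_states
    while True:
        new_values = [
            max(
                rewards[s][a]
                + gamma * sum(p * v for p, v in zip(transitions[s][a], values))
                for a in range(len(transitions[s]))
            )
            for s in range(n_states)
        ]
        if max(abs(n - o) for n, o in zip(new_values, values)) < tol:
            return new_values
        values = new_values


rng = random.Random(0)
transitions, rewards = random_tabular_mdp(n_states=5, n_actions=3, rng=rng)
optimal_values = value_iteration(transitions, rewards)
print(optimal_values)
```

Sampling many such tasks with different seeds yields a task collection of arbitrary size, which is the property a meta-training setup of this kind relies on.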

CY | Apr 7
The Biggest Risk of Embodied AI is Governance Lag

Shaoshan Liu

Embodied AI is widely discussed as a job-displacement problem. The deeper risk, however, is governance lag: the inability of public institutions to keep pace with how fast the technology spreads through the physical economy. As reusable robotic platforms are combined with increasingly general AI models, embodied AI may scale across manufacturing, logistics, care, and infrastructure faster than governance systems can observe, interpret, and respond. We argue that this lag appears in three connected forms: observational, institutional, and distributive. The central policy challenge, therefore, is not automation alone, but whether governance and compliance systems can adapt before disruption becomes entrenched.

AI | May 29, 2025
Conceptual Framework Toward Embodied Collective Adaptive Intelligence

Fan Wang, Shaoshan Liu

Collective Adaptive Intelligence (CAI) represents a transformative approach in embodied AI, wherein numerous autonomous agents collaborate, adapt, and self-organize to navigate complex, dynamic environments. By enabling systems to reconfigure themselves in response to unforeseen challenges, CAI facilitates robust performance in real-world scenarios. This article introduces a conceptual framework for designing and analyzing CAI. It delineates key attributes including task generalization, resilience, scalability, and self-assembly, aiming to bridge theoretical foundations with practical methodologies for engineering adaptive, emergent intelligence. By providing a structured foundation for understanding and implementing CAI, this work seeks to guide researchers and practitioners in developing more resilient, scalable, and adaptable AI systems across various domains.

NI | Mar 13
A Standards-Aligned Coordination Framework for Edge-Enhanced Collaborative Healthcare in 6G Networks

Liuwang Kang, Fan Wang, Yuzhang Huang et al.

Mission-critical healthcare applications including real-time intensive care monitoring, ambulance-to-hospital orchestration, and distributed medical imaging inference require workflow-level, time-bounded coordination across heterogeneous devices, edge servers, and network control entities. While current 3GPP and O-RAN standards excel at per-device control and quality-of-service enforcement, they do not natively expose abstractions for workflow-level coordination under strict clinical timing constraints, leaving this capability to fragile, application-specific overlays. This article outlines the Collective Adaptive Intelligence Plane (CAIP) as a standards-aligned coordination framework that addresses this abstraction gap without introducing new protocol layers. CAIP is realized through minimal, backward-compatible coordination profiles anchored to existing RRC, QoS/SDAP, and O-RAN E2 interfaces, enabling workflow-scoped coordination context binding, deadline-aware coordination pacing, semantic flow association, and privacy-preserving data locality across distributed clinical entities. We analyze the structural limitations of existing standards, present a concrete interface mapping to 3GPP and O-RAN mechanisms, illustrate deployment through a representative ICU coordination scenario, and outline a phased standardization roadmap from proof-of-concept xApp deployment to AI-native 6G specification evolution. The proposed framework is incrementally deployable on current 5G Advanced infrastructure and provides a principled migration path toward workflow-level coordination abstraction as a first-class capability in future 6G healthcare networks.

CY | Oct 29, 2025
Human Resilience in the AI Era -- What Machines Can't Replace

Shaoshan Liu, Anina Schwarzenbach, Yiyu Shi

AI is displacing tasks, mediating high-stakes decisions, and flooding communication with synthetic content, unsettling work, identity, and social trust. We argue that the decisive human countermeasure is resilience. We define resilience across three layers: psychological, including emotion regulation, meaning-making, cognitive flexibility; social, including trust, social capital, coordinated response; organizational, including psychological safety, feedback mechanisms, and graceful degradation. We synthesize early evidence that these capacities buffer individual strain, reduce burnout through social support, and lower silent failure in AI-mediated workflows through team norms and risk-responsive governance. We also show that resilience can be cultivated through training that complements rather than substitutes for structural safeguards. By reframing the AI debate around actionable human resilience, this article offers policymakers, educators, and operators a practical lens to preserve human agency and steer responsible adoption.

LG | Sep 26, 2025
In-Context Learning can Perform Continual Learning Like Humans

Liuwang Kang, Fan Wang, Shaoshan Liu et al.

Large language models (LLMs) can adapt to new tasks via in-context learning (ICL) without parameter updates, making them powerful learning engines for fast adaptation. While extensive research has examined ICL as a few-shot learner, whether it can achieve long-term retention and cross-task knowledge accumulation when multiple tasks arrive sequentially remains underexplored. Motivated by human memory studies, we investigate the retention characteristics of ICL in multitask settings and extend it to in-context continual learning (ICCL), where continual learning ability emerges through task scheduling and prompt rearrangement. Experiments on Markov-Chain benchmarks demonstrate that, for specific large language models, ICCL benefits from distributed practice (DP) in a manner analogous to humans, consistently revealing a spacing "sweet spot" for retention. Beyond retention performance, we propose a human-retention similarity metric to quantify how closely a continual-learning (CL) method aligns with human retention dynamics. Using this metric, we show that linear-attention models such as MAMBA and RWKV exhibit particularly human-like retention patterns, despite their retention performance lagging behind that of Transformer-based LLMs. Overall, our results establish ICCL as both cognitively plausible and practically effective, providing an inference-only CL paradigm that mitigates catastrophic forgetting and addresses the stability-plasticity dilemma in conventional CL methods.

LG | Sep 26, 2025
Context and Diversity Matter: The Emergence of In-Context Learning in World Models

Fan Wang, Zhiyuan Chen, Yuxuan Zhong et al.

The capability of predicting environmental dynamics underpins both biological neural systems and general embodied AI in adapting to their surroundings. Yet prevailing approaches rest on static world models that falter when confronted with novel or rare configurations. We investigate in-context environment learning (ICEL), shifting attention from zero-shot performance to the growth and asymptotic limits of the world model. Our contributions are three-fold: (1) we formalize in-context learning of a world model and identify two core mechanisms: environment recognition and environment learning; (2) we derive error upper-bounds for both mechanisms that expose how the mechanisms emerge; and (3) we empirically confirm that distinct ICL mechanisms exist in the world model, and we further investigate how data distribution and model architecture affect ICL in a manner consistent with theory. These findings demonstrate the potential of self-adapting world models and highlight the key factors behind the emergence of ICEL, most notably the necessity of long context and diverse environments.

CY | Dec 17, 2021
Dilemma of the Artificial Intelligence Regulatory Landscape

Weiyue Wu, Shaoshan Liu

As a startup company in the autonomous driving space, we have undergone four years of painful experiences dealing with a broad spectrum of regulatory requirements. Compared to the software industry norm of spending 13% of the overall budget on compliance, we were forced to spend 42% of our budget on compliance. Our situation is not unique and, in a way, reflects the dilemma of the artificial intelligence (AI) regulatory landscape. The root cause is the lack of AI expertise in the legislative and executive branches, leading to a lack of standardization for the industry to follow. In this article, we share our first-hand experiences and advocate for the establishment of an FDA-like agency to regulate AI properly.

RO | Oct 12, 2021
Enabling Level-4 Autonomous Driving on a Single $1k Off-the-Shelf Card

Hsin-Hsuan Sung, Yuanchao Xu, Jiexiong Guan et al.

Autonomous driving is of great interest in both research and industry. High cost has been one of the major roadblocks slowing the development and adoption of autonomous driving in practice. This paper shows, for the first time, that it is possible to run level-4 (i.e., fully autonomous driving) software on a single off-the-shelf card (Jetson AGX Xavier) for less than $1k, an order of magnitude less than state-of-the-art systems, while meeting all latency requirements. The success comes from the resolution of some important issues shared by existing practices through a series of measures and innovations. The study overturns the common perceptions of the computing resources required by level-4 autonomous driving, points out a promising path for the industry to lower the cost, and suggests a number of research opportunities for rethinking the architecture, software design, and optimizations of autonomous driving.

AR | Sep 15, 2021
Dataflow Accelerator Architecture for Autonomous Machine Computing

Shaoshan Liu, Yuhao Zhu, Bo Yu et al.

Commercial autonomous machines are a thriving sector, one that is likely to become the next ubiquitous computing platform after Personal Computers (PCs), cloud computing, and mobile computing. Nevertheless, a suitable computing substrate for autonomous machines is missing, and many companies are forced to develop ad hoc computing solutions that are neither principled nor extensible. By analyzing the demands of autonomous machine computing, this article proposes Dataflow Accelerator Architecture (DAA), a modern instantiation of the classic dataflow principle that matches the characteristics of autonomous machine software.

DC | Jul 2, 2021
4C: A Computation, Communication, and Control Co-Design Framework for CAVs

Liangkai Liu, Shaoshan Liu, Weisong Shi

Connected and autonomous vehicles (CAVs) are promising due to their potential safety and efficiency benefits and have attracted massive investment and interest from government agencies, industry, and academia. With more computing and communication resources available, both vehicles and edge servers are equipped with a set of camera-based vision sensors, also known as Visual IoT (V-IoT) techniques, for sensing and perception. Tremendous efforts have been made to achieve programmable communication, computation, and control. However, these efforts are conducted mainly in silos, limiting the responsiveness and efficiency of handling challenging scenarios in the real world. To improve the end-to-end performance, we envision that future CAVs require the co-design of communication, computation, and control. This paper presents our vision of the end-to-end design principle for CAVs, called 4C, which extends the V-IoT system by providing a unified communication, computation, and control co-design framework. With programmable communications, fine-grained heterogeneous computation, and efficient vehicle controls in 4C, CAVs can handle critical scenarios and achieve energy-efficient autonomous driving. Finally, we present several challenges to achieving the vision of the 4C framework.

RO | Jun 26, 2021
Rise of the Autonomous Machines

Shaoshan Liu, Jean-Luc Gaudiot

After decades of uninterrupted progress and growth, information technology has so evolved that it can be said we are entering the age of autonomous machines, but there exist many roadblocks in the way of making this a reality. In this article, we make a preliminary attempt at recognizing and categorizing the technical and non-technical challenges of autonomous machines; for each of the ten areas we have identified, we review current status, roadblocks, and potential research directions. It is hoped that this will help the community define clear, effective, and more formal development goalposts for the future.

AR | Apr 11, 2021
iELAS: An ELAS-Based Energy-Efficient Accelerator for Real-Time Stereo Matching on FPGA Platform

Tian Gao, Zishen Wan, Yuyang Zhang et al.

Stereo matching is a critical task for robot navigation and autonomous vehicles, providing the depth estimation of surroundings. Among all stereo matching algorithms, Efficient Large-scale Stereo (ELAS) offers one of the best tradeoffs between efficiency and accuracy. However, due to its inherent iterative process and unpredictable memory access pattern, ELAS can only run at 1.5-3 fps on high-end CPUs and struggles to achieve real-time performance on low-power platforms. In this paper, we propose an energy-efficient architecture for real-time ELAS-based stereo matching on an FPGA platform. Moreover, the original computation-intensive and irregular triangulation module is reformulated in a regular manner with point interpolation, which is much more hardware-friendly. Optimizations, including memory management, parallelism, and pipelining, are further utilized to reduce memory footprint and improve throughput. Compared with an Intel i7 CPU and the state-of-the-art CPU+FPGA implementation, our FPGA realization achieves up to 38.4x and 3.32x frame rate improvement, and up to 27.1x and 1.13x energy efficiency improvement, respectively.

ARApr 1, 2021
An Energy-Efficient Quad-Camera Visual System for Autonomous Machines on FPGA Platform

Zishen Wan, Yuyang Zhang, Arijit Raychowdhury et al.

In our past few years of commercial deployment experience, we have identified localization as a critical task in autonomous machine applications, and a great acceleration target. In this paper, based on the observation that the visual frontend is a major performance and energy consumption bottleneck, we present our design and implementation of an energy-efficient hardware architecture for an ORB (Oriented FAST and Rotated BRIEF) based localization system on FPGAs. To support our multi-sensor autonomous machine localization system, we present hardware synchronization, frame-multiplexing, and parallelization techniques, which are integrated in our design. Compared to Nvidia TX1 and Intel i7, our FPGA-based implementation achieves 5.6x and 3.4x speedup, as well as 3.0x and 34.6x power reduction, respectively.

ROMar 30, 2021
The Matter of Time -- A General and Efficient System for Precise Sensor Synchronization in Robotic Computing

Shaoshan Liu, Bo Yu, Yahui Liu et al.

Time synchronization is a critical task in robotic computing such as autonomous driving. In the past few years, as we developed advanced robotic applications, our synchronization system has evolved as well. In this paper, we first introduce the time synchronization problem and explain the challenges of time synchronization, especially in robotic workloads. Summarizing these challenges, we then present a general hardware synchronization system for robotic computing, which delivers high synchronization accuracy while maintaining low energy and resource consumption. The proposed hardware synchronization system is a key building block in our future robotic products.

ROMar 3, 2021
Towards Fully Intelligent Transportation through Infrastructure-Vehicle Cooperative Autonomous Driving: Challenges and Opportunities

Shaoshan Liu, Bo Yu, Jie Tang et al.

The infrastructure-vehicle cooperative autonomous driving approach depends on the cooperation between intelligent roads and intelligent vehicles. This approach is not only safer but also more economical compared to the traditional on-vehicle-only autonomous driving approach. In this paper, we introduce our real-world deployment experiences of cooperative autonomous driving, and delve into the details of new challenges and opportunities. Specifically, based on our progress towards commercial deployment, we follow a three-stage development roadmap of the cooperative autonomous driving approach: infrastructure-augmented autonomous driving (IAAD), infrastructure-guided autonomous driving (IGAD), and infrastructure-planned autonomous driving (IPAD).

AIFeb 16, 2021
Engineering Education in the Age of Autonomous Machines

Shaoshan Liu, Jean-Luc Gaudiot, Hironori Kasahara

In the past few years, we have observed a huge supply-demand gap for autonomous driving engineers. The core problem is that autonomous driving is not one single technology but rather a complex system integrating many technologies, and no single academic department can provide comprehensive education in this field. We advocate creating a cross-disciplinary program that exposes students to technical background in computer science, computer engineering, electrical engineering, and mechanical engineering. On top of this cross-disciplinary technical foundation, a capstone project that gives students hands-on experience of working with a real autonomous vehicle is required to consolidate the technical foundation.

RONov 12, 2020
On Designing Computing Systems for Autonomous Vehicles: a PerceptIn Case Study

Bo Yu, Jie Tang, Shaoshan Liu

PerceptIn develops and commercializes autonomous vehicles for micromobility around the globe. This paper presents a holistic summary of PerceptIn's development and operating experiences. It tells the business tale behind our product and presents the development of the computing system for our vehicles. We illustrate the design decisions made for the computing system and show the advantage of offloading localization workloads onto an FPGA platform.

ROOct 11, 2020
An Energy-Efficient High Definition Map Data Distribution Mechanism for Autonomous Driving

Jinliang Xie, Jie Tang, Shaoshan Liu

Autonomous driving is the promising future of transportation. As one basis for autonomous driving, the High Definition Map (HD map) provides high-precision descriptions of the environment and therefore enables more accurate perception and localization while improving the efficiency of path planning. However, an extremely large amount of map data needs to be transmitted during driving, which poses a great challenge to the real-time and safety requirements of autonomous driving. To this end, we first demonstrate how the existing data distribution mechanism can support HD map services. Furthermore, considering the constraints of vehicle power, vehicle speed, base station bandwidth, etc., we propose an HD map data distribution mechanism on top of Vehicle-to-Infrastructure (V2I) data transmission. With this mechanism, the map provision task is allocated to selected roadside unit (RSU) nodes, which cooperatively transmit proportionate shares of the HD map data. Their cooperative map data loading aims to provide timely HD map data service with optimized in-vehicle energy consumption. Finally, we model the selection of RSU nodes as a partial knapsack problem and propose a greedy strategy-based data transmission algorithm. Experimental results confirm that, within limited energy consumption, the proposed mechanism can ensure HD map data service by coordinating multiple RSUs with the shortest data transmission time.
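The partial knapsack formulation mentioned in this abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the `Rsu` fields, the energy model, and the numbers are hypothetical, and the sketch only shows the generic greedy rule for a fractional knapsack (rank items by value per unit weight, then take whole items until the budget forces a partial one). Here "value" is assumed to be map data delivered and "weight" is assumed to be the vehicle energy spent receiving it.

```python
from dataclasses import dataclass


@dataclass
class Rsu:
    name: str            # hypothetical identifier
    max_data_mb: float   # assumed: map data this RSU can send while the vehicle is in range
    energy_j: float      # assumed: vehicle energy (> 0) needed to receive all of max_data_mb


def allocate_map_data(rsus, energy_budget_j):
    """Greedy fractional knapsack: favor RSUs that deliver the most data per joule."""
    plan = []
    remaining = energy_budget_j
    ranked = sorted(rsus, key=lambda r: r.max_data_mb / r.energy_j, reverse=True)
    for rsu in ranked:
        if remaining <= 0:
            break
        # Take the whole RSU share if the budget allows, otherwise a fraction of it.
        fraction = min(1.0, remaining / rsu.energy_j)
        plan.append((rsu.name, fraction * rsu.max_data_mb))
        remaining -= fraction * rsu.energy_j
    return plan


rsus = [Rsu("rsu_a", 40.0, 20.0), Rsu("rsu_b", 30.0, 10.0), Rsu("rsu_c", 50.0, 50.0)]
plan = allocate_map_data(rsus, energy_budget_j=40.0)
for name, data_mb in plan:
    print(f"{name}: {data_mb:.1f} MB")
```

With these made-up numbers, the two most energy-efficient RSUs send their full share and the third sends only the fraction that the remaining energy budget covers.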

ROSep 13, 2020
A Survey of FPGA-Based Robotic Computing

Zishen Wan, Bo Yu, Thomas Yuang Li et al.

Recent research on robotics has shown significant improvement, spanning from algorithms and mechanics to hardware architectures. Robots, including manipulators, legged robots, drones, and autonomous vehicles, are now widely applied in diverse scenarios. However, the high computation and data complexity of robotic algorithms poses great challenges to their application. On the one hand, CPU platforms are flexible enough to handle multiple robotic tasks, and GPU platforms offer higher computational capacity and easy-to-use development frameworks, so they have been widely adopted in several applications. On the other hand, FPGA-based robotic accelerators are becoming increasingly competitive alternatives, especially in latency-critical and power-limited scenarios. With specially designed hardware logic and algorithm kernels, FPGA-based accelerators can surpass CPUs and GPUs in performance and energy efficiency. In this paper, we give an overview of previous work on FPGA-based robotic accelerators covering different stages of the robotic system pipeline. An analysis of software and hardware optimization techniques and main technical issues is presented, along with some commercial and space applications, to serve as a guide for future work.

CYSep 7, 2020
Critical Business Decision Making for Technology Startups -- A PerceptIn Case Study

Shaoshan Liu

Most business decisions are made with analysis, but some are judgment calls not susceptible to analysis due to time or information constraints. In this article, we present a real-life case study of critical business decision making at PerceptIn, an autonomous driving technology startup. In its early years, PerceptIn had to make a decision on the design of computing systems for its autonomous vehicle products. By providing details on PerceptIn's decision process and the results of the decision, we hope to provide some insights that can be beneficial to entrepreneurs and engineering managers in technology startups.

IVAug 16, 2020
Real-Time Spatio-Temporal LiDAR Point Cloud Compression

Yu Feng, Shaoshan Liu, Yuhao Zhu

Compressing massive LiDAR point clouds in real-time is critical to autonomous machines such as drones and self-driving cars. While most recent prior work has focused on compressing individual point cloud frames, this paper proposes a novel system that effectively compresses a sequence of point clouds. The idea is to exploit both the spatial and temporal redundancies in a sequence of point cloud frames. We first identify a key frame in a point cloud sequence and spatially encode the key frame by iterative plane fitting. We then exploit the fact that consecutive point clouds have large overlaps in the physical space, and thus spatially encoded data can be (re-)used to encode the temporal stream. Temporal encoding by reusing spatial encoding data not only improves the compression rate, but also avoids redundant computations, which significantly improves the compression speed. Experiments show that our compression system achieves 40x to 90x compression rate, significantly higher than MPEG's LiDAR point cloud compression standard, while retaining high end-to-end application accuracies. Meanwhile, our compression system has a compression speed that matches the point cloud generation rate of today's LiDARs and outperforms existing compression systems, enabling real-time point cloud transmission.
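As a rough illustration of one building block named in this abstract, the sketch below fits a single plane to a set of points by least squares and then separates the points the plane explains from those it does not. It is an assumption-laden toy, not the paper's encoder: it fits z = a*x + b*y + c (vertical residuals, so it cannot represent vertical planes), the function names and tolerance are hypothetical, and the iterative segmentation and temporal reuse described above are not reproduced.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(points):
    """Least-squares fit of z = a*x + b*y + c via the normal equations."""
    n = float(len(points))
    sx = sum(x for x, _, _ in points)
    sy = sum(y for _, y, _ in points)
    sz = sum(z for _, _, z in points)
    sxx = sum(x * x for x, _, _ in points)
    syy = sum(y * y for _, y, _ in points)
    sxy = sum(x * y for x, y, _ in points)
    sxz = sum(x * z for x, _, z in points)
    syz = sum(y * z for _, y, z in points)
    lhs = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rhs = [sxz, syz, sz]
    d = det3(lhs)
    if abs(d) < 1e-12:
        raise ValueError("degenerate point set: cannot fit a unique plane")
    coeffs = []
    for col in range(3):
        # Cramer's rule: replace one column with the right-hand side.
        m = [row[:] for row in lhs]
        for r in range(3):
            m[r][col] = rhs[r]
        coeffs.append(det3(m) / d)
    return tuple(coeffs)


def split_by_plane(points, coeffs, tol):
    """Points within tol of the plane could be stored as plane parameters; the rest remain."""
    a, b, c = coeffs
    inliers, outliers = [], []
    for x, y, z in points:
        target = inliers if abs(z - (a * x + b * y + c)) <= tol else outliers
        target.append((x, y, z))
    return inliers, outliers


surface = [(0, 0, 1), (1, 0, 3), (0, 1, 0), (1, 1, 2), (2, 1, 4)]  # samples of z = 2x - y + 1
coeffs = fit_plane(surface)
inliers, outliers = split_by_plane(surface + [(2, 2, 10)], coeffs, tol=0.05)
```

In an iterative scheme, the leftover points would be passed to another round of fitting; how the paper chooses point subsets and thresholds is not stated in the abstract.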

LGMar 14, 2020
CoCoPIE: Making Mobile AI Sweet As PIE -- Compression-Compilation Co-Design Goes a Long Way

Shaoshan Liu, Bin Ren, Xipeng Shen et al.

Assuming hardware is the major constraint for enabling real-time mobile intelligence, the industry has mainly dedicated its efforts to developing specialized hardware accelerators for machine learning and inference. This article challenges that assumption. Drawing on CoCoPIE, a recent real-time AI optimization framework, it maintains that with effective compression-compilation co-design, it is possible to enable real-time artificial intelligence on mainstream end devices without special hardware. CoCoPIE is a software framework that holds numerous records on mobile AI: it is the first framework that supports all main kinds of DNNs, from CNNs to RNNs, transformers, language models, and so on; it is the fastest DNN pruning and acceleration framework, up to 180X faster than current DNN pruning on other frameworks such as TensorFlow-Lite; it enables many representative AI applications, previously regarded as possible only with special hardware support, to run in real-time on off-the-shelf mobile devices; and it makes off-the-shelf mobile devices outperform a number of representative ASIC and FPGA solutions in terms of energy efficiency and/or performance.

ROJan 22, 2020
Autonomous Last-mile Delivery Vehicles in Complex Traffic Environments

Bai Li, Shaoshan Liu, Jie Tang et al.

E-commerce has evolved with the digital technology revolution over the years. Last-mile logistics service contributes a significant part of the e-commerce experience. In contrast to traditional last-mile logistics services, smart logistics service with autonomous driving technologies provides a promising solution to reduce delivery cost and improve efficiency. However, the traffic conditions in complex traffic environments, such as those in China, are more challenging than those in well-developed countries. Many types of moving objects (such as pedestrians, bicycles, electric bicycles, and motorcycles) share the road with autonomous vehicles, and their behaviors are not easy to track and predict. This paper introduces a technical solution from JD.com, a leading e-commerce company in China, to autonomous last-mile delivery in complex traffic environments. Concretely, the methodologies in each module of our autonomous vehicles are presented, together with safety guarantee strategies. Up to this point, JD.com has deployed more than 300 self-driving vehicles for trial operations in tens of provinces of China, with an accumulated 715,819 miles and up to millions of on-road testing hours.

ROAug 12, 2019
Enabling Commercial Autonomous Space Robotic Explorers

Thomas Yuang Li, Shaoshan Liu

In contrast to manned missions, the application of autonomous robots to space exploration missions reduces safety concerns while extending the exploration distance, since return transportation is not necessary for robotic missions. In addition, the employment of robots in these missions also decreases mission complexity and cost because there is no need for onboard life support systems: robots can withstand and operate in harsh conditions, for instance, extreme temperature, pressure, and radiation, where humans cannot survive. In this article, we introduce environments on Mars, review the existing autonomous driving techniques deployed on Earth, and explore the technologies required to enable future commercial autonomous space robotic explorers. Last but not least, we also present one of the urgent technical challenges for autonomous space explorers, namely onboard computing power.

IVMay 7, 2019
PI-BA: Bundle Adjustment Acceleration on Embedded FPGAs with Co-observation Optimization

Shuzhen Qin, Qiang Liu, Bo Yu et al.

Bundle adjustment (BA) is a fundamental optimization technique used in many crucial applications, including 3D scene reconstruction, robotic localization, camera calibration, autonomous driving, space exploration, and street view map generation. Essentially, BA is a joint non-linear optimization problem, and one which can consume a significant amount of time and power, especially for large optimization problems. Previous approaches to optimizing BA performance rely heavily on parallel processing or distributed computing, which trade higher power consumption for higher performance. In this paper we propose π-BA, the first hardware-software co-designed BA engine on an embedded FPGA-SoC that exploits custom hardware for higher performance and power efficiency. Specifically, based on our key observation that not all points appear on all images in a BA problem, we designed and implemented a Co-Observation Optimization technique to accelerate BA operations with optimized usage of memory and computation resources. Experimental results confirm that π-BA outperforms the existing software implementations in terms of performance and power consumption.

CVMar 6, 2018
Trifo-VIO: Robust and Efficient Stereo Visual Inertial Odometry using Points and Lines

Feng Zheng, Grace Tsai, Zhe Zhang et al.

In this paper, we present the Trifo Visual Inertial Odometry (Trifo-VIO), a tightly-coupled filtering-based stereo VIO system using both points and lines. Line features help improve system robustness in challenging scenarios when point features cannot be reliably detected or tracked, e.g., low-texture environments or lighting changes. In addition, we propose a novel lightweight filtering-based loop closing technique to reduce accumulated drift without global bundle adjustment or pose graph optimization. We formulate loop closure as EKF updates to optimally relocate the current sliding window maintained by the filter to past keyframes. We also present the Trifo Ironsides dataset, a new visual-inertial dataset, featuring high-quality synchronized stereo camera and IMU data from the Ironsides sensor [3] with various motion types and textures and millimeter-accuracy groundtruth. To validate the performance of the proposed system, we conduct an extensive comparison with state-of-the-art approaches (OKVIS, VINS-MONO and S-MSCKF) using both the public EuRoC dataset and the Trifo Ironsides dataset.

ROFeb 23, 2018
PIRT: A Runtime Framework to Enable Energy-Efficient Real-Time Robotic Applications on Heterogeneous Architectures

Liu Liu, Shaoshan Liu, Zhe Zhang et al.

Enabling full robotic workloads with diverse behaviors on mobile systems with stringent resource and energy constraints remains a challenge. In recent years, attempts have been made to deploy single-accelerator-based computing platforms (such as GPU, DSP, or FPGA) to address this challenge, but with little success. The core problem is two-fold: firstly, different robotic tasks require different accelerators, and secondly, managing multiple accelerators simultaneously is overwhelming for developers. In this paper, we propose PIRT, the first robotic runtime framework to efficiently manage dynamic task executions on mobile systems with multiple accelerators as well as on the cloud to achieve better performance and energy savings. With PIRT, we enable a robot to simultaneously perform autonomous navigation with 25 FPS of localization, obstacle detection with 3 FPS, route planning, large map generation, and scene understanding, traveling at a max speed of 5 miles per hour, all within an 11W computing power envelope.

CYFeb 22, 2018
Teaching Autonomous Driving Using a Modular and Integrated Approach

Jie Tang, Shaoshan Liu, Songwen Pei et al.

Autonomous driving is not one single technology but rather a complex system integrating many technologies, which means that teaching autonomous driving is a challenging task. Indeed, most existing autonomous driving classes focus on one of the technologies involved. This not only fails to provide comprehensive coverage, but also sets a high entry barrier for students with different technology backgrounds. In this paper, we present a modular, integrated approach to teaching autonomous driving. Specifically, we organize the technologies used in autonomous driving into modules. These are described in the textbook we have developed as well as in a series of multimedia online lectures designed to provide a technical overview of each module. Then, once the students have understood these modules, the experimental platforms for integration we have developed allow the students to fully understand how the modules interact with each other. To verify this teaching approach, we present three case studies: an introductory class on autonomous driving for students with only a basic technology background; a new session in an existing embedded systems class to demonstrate how embedded system technologies can be applied to autonomous driving; and an industry professional training session to quickly bring experienced engineers up to speed on autonomous driving. The results show that students can maintain a high interest level and make great progress by starting with familiar concepts before moving on to other modules.

ROOct 18, 2017
FPGA-based ORB Feature Extraction for Real-Time Visual SLAM

Weikang Fang, Yanjun Zhang, Bo Yu et al.

Simultaneous Localization And Mapping (SLAM) is the problem of constructing or updating a map of an unknown environment while simultaneously keeping track of an agent's location within it. How to enable SLAM robustly and durably on mobile, or even IoT-grade, devices is the main challenge faced by the industry today. The main problems we need to address are: (1) how to accelerate the SLAM pipeline to meet real-time requirements; and (2) how to reduce SLAM energy consumption to extend battery life. After delving into the problem, we found that feature extraction is indeed the bottleneck of performance and energy consumption. Hence, in this paper, we design, implement, and evaluate a hardware ORB feature extractor and prove that our design strikes a great balance between performance and energy consumption compared with ARM Krait and Intel Core i5.

ROOct 2, 2017
PIRVS: An Advanced Visual-Inertial SLAM System with Flexible Sensor Fusion and Hardware Co-Design

Zhe Zhang, Shaoshan Liu, Grace Tsai et al.

In this paper, we present the PerceptIn Robotics Vision System (PIRVS), visual-inertial computing hardware with an embedded simultaneous localization and mapping (SLAM) algorithm. The PIRVS hardware is equipped with a multi-core processor, a global-shutter stereo camera, and an IMU with precise hardware synchronization. The PIRVS software features a novel and flexible sensor fusion approach that not only tightly integrates visual measurements with inertial measurements but also loosely couples with additional sensor modalities. It runs in real-time on both PC and the PIRVS hardware. We perform a thorough evaluation of the proposed system using multiple public visual-inertial datasets. Experimental results demonstrate that our system reaches accuracy comparable to state-of-the-art visual-inertial algorithms on PC, while being more efficient on the PIRVS hardware.

DCMay 31, 2017
Distributed Simulation Platform for Autonomous Driving

Jie Tang, Shaoshan Liu, Chao Wang et al.

Autonomous vehicle safety and reliability are the paramount requirements when developing autonomous vehicles. These requirements are guaranteed by massive functional and performance tests. Conducting these tests on real vehicles is extremely expensive and time consuming, and thus it is imperative to develop a simulation platform to perform these tasks. For simulation, we can utilize the Robot Operating System (ROS) for data playback to test newly developed algorithms. However, due to the massive amount of simulation data, performing simulation on single machines is not practical. Hence, a high-performance distributed simulation platform is a critical piece in autonomous driving development. In this paper we present our experiences of building a production distributed autonomous driving simulation platform. This platform is built upon the Spark distributed framework, for distributed computing management, and ROS, for data playback simulations.

ROMay 31, 2017
Real-Time Robot Localization, Vision, and Speech Recognition on Nvidia Jetson TX1

Jie Tang, Yong Ren, Shaoshan Liu

Robotics systems are complex, often consisting of basic services including SLAM for localization and mapping, Convolutional Neural Networks for scene understanding, and Speech Recognition for user interaction. Meanwhile, robots are mobile and usually have tight energy constraints, so integrating these services onto an embedded platform with around 10 W of power consumption is critical to the proliferation of mobile robots. In this paper, we present a case study on integrating real-time localization, vision, and speech recognition services on a mobile SoC, Nvidia Jetson TX1, within a power envelope of about 10 W. In addition, we explore whether offloading some of the services to a cloud platform can lead to further energy efficiency while meeting the real-time requirements.

DCApr 16, 2017
Learn-Memorize-Recall-Reduce A Robotic Cloud Computing Paradigm

Shaoshan Liu, Bolin Ding, Jie Tang et al.

The rise of robotic applications has led to the generation of a huge volume of unstructured data, whereas the current cloud infrastructure was designed to process limited amounts of structured data. To address this problem, we propose a learn-memorize-recall-reduce paradigm for robotic cloud computing. The learning stage converts incoming unstructured data into structured data; the memorization stage provides effective storage for the massive amount of data; the recall stage provides efficient means to retrieve the raw data; while the reduction stage provides means to make sense of this massive amount of unstructured data with limited computing resources.

LGApr 12, 2017
Enabling Embedded Inference Engine with ARM Compute Library: A Case Study

Dawei Sun, Shaoshan Liu, Jean-Luc Gaudiot

When you need to enable deep learning on low-cost embedded SoCs, is it better to port an existing deep learning framework or should you build one from scratch? In this paper, we share our practical experiences of building an embedded inference engine using ARM Compute Library (ACL). The results show that, contrary to conventional wisdom, for simple models, it takes much less development time to build an inference engine from scratch compared to porting existing frameworks. In addition, by utilizing ACL, we managed to build an inference engine that outperforms TensorFlow by 25%. Our conclusion is that, on embedded devices, we most likely will use very simple deep learning models for inference, and with well-developed building blocks such as ACL, it may be better in both performance and development time to build the engine from scratch.

DCApr 10, 2017
Implementing a Cloud Platform for Autonomous Driving

Shaoshan Liu, Jie Tang, Chao Wang et al.

Autonomous driving clouds provide essential services to support autonomous vehicles. Today these services include, but are not limited to, distributed simulation tests for new algorithm deployment, offline deep learning model training, and High-Definition (HD) map generation. These services require infrastructure support including distributed computing, distributed storage, and heterogeneous computing. In this paper, we present the details of how we implement a unified autonomous driving cloud infrastructure, and how we support these services on top of this infrastructure.

ARFeb 7, 2017
CAAD: Computer Architecture for Autonomous Driving

Shaoshan Liu, Jie Tang, Zhe Zhang et al.

We describe the computing tasks involved in autonomous driving and examine existing autonomous driving computing platform implementations. To enable autonomous driving, the computing stack needs to simultaneously provide high performance, low power consumption, and low thermal dissipation, at low cost. We discuss possible approaches to design computing platforms that will meet these needs.