20.8CVMay 28
Geodesics with Unified Tangent-constrained Priors and Curvature RegularizationChong Di, Li Liu, Jinglin Zhang et al.
Curvature-penalized geodesic models have proven their effectiveness in image segmentation by computing globally optimal curves. Unfortunately, these models remain susceptible to shortcuts when delineating objects with complex shapes and image intensity distributions, as they lack mechanisms to enforce shape-aware tangent constraints. To address this limitation, we propose a unified geodesic framework that integrates tangent-constrained priors with curvature penalization. The key idea is to formulate tangent admissibility directly within the orientation-lifted space, where path tangents are restricted to spatially varying angular sectors derived from intrinsic shape representatives (ISR) such as skeletons or interior landmarks. This formulation gives rise to a family of tangent-constrained Finslerian metrics, extending the classical curvature-penalized geodesic models while enforcing mandatory tangent constraints. The resulting Hamilton-Jacobi-Bellman (HJB) partial differential equations (PDEs) admit efficient numerical solutions via variants of the fast marching method, preserving the single-pass computational complexity. Experiments on synthetic, natural, and medical images demonstrate that the proposed geodesic framework indeed improves robustness against weak boundaries and topological shortcuts, yielding segmentation results with enhanced shape fidelity compared to existing geodesic models.
NIAug 26, 2024
Towards Battery-Free Wireless Sensing via Radio-Frequency Energy HarvestingTao Ni, Zehua Sun, Mingda Han et al.
Diverse Wi-Fi-based wireless applications have been proposed, ranging from daily activity recognition to vital sign monitoring. Despite their remarkable sensing accuracy, the high energy consumption and the requirement for customized hardware modification hinder the wide deployment of the existing sensing solutions. In this paper, we propose REHSense, an energy-efficient wireless sensing solution based on Radio-Frequency (RF) energy harvesting. Instead of relying on a power-hungry Wi-Fi receiver, REHSense leverages an RF energy harvester as the sensor and utilizes the voltage signals harvested from the ambient Wi-Fi signals to enable simultaneous context sensing and energy harvesting. We design and implement REHSense using a commercial-off-the-shelf (COTS) RF energy harvester. Extensive evaluation of three fine-grained wireless sensing tasks (i.e., respiration monitoring, human activity, and hand gesture recognition) shows that REHSense can achieve comparable sensing accuracy with conventional Wi-Fi-based solutions while adapting to different sensing environments, reducing the power consumption by 98.7% and harvesting up to 4.5mW of power from RF energy.
54.9ARApr 11
Taming Asynchronous CPU-GPU Coupling for Frequency-aware Latency Estimation on Mobile EdgeJiesong Chen, Jun You, Zhidan Liu et al.
Precise estimation of model inference latency is crucial for time-critical mobile edge applications, enabling devices to calculate latency margins against deadlines and trade them for enhanced model performance or resource savings. However, the ubiquity of Dynamic Voltage and Frequency Scaling (DVFS) renders traditional static profiling invalid in real-world deployments, as inference latency fluctuates with varying processor (CPU and GPU) frequencies. While extensive profiling across frequency combinations is theoretically possible, it is prohibitively expensive, particularly for emerging Small Language Models (SLMs), where variable context lengths explode the profiling up to days. We observe that simple analytic scaling fails to predict these fluctuations due to the complex asynchronous coupling between CPU (kernel launching) and GPU (execution). In this paper, we introduce FLAME to accurately estimate inference latency across frequency combinations. It features a novel layer-wise modeling that quantifies the overlapping parallelism and then aggregates dynamic pipeline bubbles caused by asynchronous processor interactions when extending to the full model. This bottom-up approach ensures generalizability across diverse models from DNNs to SLMs, and its precise modeling allows for profiling a sparse subset of samples, cutting DNN profiling from hours to minutes and SLM profiling from days to mere minutes, while maintaining small estimation errors across frequencies. We further showcase FLAME's utility in a deadline-aware DVFS, outperforming the state-of-the-art approach in both power efficiency and latency guarantees.
76.8HCMar 30
NeuroPath: Practically Adopting Motor Imagery Decoding through EEG SignalsJiani Cao, Kun Wang, Yang Liu et al.
Motor Imagery (MI) is an emerging Brain-Computer Interface (BCI) paradigm where a person imagines body movements without physical action. By decoding scalp-recorded electroencephalography (EEG) signals, BCIs establish direct communication to control external devices, offering significant potential in prosthetics, rehabilitation, and human-computer interaction. However, existing solutions remain difficult to deploy. (i) Most employ independent, opaque models for each MI task, lacking a unified architectural foundation. Consequently, these models are trained in isolation, failing to learn robust representations from diverse datasets, resulting in modest performance. (ii) They primarily adopt fixed sensor deployment, whereas real-world setups vary in electrode number and placement, causing models to fail across configurations. (iii) Performance degrades sharply under low-SNR conditions typical of consumer-grade EEG. To address these challenges, we present NeuroPath, a neural architecture for robust MI decoding. NeuroPath takes inspiration from the brain's signal pathway from cortex to scalp, utilizing a deep neural architecture with specialized modules for signal filtering, spatial representation learning, and feature classification, enabling unified decoding. To handle varying electrode configurations, we introduce a spatially aware graph adapter accommodating different electrode numbers and placements. To enhance robustness under low-SNR conditions, NeuroPath incorporates multimodal auxiliary training to refine EEG representations and stabilize performance on noisy real-world data. Evaluations on three consumer-grade and three medical-grade public datasets demonstrate that NeuroPath achieves superior performance.
LGJan 30, 2024
SwapNet: Efficient Swapping for DNN Inference on Edge AI Devices Beyond the Memory BudgetKun Wang, Jiani Cao, Zimu Zhou et al.
Executing deep neural networks (DNNs) on edge artificial intelligence (AI) devices enables various autonomous mobile computing applications. However, the memory budget of edge AI devices restricts the number and complexity of DNNs allowed in such applications. Existing solutions, such as model compression or cloud offloading, reduce the memory footprint of DNN inference at the cost of decreased model accuracy or autonomy. To avoid these drawbacks, we divide DNN into blocks and swap them in and out in order, such that large DNNs can execute within a small memory budget. Nevertheless, naive swapping on edge AI devices induces significant delays due to the redundant memory operations in the DNN development ecosystem for edge AI devices. To this end, we develop SwapNet, an efficient DNN block swapping middleware for edge AI devices. We systematically eliminate the unnecessary memory operations during block swapping while retaining compatible with the deep learning frameworks, GPU backends, and hardware architectures of edge AI devices. We further showcase the utility of SwapNet via a multi-DNN scheduling scheme. Evaluations on eleven DNN inference tasks in three applications demonstrate that SwapNet achieves almost the same latency as the case with sufficient memory even when DNNs demand 2.32x to 5.81x memory beyond the available budget. The design of SwapNet also provides novel and feasible insights for deploying large language models (LLMs) on edge AI devices in the future.
LGJan 17, 2024
DTMM: Deploying TinyML Models on Extremely Weak IoT Devices with PruningLixiang Han, Zhen Xiao, Zhenjiang Li
DTMM is a library designed for efficient deployment and execution of machine learning models on weak IoT devices such as microcontroller units (MCUs). The motivation for designing DTMM comes from the emerging field of tiny machine learning (TinyML), which explores extending the reach of machine learning to many low-end IoT devices to achieve ubiquitous intelligence. Due to the weak capability of embedded devices, it is necessary to compress models by pruning enough weights before deploying. Although pruning has been studied extensively on many computing platforms, two key issues with pruning methods are exacerbated on MCUs: models need to be deeply compressed without significantly compromising accuracy, and they should perform efficiently after pruning. Current solutions only achieve one of these objectives, but not both. In this paper, we find that pruned models have great potential for efficient deployment and execution on MCUs. Therefore, we propose DTMM with pruning unit selection, pre-execution pruning optimizations, runtime acceleration, and post-execution low-cost storage to fill the gap for efficient deployment and execution of pruned models. It can be integrated into commercial ML frameworks for practical deployment, and a prototype system has been developed. Extensive experiments on various models show promising gains compared to state-of-the-art methods.
SEDec 12, 2024
EmbedGenius: Towards Automated Software Development for Generic Embedded IoT SystemsHuanqi Yang, Mingzhe Li, Mingda Han et al.
Embedded IoT system development is crucial for enabling seamless connectivity and functionality across a wide range of applications. However, such a complex process requires cross-domain knowledge of hardware and software and hence often necessitates direct developer involvement, making it labor-intensive, time-consuming, and error-prone. To address this challenge, this paper introduces EmbedGenius, the first fully automated software development platform for general-purpose embedded IoT systems. The key idea is to leverage the reasoning ability of Large Language Models (LLMs) and embedded system expertise to automate the hardware-in-the-loop development process. The main methods include a component-aware library resolution method for addressing hardware dependencies, a library knowledge generation method that injects utility domain knowledge into LLMs, and an auto-programming method that ensures successful deployment. We evaluate EmbedGenius's performance across 71 modules and four mainstream embedded development platforms with over 350 IoT tasks. Experimental results show that EmbedGenius can generate codes with an accuracy of 95.7% and complete tasks with a success rate of 86.5%, surpassing human-in-the-loop baselines by 15.6%--37.7% and 25.5%--53.4%, respectively. We also show EmbedGenius's potential through case studies in environmental monitoring and remote control systems development.
IVJun 14, 2024
A Deep Learning System for Rapid and Accurate Warning of Acute Aortic Syndrome on Non-contrast CT in ChinaYujian Hu, Yilang Xiang, Yan-Jie Zhou et al.
The accurate and timely diagnosis of acute aortic syndromes (AAS) in patients presenting with acute chest pain remains a clinical challenge. Aortic CT angiography (CTA) is the imaging protocol of choice in patients with suspected AAS. However, due to economic and workflow constraints in China, the majority of suspected patients initially undergo non-contrast CT as the initial imaging testing, and CTA is reserved for those at higher risk. In this work, we present an artificial intelligence-based warning system, iAorta, using non-contrast CT for AAS identification in China, which demonstrates remarkably high accuracy and provides clinicians with interpretable warnings. iAorta was evaluated through a comprehensive step-wise study. In the multi-center retrospective study (n = 20,750), iAorta achieved a mean area under the receiver operating curve (AUC) of 0.958 (95% CI 0.950-0.967). In the large-scale real-world study (n = 137,525), iAorta demonstrated consistently high performance across various non-contrast CT protocols, achieving a sensitivity of 0.913-0.942 and a specificity of 0.991-0.993. In the prospective comparative study (n = 13,846), iAorta demonstrated the capability to significantly shorten the time to correct diagnostic pathway. For the prospective pilot deployment that we conducted, iAorta correctly identified 21 out of 22 patients with AAS among 15,584 consecutive patients presenting with acute chest pain and under non-contrast CT protocol in the emergency department (ED) and enabled the average diagnostic time of these 21 AAS positive patients to be 102.1 (75-133) mins. Last, the iAorta can help avoid delayed or missed diagnosis of AAS in settings where non-contrast CT remains the unavoidable the initial or only imaging test in resource-constrained regions and in patients who cannot or did not receive intravenous contrast.
IVOct 13, 2021
Breaking the Dilemma of Medical Image-to-image TranslationLingke Kong, Chenyu Lian, Detian Huang et al.
Supervised Pix2Pix and unsupervised Cycle-consistency are two modes that dominate the field of medical image-to-image translation. However, neither modes are ideal. The Pix2Pix mode has excellent performance. But it requires paired and well pixel-wise aligned images, which may not always be achievable due to respiratory motion or anatomy change between times that paired images are acquired. The Cycle-consistency mode is less stringent with training data and works well on unpaired or misaligned images. But its performance may not be optimal. In order to break the dilemma of the existing modes, we propose a new unsupervised mode called RegGAN for medical image-to-image translation. It is based on the theory of "loss-correction". In RegGAN, the misaligned target images are considered as noisy labels and the generator is trained with an additional registration network to fit the misaligned noise distribution adaptively. The goal is to search for the common optimal solution to both image-to-image translation and registration tasks. We incorporated RegGAN into a few state-of-the-art image-to-image translation methods and demonstrated that RegGAN could be easily combined with these methods to improve their performances. Such as a simple CycleGAN in our mode surpasses latest NICEGAN even though using less network parameters. Based on our results, RegGAN outperformed both Pix2Pix on aligned data and Cycle-consistency on misaligned or unpaired data. RegGAN is insensitive to noises which makes it a better choice for a wide range of scenarios, especially for medical image-to-image translation tasks in which well pixel-wise aligned data are not available
NIFeb 22, 2021
InaudibleKey: Generic Inaudible Acoustic Signal based Key Agreement Protocol for Mobile DevicesWeitao Xu, Zhenjiang Li, Wanli Xue et al.
Secure Device-to-Device (D2D) communication is becoming increasingly important with the ever-growing number of Internet-of-Things (IoT) devices in our daily life. To achieve secure D2D communication, the key agreement between different IoT devices without any prior knowledge is becoming desirable. Although various approaches have been proposed in the literature, they suffer from a number of limitations, such as low key generation rate and short pairing distance. In this paper, we present InaudibleKey, an inaudible acoustic signal-based key generation protocol for mobile devices. Based on acoustic channel reciprocity, InaudibleKey exploits the acoustic channel frequency response of two legitimate devices as a common secret to generating keys. InaudibleKey employs several novel technologies to significantly improve its performance. We conduct extensive experiments to evaluate the proposed system in different real environments. Compared to state-of-the-art works, InaudibleKey improves key generation rate by 3-145 times, extends pairing distance by 3.2-44 times, and reduces information reconciliation counts by 2.5-16 times. Security analysis demonstrates that InaudibleKey is resilient to a number of malicious attacks. We also implement InaudibleKey on modern smartphones and resource-limited IoT devices. Results show that it is energy-efficient and can run on both powerful and resource-limited IoT devices without incurring excessive resource consumption.
CRDec 4, 2018
Continuous User Authentication by Contactless Wireless SensingFei Wang, Zhenjiang Li, Jinsong Han
This paper presents BodyPIN, which is a continuous user authentication system by contactless wireless sensing using commodity Wi-Fi. BodyPIN can track the current user's legal identity throughout a computer system's execution. In case the authentication fails, the consequent accesses will be denied to protect the system. The recent rich wireless-based user identification designs cannot be applied to BodyPIN directly, because they identify a user's various activities, rather than the user herself. The enforced to be performed activities can thus interrupt the user's operations on the system, highly inconvenient and not user-friendly. In this paper, we leverage the bio-electromagnetics domain human model for quantifying the impact of human body on the bypassing Wi-Fi signals and deriving the component that indicates a user's identity. Then we extract suitable Wi-Fi signal features to fully represent such an identity component, based on which we fulfill the continuous user authentication design. We implement a BodyPIN prototype by commodity Wi-Fi NICs without any extra or dedicated wireless hardware. We show that BodyPIN achieves promising authentication performances, which is also lightweight and robust under various practical settings.