38.7 · SP · May 31
SweetFruit: A Two-Stage Mobile Sensing System for Real-Time Fruit Sugar Estimation
Mark Cardamis, Yanxiang Wang, Chun Tung Chou et al.
Accurate prediction of fruit sugar content is essential for quality control and market valuation in agriculture. Conventional measurement techniques rely on destructive, time-consuming processes (e.g., juicing and refractometry) or on direct-contact instruments, both of which hinder high-throughput operations. This paper introduces SweetFruit, a mobile two-stage system that uses low-cost sensors to estimate fruit sugar content without contact. In Stage 1, a lightweight 3D deep learning model (SF-PointNet) classifies each fruit as high- or low-sugar from point clouds captured by a Time-of-Flight (ToF) depth camera. In Stage 2, a regression network (SF-Net) predicts the fruit's Brix value from the measurements of a compact 18-channel near-infrared (NIR) spectrometer. The system relies on simple off-the-shelf sensors (AS7265x NIR and Arducam ToF) and efficient processing pipelines for real-time execution on embedded platforms. Experiments on green 'Granny Smith' apples and strawberries demonstrate the system's effectiveness. Stage 1 achieves over 90% classification accuracy, enabling rapid prescreening, while Stage 2 estimates sugar content with a root mean square error (RMSE) of 0.57 °Brix, 22% lower than the error obtained with NIR sensing alone. SweetFruit offers a scalable, field-ready solution for rapid fruit quality screening and shows the benefits of task-specific multimodal sensing in mobile agricultural applications.
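The abstract names the two stages but does not say exactly how their outputs are combined. The sketch below is one plausible wiring, assuming (hypothetically) that the Stage 1 label is appended to the 18 NIR channel readings as an extra input to the regressor. `TwoStagePipeline`, `predict`, and `rmse` are illustrative names, and the lambdas in the demo are toy stand-ins for the trained SF-PointNet and SF-Net models.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Callable, List, Sequence, Tuple

NUM_NIR_CHANNELS = 18  # the AS7265x sensor reports 18 spectral channels

Point = Tuple[float, float, float]  # one (x, y, z) sample from the ToF depth camera
# Hypothetical model interfaces standing in for the trained networks:
Classifier = Callable[[Sequence[Point]], int]   # SF-PointNet stand-in: 0 = low sugar, 1 = high sugar
Regressor = Callable[[Sequence[float]], float]  # SF-Net stand-in: feature vector -> Brix


@dataclass
class TwoStagePipeline:
    classify: Classifier
    regress: Regressor

    def predict(self, point_cloud: Sequence[Point], nir: Sequence[float]) -> Tuple[int, float]:
        """Run Stage 1 on the point cloud, then Stage 2 on the NIR reading."""
        if len(nir) != NUM_NIR_CHANNELS:
            raise ValueError(f"expected {NUM_NIR_CHANNELS} NIR channels, got {len(nir)}")
        sugar_class = self.classify(point_cloud)
        # Assumption: the Stage 1 label is appended to the NIR features as a 19th input.
        features: List[float] = list(nir) + [float(sugar_class)]
        return sugar_class, self.regress(features)


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean square error, the metric used to report Stage 2 accuracy."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


if __name__ == "__main__":
    # Toy stand-ins, not trained models.
    pipe = TwoStagePipeline(classify=lambda cloud: 1, regress=lambda f: 10.0 + 2.0 * f[-1])
    print(pipe.predict([(0.0, 0.0, 0.3)], [0.0] * NUM_NIR_CHANNELS))
    print(rmse([12.0, 11.0], [12.5, 11.0]))
```

Feeding the class label into the regressor is only one option; Stage 1 could instead gate which regressor runs, or simply screen fruit before the slower NIR measurement. The abstract does not specify which design SweetFruit uses.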
CV · Nov 24, 2025
Scale What Counts, Mask What Matters: Evaluating Foundation Models for Zero-Shot Cross-Domain Wi-Fi Sensing
Cheng Jiang, Yihe Yan, Yanxiang Wang et al.
While Wi-Fi sensing offers a compelling, privacy-preserving alternative to cameras, its practical utility has been fundamentally undermined by a lack of robustness across domains. Models trained in one setup fail to generalize to new environments, hardware, or users, a critical "domain shift" problem exacerbated by modest, fragmented public datasets. We move beyond this limited paradigm and adopt a foundation model approach, applying Masked Autoencoder (MAE) style pretraining to the largest and most heterogeneous collection of Wi-Fi CSI datasets assembled to date. Our study pretrains and evaluates models on over 1.3 million samples extracted from 14 datasets, collected with 4 distinct devices across the 2.4/5/6 GHz bands and bandwidths from 20 to 160 MHz. This large-scale evaluation is the first to systematically disentangle the impact of data diversity from that of model capacity on cross-domain performance, and the results establish scaling trends for Wi-Fi CSI sensing. First, performance on unseen domains improves log-linearly as the amount of pretraining data increases, suggesting that data scale and diversity are key to domain generalization. Second, at the current data volume, larger models provide only marginal cross-domain gains, indicating that data, rather than model capacity, is the current bottleneck for Wi-Fi sensing generalization. Finally, we conduct a series of cross-domain evaluations on human activity recognition, human gesture recognition, and user identification tasks, where large-scale pretraining improves cross-domain accuracy by 2.2% to 15.7% over the supervised learning baseline. Overall, our findings offer direction for designing future Wi-Fi sensing systems that are robust enough for real-world deployment.
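MAE-style pretraining hides most of the input and trains a network to reconstruct the hidden part. The sketch below shows only the masking and loss side of that recipe applied to a CSI amplitude matrix. It assumes (hypothetically) that CSI is tiled into non-overlapping time × subcarrier patches. The patch size and the 0.75 mask ratio in the example are illustrative (0.75 is the default from the original image MAE), not values reported in this study. `patchify`, `random_masking`, and `masked_mse` are hypothetical helper names, and the encoder and decoder networks are omitted.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]  # CSI amplitudes: rows = time steps, columns = subcarriers


def patchify(csi: Matrix, patch_t: int, patch_f: int) -> List[List[float]]:
    """Split a (T x F) CSI matrix into non-overlapping patch_t x patch_f tiles, each flattened row-major."""
    t_len, f_len = len(csi), len(csi[0])
    if t_len % patch_t or f_len % patch_f:
        raise ValueError("CSI dimensions must be divisible by the patch size")
    patches = []
    for t0 in range(0, t_len, patch_t):
        for f0 in range(0, f_len, patch_f):
            patches.append([csi[t][f] for t in range(t0, t0 + patch_t) for f in range(f0, f0 + patch_f)])
    return patches


def random_masking(num_patches: int, mask_ratio: float, rng: random.Random) -> Tuple[List[int], List[int]]:
    """MAE-style masking: shuffle patch indices and keep the first (1 - mask_ratio) fraction visible."""
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    order = list(range(num_patches))
    rng.shuffle(order)
    num_keep = int(num_patches * (1.0 - mask_ratio))
    return sorted(order[:num_keep]), sorted(order[num_keep:])


def masked_mse(pred: Sequence[Sequence[float]], target: Sequence[Sequence[float]], masked: Sequence[int]) -> float:
    """Reconstruction loss averaged over the masked patches only, as in MAE."""
    total, count = 0.0, 0
    for i in masked:
        for p, t in zip(pred[i], target[i]):
            total += (p - t) ** 2
            count += 1
    if count == 0:
        raise ValueError("no masked values to score")
    return total / count


if __name__ == "__main__":
    # Toy 8 x 4 CSI matrix (8 time steps, 4 subcarriers) tiled into 2 x 2 patches.
    csi = [[float(t * 4 + f) for f in range(4)] for t in range(8)]
    patches = patchify(csi, 2, 2)
    visible, masked = random_masking(len(patches), 0.75, random.Random(0))
    print(len(patches), visible, masked)
```

Only the visible patches would be fed to the encoder, so a high mask ratio also cuts pretraining compute; the loss ignores visible patches because reconstructing them is trivial.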