SPMay 31
SweetFruit: A Two-Stage Mobile Sensing System for Real-Time Fruit Sugar EstimationMark Cardamis, Yanxiang Wang, Chun Tung Chou et al.
Accurate prediction of fruit sugar content is essential for quality control and market valuation in agriculture. Conventional measurement techniques rely on destructive, time-consuming processes (e.g., juicing and refractometry) or direct contact instruments, which hinder high-throughput operations. This paper introduces SweetFruit, a mobile two-stage system that leverages low-cost sensors to estimate fruit sugar content without contact. In Stage 1, we implement a lightweight 3D deep learning model (SF-PointNet) that uses point clouds from a Time-of-Flight (ToF) depth camera to classify fruit as high or low sugar. In Stage 2, a regression network (SF-Net) predicts the fruit's Brix value using measurements from a compact 18-channel near-infrared (NIR) spectrometer. The system uses simple off-the-shelf sensors (AS7265x NIR and Arducam ToF) with efficient processing pipelines for real-time execution on embedded platforms. Experiments on green 'Granny Smith' apples and strawberries demonstrate the system's effectiveness. Stage 1 achieves over 90% classification accuracy, enabling rapid prescreening, while Stage 2 delivers precise sugar estimates, with a root mean square error (RMSE) of 0.57 Brix, reducing error by 22% compared to using NIR sensing alone. SweetFruit offers a scalable, field-ready solution for rapid fruit quality screening, showcasing the benefits of task-specific multimodal sensing in mobile agricultural applications.
CVAug 3, 2024
Multiple Contexts and Frequencies Aggregation Network forDeepfake DetectionZifeng Li, Wenzhong Tang, Shijun Gao et al.
Deepfake detection faces increasing challenges since the fast growth of generative models in developing massive and diverse Deepfake technologies. Recent advances rely on introducing heuristic features from spatial or frequency domains rather than modeling general forgery features within backbones. To address this issue, we turn to the backbone design with two intuitive priors from spatial and frequency detectors, \textit{i.e.,} learning robust spatial attributes and frequency distributions that are discriminative for real and fake samples. To this end, we propose an efficient network for face forgery detection named MkfaNet, which consists of two core modules. For spatial contexts, we design a Multi-Kernel Aggregator that adaptively selects organ features extracted by multiple convolutions for modeling subtle facial differences between real and fake faces. For the frequency components, we propose a Multi-Frequency Aggregator to process different bands of frequency components by adaptively reweighing high-frequency and low-frequency features. Comprehensive experiments on seven popular deepfake detection benchmarks demonstrate that our proposed MkfaNet variants achieve superior performances in both within-domain and across-domain evaluations with impressive efficiency of parameter usage.
HCMay 7
SIGMA-ASL: Sensor-Integrated Multimodal Dataset for Sign Language RecognitionXiaofang Xiao, Guangchao Li, Guangrong Zhao et al.
Automatic sign language recognition (SLR) has become a key enabler of inclusive human-computer interaction, fostering seamless communication between deaf individuals and hearing communities. Despite significant advances in multimodal learning, existing SLR research remains dominated by vision-based datasets, which are limited by sensitivity to lighting and occlusion, privacy concerns, and a lack of cross-modal diversity. To address these challenges, we introduce SIGMA-ASL, a large-scale multimodal dataset for SLR. The dataset integrates an Azure Kinect RGB-D camera, a millimeter-wave (mmWave) radar, and two wrist-worn inertial measurement units (IMUs) to capture complementary visual, radio-reflection, and kinematic information. Collected in a controlled studio environment with 20 participants performing 160 common American sign language (ASL) signs, SIGMA-ASL provides 93,545 temporally synchronized word-level multimodal clips. A unified sensing framework achieves millisecond-level alignment across modalities, enabling reliable sensor fusion and cross-modal learning. We further design standardized preprocessing pipelines and benchmarking protocols under both user-dependent and user-independent settings, offering a comprehensive foundation for evaluating single and multimodal SLR. Extensive experiments validate the dataset's quality and demonstrate its potential as a valuable resource for developing robust, privacy-preserving, and ubiquitous sign language recognition systems.
CVNov 24, 2025
Scale What Counts, Mask What Matters: Evaluating Foundation Models for Zero-Shot Cross-Domain Wi-Fi SensingCheng Jiang, Yihe Yan, Yanxiang Wang et al.
While Wi-Fi sensing offers a compelling, privacy-preserving alternative to cameras, its practical utility has been fundamentally undermined by a lack of robustness across domains. Models trained in one setup fail to generalize to new environments, hardware, or users, a critical "domain shift" problem exacerbated by modest, fragmented public datasets. We shift from this limited paradigm and apply a foundation model approach, leveraging Masked Autoencoding (MAE) style pretraining on the largest and most heterogeneous Wi-Fi CSI datasets collection assembled to date. Our study pretrains and evaluates models on over 1.3 million samples extracted from 14 datasets, collected using 4 distinct devices across the 2.4/5/6 GHz bands and bandwidths from 20 to 160 MHz. Our large-scale evaluation is the first to systematically disentangle the impacts of data diversity versus model capacity on cross-domain performance. The results establish scaling trends on Wi-Fi CSI sensing. First, our experiments show log-linear improvements in unseen domain performance as the amount of pretraining data increases, suggesting that data scale and diversity are key to domain generalization. Second, based on the current data volume, larger model can only provide marginal gains for cross-domain performance, indicating that data, rather than model capacity, is the current bottleneck for Wi-Fi sensing generalization. Finally, we conduct a series of cross-domain evaluations on human activity recognition, human gesture recognition and user identification tasks. The results show that the large-scale pretraining improves cross-domain accuracy ranging from 2.2% to 15.7%, compared to the supervised learning baseline. Overall, our findings provide insightful direction for designing future Wi-Fi sensing systems that can eventually be robust enough for real-world deployment.
CRJun 27, 2021
Secure Reversible Data Hiding in Encrypted Images Using Cipher-Feedback Secret SharingZhongyun Hua, Yanxiang Wang, Shuang Yi et al.
Reversible data hiding in encrypted images (RDH-EI) has attracted increasing attention, since it can protect the privacy of original images while the embedded data can be exactly extracted. Recently, some RDH-EI schemes with multiple data hiders have been proposed using secret sharing technique. However, these schemes protect the contents of the original images with lightweight security level. In this paper, we propose a high-security RDH-EI scheme with multiple data hiders. First, we introduce a cipher-feedback secret sharing (CFSS) technique. It follows the cryptography standards by introducing the cipher-feedback strategy of AES. Then, using the CFSS technique, we devise a new (r,n)-threshold (r<=n) RDH-EI scheme with multiple data hiders called CFSS-RDHEI. It can encrypt an original image into n encrypted images with reduced size using an encryption key and sends each encrypted image to one data hider. Each data hider can independently embed secret data into the encrypted image to obtain the corresponding marked encrypted image. The original image can be completely recovered from r marked encrypted images and the encryption key. Performance evaluations show that our CFSS-RDHEI scheme has high embedding rate and its generated encrypted images are much smaller, compared to existing secret sharing-based RDH-EI schemes. Security analysis demonstrates that it can achieve high security to defense some commonly used security attacks.
SIAug 13, 2012
Social Event Detection with Interaction Graph ModelingYanxiang Wang, Hari Sundaram, Lexing Xie
This paper focuses on detecting social, physical-world events from photos posted on social media sites. The problem is important: cheap media capture devices have significantly increased the number of photos shared on these sites. The main contribution of this paper is to incorporate online social interaction features in the detection of physical events. We believe that online social interaction reflect important signals among the participants on the "social affinity" of two photos, thereby helping event detection. We compute social affinity via a random-walk on a social interaction graph to determine similarity between two photos on the graph. We train a support vector machine classifier to combine the social affinity between photos and photo-centric metadata including time, location, tags and description. Incremental clustering is then used to group photos to event clusters. We have very good results on two large scale real-world datasets: Upcoming and MediaEval. We show an improvement between 0.06-0.10 in F1 on these datasets.