Yao Guo

CV
h-index: 25
46 papers
1,309 citations
Novelty: 47%
AI Score: 47

46 Papers

CV · Jul 20, 2022
Tackling Long-Tailed Category Distribution Under Domain Shifts

Xiao Gu, Yao Guo, Zeju Li et al. · oxford

Machine learning models fail to perform well on real-world applications when 1) the category distribution P(Y) of the training dataset suffers from long-tailed distribution and 2) the test data is drawn from different conditional distributions P(X|Y). Existing approaches cannot handle the scenario where both issues exist, which however is common for real-world applications. In this study, we took a step forward and looked into the problem of long-tailed classification under domain shifts. We designed three novel core functional blocks including Distribution Calibrated Classification Loss, Visual-Semantic Mapping and Semantic-Similarity Guided Augmentation. Furthermore, we adopted a meta-learning framework which integrates these three blocks to improve domain generalization on unseen target domains. Two new datasets were proposed for this problem, named AWA2-LTS and ImageNet-LTS. We evaluated our method on the two datasets and extensive experimental results demonstrate that our proposed method can achieve superior performance over state-of-the-art long-tailed/domain generalization approaches and the combinations. Source codes and datasets can be found at our project page https://xiaogu.site/LTDS.

CV · Mar 2, 2022
X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning

Zhihao Yuan, Xu Yan, Yinghong Liao et al.

3D dense captioning aims to describe individual objects by natural language in 3D scenes, where 3D scenes are usually represented as RGB-D scans or point clouds. However, only exploiting single modal information, e.g., point cloud, previous approaches fail to produce faithful descriptions. Though aggregating 2D features into point clouds may be beneficial, it introduces an extra computational burden, especially in inference phases. In this study, we investigate a cross-modal knowledge transfer using Transformer for 3D dense captioning, X-Trans2Cap, to effectively boost the performance of single-modal 3D caption through knowledge distillation using a teacher-student framework. In practice, during the training phase, the teacher network exploits auxiliary 2D modality and guides the student network that only takes point clouds as input through the feature consistency constraints. Owing to the well-designed cross-modal feature fusion module and the feature alignment in the training phase, X-Trans2Cap acquires rich appearance information embedded in 2D images with ease. Thus, a more faithful caption can be generated only using point clouds during the inference. Qualitative and quantitative results confirm that X-Trans2Cap outperforms previous state-of-the-art by a large margin, i.e., about +21 and about +16 absolute CIDEr score on ScanRefer and Nr3D datasets, respectively.

CV · Oct 4, 2022
APAUNet: Axis Projection Attention UNet for Small Target in 3D Medical Segmentation

Yuncheng Jiang, Zixun Zhang, Shixi Qin et al.

In 3D medical image segmentation, small-target segmentation is crucial for diagnosis but still faces challenges. In this paper, we propose the Axis Projection Attention UNet, named APAUNet, for 3D medical image segmentation, especially for small targets. Considering the large proportion of the background in the 3D feature space, we introduce a projection strategy to project the 3D features into three orthogonal 2D planes to capture the contextual attention from different views. In this way, we can filter out the redundant feature information and mitigate the loss of critical information for small lesions in 3D scans. Then we utilize a dimension hybridization strategy to fuse the 3D features with attention from different axes and merge them by a weighted summation to adaptively learn the importance of different perspectives. Finally, in the APA Decoder, we concatenate both high and low resolution features in the 2D projection process, thereby obtaining more precise multi-scale information, which is vital for small lesion segmentation. Quantitative and qualitative experimental results on two public datasets (BTCV and MSD) demonstrate that our proposed APAUNet outperforms the other methods. Concretely, our APAUNet achieves an average dice score of 87.84 on BTCV, 84.48 on MSD-Liver and 69.13 on MSD-Pancreas, and significantly surpasses the previous SOTA methods on small targets.

CR · Oct 11, 2023
No Privacy Left Outside: On the (In-)Security of TEE-Shielded DNN Partition for On-Device ML

Ziqi Zhang, Chen Gong, Yifeng Cai et al.

On-device ML introduces new security challenges: DNN models become white-box accessible to device users. Based on white-box information, adversaries can conduct effective model stealing (MS) and membership inference attack (MIA). Using Trusted Execution Environments (TEEs) to shield on-device DNN models aims to downgrade (easy) white-box attacks to (harder) black-box attacks. However, one major shortcoming is the sharply increased latency (up to 50X). To accelerate TEE-shielded DNN computation with GPUs, researchers proposed several model partition techniques. These solutions, referred to as TEE-Shielded DNN Partition (TSDP), partition a DNN model into two parts, offloading the privacy-insensitive part to the GPU while shielding the privacy-sensitive part within the TEE. This paper benchmarks existing TSDP solutions using both MS and MIA across a variety of DNN models, datasets, and metrics. We show important findings that existing TSDP solutions are vulnerable to privacy-stealing attacks and are not as safe as commonly believed. We also unveil the inherent difficulty in deciding optimal DNN partition configurations (i.e., the highest security with minimal utility cost) for present TSDP solutions. The experiments show that such "sweet spot" configurations vary across datasets and models. Based on lessons harvested from the experiments, we present TEESlice, a novel TSDP method that defends against MS and MIA during DNN inference. TEESlice follows a partition-before-training strategy, which allows for accurate separation of privacy-related weights from public weights. TEESlice delivers the same security protection as shielding the entire DNN model inside TEE (the "upper-bound" security guarantees) with over 10X less overhead (in both experimental and real-world environments) than prior TSDP solutions and no accuracy loss.

LG · Apr 11, 2023
Neural Delay Differential Equations: System Reconstruction and Image Classification

Qunxi Zhu, Yao Guo, Wei Lin

Neural Ordinary Differential Equations (NODEs), a framework of continuous-depth neural networks, have been widely applied, showing exceptional efficacy in coping with representative datasets. Recently, an augmented framework has been developed to overcome some limitations that emerged in the application of the original framework. In this paper, we propose a new class of continuous-depth neural networks with delay, named Neural Delay Differential Equations (NDDEs). To compute the corresponding gradients, we use the adjoint sensitivity method to obtain the delayed dynamics of the adjoint. Differential equations with delays are typically seen as dynamical systems of infinite dimension that possess more fruitful dynamics. Compared to NODEs, NDDEs have a stronger capacity of nonlinear representations. We use several illustrative examples to demonstrate this outstanding capacity. Firstly, we successfully model the delayed dynamics where the trajectories in the lower-dimensional phase space could be mutually intersected and even chaotic in a model-free or model-based manner. Traditional NODEs, without any augmentation, are not directly applicable for such modeling. Secondly, we achieve lower loss and higher accuracy not only for the data produced synthetically by complex models but also for CIFAR10, a well-known image dataset. Our results on the NDDEs demonstrate that appropriately articulating the elements of dynamical systems into the network design is truly beneficial in promoting network performance.
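As a rough illustration of what "dynamics with delay" means in the abstract above, the sketch below integrates a toy delay differential equation x'(t) = f(x(t), x(t - tau)) with forward Euler. It is not the NDDE model or its adjoint method: there is no neural network here, and the function name `integrate_dde`, the toy right-hand side, the constant history, and the step size are all assumptions chosen for the example. The point it shows is that the derivative depends on a past state, so a history function is needed in place of a single initial value.

```python
def integrate_dde(f, history, tau, t_end, dt):
    """Forward-Euler integration of x'(t) = f(x(t), x(t - tau)).

    history(t) supplies x(t) for t <= 0 (hypothetical helper for this sketch).
    Returns the states at t = 0, dt, 2*dt, ..., t_end.
    """
    delay_steps = round(tau / dt)
    n_steps = round(t_end / dt)
    xs = [history(0.0)]
    for k in range(n_steps):
        j = k - delay_steps
        # Until t reaches tau, the delayed state is read from the history function.
        x_delayed = xs[j] if j >= 0 else history(j * dt)
        xs.append(xs[k] + dt * f(xs[k], x_delayed))
    return xs


# Toy system x'(t) = -x(t - 1) with constant history x(t) = 1 for t <= 0.
xs = integrate_dde(lambda x, x_delayed: -x_delayed, lambda t: 1.0,
                   tau=1.0, t_end=2.0, dt=0.01)
```

On the first interval the delayed state is the constant history, so the toy solution falls linearly from 1 to about 0 at t = 1; after that the slope is driven by the earlier part of the trajectory itself.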

CV · Jul 5, 2022
Toward Explainable and Fine-Grained 3D Grounding through Referring Textual Phrases

Zhihao Yuan, Xu Yan, Zhuo Li et al.

Recent progress in 3D scene understanding has explored visual grounding (3DVG) to localize a target object through a language description. However, existing methods only consider the dependency between the entire sentence and the target object, ignoring fine-grained relationships between contexts and non-target ones. In this paper, we extend 3DVG to a more fine-grained and interpretable task, called 3D Phrase Aware Grounding (3DPAG). The 3DPAG task aims to localize the target objects in a 3D scene by explicitly identifying all phrase-related objects and then conducting the reasoning according to contextual phrases. To tackle this problem, we manually labeled about 227K phrase-level annotations using a self-developed platform, from 88K sentences of widely used 3DVG datasets, i.e., Nr3D, Sr3D and ScanRefer. By tapping on our datasets, we can extend previous 3DVG methods to the fine-grained phrase-aware scenario. It is achieved through the proposed novel phrase-object alignment optimization and phrase-specific pre-training, boosting conventional 3DVG performance as well. Extensive results confirm significant improvements, i.e., previous state-of-the-art method achieves 3.9%, 3.5% and 4.6% overall accuracy gains on Nr3D, Sr3D and ScanRefer respectively.

CR · Apr 20
No Data? No Problem: Synthesizing Security Graphs for Better Intrusion Detection

Yi Huang, Shaofei Li, Yao Guo et al.

Provenance graph analysis plays a vital role in intrusion detection, particularly against Advanced Persistent Threats (APTs), by exposing complex attack patterns. While recent systems combine graph neural networks (GNNs) with natural language processing (NLP) to capture structural and semantic features, their effectiveness is limited by class imbalance in real-world data. To address this, we introduce PROVSYN, a novel hybrid provenance graph synthesis framework, which comprises three components: (1) graph structure synthesis via heterogeneous graph generation models, (2) textual attribute synthesis via fine-tuned Large Language Models (LLMs), and (3) five-dimensional fidelity evaluation. Experiments on six benchmark datasets demonstrate that PROVSYN consistently produces higher-fidelity graphs across the five evaluation dimensions compared to four strong baselines. To further demonstrate the practical utility of PROVSYN, we utilize the synthesized graphs to augment training datasets for downstream APT detection models. The results show that PROVSYN effectively mitigates data imbalance, improving normalized entropy by up to 35%, and enhances the generalizability of downstream detection models, achieving an accuracy improvement of up to 38%.

CV · Dec 26, 2023 · Code
Learning Deformable Hypothesis Sampling for Accurate PatchMatch Multi-View Stereo

Hongjie Li, Yao Guo, Xianwei Zheng et al.

This paper introduces a learnable Deformable Hypothesis Sampler (DeformSampler) to address the challenging issue of noisy depth estimation for accurate PatchMatch Multi-View Stereo (MVS). We observe that the heuristic depth hypothesis sampling modes employed by PatchMatch MVS solvers are insensitive to (i) the piece-wise smooth distribution of depths across the object surface, and (ii) the implicit multi-modal distribution of depth prediction probabilities along the ray direction on the surface points. Accordingly, we develop DeformSampler to learn distribution-sensitive sample spaces to (i) propagate depths consistent with the scene's geometry across the object surface, and (ii) fit a Laplace Mixture model that approaches the point-wise probability distribution of the actual depths along the ray direction. We integrate DeformSampler into a learnable PatchMatch MVS system to enhance depth estimation in challenging areas, such as piece-wise discontinuous surface boundaries and weakly-textured regions. Experimental results on DTU and Tanks & Temples datasets demonstrate its superior performance and generalization capabilities compared to state-of-the-art competitors. Code is available at https://github.com/Geo-Tell/DS-PMNet.

CV · Sep 2, 2024
GCCRR: A Short Sequence Gait Cycle Segmentation Method Based on Ear-Worn IMU

Zhenye Xu, Yao Guo

This paper addresses the critical task of gait cycle segmentation using short sequences from ear-worn IMUs, a practical and non-invasive approach for home-based monitoring and rehabilitation of patients with impaired motor function. While previous studies have focused on IMUs positioned on the lower limbs, ear-worn IMUs offer a unique advantage in capturing gait dynamics with minimal intrusion. To address the challenges of gait cycle segmentation using short sequences, we introduce the Gait Characteristic Curve Regression and Restoration (GCCRR) method, a novel two-stage approach designed for fine-grained gait phase segmentation. The first stage transforms the segmentation task into a regression task on the Gait Characteristic Curve (GCC), which is a one-dimensional feature sequence incorporating periodic information. The second stage restores the gait cycle using peak detection techniques. Our method employs Bi-LSTM-based deep learning algorithms for regression to ensure reliable segmentation for short gait sequences. Evaluation on the HamlynGait dataset demonstrates that GCCRR achieves over 80% Accuracy, with a Timestamp Error below one sampling interval. Despite its promising results, the performance lags behind methods using more extensive sensor systems, highlighting the need for larger, more diverse datasets. Future work will focus on data augmentation using motion capture systems and improving algorithmic generalizability.
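The second stage described above, recovering cycle boundaries from a one-dimensional periodic curve by peak detection, can be sketched as follows. The abstract does not specify the shape of the Gait Characteristic Curve or the peak detector used, so the cosine stand-in for the regressed curve, the helper names `find_peaks` and `cycles_from_peaks`, and the `min_distance` rule are assumptions made only for illustration.

```python
import math


def find_peaks(curve, min_distance):
    """Return indices of local maxima that are at least min_distance samples apart."""
    peaks = []
    for i in range(1, len(curve) - 1):
        is_local_max = curve[i - 1] < curve[i] >= curve[i + 1]
        if is_local_max and (not peaks or i - peaks[-1] >= min_distance):
            peaks.append(i)
    return peaks


def cycles_from_peaks(peaks):
    """Treat each pair of consecutive peaks as one cycle, given as (start, end) indices."""
    return list(zip(peaks[:-1], peaks[1:]))


# Synthetic stand-in for a regressed curve: period of 50 samples, 200 samples in total.
curve = [math.cos(2 * math.pi * i / 50) for i in range(200)]
peaks = find_peaks(curve, min_distance=25)
cycles = cycles_from_peaks(peaks)
```

Because the first and last samples have no neighbour on one side, this simple detector ignores them, which is one reason short sequences are harder to segment than long ones.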

CR · Jan 22
Connect the Dots: Knowledge Graph-Guided Crawler Attack on Retrieval-Augmented Generation Systems

Mengyu Yao, Ziqi Zhang, Ning Luo et al.

Stealing attacks pose a persistent threat to the intellectual property of deployed machine-learning systems. Retrieval-augmented generation (RAG) intensifies this risk by extending the attack surface beyond model weights to the knowledge base, which often contains IP-bearing assets such as proprietary runbooks, curated domain collections, or licensed documents. Recent work shows that multi-turn questioning can gradually steal corpus content from RAG systems, yet existing attacks are largely heuristic and often plateau early. We address this gap by formulating RAG knowledge-base stealing as an adaptive stochastic coverage problem (ASCP), where each query is a stochastic action and the goal is to maximize the conditional expected marginal gain (CMG) in corpus coverage under a query budget. Bridging ASCP to real-world black-box RAG knowledge-base stealing raises three challenges: CMG is unobservable, the natural-language action space is intractably large, and feasibility constraints require stealthy queries that remain effective under diverse architectures. We introduce RAGCrawler, a knowledge graph-guided attacker that maintains a global attacker-side state to estimate coverage gains, schedule high-value semantic anchors, and generate non-redundant natural queries. Across four corpora and four generators with BGE retriever, RAGCrawler achieves 66.8% average coverage (up to 84.4%) within 1,000 queries, improving coverage by 44.90% relative to the strongest baseline. It also reduces the queries needed to reach 70% coverage by at least 4.03x on average and enables surrogate reconstruction with answer similarity up to 0.699. Our attack is also scalable to retriever switching and newer RAG techniques like query rewriting and multi-query retrieval. These results highlight urgent needs to protect RAG knowledge assets.

RO · Mar 2, 2025 · Code
CLEA: Closed-Loop Embodied Agent for Enhancing Task Execution in Dynamic Environments

Mingcong Lei, Ge Wang, Yiming Zhao et al.

Large Language Models (LLMs) exhibit remarkable capabilities in the hierarchical decomposition of complex tasks through semantic reasoning. However, their application in embodied systems faces challenges in ensuring reliable execution of subtask sequences and achieving one-shot success in long-term task completion. To address these limitations in dynamic environments, we propose Closed-Loop Embodied Agent (CLEA) -- a novel architecture incorporating four specialized open-source LLMs with functional decoupling for closed-loop task management. The framework features two core innovations: (1) Interactive task planner that dynamically generates executable subtasks based on the environmental memory, and (2) Multimodal execution critic employing an evaluation framework to conduct a probabilistic assessment of action feasibility, triggering hierarchical re-planning mechanisms when environmental perturbations exceed preset thresholds. To validate CLEA's effectiveness, we conduct experiments in a real environment with manipulable objects, using two heterogeneous robots for object search, manipulation, and search-manipulation integration tasks. Across 12 task trials, CLEA outperforms the baseline model, achieving a 67.3% improvement in success rate and a 52.8% increase in task completion rate. These results demonstrate that CLEA significantly enhances the robustness of task planning and execution in dynamic environments.

CL · Dec 13, 2024 · Code
ChainStream: An LLM-based Framework for Unified Synthetic Sensing

Jiacheng Liu, Yuanchun Li, Liangyan Li et al. · tsinghua

Many applications demand context sensing to offer personalized and timely services. Yet, developing sensing programs can be challenging for developers and using them is privacy-concerning for end-users. In this paper, we propose to use natural language as the unified interface to process personal data and sense user context, which can effectively ease app development and make the data pipeline more transparent. Our work is inspired by large language models (LLMs) and other generative models, but directly applying them does not solve the problem: letting the model directly process the data cannot handle complex sensing requests, and letting the model write the data processing program suffers from error-prone code generation. We address the problem with 1) a unified data processing framework that makes context-sensing programs simpler and 2) a feedback-guided query optimizer that makes data query more informative. To evaluate the performance of natural language-based context sensing, we create a benchmark that contains 133 context sensing tasks. Extensive evaluation has shown that our approach is able to automatically solve the context-sensing tasks efficiently and precisely. The code is open-sourced at https://github.com/MobileLLM/ChainStream.

CV · Dec 22, 2021 · Code
Comprehensive Visual Question Answering on Point Clouds through Compositional Scene Manipulation

Xu Yan, Zhihao Yuan, Yuhao Du et al.

Visual Question Answering on 3D Point Cloud (VQA-3D) is an emerging yet challenging field that aims at answering various types of textual questions given an entire point cloud scene. To tackle this problem, we propose the CLEVR3D, a large-scale VQA-3D dataset consisting of 171K questions from 8,771 3D scenes. Specifically, we develop a question engine leveraging 3D scene graph structures to generate diverse reasoning questions, covering the questions of objects' attributes (i.e., size, color, and material) and their spatial relationships. Through such a manner, we initially generated 44K questions from 1,333 real-world scenes. Moreover, a more challenging setup is proposed to remove the confounding bias and adjust the context from a common-sense layout. Such a setup requires the network to achieve comprehensive visual understanding when the 3D scene is different from the general co-occurrence context (e.g., chairs always exist with tables). To this end, we further introduce the compositional scene manipulation strategy and generate 127K questions from 7,438 augmented 3D scenes, which can improve VQA-3D models for real-world comprehension. Built upon the proposed dataset, we baseline several VQA-3D models, where experimental results verify that the CLEVR3D can significantly boost other 3D scene understanding tasks. Our code and dataset will be made publicly available at https://github.com/yanx27/CLEVR3D.

LG · Mar 2, 2021 · Code
PFA: Privacy-preserving Federated Adaptation for Effective Model Personalization

Bingyan Liu, Yao Guo, Xiangqun Chen

Federated learning (FL) has become a prevalent distributed machine learning paradigm with improved privacy. After learning, the resulting federated model should be further personalized to each different client. While several methods have been proposed to achieve personalization, they are typically limited to a single local device, which may incur bias or overfitting since data in a single device is extremely limited. In this paper, we attempt to realize personalization beyond a single client. The motivation is that during FL, there may exist many clients with similar data distribution, and thus the personalization performance could be significantly boosted if these similar clients can cooperate with each other. Inspired by this, this paper introduces a new concept called federated adaptation, targeting at adapting the trained model in a federated manner to achieve better personalization results. However, the key challenge for federated adaptation is that we could not outsource any raw data from the client during adaptation, due to privacy concerns. In this paper, we propose PFA, a framework to accomplish Privacy-preserving Federated Adaptation. PFA leverages the sparsity property of neural networks to generate privacy-preserving representations and uses them to efficiently identify clients with similar data distributions. Based on the grouping results, PFA conducts an FL process in a group-wise way on the federated model to accomplish the adaptation. For evaluation, we manually construct several practical FL datasets based on public datasets in order to simulate both the class-imbalance and background-difference conditions. Extensive experiments on these datasets and popular model architectures demonstrate the effectiveness of PFA, outperforming other state-of-the-art methods by a large margin while ensuring user privacy. We will release our code at: https://github.com/lebyni/PFA.

CR · Mar 14, 2020 · Code
Security Analysis of EOSIO Smart Contracts

Ningyu He, Ruiyi Zhang, Lei Wu et al.

The EOSIO blockchain, one of the representative Delegated Proof-of-Stake (DPoS) blockchain platforms, has grown rapidly recently. Meanwhile, a number of vulnerabilities and high-profile attacks against top EOSIO DApps and their smart contracts have also been discovered and observed in the wild, resulting in serious financial damages. Most of EOSIO's smart contracts are not open-sourced and they are typically compiled to WebAssembly (Wasm) bytecode, thus making it challenging to analyze and detect the presence of possible vulnerabilities. In this paper, we propose EOSAFE, the first static analysis framework that can be used to automatically detect vulnerabilities in EOSIO smart contracts at the bytecode level. Our framework includes a practical symbolic execution engine for Wasm, a customized library emulator for EOSIO smart contracts, and four heuristics-driven detectors to identify the presence of four most popular vulnerabilities in EOSIO smart contracts. Experiment results suggest that EOSAFE achieves promising results in detecting vulnerabilities, with an F1-measure of 98%. We have applied EOSAFE to all active 53,666 smart contracts in the ecosystem (as of November 15, 2019). Our results show that over 25% of the smart contracts are vulnerable. We further analyze possible exploitation attempts against these vulnerable smart contracts and identify 48 in-the-wild attacks (25 of them have been confirmed by DApp developers), resulting in financial loss of at least 1.7 million USD.

SE · Jan 9, 2019 · Code
Humanoid: A Deep Learning-based Approach to Automated Black-box Android App Testing

Yuanchun Li, Ziyue Yang, Yao Guo et al.

Automated input generators are widely used for large-scale dynamic analysis of mobile apps. Such input generators must constantly choose which UI element to interact with and how to interact with it, in order to achieve high coverage with a limited time budget. Currently, most input generators adopt pseudo-random or brute-force searching strategies, which may take very long to find the correct combination of inputs that can drive the app into new and important states. In this paper, we propose Humanoid, a deep learning-based approach to GUI test input generation by learning from human interactions. Our insight is that if we can learn from human-generated interaction traces, it is possible to automatically prioritize test inputs based on their importance as perceived by users. We design and implement a deep neural network model to learn how end-users would interact with an app (specifically, which UI elements to interact with and how). Our experiments showed that the interaction model can successfully prioritize user-preferred inputs for any new UI (with a top-1 accuracy of 51.2% and a top-10 accuracy of 85.2%). We implemented an input generator for Android apps based on the learned model and evaluated it on both open-source apps and market apps. The results indicated that Humanoid was able to achieve higher coverage than six state-of-the-art test generators. However, further analysis showed that the learned model was not the main reason of coverage improvement. Although the learned interaction pattern could drive the app into some important GUI states with higher probabilities, it had limited effect on the width and depth of GUI state search, which is the key to improve test coverage in the long term. Whether and how human interaction patterns can be used to improve coverage is still an unknown and challenging problem.

CR · Jan 5, 2024
Beyond Fidelity: Explaining Vulnerability Localization of Learning-based Detectors

Baijun Cheng, Shengming Zhao, Kailong Wang et al.

Vulnerability detectors based on deep learning (DL) models have proven their effectiveness in recent years. However, the shroud of opacity surrounding the decision-making process of these detectors makes it difficult for security analysts to comprehend. To address this, various explanation approaches have been proposed to explain the predictions by highlighting important features, which have been demonstrated effective in other domains such as computer vision and natural language processing. Unfortunately, an in-depth evaluation of vulnerability-critical features, such as fine-grained vulnerability-related code lines, learned and understood by these explanation approaches remains lacking. In this study, we first evaluate the performance of ten explanation approaches for vulnerability detectors based on graph and sequence representations, measured by two quantitative metrics including fidelity and vulnerability line coverage rate. Our results show that fidelity alone is not sufficient for evaluating these approaches, as fidelity incurs significant fluctuations across different datasets and detectors. We subsequently check the precision of the vulnerability-related code lines reported by the explanation approaches, and find poor accuracy in this task among all of them. This can be attributed to the inefficiency of explainers in selecting important features and the presence of irrelevant artifacts learned by DL-based detectors.

CR · Nov 15, 2024
TEESlice: Protecting Sensitive Neural Network Models in Trusted Execution Environments When Attackers have Pre-Trained Models

Ding Li, Ziqi Zhang, Mengyu Yao et al.

Trusted Execution Environments (TEE) are used to safeguard on-device models. However, directly employing TEEs to secure the entire DNN model is challenging due to the limited computational speed. Utilizing GPU can accelerate DNN's computation speed but commercial widely-available GPUs usually lack security protection. To this end, scholars introduce TSDP, a method that protects privacy-sensitive weights within TEEs and offloads insensitive weights to GPUs. Nevertheless, current methods do not consider the presence of a knowledgeable adversary who can access abundant publicly available pre-trained models and datasets. This paper investigates the security of existing methods against such a knowledgeable adversary and reveals their inability to fulfill their security promises. Consequently, we introduce a novel partition before training strategy, which effectively separates privacy-sensitive weights from other components of the model. Our evaluation demonstrates that our approach can offer full model protection with a computational cost reduced by a factor of 10. In addition to traditional CNN models, we also demonstrate the scalability to large language models. Our approach can compress the private functionalities of the large language model to lightweight slices and achieve the same level of protection as the shielding-whole-model baseline.

CV · May 22, 2025
MAGE: A Multi-task Architecture for Gaze Estimation with an Efficient Calibration Module

Haoming Huang, Musen Zhang, Jianxin Yang et al.

Eye gaze can provide rich information on human psychological activities, and has garnered significant attention in the field of Human-Robot Interaction (HRI). However, existing gaze estimation methods merely predict either the gaze direction or the Point-of-Gaze (PoG) on the screen, failing to provide sufficient information for a comprehensive six Degree-of-Freedom (DoF) gaze analysis in 3D space. Moreover, the variations of eye shape and structure among individuals also impede the generalization capability of these methods. In this study, we propose MAGE, a Multi-task Architecture for Gaze Estimation with an efficient calibration module, to predict the 6-DoF gaze information that is applicable to real-world HRI. Our basic model encodes both the directional and positional features from facial images, and predicts gaze results with dedicated information flow and multiple decoders. To reduce the impact of individual variations, we propose a novel calibration module, namely Easy-Calibration, to fine-tune the basic model with subject-specific data, which is efficient to implement without the need of a screen. Experimental results demonstrate that our method achieves state-of-the-art performance on the public MPIIFaceGaze, EYEDIAP, and our built IMRGaze datasets.

LG · Mar 13, 2025
Moss: Proxy Model-based Full-Weight Aggregation in Federated Learning with Heterogeneous Models

Yifeng Cai, Ziqi Zhang, Ding Li et al.

Modern Federated Learning (FL) has become increasingly essential for handling highly heterogeneous mobile devices. Current approaches adopt a partial model aggregation paradigm that leads to sub-optimal model accuracy and higher training overhead. In this paper, we challenge the prevailing notion of partial-model aggregation and propose a novel "full-weight aggregation" method named Moss, which aggregates all weights within heterogeneous models to preserve comprehensive knowledge. Evaluation across various applications demonstrates that Moss significantly accelerates training, reduces on-device training time and energy consumption, enhances accuracy, and minimizes network bandwidth utilization when compared to state-of-the-art baselines.

LG · Oct 22, 2021
DistFL: Distribution-aware Federated Learning for Mobile Scenarios

Bingyan Liu, Yifeng Cai, Ziqi Zhang et al.

Federated learning (FL) has emerged as an effective solution to decentralized and privacy-preserving machine learning for mobile clients. While traditional FL has demonstrated its superiority, it ignores the non-iid (independently identically distributed) situation, which widely exists in mobile scenarios. Failing to handle non-iid situations could cause problems such as performance decreasing and possible attacks. Previous studies focus on the "symptoms" directly, as they try to improve the accuracy or detect possible attacks by adding extra steps to conventional FL models. However, previous techniques overlook the root causes for the "symptoms": blindly aggregating models with the non-iid distributions. In this paper, we try to fundamentally address the issue by decomposing the overall non-iid situation into several iid clusters and conducting aggregation in each cluster. Specifically, we propose DistFL, a novel framework to achieve automated and accurate Distribution-aware Federated Learning in a cost-efficient way. DistFL achieves clustering via extracting and comparing the distribution knowledge from the uploaded models. With this framework, we are able to generate multiple personalized models with distinctive distributions and assign them to the corresponding clients. Extensive experiments on mobile scenarios with popular model architectures have demonstrated the effectiveness of DistFL.
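The idea of clustering clients and then aggregating within each cluster can be sketched as below. The abstract does not say how the distribution knowledge is extracted or compared, so this example assumes each client already has a small hypothetical signature vector, groups clients by cosine similarity against a threshold, and averages weights per group. The names `cluster_clients` and `average_weights`, the threshold, and the toy numbers are illustrative assumptions, not the DistFL procedure.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def cluster_clients(signatures, threshold):
    """Greedy grouping: a client joins the first cluster whose first member is similar enough."""
    clusters = []
    for cid, sig in enumerate(signatures):
        for members in clusters:
            if cosine(signatures[members[0]], sig) >= threshold:
                members.append(cid)
                break
        else:
            # No existing cluster matched, so this client starts a new one.
            clusters.append([cid])
    return clusters


def average_weights(weights, members):
    """Element-wise mean of the weight vectors of the given clients."""
    n = len(members)
    return [sum(weights[c][i] for c in members) / n for i in range(len(weights[0]))]


# Hypothetical per-client signatures (e.g. estimated label proportions) and model weights.
signatures = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
weights = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [30.0, 40.0]]
clusters = cluster_clients(signatures, threshold=0.9)
models = [average_weights(weights, members) for members in clusters]
```

Each resulting model is averaged only over clients with similar signatures, in contrast to a single global average over all four clients.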

CROct 14, 2021
Understanding the Evolution of Blockchain Ecosystems: A Longitudinal Measurement Study of Bitcoin, Ethereum, and EOSIO

Ningyu He, Weihang Su, Zhou Yu et al.

The continuing expansion of the blockchain ecosystems has attracted much attention from the research community. However, although a large number of research studies have been proposed to understand the diverse characteristics of individual blockchain systems (e.g., Bitcoin or Ethereum), little is known at a comprehensive level on the evolution of blockchain ecosystems at scale, longitudinally, and across multiple blockchains. We argue that understanding the dynamics of blockchain ecosystems could provide unique insights that cannot be achieved through studying a single static snapshot or a single blockchain network alone. Based on billions of transaction records collected from three representative and popular blockchain systems (Bitcoin, Ethereum and EOSIO) over 10 years, we conduct the first study on the evolution of multiple blockchain ecosystems from different perspectives. Our exploration suggests that, although the overall blockchain ecosystem shows promising growth over the last decade, a number of worrying outliers exist that have disrupted its evolution.

CVSep 3, 2021
Occlusion-Invariant Rotation-Equivariant Semi-Supervised Depth Based Cross-View Gait Pose Estimation

Xiao Gu, Jianxin Yang, Hanxiao Zhang et al.

Accurate estimation of three-dimensional human skeletons from depth images can provide important metrics for healthcare applications, especially for biomechanical gait analysis. However, there exist inherent problems associated with depth images captured from a single view. The collected data is greatly affected by occlusions where only partial surface data can be recorded. Furthermore, depth images of the human body exhibit heterogeneous characteristics with viewpoint changes, and the estimated poses under local coordinate systems are expected to go through equivariant rotations. Most existing pose estimation models are sensitive to both issues. To address this, we propose a novel approach for cross-view generalization with an occlusion-invariant semi-supervised learning framework built upon a novel rotation-equivariant backbone. Our model was trained with real-world data from a single view and unlabelled synthetic data from multiple views. It can generalize well on the real-world data from all the other unseen views. Our approach has shown superior performance on gait analysis on our ICL-Gait dataset compared to other state-of-the-art methods, and it can produce more convincing keypoints on the ITOP dataset than its provided "ground truth".

CVJul 28, 2021
TransAction: ICL-SJTU Submission to EPIC-Kitchens Action Anticipation Challenge 2021

Xiao Gu, Jianing Qiu, Yao Guo et al.

In this report, the technical details of our submission to the EPIC-Kitchens Action Anticipation Challenge 2021 are given. We developed a hierarchical attention model for action anticipation, which leverages a Transformer-based attention mechanism to aggregate features across the temporal dimension, modalities, and symbiotic branches, respectively. In terms of Mean Top-5 Recall of action, our submission with team name ICL-SJTU achieved 13.39% for the overall testing set, 10.05% for unseen subsets and 11.88% for tailed subsets. Additionally, it is noteworthy that our submission ranked 1st in terms of verb class in all three (sub)sets.

CVMar 2, 2021
TransTailor: Pruning the Pre-trained Model for Improved Transfer Learning

Bingyan Liu, Yifeng Cai, Yao Guo et al.

The increasing availability of pre-trained models has significantly improved performance on limited-data tasks with transfer learning. However, progress on transfer learning mainly focuses on optimizing the weights of pre-trained models, which ignores the structure mismatch between the model and the target task. This paper aims to improve the transfer performance from another angle: in addition to tuning the weights, we tune the structure of pre-trained models in order to better match the target task. To this end, we propose TransTailor, which targets pruning the pre-trained model for improved transfer learning. Different from traditional pruning pipelines, we prune and fine-tune the pre-trained model according to the target-aware weight importance, generating an optimal sub-model tailored for a specific target task. In this way, we transfer a more suitable sub-structure that can be applied during fine-tuning to benefit the final performance. Extensive experiments on multiple pre-trained models and datasets demonstrate that TransTailor outperforms the traditional pruning methods and achieves competitive or even better performance than other state-of-the-art transfer learning methods while using a smaller model. Notably, on the Stanford Dogs dataset, TransTailor can achieve 2.7% accuracy improvement over other transfer methods with 20% fewer FLOPs.

SEMar 1, 2021
CHAMP: Characterizing Undesired App Behaviors from User Comments based on Market Policies

Yangyu Hu, Haoyu Wang, Tiantong Ji et al.

Millions of mobile apps have been available through various app markets. Although most app markets have enforced a number of automated or even manual mechanisms to vet each app before it is released to the market, thousands of low-quality apps still exist in different markets, some of which violate the explicitly specified market policies. In order to identify these violations accurately and in a timely manner, we resort to user comments, which can form an immediate feedback for app market maintainers, to identify undesired behaviors that violate market policies, including security-related user concerns. Specifically, we present the first large-scale study to detect and characterize the correlations between user comments and market policies. First, we propose CHAMP, an approach that adopts text mining and natural language processing (NLP) techniques to extract semantic rules through a semi-automated process, and classifies comments into 26 pre-defined types of undesired behaviors that violate market policies. Our evaluation on real-world user comments shows that it achieves both high precision and recall (>0.9) in classifying comments for undesired behaviors. Then, we curate a large-scale comment dataset (over 3 million user comments) from apps in Google Play and 8 popular alternative Android app markets, and apply CHAMP to understand the characteristics of undesired behavior comments in the wild. The results confirm our speculation that user comments can be used to pinpoint suspicious apps that violate policies declared by app markets. The study also reveals that policy violations are widespread in many app markets despite their extensive vetting efforts. CHAMP can be a "whistle blower" that assigns policy-violation scores and identifies the most informative comments for apps.

LGFeb 22, 2021
Neural Delay Differential Equations

Qunxi Zhu, Yao Guo, Wei Lin

Neural Ordinary Differential Equations (NODEs), a framework of continuous-depth neural networks, have been widely applied, showing exceptional efficacy in coping with some representative datasets. Recently, an augmented framework has been successfully developed for conquering some limitations emergent in application of the original framework. Here we propose a new class of continuous-depth neural networks with delay, named Neural Delay Differential Equations (NDDEs), and, for computing the corresponding gradients, we use the adjoint sensitivity method to obtain the delayed dynamics of the adjoint. Since the differential equations with delays are usually seen as dynamical systems of infinite dimension possessing more fruitful dynamics, the NDDEs, compared to the NODEs, own a stronger capacity of nonlinear representations. Indeed, we analytically validate that the NDDEs are universal approximators, and further articulate an extension of the NDDEs, where the initial function of the NDDEs is supposed to satisfy ODEs. More importantly, we use several illustrative examples to demonstrate the outstanding capacities of the NDDEs and the NDDEs with ODEs' initial value. Specifically, (1) we successfully model the delayed dynamics where the trajectories in the lower-dimensional phase space could be mutually intersected, while the traditional NODEs without any augmentation are not directly applicable for such modeling, and (2) we achieve lower loss and higher accuracy not only for the data produced synthetically by complex models but also for the real-world image datasets, i.e., CIFAR10, MNIST, and SVHN. Our results on the NDDEs reveal that appropriately articulating the elements of dynamical systems into the network design is truly beneficial to promoting the network performance.

SESep 29, 2020
Dynamic Slicing for Deep Neural Networks

Ziqi Zhang, Yuanchun Li, Yao Guo et al.

Program slicing has been widely applied in a variety of software engineering tasks. However, existing program slicing techniques only deal with traditional programs that are constructed with instructions and variables, rather than neural networks that are composed of neurons and synapses. In this paper, we propose NNSlicer, the first approach for slicing deep neural networks based on data flow analysis. Our method understands the reaction of each neuron to an input based on the difference between its behavior activated by the input and the average behavior over the whole dataset. Then we quantify the neuron contributions to the slicing criterion by recursively backtracking from the output neurons, and calculate the slice as the neurons and the synapses with larger contributions. We demonstrate the usefulness and effectiveness of NNSlicer with three applications, including adversarial input detection, model pruning, and selective model protection. In all applications, NNSlicer significantly outperforms other baselines that do not rely on data flow analysis.

IVJun 16, 2020
End-to-End Real-time Catheter Segmentation with Optical Flow-Guided Warping during Endovascular Intervention

Anh Nguyen, Dennis Kundrat, Giulio Dagnino et al.

Accurate real-time catheter segmentation is an important pre-requisite for robot-assisted endovascular intervention. Most of the existing learning-based methods for catheter segmentation and tracking are only trained on small-scale datasets or synthetic data due to the difficulties of ground-truth annotation. Furthermore, the temporal continuity in intraoperative imaging sequences is not fully utilised. In this paper, we present FW-Net, an end-to-end and real-time deep learning framework for endovascular intervention. The proposed FW-Net has three modules: a segmentation network with encoder-decoder architecture, a flow network to extract optical flow information, and a novel flow-guided warping function to learn the frame-to-frame temporal continuity. We show that by effectively learning temporal continuity, the network can successfully segment and track the catheters in real-time sequences using only raw ground-truth for training. Detailed validation results confirm that our FW-Net outperforms state-of-the-art techniques while achieving real-time performance.

CRJun 11, 2020
DEPOSafe: Demystifying the Fake Deposit Vulnerability in Ethereum Smart Contracts

Ru Ji, Ningyu He, Lei Wu et al.

Cryptocurrency has seen explosive growth in recent years, thanks to the evolution of blockchain technology and its economic ecosystem. Besides Bitcoin, thousands of cryptocurrencies have been distributed on blockchains, while hundreds of cryptocurrency exchanges are emerging to facilitate the trading of digital assets. At the same time, it also attracts the attention of attackers. Fake deposit, as one of the most representative attacks (vulnerabilities) related to exchanges and tokens, has been frequently observed in the blockchain ecosystem, causing large financial losses. However, besides a few security reports, our community lacks an understanding of this vulnerability, for example its scale and impacts. In this paper, we take the first step to demystify the fake deposit vulnerability. Based on the essential patterns we have summarized, we implement DEPOSafe, an automated tool to detect and verify (exploit) the fake deposit vulnerability in ERC-20 smart contracts. DEPOSafe incorporates several key techniques including symbolic execution based static analysis and behavior modeling based dynamic verification. By applying DEPOSafe to 176,000 ERC-20 smart contracts, we have identified over 7,000 vulnerable contracts that may suffer from two types of attacks. Our findings demonstrate the urgency to identify and prevent the fake deposit vulnerability.

CRMay 29, 2020
Beyond the Virus: A First Look at Coronavirus-themed Mobile Malware

Liu Wang, Ren He, Haoyu Wang et al.

As the COVID-19 pandemic emerged in early 2020, a number of malicious actors have started capitalizing on the topic. Although a few media reports mentioned the existence of coronavirus-themed mobile malware, the research community lacks an understanding of the landscape of the coronavirus-themed mobile malware. In this paper, we present the first systematic study of coronavirus-themed Android malware. We first make efforts to create a daily growing COVID-19 themed mobile app dataset, which contains 4,322 COVID-19 themed apk samples (2,500 unique apps) and 611 potential malware samples (370 unique malicious apps) as of mid-November 2020. We then present an analysis of them from multiple perspectives including trends and statistics, installation methods, malicious behaviors and malicious actors behind them. We observe that the COVID-19 themed apps as well as malicious ones began to flourish almost as soon as the pandemic broke out worldwide. Most malicious apps are camouflaged as benign apps using the same app identifiers (e.g., app name, package name and app icon). Their main purposes are either stealing users' private information or making profit by using tricks like phishing and extortion. Furthermore, only a quarter of the COVID-19 malware creators are habitual developers who have been active for a long time, while 75% of them are newcomers in this pandemic. The malicious developers are mainly located in the US, mostly targeting countries including English-speaking countries, China, Arabic countries and Europe. To facilitate future research, we have publicly released all the well-labelled COVID-19 themed apps (and malware) to the research community. Till now, over 30 research institutes around the world have requested our dataset for COVID-19 themed research.

CVMar 23, 2020
Adversarial Attacks on Monocular Depth Estimation

Ziqi Zhang, Xinge Zhu, Yingwei Li et al.

Recent advances in deep learning have brought exceptional performance on many computer vision tasks such as semantic segmentation and depth estimation. However, the vulnerability of deep neural networks towards adversarial examples has caused grave concerns for real-world deployment. In this paper, we present, to the best of our knowledge, the first systematic study of adversarial attacks on monocular depth estimation, an important task of 3D scene understanding in scenarios such as autonomous driving and robot navigation. In order to understand the impact of adversarial attacks on depth estimation, we first define a taxonomy of different attack scenarios for depth estimation, including non-targeted attacks, targeted attacks and universal attacks. We then adapt several state-of-the-art attack methods for classification to the field of depth estimation. Besides, multi-task attacks are introduced to further improve the attack performance for universal attacks. Experimental results show that it is possible to generate significant errors on depth estimation. In particular, we demonstrate that our methods can conduct targeted attacks on given objects (such as a car), resulting in depth estimation 3-4x away from the ground truth (e.g., from 20m to 80m).

HCFeb 24, 2020
On-Orbit Operations Simulator for Workload Measurement during Telerobotic Training

Daniel Freer, Yao Guo, Fani Deligianni et al.

Training for telerobotic systems often makes heavy use of simulated platforms, which ensure safe operation during the learning process. Outer space is one domain in which such a simulated training platform would be useful, as On-Orbit Operations (O3) can be costly, inefficient, or even dangerous if not performed properly. In this paper, we present a new telerobotic training simulator for the Canadarm2 on the International Space Station (ISS), which is able to modulate workload through the addition of confounding factors such as latency, obstacles, and time pressure. In addition, multimodal physiological data is collected from subjects as they perform a task from the simulator under these different conditions. As most current workload measures are subjective, we analyse objective measures from the simulator and EEG data that can provide a reliable measure. ANOVA of task data revealed which simulator-based performance measures could predict the presence of latency and time pressure. Furthermore, EEG classification using a Riemannian classifier and Leave-One-Subject-Out cross-validation showed promising classification performance and allowed for comparison of different channel configurations and preprocessing methods. Additionally, Riemannian distance and beta power of EEG data were investigated as potential cross-trial and continuous workload measures.

CRFeb 5, 2020
MadDroid: Characterising and Detecting Devious Ad Content for Android Apps

Tianming Liu, Haoyu Wang, Li Li et al.

Advertisement drives the economy of the mobile app ecosystem. As a key component in the mobile ad business model, mobile ad content has been overlooked by the research community, which poses a number of threats, e.g., propagating malware and undesirable contents. To understand the practice of these devious ad behaviors, we perform a large-scale study on the app contents harvested through automated app testing. In this work, we first provide a comprehensive categorization of devious ad contents, including five kinds of behaviors belonging to two categories: ad loading content and ad clicking content. Then, we propose MadDroid, a framework for automated detection of devious ad contents. MadDroid leverages an automated app testing framework with a sophisticated ad view exploration strategy for effectively collecting ad-related network traffic and subsequently extracting ad contents. We then integrate dedicated approaches into the framework to identify devious ad contents. We have applied MadDroid to 40,000 Android apps and found that roughly 6% of apps deliver devious ad contents, e.g., distributing malicious apps that cannot be downloaded via traditional app markets. Experiment results indicate that devious ad contents are prevalent, suggesting that our community should invest more effort into the detection and mitigation of devious ads towards building a trustworthy mobile advertising ecosystem.

MED-PHDec 23, 2019
Artificial Intelligence in Surgery

Xiao-Yun Zhou, Yao Guo, Mali Shen et al.

Artificial Intelligence (AI) is gradually changing the practice of surgery with the advanced technological development of imaging, navigation and robotic intervention. In this article, the recent successful and influential applications of AI in surgery are reviewed from pre-operative planning and intra-operative guidance to the integration of surgical robots. We end by summarizing the current state, emerging trends and major challenges in the future development of AI in surgery.

CVNov 15, 2019
OpenLORIS-Object: A Robotic Vision Dataset and Benchmark for Lifelong Deep Learning

Qi She, Fan Feng, Xinyue Hao et al.

The recent breakthroughs in computer vision have benefited from the availability of large representative datasets (e.g. ImageNet and COCO) for training. Yet, robotic vision poses unique challenges for applying visual algorithms developed from these standard computer vision datasets due to their implicit assumption over non-varying distributions for a fixed set of tasks. Fully retraining models each time a new task becomes available is infeasible due to computational, storage and sometimes privacy issues, while naïve incremental strategies have been shown to suffer from catastrophic forgetting. It is crucial for the robots to operate continuously under open-set and detrimental conditions with adaptive visual perceptual systems, where lifelong learning is a fundamental capability. However, very few datasets and benchmarks are available to evaluate and compare emerging techniques. To fill this gap, we provide a new lifelong robotic vision dataset ("OpenLORIS-Object") collected via RGB-D cameras. The dataset embeds the challenges faced by a robot in the real-life application and provides new benchmarks for validating lifelong object recognition algorithms. Moreover, we have provided a testbed of 9 state-of-the-art lifelong learning algorithms. Each of them involves 48 tasks with 4 evaluation metrics over the OpenLORIS-Object dataset. The results demonstrate that the object recognition task in the ever-changing difficulty environments is far from being solved and the bottlenecks are at the forward/backward transfer designs. Our dataset and benchmark are publicly available at https://lifelong-robotic-vision.github.io/dataset/object.

CRJul 16, 2019
Automated Deobfuscation of Android Native Binary Code

Zeliang Kan, Haoyu Wang, Lei Wu et al.

With the popularity of Android apps, different techniques have been proposed to enhance app protection. As an effective approach to prevent reverse engineering, obfuscation can be used to serve both benign and malicious purposes. In recent years, more and more sensitive logic or data have been implemented as obfuscated native code because of the limitations of Java bytecode. As a result, native code obfuscation becomes a great obstacle for security analysis to understand the complicated logic. In this paper, we propose DiANa, an automated system to facilitate the deobfuscation of native binary code in Android apps. Specifically, given a binary obfuscated by Obfuscator-LLVM (the most popular native code obfuscator), DiANa is capable of recovering the original Control Flow Graph. To the best of our knowledge, DiANa is the first system that aims to tackle the problem of Android native binary deobfuscation. We have applied DiANa in different scenarios, and the experimental results demonstrate the effectiveness of DiANa based on generic similarity comparison metrics.

CVJul 2, 2019
High-speed Railway Fastener Detection and Localization Method based on convolutional neural network

Qing Song, Yao Guo, Jianan Jiang et al.

Railway transportation is the artery of China's national economy and plays an important role in the development of today's society. Due to the late start of China's railway security inspection technology, current railway security inspection tasks mainly rely on manual inspection, which is inefficient and requires a lot of manpower and material resources. In this paper, we establish a steel rail fastener detection image dataset, which contains 4,000 rail fastener pictures of 4 types. We use the region proposal network to generate the regions of interest, extract the features using the convolutional neural network, and fuse the classifier into the detection network. With online hard sample mining to improve the accuracy of the model, we optimize the Faster R-CNN detection framework by reducing the number of regions of interest. Finally, the model accuracy reaches 99% and the speed reaches 35 FPS in a deployment environment with a TITAN X GPU.

CRMay 1, 2019
Characterizing Code Clones in the Ethereum Smart Contract Ecosystem

Ningyu He, Lei Wu, Haoyu Wang et al.

In this paper, we present the first large-scale and systematic study to characterize the code reuse practice in the Ethereum smart contract ecosystem. We first performed a detailed similarity comparison study on a dataset of 10 million contracts we had harvested, and then we further conducted a qualitative analysis to characterize the diversity of the ecosystem, understand the correlation between code reuse and vulnerabilities, and detect the plagiarist DApps. Our analysis revealed that over 96% of the contracts had duplicates, while a large number of them were similar, which suggests that the ecosystem is highly homogeneous. Our results also suggested that roughly 9.7% of the similar contract pairs have exactly the same vulnerabilities, which we assume were introduced by code clones. In addition, we identified 41 DApps clusters, involving 73 plagiarized DApps which had caused huge financial loss to the original creators, accounting for 1/3 of the original market volume.

CVFeb 13, 2019
Highly Efficient Follicular Segmentation in Thyroid Cytopathological Whole Slide Image

Siyan Tao, Yao Guo, Chuang Zhu et al.

In this paper, we propose a novel method for highly efficient follicular segmentation of thyroid cytopathological WSIs. Firstly, we propose a hybrid segmentation architecture, which integrates a classifier into Deeplab V3 by adding a branch. A large amount of the WSI segmentation time is saved by skipping the irrelevant areas using the classification branch. Secondly, we merge the low scale fine features into the original atrous spatial pyramid pooling (ASPP) in Deeplab V3 to accurately represent the details in cytopathological images. Thirdly, our hybrid model is trained by a criterion-oriented adaptive loss function, which leads the model to converge much faster. Experimental results on a collection of thyroid patches demonstrate that the proposed model reaches a segmentation accuracy of 80.9%. Besides, the WSI segmentation time is reduced by 93% using our proposed method, and the WSI-level accuracy reaches 53.4%.

NISep 26, 2018
Beyond Google Play: A Large-Scale Comparative Study of Chinese Android App Markets

Haoyu Wang, Zhe Liu, Jingyue Liang et al.

China is one of the largest Android markets in the world. As Chinese users cannot access Google Play to buy and install Android apps, a number of independent app stores have emerged and compete in the Chinese app market. Some of the Chinese app stores are pre-installed vendor-specific app markets (e.g., Huawei, Xiaomi and OPPO), whereas others are maintained by large tech companies (e.g., Baidu, Qihoo 360 and Tencent). The nature of these app stores and the content available through them vary greatly, including their trustworthiness and security guarantees. As of today, the research community has not studied the Chinese Android ecosystem in depth. To fill this gap, we present the first large-scale comparative study that covers more than 6 million Android apps downloaded from 16 Chinese app markets and Google Play. We focus our study on catalog similarity across app stores, their features, publishing dynamics, and the prevalence of various forms of misbehavior (including the presence of fake, cloned and malicious apps). Our findings also suggest heterogeneous developer behavior across app stores, in terms of code maintenance, use of third-party services, and so forth. Overall, Chinese app markets perform substantially worse when taking active measures to protect mobile users and legit developers from deceptive and abusive actors, showing a significantly higher prevalence of malware, fake, and cloned apps than Google Play.

IRAug 6, 2018
Automated Extraction of Personal Knowledge from Smartphone Push Notifications

Yuanchun Li, Ziyue Yang, Yao Guo et al.

Personalized services are in need of a rich and powerful personal knowledge base, i.e., a knowledge base containing information about the user. This paper proposes an approach to extracting personal knowledge from smartphone push notifications, which are used by mobile systems and apps to inform users of a rich range of information. Our solution is based on the insight that most notifications are formatted using templates, while knowledge entities can usually be found within the parameters to the templates. As defining all the notification templates and their semantic rules is impractical due to the huge number of notification templates used by potentially millions of apps, we propose an automated approach for personal knowledge extraction from push notifications. We first discover notification templates through pattern mining, then use machine learning to understand the template semantics. Based on the templates and their semantics, we are able to translate notification text into knowledge facts automatically. Users' privacy is preserved as we only need to upload the templates to the server for model training, which do not contain any personal information. According to our experiments with about 120 million push notifications from 100,000 smartphone users, our system is able to extract personal knowledge accurately and efficiently.

CYJul 13, 2018
Dating with Scambots: Understanding the Ecosystem of Fraudulent Dating Applications

Yangyu Hu, Haoyu Wang, Yajin Zhou et al.

In this work, we focus on a new and as yet unexplored way for malicious apps to gain profit. They claim to be dating apps. However, their sole purpose is to lure users into purchasing premium/VIP services to start conversations with other (likely fake female) accounts in the app. We call these apps fraudulent dating apps. This paper performs a systematic study to understand the whole ecosystem of fraudulent dating apps. Specifically, we have proposed a three-phase method to detect them and subsequently comprehend their characteristics via analyzing the existing account profiles. Our observation reveals that most of the accounts are not managed by real persons, but by chatbots based on predefined conversation templates. We also analyze the business model of these apps and reveal that multiple parties are actually involved in the ecosystem, including producers who develop apps, publishers who publish apps to gain profit, and the distribution network that is responsible for distributing apps to end users. Finally, we analyze their impact on users (i.e., victims) and estimate the overall revenue. Our work is the first systematic study on fraudulent dating apps, and the results demonstrate the urgent need for a solution to protect users.

CRSep 5, 2017
FraudDroid: Automated Ad Fraud Detection for Android Apps

Feng Dong, Haoyu Wang, Li Li et al.

Although mobile ad frauds have been widespread, state-of-the-art approaches in the literature have mainly focused on detecting the so-called static placement frauds, where only a single UI state is involved and can be identified based on static information such as the size or location of ad views. Other types of fraud exist that involve multiple UI states and are performed dynamically while users interact with the app. Such dynamic interaction frauds, although now widely spread in apps, have not yet been explored or addressed in the literature. In this work, we investigate a wide range of mobile ad frauds to provide a comprehensive taxonomy to the research community. We then propose FraudDroid, a novel hybrid approach to detect ad frauds in mobile Android apps. FraudDroid analyses apps dynamically to build UI state transition graphs and collects their associated runtime network traffic, which are then leveraged to check against a set of heuristic-based rules for identifying ad fraudulent behaviours. We show empirically that FraudDroid detects ad frauds with a high precision (93%) and recall (92%). Experimental results further show that FraudDroid is capable of detecting ad frauds across the spectrum of fraud types. By analysing 12,000 ad-supported Android apps, FraudDroid identified 335 cases of fraud associated with 20 ad networks that are further confirmed to be true positive results and are shared with our fellow researchers to promote advanced ad fraud detection.

ROJun 18, 2016
RRV: A Spatiotemporal Descriptor for Rigid Body Motion Recognition

Yao Guo, Youfu Li, Zhanpeng Shao

The motion of a rigid body can be characterized by a 6-dimensional motion trajectory, which contains the position vectors of a reference point on the rigid body and the rotations of the rigid body over time. This paper devises a Rotation and Relative Velocity (RRV) descriptor by exploring the local translational and rotational invariants of rigid body motion trajectories. The descriptor is insensitive to noise and invariant to rigid transformation and scaling. A flexible metric is also introduced to measure the distance between two RRV descriptors. The RRV descriptor is then applied to characterize the motions of a human body skeleton modeled as articulated interconnections of multiple rigid bodies. To illustrate the descriptive ability of the RRV descriptor, we apply it to different rigid body motion recognition tasks. The experimental results on benchmark datasets demonstrate that this simple RRV descriptor outperforms previous descriptors in recognition accuracy without increasing computational cost.

CRDec 25, 2015
A Study on Power Side Channels on Mobile Devices

Lin Yan, Yao Guo, Xiangqun Chen et al.

Power side channels are an important category of side channels, which can be exploited to steal confidential information from a computing system by analyzing its power consumption. In this paper, we demonstrate the existence of various power side channels on popular mobile devices such as smartphones. Based on unprivileged power consumption traces, we present a list of real-world attacks that can be launched to identify running apps, infer sensitive UIs, guess password lengths, and estimate geo-locations. These attack examples demonstrate that power consumption traces can be used as a practical side channel to obtain various kinds of confidential information about mobile apps running on smartphones. Based on these power side channels, we discuss possible exploitations and present a general approach to exploiting a power side channel on an Android smartphone, which demonstrates that power side channels pose imminent threats to the security and privacy of mobile users. We also discuss possible countermeasures to mitigate these threats.