Jianhua Li

Semantic Scholar Profile

h-index19

41papers

949citations

Novelty49%

AI Score57

Ranked #18,872 of 201,326 authors (top 9%)#4,250 in LG (top 10%)

41 Papers

LGSep 24, 2023Code

From Cluster Assumption to Graph Convolution: Graph-based Semi-Supervised Learning Revisited

Zheng Wang, Hongming Ding, Li Pan et al.

Graph-based semi-supervised learning (GSSL) has long been a hot research topic. Traditional methods are generally shallow learners, based on the cluster assumption. Recently, graph convolutional networks (GCNs) have become the predominant techniques for their promising performance. In this paper, we theoretically discuss the relationship between these two types of methods in a unified optimization framework. One of the most intriguing findings is that, unlike traditional ones, typical GCNs may not jointly consider the graph structure and label information at each layer. Motivated by this, we further propose three simple but powerful graph convolution methods. The first is a supervised method OGC which guides the graph convolution process with labels. The others are two unsupervised methods: GGC and its multi-scale version GGCM, both aiming to preserve the graph structure information during the convolution process. Finally, we conduct extensive experiments to show the effectiveness of our methods. Code is available at https://github.com/zhengwang100/ogc_ggcm.

AIJul 29, 2024Code

rLLM: Relational Table Learning with LLMs

Weichen Li, Xiaotong Huang, Jianwu Zheng et al.

We introduce rLLM (relationLLM), a PyTorch library designed for Relational Table Learning (RTL) with Large Language Models (LLMs). The core idea is to decompose state-of-the-art Graph Neural Networks, LLMs, and Table Neural Networks into standardized modules, to enable the fast construction of novel RTL-type models in a simple "combine, align, and co-train" manner. To illustrate the usage of rLLM, we introduce a simple RTL method named \textbf{BRIDGE}. Additionally, we present three novel relational tabular datasets (TML1M, TLF2K, and TACM12K) by enhancing classic datasets. We hope rLLM can serve as a useful and easy-to-use development framework for RTL-related tasks. Our code is available at: https://github.com/rllm-project/rllm.

LGOct 30, 2025Code

Accurate Target Privacy Preserving Federated Learning Balancing Fairness and Utility

Kangkang Sun, Jun Wu, Minyi Guo et al.

Federated Learning (FL) enables collaborative model training without data sharing, yet participants face a fundamental challenge, e.g., simultaneously ensuring fairness across demographic groups while protecting sensitive client data. We introduce a differentially private fair FL algorithm (\textit{FedPF}) that transforms this multi-objective optimization into a zero-sum game where fairness and privacy constraints compete against model utility. Our theoretical analysis reveals a surprising inverse relationship, i.e., stricter privacy protection fundamentally limits the system's ability to detect and correct demographic biases, creating an inherent tension between privacy and fairness. Counterintuitively, we prove that moderate fairness constraints initially improve model generalization before causing performance degradation, where a non-monotonic relationship that challenges conventional wisdom about fairness-utility tradeoffs. Experimental validation demonstrates up to 42.9 % discrimination reduction across three datasets while maintaining competitive accuracy, but more importantly, reveals that the privacy-fairness tension is unavoidable, i.e., achieving both objectives simultaneously requires carefully balanced compromises rather than optimization of either in isolation. The source code for our proposed algorithm is publicly accessible at https://github.com/szpsunkk/FedPF.

LGNov 30, 2023

Toward the Tradeoffs between Privacy, Fairness and Utility in Federated Learning

Kangkang Sun, Xiaojin Zhang, Xi Lin et al.

Federated Learning (FL) is a novel privacy-protection distributed machine learning paradigm that guarantees user privacy and prevents the risk of data leakage due to the advantage of the client's local training. Researchers have struggled to design fair FL systems that ensure fairness of results. However, the interplay between fairness and privacy has been less studied. Increasing the fairness of FL systems can have an impact on user privacy, while an increase in user privacy can affect fairness. In this work, on the client side, we use fairness metrics, such as Demographic Parity (DemP), Equalized Odds (EOs), and Disparate Impact (DI), to construct the local fair model. To protect the privacy of the client model, we propose a privacy-protection fairness FL method. The results show that the accuracy of the fair model with privacy increases because privacy breaks the constraints of the fairness metrics. In our experiments, we conclude the relationship between privacy, fairness and utility, and there is a tradeoff between these.

GTMar 30

Privacy as Commodity: MFG-RegretNet for Large-Scale Privacy Trading in Federated Learning

Kangkang Sun, Jianhua Li, Xiuzhen Chen et al.

Federated Learning (FL) has emerged as a prominent paradigm for privacy-preserving distributed machine learning, yet two fundamental challenges hinder its large-scale adoption. First, gradient inversion attacks can reconstruct sensitive training data from uploaded model updates, so privacy risk persists even when raw data remain local. Second, without adequate monetary compensation, rational clients have little incentive to contribute high-quality gradients, limiting participation at scale. To address these challenges, a privacy trading market is developed in which clients sell their differential privacy budgets as a commodity and receive explicit economic compensation for privacy sacrifice. This market is formalized as a Privacy Auction Game (PAG), and the existence of a Bayesian Nash Equilibrium is established under dominant-strategy incentive compatibility (DSIC), individual rationality (IR), and budget feasibility. To overcome the NP-hard, high-dimensional Nash Equilibrium computation at scale, \textit{MFG-RegretNet} is introduced as a deep-learning-based auction mechanism that combines mean-field game (MFG) approximation with differentiable mechanism design. The MFG reduction lowers per-round computational complexity from $\mathcal{O}(N^2 \log N)$ to $\mathcal{O}(N)$ while incurring only an $\mathcal{O}(N^{-1/2})$ equilibrium approximation gap. Extensive experiments on MNIST and CIFAR-10 demonstrate that MFG-RegretNet outperforms state-of-the-art baselines in incentive compatibility, auction revenue, and social welfare, while maintaining competitive downstream FL model accuracy.

CLApr 29Code

DSIPA: Detecting LLM-Generated Texts via Sentiment-Invariant Patterns Divergence Analysis

Siyuan Li, Aodu Wulianghai, Guangyan Li et al.

The rapid advancement of large language models (LLMs) presents new security challenges, particularly in detecting machine-generated text used for misinformation, impersonation, and content forgery. Most existing detection approaches struggle with robustness against adversarial perturbation, paraphrasing attacks, and domain shifts, often requiring restrictive access to model parameters or large labeled datasets. To address this, we propose DSIPA, a novel training-free framework that detects LLM-generated content by quantifying sentiment distributional stability under controlled stylistic variation. It is based on the observation that LLMs typically exhibit more emotionally consistent outputs, while human-written texts display greater affective variation. Our framework operates in a zero-shot, black-box manner, leveraging two unsupervised metrics, sentiment distribution consistency and sentiment distribution preservation, to capture these intrinsic behavioral asymmetries without the need for parameter updates or probability access. Extensive experiments are conducted on state-of-the-art proprietary and open-source models, including GPT-5.2, Gemini-1.5-pro, Claude-3, and LLaMa-3.3. Evaluations on five domains, such as news articles, programming code, student essays, academic papers, and community comments, demonstrate that DSIPA improves F1 detection scores by up to 49.89% over baseline methods. The framework exhibits superior generalizability across domains and strong resilience to adversarial conditions, providing a robust and interpretable behavioral signal for secure content identification in the evolving LLM landscape.

LGFeb 11Code

LakeMLB: Data Lake Machine Learning Benchmark

Feiyu Pan, Tianbin Zhang, Aoqian Zhang et al.

Modern data lakes have emerged as foundational platforms for large-scale machine learning, enabling flexible storage of heterogeneous data and structured analytics through table-oriented abstractions. Despite their growing importance, standardized benchmarks for evaluating machine learning performance in data lake environments remain scarce. To address this gap, we present LakeMLB (Data Lake Machine Learning Benchmark), designed for the most common multi-source, multi-table scenarios in data lakes. LakeMLB focuses on two representative multi-table scenarios, Union and Join, and provides three real-world datasets for each scenario, covering government open data, finance, Wikipedia, and online marketplaces. The benchmark supports three representative integration strategies: pre-training-based, data augmentation-based, and feature augmentation-based approaches. We conduct extensive experiments with state-of-the-art tabular learning methods, offering insights into their performance under complex data lake scenarios. We release both datasets and code to facilitate rigorous research on machine learning in data lake ecosystems; the benchmark is available at https://github.com/zhengwang100/LakeMLB.

AIMar 30

CoE: Collaborative Entropy for Uncertainty Quantification in Agentic Multi-LLM Systems

Kangkang Sun, Jun Wu, Jianhua Li et al.

Uncertainty estimation in multi-LLM systems remains largely single-model-centric: existing methods quantify uncertainty within each model but do not adequately capture semantic disagreement across models. To address this gap, we propose Collaborative Entropy (CoE), a unified information-theoretic metric for semantic uncertainty in multi-LLM collaboration. CoE is defined on a shared semantic cluster space and combines two components: intra-model semantic entropy and inter-model divergence to the ensemble mean. CoE is not a weighted ensemble predictor; it is a system-level uncertainty measure that characterizes collaborative confidence and disagreement. We analyze several core properties of CoE, including non-negativity, zero-value certainty under perfect semantic consensus, and the behavior of CoE when individual models collapse to delta distributions. These results clarify when reducing per-model uncertainty is sufficient and when residual inter-model disagreement remains. We also present a simple CoE-guided, training-free post-hoc coordination heuristic as a practical application of the metric. Experiments on \textit{TriviaQA} and \textit{SQuAD} with LLaMA-3.1-8B-Instruct, Qwen-2.5-7B-Instruct, and Mistral-7B-Instruct show that CoE provides stronger uncertainty estimation than standard entropy- and divergence-based baselines, with gains becoming larger as additional heterogeneous models are introduced. Overall, CoE offers a useful uncertainty-aware perspective on multi-LLM collaboration.

CVApr 17

Splats in Splats++: Robust and Generalizable 3D Gaussian Splatting Steganography

Yijia Guo, Wenkai Huang, Tong Hu et al.

3D Gaussian Splatting (3DGS) has recently redefined the paradigm of 3D reconstruction, striking an unprecedented balance between visual fidelity and computational efficiency. As its adoption proliferates, safeguarding the copyright of explicit 3DGS assets has become paramount. However, existing invisible message embedding frameworks struggle to reconcile secure and high-capacity data embedding with intrinsic asset utility, often disrupting the native rendering pipeline or exhibiting vulnerability to structural perturbations. In this work, we present \textbf{\textit{Splats in Splats++}}, a unified and pipeline-agnostic steganography framework that seamlessly embeds high-capacity 3D/4D content directly within the native 3DGS representation. Grounded in a principled analysis of the frequency distribution of Spherical Harmonics (SH), we propose an importance-graded SH coefficient encryption scheme that achieves imperceptible embedding without compromising the original expressive power. To fundamentally resolve the geometric ambiguities that lead to message leakage, we introduce a \textbf{Hash-Grid Guided Opacity Mapping} mechanism. Coupled with a novel \textbf{Gradient-Gated Opacity Consistency Loss}, our formulation enforces a stringent spatial-attribute coupling between the original and hidden scenes, effectively projecting the discrete attribute mapping into a continuous, attack-resilient latent manifold. Extensive experiments demonstrate that our method substantially outperforms existing approaches, achieving up to \textbf{6.28 db} higher message fidelity, \textbf{3$\times$} faster rendering, and exceptional robustness against aggressive 3D-targeted structural attacks (e.g., GSPure). Furthermore, our framework exhibits remarkable versatility, generalizing seamlessly to 2D image embedding, 4D dynamic scene steganography, and diverse downstream tasks.

NIApr 19

Safety-Aware AoI Scheduling for LEO Satellite-Assisted Autonomous Driving

Kangkang Sun, Junyi He, Juntong Liu et al.

Autonomous platoons traversing infrastructure gaps increasingly depend on LEO satellite backhaul for safety-critical updates, yet no existing framework jointly addresses compound Doppler from simultaneous satellite and vehicle motion, sub-slot handover outages that exceed collision-alert deadlines, and heterogeneous freshness requirements across three vehicular priority classes. The core challenge is a \emph{timescale mismatch}: coarse control slots hide sub-slot outages, which makes both AoI spike analysis and safety verification ill-posed. Ping-pong handover oscillations further compound AoI cost in a way that purely reactive schedulers cannot mitigate. We address these challenges through a unified framework that couples a two-timescale AoI model with tiered time-average safety constraints enforced by virtual queues. A closed-form ping-pong AoI envelope reveals that cumulative penalty grows quadratically in oscillation length, analytically justifying oscillation suppression as the highest-leverage safety mechanism. The resulting drift-plus-penalty template is instantiated as SafeScale-MATD3 with proactive handover timing and multi-task dual-critic MARL. A key finding is that suppressing brief but repeated ping-pong oscillations yields larger safety returns than shortening any single outage, and that tick-level AoI accounting is a necessary condition for verifiable collision-alert guarantees under LEO handovers. Simulations show that SafeScale-MATD3 is the only method satisfying the strict 1 % collision-alert violation budget, reducing violation rate by 4 to 5.5 times versus baselines, while achieving 35 % lower collision-alert AoI and strict Pareto dominance on the energy and freshness tradeoff.

LGAug 29, 2024

SFR-GNN: Simple and Fast Robust GNNs against Structural Attacks

Xing Ai, Guanyu Zhu, Yulin Zhu et al.

Graph Neural Networks (GNNs) have demonstrated commendable performance for graph-structured data. Yet, GNNs are often vulnerable to adversarial structural attacks as embedding generation relies on graph topology. Existing efforts are dedicated to purifying the maliciously modified structure or applying adaptive aggregation, thereby enhancing the robustness against adversarial structural attacks. It is inevitable for a defender to consume heavy computational costs due to lacking prior knowledge about modified structures. To this end, we propose an efficient defense method, called Simple and Fast Robust Graph Neural Network (SFR-GNN), supported by mutual information theory. The SFR-GNN first pre-trains a GNN model using node attributes and then fine-tunes it over the modified graph in the manner of contrastive learning, which is free of purifying modified structures and adaptive aggregation, thus achieving great efficiency gains. Consequently, SFR-GNN exhibits a 24%--162% speedup compared to advanced robust models, demonstrating superior robustness for node classification tasks.

CLOct 14, 2025Code

StyleDecipher: Robust and Explainable Detection of LLM-Generated Texts with Stylistic Analysis

Siyuan Li, Aodu Wulianghai, Xi Lin et al.

With the increasing integration of large language models (LLMs) into open-domain writing, detecting machine-generated text has become a critical task for ensuring content authenticity and trust. Existing approaches rely on statistical discrepancies or model-specific heuristics to distinguish between LLM-generated and human-written text. However, these methods struggle in real-world scenarios due to limited generalization, vulnerability to paraphrasing, and lack of explainability, particularly when facing stylistic diversity or hybrid human-AI authorship. In this work, we propose StyleDecipher, a robust and explainable detection framework that revisits LLM-generated text detection using combined feature extractors to quantify stylistic differences. By jointly modeling discrete stylistic indicators and continuous stylistic representations derived from semantic embeddings, StyleDecipher captures distinctive style-level divergences between human and LLM outputs within a unified representation space. This framework enables accurate, explainable, and domain-agnostic detection without requiring access to model internals or labeled segments. Extensive experiments across five diverse domains, including news, code, essays, reviews, and academic abstracts, demonstrate that StyleDecipher consistently achieves state-of-the-art in-domain accuracy. Moreover, in cross-domain evaluations, it surpasses existing baselines by up to 36.30%, while maintaining robustness against adversarial perturbations and mixed human-AI content. Further qualitative and quantitative analysis confirms that stylistic signals provide explainable evidence for distinguishing machine-generated text. Our source code can be accessed at https://github.com/SiyuanLi00/StyleDecipher.

CVApr 1, 2020Code

High-Performance Long-Term Tracking with Meta-Updater

Kenan Dai, Yunhua Zhang, Dong Wang et al.

Long-term visual tracking has drawn increasing attention because it is much closer to practical applications than short-term tracking. Most top-ranked long-term trackers adopt the offline-trained Siamese architectures, thus, they cannot benefit from great progress of short-term trackers with online update. However, it is quite risky to straightforwardly introduce online-update-based trackers to solve the long-term problem, due to long-term uncertain and noisy observations. In this work, we propose a novel offline-trained Meta-Updater to address an important but unsolved problem: Is the tracker ready for updating in the current frame? The proposed meta-updater can effectively integrate geometric, discriminative, and appearance cues in a sequential manner, and then mine the sequential information with a designed cascaded LSTM module. Our meta-updater learns a binary output to guide the tracker's update and can be easily embedded into different trackers. This work also introduces a long-term tracking framework consisting of an online local tracker, an online verifier, a SiamRPN-based re-detector, and our meta-updater. Numerous experimental results on the VOT2018LT, VOT2019LT, OxUvALT, TLP, and LaSOT benchmarks show that our tracker performs remarkably better than other competing algorithms. Our project is available on the website: https://github.com/Daikenan/LTMU.

CRJan 7

HoneyTrap: Deceiving Large Language Model Attackers to Honeypot Traps with Resilient Multi-Agent Defense

Siyuan Li, Xi Lin, Jun Wu et al.

Jailbreak attacks pose significant threats to large language models (LLMs), enabling attackers to bypass safeguards. However, existing reactive defense approaches struggle to keep up with the rapidly evolving multi-turn jailbreaks, where attackers continuously deepen their attacks to exploit vulnerabilities. To address this critical challenge, we propose HoneyTrap, a novel deceptive LLM defense framework leveraging collaborative defenders to counter jailbreak attacks. It integrates four defensive agents, Threat Interceptor, Misdirection Controller, Forensic Tracker, and System Harmonizer, each performing a specialized security role and collaborating to complete a deceptive defense. To ensure a comprehensive evaluation, we introduce MTJ-Pro, a challenging multi-turn progressive jailbreak dataset that combines seven advanced jailbreak strategies designed to gradually deepen attack strategies across multi-turn attacks. Besides, we present two novel metrics: Mislead Success Rate (MSR) and Attack Resource Consumption (ARC), which provide more nuanced assessments of deceptive defense beyond conventional measures. Experimental results on GPT-4, GPT-3.5-turbo, Gemini-1.5-pro, and LLaMa-3.1 demonstrate that HoneyTrap achieves an average reduction of 68.77% in attack success rates compared to state-of-the-art baselines. Notably, even in a dedicated adaptive attacker setting with intensified conditions, HoneyTrap remains resilient, leveraging deceptive engagement to prolong interactions, significantly increasing the time and computational costs required for successful exploitation. Unlike simple rejection, HoneyTrap strategically wastes attacker resources without impacting benign queries, improving MSR and ARC by 118.11% and 149.16%, respectively.

CLMay 7

Lightweight Stylistic Consistency Profiling: Robust Detection of LLM-Generated Textual Content for Multimedia Moderation

Siyuan Li, Aodu Wulianghai, Xi Lin et al.

The increasing prevalence of Large Language Models (LLMs) in content creation has made distinguishing human-written textual content from LLM-generated counterparts a critical task for multimedia moderation. Existing detectors often rely on statistical cues or model-specific heuristics, making them vulnerable to paraphrasing and adversarial manipulations, and consequently limiting their robustness and interpretability. In this work, we proposeLiSCP , a novel lightweight stylistic consistency profiling method for robust detection of LLM-generated textual content, focusing on feature stability under adversarial manipulation. Our approach constructs a consistency profile that combines discrete stylistic features with continuous semantic signals, leveraging stylistic stability across multimodal-guided paraphrased text variants. Experiments spanning real-world multimedia news and movie datasets and conventional text domains demonstrate that LiSCP achieves superior performance on in-domain detection and outperforms existing approaches by up to 11.79% in cross-domain settings. Additionally,it demonstrates notable robustness under adversarial scenarios, including adversarial attacks and hybrid human-AI settings.

GTApr 1

Heterogeneous Mean Field Game Framework for LEO Satellite-Assisted V2X Networks

Kangkang Sun, Jianhua Li, Xiuzhen Chen et al.

Coordinating mixed fleets of $10^4$ to $10^5$ vehicles, passenger cars, freight trucks, and autonomous vehicles, under stringent delay constraints is a central scalability bottleneck in next-generation V2X networks. Heterogeneous mean field games (HMFG) offer a principled coordination framework, yet a fundamental design question lacks theoretical guidance: how many agent types $K$ should be used for a fleet of size $N$? The core challenge is a two-sided trade-off that existing theory does not resolve: increasing $K$ reduces type-discretization error but simultaneously starves each class of the samples needed for reliable mean-field approximation. We resolve this trade-off by deriving an explicit $\varepsilon$-Nash error decomposition driven by a Wasserstein-based heterogeneity measure, and prove that the unique error-minimizing type count satisfies $K^*(N)=Î(N^{1/3})$ in the canonical one-dimensional queue setting. We further establish a heterogeneity-aware convergence condition for G-prox PDHG and extend the framework to temporal-graph LEO satellite backhaul dynamics with provable robustness guarantees. A perhaps surprising consequence is that even for $N=10^5$ vehicles, only about 28 type classes suffice, cube-root compression rather than per-vehicle modeling, so type-granularity selection is largely a set-once design decision. Experiments validate the scaling law, achieve $2.3\times$ faster PDHG convergence at $K=5$, and deliver up to $29.5\%$ lower delay and $60\%$ higher throughput compared with homogeneous baselines.

NIMay 5, 2024

Multi-Agent RL-Based Industrial AIGC Service Offloading over Wireless Edge Networks

Siyuan Li, Xi Lin, Hansong Xu et al.

Currently, the generative model has garnered considerable attention due to its application in addressing the challenge of scarcity of abnormal samples in the industrial Internet of Things (IoT). However, challenges persist regarding the edge deployment of generative models and the optimization of joint edge AI-generated content (AIGC) tasks. In this paper, we focus on the edge optimization of AIGC task execution and propose GMEL, a generative model-driven industrial AIGC collaborative edge learning framework. This framework aims to facilitate efficient few-shot learning by leveraging realistic sample synthesis and edge-based optimization capabilities. First, a multi-task AIGC computational offloading model is presented to ensure the efficient execution of heterogeneous AIGC tasks on edge servers. Then, we propose an attention-enhanced multi-agent reinforcement learning (AMARL) algorithm aimed at refining offloading policies within the IoT system, thereby supporting generative model-driven edge learning. Finally, our experimental results demonstrate the effectiveness of the proposed algorithm in optimizing the total system latency of the edge-based AIGC task completion.

CRApr 25

Toward Polymorphic Backdoor against Semantic Communication via Intensity-Based Poisoning

Xiao Yang, Yuni Lai, Gaolei Li et al.

Semantic Communication (SC) backdoor attacks aim to utilize triggers to manipulate the system into producing predetermined outputs via backdoored shared knowledge. Current SC backdoors adopt monomorphic paradigms with single attack target, which suffers from limited attack diversity, efficiency, and flexibility in heterogeneous downstream scenarios. To overcome the limitations, we propose SemBugger, a polymorphic SC backdoor. By dynamically adjusting the trigger intensity, SemBugger finely-grained controls over the SC knowledge to generate diverse malicious results from the system. Specifically, SemBugger is realized through a multi-effect poisoning-training framework. It introduces graded-intensity triggers to poison training data and optimizes SC systems with hierarchical malicious loss. The trained system's knowledge dynamically adapts to trigger intensity in inputs to yield target outputs, all while preserving transmission fidelity for benign samples. Moreover, to augment SC security, we propose a provable robustness defense that resists SemBugger's homogeneous attacks through a controlled noise mechanism. It operates via strategically adding noise in SC inputs, and we formally provide a theoretical lower bound on the defense efficacy. Experiments across diverse SC models and benchmark datasets indicate that SemBugger attains high attack efficacy while maintaining the regular functionality of SC systems. Meanwhile, the designed defense effectively neutralizes SemBugger attacks.

CVDec 4, 2024

Splats in Splats: Embedding Invisible 3D Watermark within Gaussian Splatting

Yijia Guo, Wenkai Huang, Yang Li et al.

3D Gaussian splatting (3DGS) has demonstrated impressive 3D reconstruction performance with explicit scene representations. Given the widespread application of 3DGS in 3D reconstruction and generation tasks, there is an urgent need to protect the copyright of 3DGS assets. However, existing copyright protection techniques for 3DGS overlook the usability of 3D assets, posing challenges for practical deployment. Here we describe WaterGS, the first 3DGS watermarking framework that embeds 3D content in 3DGS itself without modifying any attributes of the vanilla 3DGS. To achieve this, we take a deep insight into spherical harmonics (SH) and devise an importance-graded SH coefficient encryption strategy to embed the hidden SH coefficients. Furthermore, we employ a convolutional autoencoder to establish a mapping between the original Gaussian primitives' opacity and the hidden Gaussian primitives' opacity. Extensive experiments indicate that WaterGS significantly outperforms existing 3D steganography techniques, with 5.31% higher scene fidelity and 3X faster rendering speed, while ensuring security, robustness, and user experience. Codes and data will be released at https://water-gs.github.io.

CRMar 27, 2024

Spikewhisper: Temporal Spike Backdoor Attacks on Federated Neuromorphic Learning over Low-power Devices

Hanqing Fu, Gaolei Li, Jun Wu et al.

Federated neuromorphic learning (FedNL) leverages event-driven spiking neural networks and federated learning frameworks to effectively execute intelligent analysis tasks over amounts of distributed low-power devices but also perform vulnerability to poisoning attacks. The threat of backdoor attacks on traditional deep neural networks typically comes from time-invariant data. However, in FedNL, unknown threats may be hidden in time-varying spike signals. In this paper, we start to explore a novel vulnerability of FedNL-based systems with the concept of time division multiplexing, termed Spikewhisper, which allows attackers to evade detection as much as possible, as multiple malicious clients can imperceptibly poison with different triggers at different timeslices. In particular, the stealthiness of Spikewhisper is derived from the time-domain divisibility of global triggers, in which each malicious client pastes only one local trigger to a certain timeslice in the neuromorphic sample, and also the polarity and motion of each local trigger can be configured by attackers. Extensive experiments based on two different neuromorphic datasets demonstrate that the attack success rate of Spikewispher is higher than the temporally centralized attacks. Besides, it is validated that the effect of Spikewispher is sensitive to the trigger duration.

CRApr 5

CoopGuard: Stateful Cooperative Agents Safeguarding LLMs Against Evolving Multi-Round Attacks

Siyuan Li, Zehao Liu, Xi Lin et al.

As Large Language Models (LLMs) are increasingly deployed in complex applications, their vulnerability to adversarial attacks raises urgent safety concerns, especially those evolving over multi-round interactions. Existing defenses are largely reactive and struggle to adapt as adversaries refine strategies across rounds. In this work, we propose CoopGuard , a stateful multi-round LLM defense framework based on cooperative agents that maintains and updates an internal defense state to counter evolving attacks. It employs three specialized agents (Deferring Agent, Tempting Agent, and Forensic Agent) for complementary round-level strategies, coordinated by System Agent, which conditions decisions on the evolving defense state (interaction history) and orchestrates agents over time. To evaluate evolving threats, we introduce the EMRA benchmark with 5,200 adversarial samples across 8 attack types, simulating progressively LLM multi-round attacks. Experiments show that CoopGuard reduces attack success rate by 78.9% over state-of-the-art defenses, while improving deceptive rate by 186% and reducing attack efficiency by 167.9%, offering a more comprehensive assessment of multi-round defense. These results demonstrate that CoopGuard provides robust protection for LLMs in multi-round adversarial scenarios.

GTMar 31

Hierarchical Battery-Aware Game Algorithm for ISL Power Allocation in LEO Mega-Constellations

Kangkang Sun, Jianhua Li, Xiuzhen Chen et al.

Sustaining high inter-satellite link (ISL) throughput under intermittent solar harvesting is a fundamental challenge for LEO mega-constellations. Existing frameworks impose static power ceilings that ignore real-time battery state and comprehensive onboard power budgets, causing eclipse-period energy crises. Learning-based approaches capture battery dynamics but lack equilibrium guarantees and do not scale beyond small constellations. We propose the Hierarchical Battery-Aware Game (HBAG) algorithm, a unified game-theoretic framework for ISL power allocation that operates identically across finite and megaconstellation regimes. For finite constellations, HBAG converges to a unique variational equilibrium; as constellation size grows, the same distributed update rule converges to the mean field equilibrium without algorithm redesign. Comprehensive experiments on Starlink Shell A (172 satellites) show that HBAG achieves 100% energy sustainability rate (87.4 percentage points improvement over SATFLOW), eliminates eclipse-period battery depletion, maintains flow violation ratio below the 10% industry tolerance, and scales linearly to 5,000 satellites with less than 75 ms per-slot runtime.

CRMar 6

Alkaid: Resilience to Edit Errors in Provably Secure Steganography via Distance-Constrained Encoding

Zhihan Cao, Gaolei Li, Jun Wu et al.

While provably secure steganography provides strong concealment by ensuring stego carriers are indistinguishable from natural samples, such systems remain vulnerable to real-world edit errors (e.g., insertions, deletions, substitutions) because their decoding depends on perfect synchronization and lacks error-correcting capability. To bridge this gap, we propose Alkaid, a provably secure steganographic scheme resilient to edit errors via distance-constrained encoding. The key innovation integrates the minimum distance decoding principle directly into the encoding process by enforcing a strict lower bound on the edit distance between codewords of different messages. Specifically, if two candidate codewords violate this bound, they are merged to represent the same message, thereby guaranteeing reliable recovery. While maintaining provable security, we theoretically prove that Alkaid offers deterministic robustness against bounded errors. To implement this scheme efficiently, we adopt block-wise and batch processing. Extensive experiments demonstrate that Alkaid achieves decoding success rates of 99\% to 100\% across diverse error channels, delivers a payload of 0.2 bits per token for high embedding capacity, and maintains an encoding speed of 6.72 bits per second, significantly surpassing state-of-the-art (SOTA) methods in robustness, capacity, and efficiency.

CVNov 27, 2025

Can Protective Watermarking Safeguard the Copyright of 3D Gaussian Splatting?

Wenkai Huang, Yijia Guo, Gaolei Li et al.

3D Gaussian Splatting (3DGS) has emerged as a powerful representation for 3D scenes, widely adopted due to its exceptional efficiency and high-fidelity visual quality. Given the significant value of 3DGS assets, recent works have introduced specialized watermarking schemes to ensure copyright protection and ownership verification. However, can existing 3D Gaussian watermarking approaches genuinely guarantee robust protection of the 3D assets? In this paper, for the first time, we systematically explore and validate possible vulnerabilities of 3DGS watermarking frameworks. We demonstrate that conventional watermark removal techniques designed for 2D images do not effectively generalize to the 3DGS scenario due to the specialized rendering pipeline and unique attributes of each gaussian primitives. Motivated by this insight, we propose GSPure, the first watermark purification framework specifically for 3DGS watermarking representations. By analyzing view-dependent rendering contributions and exploiting geometrically accurate feature clustering, GSPure precisely isolates and effectively removes watermark-related Gaussian primitives while preserving scene integrity. Extensive experiments demonstrate that our GSPure achieves the best watermark purification performance, reducing watermark PSNR by up to 16.34dB while minimizing degradation to original scene fidelity with less than 1dB PSNR loss. Moreover, it consistently outperforms existing methods in both effectiveness and generalization.

LGAug 12, 2025

Fre-CW: Targeted Attack on Time Series Forecasting using Frequency Domain Loss

Naifu Feng, Lixing Chen, Junhua Tang et al.

Transformer-based models have made significant progress in time series forecasting. However, a key limitation of deep learning models is their susceptibility to adversarial attacks, which has not been studied enough in the context of time series prediction. In contrast to areas such as computer vision, where adversarial robustness has been extensively studied, frequency domain features of time series data play an important role in the prediction task but have not been sufficiently explored in terms of adversarial attacks. This paper proposes a time series prediction attack algorithm based on frequency domain loss. Specifically, we adapt an attack method originally designed for classification tasks to the prediction field and optimize the adversarial samples using both time-domain and frequency-domain losses. To the best of our knowledge, there is no relevant research on using frequency information for time-series adversarial attacks. Our experimental results show that these current time series prediction models are vulnerable to adversarial attacks, and our approach achieves excellent performance on major time series forecasting datasets.

CLAug 9, 2025

Model-Agnostic Sentiment Distribution Stability Analysis for Robust LLM-Generated Texts Detection

Siyuan Li, Xi Lin, Guangyan Li et al.

The rapid advancement of large language models (LLMs) has resulted in increasingly sophisticated AI-generated content, posing significant challenges in distinguishing LLM-generated text from human-written language. Existing detection methods, primarily based on lexical heuristics or fine-tuned classifiers, often suffer from limited generalizability and are vulnerable to paraphrasing, adversarial perturbations, and cross-domain shifts. In this work, we propose SentiDetect, a model-agnostic framework for detecting LLM-generated text by analyzing the divergence in sentiment distribution stability. Our method is motivated by the empirical observation that LLM outputs tend to exhibit emotionally consistent patterns, whereas human-written texts display greater emotional variability. To capture this phenomenon, we define two complementary metrics: sentiment distribution consistency and sentiment distribution preservation, which quantify stability under sentiment-altering and semantic-preserving transformations. We evaluate SentiDetect on five diverse datasets and a range of advanced LLMs,including Gemini-1.5-Pro, Claude-3, GPT-4-0613, and LLaMa-3.3. Experimental results demonstrate its superiority over state-of-the-art baselines, with over 16% and 11% F1 score improvements on Gemini-1.5-Pro and GPT-4-0613, respectively. Moreover, SentiDetect also shows greater robustness to paraphrasing, adversarial attacks, and text length variations, outperforming existing detectors in challenging scenarios.

LGMar 29, 2025

AuditVotes: A Framework Towards More Deployable Certified Robustness for Graph Neural Networks

Yuni Lai, Yulin Zhu, Yixuan Sun et al.

Despite advancements in Graph Neural Networks (GNNs), adaptive attacks continue to challenge their robustness. Certified robustness based on randomized smoothing has emerged as a promising solution, offering provable guarantees that a model's predictions remain stable under adversarial perturbations within a specified range. However, existing methods face a critical trade-off between accuracy and robustness, as achieving stronger robustness requires introducing greater noise into the input graph. This excessive randomization degrades data quality and disrupts prediction consistency, limiting the practical deployment of certifiably robust GNNs in real-world scenarios where both accuracy and robustness are essential. To address this challenge, we propose \textbf{AuditVotes}, the first framework to achieve both high clean accuracy and certifiably robust accuracy for GNNs. It integrates randomized smoothing with two key components, \underline{au}gmentation and con\underline{dit}ional smoothing, aiming to improve data quality and prediction consistency. The augmentation, acting as a pre-processing step, de-noises the randomized graph, significantly improving data quality and clean accuracy. The conditional smoothing, serving as a post-processing step, employs a filtering function to selectively count votes, thereby filtering low-quality predictions and improving voting consistency. Extensive experimental results demonstrate that AuditVotes significantly enhances clean accuracy, certified robustness, and empirical robustness while maintaining high computational efficiency. Notably, compared to baseline randomized smoothing, AuditVotes improves clean accuracy by $437.1\%$ and certified accuracy by $409.3\%$ when the attacker can arbitrarily insert $20$ edges on the Cora-ML datasets, representing a substantial step toward deploying certifiably robust GNNs in real-world applications.

NIJun 22, 2024

OpticGAI: Generative AI-aided Deep Reinforcement Learning for Optical Networks Optimization

Siyuan Li, Xi Lin, Yaju Liu et al.

Deep Reinforcement Learning (DRL) is regarded as a promising tool for optical network optimization. However, the flexibility and efficiency of current DRL-based solutions for optical network optimization require further improvement. Currently, generative models have showcased their significant performance advantages across various domains. In this paper, we introduce OpticGAI, the AI-generated policy design paradigm for optical networks. In detail, it is implemented as a novel DRL framework that utilizes generative models to learn the optimal policy network. Furthermore, we assess the performance of OpticGAI on two NP-hard optical network problems, Routing and Wavelength Assignment (RWA) and dynamic Routing, Modulation, and Spectrum Allocation (RMSA), to show the feasibility of the AI-generated policy paradigm. Simulation results have shown that OpticGAI achieves the highest reward and the lowest blocking rate of both RWA and RMSA problems. OpticGAI poses a promising direction for future research on generative AI-enhanced flexible optical network optimization.

LGJun 15, 2024

Graph Neural Backdoor: Fundamentals, Methodologies, Applications, and Future Directions

Xiao Yang, Gaolei Li, Jianhua Li

Graph Neural Networks (GNNs) have significantly advanced various downstream graph-relevant tasks, encompassing recommender systems, molecular structure prediction, social media analysis, etc. Despite the boosts of GNN, recent research has empirically demonstrated its potential vulnerability to backdoor attacks, wherein adversaries employ triggers to poison input samples, inducing GNN to adversary-premeditated malicious outputs. This is typically due to the controlled training process, or the deployment of untrusted models, such as delegating model training to third-party service, leveraging external training sets, and employing pre-trained models from online sources. Although there's an ongoing increase in research on GNN backdoors, comprehensive investigation into this field is lacking. To bridge this gap, we propose the first survey dedicated to GNN backdoors. We begin by outlining the fundamental definition of GNN, followed by the detailed summarization and categorization of current GNN backdoor attacks and defenses based on their technical characteristics and application scenarios. Subsequently, the analysis of the applicability and use cases of GNN backdoors is undertaken. Finally, the exploration of potential research directions of GNN backdoors is presented. This survey aims to explore the principles of graph backdoors, provide insights to defenders, and promote future security research.

CRMay 9, 2024

Trustworthy AI-Generative Content for Intelligent Network Service: Robustness, Security, and Fairness

Siyuan Li, Xi Lin, Yaju Liu et al.

AI-generated content (AIGC) models, represented by large language models (LLM), have revolutionized content creation. High-speed next-generation communication technology is an ideal platform for providing powerful AIGC network services. At the same time, advanced AIGC techniques can also make future network services more intelligent, especially various online content generation services. However, the significant untrustworthiness concerns of current AIGC models, such as robustness, security, and fairness, greatly affect the credibility of intelligent network services, especially in ensuring secure AIGC services. This paper proposes TrustGAIN, a trustworthy AIGC framework that incorporates robust, secure, and fair network services. We first discuss the robustness to adversarial attacks faced by AIGC models in network systems and the corresponding protection issues. Subsequently, we emphasize the importance of avoiding unsafe and illegal services and ensuring the fairness of the AIGC network services. Then as a case study, we propose a novel sentiment analysis-based detection method to guide the robust detection of unsafe content in network services. We conduct our experiments on fake news, malicious code, and unsafe review datasets to represent LLM application scenarios. Our results indicate that TrustGAIN is an exploration of future networks that can support trustworthy AIGC network services.

LGJan 19, 2024

Adversarial Robustness of Link Sign Prediction in Signed Graphs

Jialong Zhou, Xing Ai, Yuni Lai et al.

Signed graphs serve as fundamental data structures for representing positive and negative relationships in social networks, with signed graph neural networks (SGNNs) emerging as the primary tool for their analysis. Our investigation reveals that balance theory, while essential for modeling signed relationships in SGNNs, inadvertently introduces exploitable vulnerabilities to black-box attacks. To showcase this, we propose balance-attack, a novel adversarial strategy specifically designed to compromise graph balance degree, and develop an efficient heuristic algorithm to solve the associated NP-hard optimization problem. While existing approaches attempt to restore attacked graphs through balance learning techniques, they face a critical challenge we term "Irreversibility of Balance-related Information," as restored edges fail to align with original attack targets. To address this limitation, we introduce Balance Augmented-Signed Graph Contrastive Learning (BA-SGCL), an innovative framework that combines contrastive learning with balance augmentation techniques to achieve robust graph representations. By maintaining high balance degree in the latent space, BA-SGCL not only effectively circumvents the irreversibility challenge but also significantly enhances model resilience. Extensive experiments across multiple SGNN architectures and real-world datasets demonstrate both the effectiveness of our proposed balance-attack and the superior robustness of BA-SGCL, advancing the security and reliability of signed graph analysis in social networks. Datasets and codes of the proposed framework are at the github repository https://anonymous.4open.science/r/BA-SGCL-submit-DF41/.

LGJan 18, 2024

Towards Robust Graph Structural Learning Beyond Homophily via Preserving Neighbor Similarity

Yulin Zhu, Yuni Lai, Xing Ai et al.

Despite the tremendous success of graph-based learning systems in handling structural data, it has been widely investigated that they are fragile to adversarial attacks on homophilic graph data, where adversaries maliciously modify the semantic and topology information of the raw graph data to degrade the predictive performances. Motivated by this, a series of robust models are crafted to enhance the adversarial robustness of graph-based learning systems on homophilic graphs. However, the security of graph-based learning systems on heterophilic graphs remains a mystery to us. To bridge this gap, in this paper, we start to explore the vulnerability of graph-based learning systems regardless of the homophily degree, and theoretically prove that the update of the negative classification loss is negatively correlated with the pairwise similarities based on the powered aggregated neighbor features. The theoretical finding inspires us to craft a novel robust graph structural learning strategy that serves as a useful graph mining module in a robust model that incorporates a dual-kNN graph constructions pipeline to supervise the neighbor-similarity-preserved propagation, where the graph convolutional layer adaptively smooths or discriminates the features of node pairs according to their affluent local structures. In this way, the proposed methods can mine the ``better" topology of the raw graph data under diverse graph homophily and achieve more reliable data management on homophilic and heterophilic graphs.

CVAug 9, 2021

Video Annotation for Visual Tracking via Selection and Refinement

Kenan Dai, Jie Zhao, Lijun Wang et al.

Deep learning based visual trackers entail offline pre-training on large volumes of video datasets with accurate bounding box annotations that are labor-expensive to achieve. We present a new framework to facilitate bounding box annotations for video sequences, which investigates a selection-and-refinement strategy to automatically improve the preliminary annotations generated by tracking algorithms. A temporal assessment network (T-Assess Net) is proposed which is able to capture the temporal coherence of target locations and select reliable tracking results by measuring their quality. Meanwhile, a visual-geometry refinement network (VG-Refine Net) is also designed to further enhance the selected tracking results by considering both target appearance and temporal geometry constraints, allowing inaccurate tracking results to be corrected. The combination of the above two networks provides a principled approach to ensure the quality of automatic video annotation. Experiments on large scale tracking benchmarks demonstrate that our method can deliver highly accurate bounding box annotations and significantly reduce human labor by 94.0%, yielding an effective means to further boost tracking performance with augmented training data.

CRJul 3, 2021

Too Expensive to Attack: Enlarge the Attack Expense through Joint Defense at the Edge

Jianhua Li, Ximeng Liu, Jiong JIn et al.

The distributed denial of service (DDoS) attack is detrimental to businesses and individuals as people are heavily relying on the Internet. Due to remarkable profits, crackers favor DDoS as cybersecurity weapons to attack a victim. Even worse, edge servers are more vulnerable. Current solutions lack adequate consideration to the expense of attackers and inter-defender collaborations. Hence, we revisit the DDoS attack and defense, clarifying the advantages and disadvantages of both parties. We further propose a joint defense framework to defeat attackers by incurring a significant increment of required bots and enlarging attack expenses. The quantitative evaluation and experimental assessment showcase that such expense can surge up to thousands of times. The skyrocket of expenses leads to heavy loss to the cracker, which prevents further attacks.

CRApr 1, 2021

Too Expensive to Attack: A Joint Defense Framework to Mitigate Distributed Attacks for the Internet of Things Grid

Jianhua Li, Ximeng Liu, Jiong Jin et al.

The distributed denial of service (DDoS) attack is detrimental to businesses and individuals as we are heavily relying on the Internet. Due to remarkable profits, crackers favor DDoS as cybersecurity weapons in attacking servers, computers, IoT devices, and even the entire Internet. Many current detection and mitigation solutions concentrate on specific technologies in combating DDoS, whereas the attacking expense and the cross-defender collaboration have not drawn enough attention. Under this circumstance, we revisit the DDoS attack and defense in terms of attacking cost and populations of both parties, proposing a joint defense framework to incur higher attacking expense in a grid of Internet service providers (ISPs), businesses, individuals, and third-party organizations (IoT Grid). Meanwhile, the defender's cost does not grow much during combats. The skyrocket of attacking expense discourages profit-driven attackers from launching further attacks effectively. The quantitative evaluation and experimental assessment reinforce the effectiveness of our framework.

SPDec 29, 2020

Leveraging AI and Intelligent Reflecting Surface for Energy-Efficient Communication in 6G IoT

Qianqian Pan, Jun Wu, Xi Zheng et al.

The ever-increasing data traffic, various delay-sensitive services, and the massive deployment of energy-limited Internet of Things (IoT) devices have brought huge challenges to the current communication networks, motivating academia and industry to move to the sixth-generation (6G) network. With the powerful capability of data transmission and processing, 6G is considered as an enabler for IoT communication with low latency and energy cost. In this paper, we propose an artificial intelligence (AI) and intelligent reflecting surface (IRS) empowered energy-efficiency communication system for 6G IoT. First, we design a smart and efficient communication architecture including the IRS-aided data transmission and the AI-driven network resource management mechanisms. Second, an energy efficiency-maximizing model under given transmission latency for 6G IoT system is formulated, which jointly optimizes the settings of all communication participants, i.e. IoT transmission power, IRS-reflection phase shift, and BS detection matrix. Third, a deep reinforcement learning (DRL) empowered network resource control and allocation scheme is proposed to solve the formulated optimization model. Based on the network and channel status, the DRL-enabled scheme facilities the energy-efficiency and low-latency communication. Finally, experimental results verified the effectiveness of our proposed communication system for 6G IoT.

CRNov 12, 2020

A Fast and Scalable Authentication Scheme in IoT for Smart Living

Jianhua Li, Jiong Jin, Lingjuan Lyu et al.

Numerous resource-limited smart objects (SOs) such as sensors and actuators have been widely deployed in smart environments, opening new attack surfaces to intruders. The severe security flaw discourages the adoption of the Internet of things in smart living. In this paper, we leverage fog computing and microservice to push certificate authority (CA) functions to the proximity of data sources. Through which, we can minimize attack surfaces and authentication latency, and result in a fast and scalable scheme in authenticating a large volume of resource-limited devices. Then, we design lightweight protocols to implement the scheme, where both a high level of security and low computation workloads on SO (no bilinear pairing requirement on the client-side) is accomplished. Evaluations demonstrate the efficiency and effectiveness of our scheme in handling authentication and registration for a large number of nodes, meanwhile protecting them against various threats to smart living. Finally, we showcase the success of computing intelligence movement towards data sources in handling complicated services.

ROMar 1, 2018

GelSlim: A High-Resolution, Compact, Robust, and Calibrated Tactile-sensing Finger

Elliott Donlon, Siyuan Dong, Melody Liu et al.

This work describes the development of a high-resolution tactile-sensing finger for robot grasping. This finger, inspired by previous GelSight sensing techniques, features an integration that is slimmer, more robust, and with more homogeneous output than previous vision-based tactile sensors. To achieve a compact integration, we redesign the optical path from illumination source to camera by combining light guides and an arrangement of mirror reflections. We parameterize the optical path with geometric design variables and describe the tradeoffs between the finger thickness, the depth of field of the camera, and the size of the tactile sensing area. The sensor sustains the wear from continuous use -- and abuse -- in grasping tasks by combining tougher materials for the compliant soft gel, a textured fabric skin, a structurally rigid body, and a calibration process that maintains homogeneous illumination and contrast of the tactile images during use. Finally, we evaluate the sensor's durability along four metrics that track the signal quality during more than 3000 grasping experiments.

ROFeb 27, 2018

Slip Detection with Combined Tactile and Visual Information

Jianhua Li, Siyuan Dong, Edward Adelson

Slip detection plays a vital role in robotic manipulation and it has long been a challenging problem in the robotic community. In this paper, we propose a new method based on deep neural network (DNN) to detect slip. The training data is acquired by a GelSight tactile sensor and a camera mounted on a gripper when we use a robot arm to grasp and lift 94 daily objects with different grasping forces and grasping positions. The DNN is trained to classify whether a slip occurred or not. To evaluate the performance of the DNN, we test 10 unseen objects in 152 grasps. A detection accuracy as high as 88.03% is achieved. It is anticipated that the accuracy can be further improved with a larger dataset. This method is beneficial for robots to make stable grasps, which can be widely applied to automatic force control, grasping strategy selection and fine manipulation.

CVJun 10, 2017

Deep Learning-Based Food Calorie Estimation Method in Dietary Assessment

Yanchao Liang, Jianhua Li

Obesity treatment requires obese patients to record all food intakes per day. Computer vision has been introduced to estimate calories from food images. In order to increase accuracy of detection and reduce the error of volume estimation in food calorie estimation, we present our calorie estimation method in this paper. To estimate calorie of food, a top view and side view is needed. Faster R-CNN is used to detect the food and calibration object. GrabCut algorithm is used to get each food's contour. Then the volume is estimated with the food and corresponding object. Finally we estimate each food's calorie. And the experiment results show our estimation method is effective.

CVMay 22, 2017

Computer vision-based food calorie estimation: dataset, method, and experiment

Yanchao Liang, Jianhua Li

Computer vision has been introduced to estimate calories from food images. But current food image data sets don't contain volume and mass records of foods, which leads to an incomplete calorie estimation. In this paper, we present a novel food image data set with volume and mass records of foods, and a deep learning method for food detection, to make a complete calorie estimation. Our data set includes 2978 images, and every image contains corresponding each food's annotation, volume and mass records, as well as a certain calibration reference. To estimate calorie of food in the proposed data set, a deep learning method using Faster R-CNN first is put forward to detect the food. And the experiment results show our method is effective to estimate calories and our data set contains adequate information for calorie estimation. Our data set is the first released food image data set which can be used to evaluate computer vision-based calorie estimation methods.