Zhongjie Wang

CV
h-index84
24papers
142citations
Novelty51%
AI Score55

24 Papers

LGJan 20, 2023Code
Who Should I Engage with At What Time? A Missing Event Aware Temporal Graph Neural Network

Mingyi Liu, Zhiying Tu, Xiaofei Xu et al.

Temporal graph neural network has recently received significant attention due to its wide application scenarios, such as bioinformatics, knowledge graphs, and social networks. There are some temporal graph neural networks that achieve remarkable results. However, these works focus on future event prediction and are performed under the assumption that all historical events are observable. In real-world applications, events are not always observable, and estimating event time is as important as predicting future events. In this paper, we propose MTGN, a missing event-aware temporal graph neural network, which uniformly models evolving graph structure and timing of events to support predicting what will happen in the future and when it will happen.MTGN models the dynamic of both observed and missing events as two coupled temporal point processes, thereby incorporating the effects of missing events into the network. Experimental results on several real-world temporal graphs demonstrate that MTGN significantly outperforms existing methods with up to 89% and 112% more accurate time and link prediction. Code can be found on https://github.com/HIT-ICES/TNNLS-MTGN.

CVApr 20, 2023
Scene Style Text Editing

Tonghua Su, Fuxiang Yang, Xiang Zhou et al.

In this work, we propose a task called "Scene Style Text Editing (SSTE)", changing the text content as well as the text style of the source image while keeping the original text scene. Existing methods neglect to fine-grained adjust the style of the foreground text, such as its rotation angle, color, and font type. To tackle this task, we propose a quadruple framework named "QuadNet" to embed and adjust foreground text styles in the latent feature space. Specifically, QuadNet consists of four parts, namely background inpainting, style encoder, content encoder, and fusion generator. The background inpainting erases the source text content and recovers the appropriate background with a highly authentic texture. The style encoder extracts the style embedding of the foreground text. The content encoder provides target text representations in the latent feature space to implement the content edits. The fusion generator combines the information yielded from the mentioned parts and generates the rendered text images. Practically, our method is capable of performing promisingly on real-world datasets with merely string-level annotation. To the best of our knowledge, our work is the first to finely manipulate the foreground text content and style by deeply semantic editing in the latent feature space. Extensive experiments demonstrate that QuadNet has the ability to generate photo-realistic foreground text and avoid source text shadows in real-world scenes when editing text content.

AINov 3, 2025
OmniFuser: Adaptive Multimodal Fusion for Service-Oriented Predictive Maintenance

Ziqi Wang, Hailiang Zhao, Yuhao Yang et al.

Accurate and timely prediction of tool conditions is critical for intelligent manufacturing systems, where unplanned tool failures can lead to quality degradation and production downtime. In modern industrial environments, predictive maintenance is increasingly implemented as an intelligent service that integrates sensing, analysis, and decision support across production processes. To meet the demand for reliable and service-oriented operation, we present OmniFuser, a multimodal learning framework for predictive maintenance of milling tools that leverages both visual and sensor data. It performs parallel feature extraction from high-resolution tool images and cutting-force signals, capturing complementary spatiotemporal patterns across modalities. To effectively integrate heterogeneous features, OmniFuser employs a contamination-free cross-modal fusion mechanism that disentangles shared and modality-specific components, allowing for efficient cross-modal interaction. Furthermore, a recursive refinement pathway functions as an anchor mechanism, consistently retaining residual information to stabilize fusion dynamics. The learned representations can be encapsulated as reusable maintenance service modules, supporting both tool-state classification (e.g., Sharp, Used, Dulled) and multi-step force signal forecasting. Experiments on real-world milling datasets demonstrate that OmniFuser consistently outperforms state-of-the-art baselines, providing a dependable foundation for building intelligent industrial maintenance services.

DCApr 20, 2023
A Survey on Deep Neural Network Partition over Cloud, Edge and End Devices

Di Xu, Xiang He, Tonghua Su et al.

Deep neural network (DNN) partition is a research problem that involves splitting a DNN into multiple parts and offloading them to specific locations. Because of the recent advancement in multi-access edge computing and edge intelligence, DNN partition has been considered as a powerful tool for improving DNN inference performance when the computing resources of edge and end devices are limited and the remote transmission of data from these devices to clouds is costly. This paper provides a comprehensive survey on the recent advances and challenges in DNN partition approaches over the cloud, edge, and end devices based on a detailed literature collection. We review how DNN partition works in various application scenarios, and provide a unified mathematical model of the DNN partition problem. We developed a five-dimensional classification framework for DNN partition approaches, consisting of deployment locations, partition granularity, partition constraints, optimization objectives, and optimization algorithms. Each existing DNN partition approache can be perfectly defined in this framework by instantiating each dimension into specific values. In addition, we suggest a set of metrics for comparing and evaluating the DNN partition approaches. Based on this, we identify and discuss research challenges that have not yet been investigated or fully addressed. We hope that this work helps DNN partition researchers by highlighting significant future research directions in this domain.

CVDec 31, 2025Code
HaineiFRDM: Explore Diffusion to Restore Defects in Fast-Movement Films

Rongji Xun, Junjie Yuan, Zhongjie Wang

Existing open-source film restoration methods show limited performance compared to commercial methods due to training with low-quality synthetic data and employing noisy optical flows. In addition, high-resolution films have not been explored by the open-source methods.We propose HaineiFRDM(Film Restoration Diffusion Model), a film restoration framework, to explore diffusion model's powerful content-understanding ability to help human expert better restore indistinguishable film defects.Specifically, we employ a patch-wise training and testing strategy to make restoring high-resolution films on one 24GB-VRAMR GPU possible and design a position-aware Global Prompt and Frame Fusion Modules.Also, we introduce a global-local frequency module to reconstruct consistent textures among different patches. Besides, we firstly restore a low-resolution result and use it as global residual to mitigate blocky artifacts caused by patching process.Furthermore, we construct a film restoration dataset that contains restored real-degraded films and realistic synthetic data.Comprehensive experimental results conclusively demonstrate the superiority of our model in defect restoration ability over existing open-source methods. Code and the dataset will be released.

CLJan 7, 2023
A Personalized Utterance Style (PUS) based Dialogue Strategy for Efficient Service Requirement Elicitation

Demin Yu, Min Liu, Zhongjie Wang

With the flourish of services on the Internet, a prerequisite for service providers to precisely deliver services to their customers is to capture user requirements comprehensively, accurately, and efficiently. This is called the ``Service Requirement Elicitation (SRE)'' task. Considering the amount of customers is huge, it is an inefficient way for service providers to interact with each user by face-to-face dialog. Therefore, to elicit user requirements with the assistance of virtual intelligent assistants has become a mainstream way. Since user requirements generally consist of different levels of details and need to be satisfied by services from multiple domains, there is a huge potential requirement space for SRE to explore to elicit complete requirements. Considering that traditional dialogue system with static slots cannot be directly applied to the SRE task, it is a challenge to design an efficient dialogue strategy to guide users to express their complete and accurate requirements in such a huge potential requirement space. Based on the phenomenon that users tend to express requirements subjectively in a sequential manner, we propose a Personalized Utterance Style (PUS) module to perceive the personalized requirement expression habits, and then apply PUS to an dialogue strategy to efficiently complete the SRE task. Specifically, the dialogue strategy chooses suitable response actions for dynamically updating the dialogue state. With the assistance of PUS extracted from dialogue history, the system can shrink the search scope of potential requirement space. Experiment results show that the dialogue strategy with PUS can elicit more accurate user requirements with fewer dialogue rounds.

CVDec 3, 2025
Global-Local Aware Scene Text Editing

Fuxiang Yang, Tonghua Su, Donglin Di et al.

Scene Text Editing (STE) involves replacing text in a scene image with new target text while preserving both the original text style and background texture. Existing methods suffer from two major challenges: inconsistency and length-insensitivity. They often fail to maintain coherence between the edited local patch and the surrounding area, and they struggle to handle significant differences in text length before and after editing. To tackle these challenges, we propose an end-to-end framework called Global-Local Aware Scene Text Editing (GLASTE), which simultaneously incorporates high-level global contextual information along with delicate local features. Specifically, we design a global-local combination structure, joint global and local losses, and enhance text image features to ensure consistency in text style within local patches while maintaining harmony between local and global areas. Additionally, we express the text style as a vector independent of the image size, which can be transferred to target text images of various sizes. We use an affine fusion to fill target text images into the editing patch while maintaining their aspect ratio unchanged. Extensive experiments on real-world datasets validate that our GLASTE model outperforms previous methods in both quantitative metrics and qualitative results and effectively mitigates the two challenges.

LGDec 14, 2025Code
Plug-and-Play Parameter-Efficient Tuning of Embeddings for Federated Recommendation

Haochen Yuan, Yang Zhang, Xiang He et al.

With the rise of cloud-edge collaboration, recommendation services are increasingly trained in distributed environments. Federated Recommendation (FR) enables such multi-end collaborative training while preserving privacy by sharing model parameters instead of raw data. However, the large number of parameters, primarily due to the massive item embeddings, significantly hampers communication efficiency. While existing studies mainly focus on improving the efficiency of FR models, they largely overlook the issue of embedding parameter overhead. To address this gap, we propose a FR training framework with Parameter-Efficient Fine-Tuning (PEFT) based embedding designed to reduce the volume of embedding parameters that need to be transmitted. Our approach offers a lightweight, plugin-style solution that can be seamlessly integrated into existing FR methods. In addition to incorporating common PEFT techniques such as LoRA and Hash-based encoding, we explore the use of Residual Quantized Variational Autoencoders (RQ-VAE) as a novel PEFT strategy within our framework. Extensive experiments across various FR model backbones and datasets demonstrate that our framework significantly reduces communication overhead while improving accuracy. The source code is available at https://github.com/young1010/FedPEFT.

CLMay 25, 2025Code
Self-Critique Guided Iterative Reasoning for Multi-hop Question Answering

Zheng Chu, Huiming Fan, Jingchang Chen et al.

Although large language models (LLMs) have demonstrated remarkable reasoning capabilities, they still face challenges in knowledge-intensive multi-hop reasoning. Recent work explores iterative retrieval to address complex problems. However, the lack of intermediate guidance often results in inaccurate retrieval and flawed intermediate reasoning, leading to incorrect reasoning. To address these, we propose Self-Critique Guided Iterative Reasoning (SiGIR), which uses self-critique feedback to guide the iterative reasoning process. Specifically, through end-to-end training, we enable the model to iteratively address complex problems via question decomposition. Additionally, the model is able to self-evaluate its intermediate reasoning steps. During iterative reasoning, the model engages in branching exploration and employs self-evaluation to guide the selection of promising reasoning trajectories. Extensive experiments on three multi-hop reasoning datasets demonstrate the effectiveness of our proposed method, surpassing the previous SOTA by $8.6\%$. Furthermore, our thorough analysis offers insights for future research. Our code, data, and models are available at Github: https://github.com/zchuz/SiGIR-MHQA.

CLJan 8, 2020Code
LTP: A New Active Learning Strategy for CRF-Based Named Entity Recognition

Mingyi Liu, Zhiying Tu, Tong Zhang et al.

In recent years, deep learning has achieved great success in many natural language processing tasks including named entity recognition. The shortcoming is that a large amount of manually-annotated data is usually required. Previous studies have demonstrated that active learning could elaborately reduce the cost of data annotation, but there is still plenty of room for improvement. In real applications we found existing uncertainty-based active learning strategies have two shortcomings. Firstly, these strategies prefer to choose long sequence explicitly or implicitly, which increase the annotation burden of annotators. Secondly, some strategies need to invade the model and modify to generate some additional information for sample selection, which will increase the workload of the developer and increase the training/prediction time of the model. In this paper, we first examine traditional active learning strategies in a specific case of BiLstm-CRF that has widely used in named entity recognition on several typical datasets. Then we propose an uncertainty-based active learning strategy called Lowest Token Probability (LTP) which combines the input and output of CRF to select informative instance. LTP is simple and powerful strategy that does not favor long sequences and does not need to invade the model. We test LTP on multiple datasets, and the experiments show that LTP performs slightly better than traditional strategies with obviously less annotation tokens on both sentence-level accuracy and entity-level F1-score. Related code have been release on https://github.com/HIT-ICES/AL-NER

CLMar 14
Memory-Driven Role-Playing: Evaluation and Enhancement of Persona Knowledge Utilization in LLMs

Kai Wang, Haoyang You, Yang Zhang et al.

A core challenge for faithful LLM role-playing is sustaining consistent characterization throughout long, open-ended dialogues, as models frequently fail to recall and accurately apply their designated persona knowledge without explicit cues. To tackle this, we propose the Memory-Driven Role-Playing paradigm. Inspired by Stanislavski's "emotional memory" acting theory, this paradigm frames persona knowledge as the LLM's internal memory store, requiring retrieval and application based solely on dialogue context, thereby providing a rigorous test of depth and autonomous use of knowledge. Centered on this paradigm, we contribute: (1) MREval, a fine-grained evaluation framework assessing four memory-driven abilities - Anchoring, Recalling, Bounding, and Enacting; (2) MRPrompt, a prompting architecture that guides structured memory retrieval and response generation; and (3) MRBench, a bilingual (Chinese/English) benchmark for fine-grained diagnosis. The novel paradigm provides a comprehensive diagnostic for four-staged role-playing abilities across 12 LLMs. Crucially, experiments show that MRPrompt allows small models (e.g., Qwen3-8B) to match the performance of much larger closed-source LLMs (e.g., Qwen3-Max and GLM-4.7), and confirms that upstream memory gains directly enhance downstream response quality, validating the staged theoretical foundation.

CLMar 12, 2024
LaERC-S: Improving LLM-based Emotion Recognition in Conversation with Speaker Characteristics

Yumeng Fu, Junjie Wu, Zhongjie Wang et al.

Emotion recognition in conversation (ERC), the task of discerning human emotions for each utterance within a conversation, has garnered significant attention in human-computer interaction systems. Previous ERC studies focus on speaker-specific information that predominantly stems from relationships among utterances, which lacks sufficient information around conversations. Recent research in ERC has sought to exploit pre-trained large language models (LLMs) with speaker modelling to comprehend emotional states. Although these methods have achieved encouraging results, the extracted speaker-specific information struggles to indicate emotional dynamics. In this paper, motivated by the fact that speaker characteristics play a crucial role and LLMs have rich world knowledge, we present LaERC-S, a novel framework that stimulates LLMs to explore speaker characteristics involving the mental state and behavior of interlocutors, for accurate emotion predictions. To endow LLMs with this knowledge information, we adopt the two-stage learning to make the models reason speaker characteristics and track the emotion of the speaker in complex conversation scenarios. Extensive experiments on three benchmark datasets demonstrate the superiority of LaERC-S, reaching the new state-of-the-art.

CVNov 19, 2025
Multimodal Continual Instruction Tuning with Dynamic Gradient Guidance

Songze Li, Mingyu Gao, Tonghua Su et al.

Multimodal continual instruction tuning enables multimodal large language models to sequentially adapt to new tasks while building upon previously acquired knowledge. However, this continual learning paradigm faces the significant challenge of catastrophic forgetting, where learning new tasks leads to performance degradation on previous ones. In this paper, we introduce a novel insight into catastrophic forgetting by conceptualizing it as a problem of missing gradients from old tasks during new task learning. Our approach approximates these missing gradients by leveraging the geometric properties of the parameter space, specifically using the directional vector between current parameters and previously optimal parameters as gradient guidance. This approximated gradient can be further integrated with real gradients from a limited replay buffer and regulated by a Bernoulli sampling strategy that dynamically balances model stability and plasticity. Extensive experiments on multimodal continual instruction tuning datasets demonstrate that our method achieves state-of-the-art performance without model expansion, effectively mitigating catastrophic forgetting while maintaining a compact architecture.

CVApr 14, 2025
Balancing Stability and Plasticity in Pretrained Detector: A Dual-Path Framework for Incremental Object Detection

Songze Li, Qixing Xu, Tonghua Su et al.

The balance between stability and plasticity remains a fundamental challenge in pretrained model-based incremental object detection (PTMIOD). While existing PTMIOD methods demonstrate strong performance on in-domain tasks aligned with pretraining data, their plasticity to cross-domain scenarios remains underexplored. Through systematic component-wise analysis of pretrained detectors, we reveal a fundamental discrepancy: the localization modules demonstrate inherent cross-domain stability-preserving precise bounding box estimation across distribution shifts-while the classification components require enhanced plasticity to mitigate discriminability degradation in cross-domain scenarios. Motivated by these findings, we propose a dual-path framework built upon pretrained DETR-based detectors which decouples localization stability and classification plasticity: the localization path maintains stability to preserve pretrained localization knowledge, while the classification path facilitates plasticity via parameter-efficient fine-tuning and resists forgetting with pseudo-feature replay. Extensive evaluations on both in-domain (MS COCO and PASCAL VOC) and cross-domain (TT100K) benchmarks show state-of-the-art performance, demonstrating our method's ability to effectively balance stability and plasticity in PTMIOD, achieving robust cross-domain adaptation and strong retention of anti-forgetting capabilities.

CVApr 9, 2025
DUKAE: DUal-level Knowledge Accumulation and Ensemble for Pre-Trained Model-Based Continual Learning

Songze Li, Tonghua Su, Xu-Yao Zhang et al.

Pre-trained model-based continual learning (PTMCL) has garnered growing attention, as it enables more rapid acquisition of new knowledge by leveraging the extensive foundational understanding inherent in pre-trained model (PTM). Most existing PTMCL methods use Parameter-Efficient Fine-Tuning (PEFT) to learn new knowledge while consolidating existing memory. However, they often face some challenges. A major challenge lies in the misalignment of classification heads, as the classification head of each task is trained within a distinct feature space, leading to inconsistent decision boundaries across tasks and, consequently, increased forgetting. Another critical limitation stems from the restricted feature-level knowledge accumulation, with feature learning typically restricted to the initial task only, which constrains the model's representation capabilities. To address these issues, we propose a method named DUal-level Knowledge Accumulation and Ensemble (DUKAE) that leverages both feature-level and decision-level knowledge accumulation by aligning classification heads into a unified feature space through Gaussian distribution sampling and introducing an adaptive expertise ensemble to fuse knowledge across feature subspaces. Extensive experiments on CIFAR-100, ImageNet-R, CUB-200, and Cars-196 datasets demonstrate the superior performance of our approach.

CLMar 31, 2025
BeMERC: Behavior-Aware MLLM-based Framework for Multimodal Emotion Recognition in Conversation

Yumeng Fu, Junjie Wu, Zhongjie Wang et al.

Multimodal emotion recognition in conversation (MERC), the task of identifying the emotion label for each utterance in a conversation, is vital for developing empathetic machines. Current MLLM-based MERC studies focus mainly on capturing the speaker's textual or vocal characteristics, but ignore the significance of video-derived behavior information. Different from text and audio inputs, learning videos with rich facial expression, body language and posture, provides emotion trigger signals to the models for more accurate emotion predictions. In this paper, we propose a novel behavior-aware MLLM-based framework (BeMERC) to incorporate speaker's behaviors, including subtle facial micro-expression, body language and posture, into a vanilla MLLM-based MERC model, thereby facilitating the modeling of emotional dynamics during a conversation. Furthermore, BeMERC adopts a two-stage instruction tuning strategy to extend the model to the conversations scenario for end-to-end training of a MERC predictor. Experiments demonstrate that BeMERC achieves superior performance than the state-of-the-art methods on two benchmark datasets, and also provides a detailed discussion on the significance of video-derived behavior information in MERC.

CVJan 24, 2025
GUSLO: General and Unified Structured Light Optimization

Tinglei Wan, Tonghua Su, Zhongjie Wang

Structured light (SL) 3D reconstruction captures the precise surface shape of objects, providing high-accuracy 3D data essential for industrial inspection and cultural heritage digitization. However, existing methods suffer from two key limitations: reliance on scene-specific calibration with manual parameter tuning, and optimization frameworks tailored to specific SL patterns, limiting their generalizability across varied scenarios. We propose General and Unified Structured Light Optimization (GUSLO), a novel framework addressing these issues through two coordinated innovations: (1) single-shot calibration via 2D triangulation-based interpolation that converts sparse matches into dense correspondence fields, and (2) artifact-aware photometric adaptation via explicit transfer functions, balancing generalization and color fidelity. We conduct diverse experiments covering binary, speckle, and color-coded settings. Results show that GUSLO consistently improves accuracy and cross-encoding robustness over conventional methods in challenging industrial and cultural scenarios.

SEAug 21, 2021
Data Correction and Evolution Analysis of the ProgrammableWeb Service Ecosystem

Mingyi Liu, Zhiying Tu, Yeqi Zhu et al.

The evolution analysis on Web service ecosystems has become a critical problem as the frequency of service changes on the Internet increases rapidly. Developers need to understand these evolution patterns to assist in their decision-making on service selection. ProgrammableWeb is a popular Web service ecosystem on which several evolution analyses have been conducted in the literature. However, the existing studies have ignored the quality issues of the ProgrammableWeb dataset and the issue of service obsolescence. In this study, we first report the quality issues identified in the ProgrammableWeb dataset from our empirical study. Then, we propose a novel method to correct the relevant evolution analysis data by estimating the life cycle of application programming interfaces (APIs) and mashups. We also reveal how to use three different dynamic network models in the service ecosystem evolution analysis based on the corrected ProgrammableWeb dataset. Our experimental experience iterates the quality issues of the original ProgrammableWeb and highlights several research opportunities.

AIAug 7, 2021
DySR: A Dynamic Representation Learning and Aligning based Model for Service Bundle Recommendation

Mingyi Liu, Zhiying Tu, Xiaofei Xu et al.

An increasing number and diversity of services are available, which result in significant challenges to effective reuse service during requirement satisfaction. There have been many service bundle recommendation studies and achieved remarkable results. However, there is still plenty of room for improvement in the performance of these methods. The fundamental problem with these studies is that they ignore the evolution of services over time and the representation gap between services and requirements. In this paper, we propose a dynamic representation learning and aligning based model called DySR to tackle these issues. DySR eliminates the representation gap between services and requirements by learning a transformation function and obtains service representations in an evolving social environment through dynamic graph representation learning. Extensive experiments conducted on a real-world dataset from ProgrammableWeb show that DySR outperforms existing state-of-the-art methods in commonly used evaluation metrics, improving $F1@5$ from $36.1\%$ to $69.3\%$.

LGJun 3, 2021
Learning Representation over Dynamic Graph using Aggregation-Diffusion Mechanism

Mingyi Liu, Zhiying Tu, Xiaofei Xu et al.

Representation learning on graphs that evolve has recently received significant attention due to its wide application scenarios, such as bioinformatics, knowledge graphs, and social networks. The propagation of information in graphs is important in learning dynamic graph representations, and most of the existing methods achieve this by aggregation. However, relying only on aggregation to propagate information in dynamic graphs can result in delays in information propagation and thus affect the performance of the method. To alleviate this problem, we propose an aggregation-diffusion (AD) mechanism that actively propagates information to its neighbor by diffusion after the node updates its embedding through the aggregation mechanism. In experiments on two real-world datasets in the dynamic link prediction task, the AD mechanism outperforms the baseline models that only use aggregation to propagate information. We further conduct extensive experiments to discuss the influence of different factors in the AD mechanism.

CRNov 3, 2020
You Do (Not) Belong Here: Detecting DPI Evasion Attacks with Context Learning

Shitong Zhu, Shasha Li, Zhongjie Wang et al.

As Deep Packet Inspection (DPI) middleboxes become increasingly popular, a spectrum of adversarial attacks have emerged with the goal of evading such middleboxes. Many of these attacks exploit discrepancies between the middlebox network protocol implementations, and the more rigorous/complete versions implemented at end hosts. These evasion attacks largely involve subtle manipulations of packets to cause different behaviours at DPI and end hosts, to cloak malicious network traffic that is otherwise detectable. With recent automated discovery, it has become prohibitively challenging to manually curate rules for detecting these manipulations. In this work, we propose CLAP, the first fully-automated, unsupervised ML solution to accurately detect and localize DPI evasion attacks. By learning what we call the packet context, which essentially captures inter-relationships across both (1) different packets in a connection; and (2) different header fields within each packet, from benign traffic traces only, CLAP can detect and pinpoint packets that violate the benign packet contexts (which are the ones that are specially crafted for evasion purposes). Our evaluations with 73 state-of-the-art DPI evasion attacks show that CLAP achieves an Area Under the Receiver Operating Characteristic Curve (AUC-ROC) of 0.963, an Equal Error Rate (EER) of only 0.061 in detection, and an accuracy of 94.6% in localization. These results suggest that CLAP can be a promising tool for thwarting DPI evasion attacks.

SESep 4, 2020
Domain Priori Knowledge based Integrated Solution Design for Internet of Services

Hanchuan Xu, Xiao Wang, Yuxin Wang et al.

Various types of services, such as web APIs, IoT services, O2O services, and many others, have flooded on the Internet. Interconnections among these services have resulted in a new phenomenon called "Internet of Services" (IoS). By IoS,people don't need to request multiple services by themselves to fulfill their daily requirements, but it is an IoS platform that is responsible for constructing integrated solutions for them. Since user requirements (URs) are usually coarse-grained and transboundary, IoS platforms have to integrate services from multiple domains to fulfill the requirements. Considering there are too many available services in IoS, a big challenge is how to look for a tradeoff between the construction efficiency and the precision of final solutions. For this challenge, we introduce a framework and a platform for transboundary user requirement oriented solution design in IoS. The main idea is to make use of domain priori knowledge derived from the commonness and similarities among massive historical URs and among historical integrated service solutions(ISSs). Priori knowledge is classified into three types: requirement patterns (RPs), service patterns (SPs), and probabilistic matching matrix (PMM) between RPs and SPs. A UR is modeled in the form of an intention tree (ITree) along with a set of constraints on intention nodes, and then optimal RPs are selected to cover the I-Tree as much as possible. By taking advantage of the PMM, a set of SPs are filtered out and composed together to form the final ISS. Finally, the design of a platform supporting the above process is introduced.

AISep 3, 2020
User Intention Recognition and Requirement Elicitation Method for Conversational AI Services

Junrui Tian, Zhiying Tu, Zhongjie Wang et al.

In recent years, chat-bot has become a new type of intelligent terminal to guide users to consume services. However, it is criticized most that the services it provides are not what users expect or most expect. This defect mostly dues to two problems, one is that the incompleteness and uncertainty of user's requirement expression caused by the information asymmetry, the other is that the diversity of service resources leads to the difficulty of service selection. Conversational bot is a typical mesh device, so the guided multi-rounds Q$\&$A is the most effective way to elicit user requirements. Obviously, complex Q$\&$A with too many rounds is boring and always leads to bad user experience. Therefore, we aim to obtain user requirements as accurately as possible in as few rounds as possible. To achieve this, a user intention recognition method based on Knowledge Graph (KG) was developed for fuzzy requirement inference, and a requirement elicitation method based on Granular Computing was proposed for dialog policy generation. Experimental results show that these two methods can effectively reduce the number of conversation rounds, and can quickly and accurately identify the user intention.

CRJan 29, 2020
A4 : Evading Learning-based Adblockers

Shitong Zhu, Zhongjie Wang, Xun Chen et al.

Efforts by online ad publishers to circumvent traditional ad blockers towards regaining fiduciary benefits, have been demonstrably successful. As a result, there have recently emerged a set of adblockers that apply machine learning instead of manually curated rules and have been shown to be more robust in blocking ads on websites including social media sites such as Facebook. Among these, AdGraph is arguably the state-of-the-art learning-based adblocker. In this paper, we develop A4, a tool that intelligently crafts adversarial samples of ads to evade AdGraph. Unlike the popular research on adversarial samples against images or videos that are considered less- to un-restricted, the samples that A4 generates preserve application semantics of the web page, or are actionable. Through several experiments we show that A4 can bypass AdGraph about 60% of the time, which surpasses the state-of-the-art attack by a significant margin of 84.3%; in addition, changes to the visual layout of the web page due to these perturbations are imperceptible. We envision the algorithmic framework proposed in A4 is also promising in improving adversarial attacks against other learning-based web applications with similar requirements.