Tao Wan

10papers

287citations

Novelty48%

AI Score45

Ranked #65,453 of 201,326 authors (top 33%)#12,630 in CL (top 39%)

10 Papers

64.6CLJun 3

PersonaTree: Structured Lifecycle Memory for Person Understanding in LLM Agents

Yubo Hou, Jingwei Song, Hongbo Zhang et al.

Persistent LLM agents require memory representations that make the formation of person understanding explicit across long term interaction. Existing agent memory methods emphasize information retention and retrieval, yet give limited account of how accumulated interaction evidence is abstracted into person understanding. We view this process as schema formation, where situated evidence is abstracted into reusable patterns and stable person level claims. We introduce PersonaTree, a structured lifecycle memory framework that realizes this view as a three level persona tree with explicit support paths from evidence to claims. PersonaTree maintains the tree through conservative writing, confidence guided consolidation, and query conditioned path retrieval, returning only the evidence depth required by each query. Across six person understanding and persistent memory benchmarks with three answer backbones, PersonaTree ranks first in 12 of 18 compact scores and reaches the top two in 16 settings. Ablations show that hierarchy improves abstract person understanding on KnowMe, while support path retrieval improves RealPref alignment under a comparable context budget.

CVApr 10, 2022

Reasoning with Multi-Structure Commonsense Knowledge in Visual Dialog

Shunyu Zhang, Xiaoze Jiang, Zequn Yang et al.

Visual Dialog requires an agent to engage in a conversation with humans grounded in an image. Many studies on Visual Dialog focus on the understanding of the dialog history or the content of an image, while a considerable amount of commonsense-required questions are ignored. Handling these scenarios depends on logical reasoning that requires commonsense priors. How to capture relevant commonsense knowledge complementary to the history and the image remains a key challenge. In this paper, we propose a novel model by Reasoning with Multi-structure Commonsense Knowledge (RMK). In our model, the external knowledge is represented with sentence-level facts and graph-level facts, to properly suit the scenario of the composite of dialog history and image. On top of these multi-structure representations, our model can capture relevant knowledge and incorporate them into the vision and semantic features, via graph-based interaction and transformer-based fusion. Experimental results and analysis on VisDial v1.0 and VisDialCK datasets show that our proposed model effectively outperforms comparative methods.

CLJan 9

FlashMem: Distilling Intrinsic Latent Memory via Computation Reuse

Yubo Hou, Zhisheng Chen, Tao Wan et al.

The stateless architecture of Large Language Models inherently lacks the mechanism to preserve dynamic context, compelling agents to redundantly reprocess history to maintain long-horizon autonomy. While latent memory offers a solution, current approaches are hindered by architectural segregation, relying on auxiliary encoders that decouple memory from the reasoning backbone. We propose FlashMem, a framework that distills intrinsic memory directly from transient reasoning states via computation reuse. Leveraging the property that internal representations uniquely encode input trajectories, FlashMem identifies the last hidden state as a sufficient statistic for the interaction history. This enables a Shared-KV Consolidator to synthesize memory by attending directly to the backbone's frozen cache, eliminating redundant re-parameterization. Furthermore, a parameter-free Cognitive Monitor leverages attention entropy to adaptively trigger consolidation only when high epistemic uncertainty is detected. Experiments demonstrate that FlashMem matches the performance of heavy baselines while reducing inference latency by 5 times, effectively bridging the gap between efficiency and persistent cognition.

CRNov 20, 2018

FALCON: A Fourier Transform Based Approach for Fast and Secure Convolutional Neural Network Predictions

Shaohua Li, Kaiping Xue, Chenkai Ding et al.

Machine learning as a service has been widely deployed to utilize deep neural network models to provide prediction services. However, this raises privacy concerns since clients need to send sensitive information to servers. In this paper, we focus on the scenario where clients want to classify private images with a convolutional neural network model hosted in the server, while both parties keep their data private. We present FALCON, a fast and secure approach for CNN predictions based on Fourier Transform. Our solution enables linear layers of a CNN model to be evaluated simply and efficiently with fully homomorphic encryption. We also introduce the first efficient and privacy-preserving protocol for softmax function, which is an indispensable component in CNNs and has not yet been evaluated in previous works due to its high complexity. We implemented the FALCON and evaluated the performance on real-world CNN models. The experimental results show that FALCON outperforms the best known works in both computation and communication cost.

CVNov 1, 2018

A sequential guiding network with attention for image captioning

Daouda Sow, Zengchang Qin, Mouhamed Niasse et al.

The recent advances of deep learning in both computer vision (CV) and natural language processing (NLP) provide us a new way of understanding semantics, by which we can deal with more challenging tasks such as automatic description generation from natural images. In this challenge, the encoder-decoder framework has achieved promising performance when a convolutional neural network (CNN) is used as image encoder and a recurrent neural network (RNN) as decoder. In this paper, we introduce a sequential guiding network that guides the decoder during word generation. The new model is an extension of the encoder-decoder framework with attention that has an additional guiding long short-term memory (LSTM) and can be trained in an end-to-end manner by using image/descriptions pairs. We validate our approach by conducting extensive experiments on a benchmark dataset, i.e., MS COCO Captions. The proposed model achieves significant improvement comparing to the other state-of-the-art deep learning models.

CVNov 1, 2018

Pixel Level Data Augmentation for Semantic Image Segmentation using Generative Adversarial Networks

Shuangting Liu, Jiaqi Zhang, Yuxin Chen et al.

Semantic segmentation is one of the basic topics in computer vision, it aims to assign semantic labels to every pixel of an image. Unbalanced semantic label distribution could have a negative influence on segmentation accuracy. In this paper, we investigate using data augmentation approach to balance the semantic label distribution in order to improve segmentation performance. We propose using generative adversarial networks (GANs) to generate realistic images for improving the performance of semantic segmentation networks. Experimental results show that the proposed method can not only improve segmentation performance on those classes with low accuracy, but also obtain 1.3% to 2.1% increase in average segmentation accuracy. It shows that this augmentation method can boost accuracy and be easily applicable to any other segmentation models.

CLDec 1, 2017

Text Generation Based on Generative Adversarial Nets with Latent Variable

Heng Wang, Zengchang Qin, Tao Wan

In this paper, we propose a model using generative adversarial net (GAN) to generate realistic text. Instead of using standard GAN, we combine variational autoencoder (VAE) with generative adversarial net. The use of high-level latent random variables is helpful to learn the data distribution and solve the problem that generative adversarial net always emits the similar data. We propose the VGAN model where the generative model is composed of recurrent neural network and VAE. The discriminative model is a convolutional neural network. We train the model via policy gradient. We apply the proposed model to the task of text generation and compare it to other recent neural network based models, such as recurrent neural network language model and SeqGAN. We evaluate the performance of the model by calculating negative log-likelihood and the BLEU score. We conduct experiments on three benchmark datasets, and results show that our model outperforms other previous models.

CLMay 9, 2017

Logical Parsing from Natural Language Based on a Neural Translation Model

Liang Li, Pengyu Li, Yifan Liu et al.

Semantic parsing has emerged as a significant and powerful paradigm for natural language interface and question answering systems. Traditional methods of building a semantic parser rely on high-quality lexicons, hand-crafted grammars and linguistic features which are limited by applied domain or representation. In this paper, we propose a general approach to learn from denotations based on Seq2Seq model augmented with attention mechanism. We encode input sequence into vectors and use dynamic programming to infer candidate logical forms. We utilize the fact that similar utterances should have similar logical forms to help reduce the searching space. Under our learning policy, the Seq2Seq model can learn mappings gradually with noises. Curriculum learning is adopted to make the learning smoother. We test our method on the arithmetic domain which shows our model can successfully infer the correct logical forms and learn the word meanings, compositionality and operation orders simultaneously.

CVMay 8, 2017

Generative Cooperative Net for Image Generation and Data Augmentation

Qiangeng Xu, Zengchang Qin, Tao Wan

How to build a good model for image generation given an abstract concept is a fundamental problem in computer vision. In this paper, we explore a generative model for the task of generating unseen images with desired features. We propose the Generative Cooperative Net (GCN) for image generation. The idea is similar to generative adversarial networks except that the generators and discriminators are trained to work accordingly. Our experiments on hand-written digit generation and facial expression generation show that GCN's two cooperative counterparts (the generator and the classifier) can work together nicely and achieve promising results. We also discovered a usage of such generative model as an data-augmentation tool. Our experiment of applying this method on a recognition task shows that it is very effective comparing to other existing methods. It is easy to set up and could help generate a very large synthesized dataset.

NIMar 20, 2017

A Framework and Comparative Analysis of Control Plane Security of SDN and Conventional Networks

AbdelRahman Abdou, Paul C. van Oorschot, Tao Wan

Software defined networking implements the network control plane in an external entity, rather than in each individual device as in conventional networks. This architectural difference implies a different design for control functions necessary for essential network properties, e.g., loop prevention and link redundancy. We explore how such differences redefine the security weaknesses in the SDN control plane and provide a framework for comparative analysis which focuses on essential network properties required by typical production networks. This enables analysis of how these properties are delivered by the control planes of SDN and conventional networks, and to compare security risks and mitigations. Despite the architectural difference, we find similar, but not identical, exposures in control plane security if both network paradigms provide the same network properties and are analyzed under the same threat model. However, defenses vary; SDN cannot depend on edge based filtering to protect its control plane, while this is arguably the primary defense in conventional networks. Our concrete security analysis suggests that a distributed SDN architecture that supports fault tolerance and consistency checks is important for SDN control plane security. Our analysis methodology may be of independent interest for future security analysis of SDN and conventional networks.