Zhongliang Yang

h-index26

23papers

1,002citations

Novelty50%

AI Score52

Ranked #14,523 of 194,257 authors (top 7%)#252 in CR (top 4%)

23 Papers

1.6CLOct 7, 2022Code

PCAE: A Framework of Plug-in Conditional Auto-Encoder for Controllable Text Generation

Haoqin Tu, Zhongliang Yang, Jinshuai Yang et al.

Controllable text generation has taken a gigantic step forward these days. Yet existing methods are either constrained in a one-off pattern or not efficient enough for receiving multiple conditions at every generation stage. We propose a model-agnostic framework Plug-in Conditional Auto-Encoder for Controllable Text Generation (PCAE) towards flexible and semi-supervised text generation. Our framework is "plug-and-play" with partial parameters to be fine-tuned in the pre-trained model (less than a half). Crucial to the success of PCAE is the proposed broadcasting label fusion network for navigating the global latent code to a specified local and confined space. Visualization of the local latent prior well confirms the primary devotion in hidden space of the proposed model. Moreover, extensive experiments across five related generation tasks (from 2 conditions up to 10 conditions) on both RNN- based and pre-trained BART [26] based auto-encoders reveal the high capability of PCAE, which enables generation that is highly manipulable, syntactically diverse and time-saving with minimum labeled samples. We will release our code at https://github.com/ImKeTT/pcae.

10.3IRMay 31, 2025Code

FinBERT2: A Specialized Bidirectional Encoder for Bridging the Gap in Finance-Specific Deployment of Large Language Models

Xuan Xu, Fufang Wen, Beilin Chu et al.

In natural language processing (NLP), the focus has shifted from encoder-only tiny language models like BERT to decoder-only large language models(LLMs) such as GPT-3. However, LLMs' practical application in the financial sector has revealed three limitations: (1) LLMs often perform worse than fine-tuned BERT on discriminative tasks despite costing much higher computational resources, such as market sentiment analysis in financial reports; (2) Application on generative tasks heavily relies on retrieval augmented generation (RAG) methods to provide current and specialized information, with general retrievers showing suboptimal performance on domain-specific retrieval tasks; (3) There are additional inadequacies in other feature-based scenarios, such as topic modeling. We introduce FinBERT2, a specialized bidirectional encoder pretrained on a high-quality, financial-specific corpus of 32b tokens. This represents the largest known Chinese financial pretraining corpus for models of this parameter size. As a better backbone, FinBERT2 can bridge the gap in the financial-specific deployment of LLMs through the following achievements: (1) Discriminative fine-tuned models (Fin-Labelers) outperform other (Fin)BERT variants by 0.4%-3.3% and leading LLMs by 9.7%-12.3% on average across five financial classification tasks. (2) Contrastive fine-tuned models (Fin-Retrievers) outperform both open-source (e.g., +6.8\% avg improvement over BGE-base-zh) and proprietary (e.g., +4.2\% avg improvement over OpenAI's text-embedding-3-large) embedders across five financial retrieval tasks; (3) Building on FinBERT2 variants, we construct the Fin-TopicModel, which enables superior clustering and topic representation for financial titles. Our work revisits financial BERT models through comparative analysis with contemporary LLMs and offers practical insights for effectively utilizing FinBERT in the LLMs era.

16.7CRMar 17, 2025Code

Privacy-Aware RAG: Secure and Isolated Knowledge Retrieval

Pengcheng Zhou, Yinglun Feng, Zhongliang Yang

The widespread adoption of Retrieval-Augmented Generation (RAG) systems in real-world applications has heightened concerns about the confidentiality and integrity of their proprietary knowledge bases. These knowledge bases, which play a critical role in enhancing the generative capabilities of Large Language Models (LLMs), are increasingly vulnerable to breaches that could compromise sensitive information. To address these challenges, this paper proposes an advanced encryption methodology designed to protect RAG systems from unauthorized access and data leakage. Our approach encrypts both textual content and its corresponding embeddings prior to storage, ensuring that all data remains securely encrypted. This mechanism restricts access to authorized entities with the appropriate decryption keys, thereby significantly reducing the risk of unintended data exposure. Furthermore, we demonstrate that our encryption strategy preserves the performance and functionality of RAG pipelines, ensuring compatibility across diverse domains and applications. To validate the robustness of our method, we provide comprehensive security proofs that highlight its resilience against potential threats and vulnerabilities. These proofs also reveal limitations in existing approaches, which often lack robustness, adaptability, or reliance on open-source models. Our findings suggest that integrating advanced encryption techniques into the design and deployment of RAG systems can effectively enhance privacy safeguards. This research contributes to the ongoing discourse on improving security measures for AI-driven services and advocates for stricter data protection standards within RAG architectures.

6.8CRNov 13, 2019Code

IStego100K: Large-scale Image Steganalysis Dataset

Zhongliang Yang, Ke Wang, Sai Ma et al.

In order to promote the rapid development of image steganalysis technology, in this paper, we construct and release a multivariable large-scale image steganalysis dataset called IStego100K. It contains 208,104 images with the same size of 1024*1024. Among them, 200,000 images (100,000 cover-stego image pairs) are divided as the training set and the remaining 8,104 as testing set. In addition, we hope that IStego100K can help researchers further explore the development of universal image steganalysis algorithms, so we try to reduce limits on the images in IStego100K. For each image in IStego100K, the quality factors is randomly set in the range of 75-95, the steganographic algorithm is randomly selected from three well-known steganographic algorithms, which are J-uniward, nsF5 and UERD, and the embedding rate is also randomly set to be a value of 0.1-0.4. In addition, considering the possible mismatch between training samples and test samples in real environment, we add a test set (DS-Test) whose source of samples are different from the training set. We hope that this test set can help to evaluate the robustness of steganalysis algorithms. We tested the performance of some latest steganalysis algorithms on IStego100K, with specific results and analysis details in the experimental part. We hope that the IStego100K dataset will further promote the development of universal image steganalysis technology. The description of IStego100K and instructions for use can be found at https://github.com/YangzlTHU/IStego100K

4.0NEJun 26, 2019Code

Water Preservation in Soan River Basin using Deep Learning Techniques

Sadaqat ur Rehman, Zhongliang Yang, Muhammad Shahid et al.

Water supplies are crucial for the development of living beings. However, change in the hydrological process i.e. climate and land usage are the key issues. Sustaining water level and accurate estimating for dynamic conditions is a critical job for hydrologists, but predicting hydrological extremes is an open issue. In this paper, we proposed two deep learning techniques and three machine learning algorithms to predict stream flow, given the present climate conditions. The results showed that the Recurrent Neural Network (RNN) or Long Short-term Memory (LSTM), an artificial neural network based method, outperform other conventional and machine-learning algorithms for predicting stream flow. Furthermore, we analyzed that stream flow is directly affected by precipitation, land usage, and temperature. These indexes are critical, which can be used by hydrologists to identify the potential for stream flow. We make the dataset publicly available (https://github.com/sadaqat007/Dataset) so that others should be able to replicate and build upon the results published.

4.2CRNov 20, 2024Code

Efficient Streaming Voice Steganalysis in Challenging Detection Scenarios

Pengcheng Zhou, Zhengyang Fang, Zhongliang Yang et al.

In recent years, there has been an increasing number of information hiding techniques based on network streaming media, focusing on how to covertly and efficiently embed secret information into real-time transmitted network media signals to achieve concealed communication. The misuse of these techniques can lead to significant security risks, such as the spread of malicious code, commands, and viruses. Current steganalysis methods for network voice streams face two major challenges: efficient detection under low embedding rates and short duration conditions. These challenges arise because, with low embedding rates (e.g., as low as 10%) and short transmission durations (e.g., only 0.1 second), detection models struggle to acquire sufficiently rich sample features, making effective steganalysis difficult. To address these challenges, this paper introduces a Dual-View VoIP Steganalysis Framework (DVSF). The framework first randomly obfuscates parts of the native steganographic descriptors in VoIP stream segments, making the steganographic features of hard-to-detect samples more pronounced and easier to learn. It then captures fine-grained local features related to steganography, building on the global features of VoIP. Specially constructed VoIP segment triplets further adjust the feature distances within the model. Ultimately, this method effectively address the detection difficulty in VoIP. Extensive experiments demonstrate that our method significantly improves the accuracy of streaming voice steganalysis in these challenging detection scenarios, surpassing existing state-of-the-art methods and offering superior near-real-time performance.

3.6CRNov 16, 2025

A Content-Preserving Secure Linguistic Steganography

Lingyun Xiang, Chengfu Ou, Xu He et al.

Existing linguistic steganography methods primarily rely on content transformations to conceal secret messages. However, they often cause subtle yet looking-innocent deviations between normal and stego texts, posing potential security risks in real-world applications. To address this challenge, we propose a content-preserving linguistic steganography paradigm for perfectly secure covert communication without modifying the cover text. Based on this paradigm, we introduce CLstega (\textit{C}ontent-preserving \textit{L}inguistic \textit{stega}nography), a novel method that embeds secret messages through controllable distribution transformation. CLstega first applies an augmented masking strategy to locate and mask embedding positions, where MLM(masked language model)-predicted probability distributions are easily adjustable for transformation. Subsequently, a dynamic distribution steganographic coding strategy is designed to encode secret messages by deriving target distributions from the original probability distributions. To achieve this transformation, CLstega elaborately selects target words for embedding positions as labels to construct a masked sentence dataset, which is used to fine-tune the original MLM, producing a target MLM capable of directly extracting secret messages from the cover text. This approach ensures perfect security of secret messages while fully preserving the integrity of the original cover text. Experimental results show that CLstega can achieve a 100\% extraction success rate, and outperforms existing methods in security, effectively balancing embedding capacity and security.

2.7CLOct 3, 2025

Topic Modeling as Long-Form Generation: Can Long-Context LLMs revolutionize NTM via Zero-Shot Prompting?

Xuan Xu, Haolun Li, Zhongliang Yang et al.

Traditional topic models such as neural topic models rely on inference and generation networks to learn latent topic distributions. This paper explores a new paradigm for topic modeling in the era of large language models, framing TM as a long-form generation task whose definition is updated in this paradigm. We propose a simple but practical approach to implement LLM-based topic model tasks out of the box (sample a data subset, generate topics and representative text with our prompt, text assignment with keyword match). We then investigate whether the long-form generation paradigm can beat NTMs via zero-shot prompting. We conduct a systematic comparison between NTMs and LLMs in terms of topic quality and empirically examine the claim that "a majority of NTMs are outdated."

3.6IRAug 4, 2025

FinCPRG: A Bidirectional Generation Pipeline for Hierarchical Queries and Rich Relevance in Financial Chinese Passage Retrieval

Xuan Xu, Beilin Chu, Qinhong Lin et al.

In recent years, large language models (LLMs) have demonstrated significant potential in constructing passage retrieval datasets. However, existing methods still face limitations in expressing cross-doc query needs and controlling annotation quality. To address these issues, this paper proposes a bidirectional generation pipeline, which aims to generate 3-level hierarchical queries for both intra-doc and cross-doc scenarios and mine additional relevance labels on top of direct mapping annotation. The pipeline introduces two query generation methods: bottom-up from single-doc text and top-down from multi-doc titles. The bottom-up method uses LLMs to disassemble and generate structured queries at both sentence-level and passage-level simultaneously from intra-doc passages. The top-down approach incorporates three key financial elements--industry, topic, and time--to divide report titles into clusters and prompts LLMs to generate topic-level queries from each cluster. For relevance annotation, our pipeline not only relies on direct mapping annotation from the generation relationship but also implements an indirect positives mining method to enrich the relevant query-passage pairs. Using this pipeline, we constructed a Financial Passage Retrieval Generated dataset (FinCPRG) from almost 1.3k Chinese financial research reports, which includes hierarchical queries and rich relevance labels. Through evaluations of mined relevance labels, benchmarking and training experiments, we assessed the quality of FinCPRG and validated its effectiveness as a passage retrieval dataset for both training and benchmarking.

10.4CRAug 1, 2025

Provably Secure Retrieval-Augmented Generation

Pengcheng Zhou, Yinglun Feng, Zhongliang Yang

Although Retrieval-Augmented Generation (RAG) systems have been widely applied, the privacy and security risks they face, such as data leakage and data poisoning, have not been systematically addressed yet. Existing defense strategies primarily rely on heuristic filtering or enhancing retriever robustness, which suffer from limited interpretability, lack of formal security guarantees, and vulnerability to adaptive attacks. To address these challenges, this paper proposes the first provably secure framework for RAG systems(SAG). Our framework employs a pre-storage full-encryption scheme to ensure dual protection of both retrieved content and vector embeddings, guaranteeing that only authorized entities can access the data. Through formal security proofs, we rigorously verify the scheme's confidentiality and integrity under a computational security model. Extensive experiments across multiple benchmark datasets demonstrate that our framework effectively resists a range of state-of-the-art attacks. This work establishes a theoretical foundation and practical paradigm for verifiably secure RAG systems, advancing AI-powered services toward formally guaranteed security.

31.8CLJun 3, 2021Code

Provably Secure Generative Linguistic Steganography

Siyu Zhang, Zhongliang Yang, Jinshuai Yang et al.

Generative linguistic steganography mainly utilized language models and applied steganographic sampling (stegosampling) to generate high-security steganographic text (stegotext). However, previous methods generally lead to statistical differences between the conditional probability distributions of stegotext and natural text, which brings about security risks. In this paper, to further ensure security, we present a novel provably secure generative linguistic steganographic method ADG, which recursively embeds secret information by Adaptive Dynamic Grouping of tokens according to their probability given by an off-the-shelf language model. We not only prove the security of ADG mathematically, but also conduct extensive experiments on three public corpora to further verify its imperceptibility. The experimental results reveal that the proposed method is able to generate stegotext with nearly perfect security.

0.5CLJun 2, 2020

Graph-Stega: Semantic Controllable Steganographic Text Generation Guided by Knowledge Graph

Zhongliang Yang, Baitao Gong, Yamin Li et al.

Most of the existing text generative steganographic methods are based on coding the conditional probability distribution of each word during the generation process, and then selecting specific words according to the secret information, so as to achieve information hiding. Such methods have their limitations which may bring potential security risks. Firstly, with the increase of embedding rate, these models will choose words with lower conditional probability, which will reduce the quality of the generated steganographic texts; secondly, they can not control the semantic expression of the final generated steganographic text. This paper proposes a new text generative steganography method which is quietly different from the existing models. We use a Knowledge Graph (KG) to guide the generation of steganographic sentences. On the one hand, we hide the secret information by coding the path in the knowledge graph, but not the conditional probability of each generated word; on the other hand, we can control the semantic expression of the generated steganographic text to a certain extent. The experimental results show that the proposed model can guarantee both the quality of the generated text and its semantic expression, which is a supplement and improvement to the current text generation steganography.

4.3MMDec 30, 2019

Text Steganalysis with Attentional LSTM-CNN

YongJian Bao, Hao Yang, Zhongliang Yang et al.

With the rapid development of Natural Language Processing (NLP) technologies, text steganography methods have been significantly innovated recently, which poses a great threat to cybersecurity. In this paper, we propose a novel attentional LSTM-CNN model to tackle the text steganalysis problem. The proposed method firstly maps words into semantic space for better exploitation of the semantic feature in texts and then utilizes a combination of Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) recurrent neural networks to capture both local and long-distance contextual information in steganography texts. In addition, we apply attention mechanism to recognize and attend to important clues within suspicious sentences. After merge feature clues from Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), we use a softmax layer to categorize the input text as cover or stego. Experiments showed that our model can achieve the state-of-art result in the text steganalysis task.

2.6CVNov 19, 2019

A Promotion Method for Generation Error Based Video Anomaly Detection

Zhiguo Wang, Zhongliang Yang, Yu-Jin Zhang

Surveillance video anomaly detection is to detect events that rarely or never happened in a certain scene. The generation error (GE)-based methods exhibit excellent performance on this task. They firstly train a generative neural network (GNN) to generate normal samples, then judge the samples with large GEs as anomalies. Almost all the GE-based methods utilize frame-level GEs to detect anomalies. However, anomalies generally occur in local areas, the frame-level GE introduces GEs of normal areas to anomaly discriminations, that brings two problems: i) The GE of normal areas reduces the anomaly saliency of the anomalous frame. ii) Different videos have different normal-GE-levels, thus it is hard to set a uniform threshold for all videos to detect anomalies. To address these problems, we propose a promotion method: utilize the maximum of block-level GEs on the frame to detect anomaly. Firstly, we calculate the block-level GEs at each position on the frame. Then, we utilize the maximum of the block-level GEs on the frame to detect anomalies. Based on the existed GNN models, experiments are carried out on multiple datasets. The results demonstrate the effectiveness of the proposed method and achieve state-of-the-art performance.

3.3MMNov 2, 2019

FCEM: A Novel Fast Correlation Extract Model For Real Time Steganalysis of VoIP Stream via Multi-head Attention

Hao Yang, ZhongLiang Yang, YongJian Bao et al.

Extracting correlation features between codes-words with high computational efficiency is crucial to steganalysis of Voice over IP (VoIP) streams. In this paper, we utilized attention mechanisms, which have recently attracted enormous interests due to their highly parallelizable computation and flexibility in modeling correlation in sequence, to tackle steganalysis problem of Quantization Index Modulation (QIM) based steganography in compressed VoIP stream. We design a light-weight neural network named Fast Correlation Extract Model (FCEM) only based on a variant of attention called multi-head attention to extract correlation features from VoIP frames. Despite its simple form, FCEM outperforms complicated Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs) models on both prediction accuracy and time efficiency. It significantly improves the best result in detecting both low embedded rates and short samples recently. Besides, the proposed model accelerates the detection speed as twice as before when the sample length is as short as 0.1s, making it a excellent method for online services.

9.7CROct 22, 2019

Behavioral Security in Covert Communication Systems

Zhongliang Yang, Yuting Hu, Yongfeng Huang et al.

The purpose of the covert communication system is to implement the communication process without causing third party perception. In order to achieve complete covert communication, two aspects of security issues need to be considered. The first one is to cover up the existence of information, that is, to ensure the content security of information; the second one is to cover up the behavior of transmitting information, that is, to ensure the behavioral security of communication. However, most of the existing information hiding models are based on the "Prisoners' Model", which only considers the content security of carriers, while ignoring the behavioral security of the sender and receiver. We think that this is incomplete for the security of covert communication. In this paper, we propose a new covert communication framework, which considers both content security and behavioral security in the process of information transmission. In the experimental part, we analyzed a large amount of collected real Twitter data to illustrate the security risks that may be brought to covert communication if we only consider content security and neglect behavioral security. Finally, we designed a toy experiment, pointing out that in addition to most of the existing content steganography, under the proposed new framework of covert communication, we can also use user's behavior to implement behavioral steganography. We hope this new proposed framework will help researchers to design better covert communication systems.

8.6MMOct 10, 2019

Hierarchical Representation Network for Steganalysis of QIM Steganography in Low-Bit-Rate Speech Signals

Hao Yang, Zhongliang Yang, YongJian Bao et al.

With the Volume of Voice over IP (VoIP) traffic rises shapely, more and more VoIP-based steganography methods have emerged in recent years, which poses a great threat to the security of cyberspace. Low bit-rate speech codecs are widely used in the VoIP application due to its powerful compression capability. QIM steganography makes it possible to hide secret information in VoIP streams. Previous research mostly focus on capturing the inter-frame correlation or inner-frame correlation features in code-words but ignore the hierarchical structure which exists in speech frame. In this paper, motivated by the complex multi-scale structure, we design a Hierarchical Representation Network to tackle the steganalysis of QIM steganography in low-bit-rate speech signal. In the proposed model, Convolution Neural Network (CNN) is used to model the hierarchical structure in the speech frame, and three level of attention mechanisms are applied at different convolution block, enabling it to attend differentially to more and less important content in speech frame. Experiments demonstrated that the steganalysis performance of the proposed method can outperforms the state-of-the-art methods especially in detecting both short and low embeded speech samples. Moreover, our model needs less computation and has higher time efficiency to be applied to real online services.

7.3MMFeb 4, 2019

Real-Time Steganalysis for Stream Media Based on Multi-channel Convolutional Sliding Windows

Zhongliang Yang, Hao Yang, Yuting Hu et al.

Previous VoIP steganalysis methods face great challenges in detecting speech signals at low embedding rates, and they are also generally difficult to perform real-time detection, making them hard to truly maintain cyberspace security. To solve these two challenges, in this paper, combined with the sliding window detection algorithm and Convolution Neural Network we propose a real-time VoIP steganalysis method which based on multi-channel convolution sliding windows. In order to analyze the correlations between frames and different neighborhood frames in a VoIP signal, we define multi channel sliding detection windows. Within each sliding window, we design two feature extraction channels which contain multiple convolution layers with multiple convolution kernels each layer to extract correlation features of the input signal. Then based on these extracted features, we use a forward fully connected network for feature fusion. Finally, by analyzing the statistical distribution of these features, the discriminator will determine whether the input speech signal contains covert information or not.We designed several experiments to test the proposed model's detection ability under various conditions, including different embedding rates, different speech length, etc. Experimental results showed that the proposed model outperforms all the previous methods, especially in the case of low embedding rate, which showed state-of-the-art performance. In addition, we also tested the detection efficiency of the proposed model, and the results showed that it can achieve almost real-time detection of VoIP speech signals.

14.7CRNov 12, 2018

Automatically Generate Steganographic Text Based on Markov Model and Huffman Coding

Zhongliang Yang, Shuyu Jin, Yongfeng Huang et al.

Steganography, as one of the three basic information security systems, has long played an important role in safeguarding the privacy and confidentiality of data in cyberspace. The text is the most widely used information carrier in people's daily life, using text as a carrier for information hiding has broad research prospects. However, due to the high coding degree and less information redundancy in the text, it has been an extremely challenging problem to hide information in it for a long time. In this paper, we propose a steganography method which can automatically generate steganographic text based on the Markov chain model and Huffman coding. It can automatically generate fluent text carrier in terms of secret information which need to be embedded. The proposed model can learn from a large number of samples written by people and obtain a good estimate of the statistical language model. We evaluated the proposed model from several perspectives. Experimental results show that the performance of the proposed model is superior to all the previous related methods in terms of information imperceptibility and information hidden capacity.

9.6CROct 18, 2018

TS-CNN: Text Steganalysis from Semantic Space Based on Convolutional Neural Network

Zhongliang Yang, Nan Wei, Junyi Sheng et al.

Steganalysis has been an important research topic in cybersecurity that helps to identify covert attacks in public network. With the rapid development of natural language processing technology in the past two years, coverless steganography has been greatly developed. Previous text steganalysis methods have shown unsatisfactory results on this new steganography technique and remain an unsolved challenge. Different from all previous text steganalysis methods, in this paper, we propose a text steganalysis method(TS-CNN) based on semantic analysis, which uses convolutional neural network(CNN) to extract high-level semantic features of texts, and finds the subtle distribution differences in the semantic space before and after embedding the secret information. To train and test the proposed model, we collected and released a large text steganalysis(CT-Steg) dataset, which contains a total number of 216,000 texts with various lengths and various embedding rates. Experimental results show that the proposed model can achieve nearly 100\% precision and recall, outperforms all the previous methods. Furthermore, the proposed model can even estimate the capacity of the hidden information inside. These results strongly support that using the subtle changes in the semantic space before and after embedding the secret information to conduct text steganalysis is feasible and effective.

8.5CRSep 10, 2018Code

AAG-Stega: Automatic Audio Generation-based Steganography

Zhongliang Yang, Xingjian Du, Yilin Tan et al.

Steganography, as one of the three basic information security systems, has long played an important role in safeguarding the privacy and confidentiality of data in cyberspace. Audio is one of the most common means of information transmission in our daily life. Thus it's of great practical significance to using audio as a carrier of information hiding. At present, almost all audio-based information hiding methods are based on carrier modification mode. However, this mode is equivalent to adding noise to the original signal, resulting in a difference in the statistical feature distribution of the carrier before and after steganography, which impairs the concealment of the entire system. In this paper, we propose an automatic audio generation-based steganography(AAG-Stega), which can automatically generate high-quality audio covers on the basis of the secret bits stream that needs to be embedded. In the automatic audio generation process, we reasonably encode the conditional probability distribution space of each sampling point and select the corresponding signal output according to the bitstream to realize the secret information embedding. We designed several experiments to test the proposed model from the perspectives of information imperceptibility and information hidden capacity. The experimental results show that the proposed model can guarantee high hidden capacity and concealment at the same time.

2.3CRSep 9, 2018

A novel method of speech information hiding based on 3D-Magic Matrix

Zhongliang Yang, Xueshun Peng, Yongfeng Huang et al.

Redundant information of low-bit-rate speech is extremely small, thus it's very difficult to implement large capacity steganography on the low-bit-rate speech. Based on multiple vector quantization characteristics of the Line Spectrum Pair (LSP) of the speech codec, this paper proposes a steganography scheme using a 3D-Magic matrix to enlarge capacity and improve quality of speech. A cyclically moving algorithm to construct a 3D-Magic matrix for steganography is proposed in this paper, as well as an embedding and an extracting algorithm of steganography based on the 3D-Magic matrix in low-bit-rate speech codec. Theoretical analysis is provided to demonstrate that the concealment and the hidden capacity are greatly improved with the proposed scheme. Experimental results show the hidden capacity is raised to 200bps in ITU-T G.723.1 codec. Moreover, the quality of steganography speech in Perceptual Evaluation of Speech Quality (PESQ) reduces no more than 4%, indicating a little impact on the quality of speech. In addition, the proposed hidden scheme could prevent being detected by some steganalysis tools effectively.

5.6CVJun 8, 2017

Image Captioning with Object Detection and Localization

Zhongliang Yang, Yu-Jin Zhang, Sadaqat ur Rehman et al.

Automatically generating a natural language description of an image is a task close to the heart of image understanding. In this paper, we present a multi-model neural network method closely related to the human visual system that automatically learns to describe the content of images. Our model consists of two sub-models: an object detection and localization model, which extract the information of objects and their spatial relationship in images respectively; Besides, a deep recurrent neural network (RNN) based on long short-term memory (LSTM) units with attention mechanism for sentences generation. Each word of the description will be automatically aligned to different objects of the input image when it is generated. This is similar to the attention mechanism of the human visual system. Experimental results on the COCO dataset showcase the merit of the proposed method, which outperforms previous benchmark models.