Xiaohong Li

CL
h-index: 12
23 papers
2,209 citations
Novelty: 43%
AI Score: 54

23 Papers

SD · May 2, 2022
A Novel Speech-Driven Lip-Sync Model with CNN and LSTM

Xiaohong Li, Xiang Wang, Kai Wang et al.

Generating synchronized and natural lip movement with speech is one of the most important tasks in creating realistic virtual characters. In this paper, we present a combined deep neural network of one-dimensional convolutions and LSTM to generate vertex displacement of a 3D template face model from variable-length speech input. The motion of the lower part of the face, which is represented by the vertex movement of 3D lip shapes, is consistent with the input speech. To enhance the robustness of the network to different sound signals, we adapt a trained speech recognition model to extract speech features, and a velocity loss term is adopted to reduce the jitter of the generated facial animation. We recorded a series of videos of a Chinese adult speaking Mandarin and created a new speech-animation dataset to compensate for the lack of such public data. Qualitative and quantitative evaluations indicate that our model is able to generate smooth and natural lip movements synchronized with speech.
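The abstract does not state the exact form of the velocity loss term. A common formulation, sketched here as an assumption rather than as the authors' definition, penalizes the mismatch between the frame-to-frame motion of the predicted and ground-truth vertex displacements. The function name and the `(frames, vertices, 3)` array layout are hypothetical.

```python
import numpy as np

def velocity_loss(pred, target):
    """Mean squared error between frame-to-frame differences.

    pred, target: arrays of shape (frames, vertices, 3) holding per-frame
    vertex displacements (this layout is an assumption for illustration).
    """
    pred_vel = np.diff(pred, axis=0)      # motion between consecutive frames
    target_vel = np.diff(target, axis=0)
    return float(np.mean((pred_vel - target_vel) ** 2))

# Toy data: a static target, a constant-offset prediction, and a jittery one.
rng = np.random.default_rng(0)
target = np.zeros((10, 4, 3))
smooth = target + 0.01
jittery = target + 0.01 * rng.choice([-1.0, 1.0], size=target.shape)
```

Both toy predictions have the same per-frame position error, but only the jittery one changes from frame to frame, so only it incurs a velocity penalty. This is why such a term is typically added alongside a position loss rather than used alone.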

CV · Oct 6, 2022
Vision-Based Defect Classification and Weight Estimation of Rice Kernels

Xiang Wang, Kai Wang, Xiaohong Li et al.

Rice is one of the main staple foods in many areas of the world. The quality estimation of rice kernels is crucial in terms of both food safety and socio-economic impact. In the past, this was usually carried out by quality inspectors, which may result in both objective and subjective inaccuracies. In this paper, we present an automatic visual quality estimation system for rice kernels that classifies the sampled kernels according to their types of flaws and evaluates their quality via the weight ratios of the respective kernel types. To compensate for the imbalance in the numbers of different kernels and to classify kernels with multiple flaws accurately, we propose a multi-stage workflow that is able to locate the kernels in the captured image and classify their properties. We define a novel metric to measure the relative weight of each kernel in the image from its area, such that the relative weight of each type of kernel with regard to all samples can be computed and used as the basis for rice quality estimation. Various experiments are carried out to show that our system is able to output precise results in a contactless way and replace tedious and error-prone manual work.

CL · Mar 6
MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens

Yu Chen, Runkai Chen, Sheng Yi et al.

Long-term memory is a cornerstone of human intelligence. Enabling AI to process lifetime-scale information remains a long-standing pursuit in the field. Due to the constraints of full-attention architectures, the effective context length of large language models (LLMs) is typically limited to 1M tokens. Existing approaches, such as hybrid linear attention, fixed-size memory states (e.g., RNNs), and external storage methods like RAG or agent systems, attempt to extend this limit. However, they often suffer from severe precision degradation and rapidly increasing latency as context length grows, an inability to dynamically modify memory content, or a lack of end-to-end optimization. These bottlenecks impede complex scenarios like large-corpus summarization, Digital Twins, and long-history agent reasoning, while limiting memory capacity and slowing inference. We present Memory Sparse Attention (MSA), an end-to-end trainable, efficient, and massively scalable memory model framework. Through core innovations including scalable sparse attention and document-wise RoPE, MSA achieves linear complexity in both training and inference while maintaining exceptional stability, exhibiting less than 9% degradation when scaling from 16K to 100M tokens. Furthermore, KV cache compression, combined with Memory Parallel, enables 100M-token inference on 2xA800 GPUs. We also propose Memory Interleaving to facilitate complex multi-hop reasoning across scattered memory segments. MSA significantly surpasses frontier LLMs, state-of-the-art RAG systems, and leading memory agents in long-context benchmarks. These results demonstrate that by decoupling memory capacity from reasoning, MSA provides a scalable foundation to endow general-purpose models with intrinsic, lifetime-scale memory.

CR · Mar 27
Privacy-Enhancing Encryption in Data Sharing: A Survey on Security, Performance and Functionality

Yongyang Lv, Xiaohong Li, Ruitao Feng et al.

The vigorous development of the Internet has spurred exponential data growth, yet data is predominantly stored in isolated user entities, hampering its full value realization. In large-scale deployment of "AI+industries" such as smart medical care, intelligent transportation and smart homes, the gap between data supply and demand continues to widen, and establishing an effective data sharing mechanism is the core of promoting high-quality industrial development. However, data sharing faces significant challenges in security, performance, and functional adaptability. Privacy-enhancing encryption technologies, including Attribute-Based Encryption (ABE), Proxy Re-encryption (PRE), and Searchable Encryption (SE), offer promising solutions with distinct advantages in enhancing security, improving flexibility, and enabling efficient sharing. Statistical analysis of relevant literature from 2020 to 2025 reveals a rising research trend in ABE, PRE and SE, focusing on their data sharing applications. Firstly, this work proposes a data sharing process framework and identifies 20 potential attacks across its stages. Secondly, this work integrates ABE, SE, PRE with 12 enhancement technologies and examines their multi-dimensional impacts on the security, performance, and functional adaptability of data sharing schemes. Lastly, this work outlines key application scenarios, challenges, and future research directions, providing valuable insights for advancing data sharing mechanisms based on privacy-enhancing encryption technologies.

CR · Oct 24, 2025
QAE-BAC: Achieving Quantifiable Anonymity and Efficiency in Blockchain-Based Access Control with Attribute

Jie Zhang, Xiaohong Li, Mengke Zhang et al.

Blockchain-based Attribute-Based Access Control (BC-ABAC) offers a decentralized paradigm for secure data governance but faces two inherent challenges: the transparency of blockchain ledgers threatens user privacy by enabling re-identification attacks through attribute analysis, while the computational complexity of policy matching clashes with blockchain's performance constraints. Existing solutions, such as those employing Zero-Knowledge Proofs (ZKPs), often incur high overhead and lack measurable anonymity guarantees, while efficiency optimizations frequently ignore privacy implications. To address these dual challenges, this paper proposes QAE-BAC (Quantifiable Anonymity and Efficiency in Blockchain-Based Access Control with Attribute). QAE-BAC introduces a formal (r, t)-anonymity model to dynamically quantify the re-identification risk of users based on their access attributes and history. Furthermore, it features an Entropy-Weighted Path Tree (EWPT) that optimizes policy structure based on real-time anonymity metrics, drastically reducing policy matching complexity. Implemented and evaluated on Hyperledger Fabric, QAE-BAC demonstrates a superior balance between privacy and performance. Experimental results show that it effectively mitigates re-identification risks and outperforms state-of-the-art baselines, achieving up to an 11x improvement in throughput and an 87% reduction in latency, proving its practicality for privacy-sensitive decentralized applications.

LG · Mar 21
Neuronal Self-Adaptation Enhances Capacity and Robustness of Representation in Spiking Neural Networks

Zhuobin Yang, Yeyao Bao, Liangfu Lv et al.

Spiking Neural Networks (SNNs) are promising for energy-efficient, real-time edge computing, yet their performance is often constrained by the limited adaptability of conventional leaky integrate-and-fire (LIF) neurons. Existing LIF models struggle with restricted information capacity and susceptibility to noise, leading to degraded accuracy and compromised robustness. Inspired by the dynamic self-regulation of biological potassium channels, we propose the Potassium-regulated LIF (KvLIF) neuron model. KvLIF introduces an auxiliary conductance state that integrates membrane potential and spiking history to adaptively modulate neuronal excitability and reset dynamics. This design extends the dynamic response range of neurons to varying input intensities and effectively suppresses noise-induced spikes. We extensively evaluate KvLIF on both static image and neuromorphic datasets, demonstrating consistent improvements in classification accuracy and superior robustness compared to existing LIF models. Our work bridges biological plausibility with computational efficiency, offering a neuron model that enhances SNN performance while maintaining suitability for low-power neuromorphic deployment.
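For context, the conventional LIF neuron that the abstract takes as its baseline can be sketched in a few lines. This is a generic discrete-time LIF with a hard reset, not the KvLIF model, whose conductance dynamics the abstract does not specify. The function name and the parameter values (`tau`, `v_threshold`, `v_reset`) are illustrative assumptions.

```python
def simulate_lif(inputs, tau=2.0, v_threshold=1.0, v_reset=0.0):
    """Simulate a single discrete-time leaky integrate-and-fire neuron.

    inputs: sequence of input currents, one per time step.
    Returns a list of 0/1 spike indicators of the same length.
    """
    v = v_reset
    spikes = []
    for current in inputs:
        # Leaky integration: the membrane potential decays toward the
        # reset level while being driven by the input current.
        v = v + (current - (v - v_reset)) / tau
        if v >= v_threshold:
            spikes.append(1)
            v = v_reset  # hard reset after a spike
        else:
            spikes.append(0)
    return spikes

print(simulate_lif([1.5, 1.5, 1.5, 1.5]))  # fires on every second step
```

In this baseline the threshold and reset are fixed, so the firing pattern depends only on the current input and the decayed potential. The abstract's proposal is to add an auxiliary state that adapts excitability and reset dynamics over time.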

CL · Jan 28
Beyond the Needle's Illusion: Decoupled Evaluation of Evidence Access and Use under Semantic Interference at 326M-Token Scale

Tianwei Lin, Zuyi Zhou, Xinda Zhao et al.

Long-context LLM agents must access the right evidence from large environments and use it faithfully. However, the popular Needle-in-a-Haystack (NIAH) evaluation mostly measures benign span localization. The needle is near-unique, and the haystack is largely irrelevant. We introduce EverMemBench-S (EMB-S), an adversarial NIAH-style benchmark built on a 326M-token MemoryBank. While the full MemoryBank spans 326M tokens for retrieval-based (RAG) evaluation, we evaluate native long-context models only at scales that fit within each model's context window (up to 1M tokens in this work) to ensure a fair comparison. EMB-S pairs queries with collision-tested near-miss hard negatives and gold evidence sets spanning one or more documents, validated via human screening and LLM verification. We also propose a decoupled diagnostic protocol that reports evidence access (document-ID localization) separately from end-to-end QA quality under full-context prompting. This enables consistent diagnosis for both native long-context prompting and retrieval pipelines. Across a reference-corpus ladder from domain-isolated 64K contexts to a globally shared 326M-token environment, we observe a clear reality gap. Systems that saturate benign NIAH degrade sharply in evidence access under semantic interference. These results indicate that semantic discrimination, not context length alone, is the dominant bottleneck for long-context memory at scale.

CR · Mar 6
SemFuzz: A Semantics-Aware Fuzzing Framework for Network Protocol Implementations

Yanbang Sun, Quan Luo, Yuelin Wang et al.

Network protocols are the foundation of modern communication, yet their implementations often contain semantic vulnerabilities stemming from inadequate understanding of specification semantics. Existing gray-box and black-box testing approaches lack semantic modeling of protocols, making it difficult to precisely express testing intent and cover boundary conditions. Moreover, they typically rely on coarse-grained oracles such as crashes, which are inadequate for identifying deep semantic vulnerabilities. To address these limitations, we present a semantics-aware fuzzing framework, SemFuzz. The framework leverages large language models to extract structured semantic rules from RFC documents and generates test cases that intentionally violate these rules to encode specific testing intents. It then detects deep semantic vulnerabilities by comparing the observed responses with the expected ones. Evaluation on seven widely deployed protocol implementations shows that SemFuzz identified sixteen potential vulnerabilities, ten of which have been confirmed. Among the confirmed vulnerabilities, five were previously unknown and four have been assigned CVEs. These results demonstrate the effectiveness of SemFuzz in detecting semantic vulnerabilities.

CL · Feb 1
EverMemBench: Benchmarking Long-Term Interactive Memory in Large Language Models

Chuanrui Hu, Tong Li, Xingze Gao et al.

Long-term conversational memory is essential for LLM-based assistants, yet existing benchmarks focus on dyadic, single-topic dialogues that fail to capture real-world complexity. We introduce EverMemBench, a benchmark featuring multi-party, multi-group conversations spanning over 1 million tokens with temporally evolving information, cross-topic interleaving, and role-specific personas. EverMemBench evaluates memory systems across three dimensions through 1,000+ QA pairs: fine-grained recall, memory awareness, and user profile understanding. Our evaluation reveals critical limitations: (1) multi-hop reasoning collapses in multi-party settings, with even oracle models achieving only 26%; (2) temporal reasoning remains unsolved, requiring version semantics beyond timestamp matching; (3) memory awareness is bottlenecked by retrieval, where current similarity-based methods fail to bridge the semantic gap between queries and implicitly relevant memories. EverMemBench provides a challenging testbed for developing next-generation memory architectures.

SE · Feb 19, 2025
Explore-Construct-Filter: An Automated Framework for Rich and Reliable API Knowledge Graph Construction

Yanbang Sun, Qing Huang, Xiaoxue Ren et al.

The API Knowledge Graph (API KG) is a structured network that models API entities and their relations, providing essential semantic insights for tasks such as API recommendation, code generation, and API misuse detection. However, constructing a knowledge-rich and reliable API KG presents several challenges. Existing schema-based methods rely heavily on manual annotations to design KG schemas, leading to excessive manual overhead. On the other hand, schema-free methods, due to the lack of schema guidance, are prone to introducing noise, reducing the KG's reliability. To address these issues, we propose the Explore-Construct-Filter framework, an automated approach for API KG construction based on large language models (LLMs). This framework consists of three key modules: 1) KG exploration: LLMs simulate the workflow of annotators to automatically design a schema with comprehensive type triples, minimizing human intervention; 2) KG construction: Guided by the schema, LLMs extract instance triples to construct a rich yet unreliable API KG; 3) KG filtering: Removing invalid type triples and suspicious instance triples to construct a rich and reliable API KG. Experimental results demonstrate that our method surpasses the state-of-the-art method, achieving a 25.2% improvement in F1 score. Moreover, the Explore-Construct-Filter framework proves effective, with the KG exploration module increasing KG richness by 133.6% and the KG filtering module improving reliability by 26.6%. Finally, cross-model experiments confirm the generalizability of our framework.

LG · Feb 27, 2022
Short-term passenger flow prediction for multi-traffic modes: A Transformer and residual network based multi-task learning method

Yongjie Yang, Jinlei Zhang, Lixing Yang et al.

With the growing prevalence of mobility as a service (MaaS), it becomes increasingly important to manage multi-traffic modes simultaneously and cooperatively. As an important component of MaaS, short-term passenger flow prediction for multi-traffic modes has thus been brought into focus. It is a challenging problem because the spatiotemporal features of multi-traffic modes are highly complex. Moreover, the passenger flows of multi-traffic modes differ and fluctuate significantly. To solve these problems, this paper proposes a multi-task learning-based model, called Res-Transformer, for short-term inflow prediction of multi-traffic modes (subway, taxi, and bus). Each traffic mode is treated as a single task in the model. The Res-Transformer consists of two parts: (1) several modified Transformer layers comprising the conv-Transformer layer and the multi-head attention mechanism, which help to extract the spatial and temporal features of multi-traffic modes; and (2) a residual network structure, which is used to capture the correlations among different traffic modes and to prevent gradient vanishing, gradient explosion, and overfitting. The Res-Transformer model is evaluated on two large-scale real-world datasets from Beijing, China. One covers the region of a traffic hub and the other the region of a residential area. Experiments are conducted to compare the performance of the proposed model with several baseline models. Results demonstrate the effectiveness and robustness of the proposed method. This paper can give critical insights into the short-term inflow prediction for multi-traffic modes.

IV · May 12, 2021
AVA: Adversarial Vignetting Attack against Visual Recognition

Binyu Tian, Felix Juefei-Xu, Qing Guo et al.

Vignetting is an inherent imaging phenomenon within almost all optical systems, showing as a radial intensity darkening toward the corners of an image. Since it is a common effect in photography and usually appears as a slight intensity variation, people usually regard it as a part of a photo and would not even want to post-process it. Due to this natural advantage, in this work, we study vignetting from a new viewpoint, i.e., adversarial vignetting attack (AVA), which aims to embed intentionally misleading information into vignetting and produce a natural adversarial example without noise patterns. This example can fool the state-of-the-art deep convolutional neural networks (CNNs) but is imperceptible to humans. To this end, we first propose the radial-isotropic adversarial vignetting attack (RI-AVA) based on the physical model of vignetting, where the physical parameters (e.g., illumination factor and focal length) are tuned through the guidance of target CNN models. To achieve higher transferability across different CNNs, we further propose radial-anisotropic adversarial vignetting attack (RA-AVA) by allowing the effective regions of vignetting to be radial-anisotropic and shape-free. Moreover, we propose the geometry-aware level-set optimization method to solve the adversarial vignetting regions and physical parameters jointly. We validate the proposed methods on three popular datasets, i.e., DEV, CIFAR10, and Tiny ImageNet, by attacking four CNNs, namely ResNet50, EfficientNet-B0, DenseNet121, and MobileNet-V2, demonstrating the advantages of our methods over baseline methods on both transferability and image quality.

CL · Apr 16, 2021
IndoNLG: Benchmark and Resources for Evaluating Indonesian Natural Language Generation

Samuel Cahyawijaya, Genta Indra Winata, Bryan Wilie et al.

Natural language generation (NLG) benchmarks provide an important avenue to measure progress and develop better NLG systems. Unfortunately, the lack of publicly available NLG benchmarks for low-resource languages poses a challenging barrier for building NLG systems that work well for languages with limited amounts of data. Here we introduce IndoNLG, the first benchmark to measure natural language generation (NLG) progress in three low-resource -- yet widely spoken -- languages of Indonesia: Indonesian, Javanese, and Sundanese. Altogether, these languages are spoken by more than 100 million native speakers, and hence constitute an important use case of NLG systems today. Concretely, IndoNLG covers six tasks: summarization, question answering, chit-chat, and three different pairs of machine translation (MT) tasks. We collate a clean pretraining corpus of Indonesian, Sundanese, and Javanese datasets, Indo4B-Plus, which is used to pretrain our models: IndoBART and IndoGPT. We show that IndoBART and IndoGPT achieve competitive performance on all tasks -- despite using only one-fifth the parameters of a larger multilingual model, mBART-LARGE (Liu et al., 2020). This finding emphasizes the importance of pretraining on closely related, local languages to achieve more efficient learning and faster inference for very low-resource languages like Javanese and Sundanese.

LG · Jan 5, 2021
Generating Informative CVE Description From ExploitDB Posts by Extractive Summarization

Jiamou Sun, Zhenchang Xing, Hao Guo et al.

ExploitDB is one of the important public websites that contribute a large number of vulnerabilities to the official CVE database. Over 60% of these vulnerabilities have high- or critical-security risks. Unfortunately, over 73% of exploits appear publicly earlier than the corresponding CVEs, and about 40% of exploits do not even have CVEs. To assist in documenting CVEs for the ExploitDB posts, we propose an open information method to extract 9 key vulnerability aspects (vulnerable product/version/component, vulnerability type, vendor, attacker type, root cause, attack vector and impact) from the verbose and noisy ExploitDB posts. The extracted aspects from an ExploitDB post are then composed into a CVE description according to the suggested CVE description templates, which is required information for requesting new CVEs. Through the evaluation on 13,017 manually labeled sentences and the statistical sampling of 3,456 extracted aspects, we confirm the high accuracy of our extraction method. Compared with 27,230 reference CVE descriptions, our composed CVE descriptions achieve high ROUGE-L (0.38), a longest-common-subsequence-based metric for evaluating text summarization methods.
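The abstract describes ROUGE-L only as a longest-common-subsequence-based metric and does not say which variant or tokenization was used. The sketch below assumes the sentence-level F1 form with lowercase whitespace tokens; the function names and the example sentences are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def rouge_l_f1(candidate, reference):
    """Sentence-level ROUGE-L F1 over lowercase whitespace tokens."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)

score = rouge_l_f1(
    "buffer overflow in foo",
    "buffer overflow in bar allows remote attackers",
)
```

Here the LCS is the three tokens "buffer overflow in", giving precision 3/4 and recall 3/7. Because the metric rewards in-order overlap rather than exact n-gram matches, a generated description can score well while omitting or rewording parts of the reference.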

CV · Jan 4, 2021
Stereo Correspondence and Reconstruction of Endoscopic Data Challenge

Max Allan, Jonathan Mcleod, Congcong Wang et al.

The stereo correspondence and reconstruction of endoscopic data sub-challenge was organized during the Endovis challenge at MICCAI 2019 in Shenzhen, China. The task was to perform dense depth estimation using 7 training datasets and 2 test sets of structured light data captured using porcine cadavers. These were provided by a team at Intuitive Surgical. Ten teams participated on the challenge day. This paper contains 3 additional methods which were submitted after the challenge finished, as well as a supplemental section from these teams on issues they found with the dataset.

IV · Sep 19, 2020
Bias Field Poses a Threat to DNN-based X-Ray Recognition

Binyu Tian, Qing Guo, Felix Juefei-Xu et al.

The chest X-ray plays a key role in the screening and diagnosis of many lung diseases, including COVID-19. More recently, many works have constructed deep neural networks (DNNs) for chest X-ray images to realize automated and efficient diagnosis of lung diseases. However, the bias field caused by improper medical image acquisition widely exists in chest X-ray images, while the robustness of DNNs to the bias field is rarely explored, which poses a threat to X-ray-based automated diagnosis systems. In this paper, we study this problem based on recent adversarial attacks and propose a brand new attack, i.e., the adversarial bias field attack, where the bias field instead of additive noise works as the adversarial perturbation for fooling the DNNs. This novel attack poses a key problem: how to locally tune the bias field to realize a high attack success rate while maintaining its spatial smoothness to guarantee high realisticity. These two goals contradict each other and thus make the attack significantly challenging. To overcome this challenge, we propose the adversarial-smooth bias field attack that can locally tune the bias field with joint smooth and adversarial constraints. As a result, the adversarial X-ray images can not only fool the DNNs effectively but also retain a very high level of realisticity. We validate our method on real chest X-ray datasets with powerful DNNs, e.g., ResNet50, DenseNet121, and MobileNet, and show different properties from the state-of-the-art attacks in both image realisticity and attack transferability. Our method reveals the potential threat to DNN-based X-ray automated diagnosis and can benefit the development of bias-field-robust automated diagnosis systems.

CL · Sep 11, 2020
IndoNLU: Benchmark and Resources for Evaluating Indonesian Natural Language Understanding

Bryan Wilie, Karissa Vincentio, Genta Indra Winata et al.

Although Indonesian is known to be the fourth most frequently used language on the internet, research progress on this language in natural language processing (NLP) is slow due to a lack of available resources. In response, we introduce the first vast resource for training, evaluating, and benchmarking on Indonesian natural language understanding (IndoNLU) tasks. IndoNLU includes twelve tasks, ranging from single sentence classification to sentence-pair sequence labeling, with different levels of complexity. The datasets for the tasks lie in different domains and styles to ensure task diversity. We also provide a set of Indonesian pre-trained models (IndoBERT) trained on a large and clean Indonesian dataset, Indo4B, collected from publicly available sources such as social media texts, blogs, news, and websites. We release baseline models for all twelve tasks, as well as the framework for benchmark evaluation, thus enabling everyone to benchmark their system performance.

SE · Aug 31, 2020
A3Ident: A Two-phased Approach to Identify the Leading Authors of Android Apps

Wei Wang, Guozhu Meng, Haoyu Wang et al.

Authorship identification is the process of identifying and classifying authors through given code. Authorship identification can be used in a wide range of software domains, e.g., code authorship disputes, plagiarism detection, and exposure of attackers' identities. Besides the inherent challenges from legacy software development, framework programming and the crowdsourcing mode in Android raise the difficulty of authorship identification significantly. More specifically, widespread third-party libraries and inherited components (e.g., classes, methods, and variables) dilute the primary code within the entire Android app and blur the boundaries of code written by different authors. However, prior research has not addressed these challenges well. To this end, we design a two-phased approach to attribute the primary code of an Android app to the specific developer. In the first phase, we put forward three types of strategies to identify the relationships between Java packages in an app, which consist of context, semantic and structural relationships. A package aggregation algorithm is developed to cluster all packages that were, with high probability, written by the same authors. In the second phase, we develop three types of features to capture authors' coding habits and code stylometry. Based on that, we generate fingerprints for an author from the Android apps they developed and employ several machine learning algorithms for authorship classification. We evaluate our approach on three datasets that contain 15,666 apps from 257 distinct developers and achieve a 92.5% accuracy rate on average. Additionally, we test it on 2,900 obfuscated apps, and our approach can classify apps with an accuracy rate of 80.4%.

SE · Aug 6, 2020
Predicting Missing Information of Key Aspects in Vulnerability Reports

Hao Guo, Zhenchang Xing, Xiaohong Li

Software vulnerabilities have been continually disclosed and documented. An important practice in documenting vulnerabilities is to describe the key vulnerability aspects, such as vulnerability type, root cause, affected product, impact, attacker type and attack vector, for the effective search and management of fast-growing vulnerabilities. We investigate 120,103 vulnerability reports in the Common Vulnerabilities and Exposures (CVE) over the past 20 years. We find that 56%, 85%, 38% and 28% of CVEs miss vulnerability type, root cause, attack vector and attacker type, respectively. To help complete the missing information of these vulnerability aspects, we propose a neural-network-based approach for predicting the missing information of a key aspect of a vulnerability based on the known aspects of the vulnerability. We explore the design space of the neural network models and empirically identify the most effective model design. Using a large-scale vulnerability dataset from CVE, we show that we can effectively train a neural-network-based classifier with less than 20% of historical CVEs. Our model achieves prediction accuracies of 94%, 79%, 89% and 70% for vulnerability type, root cause, attacker type and attack vector, respectively. Our ablation study reveals the prominent correlations among vulnerability aspects and further confirms the practicality of our approach.

LG · Sep 15, 2019
An Empirical Study towards Characterizing Deep Learning Development and Deployment across Different Frameworks and Platforms

Qianyu Guo, Sen Chen, Xiaofei Xie et al.

Deep Learning (DL) has recently achieved tremendous success. A variety of DL frameworks and platforms play a key role in catalyzing such progress. However, the differences in architecture designs and implementations of existing frameworks and platforms bring new challenges for DL software development and deployment. Until now, there has been no study on how various mainstream frameworks and platforms influence both DL software development and deployment in practice. To fill this gap, we take the first step towards understanding how the most widely used DL frameworks and platforms support DL software development and deployment. We conduct a systematic study on these frameworks and platforms by using two types of DNN architectures and three popular datasets. (1) For the development process, we investigate the prediction accuracy under the same runtime training configuration or the same model weights/biases. We also study the adversarial robustness of trained models by leveraging existing adversarial attack techniques. The experimental results show that the computing differences across frameworks could result in an obvious prediction accuracy decline, which should draw the attention of DL developers. (2) For the deployment process, we investigate the prediction accuracy and performance (i.e., time cost and memory consumption) when the trained models are migrated/quantized from PC to real mobile devices and web browsers. The DL platform study unveils that migration and quantization still suffer from compatibility and reliability issues. Meanwhile, we find several DL software bugs by using the results as a benchmark. We further validate the results through bug confirmation from stakeholders and positive industrial feedback to highlight the implications of our study. Through our study, we summarize practical guidelines, identify challenges and pinpoint new research directions.

LG · Nov 13, 2018
An Orchestrated Empirical Study on Deep Learning Frameworks and Platforms

Qianyu Guo, Xiaofei Xie, Lei Ma et al.

Deep learning (DL) has recently achieved tremendous success in a variety of cutting-edge applications, e.g., image recognition, speech and natural language processing, and autonomous driving. Besides the available big data and hardware evolution, DL frameworks and platforms play a key role in catalyzing the research, development, and deployment of DL intelligent solutions. However, the differences in computation paradigm, architecture design and implementation of existing DL frameworks and platforms bring challenges for DL software development, deployment, maintenance, and migration. To date, there is still no comprehensive study on how current diverse DL frameworks and platforms influence the DL software development process. In this paper, we take the first step towards investigating how existing state-of-the-art DL frameworks (i.e., TensorFlow, Theano, and Torch) and platforms (i.e., server/desktop, web, and mobile) support DL software development activities. We perform an in-depth and comparative evaluation on metrics such as learning accuracy, DL model size, robustness, and performance, on state-of-the-art DL frameworks across platforms using two popular datasets, MNIST and CIFAR-10. Our study reveals that existing DL frameworks still suffer from compatibility issues, which become even more severe when it comes to different platforms. We pinpoint the current challenges and opportunities towards developing high-quality and compatible DL systems. To ignite further investigation along this direction to address urgent industrial demands of intelligent solutions, we make all of our assembled feasible toolchain and dataset publicly available.

AI · Sep 18, 2018
SCC-rFMQ Learning in Cooperative Markov Games with Continuous Actions

Chengwei Zhang, Xiaohong Li, Jianye Hao et al.

Although many reinforcement learning methods have been proposed for learning the optimal solutions in single-agent continuous-action domains, multiagent coordination domains with continuous actions have received relatively little investigation. In this paper, we propose an independent learner hierarchical method, named Sample Continuous Coordination with recursive Frequency Maximum Q-Value (SCC-rFMQ), which divides the cooperative problem with continuous actions into two layers. The first layer samples a finite set of actions from the continuous action spaces by a re-sampling mechanism with variable exploratory rates, and the second layer evaluates the actions in the sampled action set and updates the policy using a reinforcement learning cooperative method. By constructing cooperative mechanisms at both levels, SCC-rFMQ can handle cooperative problems in continuous action cooperative Markov games effectively. The effectiveness of SCC-rFMQ is experimentally demonstrated on two well-designed games, i.e., a continuous version of the climbing game and a cooperative version of the boat problem. Experimental results show that SCC-rFMQ outperforms other reinforcement learning algorithms.

AI · Mar 8, 2018
SA-IGA: A Multiagent Reinforcement Learning Method Towards Socially Optimal Outcomes

Chengwei Zhang, Xiaohong Li, Jianye Hao et al.

In multiagent environments, the capability of learning is important for an agent to behave appropriately in the face of unknown opponents and dynamic environments. From the system designer's perspective, it is desirable if the agents can learn to coordinate towards socially optimal outcomes, while also avoiding being exploited by selfish opponents. To this end, we propose a novel gradient-ascent-based algorithm (SA-IGA) which augments the basic gradient-ascent algorithm by incorporating social awareness into the policy update process. We theoretically analyze the learning dynamics of SA-IGA using dynamical system theory, and SA-IGA is shown to have linear dynamics for a wide range of games including symmetric games. The learning dynamics of two representative games (the prisoner's dilemma game and the coordination game) are analyzed in detail. Based on the idea of SA-IGA, we further propose a practical multiagent learning algorithm, called SA-PGA, based on the Q-learning update rule. Simulation results show that an SA-PGA agent can achieve higher social welfare than the previous social-optimality-oriented Conditional Joint Action Learner (CJAL) and is also robust against individually rational opponents by reaching Nash equilibrium solutions.