Victor O. K. Li

CL
h-index: 4
43 papers
9,909 citations
Novelty: 48%
AI Score: 35

43 Papers

SY | Nov 21, 2017
Delay Aware Intelligent Transient Stability Assessment System

James J. Q. Yu, Albert Y. S. Lam, David J. Hill et al.

Transient stability assessment is a critical tool for power system design and operation. With emerging advanced synchrophasor measurement techniques, machine learning methods are playing an increasingly important role in power system stability assessment. However, most existing research assumes that the transmission delay of measurement data is negligible. In this paper, we investigate the influence of communication delay on synchrophasor-based transient stability assessment. In particular, we develop a delay aware intelligent system to address this issue. By utilizing an ensemble of multiple long short-term memory networks, the proposed system can make early assessments from incomplete system variable measurements, achieving a much shorter response time. Compared with existing work, our system makes accurate assessments with significantly improved efficiency. We perform numerous case studies to demonstrate the superiority of the proposed intelligent system, in which accurate assessments are obtained in one third less time than with state-of-the-art methodologies. Moreover, the simulations indicate that measurement noise has a trivial impact on assessment performance, demonstrating the robustness of the proposed system.

SY | Jan 8, 2015
Phase Identification in Distribution Networks with Micro-Synchrophasors

Miles H. F. Wen, Reza Arghandeh, Alexandra von Meier et al.

This paper proposes a novel phase identification method for distribution networks where phases can be severely unbalanced and insufficiently labeled. The analysis draws on data from high-precision phasor measurement units for distribution systems (micro-synchrophasors or uPMUs). A key fact is that time-series voltage phasors taken from a distribution network show specific patterns regarding the connected phases at measurement points. The algorithm is based on analyzing cross-correlations of voltage magnitudes, along with phase angle differences, between the two candidate phases to be matched. If two measurement points are on the same phase, large positive voltage magnitude correlations and small voltage angle differences should be observed. The algorithm is first validated using the IEEE 13-bus model and then with actual uPMU measurements on a 12-kV feeder.
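
A minimal Python sketch of the matching rule described above, assuming each measurement point provides time-aligned, equal-length voltage magnitude and angle series (angles in degrees). The function names, the 5-degree angle tolerance, and the choose-the-highest-correlation rule are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def phase_match_score(mag_a, ang_a, mag_b, ang_b):
    """Return (Pearson correlation of magnitudes, mean absolute angle difference in degrees)."""
    corr = float(np.corrcoef(mag_a, mag_b)[0, 1])
    # Wrap angle differences into [-180, 180) so that 179 and -179 degrees are 2 degrees apart.
    diff = (np.asarray(ang_a) - np.asarray(ang_b) + 180.0) % 360.0 - 180.0
    return corr, float(np.mean(np.abs(diff)))

def identify_phase(mag_x, ang_x, references, angle_tol=5.0):
    """Pick the reference phase with the highest magnitude correlation among those
    whose mean angle difference to the unknown point is within angle_tol degrees.

    references maps a phase label to a (magnitudes, angles) pair (hypothetical layout).
    Returns None if no reference passes the angle test.
    """
    best_label, best_corr = None, -np.inf
    for label, (mag_r, ang_r) in references.items():
        corr, dang = phase_match_score(mag_x, ang_x, mag_r, ang_r)
        if dang <= angle_tol and corr > best_corr:
            best_label, best_corr = label, corr
    return best_label
```

Requiring both a small angle difference and a high magnitude correlation mirrors the two conditions stated in the abstract.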

SY | Jan 6, 2017
Coordinated Autonomous Vehicle Parking for Vehicle-to-Grid Services: Formulation and Distributed Algorithm

Albert Y. S. Lam, James J. Q. Yu, Yunhe Hou et al.

Autonomous vehicles (AVs) will revolutionize ground transport and take a substantial role in the future transportation system. Most AVs are likely to be electric vehicles (EVs), and they can participate in the vehicle-to-grid (V2G) system to support various V2G services. Although it is generally infeasible to dictate the routes of EVs, we can design AV travel plans to fulfill certain system-wide objectives. In this paper, we focus on AVs looking for parking and study how they can be led to appropriate parking facilities to support V2G services. We formulate the Coordinated Parking Problem (CPP), which can be solved by a standard integer linear program solver but requires long computational time. To make it more practical, we develop a distributed algorithm to address CPP based on dual decomposition. We carry out a series of simulations to evaluate the proposed solution methods. Our results show that the distributed algorithm can produce nearly optimal solutions with substantially less computational time. A coarser time scale can reduce computational time but degrades solution quality, possibly resulting in infeasible solutions. Even with communication loss, the distributed algorithm still performs well and converges with only slight degradation in speed.
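
To illustrate the dual decomposition idea in a heavily simplified setting, the sketch below relaxes only the facility capacity constraints with Lagrange multipliers (prices) and updates them by projected subgradient steps, so each vehicle can choose its facility independently. The single time slot, cost matrix, capacities, step size, and function name are hypothetical; the paper's CPP formulation also covers V2G objectives and time scales that are omitted here.

```python
import numpy as np

def dual_decomposition_assign(costs, capacity, step=0.5, iters=100):
    """Assign each vehicle to one facility by relaxing capacity constraints with
    Lagrange multipliers updated by projected subgradient steps.

    costs: (vehicles, facilities) travel/parking cost matrix (hypothetical).
    capacity: per-facility capacity.
    Returns (best feasible assignment seen, its total cost), or (None, inf).
    """
    costs = np.asarray(costs, dtype=float)
    capacity = np.asarray(capacity, dtype=float)
    lam = np.zeros(costs.shape[1])
    best, best_cost = None, np.inf
    for _ in range(iters):
        # Each vehicle solves its own priced subproblem independently (the distributed part).
        choice = np.argmin(costs + lam, axis=1)
        load = np.bincount(choice, minlength=costs.shape[1])
        total = costs[np.arange(len(choice)), choice].sum()
        if np.all(load <= capacity) and total < best_cost:
            best, best_cost = choice.copy(), total
        # Raise prices at over-subscribed facilities; lower them (not below 0) elsewhere.
        lam = np.maximum(0.0, lam + step * (load - capacity))
    return best, best_cost
```

For example, with costs `[[1, 3], [1, 2], [1, 4]]` and capacities `[1, 2]`, all three vehicles first crowd facility 0; its price rises until the two vehicles with the cheapest alternatives move to facility 1.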

SY | Jan 5, 2017
Vehicular Energy Network

Albert Y. S. Lam, Ka-Cheong Leung, Victor O. K. Li

The smart grid spawns many innovative ideas, but many of them cannot be easily integrated into the existing power system due to power system constraints, such as the lack of capacity to transport renewable energy from remote areas to urban centers. An energy delivery system can be built upon the traffic network, with electric vehicles (EVs) utilized as energy carriers to transport energy over a large geographical region. A generalized architecture called the vehicular energy network (VEN) is constructed, and a mathematically tractable framework is developed. Dynamic wireless (dis)charging allows electric energy, as an energy packet, to be added to and subtracted from EV batteries seamlessly. With proper routing, energy can be transported from sources to destinations through EVs along appropriate vehicular routes. This paper gives a preliminary study of VEN. Models are developed to study its operational and economic feasibility with real traffic data from the United Kingdom. Our study shows that a substantial amount of renewable energy can be transported from some remote wind farms to London under reasonable settings, and that VEN is likely to be profitable in the near future. VEN can complement the power network and enhance its power delivery capability.

SY | Nov 21, 2017
A Unified Framework for Wide Area Measurement System Planning

James J. Q. Yu, Albert Y. S. Lam, David J. Hill et al.

A wide area measurement system (WAMS) is one of the essential components of the future power system. To make WAMS construction plans, practical models of power network observability, reliability, and the underlying communication infrastructure need to be considered. To address this challenging problem, we propose a unified framework for WAMS planning that covers most realistic concerns in the construction process. The framework jointly optimizes system construction cost, measurement reliability, and synchrophasor data traffic volume, resulting in a multi-objective optimization problem that provides multiple Pareto optimal solutions to suit the different requirements of utilities. The framework is verified on two IEEE test systems. The simulation results demonstrate the trade-off relationships among the proposed objectives. Moreover, the proposed framework can develop optimal WAMS plans for full observability with minimal cost. This work develops a comprehensive framework for most practical WAMS construction designs.
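
Selecting Pareto optimal plans from a set of candidates can be sketched as a non-dominance filter. This assumes each hypothetical plan is summarized by three objectives to be minimized: construction cost, unreliability (for example, one minus measurement reliability), and data traffic volume. It only filters given candidates and is not the paper's multi-objective optimizer.

```python
from typing import List, Sequence

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if plan a is no worse than plan b in every objective and strictly
    better in at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(plans: List[Sequence[float]]) -> List[int]:
    """Indices of the plans that no other plan dominates."""
    return [
        i for i, p in enumerate(plans)
        if not any(dominates(q, p) for j, q in enumerate(plans) if j != i)
    ]
```

With plans `(cost, unreliability, traffic)` of `(10, 0.2, 5)`, `(12, 0.1, 5)`, `(11, 0.3, 6)`, and `(10, 0.2, 4)`, the front is plans 1 and 3: plan 1 trades higher cost for better reliability, and plan 3 dominates plans 0 and 2.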

SY | Apr 25, 2017
Coordinated Electric Vehicle Charging Control with Aggregator Power Trading and Indirect Load Control

James J. Q. Yu, Junhao Lin, Albert Y. S. Lam et al.

Due to increasing concern about greenhouse gas emissions and fossil fuel security, electric vehicles (EVs) have attracted much attention in recent years. EVs can be aggregated to constitute a vehicle-to-grid system. Coordination of EVs is beneficial to the power system in many ways. In this paper, we formulate a novel large-scale EV charging problem with energy trading in order to maximize aggregator profit. This problem is non-convex and can be solved with a centralized iterative approach. To overcome the computational complexity brought by the non-convexity, we develop a distributed optimization-based heuristic. To evaluate the proposed approach, a modified IEEE 118-bus test system is employed with 10 aggregators serving 30,000 EVs. The simulation results indicate that the proposed distributed heuristic with energy trading can effectively increase the total profit of aggregators. In addition, the proposed distributed optimization-based heuristic achieves near-optimal performance.

CL | Sep 13, 2021
Show Me How To Revise: Improving Lexically Constrained Sentence Generation with XLNet

Xingwei He, Victor O. K. Li

Lexically constrained sentence generation allows the incorporation of prior knowledge, such as lexical constraints, into the output. This technique has been applied to machine translation and dialog response generation. Previous work usually used Markov Chain Monte Carlo (MCMC) sampling to generate lexically constrained sentences, but it randomly determined the position to be edited and the action to be taken, resulting in many invalid refinements. To overcome this challenge, we used a classifier to instruct the MCMC-based models where and how to refine the candidate sentences. First, we developed two methods to create synthetic data on which the pre-trained model is fine-tuned to obtain a reliable classifier. Next, we proposed a two-step approach, "Predict and Revise", for constrained sentence generation. During the predict step, we leveraged the classifier to compute the learned prior for the candidate sentence. During the revise step, we resorted to MCMC sampling to revise the candidate sentence by conducting a sampled action at a sampled position drawn from the learned prior. We compared our proposed models with many strong baselines on two tasks: generating sentences with lexical constraints, and text infilling. Experimental results demonstrate that our proposed model performs much better than previous work in terms of sentence fluency and diversity. Our code and pre-trained models are available at https://github.com/NLPCode/MCMCXLNet.

CL | Feb 6, 2025
DECT: Harnessing LLM-assisted Fine-Grained Linguistic Knowledge and Label-Switched and Label-Preserved Data Generation for Diagnosis of Alzheimer's Disease

Tingyu Mo, Jacqueline C. K. Lam, Victor O. K. Li et al.

Alzheimer's Disease (AD) is an irreversible neurodegenerative disease affecting 50 million people worldwide. Low-cost, accurate identification of key markers of AD is crucial for timely diagnosis and intervention. Language impairment is one of the earliest signs of cognitive decline and can be used to discriminate AD patients from normal control individuals. Patient-interviewer dialogues may be used to detect such impairments, but they are often mixed with ambiguous, noisy, and irrelevant information, making the AD detection task difficult. Moreover, the limited availability of AD speech samples and the variability of their speech styles pose significant challenges in developing robust speech-based AD detection models. To address these challenges, we propose DECT, a novel speech-based domain-specific approach that leverages large language models (LLMs) for fine-grained linguistic analysis and label-switched and label-preserved data generation. Our study presents four novelties: (1) we harness the summarizing capabilities of LLMs to identify and distill key cognitive-linguistic information from noisy speech transcripts, effectively filtering irrelevant information; (2) we leverage the inherent linguistic knowledge of LLMs to extract linguistic markers from unstructured and heterogeneous audio transcripts; (3) we exploit the compositional ability of LLMs to generate AD speech transcripts with diverse linguistic patterns, overcoming the speech data scarcity challenge and enhancing the robustness of AD detection models; and (4) we use the augmented AD textual speech transcript dataset and a more fine-grained representation of AD textual speech transcript data to fine-tune the AD detection model. The results show that DECT achieves superior performance, with an 11% improvement in AD detection accuracy over the baselines on the DementiaBank datasets.

LG | Mar 25, 2021
Deep-AIR: A Hybrid CNN-LSTM Framework for Air Quality Modeling in Metropolitan Cities

Yang Han, Qi Zhang, Victor O. K. Li et al.

Air pollution has long been a serious environmental health challenge, especially in metropolitan cities, where air pollutant concentrations are exacerbated by the street canyon effect and high building density. Whilst accurately monitoring and forecasting air pollution is crucial, existing data-driven models fail to fully address the complex interaction between air pollution and urban dynamics. Deep-AIR, our novel hybrid deep learning framework that combines a convolutional neural network with a long short-term memory network, aims to address this gap by providing fine-grained city-wide air pollution estimation and station-wide forecasts. The framework uses 1x1 convolution layers to strengthen the learning of cross-feature spatial interactions between air pollution and important urban dynamic features, particularly road density, building density/height, and the street canyon effect. In case studies of Hong Kong and Beijing, Deep-AIR achieves higher accuracy than our baseline models. For Hong Kong, it attains accuracies of 67.6%, 77.2%, and 66.1% in fine-grained hourly estimation, 1-hr forecasting, and 24-hr forecasting, respectively; for Beijing, the corresponding accuracies are 65.0%, 75.3%, and 63.5%. Our saliency analysis reveals that, for Hong Kong, street canyon and road density are the best estimators for NO2, while meteorology is the best estimator for PM2.5.
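
A 1x1 convolution applies the same channel-mixing matrix at every grid cell, which is why it can model interactions between co-located features (for example, pollutant and road density channels) without blending neighbouring cells. A minimal NumPy sketch, assuming a (channels, height, width) layout and hypothetical weights; it is not the Deep-AIR implementation.

```python
import numpy as np

def conv1x1(x, weight, bias):
    """1x1 convolution over a (channels, height, width) feature map.

    weight has shape (out_channels, in_channels); bias has shape (out_channels,).
    Each output pixel is weight @ x[:, i, j] + bias, i.e. a linear mix of the
    input features at that same location only.
    """
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]
```

In a deep learning framework the same layer is an ordinary convolution with kernel size 1; the einsum form just makes the per-location matrix product explicit.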

HC | Mar 24, 2021
AQEyes: Visual Analytics for Anomaly Detection and Examination of Air Quality Data

Dongyu Liu, Kalyan Veeramachaneni, Alexander Geiger et al.

Anomaly detection plays a key role in air quality analysis by enhancing situational awareness and alerting users to potential hazards. However, existing anomaly detection approaches for air quality analysis have limitations regarding parameter selection (e.g., the need for extensive domain knowledge), computational expense, general applicability (e.g., the need for labeled data), interpretability, and analysis efficiency. Furthermore, the poor quality of collected air quality data (inconsistently formatted and sometimes missing) substantially increases the difficulty of analysis. In this paper, we systematically formulate design requirements for a system that addresses these limitations and then propose AQEyes, an integrated visual analytics system for efficiently monitoring, detecting, and examining anomalies in air quality data. In particular, we propose a unified end-to-end tunable machine learning pipeline that includes several data pre-processors and featurizers to deal with data quality issues. The pipeline integrates an efficient unsupervised anomaly detection method that works without labeled data and overcomes the limitations of existing approaches. Further, we develop an interactive visualization system to visualize the pipeline's outputs. The system incorporates a set of novel visualization and interaction designs, allowing analysts to visually examine air quality dynamics and anomalous events at multiple scales and from multiple facets. We demonstrate the performance of the pipeline through a quantitative evaluation and show the effectiveness of the visualization system through qualitative case studies on real-world datasets.

CL | Oct 6, 2020
On the Sparsity of Neural Machine Translation Models

Yong Wang, Longyue Wang, Victor O. K. Li et al.

Modern neural machine translation (NMT) models employ a large number of parameters, which leads to serious over-parameterization and typically causes the underutilization of computational resources. In response to this problem, we empirically investigate whether the redundant parameters can be reused to achieve better performance. Experiments and analyses are systematically conducted on different datasets and NMT architectures. We show that: 1) the pruned parameters can be rejuvenated to improve the baseline model by up to +0.8 BLEU points; 2) the rejuvenated parameters are reallocated to enhance the ability of modeling low-level lexical information.
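
Magnitude-based pruning and a simple form of rejuvenation can be sketched as follows. The pruning criterion (smallest absolute values), the zero re-initialization of rejuvenated weights, and the function names are assumptions for illustration; the paper's exact pruning and rejuvenation schedule may differ.

```python
import numpy as np

def magnitude_prune(w, ratio):
    """Zero out the `ratio` fraction of entries with the smallest |w|.

    Returns (pruned weights, boolean mask where True marks a kept weight).
    """
    k = int(round(ratio * w.size))
    mask = np.ones(w.size, dtype=bool)
    if k > 0:
        # argsort with axis=None sorts the flattened array.
        idx = np.argsort(np.abs(w), axis=None)[:k]
        mask[idx] = False
    mask = mask.reshape(w.shape)
    return w * mask, mask

def rejuvenate(w_pruned, mask, init=0.0):
    """Re-enable pruned positions so they can be trained again.

    Pruned entries are reset to `init` (an assumed choice) and the mask is
    cleared, so every parameter participates in further training.
    """
    w = w_pruned.copy()
    w[~mask] = init
    return w, np.ones_like(mask)
```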

CV | Apr 20, 2020
Facial Action Unit Intensity Estimation via Semantic Correspondence Learning with Dynamic Graph Convolution

Yingruo Fan, Jacqueline C. K. Lam, Victor O. K. Li

The intensity estimation of facial action units (AUs) is challenging due to subtle changes in a person's facial appearance. Previous approaches mainly rely on probabilistic models or predefined rules for modeling co-occurrence relationships among AUs, leading to limited generalization. In contrast, we present a new learning framework that automatically learns the latent relationships of AUs by establishing semantic correspondences between feature maps. In the heatmap regression-based network, feature maps preserve rich semantic information associated with AU intensities and locations. Moreover, the AU co-occurrence pattern can be reflected by activating a set of feature channels, where each channel encodes a specific visual pattern of an AU. This motivates us to model the correlation among feature channels, which implicitly represents the co-occurrence relationship of AU intensity levels. Specifically, we introduce a semantic correspondence convolution (SCC) module to dynamically compute the correspondences from deep, low-resolution feature maps, thus enhancing the discriminability of features. The experimental results demonstrate the effectiveness and superior performance of our method on two benchmark datasets.

CL | Nov 22, 2019
Go From the General to the Particular: Multi-Domain Translation with Domain Transformation Networks

Yong Wang, Longyue Wang, Shuming Shi et al.

The key challenge of multi-domain translation lies in simultaneously encoding, in a unified model, both the general knowledge shared across domains and the particular knowledge distinctive to each domain. Previous work shows that the standard neural machine translation (NMT) model, trained on mixed-domain data, generally captures the general knowledge but misses the domain-specific knowledge. In response to this problem, we augment the NMT model with additional domain transformation networks that transform the general representations into domain-specific representations, which are subsequently fed to the NMT decoder. To guarantee the knowledge transformation, we also propose two complementary supervision signals by leveraging the power of knowledge distillation and adversarial learning. Experimental results on several language pairs, covering both balanced and unbalanced multi-domain translation, demonstrate the effectiveness and universality of the proposed approach. Encouragingly, the proposed unified model achieves results comparable to the fine-tuning approach, which requires multiple models to preserve the particular knowledge. Further analyses reveal that the domain transformation networks successfully capture the domain-specific knowledge as expected.

CL | Jun 4, 2019
Improved Zero-shot Neural Machine Translation via Ignoring Spurious Correlations

Jiatao Gu, Yong Wang, Kyunghyun Cho et al.

Zero-shot translation, translating between language pairs on which a neural machine translation (NMT) system has never been trained, is an emergent property when training the system in multilingual settings. However, naive training for zero-shot NMT easily fails and is sensitive to hyper-parameter settings. Its performance typically lags far behind the more conventional pivot-based approach, which translates twice using a third language as a pivot. In this work, we address the degeneracy problem caused by capturing spurious correlations by quantitatively analyzing the mutual information between the language IDs of the source and decoded sentences. Inspired by this analysis, we propose two simple but effective approaches: (1) decoder pre-training and (2) back-translation. These methods show significant improvements (4 to 22 BLEU points) over vanilla zero-shot translation on three challenging multilingual datasets, and achieve similar or better results than the pivot-based approach.
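
The mutual information analyzed above can be estimated empirically from paired language labels. A minimal sketch, assuming lists of hypothetical language IDs for each source sentence and for the language the model actually decoded into. A value near zero means the decoded language does not depend on the source language, while a large value suggests the decoder is keying on the source language, which is one way to read the spurious correlation described above.

```python
import math
from collections import Counter
from typing import Hashable, Sequence

def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Empirical mutual information I(X; Y) in nats from paired samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    # Sum of p(x, y) * log(p(x, y) / (p(x) p(y))), using counts: c * n / (cx * cy).
    return sum((c / n) * math.log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())
```

If the decoded language always equals the source language over two equally frequent languages, the estimate is log 2; if the two labels are independent, it is 0.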

CL | Aug 25, 2018
Meta-Learning for Low-Resource Neural Machine Translation

Jiatao Gu, Yong Wang, Yun Chen et al.

In this paper, we propose to extend the recently introduced model-agnostic meta-learning algorithm (MAML) to low-resource neural machine translation (NMT). We frame low-resource translation as a meta-learning problem, and we learn to adapt to low-resource languages based on multilingual high-resource language tasks. We use the universal lexical representation (Gu et al., 2018) to overcome the input-output mismatch across different languages. We evaluate the proposed meta-learning strategy using eighteen European languages (Bg, Cs, Da, De, El, Es, Et, Fr, Hu, It, Lt, Nl, Pl, Pt, Sk, Sl, Sv, and Ru) as source tasks and five diverse languages (Ro, Lv, Fi, Tr, and Ko) as target tasks. We show that the proposed approach significantly outperforms the multilingual, transfer learning based approach (Zoph et al., 2016) and enables us to train a competitive NMT system with only a fraction of the training examples. For instance, the proposed approach achieves as high as 22.04 BLEU on Romanian-English WMT'16 after seeing only 16,000 translated words (about 600 parallel sentences).

CV | Jul 12, 2018
Multi-Region Ensemble Convolutional Neural Network for Facial Expression Recognition

Yingruo Fan, Jacqueline C. K. Lam, Victor O. K. Li

Facial expressions play an important role in conveying the emotional states of human beings. Recently, deep learning approaches have been applied to image recognition due to the discriminative power of convolutional neural networks (CNNs). In this paper, we first propose a novel Multi-Region Ensemble CNN (MRE-CNN) framework for facial expression recognition, which aims to enhance the learning power of CNN models by capturing both global and local features from multiple human face sub-regions. Second, the weighted prediction scores from each sub-network are aggregated to produce a highly accurate final prediction. Third, we investigate the effects of different sub-regions of the whole face on facial expression recognition. Our proposed method is evaluated on two well-known publicly available facial expression databases, AFEW 7.0 and RAF-DB, and has been shown to achieve state-of-the-art recognition accuracy.
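
The score aggregation step can be sketched as a weighted sum of per-region class scores followed by an argmax. The regions, weights, and score values are hypothetical, and how MRE-CNN learns or sets its weights is not specified here.

```python
import numpy as np

def aggregate_scores(sub_scores, weights):
    """Fuse per-sub-network class scores with a weighted sum.

    sub_scores: shape (num_regions, num_classes), e.g. one row each for a
    hypothetical whole-face, eye-region, and mouth-region sub-network.
    weights: shape (num_regions,).
    Returns (fused class scores, index of the predicted class).
    """
    s = np.asarray(sub_scores, dtype=float)
    w = np.asarray(weights, dtype=float)
    fused = w @ s
    return fused, int(np.argmax(fused))
```

For instance, weights `[0.5, 0.25, 0.25]` over rows `[0.7, 0.2, 0.1]`, `[0.1, 0.6, 0.3]`, and `[0.2, 0.5, 0.3]` give fused scores `[0.425, 0.375, 0.2]`, so class 0 wins even though two of the three regions prefer class 1.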

LG | Jul 8, 2018
Large Margin Few-Shot Learning

Yong Wang, Xiao-Ming Wu, Qimai Li et al.

The key issue of few-shot learning is learning to generalize. This paper proposes a large margin principle to improve the generalization capacity of metric-based methods for few-shot learning. To realize it, we develop a unified framework that learns a more discriminative metric space by augmenting the classification loss function with a large margin distance loss function during training. Extensive experiments on two state-of-the-art few-shot learning methods, graph neural networks and prototypical networks, show that our method can substantially improve the performance of existing models with very little computational overhead, demonstrating the effectiveness of the large margin principle and the potential of our method.
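
One common way to realize a large margin distance loss is a triplet hinge loss added to the classification loss with a weighting factor. The sketch below assumes that form, squared Euclidean distances, and hypothetical names (`lam`, `margin`); the paper's exact loss may differ.

```python
import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Mean hinge loss max(0, margin + d(a, p) - d(a, n)) over a batch,
    with squared Euclidean distances; rows are embeddings."""
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    return float(np.mean(np.maximum(0.0, margin + d_pos - d_neg)))

def total_loss(cls_loss, anchor, positive, negative, lam=0.1, margin=1.0):
    """Classification loss augmented with the weighted large margin term."""
    return cls_loss + lam * triplet_margin_loss(anchor, positive, negative, margin)
```

The margin term is zero once every different-class embedding is at least `margin` farther (in squared distance) from the anchor than the same-class one, so it only adds gradient signal where the metric space is not yet separated.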

CL | Apr 21, 2018
A Stable and Effective Learning Strategy for Trainable Greedy Decoding

Yun Chen, Victor O. K. Li, Kyunghyun Cho et al.

Beam search is a widely used approximate search strategy for neural network decoders, and it generally outperforms simple greedy decoding on tasks like machine translation. However, this improvement comes at substantial computational cost. In this paper, we propose a flexible new method that allows us to reap nearly the full benefits of beam search with nearly no additional computational cost. The method revolves around a small neural network actor that is trained to observe and manipulate the hidden state of a previously-trained decoder. To train this actor network, we introduce the use of a pseudo-parallel corpus built using the output of beam search on a base model, ranked by a target quality metric like BLEU. Our method is inspired by earlier work on this problem, but requires no reinforcement learning, and can be trained reliably on a range of models. Experiments on three parallel corpora and three architectures show that the method yields substantial improvements in translation quality and speed over each base system.

CL | Feb 15, 2018
Universal Neural Machine Translation for Extremely Low Resource Languages

Jiatao Gu, Hany Hassan, Jacob Devlin et al.

In this paper, we propose a new universal machine translation approach focusing on languages with a limited amount of parallel data. Our proposed approach uses transfer learning to share lexical and sentence-level representations across multiple source languages into one target language. The lexical part is shared through a Universal Lexical Representation to support multilingual word-level sharing. The sentence-level sharing is represented by a model of experts from all source languages that share the source encoders with all other languages. This enables the low-resource language to utilize the lexical and sentence representations of the higher-resource languages. Our approach achieves 23 BLEU on Romanian-English WMT2016 using a tiny parallel corpus of 6k sentences, compared to the 18 BLEU of a strong baseline system that uses multilingual training and back-translation. Furthermore, we show that the proposed approach can achieve almost 20 BLEU on the same dataset by fine-tuning a pre-trained multilingual system in a zero-shot setting.

CL | Feb 9, 2018
Zero-Resource Neural Machine Translation with Multi-Agent Communication Game

Yun Chen, Yang Liu, Victor O. K. Li

While end-to-end neural machine translation (NMT) has achieved notable success in recent years in translating a handful of resource-rich language pairs, it still suffers from the data scarcity problem for low-resource language pairs and domains. To tackle this problem, we propose an interactive multimodal framework for zero-resource neural machine translation. Instead of being passively exposed to large amounts of parallel corpora, our learners (implemented as encoder-decoder architectures) engage in cooperative image description games, and thus develop their own image captioning or neural machine translation models out of the need to communicate in order to succeed at the game. Experimental results on the IAPR-TC12 and Multi30K datasets show that the proposed learning mechanism significantly improves over state-of-the-art methods.

CL | Nov 7, 2017
Non-Autoregressive Neural Machine Translation

Jiatao Gu, James Bradbury, Caiming Xiong et al.

Existing approaches to neural machine translation condition each output word on previously generated outputs. We introduce a model that avoids this autoregressive property and produces its outputs in parallel, allowing an order of magnitude lower latency during inference. Through knowledge distillation, the use of input token fertilities as a latent variable, and policy gradient fine-tuning, we achieve this at a cost of as little as 2.0 BLEU points relative to the autoregressive Transformer network used as a teacher. We demonstrate substantial cumulative improvements associated with each of the three aspects of our training strategy, and validate our approach on IWSLT 2016 English-German and two WMT language pairs. By sampling fertilities in parallel at inference time, our non-autoregressive model achieves near-state-of-the-art performance of 29.8 BLEU on WMT 2016 English-Romanian.
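
Fertility-based input construction can be sketched as copying each source token as many times as its predicted fertility, so the fertilities also fix the output length. This sketch works on token strings for readability (a real model would typically copy embeddings), and the example fertilities are hypothetical.

```python
from typing import List

def copy_by_fertility(source_tokens: List[str], fertilities: List[int]) -> List[str]:
    """Repeat each source token `fertility` times to build the decoder input.

    The output length equals sum(fertilities), so all target positions can
    then be decoded in parallel.
    """
    if len(source_tokens) != len(fertilities):
        raise ValueError("one fertility per source token is required")
    out: List[str] = []
    for tok, f in zip(source_tokens, fertilities):
        if f < 0:
            raise ValueError("fertilities must be non-negative")
        out.extend([tok] * f)
    return out
```

A fertility of 0 drops a source token entirely, and a fertility of 2 lets one source word account for two target words.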

CL | Jun 22, 2017
Neural Machine Translation with Gumbel-Greedy Decoding

Jiatao Gu, Daniel Jiwoong Im, Victor O. K. Li

Previous neural machine translation models used heuristic search algorithms (e.g., beam search) in order to avoid solving the maximum a posteriori problem over translation sentences at test time. In this paper, we propose Gumbel-Greedy Decoding, which trains a generative network to predict translations under a trained model. We solve this problem using the Gumbel-Softmax reparameterization, which makes our generative network differentiable and trainable through standard stochastic gradient methods. We empirically demonstrate that our proposed model is effective for generating sequences of discrete words.
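
A minimal NumPy sketch of Gumbel-Softmax sampling, the relaxation named above: Gumbel noise is added to the logits and a temperature-scaled softmax replaces the non-differentiable argmax. NumPy computes no gradients, so this only shows the forward computation; the temperature value and function name are assumptions.

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Relaxed one-hot sample: softmax((logits + g) / tau), g ~ Gumbel(0, 1).

    Works on the last axis, so a batch of logit vectors can be passed at once.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Inverse-CDF sampling of Gumbel noise; the lower bound avoids log(0).
    u = rng.uniform(low=1e-12, high=1.0, size=np.shape(logits))
    g = -np.log(-np.log(u))
    y = (np.asarray(logits, dtype=float) + g) / tau
    y = y - y.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)
```

The argmax of each sample follows the categorical distribution given by the logits (the Gumbel-max trick), and lowering `tau` pushes samples closer to one-hot vectors.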

CL | May 20, 2017
Search Engine Guided Non-Parametric Neural Machine Translation

Jiatao Gu, Yong Wang, Kyunghyun Cho et al.

In this paper, we extend an attention-based neural machine translation (NMT) model by allowing it to access an entire training set of parallel sentence pairs even after training. The proposed approach consists of two stages. In the first stage (the retrieval stage), an off-the-shelf, black-box search engine is used to retrieve a small subset of sentence pairs from a training set given a source sentence. These pairs are further filtered based on a fuzzy matching score derived from edit distance. In the second stage (the translation stage), a novel translation model, called translation memory enhanced NMT (TM-NMT), seamlessly uses both the source sentence and a set of retrieved sentence pairs to perform the translation. Empirical evaluation on three language pairs (En-Fr, En-De, and En-Es) shows that the proposed approach significantly outperforms the baseline approach, and the improvement is more significant when more relevant sentence pairs are retrieved.
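
The filtering step can be sketched with a token-level Levenshtein distance and the fuzzy score 1 - distance / max(length), a common translation-memory formula. The exact score, the 0.5 threshold, whitespace tokenization, and the function names are assumptions; the paper may define its fuzzy matching score differently.

```python
from typing import List, Sequence, Tuple

def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

def fuzzy_score(src: Sequence[str], cand: Sequence[str]) -> float:
    """1 - edit distance / length of the longer sequence; 1.0 means identical."""
    longest = max(len(src), len(cand))
    return 1.0 if longest == 0 else 1.0 - edit_distance(src, cand) / longest

def filter_pairs(source: str, pairs: List[Tuple[str, str]], threshold: float = 0.5):
    """Keep retrieved (source, target) pairs whose source side scores at least
    `threshold` against the input sentence, best match first."""
    src = source.split()
    scored = [(fuzzy_score(src, s.split()), s, t) for s, t in pairs]
    kept = [item for item in scored if item[0] >= threshold]
    return sorted(kept, key=lambda item: item[0], reverse=True)
```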

CL | May 2, 2017
A Teacher-Student Framework for Zero-Resource Neural Machine Translation

Yun Chen, Yang Liu, Yong Cheng et al.

While end-to-end neural machine translation (NMT) has made remarkable progress recently, it still suffers from the data scarcity problem for low-resource language pairs and domains. In this paper, we propose a method for zero-resource NMT by assuming that parallel sentences have close probabilities of generating a sentence in a third language. Based on this assumption, our method is able to train a source-to-target NMT model ("student") without parallel corpora available, guided by an existing pivot-to-target NMT model ("teacher") on a source-pivot parallel corpus. Experimental results show that the proposed method significantly improves over a baseline pivot-based model by +3.0 BLEU points across various language pairs.

SY | May 21, 2017
Intelligent Time-Adaptive Transient Stability Assessment System

James J. Q. Yu, David J. Hill, Albert Y. S. Lam et al.

Online identification of post-contingency transient stability is essential in power system control, as it helps the grid operator decide on and coordinate corrective control actions after system failures. Utilizing machine learning methods with synchrophasor measurements for transient stability assessment has received much attention recently with the gradual deployment of wide-area protection and control systems. In this paper, we develop a transient stability assessment system based on the long short-term memory network. By proposing a temporal self-adaptive scheme, our system aims to balance the trade-off between assessment accuracy and response time, both of which may be crucial in real-world scenarios. Compared with previous work, the most significant enhancement is that our system learns from the temporal dependencies of the input data, which contributes to better assessment accuracy. In addition, the model structure of our system is less complex, which speeds up model training. Case studies on three power systems demonstrate the efficacy of the proposed transient stability assessment system.
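
The accuracy versus response-time trade-off can be sketched as a confidence-threshold rule: after each new measurement step, accept the assessment as soon as the predicted probability of stability is confidently high or low, and otherwise wait. The probability sequence, the 0.95 threshold, and the fallback at the final step are hypothetical; the LSTM that would produce these probabilities is not shown.

```python
from typing import Sequence, Tuple

def time_adaptive_assess(p_stable: Sequence[float], threshold: float = 0.95) -> Tuple[str, int]:
    """Return (decision, step index) at the first step whose predicted
    probability of stability is confident enough in either direction.

    If no step reaches the threshold, decide from the last available step.
    """
    if not p_stable:
        raise ValueError("need at least one assessment step")
    for t, p in enumerate(p_stable):
        if p >= threshold:
            return "stable", t
        if p <= 1.0 - threshold:
            return "unstable", t
    last = len(p_stable) - 1
    return ("stable" if p_stable[last] >= 0.5 else "unstable"), last
```

Raising `threshold` delays decisions in exchange for higher confidence; lowering it shortens response time at the risk of more misclassifications.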

CL | Feb 8, 2017
Trainable Greedy Decoding for Neural Machine Translation

Jiatao Gu, Kyunghyun Cho, Victor O. K. Li

Recent research in neural machine translation has largely focused on two aspects: neural network architectures and end-to-end learning algorithms. The problem of decoding, however, has received relatively little attention from the research community. In this paper, we focus solely on the problem of decoding given a trained neural machine translation model. Instead of trying to build a new decoding algorithm for a specific decoding objective, we propose the idea of a trainable decoding algorithm, in which we train a decoding algorithm to find a translation that maximizes an arbitrary decoding objective. More specifically, we design an actor that observes and manipulates the hidden state of the neural machine translation decoder, and we propose to train it using a variant of deterministic policy gradient. We extensively evaluate the proposed algorithm using four language pairs and two decoding objectives, and show that we can indeed train a greedy decoder that generates better translations (in terms of a target decoding objective) with minimal computational overhead.

AIOct 22, 2016
pg-Causality: Identifying Spatiotemporal Causal Pathways for Air Pollutants with Urban Big Data

Julie Yixuan Zhu, Chao Zhang, Huichu Zhang et al.

Many countries are suffering from severe air pollution. Understanding how different air pollutants accumulate and propagate is critical to making relevant public policies. In this paper, we use urban big data (air quality data and meteorological data) to identify the spatiotemporal (ST) causal pathways for air pollutants. This problem is challenging because: (1) there are numerous noisy and low-pollution periods in the raw air quality data, which may lead to unreliable causality analysis; (2) for large-scale data in the ST space, the computational complexity of constructing a causal structure is very high; and (3) the ST causal pathways are complex due to the interactions of multiple pollutants and the influence of environmental factors. Therefore, we present pg-Causality, a novel pattern-aided causality analysis approach that combines the strengths of pattern mining and Bayesian learning to efficiently and faithfully identify the ST causal pathways. First, pattern mining helps suppress the noise by capturing frequent evolving patterns (FEPs) of each monitoring sensor, and greatly reduces the complexity by selecting the pattern-matched sensors as "causers". Then, Bayesian learning carefully encodes the local and ST causal relations with a Gaussian Bayesian network (GBN)-based graphical model, which also integrates environmental influences to minimize biases in the final results. We evaluate our approach with three real-world data sets containing 982 air quality sensors in three regions of China, covering 01-Jun-2013 to 19-Dec-2015. Results show that our approach outperforms traditional causal structure learning methods in time efficiency, inference accuracy, and interpretability.

CLOct 3, 2016
Learning to Translate in Real-time with Neural Machine Translation

Jiatao Gu, Graham Neubig, Kyunghyun Cho et al.

Translating in real time, also known as simultaneous translation, requires outputting translated words before the input sentence ends, which is a challenging problem for conventional machine translation methods. We propose a neural machine translation (NMT) framework for simultaneous translation in which an agent learns when to translate through interaction with a pre-trained NMT environment. To trade off quality and delay, we extensively explore various targets for delay and design a beam-search method applicable to the simultaneous MT setting. Experiments against state-of-the-art baselines on two language pairs demonstrate the efficacy of the proposed framework both quantitatively and qualitatively.
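To make "delay" concrete, an agent's decisions can be written as a sequence of READ (consume one source word) and WRITE (emit one target word) actions. The sketch below computes one common delay measure of this kind, average proportion (AP): the mean fraction of the source already read when each target word is written. Treating AP as representative of the paper's delay targets is an assumption, and the `'R'`/`'W'` string encoding is hypothetical.

```python
def average_proportion(actions: str) -> float:
    """Average proportion (AP) delay for a READ/WRITE action sequence.

    actions: string over {'R', 'W'}; 'R' reads one source word, 'W' writes one
    target word. Assumes the whole source sentence is eventually read.
    """
    read = 0              # source words consumed so far
    total = 0             # sum over target words of source words read when written
    num_written = 0
    for a in actions:
        if a == "R":
            read += 1
        elif a == "W":
            total += read
            num_written += 1
        else:
            raise ValueError(f"unknown action {a!r}")
    if read == 0 or num_written == 0:
        raise ValueError("need at least one READ and one WRITE action")
    # Normalize by |source| * |target|; 1.0 means waiting for the full sentence.
    return total / (read * num_written)
```

Full-sentence translation (`"RRRWWW"`) gives an AP of 1.0, while strict word-by-word interleaving (`"RWRWRW"`) gives 2/3, reflecting its lower delay.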

CLMar 21, 2016
Incorporating Copying Mechanism in Sequence-to-Sequence Learning

Jiatao Gu, Zhengdong Lu, Hang Li et al.

We address an important problem in sequence-to-sequence (Seq2Seq) learning referred to as copying, in which certain segments of the input sequence are selectively replicated in the output sequence. A similar phenomenon is observable in human language communication: for example, humans tend to repeat entity names or even long phrases in conversation. The challenge with copying in Seq2Seq is that new machinery is needed to decide when to perform the operation. In this paper, we incorporate copying into neural network-based Seq2Seq learning and propose a new model with an encoder-decoder structure, called CopyNet. CopyNet integrates the regular way of word generation in the decoder with the new copying mechanism, which can choose sub-sequences in the input sequence and put them at proper places in the output sequence. Our empirical study on both synthetic and real-world data sets demonstrates the efficacy of CopyNet. For example, CopyNet can outperform a regular RNN-based model by remarkable margins on text summarization tasks.
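One way to integrate generation and copying is to score both modes and normalize them together, so that a word's probability is the sum of its generate-mode and copy-mode mass. The sketch below shows only that mixing step under stated assumptions: the scores are given as plain numbers (the neural scoring functions are omitted), and the function name and inputs are hypothetical.

```python
import math
from typing import Dict, List


def copy_generate_distribution(
    vocab_scores: Dict[str, float],
    source_tokens: List[str],
    copy_scores: List[float],
) -> Dict[str, float]:
    """Mix generate-mode and copy-mode scores under one shared softmax.

    vocab_scores: generate-mode score for each vocabulary word.
    source_tokens, copy_scores: copy-mode score for each source position.
    """
    # Shared normalizer over both modes, stabilized by subtracting the max score.
    all_scores = list(vocab_scores.values()) + list(copy_scores)
    m = max(all_scores)
    z = sum(math.exp(s - m) for s in all_scores)

    probs: Dict[str, float] = {}
    # Generate mode: mass for every vocabulary word.
    for word, s in vocab_scores.items():
        probs[word] = probs.get(word, 0.0) + math.exp(s - m) / z
    # Copy mode: source words (including out-of-vocabulary ones) receive extra
    # mass, summed over every position where they occur.
    for token, s in zip(source_tokens, copy_scores):
        probs[token] = probs.get(token, 0.0) + math.exp(s - m) / z
    return probs
```

With `vocab_scores={"hello": 1.0, "<unk>": 0.0}`, source `["hello", "Tony"]`, and copy scores `[0.5, 2.0]`, the out-of-vocabulary name `"Tony"` can be emitted directly and outranks `<unk>`, while `"hello"` collects mass from both modes.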

NESep 26, 2015
A Revisit of Infinite Population Models for Evolutionary Algorithms on Continuous Optimization Problems

Bo Song, Victor O. K. Li

Infinite population models are important tools for studying the population dynamics of evolutionary algorithms. They describe how the distributions of populations change between consecutive generations. In general, infinite population models are derived from Markov chains by exploiting symmetries between individuals in the population and analyzing the limit as the population size goes to infinity. In this paper, we study the theoretical foundations of infinite population models of evolutionary algorithms on continuous optimization problems. First, we show that the convergence proofs in a widely cited study were in fact problematic and incomplete. We further show that the modeling assumption of exchangeability of individuals cannot yield the transition equation. Then, to analyze infinite population models, we build an analytical framework based on convergence in distribution of random elements that take values in the metric space of infinite sequences. The framework is concise and mathematically rigorous. It also provides an infrastructure for studying the convergence of stacked operators and of iterations of the algorithm, which previous studies failed to address. Finally, we use the framework to prove the convergence of infinite population models for the mutation operator and the k-ary recombination operator. We show that the infinite population models of these operators can provide accurate predictions for real population dynamics as the population size goes to infinity, provided that the initial population is independent and identically distributed.

NEJul 27, 2015
A Social Spider Algorithm for Solving the Non-convex Economic Load Dispatch Problem

James J. Q. Yu, Victor O. K. Li

Economic Load Dispatch (ELD) is one of the essential components of power system control and operation. Although the conventional ELD formulation can be solved using mathematical programming techniques, modern power systems introduce new models of power units that are non-convex, non-differentiable, and sometimes discontinuous. To solve such non-convex ELD problems, in this paper we propose a new approach based on the Social Spider Algorithm (SSA). The classical SSA is modified and enhanced to adapt to the unique characteristics of ELD problems, e.g., valve-point effects, multi-fuel operations, prohibited operating zones, and line losses. To demonstrate the superiority of our approach, five widely adopted test systems are employed and the simulation results are compared with those of state-of-the-art algorithms. In addition, parameter sensitivity is illustrated by a series of simulations. The simulation results show that SSA can solve ELD problems effectively and efficiently.
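The valve-point effect is a standard source of the non-convexity mentioned above: a rectified sinusoid is added to the usual quadratic fuel cost, making the objective non-smooth. The sketch below evaluates that textbook cost form and a dispatch objective. The penalty weight for violating the power balance is a hypothetical choice, line losses, multi-fuel operation, and prohibited zones are ignored, and the coefficients are placeholders, not data from the paper.

```python
import math
from typing import Dict, List


def fuel_cost_valve_point(
    p: float, a: float, b: float, c: float, e: float, f: float, p_min: float
) -> float:
    """Quadratic fuel cost plus the rectified sinusoidal valve-point term."""
    return a + b * p + c * p ** 2 + abs(e * math.sin(f * (p_min - p)))


def dispatch_cost(
    outputs: List[float], units: List[Dict[str, float]], demand: float, penalty: float = 1e6
) -> float:
    """Total fuel cost of a dispatch plus a (hypothetical) power-balance penalty.

    Each unit dict holds the keys a, b, c, e, f, p_min. Line losses are ignored,
    so balance simply means the outputs sum to the demand.
    """
    cost = sum(fuel_cost_valve_point(p, **u) for p, u in zip(outputs, units))
    return cost + penalty * abs(sum(outputs) - demand)
```

A metaheuristic such as SSA only needs to evaluate `dispatch_cost`, so the non-differentiable `abs(sin(...))` term poses no difficulty, whereas gradient-based mathematical programming would struggle with it.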

NEJul 9, 2015
Adaptive Chemical Reaction Optimization for Global Numerical Optimization

James J. Q. Yu, Albert Y. S. Lam, Victor O. K. Li

A newly proposed chemical-reaction-inspired metaheuristic, Chemical Reaction Optimization (CRO), has been applied to many optimization problems in both discrete and continuous domains. To reduce the effort of parameter tuning, this paper reduces the number of optimization parameters in canonical CRO and develops an adaptive scheme to evolve them. The proposed Adaptive CRO (ACRO) adapts better to different optimization problems. We perform simulations with ACRO on a widely used benchmark of continuous problems. The simulation results show that ACRO outperforms canonical CRO.

NEJul 9, 2015
Parameter Sensitivity Analysis of Social Spider Algorithm

James J. Q. Yu, Victor O. K. Li

Social Spider Algorithm (SSA) is a recently proposed general-purpose real-parameter metaheuristic designed to solve global numerical optimization problems. This work systematically benchmarks SSA on a suite of 11 functions with different control parameters. We conduct a parameter sensitivity analysis of SSA using advanced non-parametric statistical tests to draw statistically significant conclusions on the best-performing parameter settings. These conclusions can be adopted in future work to reduce the effort of parameter tuning. In addition, we perform a success rate test to reveal the impact of the control parameters on the convergence speed of the algorithm.

LGJun 24, 2015
Efficient Learning for Undirected Topic Models

Jiatao Gu, Victor O. K. Li

The Replicated Softmax model, a well-known undirected topic model, is powerful in extracting semantic representations of documents. Traditional learning strategies such as Contrastive Divergence are very inefficient. This paper provides a novel estimator, based on Noise Contrastive Estimation and extended to documents of variable lengths and weighted inputs, that speeds up learning. Experiments on two benchmarks show that the new estimator achieves high learning efficiency and high accuracy on document retrieval and classification.
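Noise Contrastive Estimation replaces the intractable normalization of an undirected model with a binary classification task: distinguish each observed sample from k samples drawn from a known noise distribution. The sketch below is the generic per-example NCE loss, not the paper's document-level estimator. It assumes the model supplies an unnormalized log-score (with the normalizer treated as a learned parameter) and omits the extensions for variable document lengths and weighted inputs; all argument names are hypothetical.

```python
import math
from typing import Sequence


def log_sigmoid(x: float) -> float:
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def nce_loss(
    data_log_model: float,
    data_log_noise: float,
    noise_log_model: Sequence[float],
    noise_log_noise: Sequence[float],
) -> float:
    """Generic NCE loss for one observed sample and k noise samples.

    *_log_model: unnormalized model log-score; *_log_noise: log-probability under
    the noise distribution. k is the number of noise samples supplied.
    """
    k = len(noise_log_model)
    log_k = math.log(k)
    # Classify the observed sample as "data" ...
    loss = -log_sigmoid(data_log_model - data_log_noise - log_k)
    # ... and each noise sample as "noise".
    for lm, ln in zip(noise_log_model, noise_log_noise):
        loss -= log_sigmoid(-(lm - ln - log_k))
    return loss
```

When the model cannot tell data from noise (all log-ratios equal log k), the loss is (k + 1) log 2. Training lowers it by scoring observed documents above noise samples, and no partition function is computed at any point.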

SYMay 21, 2015
Capacity Estimation for Vehicle-to-Grid Frequency Regulation Services with Smart Charging Mechanism

Albert Y. S. Lam, Ka-Cheong Leung, Victor O. K. Li

Due to various green initiatives, renewable energy will be massively incorporated into the future smart grid. However, the intermittency of renewables may result in power imbalance, adversely affecting the stability of the power system. Frequency regulation may be used to maintain the power balance at all times. As electric vehicles (EVs) become popular, they may be connected to the grid to form a vehicle-to-grid (V2G) system. An aggregation of EVs can be coordinated to provide frequency regulation services. However, V2G is a dynamic system in which participating EVs come and go independently, so estimating its regulation capacities is not easy. In a preliminary study, we modeled an aggregation of EVs with a queueing network whose structure allows us to estimate the capacities for regulation-up and regulation-down separately. The estimated capacities of the V2G system can be used to establish a regulation contract between an aggregator and the grid operator, facilitating a new business model for V2G. In this paper, we extend our previous work by designing a smart charging mechanism that adapts to given characteristics of the EVs and makes the performance of the actual system follow the analytical model.

NEFeb 9, 2015
A Social Spider Algorithm for Global Optimization

James J. Q. Yu, Victor O. K. Li

The growing complexity of real-world problems has motivated computer scientists to search for efficient problem-solving methods. Metaheuristics based on evolutionary computation and swarm intelligence are outstanding examples of nature-inspired solution techniques. Inspired by social spiders, we propose a novel Social Spider Algorithm to solve global optimization problems. This algorithm is mainly based on the foraging strategy of social spiders, which use the vibrations on the spider web to determine the positions of prey. Unlike previously proposed swarm intelligence algorithms, ours introduces a new social animal foraging strategy model to solve optimization problems. In addition, we perform a preliminary parameter sensitivity analysis of the proposed algorithm, developing guidelines for choosing the parameter values. The Social Spider Algorithm is evaluated on a series of widely used benchmark functions, and it shows superior performance compared with other state-of-the-art metaheuristics.
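The vibration mechanism can be sketched in a few lines. As we understand the published formulation (treat the details as an assumption), a spider at position P with fitness f(P) generates a vibration of intensity log(1 / (f(P) - C) + 1), where C is a constant below every attainable fitness value, and the intensity decays exponentially with the 1-norm distance, scaled by the mean per-dimension standard deviation of spider positions and an attenuation rate r_a. The function names below are hypothetical.

```python
import math
from typing import Sequence


def source_intensity(fitness: float, c: float) -> float:
    """Vibration intensity a spider generates at its own position (minimization).

    c must be strictly smaller than every attainable fitness value, so better
    (smaller) fitness yields a stronger vibration.
    """
    return math.log(1.0 / (fitness - c) + 1.0)


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """1-norm distance between two spider positions."""
    return sum(abs(x - y) for x, y in zip(a, b))


def received_intensity(
    i_source: float, distance: float, sigma_mean: float, r_a: float
) -> float:
    """Intensity after attenuating over `distance` on the web.

    sigma_mean: mean standard deviation of spider positions over all dimensions;
    r_a: attenuation rate parameter.
    """
    return i_source * math.exp(-distance / (sigma_mean * r_a))
```

Each spider would then follow the strongest vibration it receives, so good solutions attract spiders more strongly when they are close, and r_a controls how far that attraction reaches.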

NEFeb 1, 2015
Chemical Reaction Optimization for the Set Covering Problem

James J. Q. Yu, Albert Y. S. Lam, Victor O. K. Li

The set covering problem (SCP) is a representative combinatorial optimization problem with many practical applications. This paper develops an algorithm to solve SCP by employing chemical reaction optimization (CRO), a general-purpose metaheuristic. The algorithm is tested on a wide range of benchmark SCP instances. The simulation results indicate that it gives outstanding performance compared with other heuristics and metaheuristics in solving SCP.
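Any metaheuristic for SCP, CRO included, needs to evaluate a candidate selection of columns and deal with infeasible candidates that leave some rows uncovered. The sketch below shows a cost function, a coverage check, and a common greedy repair step (add the column with the lowest cost per newly covered row). The repair heuristic and the data layout are assumptions for illustration, not the paper's operators.

```python
from typing import List, Set


def scp_cost(selected: List[bool], costs: List[float]) -> float:
    """Total cost of the selected columns."""
    return sum(c for s, c in zip(selected, costs) if s)


def uncovered_rows(selected: List[bool], columns: List[Set[int]], num_rows: int) -> Set[int]:
    """Rows not covered by any selected column."""
    covered: Set[int] = set()
    for s, col in zip(selected, columns):
        if s:
            covered |= col
    return set(range(num_rows)) - covered


def greedy_repair(
    selected: List[bool], columns: List[Set[int]], costs: List[float], num_rows: int
) -> List[bool]:
    """Make a selection feasible by greedily adding cost-effective columns.

    Raises ValueError (from min on an empty sequence) if some row cannot be
    covered by any column.
    """
    selected = list(selected)
    missing = uncovered_rows(selected, columns, num_rows)
    while missing:
        # Pick the unselected column with the lowest cost per newly covered row.
        best = min(
            (j for j in range(len(columns)) if not selected[j] and columns[j] & missing),
            key=lambda j: costs[j] / len(columns[j] & missing),
        )
        selected[best] = True
        missing -= columns[best]
    return selected
```

Repair keeps every candidate feasible, so the search can compare solutions by `scp_cost` alone; the greedy choice is not guaranteed to be optimal, which is why it is paired with a global search.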

NEFeb 1, 2015
An Inter-molecular Adaptive Collision Scheme for Chemical Reaction Optimization

James J. Q. Yu, Victor O. K. Li, Albert Y. S. Lam

Optimization techniques are frequently applied in science and engineering research and development. Evolutionary algorithms, as a kind of general-purpose metaheuristic, have been shown to be very effective in solving a wide range of optimization problems. A recently proposed chemical-reaction-inspired metaheuristic, Chemical Reaction Optimization (CRO), has been applied to solve many global optimization problems. However, the functionality of the inter-molecular ineffective collision operator in the canonical CRO design overlaps that of the on-wall ineffective collision operator, which can potentially impair the overall performance. In this paper, we propose a new inter-molecular ineffective collision operator for CRO in global optimization. To fully utilize the new operator, we also design a scheme to adapt the algorithm to optimization problems with different search space characteristics. We analyze the performance of the proposed algorithm on a number of widely used benchmark functions. The simulation results indicate that the new algorithm outperforms canonical CRO.
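For reference, the on-wall ineffective collision mentioned above can be sketched as follows, based on our reading of canonical CRO (treat the details as assumptions). A molecule with potential energy PE (its objective value) and kinetic energy KE moves to a neighbor only if PE + KE covers the neighbor's PE. The surplus is then split between the molecule's new KE and a central energy buffer by a random fraction q in [KELossRate, 1]. The `perturb` callback and the default loss rate are hypothetical.

```python
import random
from typing import Callable, List, Tuple


def on_wall_collision(
    x: List[float],
    pe: float,
    ke: float,
    objective: Callable[[List[float]], float],
    perturb: Callable[[List[float]], List[float]],
    buffer: float,
    ke_loss_rate: float = 0.2,
    rng=random,
) -> Tuple[List[float], float, float, float]:
    """One on-wall ineffective collision (minimization).

    Returns (x, pe, ke, buffer) after the collision; the move is rejected and
    the molecule is unchanged if it lacks the energy to reach the neighbor.
    """
    x_new = perturb(x)
    pe_new = objective(x_new)
    if pe + ke >= pe_new:
        surplus = pe + ke - pe_new
        # Part of the surplus stays as kinetic energy; the rest goes to the buffer,
        # so total energy (PE + KE + buffer) is conserved.
        q = rng.uniform(ke_loss_rate, 1.0)
        return x_new, pe_new, surplus * q, buffer + surplus * (1.0 - q)
    return x, pe, ke, buffer
```

Because KE lets a molecule accept a worse neighbor, the operator performs local search with an escape mechanism, which is why an inter-molecular operator that behaves the same way adds little.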

NEFeb 1, 2015
Optimal V2G Scheduling of Electric Vehicles and Unit Commitment using Chemical Reaction Optimization

James J. Q. Yu, Victor O. K. Li, Albert Y. S. Lam

An electric vehicle (EV) may be used as energy storage, allowing bidirectional electricity flow between the vehicle's battery and the electric power grid. To flatten the load profile of the electricity system, EV scheduling has become a hot research topic in recent years. In this paper, we propose a new formulation of the joint scheduling of EVs and Unit Commitment (UC), called EVUC. Our formulation considers the characteristics of EVs while optimizing the total system running cost. We employ Chemical Reaction Optimization (CRO), a general-purpose optimization algorithm, to solve this problem, and simulation results on a widely used set of instances indicate that CRO can effectively optimize it.

NEFeb 1, 2015
Sensor Deployment for Air Pollution Monitoring Using Public Transportation System

James J. Q. Yu, Victor O. K. Li, Albert Y. S. Lam

Air pollution monitoring is a very popular research topic, and many monitoring systems have been developed. In this paper, we formulate the Bus Sensor Deployment Problem (BSDP) to select the bus routes on which sensors are deployed, and we use Chemical Reaction Optimization (CRO) to solve BSDP. CRO is a recently proposed metaheuristic designed to solve a wide range of optimization problems. Using real-world bus route data from Hong Kong Island, we perform a series of simulations, and the results show that CRO is capable of solving this optimization problem efficiently.

NEFeb 1, 2015
Real-Coded Chemical Reaction Optimization with Different Perturbation Functions

James J. Q. Yu, Albert Y. S. Lam, Victor O. K. Li

Chemical Reaction Optimization (CRO) is a powerful metaheuristic that mimics the interactions of molecules in chemical reactions to search for the global optimum. The perturbation function greatly influences the performance of CRO on different continuous problems. In this paper, we study four probability distributions for the perturbation function of CRO, namely the Gaussian distribution, the Cauchy distribution, the exponential distribution, and a modified Rayleigh distribution. Different distributions have different impacts on the solutions. The distributions are tested on a set of well-known benchmark functions, and simulation results show that problems with different characteristics have different preferences for the distribution function. Our study provides guidelines for designing CRO for different types of optimization problems.
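A perturbation function produces a neighbor solution by adding a random step to each coordinate, and the step distribution determines how often large jumps occur. The sketch below draws steps from three of the distributions named above using only Python's standard `random` module. Using a random sign for the one-sided exponential distribution is our assumption, and the modified Rayleigh distribution is omitted because its exact form is not given here; `scale` is a hypothetical step-size parameter.

```python
import math
import random
from typing import List


def perturb(x: List[float], scale: float, kind: str = "gaussian", rng=random) -> List[float]:
    """Return a neighbor of x by adding a random step to every coordinate."""
    out = []
    for xi in x:
        if kind == "gaussian":
            step = rng.gauss(0.0, scale)
        elif kind == "cauchy":
            # Inverse-CDF sampling of a Cauchy variate: heavy tails, occasional big jumps.
            step = scale * math.tan(math.pi * (rng.random() - 0.5))
        elif kind == "exponential":
            # Exponential magnitude with mean `scale`, given a random sign (an assumption).
            step = rng.choice((-1.0, 1.0)) * rng.expovariate(1.0 / scale)
        else:
            raise ValueError(f"unsupported perturbation kind: {kind!r}")
        out.append(xi + step)
    return out
```

Heavy-tailed Cauchy steps favor escaping local optima on rugged landscapes, while Gaussian steps concentrate on fine local refinement, which is consistent with the finding that different problems prefer different distributions.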

NEFeb 1, 2015
Evolutionary Artificial Neural Network Based on Chemical Reaction Optimization

James J. Q. Yu, Albert Y. S. Lam, Victor O. K. Li

Evolutionary algorithms (EAs) are very popular tools for designing and evolving artificial neural networks (ANNs), especially for training them. These methods have advantages over the conventional backpropagation (BP) method because of their low computational requirements when searching a large solution space. In this paper, we employ Chemical Reaction Optimization (CRO), a newly developed global optimization method, to replace BP in training neural networks. CRO is a population-based metaheuristic that mimics the transitions of molecules and their interactions in a chemical reaction. Simulation results show that CRO outperforms many EA strategies commonly used to train neural networks.

SYDec 5, 2014
Opportunistic Routing for the Vehicular Energy Network

Albert Y. S. Lam, Victor O. K. Li

Vehicular energy network (VEN) is a vehicular network that can transport energy over a large geographical area by means of electric vehicles (EVs). In the near future, an abundance of EVs, plentiful generation from renewables, and mature wireless energy transfer and vehicular communication technologies will expedite the realization of VEN. To transmit energy from a source to a destination, we need to establish energy paths, which are composed of segments of vehicular routes, while satisfying various design objectives. In this paper, we develop a method to construct all energy paths for a particular energy source-destination pair and present analytical results for the method. We describe how to utilize the energy paths to develop optimization models for different design goals and propose two solutions. We also develop a heuristic for the power loss minimization problem. We compare the performance of the three solution methods on artificial and real-world traffic networks and provide a comprehensive comparison in terms of solution quality, computation time, solvable problem size, and applicability. This paper lays the foundations of VEN routing.