NEMar 15, 2022
MMES: Mixture Model based Evolution Strategy for Large-Scale OptimizationXiaoyu He, Zibin Zheng, Yuren Zhou
This work provides an efficient sampling method for the covariance matrix adaptation evolution strategy (CMA-ES) in large-scale settings. In contract to the Gaussian sampling in CMA-ES, the proposed method generates mutation vectors from a mixture model, which facilitates exploiting the rich variable correlations of the problem landscape within a limited time budget. We analyze the probability distribution of this mixture model and show that it approximates the Gaussian distribution of CMA-ES with a controllable accuracy. We use this sampling method, coupled with a novel method for mutation strength adaptation, to formulate the mixture model based evolution strategy (MMES) -- a CMA-ES variant for large-scale optimization. The numerical simulations show that, while significantly reducing the time complexity of CMA-ES, MMES preserves the rotational invariance, is scalable to high dimensional problems, and is competitive against the state-of-the-arts in performing global optimization.
LGDec 1, 2022
Clustering and Analysis of GPS Trajectory Data using Distance-based FeaturesZann Koh, Yuren Zhou, Billy Pik Lik Lau et al.
The proliferation of smartphones has accelerated mobility studies by largely increasing the type and volume of mobility data available. One such source of mobility data is from GPS technology, which is becoming increasingly common and helps the research community understand mobility patterns of people. However, there lacks a standardized framework for studying the different mobility patterns created by the non-Work, non-Home locations of Working and Nonworking users on Workdays and Offdays using machine learning methods. We propose a new mobility metric, Daily Characteristic Distance, and use it to generate features for each user together with Origin-Destination matrix features. We then use those features with an unsupervised machine learning method, $k$-means clustering, and obtain three clusters of users for each type of day (Workday and Offday). Finally, we propose two new metrics for the analysis of the clustering results, namely User Commonality and Average Frequency. By using the proposed metrics, interesting user behaviors can be discerned and it helps us to better understand the mobility patterns of the users.
AIJun 17, 2023
DsMtGCN: A Direction-sensitive Multi-task framework for Knowledge Graph CompletionJining Wang, Chuan Chen, Zibin Zheng et al.
To solve the inherent incompleteness of knowledge graphs (KGs), numbers of knowledge graph completion (KGC) models have been proposed to predict missing links from known triples. Among those, several works have achieved more advanced results via exploiting the structure information on KGs with Graph Convolutional Networks (GCN). However, we observe that entity embeddings aggregated from neighbors in different directions are just simply averaged to complete single-tasks by existing GCN based models, ignoring the specific requirements of forward and backward sub-tasks. In this paper, we propose a Direction-sensitive Multi-task GCN (DsMtGCN) to make full use of the direction information, the multi-head self-attention is applied to specifically combine embeddings in different directions based on various entities and sub-tasks, the geometric constraints are imposed to adjust the distribution of embeddings, and the traditional binary cross-entropy loss is modified to reflect the triple uncertainty. Moreover, the competitive experiments results on several benchmark datasets verify the effectiveness of our model.
AIJun 13, 2023
Contextual Dictionary Lookup for Knowledge Graph CompletionJining Wang, Delai Qiu, YouMing Liu et al.
Knowledge graph completion (KGC) aims to solve the incompleteness of knowledge graphs (KGs) by predicting missing links from known triples, numbers of knowledge graph embedding (KGE) models have been proposed to perform KGC by learning embeddings. Nevertheless, most existing embedding models map each relation into a unique vector, overlooking the specific fine-grained semantics of them under different entities. Additionally, the few available fine-grained semantic models rely on clustering algorithms, resulting in limited performance and applicability due to the cumbersome two-stage training process. In this paper, we present a novel method utilizing contextual dictionary lookup, enabling conventional embedding models to learn fine-grained semantics of relations in an end-to-end manner. More specifically, we represent each relation using a dictionary that contains multiple latent semantics. The composition of a given entity and the dictionary's central semantics serves as the context for generating a lookup, thus determining the fine-grained semantics of the relation adaptively. The proposed loss function optimizes both the central and fine-grained semantics simultaneously to ensure their semantic consistency. Besides, we introduce two metrics to assess the validity and accuracy of the dictionary lookup operation. We extend several KGE models with the method, resulting in substantial performance improvements on widely-used benchmark datasets.
LGMay 4, 2021
WiFi Fingerprint Clustering for Urban Mobility AnalysisSumudu HasalaMarakkalage, Billy Pik Lik Lau, Yuren Zhou et al.
In this paper, we present an unsupervised learning approach to identify the user points of interest (POI) by exploiting WiFi measurements from smartphone application data. Due to the lack of GPS positioning accuracy in indoor, sheltered, and high rise building environments, we rely on widely available WiFi access points (AP) in contemporary urban areas to accurately identify POI and mobility patterns, by comparing the similarity in the WiFi measurements. We propose a system architecture to scan the surrounding WiFi AP, and perform unsupervised learning to demonstrate that it is possible to identify three major insights, namely the indoor POI within a building, neighbourhood activity, and micro-mobility of the users. Our results show that it is possible to identify the aforementioned insights, with the fusion of WiFi and GPS, which are not possible to identify by only using GPS.
CLMar 5, 2021
There Once Was a Really Bad Poet, It Was Automated but You Didn't Know ItJianyou Wang, Xiaoxuan Zhang, Yuren Zhou et al.
Limerick generation exemplifies some of the most difficult challenges faced in poetry generation, as the poems must tell a story in only five lines, with constraints on rhyme, stress, and meter. To address these challenges, we introduce LimGen, a novel and fully automated system for limerick generation that outperforms state-of-the-art neural network-based poetry models, as well as prior rule-based poetry models. LimGen consists of three important pieces: the Adaptive Multi-Templated Constraint algorithm that constrains our search to the space of realistic poems, the Multi-Templated Beam Search algorithm which searches efficiently through the space, and the probabilistic Storyline algorithm that provides coherent storylines related to a user-provided prompt word. The resulting limericks satisfy poetic constraints and have thematically coherent storylines, which are sometimes even funny (when we are lucky).
LGDec 22, 2020
Multiple-Perspective Clustering of Passive Wi-Fi Sensing Trajectory DataZann Koh, Yuren Zhou, Billy Pik Lik Lau et al.
Information about the spatiotemporal flow of humans within an urban context has a wide plethora of applications. Currently, although there are many different approaches to collect such data, there lacks a standardized framework to analyze it. The focus of this paper is on the analysis of the data collected through passive Wi-Fi sensing, as such passively collected data can have a wide coverage at low cost. We propose a systematic approach by using unsupervised machine learning methods, namely k-means clustering and hierarchical agglomerative clustering (HAC) to analyze data collected through such a passive Wi-Fi sniffing method. We examine three aspects of clustering of the data, namely by time, by person, and by location, and we present the results obtained by applying our proposed approach on a real-world dataset collected over five months.
MLMar 9, 2020
Stochastic Recursive Momentum for Policy Gradient MethodsHuizhuo Yuan, Xiangru Lian, Ji Liu et al.
In this paper, we propose a novel algorithm named STOchastic Recursive Momentum for Policy Gradient (STORM-PG), which operates a SARAH-type stochastic recursive variance-reduced policy gradient in an exponential moving average fashion. STORM-PG enjoys a provably sharp $O(1/ε^3)$ sample complexity bound for STORM-PG, matching the best-known convergence rate for policy gradient algorithm. In the mean time, STORM-PG avoids the alternations between large batches and small batches which persists in comparable variance-reduced policy gradient methods, allowing considerably simpler parameter tuning. Numerical experiments depicts the superiority of our algorithm over comparative policy gradient algorithms.
SIFeb 5, 2020
Understanding Crowd Behaviors in a Social Event by Passive WiFi Sensing and Data MiningYuren Zhou, Billy Pik Lik Lau, Zann Koh et al.
Understanding crowd behaviors in a large social event is crucial for event management. Passive WiFi sensing, by collecting WiFi probe requests sent from mobile devices, provides a better way to monitor crowds compared with people counters and cameras in terms of free interference, larger coverage, lower cost, and more information on people's movement. In existing studies, however, not enough attention has been paid to the thorough analysis and mining of collected data. Especially, the power of machine learning has not been fully exploited. In this paper, therefore, we propose a comprehensive data analysis framework to fully analyze the collected probe requests to extract three types of patterns related to crowd behaviors in a large social event, with the help of statistics, visualization, and unsupervised machine learning. First, trajectories of the mobile devices are extracted from probe requests and analyzed to reveal the spatial patterns of the crowds' movement. Hierarchical agglomerative clustering is adopted to find the interconnections between different locations. Next, k-means and k-shape clustering algorithms are applied to extract temporal visiting patterns of the crowds by days and locations, respectively. Finally, by combining with time, trajectories are transformed into spatiotemporal patterns, which reveal how trajectory duration changes over the length and how the overall trends of crowd movement change over time. The proposed data analysis framework is fully demonstrated using real-world data collected in a large social event. Results show that one can extract comprehensive patterns from data collected by a network of passive WiFi sensors.
SYAug 22, 2019
Benchmarking air-conditioning energy performance of residential rooms based on regression and clustering techniquesYuren Zhou, Clement Lork, Wen-Tai Li et al.
Air conditioning (AC) accounts for a critical portion of the global energy consumption. To improve its energy performance, it is important to fairly benchmark its energy performance and provide the evaluation feedback to users. However, this task has not been well tackled in the residential sector. In this paper, we propose a data-driven approach to fairly benchmark the AC energy performance of residential rooms. First, regression model is built for each benchmarked room so that its power consumption can be predicted given different weather conditions and AC settings. Then, all the rooms are clustered based on their areas and usual AC temperature set points. Lastly, within each cluster, rooms are benchmarked based on their predicted power consumption under uniform weather conditions and AC settings. A real-world case study was conducted with data collected from 44 residential rooms. Results show that the constructed regression models have an average prediction accuracy of 85.1% in cross-validation tests, and support vector regression with Gaussian kernel is the overall most suitable model structure for building the regression model. In the clustering step, 44 rooms are successfully clustered into seven clusters. By comparing the benchmarking scores generated by the proposed approach with two sets of scores computed from historical power consumption data, we demonstrate that the proposed approach is able to eliminate the influences of room areas, weather conditions, and AC settings on the benchmarking results. Therefore, the proposed benchmarking approach is valid and fair. As a by-product, the approach is also shown to be useful to investigate how room areas, weather conditions, and AC settings affect the AC power consumption of rooms in real life.
NEOct 26, 2018
A Theoretical Framework of Approximation Error Analysis of Evolutionary AlgorithmsJun He, Yu Chen, Yuren Zhou
In the empirical study of evolutionary algorithms, the solution quality is evaluated by either the fitness value or approximation error. The latter measures the fitness difference between an approximation solution and the optimal solution. Since the approximation error analysis is more convenient than the direct estimation of the fitness value, this paper focuses on approximation error analysis. However, it is straightforward to extend all related results from the approximation error to the fitness value. Although the evaluation of solution quality plays an essential role in practice, few rigorous analyses have been conducted on this topic. This paper aims at establishing a novel theoretical framework of approximation error analysis of evolutionary algorithms for discrete optimization. This framework is divided into two parts. The first part is about exact expressions of the approximation error. Two methods, Jordan form and Schur's triangularization, are presented to obtain an exact expression. The second part is about upper bounds on approximation error. Two methods, convergence rate and auxiliary matrix iteration, are proposed to estimate the upper bound. The applicability of this framework is demonstrated through several examples.
CYJan 20, 2017
Power-saving transportation mode identification for large-scale applicationsYuren Zhou, Jin Wang, Peng Shi et al.
Transportation mode detection with personal devices has been investigated for over ten years due to its importance in monitoring ones' activities, understanding human mobility, and assisting traffic management. However, two main limitations are still preventing it from large-scale deployments: high power consumption, and the lack of high-volume and diverse labeled data. In order to reduce power consumption, existing approaches are sampling using fewer sensors and with lower frequency, which however lead to a lower accuracy. A common way to obtain labeled data is recording the ground truth while collecting data, but such method cannot apply to large-scale deployment due to its inefficiency. To address these issues, we adopt a new low-frequency sampling manner with a hierarchical transportation mode identification algorithm and propose an offline data labeling approach with its manual and automatic implementations. Through a real-world large-scale experiment and comparison with related works, our sampling manner and algorithm are proved to consume much less energy while achieving a competitive accuracy around 85%. The new offline data labeling approach is also validated to be efficient and effective in providing ground truth for model training and testing.
NEFeb 12, 2015
Analysis of Solution Quality of a Multiobjective Optimization-based Evolutionary Algorithm for Knapsack ProblemJun He, Yong Wang, Yuren Zhou
Multi-objective optimisation is regarded as one of the most promising ways for dealing with constrained optimisation problems in evolutionary optimisation. This paper presents a theoretical investigation of a multi-objective optimisation evolutionary algorithm for solving the 0-1 knapsack problem. Two initialisation methods are considered in the algorithm: local search initialisation and greedy search initialisation. Then the solution quality of the algorithm is analysed in terms of the approximation ratio.
NESep 3, 2014
Performance Analysis on Evolutionary Algorithms for the Minimum Label Spanning Tree ProblemXinsheng Lai, Yuren Zhou, Jun He et al.
Some experimental investigations have shown that evolutionary algorithms (EAs) are efficient for the minimum label spanning tree (MLST) problem. However, we know little about that in theory. As one step towards this issue, we theoretically analyze the performances of the (1+1) EA, a simple version of EAs, and a multi-objective evolutionary algorithm called GSEMO on the MLST problem. We reveal that for the MLST$_{b}$ problem the (1+1) EA and GSEMO achieve a $\frac{b+1}{2}$-approximation ratio in expected polynomial times of $n$ the number of nodes and $k$ the number of labels. We also show that GSEMO achieves a $(2ln(n))$-approximation ratio for the MLST problem in expected polynomial time of $n$ and $k$. At the same time, we show that the (1+1) EA and GSEMO outperform local search algorithms on three instances of the MLST problem. We also construct an instance on which GSEMO outperforms the (1+1) EA.
NEApr 14, 2014
A Theoretical Assessment of Solution Quality in Evolutionary Algorithms for the Knapsack ProblemJun He, Boris Mitavskiy, Yuren Zhou
Evolutionary algorithms are well suited for solving the knapsack problem. Some empirical studies claim that evolutionary algorithms can produce good solutions to the 0-1 knapsack problem. Nonetheless, few rigorous investigations address the quality of solutions that evolutionary algorithms may produce for the knapsack problem. The current paper focuses on a theoretical investigation of three types of (N+1) evolutionary algorithms that exploit bitwise mutation, truncation selection, plus different repair methods for the 0-1 knapsack problem. It assesses the solution quality in terms of the approximation ratio. Our work indicates that the solution produced by pure strategy and mixed strategy evolutionary algorithms is arbitrarily bad. Nevertheless, the evolutionary algorithm using helper objectives may produce 1/2-approximation solutions to the 0-1 knapsack problem.