Peng Xie

SY
h-index11
24papers
183citations
Novelty49%
AI Score55

24 Papers

LGApr 6, 2022
Spatio-Temporal Dynamic Graph Relation Learning for Urban Metro Flow Prediction

Peng Xie, Minbo Ma, Tianrui Li et al.

Urban metro flow prediction is of great value for metro operation scheduling, passenger flow management and personal travel planning. However, it faces two main challenges. First, different metro stations, e.g. transfer stations and non-transfer stations, have unique traffic patterns. Second, it is challenging to model complex spatio-temporal dynamic relation of metro stations. To address these challenges, we develop a spatio-temporal dynamic graph relational learning model (STDGRL) to predict urban metro station flow. First, we propose a spatio-temporal node embedding representation module to capture the traffic patterns of different stations. Second, we employ a dynamic graph relationship learning module to learn dynamic spatial relationships between metro stations without a predefined graph adjacency matrix. Finally, we provide a transformer-based long-term relationship prediction module for long-term metro flow prediction. Extensive experiments are conducted based on metro data in Beijing, Shanghai, Chongqing and Hangzhou. Experimental results show the advantages of our method beyond 11 baselines for urban metro flow prediction.

SDOct 27, 2023Code
Developing a Multilingual Dataset and Evaluation Metrics for Code-Switching: A Focus on Hong Kong's Polylingual Dynamics

Peng Xie, Kani Chen

The existing audio datasets are predominantly tailored towards single languages, overlooking the complex linguistic behaviors of multilingual communities that engage in code-switching. This practice, where individuals frequently mix two or more languages in their daily interactions, is particularly prevalent in multilingual regions such as Hong Kong, China. To bridge this gap, we have developed a 34.8-hour dataset of Mixed Cantonese and English (MCE) audio using our Multi-Agent Data Generation Framework (MADGF). We fine-tuned the open-source multilingual Automatic Speech Recognition (ASR) model, Whisper, with the MCE dataset, leading to impressive zero-shot performance. The traditional metrics overlook important factors such as latency in real-world applications and code-switching scenarios. We have introduced a novel evaluation metric called Fidelity to the Original Audio, Accuracy, and Latency (FAL). This metric aims to overcome the limitations of traditional metrics used to assess ASR systems.

LGSep 8, 2024
UMOD: A Novel and Effective Urban Metro Origin-Destination Flow Prediction Method

Peng Xie, Minbo Ma, Bin Wang et al.

Accurate prediction of metro Origin-Destination (OD) flow is essential for the development of intelligent transportation systems and effective urban traffic management. Existing approaches typically either predict passenger outflow of departure stations or inflow of destination stations. However, we argue that travelers generally have clearly defined departure and arrival stations, making these OD pairs inherently interconnected. Consequently, considering OD pairs as a unified entity more accurately reflects actual metro travel patterns and allows for analyzing potential spatio-temporal correlations between different OD pairs. To address these challenges, we propose a novel and effective urban metro OD flow prediction method (UMOD), comprising three core modules: a data embedding module, a temporal relation module, and a spatial relation module. The data embedding module projects raw OD pair inputs into hidden space representations, which are subsequently processed by the temporal and spatial relation modules to capture both inter-pair and intra-pair spatio-temporal dependencies. Experimental results on two real-world urban metro OD flow datasets demonstrate that adopting the OD pairs perspective is critical for accurate metro OD flow prediction. Our method outperforms existing approaches, delivering superior predictive performance.

ARApr 28
How Can Reinforcement Learning Achieve Expert-level Placement?

Ruo-Tong Chen, Ke Xue, Chengrui Gao et al.

Chip placement is a critical step in physical design. While reinforcement learning (RL)-based methods have recently emerged, their training primarily focuses on wirelength optimization, and therefore often fail to achieve expert-quality layouts. We identify the reward design as the primary cause for the performance gap with experts, and instead of formalizing intricate processes, we circumvent this by directly learning from expert layouts to derive a reward model. Our approach starts from the final expert layouts to infer step-by-step expert trajectories. Using these trajectories as demonstrations or preferences, we train a model that captures the latent implicit rewards in expert results. Experiments show that our framework can efficiently learn from even a single design and generalize well to unseen cases.

SYApr 15
Data-Driven Reachability Analysis Using Matrix Perturbation Theory

Peng Xie, Abdulla Fawzy, Zhen Zhang et al.

We propose a matrix zonotope perturbation framework that leverages matrix perturbation theory to characterize how noise-induced distortions alter the dynamics within sets of models. The framework derives interpretable Cai-Zhang bounds for matrix zonotopes (MZs) and extends them to constrained matrix zonotopes (CMZs). Motivated by this analysis and the computational burden of CMZ-based reachable-set propagation, we introduce a coefficient-space approximation in which the constrained coefficient space of the CMZ is over-approximated by an unconstrained zonotope. Replacing CMZ-constrained-zonotope (CZ) products with unconstrained MZ-zonotope multiplication yields a simpler and more scalable reachable-set update. Experimental results demonstrate that the proposed method is substantially faster than the standard CMZ approach while producing reachable sets that are less conservative than those obtained with existing MZ-based methods, advancing practical, accurate, and real-time data-driven reachability analysis.

SYApr 15
Orthogonal Transformations for Efficient Data-Driven Reachability Analysis

Peng Xie, Amr Alanwar

Data-driven reachability analysis using matrix zonotopes faces a fundamental challenge: the number of generators in the reachable set grows exponentially during propagation, while current order reduction yields overly conservative approximations in data-driven settings. This paper introduces an orthogonal matrix-based framework that appropriately transfers the coordinate system before reducing the generators of the reachable set, dramatically reducing reachable set volumes. By exploiting the factorized structure of data-driven matrix zonotope generators, we develop several efficient algorithms to solve the problem. Numerical experiments demonstrate order-of-magnitude volume reductions compared to traditional methods, while maintaining comparable generator numbers. Our method provides a practical solution to improve precision in data-driven safety verification.

SYMar 12
Conformalized Data-Driven Reachability Analysis with PAC Guarantees

Yanliang Huang, Zhen Zhang, Peng Xie et al.

Data-driven reachability analysis computes over-approximations of reachable sets directly from noisy data. Existing deterministic methods require either known noise bounds or system-specific structural parameters such as Lipschitz constants. We propose Conformalized Data-Driven Reachability (CDDR), a framework that provides Probably Approximately Correct (PAC) coverage guarantees through the Learn Then Test (LTT) calibration procedure, requiring only that calibration trajectories be independently and identically distributed. CDDR is developed for three settings: linear time-invariant (LTI) systems with unknown process noise distributions, LTI systems with bounded measurement noise, and general nonlinear systems including non-Lipschitz dynamics. Experiments on a 5-dimensional LTI system under Gaussian and heavy-tailed Student-t noise and on a 2-dimensional non-Lipschitz system with fractional damping demonstrate that CDDR achieves valid coverage where deterministic methods do not provide formal guarantees. Under anisotropic noise, a normalized score function reduces the reachable set volume while preserving the PAC guarantee.

ROApr 5
Informed Hybrid Zonotope-based Motion Planning Algorithm

Peng Xie, Johannes Betz, Amr Alanwar

Optimal path planning in nonconvex free spaces poses substantial computational challenges. A common approach formulates such problems as mixed-integer linear programs (MILPs); however, solving general MILPs is computationally intractable and severely limits scalability. To address these limitations, we propose HZ-MP, an informed Hybrid Zonotope-based Motion Planner, which decomposes the obstacle-free space and performs low-dimensional face sampling guided by an ellipsotope heuristic, thereby concentrating exploration on promising transition regions. This structured exploration mitigates the excessive wasted sampling that degrades existing informed planners in narrow-passage or enclosed-goal scenarios. We prove that HZ-MP is probabilistically complete and asymptotically optimal, and demonstrate empirically that it converges to high-quality trajectories within a small number of iterations.

SYMar 31
Certified Set Convergence for Piecewise Affine Systems via Neural Lyapunov Functions

Yanliang Huang, Peng Xie, Zhen Zhang et al.

Safety-critical control of piecewise affine (PWA) systems under bounded additive disturbances requires guarantees not for individual states but for entire state sets simultaneously: a single control action must steer every state in the set toward a target, even as sets crossing mode boundaries split and evolve under distinct affine dynamics. Certifying such set convergence via neural Lyapunov functions couples the Lipschitz constants of the value function and the policy, yet certified bounds for expressive networks exceed true values by orders of magnitude, creating a certification barrier. We resolve this through a three-stage pipeline that decouples verification from the policy. A value function from Hamilton-Jacobi backward reachability, trained via reinforcement learning, is the Lyapunov candidate. A permutation-invariant Deep Sets controller, distilled via regret minimization, produces a common action. Verification propagates zonotopes through the value network, yielding verified Lyapunov upper bounds over entire sets without bounding the policy Lipschitz constant. On four benchmarks up to dimension six, including systems with per-mode operator norms exceeding unity, the framework certifies set convergence with positive margin on every system. A spectrally constrained local certificate completes the terminal guarantee, and the set-actor is the only tested method to achieve full strict set containment, at constant-time online cost.

SYMar 31
Data-Driven Reachability Analysis via Diffusion Models with PAC Guarantees

Yanliang Huang, Peng Xie, Wenyuan Wu et al.

We present a data-driven framework for reachability analysis of nonlinear dynamical systems that requires no explicit model. A denoising diffusion probabilistic model learns the time-evolving state distribution of a dynamical system from trajectory data alone. The predicted reachable set takes the form of a sublevel set of a nonconformity score derived from the reconstruction error, with the threshold calibrated via the Learn Then Test procedure so that the probability of excluding a reachable state is bounded with high probability. Experiments on three nonlinear systems, a forced Duffing oscillator, a planar quadrotor, and a high-dimensional reaction-diffusion system, confirm that the empirical miss rate remains below the Probably Approximately Correct (PAC) bound while scaling to state dimensions beyond the reach of classical grid-based and polynomial methods.

CVNov 24, 2024Code
Chain of Attack: On the Robustness of Vision-Language Models Against Transfer-Based Adversarial Attacks

Peng Xie, Yequan Bie, Jianda Mao et al.

Pre-trained vision-language models (VLMs) have showcased remarkable performance in image and natural language understanding, such as image captioning and response generation. As the practical applications of vision-language models become increasingly widespread, their potential safety and robustness issues raise concerns that adversaries may evade the system and cause these models to generate toxic content through malicious attacks. Therefore, evaluating the robustness of open-source VLMs against adversarial attacks has garnered growing attention, with transfer-based attacks as a representative black-box attacking strategy. However, most existing transfer-based attacks neglect the importance of the semantic correlations between vision and text modalities, leading to sub-optimal adversarial example generation and attack performance. To address this issue, we present Chain of Attack (CoA), which iteratively enhances the generation of adversarial examples based on the multi-modal semantic update using a series of intermediate attacking steps, achieving superior adversarial transferability and efficiency. A unified attack success rate computing method is further proposed for automatic evasion evaluation. Extensive experiments conducted under the most realistic and high-stakes scenario, demonstrate that our attacking strategy can effectively mislead models to generate targeted responses using only black-box attacks without any knowledge of the victim models. The comprehensive robustness evaluation in our paper provides insight into the vulnerabilities of VLMs and offers a reference for the safety considerations of future model developments.

ARApr 26
FlowPlace: Flow Matching for Chip Placement

Peng Xie, Ke Xue, Yunqi Shi et al.

Chip placement plays an important role in physical design. While generative models like diffusion models offer promising learning-based solutions, current methods have the following limitations: they use random synthetic data for pre-training, require long sampling times, and often result in overlaps due to their dependence on gradient-based solvers during the sampling process. To overcome these issues, we propose FlowPlace, which features mask-guided synthetic data generation, flow-based efficient training with flexible prior injection, and hard constraint sampling for overlap-free layouts. Experiments on OpenROAD and ICCAD 2015 benchmarks show FlowPlace achieves better PPA metrics, 10-50$\times$ faster sampling efficiency, and zero overlaps.

CLMay 30, 2025
SwitchLingua: The First Large-Scale Multilingual and Multi-Ethnic Code-Switching Dataset

Peng Xie, Xingyuan Liu, Tsz Wai Chan et al.

Code-switching (CS) is the alternating use of two or more languages within a conversation or utterance, often influenced by social context and speaker identity. This linguistic phenomenon poses challenges for Automatic Speech Recognition (ASR) systems, which are typically designed for a single language and struggle to handle multilingual inputs. The growing global demand for multilingual applications, including Code-Switching ASR (CSASR), Text-to-Speech (CSTTS), and Cross-Lingual Information Retrieval (CLIR), highlights the inadequacy of existing monolingual datasets. Although some code-switching datasets exist, most are limited to bilingual mixing within homogeneous ethnic groups, leaving a critical need for a large-scale, diverse benchmark akin to ImageNet in computer vision. To bridge this gap, we introduce \textbf{LinguaMaster}, a multi-agent collaboration framework specifically designed for efficient and scalable multilingual data synthesis. Leveraging this framework, we curate \textbf{SwitchLingua}, the first large-scale multilingual and multi-ethnic code-switching dataset, including: (1) 420K CS textual samples across 12 languages, and (2) over 80 hours of audio recordings from 174 speakers representing 18 countries/regions and 63 racial/ethnic backgrounds, based on the textual data. This dataset captures rich linguistic and cultural diversity, offering a foundational resource for advancing multilingual and multicultural research. Furthermore, to address the issue that existing ASR evaluation metrics lack sensitivity to code-switching scenarios, we propose the \textbf{Semantic-Aware Error Rate (SAER)}, a novel evaluation metric that incorporates semantic information, providing a more accurate and context-aware assessment of system performance.

SYApr 7
From Points to Sets: Set-Based Safety Verification in the Latent Space

Wenyuan Wu, Peng Xie, Zhen Zhang et al.

We extend latent representation methods for safety control design to set-valued states. Recent work has shown that barrier functions designed in a learned latent space can transfer safety guarantees back to the original system, but these methods evaluate certificates at single state points, ignoring state uncertainty. A fixed safety margin can partially address this but cannot adapt to the anisotropic and time-varying nature of the uncertainty gap across different safety constraints. We instead represent the system state as a zonotope, propagate it through the encoder to obtain a latent zonotope, and evaluate certificates over the worst case of the entire set. On a 16-dimensional quadrotor suspended-load gate passage task, set-valued evaluation achieves 5/5 collision-free passages, compared to 1/5 for point-based evaluation and 2/5 for a fixed-margin baseline. Set evaluation reports safety in 44.4% of per-head evaluations versus 48.5% for point-based, and this greater conservatism detects 4.1% blind spots where point evaluation falsely certifies safety, enabling earlier corrective control. The safety gap between point and set evaluation varies up to $12\times$ across certificate heads, explaining why no single fixed margin suffices and confirming the need for per-head, per-timestep adaptation, which set evaluation provides by construction.

SYApr 6
Bridging Data-Driven Reachability Analysis and Statistical Estimation via Constrained Matrix Convex Generators

Peng Xie, Zhen Zhang, Rolf Findeisen et al.

Data-driven reachability analysis enables safety verification when first-principles models are unavailable. This requires constructing sets of system models consistent with measured trajectories and noise assumptions. Existing approaches rely on zonotopic or box-based approximations, which do not fit the geometry of common noise distributions such as Gaussian disturbances and can lead to significant conservatism, especially in high-dimensional settings. This paper builds on ellipsotope-based representations to introduce mixed-norm uncertainty sets for data-driven reachability. The highest-density region defines the exact minimum-volume noise confidence set, while Constrained Convex Generators (CCG) and their matrix counterpart (CMCG) provide compatible geometric representations at the noise and parameter level. We show that the resulting CMCG coincides with the maximum-likelihood confidence ellipsoid for Gaussian disturbances, while remaining strictly tighter than constrained matrix zonotopes for mixed bounded-Gaussian noise. For non-convex noise distributions such as Gaussian mixtures, a minimum-volume enclosing ellipsoid provides a tractable convex surrogate. We further prove containment of the CMCG times CCG product and bound the conservatism of the Gaussian-Gaussian interaction. Numerical examples demonstrate substantially tighter reachable sets compared to box-based approximations of Gaussian disturbances. These results enable less conservative safety verification and improve the accuracy of uncertainty-aware control design.

SYApr 6
Data-Driven Reachability Analysis with Optimal Input Design

Peng Xie, Davide M. Raimondo, Rolf Findeisen et al.

This paper addresses the conservatism in data-driven reachability analysis for discrete-time linear systems subject to bounded process noise, where the system matrices are unknown and only input--state trajectory data are available. Building on the constrained matrix zonotope (CMZ) framework, two complementary strategies are proposed to reduce conservatism in reachable-set over-approximations. First, the standard Moore--Penrose pseudoinverse is replaced with a row-norm-minimizing right inverse computed via a second-order cone program (SOCP), which directly reduces the size of the resulting model set, yielding tighter generators and less conservative reachable sets. Second, an online A-optimal input design strategy is introduced to improve the informativeness of the collected data and the conditioning of the resulting model set, thereby reducing uncertainty. The proposed framework extends naturally to piecewise affine systems through mode-dependent data partitioning. Numerical results on a five-dimensional stable LTI system and a two-dimensional piecewise affine system demonstrate that combining designed inputs with the row-norm right inverse significantly reduces conservatism compared to a baseline using random inputs and the pseudoinverse, leading to tighter reachable sets for safety verification.

SYApr 2
Transformer-Accelerated Interpolated Data-Driven Reachability Analysis from Noisy Data

Zhen Zhang, Ahmad Hafez, Peng Xie et al.

Data-driven reachability analysis provides guaranteed outer approximations of reachable sets from input-state measurements, yet each propagation step requires a matrix-zonotope multiplication whose cost grows with the horizon length, limiting scalability. We observe that data-driven propagation is inherently step-size sensitive, in the sense that set-valued operators at different discretization resolutions yield non-equivalent reachable sets at the same physical time, a property absent in model-based propagation. Exploiting this multi-resolution structure, we propose Interpolated Reachability Analysis (IRA), which computes a sparse chain of coarse anchor sets sequentially and reconstructs fine-resolution intermediate sets in parallel across coarse intervals. We derive a fully data-driven coarse-noise over-approximation that removes the need for continuous-time system knowledge, prove deterministic outer-approximation guarantees for all interpolated sets, and establish conditional tightness relative to the fine-resolution chain. To replace the remaining matrix-zonotope multiplications in the fine phase, we further develop Transformer-Accelerated IRA (TA-IRA), where an encoder-decoder Transformer is calibrated via split conformal prediction to provide finite-sample pointwise and path-wise coverage certificates. Numerical experiments on a five-dimensional linear system confirm the theoretical guarantees and demonstrate significant computational savings.

SYApr 2
Transformer-Enhanced Data-Driven Output Reachability with Conformal Coverage Guarantees

Zhen Zhang, Peng Xie, Wenyuan Wu et al.

This paper considers output reachability analysis for linear time-invariant systems with unknown state-space matrices and unknown observation map, given only noisy input-output measurements. The Cayley--Hamilton theorem is applied to eliminate the latent state algebraically, producing an autoregressive input-output model whose parameter uncertainty is enclosed in a matrix zonotope. Set-valued propagation of this model yields output reachable sets with deterministic containment guarantees under a bounded aggregated residual assumption. The conservatism inherent in the lifted matrix-zonotope product is then mitigated by a decoder-only Transformer trained on labels obtained through directional contraction of the formal envelope via an exterior non-reachability certificate. Split conformal prediction restores distribution-free coverage at both per-step and trajectory levels without access to the true reachable-set hull. The framework is validated on a five-dimensional system with multiple unknown observation matrices.

ROMar 12
GNN-DIP: Neural Corridor Selection for Decomposition-Based Motion Planning

Peng Xie, Yanlinag Huang, Wenyuan Wu et al.

Motion planning through narrow passages remains a core challenge: sampling-based planners rarely place samples inside these narrow but critical regions, and even when samples land inside a passage, the straight-line connections between them run close to obstacle boundaries and are frequently rejected by collision checking. Decomposition-based planners resolve both issues by partitioning free space into convex cells -- every passage is captured exactly as a cell boundary, and any path within a cell is collision-free by construction. However, the number of candidate corridors through the cell graph grows combinatorially with environment complexity, creating a bottleneck in corridor selection. We present GNN-DIP, a framework that addresses this by integrating a Graph Neural Network (GNN) with a two-phase Decomposition-Informed Planner (DIP). The GNN predicts portal scores on the cell adjacency graph to bias corridor search toward near-optimal regions while preserving completeness. In 2D, Constrained Delaunay Triangulation (CDT) with the Funnel algorithm yields exact shortest paths within corridors; in 3D, Slab convex decomposition with portal-face sampling provides near-optimal path evaluation. Benchmarks on 2D narrow-passage scenarios, 3D bottleneck environments with up to 246 obstacles, and dynamic 2D settings show that GNN-DIP achieves 99--100% success rates with 2--280 times speedup over sampling-based baselines.

BMDec 28, 2023
AdaMR: Adaptable Molecular Representation for Unified Pre-training Strategy

Yan Ding, Hao Cheng, Ziliang Ye et al.

We propose Adjustable Molecular Representation (AdaMR), a new large-scale uniform pre-training strategy for small-molecule drugs, as a novel unified pre-training strategy. AdaMR utilizes a granularity-adjustable molecular encoding strategy, which is accomplished through a pre-training job termed molecular canonicalization, setting it apart from recent large-scale molecular models. This adaptability in granularity enriches the model's learning capability at multiple levels and improves its performance in multi-task scenarios. Specifically, the substructure-level molecular representation preserves information about specific atom groups or arrangements, influencing chemical properties and functionalities. This proves advantageous for tasks such as property prediction. Simultaneously, the atomic-level representation, combined with generative molecular canonicalization pre-training tasks, enhances validity, novelty, and uniqueness in generative tasks. All of these features work together to give AdaMR outstanding performance on a range of downstream tasks. We fine-tuned our proposed pre-trained model on six molecular property prediction tasks (MoleculeNet datasets) and two generative tasks (ZINC250K datasets), achieving state-of-the-art (SOTA) results on five out of eight tasks.

LGJan 22, 2022
HiSTGNN: Hierarchical Spatio-temporal Graph Neural Networks for Weather Forecasting

Minbo Ma, Peng Xie, Fei Teng et al.

Weather Forecasting is an attractive challengeable task due to its influence on human life and complexity in atmospheric motion. Supported by massive historical observed time series data, the task is suitable for data-driven approaches, especially deep neural networks. Recently, the Graph Neural Networks (GNNs) based methods have achieved excellent performance for spatio-temporal forecasting. However, the canonical GNNs-based methods only individually model the local graph of meteorological variables per station or the global graph of whole stations, lacking information interaction between meteorological variables in different stations. In this paper, we propose a novel Hierarchical Spatio-Temporal Graph Neural Network (HiSTGNN) to model cross-regional spatio-temporal correlations among meteorological variables in multiple stations. An adaptive graph learning layer and spatial graph convolution are employed to construct self-learning graph and study hidden dependency among nodes of variable-level and station-level graph. For capturing temporal pattern, the dilated inception as the backbone of gate temporal convolution is designed to model long and various meteorological trends. Moreover, a dynamic interaction learning is proposed to build bidirectional information passing in hierarchical graph. Experimental results on three real-world meteorological datasets demonstrate the superior performance of HiSTGNN beyond 7 baselines and it reduces the errors by 4.2% to 11.6% especially compared to state-of-the-art weather forecasting method.

LGApr 24, 2021
Exploring Multi-dimensional Data via Subset Embedding

Peng Xie, Wenyuan Tao, Jie Li et al.

Multi-dimensional data exploration is a classic research topic in visualization. Most existing approaches are designed for identifying record patterns in dimensional space or subspace. In this paper, we propose a visual analytics approach to exploring subset patterns. The core of the approach is a subset embedding network (SEN) that represents a group of subsets as uniformly-formatted embeddings. We implement the SEN as multiple subnets with separate loss functions. The design enables to handle arbitrary subsets and capture the similarity of subsets on single features, thus achieving accurate pattern exploration, which in most cases is searching for subsets having similar values on few features. Moreover, each subnet is a fully-connected neural network with one hidden layer. The simple structure brings high training efficiency. We integrate the SEN into a visualization system that achieves a 3-step workflow. Specifically, analysts (1) partition the given dataset into subsets, (2) select portions in a projected latent space created using the SEN, and (3) determine the existence of patterns within selected subsets. Generally, the system combines visualizations, interactions, automatic methods, and quantitative measures to balance the exploration flexibility and operation efficiency, and improve the interpretability and faithfulness of the identified patterns. Case studies and quantitative experiments on multiple open datasets demonstrate the general applicability and effectiveness of our approach.

CVMar 19, 2020
Quality Control of Neuron Reconstruction Based on Deep Learning

Donghuan Lu, Sujun Zhao, Peng Xie et al.

Neuron reconstruction is essential to generate exquisite neuron connectivity map for understanding brain function. Despite the significant amount of effect that has been made on automatic reconstruction methods, manual tracing by well-trained human annotators is still necessary. To ensure the quality of reconstructed neurons and provide guidance for annotators to improve their efficiency, we propose a deep learning based quality control method for neuron reconstruction in this paper. By formulating the quality control problem into a binary classification task regarding each single point, the proposed approach overcomes the technical difficulties resulting from the large image size and complex neuron morphology. Not only it provides the evaluation of reconstruction quality, but also can locate exactly where the wrong tracing begins. This work presents one of the first comprehensive studies for whole-brain scale quality control of neuron reconstructions. Experiments on five-fold cross validation with a large dataset demonstrate that the proposed approach can detect 74.7% errors with only 1.4% false alerts.

LGAug 26, 2019
Urban flows prediction from spatial-temporal data using machine learning: A survey

Peng Xie, Tianrui Li, Jia Liu et al.

Urban spatial-temporal flows prediction is of great importance to traffic management, land use, public safety, etc. Urban flows are affected by several complex and dynamic factors, such as patterns of human activities, weather, events and holidays. Datasets evaluated the flows come from various sources in different domains, e.g. mobile phone data, taxi trajectories data, metro/bus swiping data, bike-sharing data and so on. To summarize these methodologies of urban flows prediction, in this paper, we first introduce four main factors affecting urban flows. Second, in order to further analysis urban flows, a preparation process of multi-sources spatial-temporal data related with urban flows is partitioned into three groups. Third, we choose the spatial-temporal dynamic data as a case study for the urban flows prediction task. Fourth, we analyze and compare some well-known and state-of-the-art flows prediction methods in detail, classifying them into five categories: statistics-based, traditional machine learning-based, deep learning-based, reinforcement learning-based and transfer learning-based methods. Finally, we give open challenges of urban flows prediction and an outlook in the future of this field. This paper will facilitate researchers find suitable methods and open datasets for addressing urban spatial-temporal flows forecast problems.