SY | Nov 19, 2017
Stability Analysis of DC Microgrids with Constant Power Load under Distributed Control Method
Zhangjie Liu, Mei Su, Yao Sun et al.
DC microgrids are becoming popular as an effective means of integrating various renewable energy resources. Constant power loads (CPLs) may cause instability because of their negative impedance characteristic. This paper analyzes the stability of a DC microgrid in the presence of CPLs. Distributed generators (DGs) are controlled by a distributed controller that aims at current sharing and voltage recovery. For simplicity, a reduced-order model is derived by neglecting the transient dynamics of the DC/DC converters. The purpose of this paper is to analyze the stability conditions and to provide guidelines for designing the control parameters. The stability conditions are obtained using the inertia theorem. Moreover, the paper carries out a more detailed analysis based on existing theorems. Simulation results are provided to verify the effectiveness and validity of the proposed theorem.
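The negative impedance characteristic follows from a short small-signal argument. The derivation below assumes an ideal CPL that draws constant power $P > 0$ at terminal voltage $v$ and current $i$, linearized about an operating voltage $V_0$; these symbols are introduced here for illustration and are not taken from the paper.

```latex
\[
  P = v\,i \;\Rightarrow\; i(v) = \frac{P}{v},
  \qquad
  \left.\frac{\partial i}{\partial v}\right|_{v = V_0} = -\frac{P}{V_0^{2}}
  \;\Rightarrow\;
  r_{\mathrm{inc}} = \left(\left.\frac{\partial i}{\partial v}\right|_{v = V_0}\right)^{-1} = -\frac{V_0^{2}}{P} < 0 .
\]
```

A negative incremental resistance removes damping from the network, which is the usual intuition for why CPLs can destabilize a DC microgrid.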
NI | Oct 26, 2023
A Wireless AI-Generated Content (AIGC) Provisioning Framework Empowered by Semantic Communication
Runze Cheng, Yao Sun, Dusit Niyato et al.
With the significant advances in AI-generated content (AIGC) and the proliferation of mobile devices, providing high-quality AIGC services via wireless networks is becoming the future direction. However, the primary challenges of AIGC service provisioning in wireless networks lie in unstable channels, limited bandwidth resources, and unevenly distributed computational resources. To this end, this paper proposes a semantic communication (SemCom)-empowered AIGC (SemAIGC) generation and transmission framework, in which only the semantic information of the content, rather than all of its binary bits, is generated and transmitted using SemCom. Specifically, SemAIGC integrates diffusion models within the semantic encoder and decoder to build a workload-adjustable transceiver, thereby allowing computational resource utilization to be adjusted between the edge and local devices. In addition, a Resource-aware wOrklOad Trade-off (ROOT) scheme is devised to intelligently make workload adaptation decisions for the transceiver, thus efficiently generating, transmitting, and fine-tuning content according to dynamic wireless channel conditions and service requirements. Simulations verify the superiority of the proposed SemAIGC framework over conventional approaches in terms of latency and content quality.
SY | Jan 4, 2018
On existence and stability of equilibria of DC Microgrid with constant power loads
Zhangjie Liu, Mei Su, Yao Sun et al.
The problem of the existence and stability of equilibria of DC microgrids with constant power loads (CPLs) is addressed in this paper. CPLs often cause instability because of their negative impedance characteristics. Worse, owing to their nonlinearity, CPLs may leave the system with no equilibrium at all. The objectives of this paper are: A) to design a controller that overcomes CPL-induced instability; B) to obtain sufficient conditions that guarantee the existence of equilibria; and C) given that equilibria exist, to obtain sufficient conditions for local stability. For the first objective, a method based on virtual resistance and virtual inductance is proposed. For the second objective, the solvability of the quadratic equations is transformed into the existence of a fixed point of a mapping, and analytical sufficient conditions are then obtained from the Tarski fixed point theorem. Moreover, a numerical method is provided to reduce conservatism. For the third objective, a small-signal model is established to predict the qualitative behavior of the system around the equilibria, and the stability conditions are obtained using quadratic eigenvalue problem (QEP) theory. These explicit conditions are expressed as functions of the system parameters, which supports the design of reliable microgrids.
IT | Oct 4, 2022
Beam Management in Ultra-dense mmWave Network via Federated Reinforcement Learning: An Intelligent and Secure Approach
Qing Xue, Yi-Jing Liu, Yao Sun et al.
Deploying ultra-dense networks that operate in the millimeter wave (mmWave) band is a promising way to address the tremendous growth in mobile data traffic. However, one key challenge of the ultra-dense mmWave network (UDmmN) is beam management, owing to the high propagation delay, limited beam coverage, and the large numbers of beams and users. In this paper, a novel systematic beam control scheme is presented to tackle the beam management problem, which is difficult to solve because of its nonconvex objective function. We employ a double deep Q-network (DDQN) under a federated learning (FL) framework to address this optimization problem, thereby achieving adaptive and intelligent beam management in UDmmN. In the proposed FL-based beam management scheme (BMFL), aggregating model updates instead of raw data can theoretically protect user privacy while reducing handoff cost. Moreover, we propose adopting a data cleaning technique in the local model training for BMFL, with the aim of further strengthening the privacy protection of users while improving the learning convergence speed. Simulation results demonstrate the performance gain of the proposed scheme.
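For readers unfamiliar with DDQN, the sketch below shows the double-Q target that distinguishes it from a standard DQN: the online network selects the next action and the target network evaluates it. This is the generic textbook update, not the BMFL training loop; the function name `double_dqn_targets`, the list-based Q-value layout, and the discount factor are illustrative assumptions.

```python
def double_dqn_targets(rewards, next_q_online, next_q_target, dones, gamma=0.99):
    """Compute Double DQN regression targets for a batch of transitions.

    next_q_online[k] and next_q_target[k] hold the Q-values of every action in
    the next state of transition k, from the online and target networks.
    """
    targets = []
    for r, q_online, q_target, done in zip(rewards, next_q_online, next_q_target, dones):
        if done:
            # Terminal transition: no bootstrapped future value.
            targets.append(r)
            continue
        # The online network selects the greedy next action ...
        best_action = max(range(len(q_online)), key=q_online.__getitem__)
        # ... and the target network evaluates it, which curbs overestimation.
        targets.append(r + gamma * q_target[best_action])
    return targets
```

For example, `double_dqn_targets([1.0, 0.5], [[0.2, 0.9], [0.7, 0.1]], [[0.4, 0.3], [0.6, 0.8]], [False, True], gamma=0.9)` gives approximately `[1.27, 0.5]`; a vanilla DQN would instead bootstrap the first target from the target network's own maximum (0.4), giving 1.36.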
SY | Apr 26, 2017
Optimal Decentralized Economical-sharing Scheme in Islanded AC Microgrids with Cascaded Inverters
Lang Li, Huawen Ye, Yao Sun et al.
To address the economic dispatch problem without communication in islanded AC microgrids consisting of cascaded inverters, this paper proposes an optimal decentralized economical-sharing scheme. In the proposed scheme, an optimal current-sharing function is applied to generate the reference voltages, and the frequency is used to drive all distributed generators (DGs) in the microgrid to operate synchronously. When the microgrid is in steady state, the DGs share a single common frequency and share current according to the proposed scheme. Thus, the advantages of simplicity and decentralized operation are retained. Simulations and experiments on an AC microgrid model verify the effectiveness and performance of the proposed scheme.
CV | Sep 14, 2023
Towards Large-scale Building Attribute Mapping using Crowdsourced Images: Scene Text Recognition on Flickr and Problems to be Solved
Yao Sun, Anna Kruspe, Liqiu Meng et al.
Crowdsourced platforms provide huge amounts of street-view images that contain valuable building information. This work addresses the challenges of applying Scene Text Recognition (STR) to crowdsourced street-view images for building attribute mapping. We use Flickr images, particularly examining texts on building facades. A Berlin Flickr dataset is created, and pre-trained STR models are used for text detection and recognition. Manual checking of a subset of STR-recognized images demonstrates high accuracy. We examined the correlation between STR results and building functions, and analysed instances where texts were recognized on residential buildings but not on commercial ones. Further investigation revealed significant challenges associated with this task, including small text regions in street-view images, the absence of ground truth labels, and mismatches between buildings in Flickr images and building footprints in OpenStreetMap (OSM). To develop city-wide mapping beyond urban hotspot locations, we suggest distinguishing the scenarios where STR proves effective, while developing appropriate algorithms or bringing in additional data to handle the other cases. Furthermore, interdisciplinary collaboration should be undertaken to understand the motivation behind building photography and labeling. The STR-on-Flickr results are publicly available at https://github.com/ya0-sun/STR-Berlin.
IV | Nov 2, 2022
WiserVR: Semantic Communication Enabled Wireless Virtual Reality Delivery
Le Xia, Yao Sun, Chengsi Liang et al.
Virtual reality (VR) over wireless is expected to be one of the killer applications in next-generation communication networks. Nevertheless, the huge data volume, along with stringent requirements on latency and reliability under limited bandwidth resources, makes untethered wireless VR delivery increasingly challenging. These bottlenecks motivate this work to explore the potential of semantic communication, a new paradigm that promises to significantly ease the resource pressure, for efficient VR delivery. To this end, we propose a novel framework, namely WIreless SEmantic deliveRy for VR (WiserVR), for delivering consecutive 360° video frames to VR users. Specifically, multiple deep learning-based modules are devised for the transceiver in WiserVR to realize high-performance feature extraction and semantic recovery. Among them, we develop the concept of a semantic location graph and leverage a joint-semantic-channel-coding method with knowledge sharing, not only to substantially reduce communication latency but also to guarantee adequate transmission reliability and resilience under various channel states. Moreover, an implementation of WiserVR is presented, followed by initial simulations that evaluate its performance against benchmarks. Finally, we discuss several open issues and offer feasible solutions to unlock the full potential of WiserVR.
CV | Jun 9, 2023
DeepLCZChange: A Remote Sensing Deep Learning Model Architecture for Urban Climate Resilience
Wenlu Sun, Yao Sun, Chenying Liu et al.
Urban land use structures impact the local climate conditions of metropolitan areas. To shed light on the mechanism of local climate with respect to urban land use, we present a novel, data-driven deep learning architecture and pipeline, DeepLCZChange, that correlates airborne LiDAR data statistics with the Landsat 8 satellite's surface temperature product. A proof-of-concept numerical experiment uses corresponding remote sensing data for New York City to verify the cooling effect of urban forests.
IV | Dec 11, 2023
QuickQuakeBuildings: Post-earthquake SAR-Optical Dataset for Quick Damaged-building Detection
Yao Sun, Yi Wang, Michael Eineder
Quick and automated earthquake-damaged building detection from post-event satellite imagery is crucial, yet it is challenging due to the scarcity of training data required to develop robust algorithms. This letter presents the first dataset dedicated to detecting earthquake-damaged buildings from post-event very high resolution (VHR) Synthetic Aperture Radar (SAR) and optical imagery. Utilizing open satellite imagery and annotations acquired after the 2023 Turkey-Syria earthquakes, we deliver a dataset of coregistered building footprints and satellite image patches of both SAR and optical data, encompassing more than four thousand buildings. The task of damaged building detection is formulated as a binary image classification problem, which can also be treated as an anomaly detection problem due to extreme class imbalance. We provide baseline methods and results to serve as references for comparison. Researchers can utilize this dataset to expedite algorithm development, facilitating the rapid detection of damaged buildings in response to future events. The dataset and code, together with detailed explanations and visualizations, are publicly available at https://github.com/ya0-sun/PostEQ-SARopt-BuildingDamage.
IV | Jan 29
A Survey on Semantic Communication for Vision: Categories, Frameworks, Enabling Techniques, and Applications
Runze Cheng, Yao Sun, Ahmad Taha et al.
Semantic communication (SemCom) is emerging as a transformative paradigm for traffic-intensive visual data transmission, shifting the focus from raw data to meaningful content and relieving the increasing pressure on communication resources. However, achieving SemCom requires addressing challenges in accurate semantic quantization for visual data, robust semantic extraction and reconstruction under diverse tasks and goals, transceiver coordination with effective knowledge utilization, and adaptation to unpredictable wireless communication environments. In this paper, we present a systematic review of SemCom for visual data transmission (SemCom-Vision), wherein an interdisciplinary analysis integrating computer vision (CV) and communication engineering is conducted to provide comprehensive guidelines for the machine learning (ML)-empowered SemCom-Vision design. Specifically, this survey first elucidates the basics and key concepts of SemCom. Then, we introduce a novel classification perspective to categorize existing SemCom-Vision approaches as semantic preservation communication (SPC), semantic expansion communication (SEC), and semantic refinement communication (SRC) based on communication goals interpreted through semantic quantization schemes. Moreover, this survey articulates the ML-based encoder-decoder models and training algorithms for each SemCom-Vision category, followed by knowledge structure and utilization strategies. Finally, we discuss potential SemCom-Vision applications.
CV | May 23, 2025
Building Floor Number Estimation from Crowdsourced Street-Level Images: Munich Dataset and Baseline Method
Yao Sun, Sining Chen, Yifan Tian et al.
Accurate information on the number of building floors, or above-ground storeys, is essential for household estimation, utility provision, risk assessment, evacuation planning, and energy modeling. Yet large-scale floor-count data are rarely available in cadastral and 3D city databases. This study proposes an end-to-end deep learning framework that infers floor numbers directly from unrestricted, crowdsourced street-level imagery, avoiding hand-crafted features and generalizing across diverse facade styles. To enable benchmarking, we release the Munich Building Floor Dataset, a public set of over 6800 geo-tagged images collected from Mapillary and targeted field photography, each paired with a verified storey label. On this dataset, the proposed classification-regression network attains 81.2% exact accuracy and predicts 97.9% of buildings within +/-1 floor. The method and dataset together offer a scalable route to enrich 3D city models with vertical information and lay a foundation for future work in urban informatics, remote sensing, and geographic information science. Source code and data will be released under an open license at https://github.com/ya0-sun/Munich-SVI-Floor-Benchmark.
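The two reported figures can be read as follows: exact accuracy counts predictions equal to the verified storey label, and the +/-1 figure counts predictions that are off by at most one floor. A minimal sketch of both metrics, assuming integer floor labels; `floor_accuracy` is a hypothetical helper and not part of the released code.

```python
def floor_accuracy(predicted, actual, tolerance=1):
    """Return (exact accuracy, accuracy within +/- tolerance floors).

    Both arguments are equal-length sequences of integer floor counts.
    """
    if not actual or len(predicted) != len(actual):
        raise ValueError("predicted and actual must be non-empty and of equal length")
    n = len(actual)
    # Fraction of buildings whose predicted storey count matches exactly.
    exact = sum(p == a for p, a in zip(predicted, actual)) / n
    # Fraction of buildings predicted within the tolerance band.
    within = sum(abs(p - a) <= tolerance for p, a in zip(predicted, actual)) / n
    return exact, within
```

For instance, `floor_accuracy([3, 4, 2, 6], [3, 5, 2, 4])` returns `(0.5, 0.75)`: two predictions are exact, and one more is off by a single floor.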
CV | Apr 8, 2025
Falcon: Fractional Alternating Cut with Overcoming Minima in Unsupervised Segmentation
Xiao Zhang, Xiangyu Han, Xiwen Lai et al.
Today's unsupervised image segmentation algorithms often segment suboptimally. Modern graph-cut based approaches rely on high-dimensional attention maps from Transformer-based foundation models, typically employing a relaxed Normalized Cut solved recursively via the Fiedler vector (the eigenvector of the second smallest eigenvalue). Consequently, they still lag behind supervised methods in both mask generation speed and segmentation accuracy. We present a regularized fractional alternating cut (Falcon), an optimization-based K-way Normalized Cut that does not rely on recursive eigenvector computations, achieving substantially improved speed and accuracy. Falcon operates in two stages: (1) a fast K-way Normalized Cut solved by extending it into a fractional quadratic transformation, with an alternating iterative procedure and regularization to avoid local minima; and (2) refinement of the resulting masks using complementary low-level information, producing high-quality pixel-level segmentations. Experiments show that Falcon not only surpasses existing state-of-the-art methods by an average of 2.5% across six widely recognized benchmarks (reaching up to 4.3% improvement on Cityscapes), but also reduces runtime by around 30% compared to prior graph-based approaches. These findings demonstrate that the semantic information within foundation-model attention can be effectively harnessed by a highly parallelizable graph cut framework. Consequently, Falcon can narrow the gap between unsupervised and supervised segmentation, enhancing scalability in real-world applications and paving the way for dense prediction-based vision pre-training in various downstream tasks. The code is released at https://github.com/KordingLab/Falcon.
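As background for the baselines the abstract contrasts with Falcon, the sketch below performs one relaxed Normalized Cut bipartition: it finds the Fiedler vector of the generalized problem (D - W) y = λ D y and splits nodes by sign. It is a minimal pure-Python illustration that assumes a small dense affinity matrix and uses power iteration on the symmetric normalized Laplacian; it shows the recursive baseline step, not Falcon's K-way fractional formulation, and `fiedler_bipartition` is a hypothetical name.

```python
import math

def fiedler_bipartition(W, iters=500):
    """Split a graph in two with the relaxed Normalized Cut (Fiedler vector sign).

    W is a symmetric affinity matrix (list of lists of non-negative numbers);
    every node must have positive degree.
    """
    n = len(W)
    deg = [float(sum(row)) for row in W]
    inv_sqrt = [1.0 / math.sqrt(d) for d in deg]
    # Normalized affinity S = D^{-1/2} W D^{-1/2}; L_sym = I - S.
    S = [[inv_sqrt[i] * W[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]
    # Trivial eigenvector of L_sym (eigenvalue 0) is D^{1/2} 1, normalized.
    u = [math.sqrt(d) for d in deg]
    norm_u = math.sqrt(sum(x * x for x in u))
    u = [x / norm_u for x in u]

    def deflate(v):
        # Remove the component along the trivial eigenvector.
        dot = sum(a * b for a, b in zip(v, u))
        return [a - dot * b for a, b in zip(v, u)]

    v = deflate([float(i) for i in range(n)])
    for _ in range(iters):
        # Power iteration on I + S = 2I - L_sym: in the deflated subspace its
        # dominant eigenvector belongs to the second smallest eigenvalue of L_sym.
        w = [v[i] + sum(S[i][j] * v[j] for j in range(n)) for i in range(n)]
        w = deflate(w)
        norm_w = math.sqrt(sum(x * x for x in w))
        v = [x / norm_w for x in w]
    # Map back to the generalized eigenvector y = D^{-1/2} v and split by sign.
    y = [inv_sqrt[i] * v[i] for i in range(n)]
    return [yi >= 0 for yi in y]
```

Applying the split recursively to each side gives the recursive scheme the abstract describes; in practice a dense or sparse eigensolver would replace the power iteration, and Falcon is reported to avoid this repeated eigenvector computation altogether.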
LG | Apr 8
Time-Series Classification with Multivariate Statistical Dependence Features
Yao Sun, Bo Hu, Jose Principe
In this paper, we propose a novel framework for non-stationary time-series analysis that replaces conventional correlation-based statistics with direct estimation of statistical dependence in the normalized joint density of input and target signals, the cross density ratio (CDR). Unlike windowed correlation estimates, this measure is independent of sample order and robust to regime changes. The method builds on the functional maximal correlation algorithm (FMCA), which constructs a projection space by decomposing the eigenspectrum of the CDR. Multiscale features from this eigenspace are classified using a lightweight single-hidden-layer perceptron. On the TI-46 digit speech corpus, our approach outperforms hidden Markov models (HMMs) and state-of-the-art spiking neural networks, achieving higher accuracy with fewer than 10 layers and a storage footprint under 5 MB.
LG | Feb 7, 2024
Blockchain-enabled Clustered and Scalable Federated Learning (BCS-FL) Framework in UAV Networks
Sana Hafeez, Lina Mohjazi, Muhammad Ali Imran et al.
Privacy, scalability, and reliability are significant challenges in unmanned aerial vehicle (UAV) networks as distributed systems, especially when employing machine learning (ML) technologies with substantial data exchange. Recently, the application of federated learning (FL) to UAV networks has improved collaboration, privacy, resilience, and adaptability, making it a promising framework for UAV applications. However, implementing FL for UAV networks introduces drawbacks such as communication overhead, synchronization issues, scalability limitations, and resource constraints. To address these challenges, this paper presents the Blockchain-enabled Clustered and Scalable Federated Learning (BCS-FL) framework for UAV networks, which improves the decentralization, coordination, scalability, and efficiency of FL in large-scale UAV networks. The framework partitions UAV networks into separate clusters, coordinated by cluster head UAVs (CHs), to establish a connected graph. Clustering enables efficient coordination of updates to the ML model. Additionally, hybrid inter-cluster and intra-cluster model aggregation schemes generate the global model after each training round, improving collaboration and knowledge sharing among clusters. The numerical results demonstrate convergence while also highlighting the trade-off between training effectiveness and communication efficiency.
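The two aggregation levels can be pictured with a small sketch: each cluster head first averages its members' models (intra-cluster), and the cluster-head models are then combined into the global model (inter-cluster). This assumes FedAvg-style sample-weighted averaging at both levels and flat parameter vectors; the paper's hybrid schemes may weight or schedule updates differently, and the function names are hypothetical.

```python
def weighted_average(models, weights):
    """Element-wise weighted mean of equal-length parameter vectors."""
    total = sum(weights)
    dim = len(models[0])
    return [sum(w * m[k] for m, w in zip(models, weights)) / total for k in range(dim)]

def hierarchical_aggregate(clusters):
    """Two-level aggregation.

    clusters: list of clusters; each cluster is a list of (params, n_samples)
    pairs reported by the UAVs in that cluster.
    """
    cluster_models, cluster_sizes = [], []
    for members in clusters:
        params = [p for p, _ in members]
        sizes = [n for _, n in members]
        # Intra-cluster step: the cluster head averages its members' models.
        cluster_models.append(weighted_average(params, sizes))
        cluster_sizes.append(sum(sizes))
    # Inter-cluster step: cluster-head models are merged into the global model.
    return weighted_average(cluster_models, cluster_sizes)
```

With sample-count weights at both levels, the result equals flat FedAvg over all UAVs: `hierarchical_aggregate([[([1.0, 2.0], 1), ([3.0, 4.0], 1)], [([5.0, 6.0], 2)]])` returns `[3.5, 4.5]`. The benefit of the hierarchy is communication locality, since only cluster heads exchange models at the inter-cluster level.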
CV | May 12, 2025
TUM2TWIN: Introducing the Large-Scale Multimodal Urban Digital Twin Benchmark Dataset
Olaf Wysocki, Benedikt Schwab, Manoj Kumar Biswanath et al.
Urban Digital Twins (UDTs) have become essential for managing cities and integrating complex, heterogeneous data from diverse sources. Creating UDTs involves challenges at multiple process stages, including acquiring accurate 3D source data, reconstructing high-fidelity 3D models, keeping models up to date, and ensuring seamless interoperability with downstream tasks. Current datasets are usually limited to one part of the processing chain, hampering comprehensive UDT validation. To address these challenges, we introduce the first comprehensive multimodal Urban Digital Twin benchmark dataset: TUM2TWIN. This dataset includes georeferenced, semantically aligned 3D models and networks along with various terrestrial, mobile, aerial, and satellite observations, comprising 32 data subsets over roughly 100,000 m² and currently 767 GB of data. By ensuring georeferenced indoor-outdoor acquisition, high accuracy, and multimodal data integration, the benchmark supports robust analysis of sensors and the development of advanced reconstruction methods. Additionally, we explore downstream tasks demonstrating the potential of TUM2TWIN, including novel view synthesis with NeRF and Gaussian Splatting, solar potential analysis, point cloud semantic segmentation, and LoD3 building reconstruction. We are convinced this contribution lays a foundation for overcoming current limitations in UDT creation, fostering new research directions and practical solutions for smarter, data-driven urban environments. The project is available at https://tum2t.win
LG | May 13, 2024
Fighter flight trajectory prediction based on spatio-temporal graphical attention network
Yao Sun, Tengyu Jing, Jiapeng Wang et al.
Quickly and accurately predicting the flight trajectory of a blue army fighter in close-range air combat helps a red army fighter gain a dominant position, which is a decisive factor in subsequent air combat. However, due to the high speed and even hypersonic capabilities of advanced fighters, the diversity of tactical maneuvers, and the instantaneous nature of situational transitions, it is difficult to meet the prediction accuracy required by practical combat applications. To improve prediction accuracy, this paper proposes a spatio-temporal graph attention network (ST-GAT) with an encoder-decoder structure to predict the flight trajectory. The encoder adopts a parallel structure of Transformer and GAT branches, each embedded with a multi-head self-attention mechanism at its front end. The Transformer branch extracts the temporal characteristics of historical trajectories and captures the impact of the fighter's historical state on its future trajectory, while the GAT branch extracts spatial features from historical trajectories and captures potential spatial correlations between fighters. The outputs of the two branches are then concatenated into a new feature vector and fed into a decoder composed of a fully connected network to predict the future position coordinates of the blue army fighter. Computer simulation results show that the proposed network significantly improves the prediction accuracy of flight trajectories compared with the enhanced CNN-LSTM network (ECNN-LSTM), with improvements of 47% and 34% in the ADE and FDE metrics, respectively, providing strong support for subsequent autonomous combat missions.
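ADE (average displacement error) and FDE (final displacement error) are the standard trajectory-prediction metrics: the mean Euclidean error over all predicted time steps, and the error at the last step alone. The sketch below computes both for one trajectory, assuming 3D position tuples; batching and any normalization used in the paper are not specified in the abstract, and `ade_fde` is a hypothetical helper.

```python
import math

def ade_fde(predicted, ground_truth):
    """Average and final displacement error for one predicted trajectory.

    Both arguments are equal-length sequences of (x, y, z) positions, one per
    future time step.
    """
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    # Euclidean error at every predicted time step.
    errors = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    ade = sum(errors) / len(errors)  # mean error over the whole horizon
    fde = errors[-1]                 # error at the final time step only
    return ade, fde
```

For example, `ade_fde([(0, 0, 0), (1, 0, 0), (2, 0, 0)], [(0, 0, 0), (1, 1, 0), (2, 0, 2)])` returns `(1.0, 2.0)`, since the per-step errors are 0, 1, and 2.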
LG | Mar 24, 2025
A semantic communication-based workload-adjustable transceiver for wireless AI-generated content (AIGC) delivery
Runze Cheng, Yao Sun, Lan Zhang et al.
With the significant advances in generative AI (GAI) and the proliferation of mobile devices, providing high-quality AI-generated content (AIGC) services via wireless networks is becoming the future direction. However, the primary challenges of AIGC service delivery in wireless networks lie in unstable channels, limited bandwidth resources, and unevenly distributed computational resources. In this paper, we employ semantic communication (SemCom) in diffusion-based GAI models to propose a Resource-aware wOrkload-adjUstable TransceivEr (ROUTE) for AIGC delivery in dynamic wireless networks. Specifically, to relieve the communication resource bottleneck, SemCom is utilized to prioritize the semantic information of the generated content. Then, to improve computational resource utilization at both the edge and local devices and to reduce AIGC semantic distortion during transmission, modified diffusion-based models are applied to adjust the computing workload and semantic density in cooperative content generation. Simulations verify the superiority of the proposed ROUTE over conventional AIGC approaches in terms of latency and content quality.
NI | Feb 15, 2022
Wireless Resource Management in Intelligent Semantic Communication Networks
Le Xia, Yao Sun, Xiaoqian Li et al.
The prosperity of artificial intelligence (AI) has given rise to a promising communication paradigm, intelligent semantic communication (ISC), in which semantic contents, instead of traditional bit sequences, are coded by AI models for efficient communication. Because semantic recovery uniquely requires background knowledge, wireless resource management faces new challenges in ISC. In this paper, we address the user association (UA) and bandwidth allocation (BA) problems in an ISC-enabled heterogeneous network (ISC-HetNet). We first introduce the auxiliary knowledge base (KB) into the system model and develop a new performance metric for the ISC-HetNet, named system throughput in message (STM). Joint optimization of UA and BA is then formulated with the aim of maximizing STM subject to KB matching and wireless bandwidth constraints. To this end, we propose a two-stage solution, comprising a stochastic programming method in the first stage to obtain a deterministic objective with semantic confidence, and a heuristic algorithm in the second stage to optimize UA and BA. Numerical results show the clear superiority and reliability of the proposed solution in STM performance compared with two baseline algorithms.
IV | Nov 18, 2021
Large-scale Building Height Retrieval from Single SAR Imagery based on Bounding Box Regression Networks
Yao Sun, Lichao Mou, Yuanyuan Wang et al.
Building height retrieval from synthetic aperture radar (SAR) imagery is of great importance for urban applications, yet highly challenging owing to the complexity of SAR data. This paper addresses the issue of building height retrieval in large-scale urban areas from a single TerraSAR-X spotlight or stripmap image. Based on the radar viewing geometry, we propose that this problem can be formulated as a bounding box regression problem, which allows height data from multiple sources to be integrated when generating ground truth at a larger scale. We introduce building footprints from geographic information system (GIS) data as complementary information and propose a bounding box regression network that exploits the location relationship between a building's footprint and its bounding box, allowing fast computation, which is important for large-scale applications. The method is validated on four urban data sets using TerraSAR-X images in both high-resolution spotlight and stripmap modes. Experimental results show that, compared with a Faster R-CNN based method, the proposed network significantly reduces the computation cost while maintaining the height accuracy of individual buildings. Moreover, we investigate the impact of inaccurate GIS data on the proposed network and show that the bounding box regression network is robust against positioning errors in GIS data. The proposed method has great potential to be applied at regional or even global scales.
CR | Apr 27, 2021
Secure and Efficient Federated Learning Through Layering and Sharding Blockchain
Shuo Yuan, Bin Cao, Yao Sun et al.
Introducing blockchain into Federated Learning (FL) to build a trusted edge computing environment for transmission and learning has attracted widespread attention as a new decentralized learning pattern. However, traditional consensus mechanisms and architectures of blockchain systems face significant challenges in handling large-scale FL tasks, especially on Internet of Things (IoT) devices, due to their substantial resource consumption, limited transaction throughput, and complex communication requirements. To address these challenges, this paper proposes ChainFL, a novel two-layer blockchain-driven FL system. It splits the IoT network into multiple shards within the subchain layer, effectively reducing the scale of information exchange, and employs a Directed Acyclic Graph (DAG)-based mainchain as the mainchain layer, enabling parallel and asynchronous cross-shard validation. Furthermore, the FL procedure is customized to integrate deeply with blockchain technology, and a modified DAG consensus mechanism is designed to mitigate distortion caused by abnormal models. To provide a proof-of-concept implementation and evaluation, multiple subchains based on Hyperledger Fabric and a self-developed DAG-based mainchain are deployed. Extensive experiments demonstrate that ChainFL significantly surpasses conventional FL systems, showing up to a 14% improvement in training efficiency and a threefold increase in robustness.
NI | Apr 9, 2021
Smart and Secure CAV Networks Empowered by AI-Enabled Blockchain: The Next Frontier for Intelligent Safe Driving Assessment
Le Xia, Yao Sun, Rafiq Swash et al.
Securing safe driving for connected and autonomous vehicles (CAVs) continues to be a widespread concern, despite various sophisticated functions delivered by artificial intelligence for in-vehicle devices. With the worldwide deployment of the Internet of Vehicles, diverse malicious network attacks have become ubiquitous, exposing a range of reliability and privacy threats to data management in CAV networks. Combined with the limited capability of existing CAVs to handle computation-intensive tasks, this calls for an efficient assessment system that guarantees autonomous driving safety without compromising data security. In this article, we propose a novel framework, namely Blockchain-enabled intElligent Safe-driving assessmenT (BEST), which offers a smart and reliable approach for conducting safe driving supervision while protecting vehicular information. Specifically, a promising solution that exploits a long short-term memory model is introduced to assess the safety level of moving CAVs. We then investigate how a distributed blockchain obtains adequate trustworthiness and robustness for CAV data by adopting a Byzantine fault tolerance-based delegated proof-of-stake consensus mechanism. Simulation results demonstrate that the proposed BEST achieves better data credibility and higher prediction accuracy for vehicular safety assessment than existing schemes. Finally, we discuss several open challenges that need to be addressed in future CAV networks.
IV | Nov 17, 2020
CG-Net: Conditional GIS-aware Network for Individual Building Segmentation in VHR SAR Images
Yao Sun, Yuansheng Hua, Lichao Mou et al.
Object retrieval and reconstruction from very high resolution (VHR) synthetic aperture radar (SAR) images are of great importance for urban SAR applications, yet highly challenging owing to the complexity of SAR data. This paper addresses the issue of individual building segmentation from a single VHR SAR image in large-scale urban areas. To achieve this, we introduce building footprints from GIS data as complementary information and propose a novel conditional GIS-aware network (CG-Net). The proposed model learns multi-level visual features and employs building footprints to normalize the features for predicting building masks in the SAR image. We validate our method using a high resolution spotlight TerraSAR-X image collected over Berlin. Experimental results show that the proposed CG-Net brings improvements across various backbones. We further compare two representations of building footprints for our task, namely complete building footprints and sensor-visible footprint segments, and conclude that the former leads to better segmentation results. Moreover, we investigate the impact of inaccurate GIS data on CG-Net and show that it is robust against positioning errors in GIS data. In addition, we propose an approach for generating building ground truth from an accurate digital elevation model (DEM), which can be used to generate large-scale SAR image datasets. The segmentation results can be applied to reconstruct 3D building models at level-of-detail (LoD) 1, as demonstrated in our experiments.
CV | Jun 6, 2020
Instance segmentation of buildings using keypoints
Qingyu Li, Lichao Mou, Yuansheng Hua et al.
Building segmentation is of great importance in the task of remote sensing imagery interpretation. However, existing semantic segmentation and instance segmentation methods often produce masks with blurred boundaries. In this paper, we propose a novel instance segmentation network for building segmentation in high-resolution remote sensing images. More specifically, we treat segmenting an individual building as detecting several keypoints. The detected keypoints are then assembled into a closed polygon, which forms the semantic boundary of the building, so that the sharp boundary of the building is preserved. Experiments are conducted on selected images from the Aerial Imagery for Roof Segmentation (AIRS) dataset, and our method achieves better quantitative and qualitative results than state-of-the-art methods. Our network is a bottom-up instance segmentation method that preserves geometric details well.
CV · Dec 19, 2019
So2Sat LCZ42: A Benchmark Dataset for Global Local Climate Zones Classification
Xiao Xiang Zhu, Jingliang Hu, Chunping Qiu et al.
Access to labeled reference data is one of the grand challenges in supervised machine learning. This is especially true for the automated analysis of remote sensing images on a global scale, which enables us to address global challenges such as urbanization and climate change using state-of-the-art machine learning techniques. To meet these pressing needs, especially in urban research, we provide open access to a valuable benchmark dataset named "So2Sat LCZ42," which consists of local climate zone (LCZ) labels for about half a million Sentinel-1 and Sentinel-2 image patches in 42 urban agglomerations (plus 10 additional smaller areas) across the globe. The dataset was labeled by 15 domain experts over a period of six months, following a carefully designed labeling workflow and evaluation process. Unlike most other labeled remote sensing datasets, it underwent a rigorous quality assessment by domain experts, and it achieved an overall confidence of 85%. We believe this LCZ dataset is a first step towards an unbiased, globally distributed dataset for urban growth monitoring using machine learning methods, because LCZs provide a more objective measure than many other semantic land use and land cover classifications. LCZs describe the morphology, compactness, and height of urban areas, which are less dependent on human and cultural factors. This dataset can be accessed from http://doi.org/10.14459/2018mp1483140.
CR · Nov 21, 2019
An Interleaving Hybrid Consensus Protocol
Yao Sun, Aayush Rajasekaran
We introduce Unity Interleave, a new consensus algorithm for public blockchain settings. It is an eventual consistency protocol that merges Proof-of-Work (PoW) and Proof-of-Stake (PoS) into a coherent stochastic process. It builds on previous research on the Unity protocol, improving security while maintaining fairness and scalability.
CR · Jun 7, 2019
A Unifying Hybrid Consensus Protocol
Yulong Wu, Yunfei Zha, Yao Sun
We introduce Unity, a new consensus algorithm for public blockchain settings. Unity is an eventual consistency protocol that merges Proof-of-Work (PoW) and Proof-of-Stake (PoS) into a coherent stochastic process. It combines hardware-based and economic security without sacrificing availability, unpredictability, or decentralization. Empirical results indicate that the proposed protocol is fair and scales to an arbitrary number of miners and stakers.
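The abstracts do not specify how Unity combines PoW and PoS. As a purely hypothetical toy, one way to merge them into a single stochastic process works in two steps. Each round, a biased coin (weight `alpha`) picks the PoW or PoS branch. A leader is then drawn in proportion to hash power or stake within that branch. Under this rule, a participant's expected share of blocks is `alpha * h_i / H + (1 - alpha) * s_i / S`. A simulation can check that empirical shares match this value, which gives one simple notion of fairness. This is not the Unity rule.

```python
import random

def select_leader(hash_power, stake, alpha, rng):
    """Toy hybrid rule (hypothetical): PoW branch with probability alpha, else PoS branch."""
    pool = hash_power if rng.random() < alpha else stake
    names = list(pool)
    # Within the chosen branch, pick a winner in proportion to its resource.
    return rng.choices(names, weights=[pool[n] for n in names], k=1)[0]

def expected_share(hash_power, stake, alpha):
    """Closed-form probability that each participant wins a given round."""
    total_h = sum(hash_power.values())
    total_s = sum(stake.values())
    names = set(hash_power) | set(stake)
    return {
        n: alpha * hash_power.get(n, 0.0) / total_h + (1 - alpha) * stake.get(n, 0.0) / total_s
        for n in names
    }

def simulate(hash_power, stake, alpha, rounds, seed=0):
    """Empirical share of rounds won by each participant."""
    rng = random.Random(seed)
    wins = {n: 0 for n in set(hash_power) | set(stake)}
    for _ in range(rounds):
        wins[select_leader(hash_power, stake, alpha, rng)] += 1
    return {n: w / rounds for n, w in wins.items()}

# Usage example: "m1" both mines and stakes.
hash_power = {"m1": 3.0, "m2": 1.0}
stake = {"s1": 2.0, "m1": 2.0}
expected = expected_share(hash_power, stake, alpha=0.5)
empirical = simulate(hash_power, stake, alpha=0.5, rounds=100_000)
```
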
LG · Jan 6, 2019
Efforts estimation of doctors annotating medical image
Yang Deng, Yao Sun, Yongpei Zhu et al.
Accurate annotation of medical images is a crucial step for clinical applications of image-based AI. However, annotating medical images requires a great deal of effort and expense because of their high complexity and the need for experienced doctors. Several active learning methods have been proposed to reduce annotation cost. However, such methods only reduce the number of annotation candidates and do not study how much effort the doctor actually spends. This is insufficient, because even annotating a small amount of medical data takes a doctor a lot of time. In this paper, we propose a new criterion to evaluate the effort of doctors annotating medical images. First, we combine active learning with a U-shaped network and employ a suggestive annotation strategy to choose the most effective annotation candidates. Then we use a fine annotation platform to reduce the annotation effort on each candidate, and we apply the new criterion to quantitatively calculate the effort taken by doctors. We take MR brain tissue segmentation as an example to evaluate the proposed method. Extensive experiments on the well-known IBSR18 dataset and the MRBrainS18 Challenge dataset show that, with the proposed strategy, state-of-the-art segmentation performance can be achieved using only 60% of the annotation candidates. Annotation effort is also reduced by at least 44%, 44%, and 47% for CSF, GM, and WM, respectively.
CV · Aug 1, 2018
A Multi-channel Network with Image Retrieval for Accurate Brain Tissue Segmentation
Yao Sun, Yang Deng, Yue Xu et al.
Magnetic Resonance Imaging (MRI) is widely used in pathological and functional studies of the brain, such as epilepsy and tumor diagnosis. Automated, accurate segmentation of brain tissue into cerebrospinal fluid (CSF), gray matter (GM), and white matter (WM) is the basis of these studies, and many researchers are working to improve it. A multi-channel segmentation network supplied with its own ground truth achieves an average Dice ratio of up to 0.98. Building on this observation, we propose a novel method that adds a fourth channel containing the ground truth of the most similar image, which is retrieved from a database by content-based image retrieval (CBIR). The results show that the method improves segmentation performance, measured by average Dice ratio, by approximately 0.01 on the MRBrainS18 database. In addition, our method is concise and robust, and it can be applied to any network architecture without substantial modification.
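The Dice ratio used in these abstracts is the standard overlap score D = 2|A ∩ B| / (|A| + |B|), where A is a predicted region and B is the ground-truth region. The minimal sketch below computes per-class scores and their average on flattened label maps. It assumes a hypothetical label encoding: 0 = background, 1 = CSF, 2 = GM, 3 = WM.

```python
def dice(pred, truth, label):
    """Dice ratio 2|A ∩ B| / (|A| + |B|) for one tissue label."""
    in_pred = [v == label for v in pred]
    in_truth = [v == label for v in truth]
    overlap = sum(a and b for a, b in zip(in_pred, in_truth))
    denom = sum(in_pred) + sum(in_truth)
    # Convention assumed here: a label absent from both maps counts as perfect agreement.
    return 1.0 if denom == 0 else 2.0 * overlap / denom

def average_dice(pred, truth, labels=(1, 2, 3)):
    """Per-class Dice and its mean; labels 1, 2, 3 stand for CSF, GM, WM (assumed encoding)."""
    scores = {label: dice(pred, truth, label) for label in labels}
    return scores, sum(scores.values()) / len(scores)

# Usage example on two tiny flattened label maps.
pred = [0, 1, 1, 2, 2, 3, 3, 3]
truth = [0, 1, 2, 2, 2, 3, 3, 0]
scores, mean_dice = average_dice(pred, truth)
```

Averaging the per-class scores, rather than pooling all voxels, keeps a small structure such as CSF from being swamped by the larger GM and WM regions.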
CV · Jul 23, 2018
DASN: Data-Aware Skilled Network for Accurate MR Brain Tissue Segmentation
Yang Deng, Yao Sun, Yongpei Zhu et al.
Accurate segmentation of MR brain tissue is a crucial step for the diagnosis, surgical planning, and treatment of brain abnormalities. Automatic and reliable segmentation methods are required to assist doctors. Over the last few years, deep learning, especially deep convolutional neural networks (CNNs), has emerged as one of the most prominent approaches to image recognition problems in various domains. However, improving deep networks usually requires new insights, which are hard to come by. Several reasonable MR brain tissue segmentation methods already exist, and all of them achieve promising performance. Each of these methods has its own characteristics and behaves differently across data sets. In other words, the performance of different models varies widely on the same data sets, and each model has its own strengths. On this basis, we propose a judgement that determines which data sets each model is good at. With our method, segmentation accuracy can be improved easily on top of existing models, without increasing the training data or modifying the network. We validate our method on the widely used IBSR18 dataset and obtain an average Dice ratio of 88.06%, compared with 85.82% and 86.92% when each of the two models is used alone.
CV · Jul 19, 2018
A Strategy of MR Brain Tissue Images' Suggestive Annotation Based on Modified U-Net
Yang Deng, Yao Sun, Yongpei Zhu et al.
Accurate segmentation of MR brain tissue is a crucial step for the diagnosis, surgical planning, and treatment of brain abnormalities. However, it is a time-consuming task for medical experts to perform, so automatic and reliable segmentation methods are required. Choosing an appropriate training set from the limited labeled data, rather than using all of it, is also important for saving training time. In addition, labeled medical data are too rare and expensive to obtain extensively. Selecting an appropriate subset of unlabeled data to annotate, instead of annotating everything, is therefore also very meaningful, provided it attains at least the same performance. To solve these problems, we design an automatic segmentation method based on a U-shaped deep convolutional network. It achieves average DSC values of 0.8610, 0.9131, and 0.9003 for cerebrospinal fluid (CSF), gray matter (GM), and white matter (WM), respectively, on the well-known IBSR18 dataset. We use a bootstrapping algorithm to select the most effective training data and achieve state-of-the-art segmentation performance using only 50% of the training data. Moreover, we propose a strategy for the suggestive annotation of unlabeled MR brain tissue images based on the modified U-Net. The proposed method is fast and can be used in clinical practice.
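The abstract does not detail its bootstrapping procedure. The sketch below follows one common reading of bootstrap-based suggestive annotation, which is an assumption here. Several models are trained on bootstrap resamples of the labeled set. Each unlabeled candidate is scored by how much the models disagree, measured as the variance of their predictions. The most uncertain candidates are then sent for annotation. Model training is omitted, and `predictions[m][i]` is a hypothetical score from model `m` on candidate `i`.

```python
import random
import statistics

def bootstrap_indices(n, rng):
    """One bootstrap resample: n indices drawn with replacement from range(n)."""
    return [rng.randrange(n) for _ in range(n)]

def uncertainty_scores(predictions):
    """Variance across ensemble members of each candidate's predicted score."""
    n_candidates = len(predictions[0])
    return [statistics.pvariance([model[i] for model in predictions]) for i in range(n_candidates)]

def suggest_candidates(predictions, k):
    """Indices of the k candidates the ensemble disagrees on most."""
    scores = uncertainty_scores(predictions)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]

# Usage example: each ensemble member would be trained on
# [train_set[j] for j in bootstrap_indices(len(train_set), rng)] (training not shown).
rng = random.Random(0)
resample = bootstrap_indices(10, rng)
predictions = [
    [0.9, 0.5, 0.1, 0.7],  # model trained on resample 1
    [0.9, 0.1, 0.1, 0.3],  # model trained on resample 2
    [0.9, 0.9, 0.1, 0.5],  # model trained on resample 3
]
chosen = suggest_candidates(predictions, k=2)
```

Candidates on which every model agrees (indices 0 and 2 here) add little new information, so the annotation budget goes to the ones where the resampled models diverge.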
CV · Feb 1, 2018
Face Aging with Contextual Generative Adversarial Nets
Si Liu, Yao Sun, Defa Zhu et al.
Face aging, which renders aged versions of an input face, has attracted extensive attention in multimedia research. Recently, several methods based on conditional Generative Adversarial Nets (GANs) have achieved great success. They can generate images that fit the real face distribution conditioned on each individual age group. However, these methods fail to capture the transition patterns, e.g., the gradual shape and texture changes between adjacent age groups. In this paper, we propose novel Contextual Generative Adversarial Nets (C-GANs) that explicitly take these patterns into consideration. The C-GANs consist of a conditional transformation network and two discriminative networks. The conditional transformation network imitates the aging procedure with several specially designed residual blocks. The age discriminative network guides the synthesized face to fit the real conditional distribution. The transition pattern discriminative network is novel and aims to distinguish real transition patterns from fake ones. It serves as an extra regularization term for the conditional transformation network, ensuring that the generated image pairs fit the corresponding real transition pattern distribution. Experimental results demonstrate that the proposed framework produces appealing results in comparison with state-of-the-art methods and the ground truth. We also observe a performance gain for cross-age face verification.
CV · Jan 4, 2018
Cross-domain Human Parsing via Adversarial Feature and Label Adaptation
Si Liu, Yao Sun, Defa Zhu et al.
Human parsing has been extensively studied recently because of its wide applications in many important scenarios. Mainstream fashion parsing models focus on parsing high-resolution, clean images. However, parsers trained on benchmarks often perform unsatisfactorily because of domain shift when applied directly to a particular in-the-wild scenario, e.g., a canteen, airport, or workplace. In this paper, we explore a new and challenging cross-domain human parsing problem. Taking a benchmark dataset with extensive pixel-wise labeling as the source domain, how can we obtain a satisfactory parser on a new target domain without requiring any additional manual labeling? To this end, we propose a novel and efficient cross-domain human parsing model. It bridges the cross-domain differences in visual appearance and environmental conditions and fully exploits commonalities across domains. Our model explicitly learns a feature compensation network, which is specialized in mitigating the cross-domain differences. A discriminative feature adversarial network is introduced to supervise the feature compensation and thereby effectively reduce the discrepancy between the feature distributions of the two domains. In addition, our model introduces a structured label adversarial network, which guides the parsing results of the target domain to follow the high-order relationships of the structured labels shared across domains. The proposed framework is end-to-end trainable, practical, and scalable in real applications. Extensive experiments are conducted with the LIP dataset as the source domain and 4 different datasets as target domains, including surveillance videos, movies, and runway shows. The results consistently confirm the data efficiency and performance advantages of the proposed method for the cross-domain human parsing problem.
SY · Sep 9, 2017
Optimal Decentralized Economical-sharing Criterion and Scheme for Microgrid
Zhangjie Liu, Mei Su, Yao Sun et al.
To address the economic dispatch problem in islanded microgrids, this letter proposes an optimal criterion and two decentralized economic-sharing schemes. The criterion determines whether globally optimal economic sharing can be realized in a decentralized manner. If the system cost functions meet this criterion, a corresponding decentralized droop method is proposed to achieve the globally optimal dispatch. Otherwise, a modified method is presented that achieves suboptimal dispatch. These methods are convenient, effective, and communication-free.
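The abstract does not state the letter's criterion. The sketch below covers the textbook special case of quadratic costs C_i(P) = a_i P^2 + b_i P with a_i > 0 and no generation limits. In that case, global optimality is equivalent to equal incremental costs dC_i/dP = 2 a_i P_i + b_i across all units. In an islanded AC microgrid every unit settles to a common frequency. A droop of the form f = f_nominal - k * dC_i/dP_i, assumed here with a hypothetical gain k, therefore drives every unit to the same incremental cost without communication. The sketch computes that equal-incremental-cost dispatch in closed form and checks that the assumed droop yields one shared frequency.

```python
def incremental_cost(a, b, p):
    """dC/dP for the assumed quadratic cost C(P) = a*P**2 + b*P."""
    return 2.0 * a * p + b

def equal_incremental_cost_dispatch(costs, p_load):
    """Optimal dispatch for quadratic costs (a, b) with a > 0, ignoring generator limits.

    Solves sum_i (lam - b_i) / (2 a_i) = p_load for the common incremental cost lam.
    """
    inv_slope_sum = sum(1.0 / (2.0 * a) for a, _ in costs)
    offset = sum(b / (2.0 * a) for a, b in costs)
    lam = (p_load + offset) / inv_slope_sum
    return lam, [(lam - b) / (2.0 * a) for a, b in costs]

def droop_frequency(f_nominal, k, a, b, p):
    """Hypothetical incremental-cost droop: f = f_nominal - k * dC/dP."""
    return f_nominal - k * incremental_cost(a, b, p)

# Usage example: two units sharing a 100 kW load on a 50 Hz system.
costs = [(0.01, 2.0), (0.02, 1.0)]
lam, powers = equal_incremental_cost_dispatch(costs, p_load=100.0)
freqs = [droop_frequency(50.0, 0.1, a, b, p) for (a, b), p in zip(costs, powers)]
```

Read in reverse, every unit measures the same steady-state frequency f and can therefore infer the same λ = (f_nominal - f) / k locally. This is why no communication is needed in this idealized case. Generator limits or non-quadratic costs would break the closed form.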
SY · Sep 9, 2017
A Fully Decentralized Control of Grid-Connected Cascaded Inverters
Yao Sun, Xiaochao Hou, Hua Han et al.
This letter proposes a decentralized control scheme for grid-connected cascaded modular inverters that requires no communication. Each module makes decisions based only on its own local information. In contrast, conventional methods are usually centralized and depend on real-time communication. The proposed scheme therefore offers improved reliability and reduced cost. The overall system stability is analyzed, and the stability condition is derived. The feasibility of the proposed method is verified by simulation.
IT · May 18, 2017
Protecting Against Untrusted Relays: An Information Self-encrypted Approach
Hao Niu, Yao Sun, Kaoru Sezaki
The reliability and transmission distance of wireless communications are generally limited by severe channel fading. Cooperative relaying is an effective way to resist channel fading and is often adopted in wireless networks, where neighbouring nodes act as relays to help the transmission between the source and the destination. Most research works simply regard these cooperative nodes as trustworthy. This may not be practical in some cases, especially when confidential information is transmitted. In this paper, we consider the issue of untrusted relays in cooperative communications and propose an information self-encrypted approach to protect against these relays. Specifically, the original packets of the information are used as secret keys to encrypt each other, so that the information cannot be recovered before all of the encrypted packets have been received. The information is intercepted only when the relays obtain all of these encrypted packets. It is proved that the intercept probability decreases to zero exponentially with the number of original packets. However, the security performance is still unsatisfactory when the number of relays is large. Therefore, destination-based jamming is additionally combined with the scheme to confuse the relays, which makes the security performance acceptable even for a large number of relays. Finally, simulation results are provided to confirm the theoretical analysis and the superiority of the proposed scheme.
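The exponential decay claimed above follows from a short argument, given assumptions that the abstract does not state. First, assume the N original packets yield N encrypted packets. Second, assume the colluding relays obtain encrypted packet n independently, with probability q_n ≤ q_max < 1. The information leaks only if every encrypted packet is obtained, so the intercept probability is a product. The second line adds a hypothetical model in which each of K relays overhears each packet independently with probability p > 0. It shows why many relays push q toward 1 and erode the protection, which is consistent with the motivation for adding jamming.

```latex
\begin{aligned}
P_{\mathrm{int}} &= \prod_{n=1}^{N} q_n \le q_{\max}^{N} = \exp\bigl(-N \ln(1/q_{\max})\bigr) \to 0 \quad (N \to \infty),\\
q &= 1 - (1-p)^{K} \;\Longrightarrow\; P_{\mathrm{int}} = \bigl[1 - (1-p)^{K}\bigr]^{N} \to 1 \quad (K \to \infty,\ N \text{ fixed}).
\end{aligned}
```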