Xin-Zheng Lu

CL
h-index8
9papers
162citations
Novelty36%
AI Score35

9 Papers

CLMar 9, 2022
Pretrained Domain-Specific Language Model for General Information Retrieval Tasks in the AEC Domain

Zhe Zheng, Xin-Zheng Lu, Ke-Yin Chen et al.

As an essential task for the architecture, engineering, and construction (AEC) industry, information retrieval (IR) from unstructured textual data based on natural language processing (NLP) is gaining increasing attention. Although various deep learning (DL) models for IR tasks have been investigated in the AEC domain, it is still unclear how domain corpora and domain-specific pretrained DL models can improve performance in various IR tasks. To this end, this work systematically explores the impacts of domain corpora and various transfer learning techniques on the performance of DL models for IR tasks and proposes a pretrained domain-specific language model for the AEC domain. First, both in-domain and close-domain corpora are developed. Then, two types of pretrained models, including traditional wording embedding models and BERT-based models, are pretrained based on various domain corpora and transfer learning strategies. Finally, several widely used DL models for IR tasks are further trained and tested based on various configurations and pretrained models. The result shows that domain corpora have opposite effects on traditional word embedding models for text classification and named entity recognition tasks but can further improve the performance of BERT-based models in all tasks. Meanwhile, BERT-based models dramatically outperform traditional methods in all IR tasks, with maximum improvements of 5.4% and 10.1% in the F1 score, respectively. This research contributes to the body of knowledge in two ways: 1) demonstrating the advantages of domain corpora and pretrained DL models and 2) opening the first domain-specific dataset and pretrained language model for the AEC domain, to the best of our knowledge. Thus, this work sheds light on the adoption and application of pretrained models in the AEC domain.

CLSep 24, 2023
A Text Classification-Based Approach for Evaluating and Enhancing the Machine Interpretability of Building Codes

Zhe Zheng, Yu-Cheng Zhou, Ke-Yin Chen et al.

Interpreting regulatory documents or building codes into computer-processable formats is essential for the intelligent design and construction of buildings and infrastructures. Although automated rule interpretation (ARI) methods have been investigated for years, most of them highly depend on the early and manual filtering of interpretable clauses from a building code. While few of them considered machine interpretability, which represents the potential to be transformed into a computer-processable format, from both clause- and document-level. Therefore, this research aims to propose a novel approach to automatically evaluate and enhance the machine interpretability of single clause and building codes. First, a few categories are introduced to classify each clause in a building code considering the requirements for rule interpretation, and a dataset is developed for model training. Then, an efficient text classification model is developed based on a pretrained domain-specific language model and transfer learning techniques. Finally, a quantitative evaluation method is proposed to assess the overall interpretability of building codes. Experiments show that the proposed text classification algorithm outperforms the existing CNN- or RNN-based methods, improving the F1-score from 72.16% to 93.60%. It is also illustrated that the proposed classification method can enhance downstream ARI methods with an improvement of 4%. Furthermore, analyzing the results of more than 150 building codes in China showed that their average interpretability is 34.40%, which implies that it is still hard to fully transform the entire regulatory document into computer-processable formats. It is also argued that the interpretability of building codes should be further improved both from the human side and the machine side.

AIAug 17, 2023
Translating Regulatory Clauses into Executable Codes for Building Design Checking via Large Language Model Driven Function Matching and Composing

Zhe Zheng, Jin Han, Ke-Yin Chen et al.

Translating clauses into executable code is a vital stage of automated rule checking (ARC) and is essential for effective building design compliance checking, particularly for rules with implicit properties or complex logic requiring domain knowledge. Thus, by systematically analyzing building clauses, 66 atomic functions are defined first to encapsulate common computational logics. Then, LLM-FuncMapper is proposed, a large language model (LLM)-based approach with rule-based adaptive prompts that match clauses to atomic functions. Finally, executable code is generated by composing functions through the LLMs. Experiments show LLM-FuncMapper outperforms fine-tuning methods by 19% in function matching while significantly reducing manual annotation efforts. Case study demonstrates that LLM-FuncMapper can automatically compose multiple atomic functions to generate executable code, boosting rule-checking efficiency. To our knowledge, this research represents the first application of LLMs for interpreting complex design clauses into executable code, which may shed light on further adoption of LLMs in the construction domain.

IRDec 12, 2022
Text Mining-Based Patent Analysis for Automated Rule Checking in AEC

Zhe Zheng, Bo-Rui Kang, Qi-Tian Yuan et al.

Automated rule checking (ARC), which is expected to promote the efficiency of the compliance checking process in the architecture, engineering, and construction (AEC) industry, is gaining increasing attention. Throwing light on the ARC application hotspots and forecasting its trends are useful to the related research and drive innovations. Therefore, this study takes the patents from the database of the Derwent Innovations Index database (DII) and China national knowledge infrastructure (CNKI) as data sources and then carried out a three-step analysis including (1) quantitative characteristics (i.e., annual distribution analysis) of patents, (2) identification of ARC topics using a latent Dirichlet allocation (LDA) and, (3) SNA-based co-occurrence analysis of ARC topics. The results show that the research hotspots and trends of Chinese and English patents are different. The contributions of this study have three aspects: (1) an approach to a comprehensive analysis of patents by integrating multiple text mining methods (i.e., SNA and LDA) is introduced ; (2) the application hotspots and development trends of ARC are reviewed based on patent analysis; and (3) a signpost for technological development and innovation of ARC is provided.

CLDec 12, 2022
Earthquake Impact Analysis Based on Text Mining and Social Media Analytics

Zhe Zheng, Hong-Zheng Shi, Yu-Cheng Zhou et al.

Earthquakes have a deep impact on wide areas, and emergency rescue operations may benefit from social media information about the scope and extent of the disaster. Therefore, this work presents a text miningbased approach to collect and analyze social media data for early earthquake impact analysis. First, disasterrelated microblogs are collected from the Sina microblog based on crawler technology. Then, after data cleaning a series of analyses are conducted including (1) the hot words analysis, (2) the trend of the number of microblogs, (3) the trend of public opinion sentiment, and (4) a keyword and rule-based text classification for earthquake impact analysis. Finally, two recent earthquakes with the same magnitude and focal depth in China are analyzed to compare their impacts. The results show that the public opinion trend analysis and the trend of public opinion sentiment can estimate the earthquake's social impact at an early stage, which will be helpful to decision-making and rescue management.

LGSep 14, 2025
BIGNet: Pretrained Graph Neural Network for Embedding Semantic, Spatial, and Topological Data in BIM Models

Jin Han, Xin-Zheng Lu, Jia-Rui Lin

Large Foundation Models (LFMs) have demonstrated significant advantages in civil engineering, but they primarily focus on textual and visual data, overlooking the rich semantic, spatial, and topological features in BIM (Building Information Modelling) models. Therefore, this study develops the first large-scale graph neural network (GNN), BIGNet, to learn, and reuse multidimensional design features embedded in BIM models. Firstly, a scalable graph representation is introduced to encode the "semantic-spatial-topological" features of BIM components, and a dataset with nearly 1 million nodes and 3.5 million edges is created. Subsequently, BIGNet is proposed by introducing a new message-passing mechanism to GraphMAE2 and further pretrained with a node masking strategy. Finally, BIGNet is evaluated in various transfer learning tasks for BIM-based design checking. Results show that: 1) homogeneous graph representation outperforms heterogeneous graph in learning design features, 2) considering local spatial relationships in a 30 cm radius enhances performance, and 3) BIGNet with GAT (Graph Attention Network)-based feature extraction achieves the best transfer learning results. This innovation leads to a 72.7% improvement in Average F1-score over non-pretrained models, demonstrating its effectiveness in learning and transferring BIM design features and facilitating their automated application in future design and lifecycle management.

LGOct 22, 2025
Learning and Simulating Building Evacuation Patterns for Enhanced Safety Design Using Generative Models

Jin Han, Zhe Zheng, Yi Gu et al.

Evacuation simulation is essential for building safety design, ensuring properly planned evacuation routes. However, traditional evacuation simulation relies heavily on refined modeling with extensive parameters, making it challenging to adopt such methods in a rapid iteration process in early design stages. Thus, this study proposes DiffEvac, a novel method to learn building evacuation patterns based on Generative Models (GMs), for efficient evacuation simulation and enhanced safety design. Initially, a dataset of 399 diverse functional layouts and corresponding evacuation heatmaps of buildings was established. Then, a decoupled feature representation is proposed to embed physical features like layouts and occupant density for GMs. Finally, a diffusion model based on image prompts is proposed to learn evacuation patterns from simulated evacuation heatmaps. Compared to existing research using Conditional GANs with RGB representation, DiffEvac achieves up to a 37.6% improvement in SSIM, 142% in PSNR, and delivers results 16 times faster, thereby cutting simulation time to 2 minutes. Case studies further demonstrate that the proposed method not only significantly enhances the rapid design iteration and adjustment process with efficient evacuation simulation but also offers new insights and technical pathways for future safety optimization in intelligent building design. The research implication is that the approach lowers the modeling burden, enables large-scale what-if exploration, and facilitates coupling with multi-objective design tools.

CEMay 8, 2025
Unified Network-Based Representation of BIM Models for Embedding Semantic, Spatial, and Topological Data

Jin Han, Xin-Zheng Lu, Jia-Rui Lin

Building Information Modeling (BIM) has revolutionized the construction industry by providing a comprehensive digital representation of building structures throughout their lifecycle. However, existing research lacks effective methods for capturing the complex spatial and topological relationships between components in BIM models, which are essential for understanding design patterns and enhancing decision-making. This study proposes a unified network-based representation method that integrates the "semantic-spatial-topological" multi-dimensional design features of BIM models. By extending the IFC (Industry Foundation Classes) standard, we introduce local spatial relationships and topological connections between components to enrich the network structure. This representation method enables a more detailed understanding of component interactions, dependencies, and implicit design patterns, effectively capturing the semantic, topological, and spatial relationships in BIM, and holds significant potential for the representation and learning of design patterns.

CLMay 6, 2024
QuakeBERT: Accurate Classification of Social Media Texts for Rapid Earthquake Impact Assessment

Jin Han, Zhe Zheng, Xin-Zheng Lu et al.

Social media aids disaster response but suffers from noise, hindering accurate impact assessment and decision making for resilient cities, which few studies considered. To address the problem, this study proposes the first domain-specific LLM model and an integrated method for rapid earthquake impact assessment. First, a few categories are introduced to classify and filter microblogs considering their relationship to the physical and social impacts of earthquakes, and a dataset comprising 7282 earthquake-related microblogs from twenty earthquakes in different locations is developed as well. Then, with a systematic analysis of various influential factors, QuakeBERT, a domain-specific large language model (LLM), is developed and fine-tuned for accurate classification and filtering of microblogs. Meanwhile, an integrated method integrating public opinion trend analysis, sentiment analysis, and keyword-based physical impact quantification is introduced to assess both the physical and social impacts of earthquakes based on social media texts. Experiments show that data diversity and data volume dominate the performance of QuakeBERT and increase the macro average F1 score by 27%, while the best classification model QuakeBERT outperforms the CNN- or RNN-based models by improving the macro average F1 score from 60.87% to 84.33%. Finally, the proposed approach is applied to assess two earthquakes with the same magnitude and focal depth. Results show that the proposed approach can effectively enhance the impact assessment process by accurate detection of noisy microblogs, which enables effective post-disaster emergency responses to create more resilient cities.