Cheng-Chun Lee

h-index117

6papers

4,636citations

Novelty57%

AI Score41

Ranked #91,640 of 205,806 authors (top 45%)#16,944 in CL (top 52%)

6 Papers

CVAug 3, 2022

Large-scale Building Damage Assessment using a Novel Hierarchical Transformer Architecture on Satellite Images

Navjot Kaur, Cheng-Chun Lee, Ali Mostafavi et al.

This paper presents \dahitra, a novel deep-learning model with hierarchical transformers to classify building damages based on satellite images in the aftermath of natural disasters. Satellite imagery provides real-time and high-coverage information and offers opportunities to inform large-scale post-disaster building damage assessment, which is critical for rapid emergency response. In this work, a novel transformer-based network is proposed for assessing building damage. This network leverages hierarchical spatial features of multiple resolutions and captures the temporal differences in the feature domain after applying a transformer encoder on the spatial features. The proposed network achieves state-of-the-art performance when tested on a large-scale disaster damage dataset (xBD) for building localization and damage classification, as well as on LEVIR-CD dataset for change detection tasks. In addition, this work introduces a new high-resolution satellite imagery dataset, Ida-BD (related to 2021 Hurricane Ida in Louisiana in 2021) for domain adaptation. Further, it demonstrates an approach of using this dataset by adapting the model with limited fine-tuning and hence applying the model to newly damaged areas with scarce data.

CVJun 5, 2023

ELEV-VISION: Automated Lowest Floor Elevation Estimation from Segmenting Street View Images

Yu-Hsuan Ho, Cheng-Chun Lee, Nicholas D. Diaz et al.

We propose an automated lowest floor elevation (LFE) estimation algorithm based on computer vision techniques to leverage the latent information in street view images. Flood depth-damage models use a combination of LFE and flood depth for determining flood risk and extent of damage to properties. We used image segmentation for detecting door bottoms and roadside edges from Google Street View images. The characteristic of equirectangular projection with constant spacing representation of horizontal and vertical angles allows extraction of the pitch angle from the camera to the door bottom. The depth from the camera to the door bottom was obtained from the depthmap paired with the Google Street View image. LFEs were calculated from the pitch angle and the depth. The testbed for application of the proposed method is Meyerland (Harris County, Texas). The results show that the proposed method achieved mean absolute error of 0.190 m (1.18 %) in estimating LFE. The height difference between the street and the lowest floor (HDSL) was estimated to provide information for flood damage estimation. The proposed automatic LFE estimation algorithm using Street View images and image segmentation provides a rapid and cost-effective method for LFE estimation compared with the surveys using total station theodolite and unmanned aerial systems. By obtaining more accurate and up-to-date LFE data using the proposed method, city planners, emergency planners and insurance companies could make a more precise estimation of flood damage.

LGAug 11, 2023

MaxFloodCast: Ensemble Machine Learning Model for Predicting Peak Inundation Depth And Decoding Influencing Features

Cheng-Chun Lee, Lipai Huang, Federico Antolini et al.

Timely, accurate, and reliable information is essential for decision-makers, emergency managers, and infrastructure operators during flood events. This study demonstrates a proposed machine learning model, MaxFloodCast, trained on physics-based hydrodynamic simulations in Harris County, offers efficient and interpretable flood inundation depth predictions. Achieving an average R-squared of 0.949 and a Root Mean Square Error of 0.61 ft on unseen data, it proves reliable in forecasting peak flood inundation depths. Validated against Hurricane Harvey and Storm Imelda, MaxFloodCast shows the potential in supporting near-time floodplain management and emergency operations. The model's interpretability aids decision-makers in offering critical information to inform flood mitigation strategies, to prioritize areas with critical facilities and to examine how rainfall in other watersheds influences flood exposure in one area. The MaxFloodCast model enables accurate and interpretable inundation depth predictions while significantly reducing computational time, thereby supporting emergency response efforts and flood risk management more effectively.

LGOct 3, 2023

ML4EJ: Decoding the Role of Urban Features in Shaping Environmental Injustice Using Interpretable Machine Learning

Yu-Hsuan Ho, Zhewei Liu, Cheng-Chun Lee et al.

Understanding the key factors shaping environmental hazard exposures and their associated environmental injustice issues is vital for formulating equitable policy measures. Traditional perspectives on environmental injustice have primarily focused on the socioeconomic dimensions, often overlooking the influence of heterogeneous urban characteristics. This limited view may obstruct a comprehensive understanding of the complex nature of environmental justice and its relationship with urban design features. To address this gap, this study creates an interpretable machine learning model to examine the effects of various urban features and their non-linear interactions to the exposure disparities of three primary hazards: air pollution, urban heat, and flooding. The analysis trains and tests models with data from six metropolitan counties in the United States using Random Forest and XGBoost. The performance is used to measure the extent to which variations of urban features shape disparities in environmental hazard levels. In addition, the analysis of feature importance reveals features related to social-demographic characteristics as the most prominent urban features that shape hazard extent. Features related to infrastructure distribution and land cover are relatively important for urban heat and air pollution exposure respectively. Moreover, we evaluate the models' transferability across different regions and hazards. The results highlight limited transferability, underscoring the intricate differences among hazards and regions and the way in which urban features shape hazard exposures. The insights gleaned from this study offer fresh perspectives on the relationship among urban features and their interplay with environmental hazard exposure disparities, informing the development of more integrated urban design policies to enhance social equity and environmental injustice issues.

CLJul 7, 2025

Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

Gheorghe Comanici, Eric Bieber, Mike Schaekermann et al. · amazon-science, baidu

In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal understanding and it is now able to process up to 3 hours of video content. Its unique combination of long context, multimodal and reasoning capabilities can be combined to unlock new agentic workflows. Gemini 2.5 Flash provides excellent reasoning abilities at a fraction of the compute and latency requirements and Gemini 2.0 Flash and Flash-Lite provide high performance at low latency and cost. Taken together, the Gemini 2.X model generation spans the full Pareto frontier of model capability vs cost, allowing users to explore the boundaries of what is possible with complex agentic problem solving.

CLDec 19, 2023

Gemini: A Family of Highly Capable Multimodal Models

Gemini Team, Rohan Anil, Sebastian Borgeaud et al.

This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.