Zhenghao Hu

19.0 · CV · May 29, 2025

OmniEarth-Bench: Towards Holistic Evaluation of Earth's Six Spheres and Cross-Spheres Interactions with Multimodal Observational Earth Data

Fengxiang Wang, Mingshuo Chen, Xuming He et al.

Existing benchmarks for multimodal learning in Earth science offer limited, siloed coverage of Earth's spheres and their cross-sphere interactions, typically restricting evaluation to the human-activity sphere or the atmosphere and to at most 16 tasks. These benchmarks share three limitations: narrow-source heterogeneity (single or few data sources), constrained scientific granularity, and limited-sphere extensibility. We therefore introduce OmniEarth-Bench, the first multimodal benchmark that systematically spans all six spheres (atmosphere, lithosphere, oceanosphere, cryosphere, biosphere, and human-activity sphere) as well as cross-sphere interactions. Built with a scalable, modular-topology data inference framework, native multi-observation sources, and expert-in-the-loop curation, OmniEarth-Bench provides 29,855 standardized, expert-curated annotations. All annotations are organized into a four-level hierarchy (Sphere, Scenario, Ability, Task) encompassing 109 expert-curated evaluation tasks. Experiments on 9 state-of-the-art MLLMs show that even the most advanced models struggle with the benchmark: none of them reaches 35% accuracy, exposing systematic gaps in Earth-system cognitive ability. The dataset and evaluation code are released at OmniEarth-Bench (https://anonymous.4open.science/r/OmniEarth-Bench-B1BD).
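To make the four-level organization concrete, here is a minimal Python sketch (not the benchmark's released evaluation code) of how per-question results might be rolled up along the Sphere, Scenario, Ability, Task hierarchy. The `Result` record, the `accuracy_by_level` and `overall_accuracy` helpers, and every scenario, ability, and task name below are hypothetical placeholders, not data from OmniEarth-Bench. Accuracy is micro-averaged here, which is an assumption; the benchmark may aggregate scores differently.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical record: one answered question, tagged with its path through the
# four-level hierarchy and whether the model's answer was correct.
@dataclass(frozen=True)
class Result:
    sphere: str
    scenario: str
    ability: str
    task: str
    correct: bool

LEVELS = ["sphere", "scenario", "ability", "task"]


def accuracy_by_level(results, level):
    """Return {path: accuracy} aggregated at one hierarchy level.

    Keys are tuples holding the path from Sphere down to `level`, so nodes with
    the same name under different parents are kept separate.
    """
    depth = LEVELS.index(level) + 1
    hits = defaultdict(int)
    totals = defaultdict(int)
    for r in results:
        key = tuple(getattr(r, name) for name in LEVELS[:depth])
        totals[key] += 1
        hits[key] += int(r.correct)
    return {k: hits[k] / totals[k] for k in totals}


def overall_accuracy(results):
    """Micro-averaged accuracy over all answered questions (0.0 if empty)."""
    results = list(results)
    return sum(r.correct for r in results) / len(results) if results else 0.0


# Placeholder results for illustration only (not benchmark data).
demo = [
    Result("cryosphere", "scenario_a", "ability_x", "task_1", True),
    Result("cryosphere", "scenario_a", "ability_x", "task_1", False),
    Result("cryosphere", "scenario_a", "ability_y", "task_2", False),
    Result("biosphere", "scenario_b", "ability_x", "task_3", True),
]
```

With these placeholders, `accuracy_by_level(demo, "sphere")` gives one third for `("cryosphere",)` and 1.0 for `("biosphere",)`, while `overall_accuracy(demo)` is 0.5. Keying by the full path rather than the leaf name is a deliberate choice, since the same ability label can plausibly appear under several spheres.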

5.8 · CR · Aug 2, 2018

Chaff Bugs: Deterring Attackers by Making Software Buggier

Zhenghao Hu, Yu Hu, Brendan Dolan-Gavitt

Sophisticated attackers find bugs in software, evaluate their exploitability, and then create and launch exploits for bugs found to be exploitable. Most efforts to secure software attempt either to eliminate bugs or to add mitigations that make exploitation more difficult. In this paper, we introduce a new defensive technique called chaff bugs, which instead targets the bug discovery and exploit creation stages of this process. Rather than eliminating bugs, we add large numbers of bugs that are provably (but not obviously) non-exploitable. Attackers who attempt to find and exploit bugs in software will, with high probability, find an intentionally placed non-exploitable bug and waste precious resources trying to build a working exploit. We develop two strategies for ensuring non-exploitability and use them to automatically add thousands of non-exploitable bugs to real-world software such as nginx and libFLAC; we show that the functionality of the software is not harmed and demonstrate that our bugs look exploitable to current triage tools. We believe that chaff bugs can serve as an effective deterrent against both human attackers and automated Cyber Reasoning Systems (CRSes).
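The abstract does not name its two non-exploitability strategies, so the following Python sketch only simulates one idea in that spirit, assumed here for illustration: a planted overflow whose reach is capped so it can corrupt only a slot the program never reads, leaving the saved return address untouched. The stack frame is modeled as a `bytearray`; the layout constants and the helpers `new_frame`, `chaff_copy`, `read_ret`, and `ret_always_intact` are hypothetical. Real chaff bugs are injected into compiled programs such as nginx and libFLAC, not into a Python model.

```python
# Toy model of a stack frame: [ buf | unused | saved return address ].
# All sizes and names are hypothetical; this simulates memory, it is not real memory.
BUF_SIZE = 8
UNUSED_SIZE = 8
RET_SIZE = 8

BUF_OFF = 0
UNUSED_OFF = BUF_OFF + BUF_SIZE
RET_OFF = UNUSED_OFF + UNUSED_SIZE
FRAME_SIZE = RET_OFF + RET_SIZE


def new_frame(ret_addr: int) -> bytearray:
    """Build a zeroed frame with ret_addr stored little-endian in the last slot."""
    frame = bytearray(FRAME_SIZE)
    frame[RET_OFF:RET_OFF + RET_SIZE] = ret_addr.to_bytes(RET_SIZE, "little")
    return frame


def chaff_copy(frame: bytearray, data: bytes) -> None:
    """Copy attacker data into buf with a deliberately loose bounds check.

    The planted bug: the check admits up to BUF_SIZE + UNUSED_SIZE bytes, so
    long inputs overflow buf into the unused slot. The cap stops every write
    before the return address, and nothing in this model reads the unused slot.
    """
    if len(data) > BUF_SIZE + UNUSED_SIZE:
        return  # longer inputs are rejected outright
    frame[BUF_OFF:BUF_OFF + len(data)] = data


def read_ret(frame: bytearray) -> int:
    """Return the saved return address as an integer."""
    return int.from_bytes(frame[RET_OFF:RET_OFF + RET_SIZE], "little")


def ret_always_intact(ret_addr: int, max_len: int) -> bool:
    """Try every input length from 0 to max_len and check the return address.

    The written range depends only on len(data), so one fill byte (0x41)
    per length is enough to cover every reachable write in this model.
    """
    for n in range(max_len + 1):
        frame = new_frame(ret_addr)
        chaff_copy(frame, b"\x41" * n)
        if read_ret(frame) != ret_addr or len(frame) != FRAME_SIZE:
            return False
    return True
```

A 12-byte input still writes past `buf` into `unused`, so the bug is a genuine out-of-bounds write, yet `ret_always_intact(0xDEADBEEF, 64)` holds because no accepted input can reach the return address.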

2 Papers