CVJun 14, 2024Code
First Multi-Dimensional Evaluation of Flowchart Comprehension for Multimodal Large Language ModelsEnming Zhang, Ruobing Yao, Huanyong Liu et al.
With the development of Multimodal Large Language Models (MLLMs) technology, its general capabilities are increasingly powerful. To evaluate the various abilities of MLLMs, numerous evaluation systems have emerged. But now there is still a lack of a comprehensive method to evaluate MLLMs in the tasks related to flowcharts, which are very important in daily life and work. We propose the first comprehensive method, FlowCE, to assess MLLMs across various dimensions for tasks related to flowcharts. It encompasses evaluating MLLMs' abilities in Reasoning, Localization Recognition, Information Extraction, Logical Verification, and Summarization on flowcharts. However, we find that even the GPT4o model achieves only a score of 56.63. Among open-source models, Phi-3-Vision obtained the highest score of 49.97. We hope that FlowCE can contribute to future research on MLLMs for tasks based on flowcharts. \url{https://github.com/360AILABNLP/FlowCE}
CLNov 29, 2020Code
A Boundary Regression Model for Nested Named Entity RecognitionYanping Chen, Lefei Wu, Qinghua Zheng et al.
Recognizing named entities (NEs) is commonly conducted as a classification problem that predicts a class tag for a word or a NE candidate in a sentence. In shallow structures, categorized features are weighted to support the prediction. Recent developments in neural networks have adopted deep structures that map categorized features into continuous representations. This approach unfolds a dense space saturated with high-order abstract semantic information, where the prediction is based on distributed feature representations. In this paper, positions of NEs in a sentence are represented as continuous values. Then, a regression operation is introduced to regress boundaries of NEs in a sentence. Based on boundary regression, we design a boundary regression model to support nested NE recognition. It is a multiobjective learning framework, which simultaneously predicts the classification score of a NE candidate and refine its spatial location in a sentence. It has the advantage to resolve nested NEs and support boundary regression for locating NEs in a sntence. By sharing parameters for predicting and locating, this model enables more potent nonlinear function approximators to enhance model discriminability. Experiments demonstrate state-of-the-art performance for nested NE recognition\footnote{Our codes to implement the BR model are available at: \url{https://github.com/wuyuefei3/BR}.}.
CLSep 18, 2024
Enhancing Complex Formula Recognition with Hierarchical Detail-Focused NetworkJiale Wang, Junhui Yu, Huanyong Liu et al.
Hierarchical and complex Mathematical Expression Recognition (MER) is challenging due to multiple possible interpretations of a formula, complicating both parsing and evaluation. In this paper, we introduce the Hierarchical Detail-Focused Recognition dataset (HDR), the first dataset specifically designed to address these issues. It consists of a large-scale training set, HDR-100M, offering an unprecedented scale and diversity with one hundred million training instances. And the test set, HDR-Test, includes multiple interpretations of complex hierarchical formulas for comprehensive model performance evaluation. Additionally, the parsing of complex formulas often suffers from errors in fine-grained details. To address this, we propose the Hierarchical Detail-Focused Recognition Network (HDNet), an innovative framework that incorporates a hierarchical sub-formula module, focusing on the precise handling of formula details, thereby significantly enhancing MER performance. Experimental results demonstrate that HDNet outperforms existing MER models across various datasets.