George Almpanidis

LG
3papers
108citations
Novelty10%
AI Score21

3 Papers

LGAug 1, 2024
A Systematic Review on Long-Tailed Learning

Chongsheng Zhang, George Almpanidis, Gaojuan Fan et al.

Long-tailed data is a special type of multi-class imbalanced data with a very large amount of minority/tail classes that have a very significant combined influence. Long-tailed learning aims to build high-performance models on datasets with long-tailed distributions, which can identify all the classes with high accuracy, in particular the minority/tail classes. It is a cutting-edge research direction that has attracted a remarkable amount of research effort in the past few years. In this paper, we present a comprehensive survey of latest advances in long-tailed visual learning. We first propose a new taxonomy for long-tailed learning, which consists of eight different dimensions, including data balancing, neural architecture, feature enrichment, logits adjustment, loss function, bells and whistles, network optimization, and post hoc processing techniques. Based on our proposed taxonomy, we present a systematic review of long-tailed learning methods, discussing their commonalities and alignable differences. We also analyze the differences between imbalance learning and long-tailed learning approaches. Finally, we discuss prospects and future directions in this field.

CVMar 25, 2019Code
ShopSign: a Diverse Scene Text Dataset of Chinese Shop Signs in Street Views

Chongsheng Zhang, Guowen Peng, Yuefeng Tao et al.

In this paper, we introduce the ShopSign dataset, which is a newly developed natural scene text dataset of Chinese shop signs in street views. Although a few scene text datasets are already publicly available (e.g. ICDAR2015, COCO-Text), there are few images in these datasets that contain Chinese texts/characters. Hence, we collect and annotate the ShopSign dataset to advance research in Chinese scene text detection and recognition. The new dataset has three distinctive characteristics: (1) large-scale: it contains 25,362 Chinese shop sign images, with a total number of 196,010 text-lines. (2) diversity: the images in ShopSign were captured in different scenes, from downtown to developing regions, using more than 50 different mobile phones. (3) difficulty: the dataset is very sparse and imbalanced. It also includes five categories of hard images (mirror, wooden, deformed, exposed and obscure). To illustrate the challenges in ShopSign, we run baseline experiments using state-of-the-art scene text detection methods (including CTPN, TextBoxes++ and EAST), and cross-dataset validation to compare their corresponding performance on the related datasets such as CTW, RCTW and ICPR 2018 MTWI challenge dataset. The sample images and detailed descriptions of our ShopSign dataset are publicly available at: https://github.com/chongshengzhang/shopsign.

LGSep 1, 2021
An Empirical Study on the Joint Impact of Feature Selection and Data Re-sampling on Imbalance Classification

Chongsheng Zhang, Paolo Soda, Jingjun Bi et al.

In predictive tasks, real-world datasets often present different degrees of imbalanced (i.e., long-tailed or skewed) distributions. While the majority (the head) classes have sufficient samples, the minority (the tail) classes can be under-represented by a rather limited number of samples. Data pre-processing has been shown to be very effective in dealing with such problems. On one hand, data re-sampling is a common approach to tackling class imbalance. On the other hand, dimension reduction, which reduces the feature space, is a conventional technique for reducing noise and inconsistencies in a dataset. However, the possible synergy between feature selection and data re-sampling for high-performance imbalance classification has rarely been investigated before. To address this issue, we carry out a comprehensive empirical study on the joint influence of feature selection and re-sampling on two-class imbalance classification. Specifically, we study the performance of two opposite pipelines for imbalance classification by applying feature selection before or after data re-sampling. We conduct a large number of experiments, with a total of 9225 tests, on 52 publicly available datasets, using 9 feature selection methods, 6 re-sampling approaches for class imbalance learning, and 3 well-known classification algorithms. Experimental results show that there is no constant winner between the two pipelines; thus both of them should be considered to derive the best performing model for imbalance classification. We find that the performance of an imbalance classification model not only depends on the classifier adopted and the ratio between the number of majority and minority samples, but also depends on the ratio between the number of samples and features. Overall, this study should provide new reference value for researchers and practitioners in imbalance learning.