C-DLinkNet: considering multi-level semantic features for human parsing
This work addresses human parsing, a fine-grained semantic segmentation task for identifying human parts, but it appears incremental as it builds on the LinkNet architecture with a new module.
The paper tackles the challenge of extracting effective semantic features for human parsing under deformation and multi-scale variation. The resulting model achieves competitive performance, with an mIoU of 53.05 on the LIP dataset validation set, using smaller input sizes and no additional information.
Human parsing is an essential branch of semantic segmentation: a fine-grained task that identifies the constituent parts of the human body. Its main challenge is to extract semantic features that remain effective under deformation and multi-scale variation. In this work, we propose C-DLinkNet, an end-to-end model based on LinkNet that contains a new module, the Smooth Module, which combines multi-level features in the decoder. C-DLinkNet produces parsing performance competitive with state-of-the-art methods while using smaller input sizes and no additional information, reaching mIoU=53.05 on the validation set of the LIP dataset.
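For readers unfamiliar with the reported metric, the following is a minimal sketch of how mean intersection-over-union (mIoU) is commonly computed from per-pixel labels. It is not the authors' evaluation code. The function names `confusion_matrix` and `mean_iou`, the flattened label lists, and the choice to skip classes that appear in neither prediction nor ground truth are all assumptions made for illustration; the evaluation protocol used for the LIP result may differ in such details.

```python
from typing import List, Sequence


def confusion_matrix(pred: Sequence[int], truth: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count (truth, pred) label pairs over flattened per-pixel labels."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, truth):
        matrix[t][p] += 1  # rows index ground truth, columns index prediction
    return matrix


def mean_iou(matrix: List[List[int]]) -> float:
    """Average per-class IoU = TP / (TP + FP + FN).

    Assumption: classes absent from both prediction and ground truth
    are skipped rather than counted as IoU 0.
    """
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                       # truth c, predicted other
        fp = sum(matrix[r][c] for r in range(n)) - tp  # predicted c, truth other
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy example with three hypothetical part labels (0, 1, 2).
    pred = [0, 0, 1, 1, 2, 2]
    truth = [0, 1, 1, 1, 2, 0]
    print(mean_iou(confusion_matrix(pred, truth, num_classes=3)))
```

In practice the confusion matrix is accumulated over every image in the validation set before the per-class IoUs are averaged, and the score is usually reported as a percentage, as with the 53.05 figure above.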