Improving Transformer Based Line Segment Detection with Matched Predicting and Re-ranking
This work addresses efficiency and ranking issues in line segment detection for computer vision applications, but the contribution appears incremental because it builds on existing Transformer-based approaches.
The paper tackles two problems in Transformer-based line segment detection: accurately detected segments can receive low confidence scores, and training is slow because of bipartite matching. The result is RANK-LETR, which reports higher prediction accuracy than other Transformer-based and CNN-based methods while requiring fewer training epochs than previous Transformer-based models.
Classical Transformer-based line segment detection methods have delivered impressive results. However, we observe that some accurately detected line segments are assigned low confidence scores during prediction, causing them to be ranked lower and potentially suppressed. Additionally, these models often require prolonged training periods to achieve strong performance, largely due to the necessity of bipartite matching. In this paper, we introduce RANK-LETR, a novel Transformer-based line segment detection method. Our approach leverages learnable geometric information to refine the ranking of predicted line segments by enhancing the confidence scores of high-quality predictions in a posterior verification step. We also propose a new line segment proposal method, wherein the feature point nearest to the centroid of the line segment directly predicts the location, significantly improving training efficiency and stability. Moreover, we introduce a line segment ranking loss to stabilize rankings during training, thereby enhancing the generalization capability of the model. Experimental results demonstrate that our method outperforms other Transformer-based and CNN-based approaches in prediction accuracy while requiring fewer training epochs than previous Transformer-based models.
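The proposal step described in the abstract, where the feature point nearest a segment's centroid predicts that segment, can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `assign_segments_to_cells`, the `stride` parameter, and the convention that cell (col, row) is centered at ((col + 0.5) * stride, (row + 0.5) * stride) are all assumptions made for illustration, and the midpoint of the two endpoints is used as the centroid.

```python
from typing import List, Tuple

# A segment is (x1, y1, x2, y2) in pixel coordinates (assumed layout).
Segment = Tuple[float, float, float, float]


def assign_segments_to_cells(
    segments: List[Segment], stride: int, grid_w: int, grid_h: int
) -> List[Tuple[int, int]]:
    """Return, for each segment, the (col, row) of the feature cell nearest its midpoint."""
    cells = []
    for x1, y1, x2, y2 in segments:
        # Midpoint of the two endpoints, used here as the segment centroid.
        cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
        # With cell centers at (i + 0.5) * stride, the nearest center is the
        # cell that contains the point; clamp to stay inside the grid.
        col = min(max(int(cx // stride), 0), grid_w - 1)
        row = min(max(int(cy // stride), 0), grid_h - 1)
        cells.append((col, row))
    return cells


# Hypothetical example: a 4x4 feature grid with stride 8 over a 32x32 image.
example = [(2.0, 2.0, 20.0, 10.0), (100.0, 100.0, 120.0, 120.0)]
print(assign_segments_to_cells(example, stride=8, grid_w=4, grid_h=4))  # [(1, 0), (3, 3)]
```

Because each ground-truth segment maps to a fixed cell, the assignment needs no search over prediction-target pairings. This is one plausible reading of why such a scheme could avoid the cost of bipartite matching.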