Forest Representation Learning Guided by Margin Distribution
This work provides a novel theoretical perspective on cascaded deep forests, potentially enhancing representation learning in machine learning domains, though it appears incremental as it builds on existing forest methods.
The paper aims to tighten the generalization bound for forest representation learning. It reformulates the approach as an additive model that boosts the augmented feature rather than the prediction, which improves the upper bound on the generalization gap from O(√(ln m/m)) to O(ln m/m) when the margin ratio λ (the margin standard deviation divided by the margin mean) is small enough. To optimize this bound, it proposes a margin distribution reweighting approach (mdDF), and experiments confirm its effectiveness.
In this paper, we reformulate the forest representation learning approach as an additive model that boosts the augmented feature instead of the prediction. We substantially improve the upper bound on the generalization gap from $\mathcal{O}(\sqrt{\frac{\ln m}{m}})$ to $\mathcal{O}(\frac{\ln m}{m})$ when $\lambda$, the ratio between the margin standard deviation and the margin mean, is small enough. This tighter upper bound inspires us to optimize the margin distribution ratio $\lambda$. We therefore design the margin distribution reweighting approach (mdDF), which achieves a small ratio $\lambda$ by boosting the augmented feature. Experiments and visualizations confirm the effectiveness of the approach in terms of performance and representation learning ability. This study offers a novel understanding of the cascaded deep forest from the margin-theory perspective and further uses the mdDF approach to guide layer-by-layer forest representation learning.
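To make the margin ratio concrete, the following is a minimal Python sketch that computes it from per-class scores. It assumes the common multi-class margin (the score of the true class minus the largest score among the other classes) and the population standard deviation; the paper's exact definitions may differ. The function names `margins` and `margin_ratio` are hypothetical and are not taken from the paper or from any library.

```python
import numpy as np


def margins(scores, labels):
    """Per-sample margin: true-class score minus the best other-class score.

    Assumed definition for illustration; `scores` has shape (n_samples, n_classes)
    and `labels` holds integer class indices.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    rows = np.arange(len(labels))
    true_score = scores[rows, labels]
    # Mask out the true class so that max() only sees the competing classes.
    others = scores.copy()
    others[rows, labels] = -np.inf
    return true_score - others.max(axis=1)


def margin_ratio(scores, labels):
    """Ratio of the margin standard deviation to the margin mean."""
    m = margins(scores, labels)
    mean = m.mean()
    if mean <= 0:
        # The ratio is only meaningful here when the average margin is positive.
        raise ValueError("margin mean must be positive")
    return m.std() / mean  # population standard deviation (ddof=0)


# Toy example: three samples, two classes.
scores = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7]]
labels = [0, 0, 1]
print(margins(scores, labels))       # margins of about 0.8, 0.6, 0.4
print(margin_ratio(scores, labels))  # roughly 0.272
```

A smaller value means the margins are concentrated around a positive mean, which is the regime in which the abstract states that the tighter bound holds.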