Information Discrepancy in Strategic Learning
This paper addresses fairness and efficiency issues in strategic learning for applications such as finance and admissions. Its contribution is incremental: it extends the existing strategic learning literature with a focus on information asymmetry.
The paper studies non-transparent decision rules, as in loan approvals, where individuals lack full knowledge of the rule and infer it from their peers. This can produce negative externalities, such as a deterioration in the true quality of some groups. The paper also shows that, in many natural cases, optimal improvement can be guaranteed for all sub-populations simultaneously, and it supports the theoretical analysis with experiments on real-world datasets.
We initiate the study of the effects of non-transparency in decision rules on individuals' ability to improve in strategic learning settings. Inspired by real-life settings, such as loan approvals and college admissions, we remove the assumption typically made in the strategic learning literature, that the decision rule is fully known to individuals, and focus instead on settings where it is inaccessible. In their lack of knowledge, individuals try to infer this rule by learning from their peers (e.g., friends and acquaintances who previously applied for a loan), naturally forming groups in the population, each with a possibly different type and level of information regarding the decision rule. We show that, in equilibrium, the principal's decision rule optimizing welfare across sub-populations may cause a strong negative externality: the true quality of some of the groups can actually deteriorate. On the positive side, we show that, in many natural cases, optimal improvement can be guaranteed simultaneously for all sub-populations. We further introduce a measure we term the information overlap proxy, and demonstrate its usefulness in characterizing the disparity in improvements across sub-populations. Finally, we identify a natural condition under which improvement can be guaranteed for all sub-populations while maintaining high predictive accuracy. We complement our theoretical analysis with experiments on real-world datasets.
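The negative externality described in the abstract can be pictured with a small toy sketch. This is not the paper's model or code; it assumes a linear decision rule, groups that fit a minimum-norm least-squares estimate of that rule from their peers' features and scores, and agents who spend a fixed effort budget along their group's estimate. All names (`inferred_rule`, `quality_change`, `w`, `w_star`, the two peer matrices) are hypothetical and chosen only for illustration.

```python
import numpy as np

def inferred_rule(peer_features, w):
    """Assumed inference step: a group sees only its peers' features and the
    scores they received, and fits the minimum-norm least-squares rule.
    This equals the projection of w onto the span of the peers' features."""
    scores = peer_features @ w
    return np.linalg.pinv(peer_features) @ scores

def quality_change(peer_features, w, w_star, budget=1.0):
    """Assumed response: each agent moves `budget` units along the group's
    inferred rule. Returns the resulting change in true quality, measured
    by the (hypothetical) true quality direction w_star."""
    w_hat = inferred_rule(peer_features, w)
    norm = np.linalg.norm(w_hat)
    if norm < 1e-12:
        return 0.0  # the group learned nothing useful, so nobody moves
    step = budget * w_hat / norm
    return float(w_star @ step)

# Hypothetical setup: true quality depends only on feature 0, but the
# deployed rule also rewards feature 1.
w_star = np.array([1.0, 0.0])
w = np.array([1.0, 1.0])

# Group A's peers vary only in feature 0; group B's peers all lie on the
# line spanned by (1, -2), so B sees a different slice of the rule.
peers_a = np.array([[1.0, 0.0], [2.0, 0.0]])
peers_b = np.array([[1.0, -2.0], [2.0, -4.0]])

change_a = quality_change(peers_a, w, w_star)  # positive: group A improves
change_b = quality_change(peers_b, w, w_star)  # negative: group B deteriorates
print(change_a, change_b)
```

In this toy setup both groups face the same deployed rule, yet group B's partial view points its effort in a direction that lowers its true quality. The same rule applied to a differently informed group yields improvement, which is the kind of disparity across sub-populations the abstract refers to.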