Sep 9, 2024

FairHome: A Fair Housing and Fair Lending Dataset

arXiv:2409.05990v1 | 1 citation | h-index: 17
Originality: Synthesis-oriented
AI Analysis

This paper addresses the need for fair housing and fair lending compliance tools, but its contribution is incremental: it primarily introduces a new dataset.

The authors address the detection of compliance risk in housing and lending by creating FairHome, which is, to the best of their knowledge, the first publicly available dataset with binary compliance-risk labels for the housing domain. It contains about 75,000 examples across 9 protected categories. A classifier trained on it achieved an F1-score of 0.91, outperforming GPT-3.5, GPT-4, LLaMA-3, and Mistral Large in both zero-shot and few-shot settings.

We present a Fair Housing and Fair Lending dataset (FairHome): a dataset with around 75,000 examples across 9 protected categories. To the best of our knowledge, FairHome is the first publicly available dataset labeled with binary labels for compliance risk in the housing domain. We demonstrate the usefulness and effectiveness of such a dataset by training a classifier and using it to detect potential violations when using a large language model (LLM) in the context of real-estate transactions. We benchmark the trained classifier against state-of-the-art LLMs including GPT-3.5, GPT-4, LLaMA-3, and Mistral Large in both zero-shot and few-shot contexts. Our classifier outperformed these models with an F1-score of 0.91, underscoring the effectiveness of our dataset.
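For readers unfamiliar with the metric reported above, the following is a minimal sketch of how an F1-score is computed for binary labels. It assumes the positive class (label 1) marks a compliance risk; the abstract does not say which averaging scheme the authors used, so this is an illustration of the standard binary F1 rather than the paper's evaluation code. The function name `f1_score_binary` and the toy labels are hypothetical and are not drawn from FairHome.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (hypothetical): 1 = compliance risk, 0 = no risk.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

score = f1_score_binary(y_true, y_pred)
print(f"F1 = {score:.2f}")
```

On these toy labels there are 3 true positives, 1 false positive, and 1 false negative, so precision and recall are both 0.75 and so is the F1-score.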

