Djibril Sarr

2 Papers

DB · Sep 16, 2024
Towards Explainable Automated Data Quality Enhancement without Domain Knowledge

Djibril Sarr

In the era of big data, ensuring the quality of datasets has become increasingly crucial across various domains. We propose a comprehensive framework designed to automatically assess and rectify data quality issues in any given dataset, regardless of its specific content, focusing on both textual and numerical data. Our primary objective is to address three fundamental types of defects: absence, redundancy, and incoherence. At the heart of our approach lies a rigorous demand for both explainability and interpretability, ensuring that the rationale behind the identification and correction of data anomalies is transparent and understandable. To achieve this, we adopt a hybrid approach that integrates statistical methods with machine learning algorithms. By leveraging statistical techniques alongside machine learning, we strike a balance between accuracy and explainability, enabling users to trust and comprehend the assessment process. Acknowledging the challenges associated with automating the data quality assessment process, particularly in terms of time efficiency and accuracy, we adopt a pragmatic strategy, employing resource-intensive algorithms only when necessary and favoring simpler, more efficient solutions whenever possible. Through a practical analysis conducted on a publicly available dataset, we illustrate the challenges that arise when trying to enhance data quality while preserving explainability. We demonstrate the effectiveness of our approach in detecting and rectifying missing values, duplicates, and typographical errors, and we discuss the challenges that remain before similar accuracy can be achieved on statistical outliers and logic errors under the constraints set in our work.
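To make two of the defect types named in the abstract concrete (absence and redundancy), here is a minimal sketch of rule-based checks whose findings are easy to explain. It is not the paper's framework: the helper names (`is_missing`, `find_missing`, `find_duplicates`), the list of placeholder strings, and the sample rows are all assumptions made for illustration.

```python
from typing import Any

# Assumed placeholder strings that count as "absent"; not taken from the paper.
MISSING_MARKERS = {"", "na", "n/a", "null", "none"}


def is_missing(value: Any) -> bool:
    """Treat None and common placeholder strings as absent values."""
    if value is None:
        return True
    return isinstance(value, str) and value.strip().lower() in MISSING_MARKERS


def find_missing(rows: list[dict[str, Any]]) -> list[tuple[int, str]]:
    """Return (row index, column) pairs whose value is absent."""
    return [
        (i, col)
        for i, row in enumerate(rows)
        for col, value in row.items()
        if is_missing(value)
    ]


def find_duplicates(rows: list[dict[str, Any]]) -> list[tuple[int, int]]:
    """Return (duplicate index, first occurrence index) for exact repeats."""
    first_seen: dict[tuple, int] = {}
    duplicates = []
    for i, row in enumerate(rows):
        # Column order does not matter; repr keeps 30 and "30" distinct.
        key = tuple(sorted((col, repr(value)) for col, value in row.items()))
        if key in first_seen:
            duplicates.append((i, first_seen[key]))
        else:
            first_seen[key] = i
    return duplicates


# Hypothetical sample data.
rows = [
    {"name": "Alice", "age": 30},
    {"name": "Bob", "age": None},
    {"name": "Alice", "age": 30},
    {"name": "", "age": 41},
]

print(find_missing(rows))
print(find_duplicates(rows))
```

Each finding points to a specific cell or pair of rows and to the rule that flagged it, which is the kind of traceability the abstract asks for. Near-duplicates, typographical errors, outliers, and logic errors need more than exact matching and are outside this sketch.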

STOct 28, 2021
Deep Calibration of Interest Rates Model

Mohamed Ben Alaya, Ahmed Kebaier, Djibril Sarr

For any financial institution, it is essential to understand the behavior of interest rates. Despite the growing use of Deep Learning, for many reasons (expertise, ease of use, etc.), classic rate models such as CIR and the Gaussian family are still widely used. In this paper, we propose to calibrate the five parameters of the G2++ model using Neural Networks. Our first model is a Fully Connected Neural Network and is trained on covariances and correlations of Zero-Coupon and Forward rates. We show that covariances are more suited to the problem than correlations due to the effects of the unfeasible backpropagation phenomenon, which we analyze in this paper. The second model is a Convolutional Neural Network trained on Zero-Coupon rates with no further transformation. Our numerical tests show that our calibration based on deep learning outperforms the classic calibration method used as a benchmark. Additionally, our Deep Calibration approach is designed to be systematic. To illustrate this feature, we applied it to calibrate the popular CIR intensity model.
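For reference, the five parameters mentioned in the abstract can be read off the standard two-factor additive Gaussian (G2++) formulation of the short rate. The block below assumes the paper follows that textbook formulation; the notation is the usual one and is not taken from the abstract.

```latex
\begin{aligned}
r(t)  &= x(t) + y(t) + \varphi(t), \\
dx(t) &= -a\,x(t)\,dt + \sigma\,dW_1(t), \qquad x(0) = 0, \\
dy(t) &= -b\,y(t)\,dt + \eta\,dW_2(t),   \qquad y(0) = 0, \\
d\langle W_1, W_2 \rangle_t &= \rho\,dt .
\end{aligned}
```

Under this assumption the calibration targets are the mean-reversion speeds $a$ and $b$, the volatilities $\sigma$ and $\eta$, and the correlation $\rho$, while the deterministic shift $\varphi(t)$ is fixed by fitting the initial term structure.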