Alex Kyllo

LG
3 papers
10 citations
Novelty: 40%
AI Score: 37

3 Papers

35.3 SE May 30
GitHub Copilot and Developer Productivity: An Observational Dose-Response Analysis

Alex Heilman, Alex Kyllo, Emerson Murphy-Hill

Does GitHub Copilot (GHCP) make engineers more productive, or do the engineers who use it more differ from those who use it less? And even within a single engineer, are GHCP-heavy weeks just busy weeks in which more of everything gets done? We study these questions using 43 weeks of data from 16,223 software engineers across Microsoft's Cloud+AI organization. Engineer fixed effects address the first concern by comparing each engineer against themselves rather than against other engineers, eliminating time-invariant differences in skill, role, and team. Active coding time and browser time then enter a Poisson Pseudo-Maximum Likelihood model with two-way fixed effects to address the harder, within-engineer confound: that GHCP-heavy weeks coincide with high-effort weeks. This defines our estimand as an efficiency effect: more pull requests completed at equivalent levels of coding time. Engineers are estimated to complete 40.5% more PRs in their highest GHCP usage weeks relative to their zero-usage weeks, holding measured development effort constant. The gradient is monotonic with diminishing returns at high intensity. Seven robustness and falsification tests target the remaining plausible alternative explanations (non-coding AI engagement, team-level shocks, within-week task reallocation, cross-week contamination, PR slicing into smaller units, shifts toward easier task types, and sensitivity to how the treatment is operationalized). Under an explicitly stated conditional-independence assumption, the within-engineer design estimates a tool-specific efficiency effect that is consistent with all seven robustness tests.
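As a reading aid for the 40.5% figure: Poisson-type models use a log link, so a coefficient β on a usage-intensity indicator is conventionally reported as a percent change of exp(β) − 1. The sketch below assumes the abstract's figure follows that convention, which the abstract does not state. The function names are illustrative and are not taken from the paper.

```python
import math


def coef_to_pct(beta: float) -> float:
    """Percent change in the expected count implied by a log-link coefficient."""
    return (math.exp(beta) - 1.0) * 100.0


def pct_to_coef(pct: float) -> float:
    """Inverse mapping: the log-link coefficient implied by a percent change."""
    return math.log(1.0 + pct / 100.0)


# Assumption: the reported 40.5% is exp(beta) - 1 for the highest-usage bin
# relative to zero-usage weeks. The implied coefficient is then:
beta_top = pct_to_coef(40.5)
print(f"implied coefficient: {beta_top:.3f}")
print(f"round trip: {coef_to_pct(beta_top):.1f}%")
```

Under this assumption the implied coefficient is about 0.34 on the log scale. Because effort covariates are held fixed in the model, the figure compares weeks with equal measured effort rather than raw PR counts.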

LG Jan 19, 2022
Privacy-Aware Human Mobility Prediction via Adversarial Networks

Yuting Zhan, Alex Kyllo, Afra Mashhadi et al.

As mobile devices and location-based services are increasingly deployed in smart city scenarios and applications, many unexpected privacy leakages have arisen from geolocated data collection and sharing. While these geolocated data could provide a rich understanding of human mobility patterns and address various societal research questions, privacy concerns for users' sensitive information have limited their utilization. In this paper, we design and implement a novel LSTM-based adversarial mechanism with representation learning to attain a privacy-preserving feature representation of the original geolocated data (mobility data) for sharing purposes. We quantify the utility-privacy trade-off of mobility datasets in terms of trajectory reconstruction risk, user re-identification risk, and mobility predictability. Our proposed architecture reports a Pareto frontier analysis that enables the user to assess this trade-off as a function of Lagrangian loss weight parameters. Extensive comparison results on four representative mobility datasets demonstrate the superiority of our proposed architecture and the efficiency of the proposed privacy-preserving feature extractor. Our results show that by exploring Pareto-optimal settings, we can simultaneously increase both privacy (45%) and utility (32%).
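The Pareto frontier idea in this abstract can be illustrated with a minimal sketch: given one (privacy, utility) score pair per loss-weight setting, keep only the settings that no other setting beats on both axes. The numbers below are invented for illustration and are not results from the paper. The single scalar weight is also a simplification, since the abstract refers to several Lagrangian loss weight parameters.

```python
from typing import List, Tuple

# (loss_weight, privacy_score, utility_score); higher is better for both scores.
Point = Tuple[float, float, float]


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the points that are not dominated on (privacy, utility)."""
    front = []
    for p in points:
        dominated = any(
            q[1] >= p[1] and q[2] >= p[2] and (q[1] > p[1] or q[2] > p[2])
            for q in points
        )
        if not dominated:
            front.append(p)
    return front


# Hypothetical scores for five weight settings (made up, not from the paper).
candidates: List[Point] = [
    (0.1, 0.20, 0.90),
    (0.3, 0.45, 0.85),
    (0.5, 0.40, 0.80),  # dominated by the 0.3 setting
    (0.7, 0.70, 0.60),
    (0.9, 0.65, 0.55),  # dominated by the 0.7 setting
]

front = pareto_front(candidates)
for weight, privacy, utility in front:
    print(f"weight={weight}: privacy={privacy}, utility={utility}")
```

Each setting on the frontier is a defensible choice. Moving from a dominated setting to a frontier setting is the only way to improve privacy and utility at the same time.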

LG Jan 17, 2022
Fairness in Federated Learning for Spatial-Temporal Applications

Afra Mashhadi, Alex Kyllo, Reza M. Parizi

Federated learning involves training statistical models over remote devices such as mobile phones while keeping data localized. Training in heterogeneous and potentially massive networks introduces opportunities for privacy-preserving data analysis and for diversifying these models to become more inclusive of the population. Federated learning can be viewed as a unique opportunity to bring fairness and parity to many existing models by enabling model training to happen on a diverse set of participants and on data that is generated regularly and dynamically. In this paper, we discuss the current metrics and approaches available to measure and evaluate fairness in the context of spatial-temporal models. We propose how these metrics and approaches can be redefined to address the challenges faced in the federated learning setting.