A Federated Learning Benchmark for Drug-Target Interaction
This work addresses the data-privacy barriers to pooling data for drug discovery in the pharmaceutical industry. It is, however, incremental, since it applies existing federated learning techniques to a specific domain.
The paper tackles the challenge of aggregating pharmaceutical data in the drug-target interaction (DTI) domain by applying federated learning (FL), which avoids sharing sensitive information. Using a GraphDTA model on the KIBA dataset, it reports up to 15% better performance than the best available non-privacy-preserving alternative. It also finds that the non-IID data distribution in DTI datasets does not harm FL performance, and it identifies a trade-off between the benefits of adding new data and the cost of adding more clients.
Aggregating pharmaceutical data in the drug-target interaction (DTI) domain has the potential to deliver life-saving breakthroughs. It is, however, notoriously difficult due to regulatory constraints and commercial interests. This work proposes the application of federated learning, which we argue is reconcilable with the industry's constraints, as it does not require sharing any information that would reveal the entities' data or any other high-level summary of it. When used on a representative GraphDTA model and the KIBA dataset, it achieves up to 15% improved performance relative to the best available non-privacy-preserving alternative. Our extensive battery of experiments shows that, unlike in other domains, the non-IID data distribution in the DTI datasets does not deteriorate FL performance. Additionally, we identify a material trade-off between the benefits of adding new data and the cost of adding more clients.
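The abstract does not say which aggregation rule the federation uses. As a minimal sketch, assuming plain federated averaging (FedAvg) and a toy linear model standing in for GraphDTA, the code below illustrates the property the argument rests on: each client trains on its own records, and only model parameters travel to the server. The two toy clients deliberately draw features from different ranges, loosely mimicking a non-IID split. All function names and data here are hypothetical and not taken from the paper.

```python
from typing import List, Tuple

Weights = List[float]
# Hypothetical record format: (feature vector, affinity label).
Dataset = List[Tuple[List[float], float]]


def local_train(weights: Weights, data: Dataset, lr: float = 0.01, epochs: int = 5) -> Weights:
    """Full-batch gradient descent on squared error, using only this client's data."""
    w = list(weights)
    n = len(data)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += 2.0 * err * xj / n
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w


def fed_avg(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Server step: average parameters, weighting each client by its sample count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]


def run_rounds(global_w: Weights, clients: List[Dataset], rounds: int) -> Weights:
    """Each round: clients train locally; only their parameters reach the server."""
    for _ in range(rounds):
        updates = [local_train(global_w, data) for data in clients]
        global_w = fed_avg(updates, [len(d) for d in clients])
    return global_w


if __name__ == "__main__":
    # Toy data: label = 2 * feature + 1 (the constant 1.0 acts as a bias input).
    client_a = [([x, 1.0], 2.0 * x + 1.0) for x in [0.0, 1.0, 2.0, 3.0]]
    client_b = [([x, 1.0], 2.0 * x + 1.0) for x in [1.0, 4.0]]
    final = run_rounds([0.0, 0.0], [client_a, client_b], rounds=300)
    print([round(w, 2) for w in final])  # expected to be close to [2.0, 1.0]
```

Weighting by sample count is the usual FedAvg choice; it lets clients with more data pull the global model harder, which is one place where the cost and benefit of adding a client shows up. The server never sees `client_a` or `client_b` themselves, only the vectors returned by `local_train`.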