Chengzhi Huang

LG · Dec 13, 2021

Challenges and Solutions to Build a Data Pipeline to Identify Anomalies in Enterprise System Performance

Xiaobo Huang, Amitabha Banerjee, Chien-Chia Chen et al.

We discuss how VMware is addressing two challenges in harnessing data for our ML-based anomaly detection system, which detects performance issues in our Software-Defined Data Center (SDDC) enterprise deployments: (i) label scarcity and label bias caused by heavy dependence on human annotators, which does not scale, and (ii) data drift caused by ever-changing workload patterns, software stacks, and underlying hardware. Our anomaly detection system has been deployed in production for many years and has successfully detected numerous major performance issues. We demonstrate that by addressing these data challenges, we not only improve the accuracy of our performance anomaly detection model by 30% but also ensure that the model's performance never degrades over time.

CL · Mar 31, 2021

Zero-Shot Language Transfer vs Iterative Back Translation for Unsupervised Machine Translation

Aviral Joshi, Chengzhi Huang, Har Simrat Singh

This work compares two approaches to machine translation for low-resource language pairs: zero-shot transfer learning and unsupervised machine translation (MT). We discuss how data size affects the performance of both unsupervised MT and transfer learning. We also examine how the domain of the data affects the results of unsupervised MT. The code for all experiments in this project is available on GitHub.
