Alex Cheng

AI
3 papers
24 citations
Novelty 53%
AI Score 43

3 Papers

CL · Jun 23, 2024 · Code
Zero-Shot Cross-Lingual NER Using Phonemic Representations for Low-Resource Languages

Jimin Sohn, Haeji Jung, Alex Cheng et al.

Existing zero-shot cross-lingual NER approaches require substantial prior knowledge of the target language, which is impractical for low-resource languages. In this paper, we propose a novel approach to NER using phonemic representations based on the International Phonetic Alphabet (IPA) to bridge the gap between representations of different languages. Our experiments show that our method significantly outperforms baseline models in extremely low-resource languages, with the highest average F1 score (46.38%) and lowest standard deviation (12.67), particularly demonstrating its robustness with non-Latin scripts. Our code is available at https://github.com/Gabriel819/zeroshot_ner.git

75.4 · LG · Apr 24
Preserving Long-Tailed Expert Information in Mixture-of-Experts Tuning

Haoze He, Xingyuan Ding, Xuan Jiang et al.

Although MoE models lead many benchmarks, supervised fine-tuning (SFT) of MoE architectures remains difficult because their router layers are fragile. Methods such as DenseMixer and ESFT mitigate router collapse with dense mixing or auxiliary load-balancing losses, but these introduce noisy gradients that often degrade performance. In preliminary experiments, we systematically pruned experts and observed that while certain super experts are activated far more frequently, discarding less-used experts still leads to notable performance degradation. This suggests that even rarely activated experts encode non-trivial knowledge useful for downstream tasks. Motivated by this, we propose an auxiliary-loss-free MoE SFT framework that combines bias-driven sparsification with always-active gated condenser experts. Rather than enforcing balanced activation across all experts, our method encourages task-relevant experts to remain active while pushing long-tailed experts toward inactivity. The condenser experts provide a persistent, learnable pathway that alleviates gradient starvation and facilitates consolidation of information that would otherwise remain fragmented across sparsely activated experts. Analysis further suggests that this design better preserves long-tailed expert information under sparse routing. Experiments on large-scale MoE models demonstrate that our approach outperforms state-of-the-art SFT baselines such as DenseMixer and ESFT, achieving an average gain of more than 2.5% on both mathematical reasoning and commonsense QA benchmarks.

AI · May 6, 2018
Automated Diagnosis of Clinic Workflows

Alex Cheng, Jules White

Outpatient clinics often run behind schedule because patients arrive late or appointments run longer than expected. We sought to develop a generalizable method that would allow healthcare providers to diagnose workflow problems that disrupt the schedule on any given provider clinic day. We formulate a constraint optimization problem to identify the smallest number of appointment modifications that would make the rest of the schedule run on time. We apply this method to an outpatient clinic at Vanderbilt. For patients seen in this clinic between March 27, 2017 and April 21, 2017, long cycle times tended to affect the overall schedule more than late patients. Results from this workflow diagnosis method could be used to inform interventions to help clinics run smoothly, thus decreasing patient wait times and increasing provider utilization.
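
The idea of finding the fewest appointment modifications that bring a clinic day back on schedule can be illustrated with a small sketch. This is not the paper's formulation: it assumes a single provider, replaces the constraint solver with an exhaustive search over subsets, and treats a "modification" as resetting an appointment to its planned arrival time and planned length. The names `Appointment`, `runs_on_time`, and `minimal_fixes`, and the sample `day`, are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Appointment:
    # All times are minutes after the clinic opens (an assumption of this sketch).
    scheduled_start: int
    scheduled_length: int
    arrival: int        # when the patient actually arrived
    actual_length: int  # how long the visit actually took


def runs_on_time(appts, fixed, tolerance=0):
    """Replay the day for one provider; appointments whose index is in
    `fixed` are reset to their planned arrival and length."""
    free_at = 0  # time at which the provider becomes available
    for i, a in enumerate(appts):
        if i in fixed:
            arrival, length = a.scheduled_start, a.scheduled_length
        else:
            arrival, length = a.arrival, a.actual_length
        start = max(free_at, arrival, a.scheduled_start)
        if start > a.scheduled_start + tolerance:
            return False  # this visit would start late
        free_at = start + length
    return True


def minimal_fixes(appts, tolerance=0):
    """Return a smallest set of appointment indices whose reset makes every
    visit start on time (brute force, so only suitable for short days)."""
    indices = range(len(appts))
    for k in range(len(appts) + 1):
        for subset in combinations(indices, k):
            if runs_on_time(appts, set(subset), tolerance):
                return set(subset)
    return set(indices)  # unreachable: resetting everything always succeeds


# Hypothetical clinic day: the second visit runs 20 minutes long.
day = [
    Appointment(0, 30, 0, 30),
    Appointment(30, 30, 30, 50),
    Appointment(60, 30, 60, 30),
    Appointment(90, 30, 85, 30),
]
print(minimal_fixes(day))
```

The exhaustive search grows exponentially with the number of appointments, which is one reason a real implementation would more plausibly hand the same constraints to a dedicated solver.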