CV Jan 1
Towards Automated Differential Diagnosis of Skin Diseases Using Deep Learning and Imbalance-Aware Strategies
Ali Anaissi, Ali Braytee, Weidong Huang et al.
As dermatological conditions become increasingly common and the availability of dermatologists remains limited, there is a growing need for intelligent tools to support both patients and clinicians in the timely and accurate diagnosis of skin diseases. In this project, we developed a deep learning-based model for the classification and diagnosis of skin conditions. By leveraging pretraining on publicly available skin disease image datasets, our model effectively extracted visual features and accurately classified various dermatological cases. Throughout the project, we refined the model architecture, optimized data preprocessing workflows, and applied targeted data augmentation techniques to improve overall performance. The final model, based on the Swin Transformer, achieved a prediction accuracy of 87.71 percent across eight skin lesion classes on the ISIC2019 dataset. These results demonstrate the model's potential as a diagnostic support tool for clinicians and a self-assessment aid for patients.
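The abstract does not say which imbalance-aware strategy the authors used. As a minimal sketch of one common option, the snippet below computes inverse-frequency ("balanced") class weights, weight_c = N / (K * n_c), which could be passed to a weighted loss. The function name and the toy label list are hypothetical and are not taken from the paper.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return weight_c = N / (K * n_c) for every class c found in labels.

    N is the number of samples, K the number of distinct classes,
    and n_c the number of samples belonging to class c.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * n) for c, n in counts.items()}


# Hypothetical, heavily imbalanced toy labels (not the real ISIC2019 counts).
labels = ["common"] * 6 + ["less_common"] * 3 + ["rare"] * 1
weights = balanced_class_weights(labels)

# Rare classes receive larger weights, so their errors count more in the loss.
for name, weight in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{name}: {weight:.3f}")
```

With this weighting, the weighted sample count (the sum of weight times class count) equals the original number of samples, so the overall loss scale is preserved.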
QM Jan 1
Benchmarking Preprocessing and Integration Methods in Single-Cell Genomics
Ali Anaissi, Seid Miad Zandavi, Weidong Huang et al.
Single-cell data analysis has the potential to revolutionize personalized medicine by characterizing disease-associated molecular changes at the single-cell level. Advanced single-cell multimodal assays can now simultaneously measure various molecules (e.g., DNA, RNA, protein) across hundreds of thousands of individual cells, providing a comprehensive molecular readout. A significant analytical challenge is integrating single-cell measurements across different modalities. Various methods have been developed to address this challenge, but there has been no systematic evaluation of these techniques with different preprocessing strategies. This study examines a general pipeline for single-cell data analysis, which includes normalization, data integration, and dimensionality reduction. The performance of different algorithm combinations often depends on dataset size and characteristics. We evaluate six datasets across diverse modalities, tissues, and organisms using three metrics: Silhouette Coefficient Score, Adjusted Rand Index, and Calinski-Harabasz Index. Our experiments involve combinations of seven normalization methods, four dimensionality reduction methods, and five integration methods. The results show that Seurat and Harmony excel in data integration, with Harmony being more time-efficient, especially for large datasets. UMAP is the dimensionality reduction method most compatible with the integration techniques, and the best choice of normalization method depends on the integration method used.
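Of the three metrics named above, the Adjusted Rand Index compares a predicted clustering against reference labels (for example, known cell types). The following is a minimal pure-Python sketch of the standard pair-counting formula. It is an illustration only, not the authors' evaluation code, and the function name is an assumption.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")

    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # zero or one item: the labelings trivially agree

    # Contingency-table cells and their row/column marginals.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Count pairs of items placed together in each cell, row, and column.
    sum_cells = sum(comb(v, 2) for v in cells.values())
    sum_rows = sum(comb(v, 2) for v in rows.values())
    sum_cols = sum(comb(v, 2) for v in cols.values())

    expected = sum_rows * sum_cols / total_pairs  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are a single cluster
    return (sum_cells - expected) / (max_index - expected)


# Cluster names do not matter, only the grouping does.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
# Splitting one true cluster in two lowers the score.
print(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2]))
```

A score of 1 means the two groupings are identical up to renaming, values near 0 indicate chance-level agreement, and negative values indicate agreement below chance.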
66.0 CY Mar 24
Sibling Rivalry in the Ivory Tower: Mass Science, Expanding Scholarly Families, and the Reshaping of Academic Stratification
Likun Cao, Jie Hua, James Evans
This paper investigates mechanisms underlying scientific stratification in the transition from elite to mass science. Existing scholarship has examined stratification through the Matthew effect framework, but this approach is increasingly limited as mass, team-based research becomes dominant. While scientists now share institutions and lineages, substantial career outcome differences remain unexplained. We propose integrating demographic concepts into science studies. Drawing parallels between biological families and scholarly lineages as fundamental reproductive units, we adapt the birth order concept to examine how doctoral student sequence within a lineage shapes career trajectories. Using data on over one million U.S. doctoral graduates, we find that later students of the same advisor systematically underperform earlier ones across multiple achievement dimensions, both short and long term. Examining underlying mechanisms reveals that although advisors invest comparable resources in all students, later students receive less cognitive stimulation from mature scholars than peers and specialize in narrower niches under peer differentiation pressure. Both of these factors constrain intellectual development and subsequent success. By introducing a demographic framework, this paper offers new perspectives on scientific stratification and demonstrates how demographic concepts can fruitfully analyze broader social and epistemic systems.
IR Aug 4, 2025
Realizing Scaling Laws in Recommender Systems: A Foundation-Expert Paradigm for Hyperscale Model Deployment
Dai Li, Kevin Course, Wei Li et al.
While scaling laws promise significant performance gains for recommender systems, efficiently deploying hyperscale models remains a major unsolved challenge. In contrast to fields where foundation models (FMs) are already widely adopted, such as natural language processing and computer vision, progress in recommender systems is hindered by unique challenges: the need to learn from online streaming data under shifting data distributions, the need to adapt to different recommendation surfaces with widely varying downstream tasks and input distributions, and stringent latency and computational constraints. To bridge this gap, we propose to leverage the Foundation-Expert Paradigm: a framework designed for the development and deployment of hyperscale recommendation FMs. In our approach, a central FM is trained on lifelong, cross-surface, multi-modal user data to learn generalizable knowledge. This knowledge is then efficiently transferred to various lightweight, surface-specific "expert" models via target-aware embeddings, allowing them to adapt to local data distributions and optimization goals with minimal overhead. To meet our training, inference, and development needs, we built HyperCast, a production-grade infrastructure system that re-engineers training, serving, logging, and iteration to power this decoupled paradigm. Our approach is now deployed at Meta, serving tens of billions of user requests daily and demonstrating online metric improvements over our previous one-stage production system while improving developer velocity and maintaining infrastructure efficiency. To the best of our knowledge, this work represents the first successful deployment of a Foundation-Expert paradigm at this scale, offering a proven, compute-efficient, and developer-friendly blueprint to realize the promise of scaling laws in recommender systems.
HC Sep 30, 2021
Dataset: Analysis of IFTTT Recipes to Study How Humans Use Internet-of-Things (IoT) Devices
Haoxiang Yu, Jie Hua, Christine Julien
With the rapid development and usage of Internet-of-Things (IoT) and smart-home devices, researchers continue efforts to improve the "smartness" of those devices to address daily needs in people's lives. Such efforts usually begin with understanding evolving user behavior: how people use the devices and what behavior they expect from them. However, while research efforts abound, very few datasets are available that researchers can use both to understand how people use IoT devices and to evaluate algorithms or systems for smart spaces. In this paper, we collect and characterize more than 50,000 recipes from the online If-This-Then-That (IFTTT) service to understand a seemingly straightforward but complicated question: "What kinds of behaviors do humans expect from their IoT devices?"
LG Mar 24, 2021
Opportunistic Federated Learning: An Exploration of Egocentric Collaboration for Pervasive Computing Applications
Sangsu Lee, Xi Zheng, Jie Hua et al.
Pervasive computing applications commonly involve users' personal smartphones collecting data to influence application behavior. Applications are often backed by models that learn from the user's experiences to provide personalized and responsive behavior. While models are often pre-trained on massive datasets, federated learning has gained attention for its ability to train globally shared models on users' private data without requiring the users to share their data directly. However, federated learning requires devices to collaborate via a central server, under the assumption that all users desire to learn the same model. We define a new approach, opportunistic federated learning, in which individual devices belonging to different users seek to learn robust models that are personalized to their user's own experiences. Instead of learning in isolation, however, these models opportunistically incorporate the learned experiences of other devices they encounter. In this paper, we explore the feasibility and limits of such an approach, culminating in a framework that supports encounter-based pairwise collaborative learning. The use of our opportunistic encounter-based learning amplifies the performance of personalized learning while resisting overfitting to encountered data.
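The abstract does not spell out how a device incorporates what an encountered peer has learned. As a rough illustration of the general idea of pairwise, serverless collaboration, the sketch below blends a peer's parameters into the local model with a plain weighted average. The function name, the `trust` parameter, and the dictionary-of-lists model layout are all assumptions for illustration and are not the authors' framework.

```python
def merge_on_encounter(local, peer, trust=0.25):
    """Blend a peer's parameters into the local model.

    local, peer: dicts mapping parameter names to lists of floats.
    trust: weight given to the peer; the local model keeps (1 - trust),
           so a small value keeps the result close to the local model.
    Returns a new dict and leaves both inputs unchanged.
    """
    if not 0.0 <= trust <= 1.0:
        raise ValueError("trust must lie in [0, 1]")
    if local.keys() != peer.keys():
        raise ValueError("models must share the same parameter names")

    merged = {}
    for name, local_values in local.items():
        peer_values = peer[name]
        if len(local_values) != len(peer_values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [
            (1.0 - trust) * l + trust * p
            for l, p in zip(local_values, peer_values)
        ]
    return merged


# Two hypothetical devices meet and exchange parameters directly, with no server.
device_a = {"w": [1.0, 2.0], "b": [0.0]}
device_b = {"w": [3.0, 0.0], "b": [4.0]}
device_a_after = merge_on_encounter(device_a, device_b, trust=0.25)
print(device_a_after)
```

Keeping `trust` well below 0.5 is one simple way to let the local model stay personalized while still benefiting from an encounter. A real system would also need to decide when an encounter is worth learning from at all.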
CY Nov 1, 2019
rIoT: Enabling Seamless Context-Aware Automation in the Internet of Things
Jie Hua, Chenguang Liu, Tomasz Kalbarczyk et al.
Advances in mobile computing capabilities and an increasing number of Internet of Things (IoT) devices have enriched the possibilities of the IoT but have also increased the cognitive load required of IoT users. Existing context-aware systems provide various levels of automation in the IoT. Many of these systems adaptively make decisions on how to provide services based on assumptions made a priori. These approaches are difficult to personalize to an individual's dynamic environment, and thus today's smart IoT spaces often demand complex and specialized interactions with the user in order to provide tailored services. We propose rIoT, a framework for seamless and personalized automation of human-device interaction in the IoT. rIoT leverages existing technologies to operate across heterogeneous devices and networks to provide a one-stop solution for device interaction in the IoT. We show how rIoT exploits similarities between contexts and employs a decision-tree-like method to adaptively capture a user's preferences from a small number of interactions with the IoT space. We measure the performance of rIoT on two real-world data sets and a real mobile device in terms of accuracy, learning speed, and latency in comparison to two state-of-the-art machine learning algorithms.