HC · Mar 17
One Kiss: Emojis as Agents of Genre Flux in Generative Comics
Xiruo Wang, Xinyi Jiang, Ziqi Lyu
Generative AI has made visual storytelling widely accessible, yet current prompt-based interactions often force users into a trade-off between precise control and creative flow. We present One Kiss, a co-creative comic generation system that introduces "Affective Steering". Instead of writing text prompts, users guide the tone of their story through emoji inputs, whose semantic ambiguity becomes a resource rather than a limitation. Unlike traditional text-to-image tools that rely on explicit descriptions, One Kiss uses a dual-stream input in which users define structural pacing by sketching panel frames and set atmospheric tone by pairing keywords with emojis. This mechanism enables "Genre Flux," where emotional inputs accumulate across panels and gradually shift the genre of a story. A preliminary study (N = 6) suggests that this soft steering approach may reframe the user's role from prompt engineer to narrative director, with ambiguity serving as a source of creative surprise rather than a loss of control.
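The abstract does not specify how emoji inputs accumulate across panels, so the following is only one way "Genre Flux" could behave. It is a minimal sketch that assumes a hypothetical emoji-to-genre weight table (`EMOJI_GENRE`) and a hypothetical exponential `decay` factor, neither of which comes from the paper. Each panel's emojis add genre weight to a running state, older panels fade, and the dominant genre is read off after every panel.

```python
from typing import Dict, List, Optional

# Hypothetical emoji-to-genre weights; One Kiss's actual mapping is not described in the abstract.
EMOJI_GENRE: Dict[str, Dict[str, float]] = {
    "💋": {"romance": 1.0},
    "🔪": {"thriller": 1.0},
    "👻": {"horror": 0.8, "comedy": 0.2},
    "😂": {"comedy": 1.0},
}


def genre_flux(panel_emojis: List[List[str]], decay: float = 0.7) -> List[Optional[str]]:
    """Return the dominant genre after each panel.

    `decay` (a hypothetical parameter) discounts earlier panels, so newer
    emotional inputs can gradually pull the story toward a different genre.
    """
    state: Dict[str, float] = {}
    trajectory: List[Optional[str]] = []
    for emojis in panel_emojis:
        # Fade the accumulated affect from earlier panels.
        state = {genre: weight * decay for genre, weight in state.items()}
        # Add this panel's emoji contributions; unknown emojis contribute nothing.
        for emoji in emojis:
            for genre, weight in EMOJI_GENRE.get(emoji, {}).items():
                state[genre] = state.get(genre, 0.0) + weight
        # Dominant genre so far, or None if no known emoji has been seen yet.
        trajectory.append(max(state, key=state.get) if state else None)
    return trajectory
```

For example, `genre_flux([["💋"], ["💋"], ["🔪"], ["🔪"], ["🔪"]])` stays on romance through the first 🔪 panel and tips into thriller on the second, a toy version of the gradual, accumulated shift the abstract describes rather than an abrupt switch.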
LG · Jul 22, 2025
CIMR: Contextualized Iterative Multimodal Reasoning for Robust Instruction Following in LVLMs
Yangshu Yuan, Heng Chen, Xinyi Jiang et al.
The rapid advancement of Large Language Models (LLMs) and Large Vision-Language Models (LVLMs) has enhanced our ability to process and generate human language and visual information. However, these models often struggle with complex, multi-step multi-modal instructions that require logical reasoning, dynamic feedback integration, and iterative self-correction. To address this, we propose CIMR: Contextualized Iterative Multimodal Reasoning, a novel framework that introduces a context-aware iterative reasoning and self-correction module. CIMR operates in two stages: initial reasoning and response generation, followed by iterative refinement using parsed multi-modal feedback. A dynamic fusion module deeply integrates textual, visual, and contextual features at each step. We fine-tune LLaVA-1.5-7B on the Visual Instruction Tuning (VIT) dataset and evaluate CIMR on the newly introduced Multi-modal Action Planning (MAP) dataset. CIMR achieves 91.5% accuracy, outperforming state-of-the-art models such as GPT-4V (89.2%), LLaVA-1.5 (78.5%), MiniGPT-4 (75.3%), and InstructBLIP (72.8%), demonstrating the efficacy of its iterative reasoning and self-correction capabilities in complex tasks.
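The abstract describes a two-stage loop: generate an initial response, then repeatedly refine it using parsed multi-modal feedback. The control flow can be sketched as follows, assuming hypothetical `generate` and `get_feedback` callables that stand in for the fine-tuned LLaVA-1.5-7B model and the feedback parser. The dynamic fusion module is abstracted away, and the stopping rule (stop when the parsed feedback is empty or after `max_iters` rounds) is an assumption of this sketch, not something the abstract states.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical signatures: generate(instruction, previous_response, feedback) -> response,
# get_feedback(response) -> list of parsed feedback items (empty means nothing to fix).
Generator = Callable[[str, Optional[str], List[str]], str]
FeedbackParser = Callable[[str], List[str]]


def iterative_refine(
    instruction: str,
    generate: Generator,
    get_feedback: FeedbackParser,
    max_iters: int = 3,
) -> Tuple[str, int]:
    """Return the final response and the number of refinement rounds performed."""
    # Stage 1: initial reasoning and response generation (no prior response, no feedback).
    response = generate(instruction, None, [])
    for step in range(1, max_iters + 1):
        # Parse feedback on the current response.
        feedback = get_feedback(response)
        if not feedback:
            # Nothing left to correct: stop early.
            return response, step - 1
        # Stage 2: self-correction conditioned on the instruction, the previous response, and the feedback.
        response = generate(instruction, response, feedback)
    return response, max_iters
```

Bounding the loop with `max_iters` keeps inference cost predictable even if the feedback never comes back empty.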
CR · Dec 29, 2021
Physical Layer Security Techniques for Future Wireless Networks
Weiping Shi, Xinyi Jiang, Jinsong Hu et al.
The broadcast nature of wireless communication systems makes wireless transmission extremely susceptible to eavesdropping and even malicious interference. Physical layer security techniques can effectively prevent unauthorized eavesdroppers from intercepting the private information sent by the transmitter, thus ensuring the privacy and security of communication between the transmitter and legitimate users. The development of mobile communication poses new challenges for physical layer security research. This paper provides a comprehensive survey of physical layer security research on various promising mobile technologies, including directional modulation (DM), spatial modulation (SM), covert communication, and intelligent reflecting surface (IRS)-aided communication, among others. Finally, it summarizes future trends and unresolved technical challenges in physical layer security for mobile communications.
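The abstract states no formula, but a quantity many physical layer security schemes aim to improve is the secrecy rate. As background (the classic Gaussian wiretap channel result, not a result of this survey), the secrecy capacity is C_s = [log2(1 + SNR_B) - log2(1 + SNR_E)]^+ bits/s/Hz, where SNR_B and SNR_E are the linear signal-to-noise ratios at the legitimate receiver and at the eavesdropper. Techniques such as DM and IRS-aided beamforming can be read as ways of widening that gap. A minimal sketch:

```python
import math


def db_to_linear(snr_db: float) -> float:
    """Convert an SNR in dB to a linear power ratio."""
    return 10 ** (snr_db / 10)


def secrecy_capacity(snr_bob: float, snr_eve: float) -> float:
    """Gaussian wiretap secrecy capacity in bits/s/Hz, from linear SNRs.

    The [.]^+ operator clamps the result at zero: if the eavesdropper's
    channel is at least as good as the legitimate one, no positive secrecy rate is achievable.
    """
    return max(0.0, math.log2(1 + snr_bob) - math.log2(1 + snr_eve))
```

For instance, with SNR_B = 15 and SNR_E = 3 the secrecy capacity is log2(16) - log2(4) = 2 bits/s/Hz, while swapping the two SNRs gives zero.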
SD · Mar 2, 2021
Virufy: A Multi-Branch Deep Learning Network for Automated Detection of COVID-19
Ahmed Fakhry, Xinyi Jiang, Jaclyn Xiao et al.
Fast and affordable solutions for COVID-19 testing are necessary to contain the spread of the global pandemic and help relieve the burden on medical facilities. Currently, limited testing locations and expensive equipment make it difficult for individuals to get tested, especially in low-resource settings. Researchers have presented models that detect COVID-19 infection status from audio samples recorded in clinical settings [5, 15], suggesting that audio-based artificial intelligence models can be used to identify COVID-19. Such models could be deployed on smartphones for fast, widespread, low-resource testing. However, previous studies trained models on cleaned audio samples collected mainly in clinical settings, whereas audio recorded on average smartphones may be of lower quality and differ from the clean data the models were trained on. This discrepancy may introduce a bias that affects COVID-19 status predictions. To tackle this issue, we propose a multi-branch deep learning network that is trained and tested on crowdsourced data, most of which has not been manually processed or cleaned. The model achieves state-of-the-art results on the COUGHVID dataset [16]. Broken down by category, the results show an AUC of 0.99 for audio samples with COVID-19-positive labels.
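Both Virufy abstracts report results as ROC-AUC. As a reminder of what that number measures, the sketch below computes it directly as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counting half. It is a generic O(P·N) computation on toy scores, not Virufy's evaluation code, and the abstract does not say how its per-category AUC was computed.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC via pairwise comparison of positive (label 1) and negative (label 0) scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half a win
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])` is 0.75: three of the four positive/negative pairs are ranked correctly.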
SD · Nov 26, 2020
Virufy: Global Applicability of Crowdsourced and Clinical Datasets for AI Detection of COVID-19 from Cough
Gunvant Chaudhari, Xinyi Jiang, Ahmed Fakhry et al.
Rapid and affordable methods of testing for COVID-19 infection are essential to reduce infection rates and prevent medical facilities from becoming overwhelmed. Current approaches to detecting COVID-19 require in-person testing with expensive kits that are not always easily accessible. This study demonstrates that crowdsourced cough audio samples recorded on smartphones around the world can be used to develop an AI-based method that accurately predicts COVID-19 infection with an ROC-AUC of 77.1% (75.2%-78.3%). Furthermore, we show that our method generalizes to crowdsourced audio samples from Latin America and clinical samples from South Asia without further training on samples from those regions. As more crowdsourced data is collected, the method can be developed further using various respiratory audio samples to create a cough analysis-based machine learning (ML) solution for COVID-19 detection that can likely generalize globally to all demographic groups in both clinical and non-clinical settings.
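The abstract reports the ROC-AUC together with an interval (75.2%-78.3%) but does not say how that interval was obtained. One common way to produce such an interval is a percentile bootstrap over samples; the sketch below shows that technique on toy data, assuming binary 0/1 labels, and should not be read as the study's actual procedure. The parameters `n_boot`, `alpha`, and `seed` are illustrative choices.

```python
import random
from typing import Sequence, Tuple


def _auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise ROC-AUC (ties count half); assumes both classes are present."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(
    labels: Sequence[int],
    scores: Sequence[float],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (point AUC, lower bound, upper bound) from a percentile bootstrap."""
    if not 0 < sum(labels) < len(labels):
        raise ValueError("labels must contain both classes")
    rng = random.Random(seed)
    idx = list(range(len(labels)))
    stats = []
    while len(stats) < n_boot:
        # Resample samples with replacement.
        sample = [rng.choice(idx) for _ in idx]
        ys = [labels[i] for i in sample]
        # Skip resamples that happen to contain only one class (AUC undefined).
        if 0 < sum(ys) < len(ys):
            stats.append(_auc(ys, [scores[i] for i in sample]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return _auc(labels, scores), lower, upper
```

Skipping single-class resamples slightly conditions the bootstrap distribution; with realistically sized, reasonably balanced datasets such resamples are rare.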
CL · Nov 5, 2019
Incremental Sense Weight Training for the Interpretation of Contextualized Word Embeddings
Xinyi Jiang, Zhengzhe Yang, Jinho D. Choi
We present a novel online algorithm that learns the essence of each dimension in word embeddings by minimizing the within-group distance of contextualized embedding groups. Three state-of-the-art neural language models, Flair, ELMo, and BERT, are used to generate contextualized word embeddings, so that different embeddings are generated for the same word type; these embeddings are grouped by their senses as manually annotated in the SemCor dataset. We hypothesize that not all dimensions are equally important for downstream tasks, and that our algorithm can therefore detect unessential dimensions and discard them without hurting performance. To verify this hypothesis, we first mask the dimensions our algorithm determines to be unessential, apply the masked word embeddings to a word sense disambiguation (WSD) task, and compare their performance against that of the original embeddings. Several KNN approaches are evaluated to establish strong baselines for WSD. Our results show that the masked word embeddings do not hurt performance and can improve it by 3%. Our work can support future research on the interpretability of contextualized embeddings.
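The abstract does not spell out the incremental, online weight-training algorithm, so the sketch below swaps in a simpler, non-iterative heuristic that captures the same intuition. A dimension whose values vary about as much within a sense group as across all groups carries little sense information and is a candidate for masking. The sketch scores each dimension by its within-group to total sum-of-squares ratio, masks the high-ratio dimensions, and runs k-NN WSD on the masked vectors. The toy 2-D vectors, the sense labels, and the 0.5 threshold are hypothetical.

```python
from statistics import mean
from typing import Dict, List

Vector = List[float]


def dimension_scores(groups: Dict[str, List[Vector]]) -> List[float]:
    """Within-sense / total sum of squares per dimension (higher = less essential)."""
    all_vecs = [v for vecs in groups.values() for v in vecs]
    scores = []
    for d in range(len(all_vecs[0])):
        overall = mean(v[d] for v in all_vecs)
        total = sum((v[d] - overall) ** 2 for v in all_vecs)
        within = 0.0
        for vecs in groups.values():
            group_mean = mean(v[d] for v in vecs)
            within += sum((v[d] - group_mean) ** 2 for v in vecs)
        # A constant dimension carries no information at all; treat it as fully unessential.
        scores.append(within / total if total > 0 else 1.0)
    return scores


def mask(vec: Vector, keep: List[bool]) -> Vector:
    """Zero out dimensions marked as unessential."""
    return [x if k else 0.0 for x, k in zip(vec, keep)]


def knn_sense(query: Vector, groups: Dict[str, List[Vector]], keep: List[bool], k: int = 1) -> str:
    """Predict a sense by majority vote over the k nearest annotated embeddings (masked space)."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(mask(a, keep), mask(b, keep)))

    ranked = sorted((sq_dist(query, v), sense) for sense, vecs in groups.items() for v in vecs)
    top = [sense for _, sense in ranked[:k]]
    return max(set(top), key=top.count)
```

With `groups = {"bank.river": [[0.0, 5.0], [0.1, -5.0]], "bank.money": [[1.0, 4.0], [0.9, -4.0]]}`, dimension 1 is pure within-sense noise (ratio 1.0) while dimension 0 separates the senses. Masking dimension 1 flips the nearest neighbour of `[0.95, 5.0]` from bank.river to bank.money, illustrating how discarding an unessential dimension can help rather than hurt KNN-based WSD.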