CVApr 22Code
MambaLiteUNet: Cross-Gated Adaptive Feature Fusion for Robust Skin Lesion SegmentationMd Maklachur Rahman, Soon Ki Jung, Tracy Hammond
Recent segmentation models have demonstrated promising efficiency by aggressively reducing parameter counts and computational complexity. However, these models often struggle to accurately delineate fine lesion boundaries and texture patterns essential for early skin cancer diagnosis and treatment planning. In this paper, we propose MambaLiteUNet, a compact yet robust segmentation framework that integrates Mamba state space modeling into a U-Net architecture, along with three key modules: Adaptive Multi-Branch Mamba Feature Fusion (AMF), Local-Global Feature Mixing (LGFM), and Cross-Gated Attention (CGA). These modules are designed to enhance local-global feature interaction, preserve spatial details, and improve the quality of skip connections. MambaLiteUNet achieves an average IoU of 87.12% and average Dice score of 93.09% across ISIC2017, ISIC2018, HAM10000, and PH2 benchmarks, outperforming state-of-the-art models. Compared to U-Net, our model improves average IoU and Dice by 7.72 and 4.61 points, respectively, while reducing parameters by 93.6% and GFLOPs by 97.6%. Additionally, in domain generalization with six unseen lesion categories, MambaLiteUNet achieves 77.61% IoU and 87.23% Dice, performing best among all evaluated models. Our extensive experiments demonstrate that MambaLiteUNet achieves a strong balance between accuracy and efficiency, making it a competitive and practical solution for dermatological image segmentation. Our code is publicly available at: https://github.com/maklachur/MambaLiteUNet.
SDFeb 18, 2025Code
Myna: Masking-Based Contrastive Learning of Musical RepresentationsOri Yonay, Tracy Hammond, Tianbao Yang
We present Myna, a simple yet effective approach for self-supervised musical representation learning. Built on a contrastive learning framework, Myna introduces two key innovations: (1) the use of a Vision Transformer (ViT) on mel-spectrograms as the backbone and (2) a novel data augmentation strategy, token masking, that masks 90 percent of spectrogram tokens. These innovations deliver both effectiveness and efficiency: (i) Token masking enables a significant increase in per-GPU batch size, from 48 or 120 in prior methods (CLMR, MULE) to 4096. (ii) By avoiding traditional augmentations, Myna retains pitch sensitivity, enhancing performance in tasks like key detection. (iii) The use of vertical patches allows the model to better capture critical features for key detection. Our hybrid model, Myna-22M-Hybrid, processes both 16x16 and 128x2 patches, achieving state-of-the-art results. Trained on a single GPU, it outperforms MULE (62M) on average and rivals MERT-95M, which was trained on 16 and 64 GPUs, respectively. Additionally, it surpasses MERT-95M-public, establishing itself as the best-performing model trained on publicly available data. We release our code and models to promote reproducibility and facilitate future research.
HCApr 15, 2025
Hashigo: A Next Generation Sketch Interactive System for Japanese KanjiPaul Taele, Tracy Hammond
Language students can increase their effectiveness in learning written Japanese by mastering the visual structure and written technique of Japanese kanji. Yet, existing kanji handwriting recognition systems do not assess the written technique sufficiently enough to discourage students from developing bad learning habits. In this paper, we describe our work on Hashigo, a kanji sketch interactive system which achieves human instructor-level critique and feedback on both the visual structure and written technique of students' sketched kanji. This type of automated critique and feedback allows students to target and correct specific deficiencies in their sketches that, if left untreated, are detrimental to effective long-term kanji learning.
HCApr 4, 2025
Maestoso: An Intelligent Educational Sketching Tool for Learning Music TheoryPaul Taele, Laura Barreto, Tracy Hammond
Learning music theory not only has practical benefits for musicians to write, perform, understand, and express music better, but also for both non-musicians to improve critical thinking, math analytical skills, and music appreciation. However, current external tools applicable for learning music theory through writing when human instruction is unavailable are either limited in feedback, lacking a written modality, or assuming already strong familiarity of music theory concepts. In this paper, we describe Maestoso, an educational tool for novice learners to learn music theory through sketching practice of quizzed music structures. Maestoso first automatically recognizes students' sketched input of quizzed concepts, then relies on existing sketch and gesture recognition techniques to automatically recognize the input, and finally generates instructor-emulated feedback. From our evaluations, we demonstrate that Maestoso performs reasonably well on recognizing music structure elements and that novice students can comfortably grasp introductory music theory in a single session.
SDApr 11
Masked Contrastive Pre-Training Improves Music Audio Key DetectionOri Yonay, Tracy Hammond, Tianbao Yang
Self-supervised music foundation models underperform on key detection, which requires pitch-sensitive representations. In this work, we present the first systematic study showing that the design of self-supervised pretraining directly impacts pitch sensitivity, and demonstrate that masked contrastive embeddings uniquely enable state-of-the-art (SOTA) performance in key detection in the supervised setting. First, we discover that linear evaluation after masking-based contrastive pretraining on Mel spectrograms leads to competitive performance on music key detection out of the box. This leads us to train shallow but wide multi-layer perceptrons (MLPs) on features extracted from our base model, leading to SOTA performance without the need for sophisticated data augmentation policies. We further analyze robustness and show empirically that the learned representations naturally encode common augmentations. Our study establishes self-supervised pretraining as an effective approach for pitch-sensitive MIR tasks and provides insights for designing and probing music foundation models.
HCApr 4, 2025
Kanji Workbook: A Writing-Based Intelligent Tutoring System for Learning Proper Japanese Kanji Writing Technique with Instructor-Emulated AssessmentPaul Taele, Jung In Koh, Tracy Hammond
Kanji script writing is a skill that is often introduced to novice Japanese foreign language students for achieving Japanese writing mastery, but often poses difficulties to students with primarily English fluency due to their its vast differences with written English. Instructors often introduce various pedagogical methods -- such as visual structure and written techniques -- to assist students in kanji study, but may lack availability providing direct feedback on students' writing outside of class. Current educational applications are also limited due to lacking richer instructor-emulated feedback. We introduce Kanji Workbook, a writing-based intelligent tutoring system for students to receive intelligent assessment that emulates human instructor feedback. Our interface not only leverages students' computing devices for allowing them to learn, practice, and review the writing of prompted characters from their course's kanji script lessons, but also provides a diverse set of writing assessment metrics -- derived from instructor interviews and classroom observation insights -- through intelligent scoring and visual animations. We deployed our interface onto novice- and intermediate-level university courses over an entire academic year, and observed that interface users on average achieved higher course grades than their peers and also reacted positively to our interface's various features.
CLFeb 17
DependencyAI: Detecting AI Generated Text through Dependency ParsingSara Ahmed, Tracy Hammond
As large language models (LLMs) become increasingly prevalent, reliable methods for detecting AI-generated text are critical for mitigating potential risks. We introduce DependencyAI, a simple and interpretable approach for detecting AI-generated text using only the labels of linguistic dependency relations. Our method achieves competitive performance across monolingual, multi-generator, and multilingual settings. To increase interpretability, we analyze feature importance to reveal syntactic structures that distinguish AI-generated from human-written text. We also observe a systematic overprediction of certain models on unseen domains, suggesting that generator-specific writing styles may affect cross-domain generalization. Overall, our results demonstrate that dependency relations alone provide a robust signal for AI-generated text detection, establishing DependencyAI as a strong linguistically grounded, interpretable, and non-neural network baseline.
HCMay 26, 2025
It's Not Just Labeling -- A Research on LLM Generated Feedback Interpretability and Image Labeling Sketch FeaturesBaichuan Li, Larry Powell, Tracy Hammond
The quality of training data is critical to the performance of machine learning applications in domains like transportation, healthcare, and robotics. Accurate image labeling, however, often relies on time-consuming, expert-driven methods with limited feedback. This research introduces a sketch-based annotation approach supported by large language models (LLMs) to reduce technical barriers and enhance accessibility. Using a synthetic dataset, we examine how sketch recognition features relate to LLM feedback metrics, aiming to improve the reliability and interpretability of LLM-assisted labeling. We also explore how prompting strategies and sketch variations influence feedback quality. Our main contribution is a sketch-based virtual assistant that simplifies annotation for non-experts and advances LLM-driven labeling tools in terms of scalability, accessibility, and explainability.
HCMar 21, 2018
Gaze-Assisted User Authentication to Counter Shoulder-surfing AttacksVijay Rajanna, Tracy Hammond
A highly secure, foolproof, user authentication system is still a primary focus of research in the field of User Privacy and Security. Shoulder-surfing is an act of spying when an authorized user is logging into a system, and is promoted by a malicious intent of gaining unauthorized access. We present a gaze-assisted user authentication system as a potential solution to counter shoulder-surfing attacks. The system comprises of an eye tracker and an authentication interface with 12 pre-defined shapes (e.g., triangle, circle, etc.) that move simultaneously on the screen. A user chooses a set of three shapes as a password. To authenticate, the user follows the paths of three shapes as they move, one on each frame, over three consecutive frames. The system uses either the template matching or decision tree algorithms to match the scan-path of the user's gaze with the path traversed by the shape. The system was evaluated with seven users to test the accuracy of both the algorithms. We found that with the template matching algorithm the system achieves an accuracy of 95%, and with the decision tree algorithm an accuracy of 90.2%. We also present the advantages and disadvantages of using both the algorithms. Our study suggests that gaze-based authentication is a highly secure method against shoulder-surfing attacks as the unique pattern of eye movements for each individual makes the system hard to break into.
HCMar 13, 2018
A Gaze-Assisted Multimodal Approach to Rich and Accessible Human-Computer InteractionVijay Rajanna, Tracy Hammond
Recent advancements in eye tracking technology are driving the adoption of gaze-assisted interaction as a rich and accessible human-computer interaction paradigm. Gaze-assisted interaction serves as a contextual, non-invasive, and explicit control method for users without disabilities; for users with motor or speech impairments, text entry by gaze serves as the primary means of communication. Despite significant advantages, gaze-assisted interaction is still not widely accepted because of its inherent limitations: 1) Midas touch, 2) low accuracy for mouse-like interactions, 3) need for repeated calibration, 4) visual fatigue with prolonged usage, 5) lower gaze typing speed, and so on. This dissertation research proposes a gaze-assisted, multimodal, interaction paradigm, and related frameworks and their applications that effectively enable gaze-assisted interactions while addressing many of the current limitations. In this regard, we present four systems that leverage gaze-assisted interaction: 1) a gaze- and foot-operated system for precise point-and-click interactions, 2) a dwell-free, foot-operated gaze typing system. 3) a gaze gesture-based authentication system, and 4) a gaze gesture-based interaction toolkit. In addition, we also present the goals to be achieved, technical approach, and overall contributions of this dissertation research.