GRMar 29
ImagenWorld: Stress-Testing Image Generation Models with Explainable Human Evaluation on Open-ended Real-World TasksSamin Mahdizadeh Sani, Max Ku, Nima Jamali et al.
Advances in diffusion, autoregressive, and hybrid models have enabled high-quality image synthesis for tasks such as text-to-image, editing, and reference-guided composition. Yet, existing benchmarks remain limited, either focus on isolated tasks, cover only narrow domains, or provide opaque scores without explaining failure modes. We introduce \textbf{ImagenWorld}, a benchmark of 3.6K condition sets spanning six core tasks (generation and editing, with single or multiple references) and six topical domains (artworks, photorealistic images, information graphics, textual graphics, computer graphics, and screenshots). The benchmark is supported by 20K fine-grained human annotations and an explainable evaluation schema that tags localized object-level and segment-level errors, complementing automated VLM-based metrics. Our large-scale evaluation of 14 models yields several insights: (1) models typically struggle more in editing tasks than in generation tasks, especially in local edits. (2) models excel in artistic and photorealistic settings but struggle with symbolic and text-heavy domains such as screenshots and information graphics. (3) closed-source systems lead overall, while targeted data curation (e.g., Qwen-Image) narrows the gap in text-heavy cases. (4) modern VLM-based metrics achieve Kendall accuracies up to 0.79, approximating human ranking, but fall short of fine-grained, explainable error attribution. ImagenWorld provides both a rigorous benchmark and a diagnostic tool to advance robust image generation.
SPAug 6, 2025
Energy-Efficient Real-Time 4-Stage Sleep Classification at 10-Second Resolution: A Comprehensive StudyZahra Mohammadi, Parnian Fazel, Siamak Mohammadi
Sleep stage classification is crucial for diagnosing and managing disorders such as sleep apnea and insomnia. Conventional clinical methods like polysomnography are costly and impractical for long-term home use. We present an energy-efficient pipeline that detects four sleep stages (wake, REM, light, and deep) from a single-lead ECG. Two windowing strategies are introduced: (1) a 5-minute window with 30-second steps for machine-learning models that use handcrafted features, and (2) a 30-second window with 10-second steps for deep-learning models, enabling near-real-time 10-second resolution. Lightweight networks such as MobileNet-v1 reach 92 percent accuracy and 91 percent F1-score but still draw significant energy. We therefore design SleepLiteCNN, a custom model that achieves 89 percent accuracy and 89 percent F1-score while lowering energy use to 5.48 microjoules per inference at 45 nm. Applying eight-bit quantization preserves accuracy and further reduces power, and FPGA deployment confirms low resource usage. The proposed system offers a practical solution for continuous, wearable ECG-based sleep monitoring.
CLMar 22, 2025
ParsiPy: NLP Toolkit for Historical Persian Texts in PythonFarhan Farsi, Parnian Fazel, Sepand Haghighi et al.
The study of historical languages presents unique challenges due to their complex orthographic systems, fragmentary textual evidence, and the absence of standardized digital representations of text in those languages. Tackling these challenges needs special NLP digital tools to handle phonetic transcriptions and analyze ancient texts. This work introduces ParsiPy, an NLP toolkit designed to facilitate the analysis of historical Persian languages by offering modules for tokenization, lemmatization, part-of-speech tagging, phoneme-to-transliteration conversion, and word embedding. We demonstrate the utility of our toolkit through the processing of Parsig (Middle Persian) texts, highlighting its potential for expanding computational methods in the study of historical languages. Through this work, we contribute to computational philology, offering tools that can be adapted for the broader study of ancient texts and their digital preservation.