cs.SD Computer Science

Sound

Audio processing, speech, music

99.2 SD Mar 18
MOSS-TTS Technical Report

Yitian Gong, Botian Jiang, Yiwei Zhao et al.

This work addresses the need for efficient and controllable text-to-speech systems, though it appears incremental as it builds on existing tokenization and transformer methods.

98.7 SD Mar 29 Code
EvA: An Evidence-First Audio Understanding Paradigm for LALMs

Xinyuan Xie, Shunian Chen, Zhiheng Liu et al.

For researchers and practitioners in audio understanding, EvA demonstrates that preserving acoustic evidence before reasoning is critical for LALM performance, offering a new paradigm to address the evidence bottleneck.

97.9 SD Apr 20
VoxSafeBench: Not Just What Is Said, but Who, How, and Where

Yuxiang Wang, Hongyu Liu, Yijiang Xu et al.

For researchers and developers of speech language models, this benchmark exposes a pervasive speech grounding gap where models recognize social norms in text but fail to apply them when cues are grounded in speech.