AS | Sep 20, 2024 | Code
GTSinger: A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks
Yu Zhang, Changhao Pan, Wenxiang Guo et al.
The scarcity of high-quality and multi-task singing datasets significantly hinders the development of diverse controllable and personalized singing tasks, as existing singing datasets suffer from low quality, limited diversity of languages and singers, absence of multi-technique information and realistic music scores, and poor task suitability. To tackle these problems, we present GTSinger, a large, global, multi-technique, free-to-use, high-quality singing corpus with realistic music scores, designed for all singing tasks, along with its benchmarks. Specifically, (1) we collect 80.59 hours of high-quality singing voices, forming the largest recorded singing dataset; (2) 20 professional singers across nine widely spoken languages offer diverse timbres and styles; (3) we provide controlled comparison and phoneme-level annotations of six commonly used singing techniques, helping technique modeling and control; (4) GTSinger offers realistic music scores, assisting real-world musical composition; (5) singing voices are accompanied by manual phoneme-to-audio alignments, global style labels, and 16.16 hours of paired speech for various singing tasks. Moreover, to facilitate the use of GTSinger, we conduct four benchmark experiments: technique-controllable singing voice synthesis, technique recognition, style transfer, and speech-to-singing conversion. The demos can be found at http://aaronz345.github.io/GTSingerDemo/. We provide the dataset and the code for processing data and conducting benchmarks at https://huggingface.co/datasets/AaronZ345/GTSinger and https://github.com/AaronZ345/GTSinger.
SD | Aug 23, 2024 | Code
Leveraging Contrastive Learning and Self-Training for Multimodal Emotion Recognition with Limited Labeled Samples
Qi Fan, Yutong Li, Yi Xin et al.
The Multimodal Emotion Recognition challenge MER2024 focuses on recognizing emotions using audio, language, and visual signals. In this paper, we present our submission to the Semi-Supervised Learning Sub-Challenge (MER2024-SEMI), which tackles the issue of limited annotated data in emotion recognition. First, to address the class imbalance, we adopt an oversampling strategy. Second, we propose a modality representation combinatorial contrastive learning (MR-CCL) framework on the trimodal input data to establish robust initial models. Third, we explore a self-training approach to expand the training set. Finally, we enhance prediction robustness through a multi-classifier weighted soft voting strategy. Our proposed method is validated to be effective on the MER2024-SEMI Challenge, achieving a weighted average F-score of 88.25% and ranking 6th on the leaderboard. Our project is available at https://github.com/WooyoohL/MER2024-SEMI.
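The final step named in the abstract, weighted soft voting over several classifiers, can be illustrated with a minimal sketch. This is not the authors' code: the function name `weighted_soft_vote`, the example probabilities, and the weights are all hypothetical, and the abstract does not say how the weights are chosen.

```python
def weighted_soft_vote(prob_sets, weights):
    """Fuse per-classifier class probabilities with a weighted average.

    prob_sets: one probability vector per classifier (same class order).
    weights:   one non-negative weight per classifier (hypothetical values;
               how the weights are set is an assumption, not from the paper).
    Returns (index of the winning class, fused probability vector).
    """
    if not prob_sets or len(prob_sets) != len(weights):
        raise ValueError("need at least one classifier and one weight each")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    n_classes = len(prob_sets[0])
    fused = [0.0] * n_classes
    for probs, w in zip(prob_sets, weights):
        if len(probs) != n_classes:
            raise ValueError("all classifiers must score the same classes")
        for c, p in enumerate(probs):
            fused[c] += (w / total) * p  # normalized weight times probability

    best = max(range(n_classes), key=fused.__getitem__)
    return best, fused


# Three hypothetical classifiers scoring two classes; the third is trusted most.
label, fused = weighted_soft_vote(
    [[0.6, 0.4], [0.2, 0.8], [0.3, 0.7]],
    weights=[1.0, 1.0, 2.0],
)
```

Soft voting averages probabilities rather than hard labels, so a confident classifier can outweigh several uncertain ones.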
84.4 | AI | May 12 | Code
CuSearch: Curriculum Rollout Sampling via Search Depth for Agentic RAG
Jianghan Shen, Siqi Luo, Xinyu Cheng et al.
Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a promising paradigm for training agentic retrieval-augmented generation (RAG) systems from outcome-only supervision. Most existing methods optimize policies from uniformly sampled rollouts, implicitly treating all trajectories as equally informative. However, trajectories differ substantially in search depth: deeper-search trajectories contain more retrieval decision points and provide denser direct supervision for the retrieval sub-policy. Moreover, this heterogeneity grows over training as the within-batch depth distribution shifts toward higher values, yet uniform rollout sampling remains blind to this shift. To address this, we propose CuSearch, a curriculum rollout sampling framework built on Search-Depth Greedy Allocation (SDGA), a batch-level operator that reallocates a fixed update budget toward deeper-search trajectories. SDGA-Auto always targets the deepest available trajectories in the current batch, yielding an implicit training-aligned curriculum as the depth distribution shifts upward. SDGA-Phase explicitly advances the curriculum threshold as deeper trajectories become sufficiently abundant. Experiments across model types and retrieval frameworks show that CuSearch consistently improves performance, with gains of up to 11.8 exact-match points over standard GRPO on ZeroSearch. These results establish per-trajectory search depth as a reliable, annotation-free proxy for retrieval supervision density in RLVR-based agentic RAG training. The code is available at https://github.com/MrToser/CuSearch.
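The depth-greedy idea can be sketched roughly as follows, assuming each rollout carries an integer search depth. This is only an illustration of the abstract's description, not the released implementation: `sdga_auto_select`, `advance_threshold`, the tie-breaking rule, and the `min_fraction` parameter are hypothetical names and choices.

```python
def sdga_auto_select(depths, budget):
    """Pick the `budget` deepest rollouts in a batch (greedy by depth).

    depths: search depth of each rollout in the batch.
    budget: fixed number of rollouts allowed into the policy update.
    Ties are broken by original batch order (an assumption of this sketch).
    Returns the selected rollout indices, deepest first.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    order = sorted(range(len(depths)), key=lambda i: (-depths[i], i))
    return order[:budget]


def advance_threshold(depths, threshold, min_fraction):
    """Phase-style curriculum step (hypothetical rule).

    Raise the depth threshold by one once at least `min_fraction` of the
    batch searches deeper than the current threshold; otherwise keep it.
    """
    if not depths:
        return threshold
    deeper = sum(1 for d in depths if d > threshold)
    if deeper / len(depths) >= min_fraction:
        return threshold + 1
    return threshold


batch_depths = [1, 3, 2, 3, 0, 2]
chosen = sdga_auto_select(batch_depths, budget=3)
next_threshold = advance_threshold(batch_depths, threshold=1, min_fraction=0.5)
```

Because the selection always follows the deepest rollouts currently available, the effective curriculum moves upward on its own as the policy learns to search more.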
CV | Feb 20
DEIG: Detail-Enhanced Instance Generation with Fine-Grained Semantic Control
Shiyan Du, Conghan Yue, Xinyu Cheng et al.
Multi-Instance Generation has advanced significantly in spatial placement and attribute binding. However, existing approaches still face challenges in fine-grained semantic understanding, particularly when dealing with complex textual descriptions. To overcome these limitations, we propose DEIG, a novel framework for fine-grained and controllable multi-instance generation. DEIG integrates an Instance Detail Extractor (IDE) that transforms text encoder embeddings into compact, instance-aware representations, and a Detail Fusion Module (DFM) that applies instance-based masked attention to prevent attribute leakage across instances. These components enable DEIG to generate visually coherent multi-instance scenes that precisely match rich, localized textual descriptions. To support fine-grained supervision, we construct a high-quality dataset with detailed, compositional instance captions generated by VLMs. We also introduce DEIG-Bench, a new benchmark with region-level annotations and multi-attribute prompts for both humans and objects. Experiments demonstrate that DEIG consistently outperforms existing approaches across multiple benchmarks in spatial consistency, semantic accuracy, and compositional generalization. Moreover, DEIG functions as a plug-and-play module, making it easily integrable into standard diffusion-based pipelines.
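To make the phrase "instance-based masked attention" concrete, here is a minimal sketch in which a query position may attend only to key positions tagged with the same instance. The abstract does not specify how DEIG builds or applies its mask, so the instance-id tagging and the helper names `instance_mask` and `masked_softmax` are assumptions for illustration only.

```python
import math


def instance_mask(query_ids, key_ids):
    """mask[q][k] is True when query q and key k belong to the same instance."""
    return [[q == k for k in key_ids] for q in query_ids]


def masked_softmax(scores, mask):
    """Row-wise softmax over attention scores, ignoring masked-out keys.

    Masked positions get weight exactly 0, so no attribute information
    flows between different instances. A row with no allowed key is
    returned as all zeros (a choice made for this sketch).
    """
    weights = []
    for row, allowed in zip(scores, mask):
        kept = [s for s, a in zip(row, allowed) if a]
        if not kept:
            weights.append([0.0] * len(row))
            continue
        peak = max(kept)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) if a else 0.0 for s, a in zip(row, allowed)]
        norm = sum(exps)
        weights.append([e / norm for e in exps])
    return weights


# Two image positions (instances 0 and 1) and three text tokens
# (two describing instance 0, one describing instance 1).
mask = instance_mask([0, 1], [0, 0, 1])
attn = masked_softmax([[1.0, 1.0, 5.0], [2.0, 0.0, 3.0]], mask)
```

In this toy case the large score 5.0 belongs to another instance and is ignored, which is the leakage the mask is meant to block.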
LG | Sep 7, 2020
Scalar Coupling Constant Prediction Using Graph Embedding Local Attention Encoder
Caiqing Jian, Xinyu Cheng, Jian Zhang et al.
The scalar coupling constant (SCC) plays a key role in analyzing the three-dimensional structure of organic matter. However, traditional SCC prediction using quantum mechanical calculations is very time-consuming. To calculate SCC efficiently and accurately, we propose a graph embedding local self-attention encoder (GELAE) model. First, a novel invariant structure representation of the coupling system is presented in terms of bond length, bond angle, and dihedral angle. Then, a local self-attention module embedded with the adjacency matrix of a graph is designed to effectively extract the features of coupling systems. Finally, the SCC is predicted with a modified classification loss function. To validate the superiority of the proposed method, we conducted a series of comparison experiments using different structure representations, different attention modules, and different losses. The experimental results demonstrate that, compared to the traditional chemical bond structure representations, the rotation- and translation-invariant structure representations proposed in this work improve the SCC prediction accuracy; with the graph-embedded local self-attention, the mean absolute error (MAE) of the prediction model on the validation set decreases from 0.1603 Hz to 0.1067 Hz; and using the classification-based loss function instead of the scaled regression loss decreases the MAE of the predicted SCC to 0.0963 Hz, which is close to the quantum chemistry standard on the CHAMPS dataset.
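The three geometric quantities the abstract names (bond length, bond angle, dihedral angle) are unchanged by rigid rotation and translation of the molecule, which is what makes them an invariant representation. The sketch below computes them from raw 3D coordinates using standard vector formulas; the function names and the toy coordinates are hypothetical, and how the paper assembles these values into model features is not shown here.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _norm(a):
    return math.sqrt(_dot(a, a))


def bond_length(p0, p1):
    """Distance between two atoms."""
    return _norm(_sub(p1, p0))


def bond_angle(p0, p1, p2):
    """Angle in degrees at atom p1 between bonds p1-p0 and p1-p2."""
    u, v = _sub(p0, p1), _sub(p2, p1)
    cos_t = _dot(u, v) / (_norm(u) * _norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))  # clamp rounding


def dihedral_angle(p0, p1, p2, p3):
    """Signed dihedral in degrees between planes (p0,p1,p2) and (p1,p2,p3)."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    b2_len = _norm(b2)
    b2_unit = tuple(x / b2_len for x in b2)
    y = _dot(_cross(n1, n2), b2_unit)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Toy four-atom chain (hypothetical coordinates, arbitrary units).
atoms = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
features = (bond_length(atoms[0], atoms[1]),
            bond_angle(*atoms[:3]),
            dihedral_angle(*atoms))
```

Shifting or rotating all four atoms together leaves `features` unchanged, whereas raw Cartesian coordinates would differ.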
IV | Dec 17, 2019
CNN-Based Invertible Wavelet Scattering for the Investigation of Diffusion Properties of the In Vivo Human Heart in Diffusion Tensor Imaging
Zeyu Deng, Lihui Wang, Zixiang Kuai et al.
In vivo diffusion tensor imaging (DTI) is a promising technique for noninvasively investigating the fiber structures of the human heart. However, signal loss due to motion remains a persistent problem in in vivo cardiac DTI. We propose a novel motion-compensation method for investigating in vivo myocardium structures in DTI with free-breathing acquisitions. The method is based on an invertible wavelet scattering transform implemented with a convolutional neural network (WSCNN). It first extracts translation-invariant wavelet scattering features from diffusion-weighted (DW) images acquired at different trigger delays, and then maps the fused scattering features into motion-compensated spatial DW images by performing an inverse wavelet scattering transform implemented with a CNN. Results on both simulated and acquired in vivo cardiac DW images show that the proposed WSCNN method effectively compensates for motion-induced signal loss and produces in vivo cardiac DW images with better quality and more coherent fiber structures than existing methods, which makes it a promising method for correctly measuring the diffusion properties of the in vivo human heart in DTI under free breathing.
IV | May 23, 2019
Convolutional Restricted Boltzmann Machine Based-Radiomics for Prediction of Pathological Complete Response to Neoadjuvant Chemotherapy in Breast Cancer
Li Wang, Lihui Wang, Qijian Chen et al.
We propose a novel convolutional restricted Boltzmann machine (CRBM)-based radiomic method for predicting pathologic complete response (pCR) to neoadjuvant chemotherapy treatment (NACT) in breast cancer. The method consists of extracting semantic features from a CRBM network, followed by pCR prediction. It was evaluated on the dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) data of 57 patients, using the area under the receiver operating characteristic curve (AUC) as the metric. Traditional radiomics features and the semantic features learned from the CRBM network were extracted from the images acquired before and after the administration of NACT. After feature selection, support vector machine (SVM), logistic regression (LR), and random forest (RF) classifiers were trained to predict the pCR status. The proposed CRBM-based radiomic method yielded an AUC of 0.92 for prediction with the images acquired before and after NACT, and an AUC of 0.87 for pretreatment prediction, the latter an increase of about 38% compared to traditional radiomic methods. The results show that the CRBM-based radiomic method provides a potential means of accurately predicting the pCR to NACT in breast cancer before treatment, which is very useful for designing more appropriate and personalized treatment regimens.