LGDec 16, 2022
GeneFormer: Learned Gene Compression using Transformer-based Context ModelingZhanbei Cui, Yu Liao, Tongda Xu et al.
With the development of gene sequencing technology, an explosive growth of gene data has been witnessed. And the storage of gene data has become an important issue. Traditional gene data compression methods rely on general software like G-zip, which fails to utilize the interrelation of nucleotide sequence. Recently, many researchers begin to investigate deep learning based gene data compression method. In this paper, we propose a transformer-based gene compression method named GeneFormer. Specifically, we first introduce a modified transformer structure to fully explore the nucleotide sequence dependency. Then, we propose fixed-length parallel grouping to accelerate the decoding speed of our autoregressive model. Experimental results on real-world datasets show that our method saves 29.7% bit rate compared with the state-of-the-art method, and the decoding speed is significantly faster than all existing learning-based gene compression methods.
CVOct 16, 2023
GreatSplicing: A Semantically Rich Splicing DatasetJiaming Liang, Yuwan Xue, Haowei Liu et al.
In existing splicing forgery datasets, the insufficient semantic variety of spliced regions causes trained detection models to overfit semantic features rather than learn genuine splicing traces. Meanwhile, the lack of a reasonable benchmark dataset has led to inconsistent experimental settings across existing detection methods. To address these issues, we propose GreatSplicing, a manually created, large-scale, high-quality splicing dataset. GreatSplicing comprises 5,000 spliced images and covers spliced regions across 335 distinct semantic categories, enabling detection models to learn splicing traces more effectively. Empirical results show that detection models trained on GreatSplicing achieve low misidentification rates and stronger cross-dataset generalization compared to existing datasets. GreatSplicing is now publicly available for research purposes at the following link.
CLFeb 17, 2024
Grammaticality illusion or ambiguous interpretation? Event-related potentials reveal the nature of the missing-NP effect in Mandarin centre-embedded structuresQihang Yang, Caimei Yang, Yu Liao et al.
In several languages, omitting a verb phrase (VP) in double centre-embedded structures creates a grammaticality illusion. Similar illusion also exhibited in Mandarin missing-NP double centre-embedded structures. However, there is no consensus on its very nature. Instead of treating it as grammaticality illusion, we argue that ambiguous interpretations of verbs can best account for this phenomenon in Mandarin. To further support this hypothesis, we conducted two electroencephalography (EEG) experiments on quasi double centre-embedded structures whose complexity is reduced by placing the self-embedding relative clauses into the sentence's subject position. Experiment 1 showed that similar phenomenon even exhibited in this structure, evidenced by an absence of P600 effect and a presence of N400 effect. In Experiment 2, providing semantic cues to reduce ambiguity dispelled this illusion, as evidenced by a P600 effect. We interpret the results under garden-path theory and propose that word-order difference may account for this cross-linguistic variation.