LG May 25
ViroBench: Benchmarking Nucleotide Foundation Models on Viral Genomics Tasks
Dongxin Ye, Fang Hu, Han Hu et al.
Nucleotide sequences constitute the fundamental genetic basis of biological systems, making viral genomic analysis critical for biomedical advancement. Despite progress in biological foundation models, specifically nucleotide foundation models (NFMs), the field lacks a unified standard for viral genomics that would facilitate community development and enforce biosecurity constraints. To address this, we introduce ViroBench, the first comprehensive, large-scale benchmark designed specifically for NFMs in viral settings. ViroBench evaluates models along two critical dimensions (biological understanding and latent biosecurity risk), covering 18 diverse scenarios within 4 task types. Extensive evaluation of 66 NFMs across diverse architectures yields three key conclusions. First, NFMs' biological understanding degrades under phylogenetic and temporal shifts, indicating weak extrapolation capabilities. Second, generation tasks reveal a decoupling between statistical likelihood and biological functional validity, posing latent biosecurity risks. Third, controlled ablation studies reveal that taxonomic diversity in pretraining data outweighs parameter scale: a lightweight baseline trained on diverse data achieves a 67.5% performance gain over its original model. Overall, ViroBench provides interpretable, diagnostic evaluations and a reproducible measurement framework for future research on viral nucleotide foundation models. The datasets and code are publicly available at https://github.com/QIANJINYDX/ViroBench.
IR Aug 21, 2024
DTN: Deep Multiple Task-specific Feature Interactions Network for Multi-Task Recommendation
Yaowen Bi, Yuteng Lian, Jie Cui et al.
Neural-based multi-task learning (MTL) has been successfully applied to many recommendation applications. However, existing MTL models (e.g., MMoE, PLE) do not consider feature interaction during optimization. Feature interaction is crucial for capturing complex high-order features and is widely used in ranking models for real-world recommender systems. Moreover, through feature importance analysis across tasks in MTL, we observe an interesting divergence phenomenon: the same feature can have significantly different importance for different tasks. To address these issues, we propose the Deep Multiple Task-specific Feature Interactions Network (DTN) with a novel model structure design. DTN introduces multiple diversified task-specific feature interaction methods and a task-sensitive network into MTL networks. This enables the model to learn task-specific, diversified feature interaction representations, which improves the efficiency of joint representation learning in a general setup. We applied DTN to our company's real-world e-commerce recommendation dataset, which contains over 6.3 billion samples. The results demonstrate that DTN significantly outperforms state-of-the-art MTL models. Moreover, in online evaluation of DTN in a large-scale e-commerce recommender system, we observed a 3.28% increase in clicks, a 3.10% increase in orders, and a 2.70% increase in GMV (Gross Merchandise Value) compared with state-of-the-art MTL models. Finally, extensive offline experiments on public benchmark datasets demonstrate that DTN can be applied to scenarios beyond recommendation, enhancing the performance of ranking models.
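DTN's exact interaction modules are not described in this abstract. The sketch below is a hedged illustration of the general idea of a task-specific feature interaction, not DTN's architecture. It uses the standard factorization-machine (FM) second-order term, but first rescales each feature embedding by a per-task importance weight. As a result, the same features can interact differently for each task, echoing the divergence phenomenon described above. The function names `fm_interaction` and `task_specific_interaction`, the weight vectors, and the toy embeddings are all hypothetical.

```python
def fm_interaction(embeddings):
    """FM second-order term: sum over pairs i < j of <e_i, e_j>.

    Uses the identity sum_{i<j} <e_i, e_j> = 0.5 * sum_f [(sum_i e_if)^2 - sum_i e_if^2],
    which costs O(n * k) instead of O(n^2 * k) for n features of dimension k.
    """
    k = len(embeddings[0])
    total = 0.0
    for f in range(k):
        col_sum = sum(e[f] for e in embeddings)
        col_sq = sum(e[f] ** 2 for e in embeddings)
        total += 0.5 * (col_sum * col_sum - col_sq)
    return total


def task_specific_interaction(embeddings, task_weights):
    """Hypothetical: scale each feature's embedding by a per-task importance
    weight, then apply the shared FM interaction."""
    scaled = [[w * x for x in e] for e, w in zip(embeddings, task_weights)]
    return fm_interaction(scaled)


# Three feature embeddings shared by two tasks (toy values).
emb = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(fm_interaction(emb))                              # 67.0
print(task_specific_interaction(emb, [1.0, 1.0, 0.0]))  # only pair (0, 1) remains: 11.0
```

Here, a task that assigns zero weight to the third feature ignores every interaction involving it, while another task can keep it. A learned model would produce such weights from data rather than fix them by hand.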
CR Nov 24, 2024
Hide in Plain Sight: Clean-Label Backdoor for Auditing Membership Inference
Depeng Chen, Hao Chen, Hulin Jin et al.
Membership inference attacks (MIAs) are critical tools for assessing privacy risks and ensuring compliance with regulations like the General Data Protection Regulation (GDPR). However, their potential for auditing unauthorized use of data remains underexplored. To bridge this gap, we propose a novel clean-label backdoor-based approach for MIAs, designed specifically for robust and stealthy data auditing. Unlike conventional methods that rely on detectable poisoned samples with altered labels, our approach retains natural labels, enhancing stealthiness even at low poisoning rates. Our approach employs an optimal trigger generated by a shadow model that mimics the target model's behavior. This design minimizes the feature-space distance between triggered samples and the source class while preserving the original data labels. The result is a powerful and undetectable auditing mechanism that overcomes limitations of existing approaches, such as label inconsistencies and visual artifacts in poisoned samples. The proposed method enables robust data auditing through black-box access, achieving high attack success rates across diverse datasets and model architectures. Additionally, it addresses challenges related to trigger stealthiness and poisoning durability, establishing itself as a practical and effective solution for data auditing. Comprehensive experiments validate the efficacy and generalizability of our approach, which outperforms several baseline methods in both stealth and attack success metrics.
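The trigger-optimization idea can be sketched under strong simplifying assumptions, listed here:
- A fixed linear map `W` stands in for the shadow model's feature extractor.
- The target is a class centroid in feature space.
- The trigger is a perturbation bounded by `eps` in every coordinate.

The function runs projected gradient descent on the squared feature distance. It is a hypothetical illustration of "minimizing feature-space distance under a stealth budget", not the paper's trigger generator. All names (`optimize_trigger`, `feature_distance`, `eps`, `lr`) are assumptions.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def optimize_trigger(W, x, centroid, eps=0.1, lr=0.05, steps=200):
    """Projected gradient descent on ||W(x + delta) - centroid||^2 subject to
    |delta_i| <= eps (a hypothetical stand-in for shadow-model trigger generation)."""
    d = len(x)
    delta = [0.0] * d
    for _ in range(steps):
        feats = matvec(W, [xi + di for xi, di in zip(x, delta)])
        resid = [f - c for f, c in zip(feats, centroid)]
        # Gradient of the squared distance with respect to delta is 2 * W^T resid.
        grad = [2 * sum(W[r][i] * resid[r] for r in range(len(W))) for i in range(d)]
        # Take a gradient step, then project back onto the box [-eps, eps]^d.
        delta = [max(-eps, min(eps, di - lr * g)) for di, g in zip(delta, grad)]
    return delta


def feature_distance(W, x, centroid):
    """Euclidean distance between W x and the centroid."""
    return sum((f - c) ** 2 for f, c in zip(matvec(W, x), centroid)) ** 0.5


W = [[1.0, 0.0], [0.0, 1.0]]
x, centroid = [0.0, 0.0], [1.0, -1.0]
delta = optimize_trigger(W, x, centroid, eps=0.3)
print(delta)  # [0.3, -0.3]: the budget caps how far the features can move
```

The box constraint is what keeps the trigger small, which is the stealth requirement. In this toy case the unconstrained optimum lies outside the budget, so the trigger ends up on the boundary.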
LG Nov 17, 2024
CLMIA: Membership Inference Attacks via Unsupervised Contrastive Learning
Depeng Chen, Xiao Liu, Jie Cui et al.
Because machine learning models are often trained on limited datasets, each data sample is seen many times during training, which causes the model to memorize most of its training data. Membership inference attacks (MIAs) exploit this property to determine whether a data sample was used to train a machine learning model. However, in realistic scenarios it is difficult for the adversary to obtain enough qualified samples labeled with accurate identity (membership) information, especially since most samples in real-world applications are non-members. To address this limitation, we propose a new attack method, CLMIA, which uses unsupervised contrastive learning to train an attack model without extra membership status information. CLMIA requires only a small amount of data with known membership status to fine-tune the attack model. Experimental results demonstrate that CLMIA outperforms existing attack methods across different datasets and model structures, especially when little data is labeled with identity information. In addition, we find experimentally that the attack performs differently for different proportions of labeled identity information for member and non-member data. Further analysis shows that our attack method performs better with less labeled identity information, which better reflects realistic scenarios.
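The abstract does not specify CLMIA's contrastive objective. As a hedged sketch of what unsupervised contrastive learning typically optimizes, the function below implements the widely used NT-Xent (normalized temperature-scaled cross-entropy) loss from SimCLR in plain Python. Two views `z1[i]` and `z2[i]` of the same sample are pulled together, and all other embeddings in the batch are pushed apart, with no membership labels involved. Whether CLMIA uses this exact loss is an assumption, and the function name and `temperature` default are illustrative.

```python
import math


def _normalize(v):
    """Scale v to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss for N positive pairs (z1[i], z2[i]); inputs are lists of vectors."""
    n = len(z1)
    z = [_normalize(v) for v in list(z1) + list(z2)]
    total = 0.0
    for i in range(2 * n):
        partner = (i + n) % (2 * n)  # index of the other view of the same sample
        # Temperature-scaled cosine similarity to every other embedding in the batch.
        sims = {j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
                for j in range(2 * n) if j != i}
        # Cross-entropy with the partner view treated as the correct "class".
        total += _logsumexp(list(sims.values())) - sims[partner]
    return total / (2 * n)


# Matching views give a lower loss than mismatched ones.
views = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
shuffled = [views[1], views[2], views[0]]
print(nt_xent_loss(views, views) < nt_xent_loss(views, shuffled))  # True
```

In a membership-inference setting, the encoder trained with such a loss would then be fine-tuned on the small labeled member/non-member set that the abstract mentions.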
CR Oct 18, 2018
Making Double Spectrum Auction Practical: Both Privacy and Efficiency Matter
Zhili Chen, Xuemei Wei, Hong Zhong et al.
Truthful spectrum auctions are believed to be an effective method for spectrum redistribution. However, privacy concerns have largely hampered their practical application. In this paper, to make double spectrum auctions practical, we present SDSA, a privacy-preserving and socially efficient double spectrum auction design. Specifically, by combining three security techniques (homomorphic encryption, secret sharing, and garbled circuits), we design a secure two-party protocol. This protocol computes a socially efficient double spectrum auction, TDSA, without leaking any information about sellers' requests or buyers' bids beyond the auction outcome. We give a formal security definition for this setting and theoretically prove that our design achieves it. Experimental results show that our design is also efficient, even for large-scale double spectrum auctions.
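Of the three building blocks named above, secret sharing is the simplest to illustrate in isolation. The sketch below shows two-party additive secret sharing over the integers modulo 2^32. A bid is split into two shares that each look uniformly random, and a sum can be computed share-wise without either party seeing the individual inputs. The modulus and function names are assumptions for illustration. SDSA's actual protocol also relies on homomorphic encryption and garbled circuits, which are not shown.

```python
import secrets

MODULUS = 2 ** 32  # assumed share space; secret values must be smaller than this


def share(value, modulus=MODULUS):
    """Split value into two additive shares; each share alone is uniformly random."""
    r = secrets.randbelow(modulus)
    return r, (value - r) % modulus


def reconstruct(share0, share1, modulus=MODULUS):
    """Recombine two additive shares into the secret value."""
    return (share0 + share1) % modulus


def add_shared(a, b, modulus=MODULUS):
    """Each party adds the shares it holds locally; no communication is needed."""
    return (a[0] + b[0]) % modulus, (a[1] + b[1]) % modulus


bids = [120, 75, 300]
shared = [share(b) for b in bids]
acc = (0, 0)  # (party 0's share, party 1's share) of the running total
for s in shared:
    acc = add_shared(acc, s)
print(reconstruct(*acc))  # 495
```

Additions are free under this scheme, but comparisons (needed to rank bids and find a clearing price) are not. That gap is one reason a design like SDSA would combine secret sharing with garbled circuits.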
CR Oct 18, 2018
Differentially Private Double Spectrum Auction with Approximate Social Welfare Maximization
Zhili Chen, Tianjiao Ni, Hong Zhong et al.
Spectrum auction is an effective approach to improving spectrum utilization by leasing idle spectrum from primary users to secondary users. Recently, a few differentially private spectrum auction mechanisms have been proposed, but, as far as we know, none of them addresses differential privacy in the setting of double spectrum auctions. In this paper, we combine the concept of differential privacy with double spectrum auction design and present a Differentially private Double spectrum auction mechanism with approximate Social welfare Maximization (DDSM). Specifically, we design the mechanism by employing the exponential mechanism to select clearing prices for the double spectrum auction, with probabilities exponentially proportional to the related social welfare values. We then improve the mechanism in several aspects, such as the designs of the auction algorithm, the utility function, and the buyer grouping algorithm. Through theoretical analysis, we prove that DDSM achieves differential privacy, approximate truthfulness, and approximate social welfare maximization. Extensive experimental evaluations show that DDSM achieves good performance in terms of social welfare.
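The price-selection step follows the standard exponential mechanism. It samples a candidate $c$ with probability proportional to $\exp(\varepsilon\, u(c) / (2\Delta u))$, where $u$ is the utility and $\Delta u$ its sensitivity. The sketch below applies this mechanism to a toy double auction. As a simplifying assumption, it scores each candidate clearing price by the number of trades the price supports, which is a stand-in for the social welfare that DDSM actually uses. A trade is supported when a buyer bids at least p and a seller asks at most p. Changing one participant's bid or ask moves each count by at most one, so this stand-in has sensitivity 1. DDSM's refined auction algorithm, utility function, and buyer grouping are not reproduced. The bid and ask values and all function names are hypothetical.

```python
import math
import random


def exp_mech_probabilities(candidates, utility, epsilon, sensitivity):
    """Exponential mechanism: Pr[c] is proportional to exp(epsilon * u(c) / (2 * sensitivity))."""
    scores = [epsilon * utility(c) / (2 * sensitivity) for c in candidates]
    top = max(scores)  # subtract the max before exponentiating for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def exp_mech_sample(candidates, utility, epsilon, sensitivity, rng=random):
    """Draw one candidate according to the exponential-mechanism distribution."""
    probs = exp_mech_probabilities(candidates, utility, epsilon, sensitivity)
    return rng.choices(candidates, weights=probs, k=1)[0]


# Toy double auction: buyer bids and seller asks (hypothetical values).
bids, asks = [8, 6, 5, 3], [2, 4, 7]


def trades(price):
    """Stand-in utility (assumption): number of trades that clear at this price."""
    buyers = sum(b >= price for b in bids)
    sellers = sum(a <= price for a in asks)
    return min(buyers, sellers)


prices = list(range(1, 10))
probs = exp_mech_probabilities(prices, trades, epsilon=1.0, sensitivity=1)
price = exp_mech_sample(prices, trades, epsilon=1.0, sensitivity=1, rng=random.Random(0))
```

Prices 4 through 6 support two trades each and therefore receive the highest probability. Lowering `epsilon` flattens the distribution, giving stronger privacy at the cost of welfare.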
IT Apr 26, 2018
Linear $(2,p,p)$-AONTs do Exist
Xin Wang, Jie Cui, Lijun Ji
A $(t,s,v)$-all-or-nothing transform (AONT) is a bijective mapping defined on $s$-tuples over an alphabet of size $v$ that satisfies the following property: if any $s-t$ of the $s$ outputs are given, then the values of any $t$ inputs are completely undetermined. For fixed $t$ and $v$, the main research objective is to determine the maximum integer $s$ such that a $(t,s,v)$-AONT exists. In this paper, we solve three open problems proposed in [IEEE Trans. Inform. Theory 64 (2018), 3136-3143.] and show that linear $(2,p,p)$-AONTs do exist. Then, for alphabets whose size is a prime power, we give the first infinite class of linear AONTs that is better than the linear AONTs defined by Cauchy matrices. We also present a recursive construction for general AONTs and a new relationship between AONTs and orthogonal arrays.
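The AONT definition above can be checked by brute force for small parameters. The sketch below enumerates all $v^s$ inputs of a map `f` and confirms that `f` is a bijection. It then takes every choice of $s-t$ revealed outputs and $t$ targeted inputs. For each value of the revealed outputs, it checks that all $v^t$ values of the targeted inputs remain possible. This is only a sanity check for tiny cases, since the search grows as $v^s$; it is not the paper's construction. The names `linear_map` and `is_aont` and the example matrix over $\mathbb{Z}_3$ are illustrative choices, not taken from the paper.

```python
from itertools import combinations, product


def linear_map(A, v):
    """Return the map x -> A x over Z_v (a field when v is prime)."""
    return lambda x: tuple(sum(a * xi for a, xi in zip(row, x)) % v for row in A)


def is_aont(f, t, s, v):
    """Brute-force check of the (t, s, v)-AONT definition for a map f on s-tuples."""
    inputs = list(product(range(v), repeat=s))
    outputs = [f(x) for x in inputs]
    if len(set(outputs)) != v ** s:  # f must be a bijection
        return False
    for revealed in combinations(range(s), s - t):  # the s - t known outputs
        for targeted in combinations(range(s), t):  # the t inputs to protect
            seen = {}
            for x, y in zip(inputs, outputs):
                key = tuple(y[j] for j in revealed)
                seen.setdefault(key, set()).add(tuple(x[i] for i in targeted))
            # Each key has exactly v**t preimages, so v**t distinct projections
            # means every t-tuple of targeted inputs remains equally possible.
            if any(len(vals) != v ** t for vals in seen.values()):
                return False
    return True


A = [[1, 1, 1], [1, 2, 1], [1, 1, 2]]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(is_aont(linear_map(A, 3), t=2, s=3, v=3))         # True
print(is_aont(linear_map(identity, 3), t=2, s=3, v=3))  # False
```

With $p = 3$, the matrix `A` yields a small linear $(2,p,p)$-AONT. The identity map fails because revealing a single output immediately reveals the matching input.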