Shuai Ye

CL
h-index: 117
Papers: 4
Citations: 3,177
Novelty: 60%
AI Score: 45

4 Papers

CV · Sep 13, 2023 · Code
Video Infringement Detection via Feature Disentanglement and Mutual Information Maximization

Zhenguang Liu, Xinyang Yu, Ruili Wang et al.

The self-media era provides us with a tremendous number of high-quality videos. Unfortunately, frequent video copyright infringements are now seriously damaging the interests and enthusiasm of video creators. Identifying infringing videos is therefore a compelling task. Current state-of-the-art methods tend to simply feed high-dimensional mixed video features into deep neural networks and count on the networks to extract useful representations. Despite its simplicity, this paradigm relies heavily on the original entangled features and lacks constraints guaranteeing that useful task-relevant semantics are extracted from the features. In this paper, we seek to tackle the above challenges from two aspects: (1) We propose to explicitly disentangle an original high-dimensional feature into multiple exclusive lower-dimensional sub-features. We expect the sub-features to encode non-overlapping semantics of the original feature and remove redundant information. (2) On top of the disentangled sub-features, we further learn an auxiliary feature to enhance the sub-features. We theoretically analyze the mutual information between the label and the disentangled features, arriving at a loss that maximizes the extraction of task-relevant information from the original feature. Extensive experiments on two large-scale benchmark datasets (i.e., SVD and VCSL) demonstrate that our method achieves 90.1% TOP-100 mAP on the SVD dataset and also sets the new state of the art on the VCSL benchmark dataset. Our code and model have been released at https://github.com/yyyooooo/DMI/ in the hope of contributing to the community.
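The abstract builds its loss on the mutual information between the label and the disentangled features. As a minimal sketch of the quantity itself (not the paper's loss, which is only available in the paper and the linked repository), the following computes the empirical mutual information of two discrete sequences. The function name `mutual_information` and the toy inputs are assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in nats for two discrete sequences.

    Illustrative helper only; it is not taken from the paper's code.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    count_x = Counter(xs)         # marginal counts of x
    count_y = Counter(ys)         # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y)
        mi += (c / n) * math.log(c * n / (count_x[x] * count_y[y]))
    return mi


# Toy data: a feature that determines the label, and one that is independent of it.
labels = [0, 0, 1, 1]
informative_feature = [0, 0, 1, 1]
uninformative_feature = [0, 1, 0, 1]
mi_informative = mutual_information(informative_feature, labels)
mi_uninformative = mutual_information(uninformative_feature, labels)
```

A feature that fully determines a balanced binary label carries log 2 nats of information about it, while an independent feature carries none, which is the sense in which a loss can favor task-relevant sub-features.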

NA · May 24, 2016
Generalized multiscale finite element methods for space-time heterogeneous parabolic equations

Eric T. Chung, Yalchin Efendiev, Wing Tat Leung et al.

In this paper, we consider local multiscale model reduction for problems with multiple scales in space and time. We develop our approaches within the framework of the Generalized Multiscale Finite Element Method (GMsFEM) using space-time coarse cells. The main idea of GMsFEM is to construct a local snapshot space and a local spectral decomposition in the snapshot space. Previous research on multiscale spaces within GMsFEM focused on constructing multiscale spaces and relevant ingredients in space only. In this paper, our main objective is to develop a multiscale model reduction framework within GMsFEM that uses space-time coarse cells. We construct space-time snapshot and offline spaces. We compute the snapshot solutions by solving local problems. A complete snapshot space would use all possible boundary conditions; however, this can be very expensive. We therefore propose using randomized boundary conditions and oversampling. We construct the local spectral decomposition based on our analysis, as presented in the paper. We present numerical results to confirm our theoretical findings and to show that, using our proposed approaches, we can obtain an accurate solution with low-dimensional coarse spaces. We remark that the proposed method is a significant extension of existing methods, which use coarse cells in space only, because of (1) the parabolic nature of cell solutions, (2) extra degrees of freedom associated with space-time cells, and (3) local boundary conditions in space-time cells.
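For orientation, the "local spectral decomposition in the snapshot space" can be written generically as a local generalized eigenvalue problem on each coarse region. The notation below (the region index i, the snapshot space, the bilinear forms a_i and s_i, and the number of retained modes L_i) is assumed for illustration; the specific space-time forms come from the paper's analysis and are not stated in the abstract.

```latex
% Generic GMsFEM local spectral problem (assumed notation, not from the abstract).
\begin{equation*}
\text{Find } \big(\lambda_k^{(i)}, \phi_k^{(i)}\big) \in \mathbb{R} \times V_{\mathrm{snap}}^{(i)}
\text{ such that }
a_i\big(\phi_k^{(i)}, v\big) = \lambda_k^{(i)}\, s_i\big(\phi_k^{(i)}, v\big)
\quad \forall\, v \in V_{\mathrm{snap}}^{(i)},
\end{equation*}
% Order the eigenvalues increasingly and keep the first L_i eigenfunctions.
\begin{equation*}
\lambda_1^{(i)} \le \lambda_2^{(i)} \le \cdots,
\qquad
V_{\mathrm{off}}^{(i)} = \operatorname{span}\big\{\phi_1^{(i)}, \dots, \phi_{L_i}^{(i)}\big\}.
\end{equation*}
```

In this generic form, the offline space keeps the eigenfunctions associated with the smallest eigenvalues, which is how a low-dimensional coarse space is extracted from a larger snapshot space.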

CR · Apr 1, 2022
FedRecAttack: Model Poisoning Attack to Federated Recommendation

Dazhong Rong, Shuai Ye, Ruoyan Zhao et al.

Federated Recommendation (FR) has received considerable popularity and attention in the past few years. In FR, each user's feature vector and interaction data are kept locally on the user's own client and are thus private to others. Without access to this information, most existing poisoning attacks against recommender systems or federated learning lose validity. Benefiting from this characteristic, FR is commonly considered fairly secure. However, we argue that security improvements in FR are still possible and necessary. To support this claim, in this paper we present FedRecAttack, a model poisoning attack on FR that aims to raise the exposure ratio of target items. In most recommendation scenarios, apart from private user-item interactions (e.g., clicks, watches and purchases), some interactions are public (e.g., likes, follows and comments). Motivated by this point, in FedRecAttack we make use of the public interactions to approximate users' feature vectors, so that the attacker can generate poisoned gradients accordingly and control malicious users to upload the poisoned gradients in a well-designed way. To evaluate the effectiveness and side effects of FedRecAttack, we conduct extensive experiments on three real-world datasets of different sizes from two completely different scenarios. Experimental results demonstrate that our proposed FedRecAttack achieves state-of-the-art effectiveness while its side effects are negligible. Moreover, even with a small proportion (3%) of malicious users and a small proportion (1%) of public interactions, FedRecAttack remains highly effective, which reveals that FR is more vulnerable to attack than is commonly assumed.
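The attack's goal is stated in terms of the exposure ratio of target items. The sketch below shows one plausible reading of that metric for evaluation purposes: the fraction of users whose top-K recommendation list contains at least one target item. The paper's exact definition may differ, and the function name `exposure_ratio_at_k`, the data layout, and the toy scores are all assumptions.

```python
def exposure_ratio_at_k(scores_by_user, target_items, k):
    """Fraction of users whose top-k list contains at least one target item.

    scores_by_user maps each user id to a dict of {item id: predicted score}.
    This is an assumed definition for illustration, not the paper's.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if not scores_by_user:
        return 0.0
    targets = set(target_items)
    exposed_users = 0
    for scores in scores_by_user.values():
        # Rank this user's items by predicted score, highest first.
        top_k = sorted(scores, key=scores.get, reverse=True)[:k]
        if targets.intersection(top_k):
            exposed_users += 1
    return exposed_users / len(scores_by_user)


# Made-up scores for two users and three items; "t" is the target item.
toy_scores = {
    "u1": {"a": 0.9, "b": 0.1, "t": 0.5},
    "u2": {"a": 0.9, "b": 0.8, "t": 0.1},
}
ratio_at_2 = exposure_ratio_at_k(toy_scores, ["t"], k=2)
```

Comparing this value before and after training is one way a defender could measure how far a set of target items has been pushed into users' recommendation lists.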

CL · Jul 7, 2025
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

Gheorghe Comanici, Eric Bieber, Mike Schaekermann et al. · amazon-science, baidu

In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal understanding, and it is now able to process up to 3 hours of video content. Its unique combination of long context, multimodal and reasoning capabilities can unlock new agentic workflows. Gemini 2.5 Flash provides excellent reasoning abilities at a fraction of the compute and latency requirements, and Gemini 2.0 Flash and Flash-Lite provide high performance at low latency and cost. Taken together, the Gemini 2.X model generation spans the full Pareto frontier of model capability vs. cost, allowing users to explore the boundaries of what is possible with complex agentic problem solving.
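The claim about spanning the Pareto frontier of capability vs. cost can be made concrete with a small sketch. A model is on the frontier if no other model is at least as cheap and at least as capable while being strictly better on one of the two. The model names, costs, and scores below are invented placeholders, not figures from the report, and `pareto_frontier` is a hypothetical helper.

```python
def pareto_frontier(models):
    """Return the names of models that no other model dominates.

    Each entry is (name, cost, score); lower cost and higher score are better.
    """
    frontier = []
    for name, cost, score in models:
        dominated = any(
            other_cost <= cost
            and other_score >= score
            and (other_cost < cost or other_score > score)
            for _, other_cost, other_score in models
        )
        if not dominated:
            frontier.append(name)
    return frontier


# Placeholder data: (name, relative cost, benchmark score), all made up.
toy_models = [
    ("small", 1.0, 60.0),
    ("medium", 3.0, 75.0),
    ("large", 10.0, 90.0),
    ("wasteful", 5.0, 70.0),  # costs more than "medium" but scores lower
]
frontier_names = pareto_frontier(toy_models)
```

Here the hypothetical "wasteful" model is excluded because "medium" is both cheaper and more capable, while the other three each represent a distinct cost and capability trade-off.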