99.7AIMar 10Code
Logics-Parsing-Omni Technical ReportXin An, Jingyi Cai, Xiangyang Chen et al.
Addressing the challenges of fragmented task definitions and the heterogeneity of unstructured data in multimodal parsing, this paper proposes the Omni Parsing framework. This framework establishes a Unified Taxonomy covering documents, images, and audio-visual streams, introducing a progressive parsing paradigm that bridges perception and cognition. Specifically, the framework integrates three hierarchical levels: 1) Holistic Detection, which achieves precise spatial-temporal grounding of objects or events to establish a geometric baseline for perception; 2) Fine-grained Recognition, which performs symbolization (e.g., OCR/ASR) and attribute extraction on localized objects to complete structured entity parsing; and 3) Multi-level Interpreting, which constructs a reasoning chain from local semantics to global logic. A pivotal advantage of this framework is its evidence anchoring mechanism, which enforces a strict alignment between high-level semantic descriptions and low-level facts. This enables ``evidence-based'' logical induction, transforming unstructured signals into standardized knowledge that is locatable, enumerable, and traceable. Building on this foundation, we constructed a standardized dataset and released the Logics-Parsing-Omni model, which successfully converts complex audio-visual signals into machine-readable structured knowledge. Experiments demonstrate that fine-grained perception and high-level cognition are synergistic, effectively enhancing model reliability. Furthermore, to quantitatively evaluate these capabilities, we introduce OmniParsingBench. Code, models and the benchmark are released at https://github.com/alibaba/Logics-Parsing/tree/master/Logics-Parsing-Omni.
8.4CVApr 22
Opportunistic Bone-Loss Screening from Routine Knee Radiographs Using a Multi-Task Deep Learning Framework with Sensitivity-Constrained Threshold OptimizationZhaochen Li, Xinghao Yan, Runni Zhou et al.
Background: Osteoporosis and osteopenia are often undiagnosed until fragility fractures occur. Dual-energy X-ray absorptiometry (DXA) is the reference standard for bone mineral density (BMD) assessment, but access remains limited. Knee radiographs are obtained at high volume for osteoarthritis evaluation and may offer an opportunity for opportunistic bone-loss screening. Objective: To develop and evaluate a multi-task deep learning system for opportunistic bone-loss screening from routine knee radiographs without additional imaging or patient visits. Methods: We developed STR-Net, a multi-task framework for single-channel grayscale knee radiographs. The model includes a shared backbone, global average pooling feature aggregation, a shared neck, and a task-aware representation routing module connected to three task-specific heads: binary screening (Normal vs. Bone Loss), severity sub-classification (Osteopenia vs. Osteoporosis), and weakly coupled T-score regression with optional clinical variables. A sensitivity-constrained threshold optimization strategy (minimum sensitivity >= 0.86) was applied. The dataset included 1,570 knee radiographs, split at the patient level into training (n=1,120), validation (n=226), and test (n=224) sets. Results: On the held-out test set, STR-Net achieved an AUROC of 0.933, sensitivity of 0.904, specificity of 0.773, and AUPRC of 0.956 for binary screening. Severity sub-classification achieved an AUROC of 0.898. The T-score regression branch showed a Pearson correlation of 0.801 with DXA-measured T-scores in a pilot subset (n=31), with MAE of 0.279 and RMSE of 0.347. Conclusions: STR-Net enables single-pass bone-loss screening, severity stratification, and quantitative T-score estimation from routine knee radiographs. Prospective clinical validation is needed before deployment.
20.1LGMay 13
Byzantine-Robust Distributed Sparse Learning RevisitedYuxuan Wang, Lixin Zhang, Kangqiang Li
We revisit Byzantine robust distributed estimation for high-dimensional sparse linear models. By combining local $\ell_1$-regularized robust estimation with robust aggregation at the server, the framework applies to pseudo-Huber regression, quantile regression, and sparse SVM. We show that the resulting estimators yield non-asymptotic guarantees and attain near-optimal statistical rates under mild conditions, while remaining communication-efficient. Simulations confirm strong robustness in estimation, support recovery and classification accuracy under various Byzantine attacks.
LGJan 24, 2025
Humanity's Last ExamLong Phan, Alice Gatti, Ziwen Han et al. · amazon-science, apple-ml
Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90\% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of human knowledge, designed to be the final closed-ended academic benchmark of its kind with broad subject coverage. HLE consists of 2,500 questions across dozens of subjects, including mathematics, humanities, and the natural sciences. HLE is developed globally by subject-matter experts and consists of multiple-choice and short-answer questions suitable for automated grading. Each question has a known solution that is unambiguous and easily verifiable, but cannot be quickly answered via internet retrieval. State-of-the-art LLMs demonstrate low accuracy and calibration on HLE, highlighting a significant gap between current LLM capabilities and the expert human frontier on closed-ended academic questions. To inform research and policymaking upon a clear understanding of model capabilities, we publicly release HLE at https://lastexam.ai.
CLSep 25, 2025Code
Eigen-1: Adaptive Multi-Agent Refinement with Monitor-Based RAG for Scientific ReasoningXiangru Tang, Wanghan Xu, Yujie Wang et al.
Large language models (LLMs) have recently shown strong progress on scientific reasoning, yet two major bottlenecks remain. First, explicit retrieval fragments reasoning, imposing a hidden "tool tax" of extra tokens and steps. Second, multi-agent pipelines often dilute strong solutions by averaging across all candidates. We address these challenges with a unified framework that combines implicit retrieval and structured collaboration. At its foundation, a Monitor-based retrieval module operates at the token level, integrating external knowledge with minimal disruption to reasoning. On top of this substrate, Hierarchical Solution Refinement (HSR) iteratively designates each candidate as an anchor to be repaired by its peers, while Quality-Aware Iterative Reasoning (QAIR) adapts refinement to solution quality. On Humanity's Last Exam (HLE) Bio/Chem Gold, our framework achieves 48.3\% accuracy -- the highest reported to date, surpassing the strongest agent baseline by 13.4 points and leading frontier LLMs by up to 18.1 points, while simultaneously reducing token usage by 53.5\% and agent steps by 43.7\%. Results on SuperGPQA and TRQA confirm robustness across domains. Error analysis shows that reasoning failures and knowledge gaps co-occur in over 85\% of cases, while diversity analysis reveals a clear dichotomy: retrieval tasks benefit from solution variety, whereas reasoning tasks favor consensus. Together, these findings demonstrate how implicit augmentation and structured refinement overcome the inefficiencies of explicit tool use and uniform aggregation. Code is available at: https://github.com/tangxiangru/Eigen-1.
IRDec 13, 2021Code
CT4Rec: Simple yet Effective Consistency Training for Sequential RecommendationChong Liu, Xiaoyang Liu, Rongqin Zheng et al.
Sequential recommendation methods are increasingly important in cutting-edge recommender systems. Through leveraging historical records, the systems can capture user interests and perform recommendations accordingly. State-of-the-art sequential recommendation models proposed very recently combine contrastive learning techniques for obtaining high-quality user representations. Though effective and performing well, the models based on contrastive learning require careful selection of data augmentation methods and pretext tasks, efficient negative sampling strategies, and massive hyper-parameters validation. In this paper, we propose an ultra-simple alternative for obtaining better user representations and improving sequential recommendation performance. Specifically, we present a simple yet effective \textbf{C}onsistency \textbf{T}raining method for sequential \textbf{Rec}ommendation (CT4Rec) in which only two extra training objectives are utilized without any structural modifications and data augmentation. Experiments on three benchmark datasets and one large newly crawled industrial corpus demonstrate that our proposed method outperforms SOTA models by a large margin and with much less training time than these based on contrastive learning. Online evaluation on real-world content recommendation system also achieves 2.717\% improvement on the click-through rate and 3.679\% increase on the average click number per capita. Further exploration reveals that such a simple method has great potential for CTR prediction. Our code is available at \url{https://github.com/ct4rec/CT4Rec.git}.
CVSep 23, 2025
Source-Free Domain Adaptive Semantic Segmentation of Remote Sensing Images with Diffusion-Guided Label EnrichmentWenjie Liu, Hongmin Liu, Lixin Zhang et al.
Research on unsupervised domain adaptation (UDA) for semantic segmentation of remote sensing images has been extensively conducted. However, research on how to achieve domain adaptation in practical scenarios where source domain data is inaccessible namely, source-free domain adaptation (SFDA) remains limited. Self-training has been widely used in SFDA, which requires obtaining as many high-quality pseudo-labels as possible to train models on target domain data. Most existing methods optimize the entire pseudo-label set to obtain more supervisory information. However, as pseudo-label sets often contain substantial noise, simultaneously optimizing all labels is challenging. This limitation undermines the effectiveness of optimization approaches and thus restricts the performance of self-training. To address this, we propose a novel pseudo-label optimization framework called Diffusion-Guided Label Enrichment (DGLE), which starts from a few easily obtained high-quality pseudo-labels and propagates them to a complete set of pseudo-labels while ensuring the quality of newly generated labels. Firstly, a pseudo-label fusion method based on confidence filtering and super-resolution enhancement is proposed, which utilizes cross-validation of details and contextual information to obtain a small number of high-quality pseudo-labels as initial seeds. Then, we leverage the diffusion model to propagate incomplete seed pseudo-labels with irregular distributions due to its strong denoising capability for randomly distributed noise and powerful modeling capacity for complex distributions, thereby generating complete and high-quality pseudo-labels. This method effectively avoids the difficulty of directly optimizing the complete set of pseudo-labels, significantly improves the quality of pseudo-labels, and thus enhances the model's performance in the target domain.
LGMay 29, 2023
Graph Exploration Matters: Improving both individual-level and system-level diversity in WeChat Feed RecommenderShuai Yang, Lixin Zhang, Feng Xia et al.
There are roughly three stages in real industrial recommendation systems, candidates generation (retrieval), ranking and reranking. Individual-level diversity and system-level diversity are both important for industrial recommender systems. The former focus on each single user's experience, while the latter focus on the difference among users. Graph-based retrieval strategies are inevitably hijacked by heavy users and popular items, leading to the convergence of candidates for users and the lack of system-level diversity. Meanwhile, in the reranking phase, Determinantal Point Process (DPP) is deployed to increase individual-level diverisity. Heavily relying on the semantic information of items, DPP suffers from clickbait and inaccurate attributes. Besides, most studies only focus on one of the two levels of diversity, and ignore the mutual influence among different stages in real recommender systems. We argue that individual-level diversity and system-level diversity should be viewed as an integrated problem, and we provide an efficient and deployable solution for web-scale recommenders. Generally, we propose to employ the retrieval graph information in diversity-based reranking, by which to weaken the hidden similarity of items exposed to users, and consequently gain more graph explorations to improve the system-level diveristy. Besides, we argue that users' propensity for diversity changes over time in content feed recommendation. Therefore, with the explored graph, we also propose to capture the user's real-time personalized propensity to the diversity. We implement and deploy the combined system in WeChat App's Top Stories used by hundreds of millions of users. Offline simulations and online A/B tests show our solution can effectively improve both user engagement and system revenue.
CRMay 17, 2020
A Lightweight Isolation Mechanism for Secure Branch PredictorsLutan Zhao, Peinan Li, Rui Hou et al.
Recently exposed vulnerabilities reveal the necessity to improve the security of branch predictors. Branch predictors record history about the execution of different programs, and such information from different processes are stored in the same structure and thus accessible to each other. This leaves the attackers with the opportunities for malicious training and malicious perception. Instead of flush-based or physical isolation of hardware resources, we want to achieve isolation of the content in these hardware tables with some lightweight processing using randomization as follows. (1) Content encoding. We propose to use hardware-based thread-private random numbers to encode the contents of the branch predictor tables (both direction and destination histories) which we call XOR-BP. Specifically, the data is encoded by XOR operation with the key before written in the table and decoded after read from the table. Such a mechanism obfuscates the information adding difficulties to cross-process or cross-privilege level analysis and perception. It achieves a similar effect of logical isolation but adds little in terms of space or time overheads. (2) Index encoding. We propose a randomized index mechanism of the branch predictor (Noisy-XOR-BP). Similar to the XOR-BP, another thread-private random number is used together with the branch instruction address as the input to compute the index of the branch predictor. This randomized indexing mechanism disrupts the correspondence between the branch instruction address and the branch predictor entry, thus increases the noise for malicious perception attacks. Our analyses using an FPGA-based RISC-V processor prototype and additional auxiliary simulations suggest that the proposed mechanisms incur a very small performance cost while providing strong protection.
CRApr 9, 2019
Enabling Privacy-Preserving, Compute- and Data-Intensive Computing using Heterogeneous Trusted Execution EnvironmentJianping Zhu, Rui Hou, XiaoFeng Wang et al.
There is an urgent demand for privacy-preserving techniques capable of supporting compute and data intensive (CDI) computing in the era of big data. However, none of existing TEEs can truly support CDI computing tasks, as CDI requires high throughput accelerators like GPU and TPU but TEEs do not offer security protection of such accelerators. This paper present HETEE (Heterogeneous TEE), the first design of TEE capable of strongly protecting heterogeneous computing with unsecure accelerators. HETEE is uniquely constructed to work with today's servers, and does not require any changes for existing commercial CPUs or accelerators. The key idea of our design runs security controller as a stand-alone computing system to dynamically adjust the boundary of between secure and insecure worlds through the PCIe switches, rendering the control of an accelerator to the host OS when it is not needed for secure computing, and shifting it back when it is. The controller is the only trust unit in the system and it runs the custom OS and accelerator runtimes, together with the encryption, authentication and remote attestation components. The host server and other computing systems communicate with controller through an in memory task queue that accommodates the computing tasks offloaded to HETEE, in the form of encrypted and signed code and data. Also, HETEE offers a generic and efficient programming model to the host CPU. We have implemented the HETEE design on a hardware prototype system, and evaluated it with large-scale Neural Networks inference and training tasks. Our evaluations show that HETEE can easily support such secure computing tasks and only incurs a 12.34% throughput overhead for inference and 9.87% overhead for training on average.