CVJun 4Code
TextWand: A Unified Framework for Scene Text EditingShuyu Wang, Zhile Guan, Hongxiu Chen et al.
We propose TextWand, a general-purpose framework that unifies scene text removal, generation, and replacement into a single model. By decomposing complex editing tasks into the atomic primitives of rendering and erasure, TextWand achieves precise control over both text appearance and background integrity. Specifically, we introduce a novel design, Overlay-Reference Positional Encoding (ORPE), to enforce pixel-level layout fidelity and exemplar-driven style control, alongside a new strategy, Region-Adaptive Suppression (RAS), to ensure clean text erasure. To address the absence of a comprehensive benchmark for general-purpose scene text editing among existing single-task datasets, we construct TextWand-Bench. Extensive experiments demonstrate that TextWand outperforms existing leading open-source and closed-source models by delivering superior text content accuracy, layout and style consistency, and overall image quality across scene text removal, generation and replacement tasks.
CLJun 2
ARBOR: Online Process Rewards via a Reusable Rubric Buffer for Search AgentsZheng Liu, Longxiang Zhang, Xintong Wang et al.
LLM-based search agents are trained predominantly with outcome-only reward, leaving the search process itself unsupervised. This signal degenerates on outcome-homogeneous groups where all sampled trajectories share the same correctness, yielding zero within-group advantage and no gradient. Existing process supervision either trains a costly verifier or generates per-query rubrics that are inconsistent across queries and discarded after one use. We propose ARBOR (Adaptive Rubric Buffer for Online Reward), a reusable process-reward framework that maintains a rubric memory shared across queries. Query-local drafts induced from contrastive trajectories are admitted, consolidated into cross-query common rubrics, and retired as the policy evolves. A small active subset of common rubrics scores trajectories via sparse pairwise judging, and the resulting scores are added to the base reward, providing process-level gradient even when outcome reward is uniform. ARBOR consistently outperforms GRPO and DAPO baselines on four multi-hop QA benchmarks, raising average LLM-judge accuracy by up to 4.2 points and converting up to 42% of otherwise-zero-gradient training groups into informative ones.
CLJul 2, 2023
PatternGPT :A Pattern-Driven Framework for Large Language Model Text GenerationLe Xiao, Xin Shan
Large language models(LLMS)have shown excellent text generation capabilities, capable of generating fluent human-like responses for many downstream tasks. However, applying large language models to real-world critical tasks remains challenging due to their susceptibility to hallucinations and inability to directly use external knowledge. To cope with the above challenges, this paper proposes PatternGPT, a pattern-driven text generation framework for Large Language Models. Firstly, the framework utilizes the extraction capability of Large Language Models to generate rich and diversified structured and formalized patterns, which facilitates the introduction of external knowledge to do the computation, and then draws on the idea of federated learning to use multiple agents to achieve the sharing in order to obtain more diversified patterns, and finally uses judgment criteria and optimization algorithm to search for high-quality patterns to guide the generation of models. Finally, external knowledge such as judgment criteria and optimization algorithms are used to search for high-quality patterns, and the searched patterns are used to guide model generation. This framework has the advantages of generating diversified patterns, protecting data privacy, combining external knowledge, and improving the quality of generation, which provides an effective method to optimize the text generation capability of large language models, and make it better applied to the field of intelligent dialogue and content generation.
CLSep 15, 2023
Research on Joint Representation Learning Methods for Entity Neighborhood Information and Description InformationLe Xiao, Xin Shan, Yuhua Wang et al.
To address the issue of poor embedding performance in the knowledge graph of a programming design course, a joint represen-tation learning model that combines entity neighborhood infor-mation and description information is proposed. Firstly, a graph at-tention network is employed to obtain the features of entity neigh-boring nodes, incorporating relationship features to enrich the structural information. Next, the BERT-WWM model is utilized in conjunction with attention mechanisms to obtain the representation of entity description information. Finally, the final entity vector representation is obtained by combining the vector representations of entity neighborhood information and description information. Experimental results demonstrate that the proposed model achieves favorable performance on the knowledge graph dataset of the pro-gramming design course, outperforming other baseline models.
CVOct 29, 2025
RT-DETRv4: Painlessly Furthering Real-Time Object Detection with Vision Foundation ModelsZijun Liao, Yian Zhao, Xin Shan et al.
Real-time object detection has achieved substantial progress through meticulously designed architectures and optimization strategies. However, the pursuit of high-speed inference via lightweight network designs often leads to degraded feature representation, which hinders further performance improvements and practical on-device deployment. In this paper, we propose a cost-effective and highly adaptable distillation framework that harnesses the rapidly evolving capabilities of Vision Foundation Models (VFMs) to enhance lightweight object detectors. Given the significant architectural and learning objective disparities between VFMs and resource-constrained detectors, achieving stable and task-aligned semantic transfer is challenging. To address this, on one hand, we introduce a Deep Semantic Injector (DSI) module that facilitates the integration of high-level representations from VFMs into the deep layers of the detector. On the other hand, we devise a Gradient-guided Adaptive Modulation (GAM) strategy, which dynamically adjusts the intensity of semantic transfer based on gradient norm ratios. Without increasing deployment and inference overhead, our approach painlessly delivers striking and consistent performance gains across diverse DETR-based models, underscoring its practical utility for real-time detection. Our new model family, RT-DETRv4, achieves state-of-the-art results on COCO, attaining AP scores of 49.7/53.5/55.4/57.0 at corresponding speeds of 273/169/124/78 FPS.
LGJan 4, 2025
Heterogeneous Graph Pre-training Based Model for Secure and Efficient Prediction of Default Risk Propagation among Bond IssuersXurui Li, Xin Shan, Wenhao Yin et al.
Efficient prediction of default risk for bond-issuing enterprises is pivotal for maintaining stability and fostering growth in the bond market. Conventional methods usually rely solely on an enterprise's internal data for risk assessment. In contrast, graph-based techniques leverage interconnected corporate information to enhance default risk identification for targeted bond issuers. Traditional graph techniques such as label propagation algorithm or deepwalk fail to effectively integrate a enterprise's inherent attribute information with its topological network data. Additionally, due to data scarcity and security privacy concerns between enterprises, end-to-end graph neural network (GNN) algorithms may struggle in delivering satisfactory performance for target tasks. To address these challenges, we present a novel two-stage model. In the first stage, we employ an innovative Masked Autoencoders for Heterogeneous Graph (HGMAE) to pre-train on a vast enterprise knowledge graph. Subsequently, in the second stage, a specialized classifier model is trained to predict default risk propagation probabilities. The classifier leverages concatenated feature vectors derived from the pre-trained encoder with the enterprise's task-specific feature vectors. Through the two-stage training approach, our model not only boosts the importance of unique bond characteristics for specific default prediction tasks, but also securely and efficiently leverage the global information pre-trained from other enterprises. Experimental results demonstrate that our proposed model outperforms existing approaches in predicting default risk for bond issuers.