CVNov 29, 2024Code
Pretrained Reversible Generation as Unsupervised Visual Representation LearningRongkun Xue, Jinouwen Zhang, Yazhe Niu et al.
Recent generative models based on score matching and flow matching have significantly advanced generation tasks, but their potential in discriminative tasks remains underexplored. Previous approaches, such as generative classifiers, have not fully leveraged the capabilities of these models for discriminative tasks due to their intricate designs. We propose Pretrained Reversible Generation (PRG), which extracts unsupervised representations by reversing the generative process of a pretrained continuous generation model. PRG effectively reuses unsupervised generative models, leveraging their high capacity to serve as robust and generalizable feature extractors for downstream tasks. This framework enables the flexible selection of feature hierarchies tailored to specific downstream tasks. Our method consistently outperforms prior approaches across multiple benchmarks, achieving state-of-the-art performance among generative model based methods, including 78% top-1 accuracy on ImageNet at a resolution of 64*64. Extensive ablation studies, including out-of-distribution evaluations, further validate the effectiveness of our approach.PRG is available at https://github.com/opendilab/PRG.
AIJan 8
SciIF: Benchmarking Scientific Instruction Following Towards Rigorous Scientific IntelligenceEncheng Su, Jianyu Wu, Chen Tang et al.
As large language models (LLMs) transition from general knowledge retrieval to complex scientific discovery, their evaluation standards must also incorporate the rigorous norms of scientific inquiry. Existing benchmarks exhibit a critical blind spot: general instruction-following metrics focus on superficial formatting, while domain-specific scientific benchmarks assess only final-answer correctness, often rewarding models that arrive at the right result with the wrong reasons. To address this gap, we introduce scientific instruction following: the capability to solve problems while strictly adhering to the constraints that establish scientific validity. Specifically, we introduce SciIF, a multi-discipline benchmark that evaluates this capability by pairing university-level problems with a fixed catalog of constraints across three pillars: scientific conditions (e.g., boundary checks and assumptions), semantic stability (e.g., unit and symbol conventions), and specific processes(e.g., required numerical methods). Uniquely, SciIF emphasizes auditability, requiring models to provide explicit evidence of constraint satisfaction rather than implicit compliance. By measuring both solution correctness and multi-constraint adherence, SciIF enables finegrained diagnosis of compositional reasoning failures, ensuring that LLMs can function as reliable agents within the strict logical frameworks of science.
CLApr 10
S3Mem: Structured Spatiotemporal Scene-Event Memory for Long-Horizon Interactive Question AnsweringEncheng Su, Jinouwen Zhang, Jianyu Wu et al.
Long-horizon interactive agents often accumulate large trajectory histories yet still fail to answer questions about earlier events reliably. We argue that the main bottleneck is not context length alone, but the trajectory-to-answer interface of long-term memory. When histories are stored as plain-text chunks and queried with standard retrieval-augmented generation (RAG), systems often retrieve locally relevant but chain-incomplete evidence, especially for spatial, temporal, repeated-event, and multi-hop state questions. We propose S3MEM, a structured scene-event episodic memory framework for long-horizon interactive question answering (QA). S3MEM writes trajectories into structured memory units, retrieves evidence through anchor-sensitive retrieval, and exposes a compact token-budget-aware evidence interface for answer-time inference. In this sense, S3MEM is a structured evidence harness that converts agent trajectories into query-aligned support. We evaluate S3MEM on two internal headline environments (Crafter, Jericho) and two out-of-family environments (SciWorld, ALFWorld). Under a shared frozen answer-time protocol, S3MEM consistently outperforms Vanilla RAG across all four environments, surpasses Graph-NoReader on Crafter, Jericho, and ALFWorld, and matches it on SciWorld while using dramatically fewer evidence tokens. Three adapted recent baselines -- A-MEM-inspired, MemoryOS-adapted, and LightMem-adapted -- improve over Vanilla RAG in several settings, but none matches S3MEM's overall accuracy-efficiency frontier. Overall, the evidence supports a bounded conclusion: under the current frozen answer-time protocol, structured writing and anchor-sensitive evidence routing provide a stronger accuracy-efficiency frontier for long-horizon interactive QA than more generic memory interfaces.
LGDec 2, 2024
Revisiting Generative Policies: A Simpler Reinforcement Learning Algorithmic PerspectiveJinouwen Zhang, Rongkun Xue, Yazhe Niu et al.
Generative models, particularly diffusion models, have achieved remarkable success in density estimation for multimodal data, drawing significant interest from the reinforcement learning (RL) community, especially in policy modeling in continuous action spaces. However, existing works exhibit significant variations in training schemes and RL optimization objectives, and some methods are only applicable to diffusion models. In this study, we compare and analyze various generative policy training and deployment techniques, identifying and validating effective designs for generative policy algorithms. Specifically, we revisit existing training objectives and classify them into two categories, each linked to a simpler approach. The first approach, Generative Model Policy Optimization (GMPO), employs a native advantage-weighted regression formulation as the training objective, which is significantly simpler than previous methods. The second approach, Generative Model Policy Gradient (GMPG), offers a numerically stable implementation of the native policy gradient method. We introduce a standardized experimental framework named GenerativeRL. Our experiments demonstrate that the proposed methods achieve state-of-the-art performance on various offline-RL datasets, offering a unified and practical guideline for training and deploying generative policies.
LGFeb 15, 2025
FuncGenFoil: Airfoil Generation and Editing Model in Function SpaceJinouwen Zhang, Junjie Ren, Aobo Yang et al.
Aircraft manufacturing is the jewel in the crown of industry, in which generating high-fidelity airfoil geometries with controllable and editable representations remains a fundamental challenge. Existing deep learning methods, which typically rely on predefined parametric representations (e.g., Bézier) or discrete point sets, face an inherent trade-off between expressive power and resolution adaptability. To tackle this challenge, we introduce FuncGenFoil, a novel function-space generative model that directly reconstructs airfoil geometries as function curves. Our method inherits the advantages of arbitrary-resolution sampling and smoothness from parametric functions, as well as the strong expressiveness of discrete point-based representations. Empirical evaluations demonstrate that FuncGenFoil improves upon state-of-the-art methods in airfoil generation, achieving a relative 74.4% reduction in label error and a 23.2% increase in diversity on the AF-200K dataset. Our results highlight the advantages of function-space modeling for aerodynamic shape optimization, offering a powerful and flexible framework for high-fidelity airfoil design.