DyRRen: A Dynamic Retriever-Reranker-Generator Model for Numerical Reasoning over Tabular and Textual Data
Xiao Li, Yin Zhu, Sichen Liu et al.
Numerical reasoning over hybrid data containing tables and long texts has recently received research attention from the AI community. To generate an executable reasoning program consisting of math and table operations to answer a question, state-of-the-art methods use a retriever-generator pipeline. However, their retrieval results are static, while different generation steps may rely on different sentences. To attend to the retrieved information that is relevant to each generation step, in this paper, we propose DyRRen, an extended retriever-reranker-generator framework where each generation step is enhanced by a dynamic reranking of retrieved sentences. It outperforms existing baselines on the FinQA dataset.
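The idea of reranking the retrieved sentences at every generation step can be illustrated with a minimal sketch. It is not DyRRen's architecture: the 2-d embeddings, the decoder states, and the dot-product scoring are all made-up assumptions chosen only to show that different steps can favor different sentences.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def rerank(step_state, sentence_vecs):
    """Score every retrieved sentence against the current decoder state
    (dot product, an assumed scoring function) and normalize the scores."""
    scores = [sum(a * b for a, b in zip(step_state, vec)) for vec in sentence_vecs]
    return softmax(scores)

# Toy retrieved sentences with made-up 2-d embeddings.
sentences = [
    "revenue in 2019 was 120",
    "revenue in 2018 was 100",
    "the company has 5 offices",
]
sentence_vecs = [[1.0, 0.0], [0.0, 1.0], [0.1, 0.1]]

# Hypothetical decoder states for two steps of generating subtract(120, 100).
step_states = [[2.0, 0.0], [0.0, 2.0]]

top_per_step = []
for state in step_states:
    weights = rerank(state, sentence_vecs)
    best = max(range(len(weights)), key=weights.__getitem__)
    top_per_step.append(sentences[best])
```

In this toy setting the first step attends mostly to the 2019 sentence and the second to the 2018 sentence, whereas a static retrieval would give both steps the same ranking.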
Extrapolating and Decoupling Image-to-Video Generation Models: Motion Modeling is Easier Than You Think
Jie Tian, Xiaoye Qu, Zhenyi Lu et al.
Image-to-Video (I2V) generation aims to synthesize a video clip according to a given image and condition (e.g., text). The key challenge of this task lies in simultaneously generating natural motions while preserving the original appearance of the images. However, current I2V diffusion models (I2V-DMs) often produce videos with limited motion degrees or exhibit uncontrollable motion that conflicts with the textual condition. To address these limitations, we propose a novel Extrapolating and Decoupling framework, which introduces model merging techniques to the I2V domain for the first time. Specifically, our framework consists of three separate stages: (1) Starting with a base I2V-DM, we explicitly inject the textual condition into the temporal module using a lightweight, learnable adapter and fine-tune the integrated model to improve motion controllability. (2) We introduce a training-free extrapolation strategy to amplify the dynamic range of the motion, effectively reversing the fine-tuning process to enhance the motion degree significantly. (3) With the above two-stage models excelling in motion controllability and degree, we decouple the relevant parameters associated with each type of motion ability and inject them into the base I2V-DM. Since the I2V-DM handles different levels of motion controllability and dynamics at various denoising time steps, we adjust the motion-aware parameters accordingly over time. Extensive qualitative and quantitative experiments have been conducted to demonstrate the superiority of our framework over existing methods.
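The abstract does not give the extrapolation formula, so the following is only a sketch of one common form of training-free weight extrapolation: moving the base weights further in the direction opposite to the fine-tuning update. The function name `extrapolate`, the coefficient `alpha`, and the scalar "parameters" are illustrative assumptions, not the authors' implementation.

```python
def extrapolate(base, finetuned, alpha):
    """Assumed extrapolation rule: theta = base + alpha * (base - finetuned).

    With alpha = 0 this returns the base weights unchanged; a larger alpha
    moves further away from the fine-tuned weights.
    """
    return {
        name: base[name] + alpha * (base[name] - finetuned[name])
        for name in base
    }

# Toy "models" with one scalar per parameter name (made-up values).
base_weights = {"temporal.w": 1.0, "temporal.b": 0.0}
finetuned_weights = {"temporal.w": 0.5, "temporal.b": 0.25}

extrapolated = extrapolate(base_weights, finetuned_weights, alpha=2.0)
```

No gradient step is involved: the new weights are computed directly from two existing checkpoints, which is what makes such a strategy training-free.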
Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment
Chenghao Fan, Zhenyi Lu, Sichen Liu et al.
While Low-Rank Adaptation (LoRA) enables parameter-efficient fine-tuning for Large Language Models (LLMs), its performance often falls short of Full Fine-Tuning (Full FT). Current methods optimize LoRA by initializing with static singular value decomposition (SVD) subsets, leading to suboptimal leveraging of pre-trained knowledge. Another path for improving LoRA is incorporating a Mixture-of-Experts (MoE) architecture. However, weight misalignment and complex gradient dynamics make it challenging to incorporate an SVD prior into the LoRA MoE architecture. To mitigate these issues, we propose Great LoRA Mixture-of-Expert (GOAT), a framework that (1) adaptively integrates relevant priors using an SVD-structured MoE, and (2) aligns optimization with full fine-tuned MoE by deriving a theoretical scaling factor. We demonstrate that proper scaling, without modifying the architecture or training algorithms, boosts LoRA MoE's efficiency and performance. Experiments across 25 datasets, including natural language understanding, commonsense reasoning, image classification, and natural language generation, demonstrate GOAT's state-of-the-art performance, closing the gap with Full FT.
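To show where a scaling factor enters a LoRA update, here is a minimal sketch of the standard LoRA merge, W' = W + (alpha / r) * B A, in plain Python. It does not implement GOAT's SVD-structured MoE or its derived scaling factor, which the abstract does not specify; the helper names and toy matrices are assumptions for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def lora_merge(w, b, a, alpha):
    """Merge a low-rank update into a frozen weight matrix.

    w: (out x in) frozen weight, b: (out x r), a: (r x in).
    The update B A is scaled by alpha / r, as in standard LoRA.
    """
    r = len(a)  # rank = number of rows of A
    scale = alpha / r
    delta = matmul(b, a)
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]

# Toy rank-1 example with made-up values.
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]
A = [[3.0, 4.0]]

merged = lora_merge(W, B, A, alpha=2.0)
```

Changing `alpha` rescales the whole low-rank update without touching the architecture, which is the kind of knob the abstract refers to when it speaks of proper scaling.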
FormulaReasoning: A Dataset for Formula-Based Numerical Reasoning
Xiao Li, Bolin Zhu, Kaiwen Shi et al.
The application of formulas (e.g., physics formulas) is a fundamental human ability in solving numerical reasoning problems. Existing numerical reasoning datasets rarely explicitly state the formulas employed, as their questions often rely on implicit commonsense mathematical knowledge. To address this gap, we introduce FormulaReasoning, a new dataset specifically designed for formula-based numerical reasoning. It consists of 5,324 questions that require numerical calculations grounded in external physics formulas. We provide normalized, fine-grained annotations in both English and Chinese, including formula structures, parameter names, symbols, numerical values, and units, curated through extensive manual effort with LLM-assisted validation to ensure high quality. Additionally, we offer a consolidated formula database to serve as an external knowledge source. We analyze various reasoning approaches on FormulaReasoning, with emphasis on comparative evaluation of different architectural and methodological frameworks. Our assessment includes retrieval-augmented methods, approaches that decompose reasoning into formula generation, parameter extraction, and numerical calculation, as well as optimization techniques using preference data. We identify key challenges in formula-based numerical reasoning that require further investigation across different reasoning paradigms, highlighting opportunities for methodological advancement.
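The decomposition into formula selection, parameter extraction, and numerical calculation can be sketched as follows. The formula database schema, the parameter names, and the `solve` helper are hypothetical and do not reflect the dataset's actual annotation format; in a real system the first two stages would be produced by a model rather than supplied by hand.

```python
# Hypothetical formula database: each entry names its inputs, the unit of
# the result, and a function that performs the calculation.
FORMULAS = {
    "average_speed": {
        "inputs": ("distance_m", "time_s"),
        "unit": "m/s",
        "compute": lambda distance_m, time_s: distance_m / time_s,
    },
    "weight": {
        "inputs": ("mass_kg", "g_m_per_s2"),
        "unit": "N",
        "compute": lambda mass_kg, g_m_per_s2: mass_kg * g_m_per_s2,
    },
}

def solve(formula_name, parameters):
    """Apply a selected formula to extracted parameter values.

    formula_name stands in for the formula generation stage and
    parameters for the parameter extraction stage.
    """
    formula = FORMULAS[formula_name]
    args = {name: parameters[name] for name in formula["inputs"]}
    return formula["compute"](**args), formula["unit"]

# Question (made up): "A runner covers 100 m in 20 s. What is the average speed?"
value, unit = solve("average_speed", {"distance_m": 100.0, "time_s": 20.0})
```

Keeping the calculation in a separate, deterministic step means that arithmetic errors can be ruled out when a wrong answer is traced back to formula choice or parameter extraction.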