CL · May 24

DTO: a Differentiable Training Objective for Effective Counterfactual Story Rewriting

arXiv:2605.24885 · 26.6

Predicted impact: top 94% in CL · last 90 days
Originality: Incremental advance

AI Analysis

For researchers in controlled text generation, this work offers a simpler and more effective alternative to reinforcement learning for nuanced tasks like counterfactual rewriting.

The paper proposes a differentiable training objective (DTO) for counterfactual story rewriting that jointly optimizes fidelity to the reference rewrite and semantic consistency with the source narrative. On the TimeTravel and ART datasets, DTO surpasses a maximum-likelihood baseline and a preference-based approach, and performs competitively against two large language models across all metrics.

Counterfactual story rewriting is a natural language processing task that requires updating an existing story to reflect a chosen alternative event while preserving all unaffected storyline elements and overall coherence. Although large language models have recently made remarkable progress on this task, it remains challenging because the required modifications are typically small and highly localized. As a consequence, models trained conventionally with the maximum-likelihood objective tend to overlook these nuances. At the same time, more sophisticated training approaches based on reinforcement learning are notoriously slow and difficult to set up. For these reasons, our paper proposes a novel differentiable training objective (DTO) that directly optimizes for the requisite counterfactual improvements. In our approach, a transformer model is fine-tuned via end-to-end backpropagation against a fully differentiable loss function that jointly rewards (i) fidelity to the reference rewrite and (ii) semantic consistency with the source narrative. The empirical evaluation on the TimeTravel and ART datasets shows that the proposed DTO approach surpasses a maximum-likelihood baseline and a preference-based approach, and performs competitively against two contemporary large language models on all evaluation metrics. These findings substantiate the effectiveness of task-specific differentiable objectives for nuanced, controlled text-generation tasks.
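
The abstract does not give the exact form of the DTO loss, so the following is only a minimal sketch of how a joint objective of this kind could be composed, not the authors' formulation. It assumes a cross-entropy term for fidelity to the reference rewrite and a soft-embedding cosine term for consistency with the source; the names `joint_loss` and `alpha`, the toy embeddings, and the choice of cosine similarity are all hypothetical. Plain Python is used for readability; in practice an autodiff framework would supply the gradients, which exist here because every step is smooth in the logits.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one decoding step's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(step_probs, reference_ids):
    """Fidelity term: mean negative log-likelihood of the reference tokens."""
    nll = -sum(math.log(p[t]) for p, t in zip(step_probs, reference_ids))
    return nll / len(reference_ids)

def expected_embedding(step_probs, embeddings):
    """Probability-weighted mean embedding of the predicted sequence.

    Using expected embeddings instead of argmax tokens keeps the
    computation differentiable with respect to the logits.
    """
    dim = len(embeddings[0])
    out = [0.0] * dim
    for probs in step_probs:
        for tok, p in enumerate(probs):
            for d in range(dim):
                out[d] += p * embeddings[tok][d]
    return [v / len(step_probs) for v in out]

def cosine(u, v):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)

def joint_loss(step_logits, reference_ids, source_ids, embeddings, alpha=0.5):
    """Hypothetical joint objective: alpha * fidelity + (1 - alpha) * consistency."""
    step_probs = [softmax(logits) for logits in step_logits]
    fidelity = cross_entropy(step_probs, reference_ids)

    # Consistency term: 1 - cosine between the soft prediction and the source.
    pred_vec = expected_embedding(step_probs, embeddings)
    dim = len(embeddings[0])
    src_vec = [sum(embeddings[t][d] for t in source_ids) / len(source_ids)
               for d in range(dim)]
    consistency = 1.0 - cosine(pred_vec, src_vec)

    return alpha * fidelity + (1.0 - alpha) * consistency

# Toy usage: 3-token vocabulary with 2-dimensional embeddings (illustrative only).
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
good_logits = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]]  # matches the reference [0, 1]
bad_logits = [[0.0, 5.0, 0.0], [0.0, 5.0, 0.0]]   # first step is wrong
loss_good = joint_loss(good_logits, [0, 1], [0, 1], toy_embeddings)
loss_bad = joint_loss(bad_logits, [0, 1], [0, 1], toy_embeddings)
```

In this toy setup the logits that match the reference receive the lower loss, and setting `alpha=1.0` reduces the objective to the plain maximum-likelihood term, which is one way to read the baseline comparison described above.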

