3 Papers

LGApr 22Code
Occupancy Reward Shaping: Improving Credit Assignment for Offline Goal-Conditioned Reinforcement Learning

Aravind Venugopal, Jiayu Chen, Xudong Wu et al.

The temporal lag between actions and their long-term consequences makes credit assignment a challenge when learning goal-directed behaviors from data. Generative world models capture the distribution of future states an agent may visit, indicating that they have captured temporal information. How can that temporal information be extracted to perform credit assignment? In this paper, we formalize how the temporal information stored in world models encodes the underlying geometry of the world. Leveraging optimal transport, we extract this geometry from a learned model of the occupancy measure into a reward function that captures goal-reaching information. Our resulting method, Occupancy Reward Shaping, largely mitigates the problem of credit assignment in sparse reward settings. ORS provably does not alter the optimal policy, yet empirically improves performance by 2.2x across 13 diverse long-horizon locomotion and manipulation tasks. Moreover, we demonstrate the effectiveness of ORS in the real world for controlling nuclear fusion on 3 Tokamak control tasks. Code: https://github.com/aravindvenu7/occupancy_reward_shaping; Website: https://aravindvenu7.github.io/website/ors/

CCApr 22
A Lifting Theorem for Hybrid Classical-Quantum Communication Complexity

Xudong Wu, Guangxu Yang, Penghui Yao

We investigates a model of hybrid classical-quantum communication complexity, in which two parties first exchange classical messages and subsequently communicate using quantum messages. We study the trade-off between the classical and quantum communication for composed functions of the form $f\circ G^n$, where $f:\{0,1\}^n\to\{\pm1\}$ and $G$ is an inner product function of $Θ(\log n)$ bits. To prove the trade-off, we establish a novel lifting theorem for hybrid communication complexity. This theorem unifies two previously separate lifting paradigms: the query-to-communication lifting framework for classical communication complexity and the approximate-degree-to-generalized-discrepancy lifting methods for quantum communication complexity. Our hybrid lifting theorem therefore offers a new framework for proving lower bounds in hybrid classical-quantum communication models. As a corollary, we show that any hybrid protocol communicating $c$ classical bits followed by $q$ qubits to compute $f\circ G^n$ must satisfy $c+q^2=Ω\big(\max\{\mathrm{deg}(f),\mathrm{bs}(f)\}\cdot\log n\big)$, where $\mathrm{deg}(f)$ is the degree of $f$ and $\mathrm{bs}(f)$ is the block sensitivity of $f$. For read-once formula $f$, this yields an almost tight trade-off: either they have to exchange $Θ\big(n\cdot\log n\big)$ classical bits or $\widetildeΘ\big(\sqrt n\cdot\log n\big)$ qubits, showing that classical pre-processing cannot significantly reduce the quantum communication required. To the best of our knowledge, this is the first non-trivial trade-off between classical and quantum communication in hybrid two-way communication complexity.

MLMay 9
Core-Halo Decomposition: Decentralizing Large-Scale Fixed-Point Problems

Haixiang, Yang Xu, Jiefu Zhang et al.

We study solving large-scale fixed-point equation \(x^\star=\bar F(x^\star)\) with decomposition. Standard strict decomposition assigns each agent a disjoint block and evaluates updates using only owned coordinates. For most operators, however, a block update may depend on variables outside the block. Truncating these dependencies by strict decomposition changes the mean operator and creates structural bias that cannot be removed by more samples, smaller stepsizes, or additional consensus. We therefore propose Core-Halo decomposition, which separates write ownership from read-only evaluation context: each agent updates its own core and reads from an overlapping halo. By aligning the Core-Halo decomposition with the block-dependence structure of $\bar F$, the original fixed-point problem can be implemented faithfully in a decentralized multi-agent system. We further characterize the fundamental obstruction faced by strict decomposition through a Bellman closure condition and a blockwise bias lower bound, showing that local-only updates can alter the original fixed-point operator. Finally, we conduct extensive experiments across a range of application settings, and demonstrate that Core-Halo achieves near-centralized performance while retaining the parallelism benefits of decentralization.