CV · Aug 27, 2024
Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation
Xiaojuan Wang, Boyang Zhou, Brian Curless et al. · uw
We present a method for generating video sequences with coherent motion between a pair of input keyframes. We adapt a pretrained large-scale image-to-video diffusion model (originally trained to generate videos moving forward in time from a single input image) for keyframe interpolation, i.e., to produce a video in between two input frames. We accomplish this adaptation through a lightweight fine-tuning technique that produces a version of the model that instead predicts videos moving backwards in time from a single input image. This model (along with the original forward-moving model) is subsequently used in a dual-directional diffusion sampling process that combines the overlapping model estimates starting from each of the two keyframes. Our experiments show that our method outperforms both existing diffusion-based methods and traditional frame interpolation techniques.
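The fusion step in the dual-directional sampling described above can be pictured with a toy sketch. The abstract only says the overlapping estimates are combined; the equal-weight average, the function name `fuse_bidirectional`, and the list-of-floats frame representation below are all assumptions made for illustration, not the authors' implementation.

```python
def fuse_bidirectional(forward_pred, backward_pred):
    """Fuse two per-frame estimates of the same clip.

    forward_pred:  frames ordered first keyframe -> last keyframe.
    backward_pred: frames ordered last keyframe -> first keyframe
                   (what a backward-in-time model would emit).
    The equal-weight average is an assumption, not taken from the paper.
    """
    if len(forward_pred) != len(backward_pred):
        raise ValueError("both estimates must cover the same number of frames")
    # Flip the backward estimate so both sequences share the same time axis.
    aligned_backward = backward_pred[::-1]
    return [
        [0.5 * (f + b) for f, b in zip(f_frame, b_frame)]
        for f_frame, b_frame in zip(forward_pred, aligned_backward)
    ]


# Toy "frames" with a single value each.
forward = [[0.0], [1.0], [2.0]]
backward = [[2.0], [3.0], [4.0]]
fused = fuse_bidirectional(forward, backward)
```

In an actual sampler this fusion would presumably run at every denoising step, so that both keyframes constrain the whole trajectory.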
41.0 · HC · Jun 3
Addressing Negative Commons Governance with Positive Commons Principles
Boyang Zhou, Oleg Ianchenko
Computing is accompanied by both positive and negative commons throughout its lifecycle of creation, execution, and disposal. We examine two governance systems situated within this lifecycle -- global e-waste trade and the Linux kernel community -- to evaluate whether Elinor Ostrom's eight design principles for common-pool resource (CPR) governance extend to the management of negative common-pool resources (NCPRs). Unlike traditional CPRs, where communities work to preserve a finite resource (e.g., clean water), NCPR governance seeks to collectively reduce a negative shared stock. In our two cases, e-waste governance aims to reduce the volume of mismanaged waste and illicit trade, while the Linux community aims to reduce the number of error-prone or malicious contributions that reach the main branch and, in turn, extend the life of existing hardware. Through qualitative analysis of primary sources from each domain, we find that the same eight Ostrom principles that aid positive commons governance tend to appear in successful negative commons governance systems. We argue that future NCPR governance design should prioritize Ostrom's principles, particularly clearly defined boundaries and well-functioning nested structures.
CV · Oct 23, 2023
F$^2$AT: Feature-Focusing Adversarial Training via Disentanglement of Natural and Perturbed Patterns
Yaguan Qian, Chenyu Zhao, Zhaoquan Gu et al.
Deep neural networks (DNNs) are vulnerable to adversarial examples crafted by well-designed perturbations. This could lead to disastrous results in critical applications such as self-driving cars, surveillance security, and medical diagnosis. At present, adversarial training is one of the most effective defenses against adversarial examples. However, traditional adversarial training makes it difficult to achieve a good trade-off between clean accuracy and robustness, since spurious features are still learned by DNNs. The intrinsic reason is that traditional adversarial training makes it difficult to fully learn core features from adversarial examples when adversarial noise and clean examples cannot be disentangled. In this paper, we disentangle the adversarial examples into natural and perturbed patterns by bit-plane slicing. We assume that the higher bit-planes represent natural patterns and the lower bit-planes represent perturbed patterns. We propose Feature-Focusing Adversarial Training (F$^2$AT), which differs from previous work in that it forces the model to focus on the core features from natural patterns and to reduce the impact of spurious features from perturbed patterns. The experimental results demonstrate that F$^2$AT outperforms state-of-the-art methods in clean accuracy and adversarial robustness.
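A minimal sketch of bit-plane slicing on 8-bit pixel values may make the disentanglement step concrete. The split point (`high_bits=4`) and the helper name `split_bit_planes` are assumptions for illustration; the abstract does not say how many planes are assigned to each pattern.

```python
def split_bit_planes(pixels, high_bits=4):
    """Split 8-bit pixel values into a high-bit-plane part and a low-bit-plane part.

    pixels:    2D list of ints in [0, 255].
    high_bits: number of most significant bit-planes kept in the first part
               (hypothetical default; not specified in the abstract).
    """
    if not 0 <= high_bits <= 8:
        raise ValueError("high_bits must be between 0 and 8")
    low_mask = (1 << (8 - high_bits)) - 1  # e.g. 0b00001111 for high_bits=4
    high_mask = 0xFF ^ low_mask            # e.g. 0b11110000 for high_bits=4
    natural = [[p & high_mask for p in row] for row in pixels]
    perturbed = [[p & low_mask for p in row] for row in pixels]
    return natural, perturbed


image = [[200, 13], [255, 96]]
natural, perturbed = split_bit_planes(image, high_bits=4)
# The two parts add back up to the original image, pixel by pixel.
```

Because the masks are complementary, the decomposition is lossless: the two parts sum back to the input image.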
31.7 · HC · Apr 5 · Code
HeartbeatCam: Self-Triggered Photo Elicitation of Stress Events Using Wearable Sensing
Boyang Zhou, Zara Dana
People often recognize what triggered their stress only after the moment has passed. In therapy, this can become a recurring problem: clients are asked to remember what happened between sessions, but the details that matter (where they were, what they saw and heard, what was happening around them) are easy to lose. We introduce HeartbeatCam, a wearable sensing system that gathers contextual information during moments of elevated stress. It uses a consumer smartwatch stress signal to trigger capture from an open-source AR glasses camera, recording a sparse image-audio clip that can later be reviewed and annotated. The system adopts an actionable sensing approach to mental healthcare, using physiological signals along with contextual capture to support collaborative interpretation of stress-triggering moments with mental health professionals.
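The stress-triggered capture can be sketched as a threshold rule with a cooldown, so that one sustained stress episode yields a sparse set of clips. The threshold, the cooldown, the `(timestamp, stress)` sample format, and the function name `capture_times` are all hypothetical; the abstract does not describe the actual trigger logic.

```python
def capture_times(samples, threshold=75, cooldown_s=300):
    """Return the timestamps at which a capture would be triggered.

    samples:    iterable of (timestamp_seconds, stress_score) pairs in time order.
    threshold:  hypothetical stress score at or above which capture fires.
    cooldown_s: hypothetical minimum gap between captures, in seconds.
    """
    triggers = []
    last_trigger = None
    for timestamp, stress in samples:
        cooled_down = last_trigger is None or timestamp - last_trigger >= cooldown_s
        if stress >= threshold and cooled_down:
            triggers.append(timestamp)
            last_trigger = timestamp
    return triggers


readings = [(0, 40), (60, 80), (120, 90), (400, 85), (500, 30)]
triggered = capture_times(readings)
```

Here the reading at 120 s is skipped because it falls inside the cooldown window opened by the capture at 60 s.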
RO · Jan 27
E2HiL: Entropy-Guided Sample Selection for Efficient Real-World Human-in-the-Loop Reinforcement Learning
Haoyuan Deng, Yuanjiang Xue, Haoyang Du et al.
Human-in-the-loop guidance has emerged as an effective approach for enabling faster convergence in online reinforcement learning (RL) of complex real-world manipulation tasks. However, existing human-in-the-loop RL (HiL-RL) frameworks often suffer from low sample efficiency, requiring substantial human interventions to achieve convergence and thereby leading to high labor costs. To address this, we propose a sample-efficient real-world human-in-the-loop RL framework named E2HiL, which requires fewer human interventions by actively selecting informative samples. Specifically, stable reduction of policy entropy enables an improved trade-off between exploration and exploitation with higher sample efficiency. We first build influence functions of different samples on the policy entropy, which are efficiently estimated by the covariance of action probabilities and soft advantages of policies. Then we select samples with moderate values of influence functions, where shortcut samples that induce sharp entropy drops and noisy samples with negligible effect are pruned. Extensive experiments on four real-world manipulation tasks demonstrate that E2HiL achieves a 42.1% higher success rate while requiring 10.1% fewer human interventions compared to the state-of-the-art HiL-RL method, validating its effectiveness. The project page providing code, videos, and mathematical formulations can be found at https://e2hil.github.io/.
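The selection rule can be illustrated with a small sketch. It assumes each sample's influence is its term in the batch covariance between action log-probabilities and advantages, and that "moderate" means dropping a fixed fraction of the smallest and largest magnitudes. Both choices, along with the names `covariance_influence` and `select_moderate`, are assumptions; the exact formulation is on the project page rather than in the abstract.

```python
from statistics import fmean


def covariance_influence(log_probs, advantages):
    """Per-sample term of the covariance between log-probabilities and advantages."""
    mean_lp = fmean(log_probs)
    mean_adv = fmean(advantages)
    return [(lp - mean_lp) * (adv - mean_adv)
            for lp, adv in zip(log_probs, advantages)]


def select_moderate(influences, low_frac=0.2, high_frac=0.2):
    """Keep indices whose influence magnitude is neither negligible nor extreme.

    low_frac / high_frac are hypothetical pruning fractions for the
    "noisy" (near-zero) and "shortcut" (very large) ends respectively.
    """
    n = len(influences)
    order = sorted(range(n), key=lambda i: abs(influences[i]))
    n_low = int(n * low_frac)
    n_high = int(n * high_frac)
    return sorted(order[n_low:n - n_high])


log_probs = [-0.1, -0.5, -1.0, -2.0, -3.0]
advantages = [2.0, 0.5, 0.0, -0.5, -2.0]
influences = covariance_influence(log_probs, advantages)
kept = select_moderate(influences)
```

With these toy numbers, sample 2 (near-zero influence) and sample 4 (largest influence) are pruned, and samples 0, 1, and 3 are kept for the update.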