LG · Nov 26, 2023
Unlearning via Sparse Representations
Vedant Shah, Frederik Träuble, Ashish Malik et al. · mila
Machine unlearning, which involves erasing knowledge about a forget set from a trained model, can be costly or even infeasible with existing techniques. We propose a nearly compute-free, zero-shot unlearning technique based on a discrete representational bottleneck. We show that the proposed technique efficiently unlearns the forget set while incurring negligible damage to the model's performance on the rest of the dataset. We evaluate the proposed technique on the problem of class unlearning using three datasets: CIFAR-10, CIFAR-100, and LACUNA-100. We compare it to SCRUB, a state-of-the-art approach that uses knowledge distillation for unlearning. Across all three datasets, the proposed technique performs as well as, if not better than, SCRUB while incurring almost no computational cost.
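The abstract does not describe the bottleneck's internals. The following is a minimal sketch under stated assumptions, not the paper's code. It assumes a key-value codebook in which each encoded input selects its `k` nearest keys, and the prediction averages the class logits stored in those keys' values. Under that assumption, zero-shot unlearning amounts to deactivating every key that a forget-set input selects, with no gradient updates. The class `KVBottleneck`, its `forget` method, and the toy codebook are all hypothetical.

```python
import math


class KVBottleneck:
    """Toy discrete key-value bottleneck (hypothetical, for illustration only)."""

    def __init__(self, keys, values, k=1):
        self.keys = keys                    # codebook of key vectors
        self.values = values                # one list of class logits per key
        self.k = k                          # number of keys each input selects
        self.active = [True] * len(keys)    # False = key has been unlearned

    def _nearest(self, x):
        # Indices of the k active keys closest to x (Euclidean distance).
        candidates = [i for i, on in enumerate(self.active) if on]
        candidates.sort(key=lambda i: math.dist(x, self.keys[i]))
        return candidates[: self.k]

    def predict(self, x):
        idx = self._nearest(x)
        n_classes = len(self.values[0])
        # Average the selected values to obtain class logits, then take the argmax.
        logits = [sum(self.values[i][c] for i in idx) / len(idx) for c in range(n_classes)]
        return max(range(n_classes), key=logits.__getitem__)

    def forget(self, forget_inputs):
        # Collect every key the forget set selects *before* masking, so that
        # masking one key cannot redirect a later forget input to a retained key.
        to_mask = {i for x in forget_inputs for i in self._nearest(x)}
        for i in to_mask:
            self.active[i] = False


# Two well-separated classes, two keys each.
model = KVBottleneck(
    keys=[[0, 0], [0, 1], [5, 5], [5, 6]],
    values=[[1, 0], [1, 0], [0, 1], [0, 1]],
)
print(model.predict([0, 0.2]), model.predict([5, 5.2]))  # 0 1
model.forget([[0, 0], [0, 1]])                           # unlearn class 0
print(model.predict([0, 0.2]), model.predict([5, 5.2]))  # 1 1
```

Because `forget` only flips flags, its cost in this toy is one nearest-key lookup per forget-set input. Predictions for the retained class are unchanged because their keys stay active.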
AI · Mar 24
Grounding Vision and Language to 3D Masks for Long-Horizon Box Rearrangement
Ashish Malik, Caleb Lowe, Aayam Shrestha et al.
We study long-horizon planning in 3D environments from under-specified natural-language goals using only visual observations, focusing on multi-step 3D box rearrangement tasks. Existing approaches typically rely either on symbolic planners with brittle relational grounding of states and goals, or on direct action-sequence generation from 2D vision-language models (VLMs). Both approaches struggle to reason over many objects, rich 3D geometry, and implicit semantic constraints. Recent advances in 3D VLMs demonstrate strong grounding of natural-language referents to 3D segmentation masks, suggesting the potential for more general planning capabilities. We extend existing 3D grounding models and propose the Reactive Action Mask Planner (RAMP-3D), which formulates long-horizon planning as sequential reactive prediction of paired 3D masks: a "which-object" mask indicating what to pick and a "which-target-region" mask specifying where to place it. The resulting system processes RGB-D observations and natural-language task specifications to reactively generate multi-step pick-and-place actions for 3D box rearrangement. We conduct experiments across 11 task variants in warehouse-style environments with 1 to 30 boxes and diverse natural-language constraints. RAMP-3D achieves a 79.5% success rate on long-horizon rearrangement tasks and significantly outperforms 2D VLM-based baselines, establishing mask-based reactive policies as a promising alternative to symbolic pipelines for long-horizon planning.
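The abstract frames planning as a reactive loop: observe, predict one "which-object" mask and one "which-target-region" mask, execute the pick-and-place, and repeat. The following is a minimal sketch of that control flow under several assumptions:

- The learned 3D grounding model is an opaque `predict_masks` callable.
- That callable returns `None` once it judges the goal satisfied.
- Masks are modeled as sets of segment ids.

The toy warehouse, `toy_predictor`, and `toy_execute` are hypothetical stand-ins, not RAMP-3D components.

```python
from typing import Callable, Dict, Optional, Set, Tuple

Mask = Set[str]                   # toy stand-in: ids of the segments a 3D mask covers
MaskPair = Tuple[Mask, Mask]      # ("which-object" mask, "which-target-region" mask)
Scene = Dict[str, str]            # toy observation: box id -> region it currently sits in


def reactive_rearrange(
    observe: Callable[[], Scene],
    predict_masks: Callable[[Scene], Optional[MaskPair]],
    execute: Callable[[Mask, Mask], None],
    max_steps: int = 30,
) -> int:
    """Re-observe and predict a fresh mask pair at every step; return the steps taken."""
    for step in range(max_steps):
        pair = predict_masks(observe())
        if pair is None:              # predictor reports that the goal is satisfied
            return step
        pick_mask, place_mask = pair
        execute(pick_mask, place_mask)
    return max_steps


# Toy warehouse; goal: "put every box on the pallet".
world: Scene = {"box_a": "floor", "box_b": "floor", "box_c": "pallet"}


def toy_predictor(scene: Scene) -> Optional[MaskPair]:
    # Hypothetical stand-in for the learned grounding model.
    remaining = sorted(box for box, region in scene.items() if region != "pallet")
    return ({remaining[0]}, {"pallet"}) if remaining else None


def toy_execute(pick: Mask, place: Mask) -> None:
    (region,) = place
    for box in pick:
        world[box] = region


steps = reactive_rearrange(lambda: dict(world), toy_predictor, toy_execute)
print(steps, world)  # 2 {'box_a': 'pallet', 'box_b': 'pallet', 'box_c': 'pallet'}
```

In this sketch the scene is re-observed before every prediction. A perturbed or failed placement is therefore handled by the next step rather than by replaying a fixed, precomputed action sequence.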
RO · Aug 16, 2025
No More Blind Spots: Learning Vision-Based Omnidirectional Bipedal Locomotion for Challenging Terrain
Mohitvishnu S. Gadde, Pranay Dugar, Ashish Malik et al.
Effective bipedal locomotion in dynamic environments, such as cluttered indoor spaces or uneven terrain, requires agile and adaptive movement in all directions. This necessitates omnidirectional terrain sensing and a controller capable of processing such input. We present a learning framework for vision-based omnidirectional bipedal locomotion, enabling seamless movement using depth images. A key challenge is the high computational cost of rendering omnidirectional depth images in simulation, making traditional sim-to-real reinforcement learning (RL) impractical. Our method combines a robust blind controller with a teacher policy that supervises a vision-based student policy, trained on noise-augmented terrain data to avoid rendering costs during RL and ensure robustness. We also introduce a data augmentation technique for supervised student training, accelerating training by up to 10 times compared to conventional methods. Our framework is validated through simulation and real-world tests, demonstrating effective omnidirectional locomotion with minimal reliance on expensive rendering. This is, to the best of our knowledge, the first demonstration of vision-based omnidirectional bipedal locomotion, showcasing its adaptability to diverse terrains.
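The teacher-student supervision pattern can be illustrated with a toy regression:

- A hypothetical `teacher_action` acts on clean, privileged terrain heights.
- The student sees only a noise-corrupted view of the same terrain, as a stand-in for depth-based perception. The Gaussian noise model is an assumption, not the paper's augmentation.
- The student is fit by supervised least squares to the teacher's actions.

The sketch shows only how the teacher's labels supervise the student. It does not reproduce the paper's networks, its RL setup, or its data-augmentation technique.

```python
import random


def teacher_action(heights):
    # Hypothetical privileged teacher: a fixed linear response to clean terrain heights.
    return 0.5 * heights[0] + 0.3 * heights[1] - 0.2 * heights[2]


def noisy_view(heights, rng, sigma=0.05):
    # Assumed perception noise on the student's input.
    return [h + rng.gauss(0.0, sigma) for h in heights]


def train_student(n_steps=5000, lr=0.05, seed=0):
    """Fit a linear student to teacher labels with stochastic gradient descent."""
    rng = random.Random(seed)
    w, b = [0.0, 0.0, 0.0], 0.0
    for _ in range(n_steps):
        terrain = [rng.uniform(-1.0, 1.0) for _ in range(3)]  # sampled terrain patch
        target = teacher_action(terrain)                        # teacher label (clean input)
        obs = noisy_view(terrain, rng)                          # student input (noisy)
        err = sum(wi * oi for wi, oi in zip(w, obs)) + b - target
        # Gradient step on the squared error 0.5 * err**2.
        w = [wi - lr * err * oi for wi, oi in zip(w, obs)]
        b -= lr * err
    return w, b


w, b = train_student()
print([round(wi, 2) for wi in w], round(b, 2))  # close to the teacher's [0.5, 0.3, -0.2] and 0.0
```

The student never queries the teacher's clean input at deployment. It only learns to reproduce the teacher's actions from its own noisier observations.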
RO · Dec 11, 2019
Zero-shot generalization using cascaded system-representations
Ashish Malik
Deep reinforcement learning (DRL) has been applied to solve a variety of control problems in isolation. However, the learned latent representations cannot be optimally reused for other analogous tasks or control systems without additional training or tuning. To address this, we propose a novel framework for learning a single control policy for a whole class of analogous control systems. The framework, abbreviated CASNET, leverages the similarities in the designs of analogous control systems to learn general-purpose abstract system representations. It uses a cascade of encoders based on recurrent neural networks to create these representations, which are then fed to a conventional policy network as input. A similar cascade of decoders decodes the output of the policy network to generate system-specific output. We illustrate the effectiveness of this framework on arguably the most significant use case of DRL: robotics. In this paper, we use CASNET to learn generalizable control policies for two separate classes of robots, planar manipulators and crawling robots, using 15+ and 55+ morphologically analogous simulated robots, respectively. These robot models encompass the most common design variations used in the real world. Our empirical results using state-of-the-art on-policy and off-policy learning algorithms show that, on average, the CASNET agent achieves zero-shot optimal performance (performance equivalent to expert agents trained for individual robot models) on unseen robot models. These results indicate that the performance of the learned policy is bounded by the learning algorithm rather than by the framework itself. The proposed framework serves as a major step towards universal controllers.
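The abstract describes a three-part data flow:

1. A cascade of recurrent encoders compresses a variable-size robot description into a fixed-size system representation.
2. A shared policy network acts on that representation.
3. Recurrent decoders emit system-specific output.

The following is a hypothetical, untrained sketch of that flow. Several details are assumptions, not CASNET's published architecture:

- The weights are random.
- The cascade has two levels: link parameters first, then links.
- The hidden size is `H`.
- The state is already an `H`-dimensional vector.
- The decoder emits one command per link.

The sketch only demonstrates that one policy can serve robots with different numbers of links.

```python
import math
import random


def elman_states(xs, w_in, w_h):
    """Tiny Elman RNN: returns the hidden state after each input vector."""
    h, states = [0.0] * len(w_h), []
    for x in xs:
        h = [math.tanh(sum(a * b for a, b in zip(w_in[j], x))
                       + sum(a * b for a, b in zip(w_h[j], h)))
             for j in range(len(h))]
        states.append(h)
    return states


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


H = 4  # size of every hidden/latent vector (assumed)
rng = random.Random(0)
LINK_ENC = (rand_matrix(H, 1, rng), rand_matrix(H, H, rng))   # level 1: link params -> link code
ROBOT_ENC = (rand_matrix(H, H, rng), rand_matrix(H, H, rng))  # level 2: link codes -> robot code
POLICY = rand_matrix(H, 2 * H, rng)                           # shared policy: [robot code, state] -> latent
DEC = (rand_matrix(H, 2 * H, rng), rand_matrix(H, H, rng))    # recurrent decoder over links
OUT = [rng.uniform(-0.5, 0.5) for _ in range(H)]              # decoder hidden -> per-link command


def act(robot, state):
    """robot: non-empty list of links, each a non-empty list of design parameters.
    state: H-dimensional observation, assumed to already live in the shared space."""
    link_codes = [elman_states([[p] for p in link], *LINK_ENC)[-1] for link in robot]
    robot_code = elman_states(link_codes, *ROBOT_ENC)[-1]
    latent = [math.tanh(sum(w * v for w, v in zip(row, robot_code + state))) for row in POLICY]
    # Decoder: one recurrent step per link, conditioned on the shared latent and that link's code.
    dec_states = elman_states([latent + code for code in link_codes], *DEC)
    return [math.tanh(sum(w * h for w, h in zip(OUT, s))) for s in dec_states]


two_link = [[1.0, 0.2], [0.8, 0.1]]
three_link = [[1.0, 0.2], [0.8, 0.1], [0.5, 0.05, 0.3]]
state = [0.1, -0.2, 0.0, 0.3]
print(len(act(two_link, state)), len(act(three_link, state)))  # 2 3
```

The same `POLICY` weights are used for both robots. Only the encoder and decoder cascades adapt to the number of links, which is the property the framework relies on for zero-shot transfer.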