ET Feb 10, 2023
Building Intelligence in the Mechanical Domain -- Harvesting the Reservoir Computing Power in Origami to Achieve Information Perception Tasks
Jun Wang, Suyi Li
In this paper, we experimentally examine the cognitive capability of a simple, paper-based Miura-ori -- using the physical reservoir computing framework -- to achieve different information perception tasks. The body dynamics of the Miura-ori (i.e., its vertex displacements), excited by a simple harmonic base excitation, can be exploited as the reservoir computing resource. By recording these dynamics with a high-resolution camera and an image processing program and then using linear regression for training, we show that the origami reservoir has sufficient computing capacity to estimate the weight and position of a payload. It can also recognize the input frequency and magnitude patterns. Furthermore, multitasking is achievable by simultaneously applying two targeted functions to the same reservoir state matrix. Therefore, we demonstrate that Miura-ori can assess the dynamic interactions between its body and ambient environment to extract meaningful information -- an intelligent behavior in the mechanical domain. Given that Miura-ori has been widely used to construct deployable structures, lightweight materials, and compliant robots, enabling such information perception tasks can add a new dimension to the functionality of such a versatile structure.
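The linear-regression readout and the multitasking idea described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the state matrix is synthetic (a hypothetical stand-in for camera-tracked vertex displacements), and the two targets are arbitrary example functions rather than the payload or frequency tasks from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for recorded vertex displacements:
# T time steps, N tracked vertices (real data would come from video tracking).
T, N = 500, 12
t = np.linspace(0.0, 10.0, T)
phases = rng.uniform(0.0, 2.0 * np.pi, size=N)
gains = rng.uniform(0.5, 1.5, size=N)

# Synthetic, mildly nonlinear responses to a 1 Hz harmonic base excitation.
states = np.tanh(gains * np.sin(2.0 * np.pi * 1.0 * t[:, None] + phases))

# Reservoir state matrix with an appended bias column.
X = np.hstack([states, np.ones((T, 1))])

# Two illustrative targets trained against the same state matrix (multitasking).
target_a = np.sin(2.0 * np.pi * 1.0 * t) ** 2
target_b = np.cos(2.0 * np.pi * 1.0 * t)
Y = np.column_stack([target_a, target_b])

# Linear readout by least squares; only these weights are trained,
# the reservoir itself stays fixed.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)
pred = X @ W
mse = np.mean((pred - Y) ** 2, axis=0)
print("readout weights shape:", W.shape)
print("per-task MSE:", mse)
```

Each column of `W` is an independent readout, so adding a task means adding a target column rather than changing the reservoir.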
RO Mar 25
Interdisciplinary Workshop on Mechanical Intelligence: Summary Report
Victoria A. Webster-Wood, Nicholas Gravish, Amir Alavi et al.
This report provides a summary of the outcomes of the Interdisciplinary Workshop on Mechanical Intelligence held in 2024. Mechanical Intelligence (MI) represents the phenomenon that novel structural features of material/biological/robotic systems can encode intelligence through responsiveness, adaptivity, memory, and learning in the mechanical structure itself. This is in contrast to computational intelligence, wherein the intelligence functions occur through electrical signaling and computer code. The two-day workshop was held at NSF headquarters on May 30-31 and included 38 invited academic researcher participants, and 8 program officers from the NSF. The workshop was structured around active small and large group discussions in groups of 4-5 and 9-10 with the goal of addressing topical questions on MI. Working groups entered notes into shared presentation slides for each discussion session and presented their outcomes in a final presentation on the last day. Here we summarize the overall outcomes of the workshop.
DC Jul 2, 2024
SwiftDiffusion: Efficient Diffusion Model Serving with Add-on Modules
Suyi Li, Lingyun Yang, Xiaoxiao Jiang et al.
Text-to-image (T2I) generation using diffusion models has become a blockbuster service in today's AI cloud. A production T2I service typically involves a serving workflow where a base diffusion model is augmented with various "add-on" modules, notably ControlNet and LoRA, to enhance image generation control. Compared to serving the base model alone, these add-on modules introduce significant loading and computational overhead, resulting in increased latency. In this paper, we present SwiftDiffusion, a system that efficiently serves a T2I workflow through a holistic approach. SwiftDiffusion decouples ControlNet from the base model and deploys it as a separate, independently scaled service on dedicated GPUs, enabling ControlNet caching, parallelization, and sharing. To mitigate the high loading overhead of LoRA serving, SwiftDiffusion employs a bounded asynchronous LoRA loading (BAL) technique, allowing LoRA loading to overlap with the initial base model execution by up to k steps without compromising image quality. Furthermore, SwiftDiffusion optimizes base model execution with a novel latent parallelism technique. Collectively, these designs enable SwiftDiffusion to outperform the state-of-the-art T2I serving systems, achieving up to 7.8x latency reduction and 1.6x throughput improvement in serving SDXL models on H800 GPUs, without sacrificing image quality.
RO Apr 8
OpenPRC: A Unified Open-Source Framework for Physics-to-Task Evaluation in Physical Reservoir Computing
Yogesh Phalak, Wen Sin Lor, Apoorva Khairnar et al.
Physical Reservoir Computing (PRC) leverages the intrinsic nonlinear dynamics of physical substrates, mechanical, optical, spintronic, and beyond, as fixed computational reservoirs, offering a compelling paradigm for energy-efficient and embodied machine learning. However, the practical workflow for developing and evaluating PRC systems remains fragmented: existing tools typically address only isolated parts of the pipeline, such as substrate-specific simulation, digital reservoir benchmarking, or readout training. What is missing is a unified framework that can represent both high-fidelity simulated trajectories and real experimental measurements through the same data interface, enabling reproducible evaluation, analysis, and physics-aware optimization across substrates and data sources. We present OpenPRC, an open-source Python framework that fills this gap through a schema-driven physics-to-task pipeline built around five modules: a GPU-accelerated hybrid RK4-PBD physics engine (demlat), a video-based experimental ingestion layer (openprc.vision), a modular learning layer (reservoir), information-theoretic analysis and benchmarking tools (analysis), and physics-aware optimization (optimize). A universal HDF5 schema enforces reproducibility and interoperability, allowing GPU-simulated and experimentally acquired trajectories to enter the same downstream workflow without modification. Demonstrated capabilities include simulations of Origami tessellations, video-based trajectory extraction from a physical reservoir, and a common interface for standardized PRC benchmarking, correlation diagnostics, and capacity analysis. The longer-term vision is to serve as a standardizing layer for the PRC community, compatible with external physics engines including PyBullet, PyElastica, and MERLIN.
CV Jun 21, 2024
Contextual Interaction via Primitive-based Adversarial Training For Compositional Zero-shot Learning
Suyi Li, Chenyi Jiang, Shidong Wang et al.
Compositional Zero-shot Learning (CZSL) aims to identify novel compositions via known attribute-object pairs. The primary challenge in CZSL tasks lies in the significant discrepancies introduced by the complex interaction between the visual primitives of attribute and object, consequently decreasing the classification performance towards novel compositions. Previous remarkable works primarily addressed this issue by focusing on disentangling strategy or utilizing object-based conditional probabilities to constrain the selection space of attributes. Unfortunately, few studies have explored the problem from the perspective of modeling the mechanism of visual primitive interactions. Inspired by the success of vanilla adversarial learning in Cross-Domain Few-Shot Learning, we take a step further and devise a model-agnostic and Primitive-Based Adversarial training (PBadv) method to deal with this problem. Besides, the latest studies highlight the weakness of the perception of hard compositions even under data-balanced conditions. To this end, we propose a novel over-sampling strategy with object-similarity guidance to augment target compositional training data. We performed detailed quantitative analysis and retrieval experiments on well-established datasets, such as UT-Zappos50K, MIT-States, and C-GQA, to validate the effectiveness of our proposed method, and the state-of-the-art (SOTA) performance demonstrates the superiority of our approach. The code is available at https://github.com/lisuyi/PBadv_czsl.
RO Apr 29
Stochastic Entanglement of Deterministic Origami Tentacles For Universal Robotic Gripping
Alec Boron, Bokun Zheng, Ziyang Zhou et al.
Origami-inspired robotic grippers have shown promising potential for object manipulation tasks due to their compact volume and mechanical flexibility. However, robust capture of objects with random shapes in dynamic working environments often comes at the cost of additional actuation channels and control complexity. Here, we introduce a tendon-driven origami tentacle gripper capable of universal object gripping by exploiting a synergy between local, deterministic deformation programming and global, stochastic entanglements. Each origami tentacle is made by cutting thin Mylar sheets; it features carefully placed holes for routing an actuation tendon, origami creases for controlling the deformation, and a tapered shape. By tailoring these design features, one can prescribe the shrinking, bending, and twisting deformation, eventually creating deterministic coiling with a simple tendon pull. Then, when multiple coiling tentacles are placed in proximity, stochastic entanglement emerges, allowing the tentacles to braid, knot, and grip objects with random shapes. We derived a simulation model by integrating origami mechanics with Cosserat rods to correlate origami design, tendon deformation, and their collective gripping performance. Then, we experimentally tested how these coiling and entangling origami tentacles can grasp objects under gravity and in water. A stow-and-release deployment mechanism was also tested to simulate in-orbit grasping. Overall, the entertaining origami tentacle gripper presents a new strategy for robust object grasping with simple design and actuation.
DC Apr 9
LegoDiffusion: Micro-Serving Text-to-Image Diffusion Workflows
Lingyun Yang, Suyi Li, Tianyu Feng et al.
Text-to-image generation executes a diffusion workflow comprising multiple models centered on a base diffusion model. Existing serving systems treat each workflow as an opaque monolith, provisioning, placing, and scaling all constituent models together, which obscures internal dataflow, prevents model sharing, and enforces coarse-grained resource management. In this paper, we make a case for micro-serving diffusion workflows with LegoDiffusion, a system that decomposes a workflow into loosely coupled model-execution nodes that can be independently managed and scheduled. By explicitly managing individual model inference, LegoDiffusion unlocks cluster-scale optimizations, including per-model scaling, model sharing, and adaptive model parallelism. Collectively, LegoDiffusion outperforms existing diffusion workflow serving systems, sustaining up to 3x higher request rates and tolerating up to 8x higher burst traffic.
DC May 27, 2025
InstGenIE: Generative Image Editing Made Efficient with Mask-aware Caching and Scheduling
Xiaoxiao Jiang, Suyi Li, Lingyun Yang et al.
Generative image editing using diffusion models has become a prevalent application in today's AI cloud services. In production environments, image editing typically involves a mask that specifies the regions of an image template to be edited. The use of masks provides direct control over the editing process and introduces sparsity in the model inference. In this paper, we present InstGenIE, a system that efficiently serves image editing requests. The key insight behind InstGenIE is that image editing only modifies the masked regions of image templates while preserving the original content in the unmasked areas. Driven by this insight, InstGenIE judiciously skips redundant computations associated with the unmasked areas by reusing cached intermediate activations from previous inferences. To mitigate the high cache loading overhead, InstGenIE employs a bubble-free pipeline scheme that overlaps computation with cache loading. Additionally, to reduce queuing latency in online serving while improving the GPU utilization, InstGenIE proposes a novel continuous batching strategy for diffusion model serving, allowing newly arrived requests to join the running batch in just one step of denoising computation, without waiting for the entire batch to complete. As heterogeneous masks induce imbalanced loads, InstGenIE also develops a load balancing strategy that takes into account the loads of both computation and cache loading. Collectively, InstGenIE outperforms state-of-the-art diffusion serving systems for image editing, achieving up to 3x higher throughput and reducing average request latency by up to 14.7x while ensuring image quality.
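The core idea of skipping computation for unmasked regions by reusing cached activations can be illustrated with a toy example. This is only a sketch under a strong assumption: the layer here is purely token-wise (each row is processed independently), which makes cache reuse exact. Real diffusion models contain attention layers that mix tokens, and how InstGenIE handles those is not described in the abstract; `tokenwise_layer` and `masked_forward` are hypothetical names.

```python
import numpy as np

def tokenwise_layer(x, w):
    """Toy per-token layer: each row (token) is processed independently."""
    return np.maximum(x @ w, 0.0)

def masked_forward(x, mask, cached, w):
    """Recompute only the masked tokens; reuse cached outputs elsewhere."""
    out = cached.copy()
    out[mask] = tokenwise_layer(x[mask], w)
    return out

rng = np.random.default_rng(0)
tokens, dim = 16, 8
w = rng.normal(size=(dim, dim))

# Activations cached from a previous inference on the unedited template.
template = rng.normal(size=(tokens, dim))
cached = tokenwise_layer(template, w)

# A new request edits only the masked region of the template.
mask = np.zeros(tokens, dtype=bool)
mask[4:7] = True
edited = template.copy()
edited[mask] = rng.normal(size=(int(mask.sum()), dim))

fast = masked_forward(edited, mask, cached, w)  # computes 3 of 16 rows
full = tokenwise_layer(edited, w)               # computes all 16 rows
print("outputs match:", np.allclose(fast, full))
```

The saving grows with the fraction of unmasked tokens, which is why loading the cache quickly (the pipeline scheme mentioned above) matters as much as skipping the compute.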
RO Jan 20, 2021
Physical Reservoir Computing with Origami and its Application to Robotic Crawling
Priyanka Bhovad, Suyi Li
A new paradigm called physical reservoir computing has recently emerged, where the nonlinear dynamics of high-dimensional and fixed physical systems are harnessed as a computational resource to achieve complex tasks. Via extensive simulations based on a dynamic truss-frame model, this study shows that an origami structure can perform as a dynamic reservoir with sufficient computing power to emulate high-order nonlinear systems, generate stable limit cycles, and modulate outputs according to dynamic inputs. This study also uncovers the linkages between the origami reservoir's physical designs and its computing power, offering a guideline to optimize the computing performance. Comprehensive parametric studies show that selecting optimal feedback crease distribution and fine-tuning the underlying origami folding designs are the most effective approach to improve computing performance. Furthermore, this study shows how origami's physical reservoir computing power can apply to soft robotic control problems by a case study of earthworm-like peristaltic crawling without traditional controllers. These results can pave the way for origami-based robots with embodied mechanical intelligence.
RO Oct 26, 2020
Exploiting the Nonlinear Stiffness of TMP Origami Folding to Enhance Robotic Jumping Performance
Sahand Sadeghi, Samuel Allison, Blake Betsill et al.
Via numerical simulation and experimental assessment, this study examines the use of origami folding to develop robotic jumping mechanisms with tailored nonlinear stiffness to improve dynamic performance. Specifically, we use Tachi-Miura Polyhedron (TMP) bellow origami -- which exhibits a nonlinear "strain-softening" force-displacement curve -- as a jumping robotic skeleton with embedded energy storage. TMP's nonlinear stiffness allows it to store more energy than a linear spring and offers improved jumping height and airtime. Moreover, the nonlinearity can be tailored by directly changing the underlying TMP crease geometry. A critical challenge is to minimize the TMP's hysteresis and energy loss during its compression stage right before jumping. So we used the plastically annealed lamina emergent origami (PALEO) concept to modify the TMP creases. PALEO increases the folding limit before plastic deformation occurs, thus improving the overall strain energy retention. Jumping experiments confirmed that a nonlinear TMP mechanism achieved roughly 9% improvement in air time and a 13% improvement in jumping height compared to a "control" TMP sample with a relatively linear stiffness. This study's results validate the advantages of using origami in robotic jumping mechanisms and demonstrate the benefits of utilizing nonlinear spring elements for improving jumping performance. Therefore, they could foster a new family of energetically efficient jumping mechanisms with optimized performance in the future.
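The claim that a strain-softening spring stores more energy than a linear one can be checked numerically by comparing areas under two force-displacement curves that reach the same peak force at the same displacement. The square-root curve below is a hypothetical stand-in for a softening response, not the measured TMP curve, and the normalized units are assumptions.

```python
import math

def stored_energy(force, d_max, n=10000):
    """Trapezoidal integral of force(x) from 0 to d_max (strain energy)."""
    h = d_max / n
    total = 0.5 * (force(0.0) + force(d_max))
    for i in range(1, n):
        total += force(i * h)
    return total * h

F_MAX = 1.0  # peak force (normalized, assumed)
D_MAX = 1.0  # peak compression (normalized, assumed)

def linear_spring(x):
    """Constant stiffness: force grows proportionally with displacement."""
    return F_MAX * x / D_MAX

def softening_spring(x):
    """Hypothetical strain-softening curve: stiffness decreases as x grows."""
    return F_MAX * math.sqrt(x / D_MAX)

e_linear = stored_energy(linear_spring, D_MAX)
e_soft = stored_energy(softening_spring, D_MAX)
print("linear spring energy:   ", e_linear)
print("softening spring energy:", e_soft)
print("ratio:", e_soft / e_linear)
```

Any concave curve through the same endpoints lies above the straight line, so it encloses more area; the size of the gain depends on the actual crease geometry.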
APP-PH Aug 17, 2020
Harnessing The Multi-Stability Of Kresling Origami For Reconfigurable Articulation In Soft Robotic Arms
Joshua Kaufmann, Suyi Li
This study examines a biology-inspired approach of using reconfigurable articulation to reduce the control requirement for soft robotic arms. We construct a robotic arm by assembling Kresling origami modules that exhibit predictable bistability. Via switching between their two stable states, these origami modules can behave either like a flexible joint with low bending stiffness or like a stiff link with high stiffness, without requiring any continuous power supply. In this way, the robotic arm can exhibit pseudo-linkage kinematics with lower control requirements and improved motion accuracy. A unique advantage of using origami as the robotic arm skeleton is that its bending stiffness ratio between stable states is directly related to the underlying Kresling design. Therefore, we conduct extensive parametric analyses and experimental validations to identify the optimized Kresling pattern for articulation. The results indicate that a higher angle ratio, a smaller resting length at contracted stable state, and a large number of polygon sides can offer more significant and robust bending stiffness tuning. Based on this insight, we construct a proof-of-concept, tendon-driven robotic arm consisting of three modules, and show that it can exhibit the desired reconfigurable articulation behavior. Moreover, the deformations of this manipulator are consistent with kinematic model predictions, which validate the possibility of using simple controllers for such compliant robotic systems.
LG Feb 1, 2020
Learning to Detect Malicious Clients for Robust Federated Learning
Suyi Li, Yong Cheng, Wei Wang et al.
Federated learning systems are vulnerable to attacks from malicious clients. As the central server in the system cannot govern the behaviors of the clients, a rogue client may initiate an attack by sending malicious model updates to the server, so as to degrade the learning performance or enforce targeted model poisoning attacks (a.k.a. backdoor attacks). Therefore, timely detecting these malicious model updates and the underlying attackers becomes critically important. In this work, we propose a new framework for robust federated learning where the central server learns to detect and remove the malicious model updates using a powerful detection model, leading to targeted defense. We evaluate our solution in both image classification and sentiment analysis tasks with a variety of machine learning models. Experimental results show that our solution ensures robust federated learning that is resilient to both the Byzantine attacks and the targeted model poisoning attacks.
LG Oct 22, 2019
Abnormal Client Behavior Detection in Federated Learning
Suyi Li, Yong Cheng, Yang Liu et al.
In federated learning systems, clients are autonomous in that their behaviors are not fully governed by the server. Consequently, a client may intentionally or unintentionally deviate from the prescribed course of federated model training, resulting in abnormal behaviors, such as turning into a malicious attacker or a malfunctioning client. Timely detecting those anomalous clients is therefore critical to minimize their adverse impacts. In this work, we propose to detect anomalous clients at the server side. In particular, we generate low-dimensional surrogates of model weight vectors and use them to perform anomaly detection. We evaluate our solution through experiments on image classification model training over the FEMNIST dataset. Experimental results show that the proposed detection-based approach significantly outperforms the conventional defense-based methods.
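Server-side detection on low-dimensional surrogates of weight vectors can be sketched as below. The abstract does not say how the surrogates are generated or which detector is used, so this example swaps in plain PCA (via SVD) and a median-based outlier score as assumptions; the client updates are synthetic and all function names are hypothetical.

```python
import numpy as np

def surrogates(updates, k=2):
    """Project flattened weight vectors onto their top-k principal directions."""
    centered = updates - updates.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T

def anomaly_scores(z):
    """Robust score: distance from the median surrogate, scaled by the MAD."""
    med = np.median(z, axis=0)
    dist = np.linalg.norm(z - med, axis=1)
    center = np.median(dist)
    mad = np.median(np.abs(dist - center)) + 1e-12
    return (dist - center) / mad

rng = np.random.default_rng(0)
n_clients, dim = 21, 100

# Synthetic updates: honest clients cluster around a shared direction.
shared = rng.normal(size=dim)
updates = shared + 0.1 * rng.normal(size=(n_clients, dim))

# One client (index chosen arbitrarily) submits a heavily shifted update.
malicious = 7
updates[malicious] += 5.0

scores = anomaly_scores(surrogates(updates, k=2))
flagged = int(np.argmax(scores))
print("flagged client:", flagged)
```

Working in the surrogate space keeps the detector cheap on the server even when the model has millions of parameters; a deployed system would also need a threshold rather than a single `argmax`.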
CL Sep 11, 2019
CoSQL: A Conversational Text-to-SQL Challenge Towards Cross-Domain Natural Language Interfaces to Databases
Tao Yu, Rui Zhang, He Yang Er et al.
We present CoSQL, a corpus for building cross-domain, general-purpose database (DB) querying dialogue systems. It consists of 30k+ turns plus 10k+ annotated SQL queries, obtained from a Wizard-of-Oz (WOZ) collection of 3k dialogues querying 200 complex DBs spanning 138 domains. Each dialogue simulates a real-world DB query scenario with a crowd worker as a user exploring the DB and a SQL expert retrieving answers with SQL, clarifying ambiguous questions, or otherwise informing of unanswerable questions. When user questions are answerable by SQL, the expert describes the SQL and execution results to the user, hence maintaining a natural interaction flow. CoSQL introduces new challenges compared to existing task-oriented dialogue datasets: (1) the dialogue states are grounded in SQL, a domain-independent executable representation, instead of domain-specific slot-value pairs, and (2) because testing is done on unseen databases, success requires generalizing to new domains. CoSQL includes three tasks: SQL-grounded dialogue state tracking, response generation from query results, and user dialogue act prediction. We evaluate a set of strong baselines for each task and show that CoSQL presents significant challenges for future research. The dataset, baselines, and leaderboard will be released at https://yale-lily.github.io/cosql.
RO Jun 10, 2019
Peristaltic locomotion without digital controllers: Exploiting the origami multi-stability to coordinate robotic motions
Priyanka Bhovad, Joshua Kaufmann, Suyi Li
This study proposes and examines a novel approach to generate peristaltic-like locomotion in a segmented origami robot. Specifically, we demonstrate the use of multi-stability embedded in origami skeleton to eliminate the need for multiple actuators or digital controllers to coordinate the complex robotic movements in peristaltic crawling. The crawling robot in this study consists of two serially connected bistable origami segments, each featuring a generalized Kresling design and a foldable anchoring mechanism. Mechanics analysis and experimental testing of this dual-segment module reveal a deterministic deformation sequence or actuation cycle, which is then used to generate the different phases in a peristaltic-like locomotion gait. Instead of individually controlling the segment deformation like in earthworm and other crawling robots, we only control the total length of this robot. Therefore, this approach can significantly reduce the total number of actuators needed for locomotion and simplify the control requirements. Moreover, the richness in Kresling origami design offers us substantial freedom to tailor the locomotion performance. Results of this study will contribute to a paradigm shift in how we can use the mechanics of multi-stability for robotic actuation and control.
CL Jun 5, 2019
SParC: Cross-Domain Semantic Parsing in Context
Tao Yu, Rui Zhang, Michihiro Yasunaga et al.
We present SParC, a dataset for cross-domain Semantic Parsing in Context that consists of 4,298 coherent question sequences (12k+ individual questions annotated with SQL queries). It is obtained from controlled user interactions with 200 complex databases over 138 domains. We provide an in-depth analysis of SParC and show that it introduces new challenges compared to existing datasets. SParC (1) demonstrates complex contextual dependencies, (2) has greater semantic diversity, and (3) requires generalization to unseen domains due to its cross-domain nature and the unseen databases at test time. We experiment with two state-of-the-art text-to-SQL models adapted to the context-dependent, cross-domain setup. The best model obtains an exact match accuracy of 20.2% over all questions and less than 10% over all interaction sequences, indicating that the cross-domain setting and the contextual phenomena of the dataset present significant challenges for future research. The dataset, baselines, and leaderboard are released at https://yale-lily.github.io/sparc.
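The gap between accuracy over all questions and accuracy over all interaction sequences follows from how the two are counted: an interaction is correct only if every question in it is correct. A minimal sketch of the two metrics is shown below. It uses normalized string comparison as a simplification; the official evaluation may compare SQL structure rather than raw strings, and the example queries are invented for illustration.

```python
def exact_match(pred, gold):
    """Whitespace- and case-insensitive string match (a simplification)."""
    def norm(s):
        return " ".join(s.lower().split())
    return norm(pred) == norm(gold)

def question_and_interaction_accuracy(interactions):
    """interactions: list of lists of (predicted_sql, gold_sql) pairs."""
    question_hits = 0
    question_total = 0
    interaction_hits = 0
    for turns in interactions:
        results = [exact_match(p, g) for p, g in turns]
        question_hits += sum(results)
        question_total += len(results)
        # An interaction counts only when every turn in it is correct.
        interaction_hits += all(results)
    return question_hits / question_total, interaction_hits / len(interactions)

# Invented example: two interactions with two turns each.
interactions = [
    [("SELECT name FROM singer", "select name from singer"),
     ("SELECT name FROM singer WHERE age > 30",
      "SELECT name FROM singer WHERE age > 30")],
    [("SELECT count(*) FROM concert", "SELECT count(*) FROM concert"),
     ("SELECT count(*) FROM concert", "SELECT count(*) FROM stadium")],
]

q_acc, i_acc = question_and_interaction_accuracy(interactions)
print("question accuracy:", q_acc)
print("interaction accuracy:", i_acc)
```

Here three of four questions match but only one of two interactions is fully correct, so the interaction-level number is lower, as in the reported results.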
CL Jun 4, 2019
Multi-News: a Large-Scale Multi-Document Summarization Dataset and Abstractive Hierarchical Model
Alexander R. Fabbri, Irene Li, Tianwei She et al.
Automatic generation of summaries from multiple news articles is a valuable tool as the number of online publications grows rapidly. Single document summarization (SDS) systems have benefited from advances in neural encoder-decoder models thanks to the availability of large datasets. However, multi-document summarization (MDS) of news articles has been limited to datasets of a couple of hundred examples. In this paper, we introduce Multi-News, the first large-scale MDS news dataset. Additionally, we propose an end-to-end model which incorporates a traditional extractive summarization model with a standard SDS model and achieves competitive results on MDS datasets. We benchmark several methods on Multi-News and release our data and code in the hope that this work will promote advances in summarization in the multi-document setting.