73.6AIMay 11
LoopVLA: Learning Sufficiency in Recurrent Refinement for Vision-Language-Action ModelsBoyang Shen, Kaixiang Yang, Hao Wang et al.
Current Vision-Language-Action (VLA) models typically treat the deepest representation of a vision-language backbone as universally optimal for action prediction. However, robotic manipulation is composed of many frequent closed-loop spatial adjustments, for which excessive abstraction may waste computation and weaken low-level geometric cues essential for precise control. Existing early-exit strategies attempt to reduce computation by stopping at predefined layers or applying heuristic rules such as action consistency, but they do not directly answer when a representation is actually sufficient for action. In this paper, we present LoopVLA, a recurrent VLA architecture that jointly learns representation refinement, action prediction, and sufficiency estimation. LoopVLA iteratively applies a shared Transformer block to refine multimodal tokens, and at each iteration produces both a candidate action and a sufficiency score that estimates whether further refinement is necessary. By sharing parameters across iterations, LoopVLA decouples refinement from absolute layer indices and grounds sufficiency estimation in the evolving representation itself. Since sufficiency has no direct supervision, we introduce a self-supervised distribution alignment objective, where intermediate confidence scores are trained to match the relative action quality across refinement steps, thereby linking sufficiency learning to policy optimization signals. Experiments on LIBERO, LIBERO-Plus, and VLA-Arena show that LoopVLA pushes the efficiency-performance frontier of VLA policies, reducing parameters by 45% and improving inference throughput by up to 1.7 times while matching or outperforming strong baselines in task success.
CRSep 7, 2020
A Blockchain-based Platform Architecture for Multimedia Data ManagementYue Liu, Qinghua Lu, Chunsheng Zhu et al.
Massive amounts of multimedia data (i.e., text, audio, video, graphics and animation) are being generated everyday. Conventionally, multimedia data are managed by the platforms maintained by multimedia service providers, which are generally designed using centralised architecture. However, such centralised architecture may lead to a single point of failure and disputes over royalties or other rights. It is hard to ensure the data integrity and track fulfilment of obligations listed on the copyright agreement. To tackle these issues, in this paper, we present a blockchain-based platform architecture for multimedia data management. We adopt self-sovereign identity for identity management and design a multi-level capability-based mechanism for access control. We implement a proof-of-concept prototype using the proposed approach and evaluate it using a use case. The results show that the proposed approach is feasible and has scalable performance.
DCSep 6, 2020
Blockchain-based Federated Learning for Device Failure Detection in Industrial IoTWeishan Zhang, Qinghua Lu, Qiuyu Yu et al.
Device failure detection is one of most essential problems in industrial internet of things (IIoT). However, in conventional IIoT device failure detection, client devices need to upload raw data to the central server for model training, which might lead to disclosure of sensitive business data. Therefore, in this paper, to ensure client data privacy, we propose a blockchain-based federated learning approach for device failure detection in IIoT. First, we present a platform architecture of blockchain-based federated learning systems for failure detection in IIoT, which enables verifiable integrity of client data. In the architecture, each client periodically creates a Merkle tree in which each leaf node represents a client data record, and stores the tree root on a blockchain. Further, to address the data heterogeneity issue in IIoT failure detection, we propose a novel centroid distance weighted federated averaging (CDW\_FedAvg) algorithm taking into account the distance between positive class and negative class of each client dataset. In addition, to motivate clients to participate in federated learning, a smart contact based incentive mechanism is designed depending on the size and the centroid distance of client data used in local model training. A prototype of the proposed architecture is implemented with our industry partner, and evaluated in terms of feasibility, accuracy and performance. The results show that the approach is feasible, and has satisfactory accuracy and performance.