50.2 | AR | Apr 24 | Code
Microarchitectural Co-Optimization for Sustained Throughput of RISC-V Multi-Lane Chaining Vector Processors
Weiying Wang, Zhiwei Zhang
Modern RISC vector processors rely on the synergy of multi-lane parallelism and chaining to achieve high sustained throughput, yet their achieved performance often falls substantially short of the theoretical performance bound due to microarchitectural inefficiencies. In this work, we take the open-source RVV processor Ara as the target platform, analyze the sources of its sustained-throughput loss, and optimize the design accordingly. We first establish an ideal multi-lane chaining execution model as a microarchitectural reference for the steady-state progression of the vector backend. Based on this model, we attribute Ara's key bottlenecks to inefficiencies along three critical execution paths: memory-side inefficiencies in data supply and transaction issuance, control-side inefficiencies caused by conservative dependence management and issue control, and operand-delivery inefficiencies caused by access conflicts and result-propagation overhead. To address these bottlenecks, we propose a coordinated set of microarchitectural optimizations. Experimental results show that, without increasing raw memory bandwidth or changing the main processor configuration, Ara-Opt achieves a geometric-mean speedup of 1.33x over baseline Ara. Under roofline-based normalization, the geometric-mean gap-closed ratio reaches 12.2%. In particular, scal, axpy, ger, and gemm achieve speedups of approximately 2.41x, 1.60x, 1.52x, and 1.42x, with corresponding gap-closed ratios of 93.7%, 88.9%, 78.3%, and 59.3%, respectively. These results show that the proposed method can effectively recover sustained-throughput capability lost to microarchitectural inefficiencies in Ara under essentially unchanged hardware resource constraints, and move the implementation points of regular streaming and high-throughput workloads significantly closer to the theoretical performance bound.
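The two summary metrics in this abstract can be illustrated with a small sketch. The abstract does not define the gap-closed ratio, so the formula below is an assumption: the fraction of the distance between baseline performance and the roofline bound that the optimized design recovers. The function names are hypothetical. The four speedups are the ones quoted above; their geometric mean describes only that four-kernel subset, not the 1.33x figure reported for the full benchmark set.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive numbers, computed in log space."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def gap_closed_ratio(baseline_perf, optimized_perf, roofline_bound):
    """ASSUMED definition (not given in the abstract): share of the
    baseline-to-bound gap recovered by the optimized design."""
    return (optimized_perf - baseline_perf) / (roofline_bound - baseline_perf)


# Speedups quoted in the abstract for four kernels (a subset of the suite).
quoted_speedups = {"scal": 2.41, "axpy": 1.60, "ger": 1.52, "gemm": 1.42}
subset_geomean = geometric_mean(quoted_speedups.values())

# Hypothetical numbers, purely to show how the ratio behaves:
# baseline at 1.0, optimized at 2.0, bound at 3.0 closes half the gap.
example_ratio = gap_closed_ratio(1.0, 2.0, 3.0)
```

Under this assumed definition, a kernel that was already near its bound can show a modest speedup but a large gap-closed ratio, which is why the two metrics are reported side by side.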
MM | Feb 9, 2022 | Code
Image Difference Captioning with Pre-training and Contrastive Learning
Linli Yao, Weiying Wang, Qin Jin
The Image Difference Captioning (IDC) task aims to describe the visual differences between two similar images in natural language. The major challenges of this task lie in two aspects: 1) fine-grained visual differences that require learning a stronger association between vision and language, and 2) the high cost of manual annotation, which leads to limited supervised data. To address these challenges, we propose a new modeling framework following the pre-training-finetuning paradigm. Specifically, we design three self-supervised tasks and contrastive learning strategies to align visual differences and text descriptions at a fine-grained level. Moreover, we propose a data expansion strategy to utilize extra cross-task supervision information, such as data for fine-grained image classification, to alleviate the limitation of available supervised IDC data. Extensive experiments on two IDC benchmark datasets, CLEVR-Change and Birds-to-Words, demonstrate the effectiveness of the proposed modeling framework. The code and models will be released at https://github.com/yaolinli/IDC.
RO | Sep 24, 2021 | Code
Toolbox Release: A WiFi-Based Relative Bearing Sensor for Robotics
Ninad Jadhav, Weiying Wang, Diana Zhang et al.
This paper presents the WiFi-Sensor-for-Robotics (WSR) toolbox, an open-source C++ framework. It enables robots in a team to obtain relative bearing to each other, even in non-line-of-sight (NLOS) settings, which is a very challenging problem in robotics. It does so by analyzing the phase of their communicated WiFi signals as the robots traverse the environment. This capability, based on the theory developed in our prior works, is made available for the first time as an open-source tool. It is motivated by the lack of easily deployable solutions that use robots' local resources (e.g., WiFi) for sensing in NLOS. This has implications for localization, ad-hoc robot networks, and security in multi-robot teams, amongst others. The toolbox is designed for distributed and online deployment on robot platforms using commodity hardware and on-board sensors. We also release datasets demonstrating its performance in NLOS and line-of-sight (LOS) settings for a multi-robot localization use case. Empirical results show that the bearing estimation from our toolbox achieves a mean accuracy of 5.10 degrees. This leads to a median localization error of 0.5m and 0.9m in LOS and NLOS settings respectively, in a hardware deployment in an indoor office environment.
CV | Apr 12, 2020 | Code
YouMakeup VQA Challenge: Towards Fine-grained Action Understanding in Domain-Specific Videos
Shizhe Chen, Weiying Wang, Ludan Ruan et al.
The goal of the YouMakeup VQA Challenge 2020 is to provide a common benchmark for fine-grained action understanding in domain-specific videos, e.g., makeup instructional videos. We propose two novel question-answering tasks to evaluate models' fine-grained action understanding abilities. The first task is Facial Image Ordering, which aims to understand the visual effects on the face of different actions expressed in natural language. The second task is Step Ordering, which aims to measure cross-modal semantic alignments between untrimmed videos and multi-sentence texts. In this paper, we present the challenge guidelines, the dataset used, and the performance of baseline models on the two proposed tasks. The baseline code and models are released at https://github.com/AIM3-RUC/YouMakeup_Baseline.
RO | Dec 8, 2020
A wireless signal-based sensing framework for robotics
Ninad Jadhav, Weiying Wang, Diana Zhang et al.
In this paper we develop the analytical framework for a novel Wireless signal-based Sensing capability for Robotics (WSR) by leveraging robots' mobility. It allows robots to primarily measure relative direction, or Angle-of-Arrival (AOA), to other robots, while operating in non-line-of-sight unmapped environments and without requiring external infrastructure. We do so by capturing all of the paths that a wireless signal traverses as it travels from a transmitting to a receiving robot in the team, which we term an AOA profile. The key intuition behind our approach is to enable a robot to emulate antenna arrays as it moves freely in 2D and 3D space. The small differences in the phase of the wireless signals are thus processed with knowledge of robots' local displacement to obtain the profile, via a method akin to Synthetic Aperture Radar (SAR). The main contributions of this work are i) a framework to accommodate arbitrary 2D and 3D motion, as well as continuous mobility of both signal transmitting and receiving robots, while computing AOA profiles between them, and ii) a Cramer-Rao Bound analysis, based on antenna array theory, that provides a lower bound on the variance in AOA estimation as a function of the geometry of robot motion. We show that allowing robots to use their full mobility in 3D space while performing SAR results in more accurate AOA profiles and thus better AOA estimation. All analytical developments are substantiated by extensive simulation and hardware experiments on air/ground robot platforms using 5 GHz WiFi. Our experimental results bolster our analytical findings, demonstrating that 3D motion provides enhanced and consistent accuracy, with total AOA error of less than 10 degrees for 95% of trials. We also analytically characterize the impact of displacement estimation errors on the measured AOA.
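The idea of emulating an antenna array through motion can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a single far-field transmitter, noise-free channel phase, perfectly known 2D displacements along a hypothetical circular path, and a plain matched-filter (Bartlett-style) profile. The wavelength value and all function names are assumptions made for illustration.

```python
import cmath
import math

WAVELENGTH_M = 0.06  # assumed: roughly the wavelength of 5 GHz WiFi


def circular_trajectory(radius_m, num_points):
    """Hypothetical receiver path: evenly spaced points on a circle."""
    step = 2 * math.pi / num_points
    return [(radius_m * math.cos(i * step), radius_m * math.sin(i * step))
            for i in range(num_points)]


def simulate_channel(positions, source_angle_rad, wavelength_m=WAVELENGTH_M):
    """Noise-free far-field phase seen at each position (single path)."""
    k = 2 * math.pi / wavelength_m
    cos_a, sin_a = math.cos(source_angle_rad), math.sin(source_angle_rad)
    return [cmath.exp(1j * k * (x * cos_a + y * sin_a)) for x, y in positions]


def aoa_profile(channel, positions, candidate_angles_rad,
                wavelength_m=WAVELENGTH_M):
    """Normalized matched-filter power for each candidate direction."""
    k = 2 * math.pi / wavelength_m
    profile = []
    for angle in candidate_angles_rad:
        cos_a, sin_a = math.cos(angle), math.sin(angle)
        total = 0j
        for h, (x, y) in zip(channel, positions):
            # Undo the phase a source at this angle would have produced.
            total += h * cmath.exp(-1j * k * (x * cos_a + y * sin_a))
        profile.append(abs(total) ** 2 / len(channel) ** 2)
    return profile


def estimate_aoa_deg(profile, candidate_angles_deg):
    """Direction with the highest profile power."""
    best = max(range(len(profile)), key=profile.__getitem__)
    return candidate_angles_deg[best]
```

A typical use would build a trajectory with `circular_trajectory`, generate or measure the per-position channel, evaluate `aoa_profile` over a grid of candidate angles, and take the peak with `estimate_aoa_deg`. A circular path is used here because a straight-line path in 2D leaves a mirror ambiguity about the direction of travel.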
RO | Jul 12, 2019
Active Rendezvous for Multi-Robot Pose Graph Optimization using Sensing over Wi-Fi
Weiying Wang, Ninad Jadhav, Paul Vohs et al.
We present a novel framework for collaboration amongst a team of robots performing Pose Graph Optimization (PGO) that addresses two important challenges for multi-robot SLAM: i) enabling information exchange "on-demand" via Active Rendezvous without using a map or the robot's location, and ii) rejecting outlying measurements. Our key insight is to exploit relative position data present in the communication channel between robots to improve the ground-truth accuracy of PGO. We develop an algorithmic and experimental framework for integrating Channel State Information (CSI) with multi-robot PGO; it is distributed, and applicable in low-lighting or featureless environments where traditional sensors often fail. We present extensive experimental results on actual robots and observe that using Active Rendezvous results in a 64% reduction in ground-truth pose error and that using CSI observations to aid outlier rejection reduces ground-truth pose error by 32%. These results show the potential of integrating communication as a novel sensor for SLAM.