Wei Shi

h-index15

18papers

1,756citations

Novelty48%

AI Score41

Ranked #64,255 of 194,257 authors (top 33%)#12,469 in CL (top 41%)

18 Papers

32.4CLMar 16, 2022

E-KAR: A Benchmark for Rationalizing Natural Language Analogical Reasoning

Jiangjie Chen, Rui Xu, Ziquan Fu et al. · cmu

The ability to recognize analogies is fundamental to human cognition. Existing benchmarks to test word analogy do not reveal the underneath process of analogical reasoning of neural models. Holding the belief that models capable of reasoning should be right for the right reasons, we propose a first-of-its-kind Explainable Knowledge-intensive Analogical Reasoning benchmark (E-KAR). Our benchmark consists of 1,655 (in Chinese) and 1,251 (in English) problems sourced from the Civil Service Exams, which require intensive background knowledge to solve. More importantly, we design a free-text explanation scheme to explain whether an analogy should be drawn, and manually annotate them for each and every question and candidate answer. Empirical results suggest that this benchmark is very challenging for some state-of-the-art models for both explanation generation and analogical question answering tasks, which invites further research in this area.

4.3ETJul 13, 2022

RobustAnalog: Fast Variation-Aware Analog Circuit Design Via Multi-task RL

Wei Shi, Hanrui Wang, Jiaqi Gu et al. · mit

Analog/mixed-signal circuit design is one of the most complex and time-consuming stages in the whole chip design process. Due to various process, voltage, and temperature (PVT) variations from chip manufacturing, analog circuits inevitably suffer from performance degradation. Although there has been plenty of work on automating analog circuit design under the typical condition, limited research has been done on exploring robust designs under real and unpredictable silicon variations. Automatic analog design against variations requires prohibitive computation and time costs. To address the challenge, we present RobustAnalog, a robust circuit design framework that involves the variation information in the optimization process. Specifically, circuit optimizations under different variations are considered as a set of tasks. Similarities among tasks are leveraged and competitions are alleviated to realize a sample-efficient multi-task training. Moreover, RobustAnalog prunes the task space according to the current performance in each iteration, leading to a further simulation cost reduction. In this way, RobustAnalog can rapidly produce a set of circuit parameters that satisfies diverse constraints (e.g. gain, bandwidth, noise...) across variations. We compare RobustAnalog with Bayesian optimization, Evolutionary algorithm, and Deep Deterministic Policy Gradient (DDPG) and demonstrate that RobustAnalog can significantly reduce required optimization time by 14-30 times. Therefore, our study provides a feasible method to handle various real silicon conditions.

6.5CVAug 25, 2022Code

Identity-Sensitive Knowledge Propagation for Cloth-Changing Person Re-identification

Jianbing Wu, Hong Liu, Wei Shi et al.

Cloth-changing person re-identification (CC-ReID), which aims to match person identities under clothing changes, is a new rising research topic in recent years. However, typical biometrics-based CC-ReID methods often require cumbersome pose or body part estimators to learn cloth-irrelevant features from human biometric traits, which comes with high computational costs. Besides, the performance is significantly limited due to the resolution degradation of surveillance images. To address the above limitations, we propose an effective Identity-Sensitive Knowledge Propagation framework (DeSKPro) for CC-ReID. Specifically, a Cloth-irrelevant Spatial Attention module is introduced to eliminate the distraction of clothing appearance by acquiring knowledge from the human parsing module. To mitigate the resolution degradation issue and mine identity-sensitive cues from human faces, we propose to restore the missing facial details using prior facial knowledge, which is then propagated to a smaller network. After training, the extra computations for human parsing or face restoration are no longer required. Extensive experiments show that our framework outperforms state-of-the-art methods by a large margin. Our code is available at https://github.com/KimbingNg/DeskPro.

1.2SYApr 3, 2018

Distributed Equilibrium-Learning for Power Network Voltage Control With a Locally Connected Communication Network

Kaiqing Zhang, Wei Shi, Hao Zhu et al.

In current power distribution systems, one of the most challenging operation tasks is to coordinate the network- wide distributed energy resources (DERs) to maintain the stability of voltage magnitude of the system. This voltage control task has been investigated actively under either distributed optimization-based or local feedback control-based characterizations. The former architecture requires a strongly-connected communication network among all DERs for implementing the optimization algorithms, a scenario not yet realistic in most of the existing distribution systems with under-deployed communication infrastructure. The latter one, on the other hand, has been proven to suffer from loss of network-wide op- erational optimality. In this paper, we propose a game-theoretic characterization for semi-local voltage control with only a locally connected communication network. We analyze the existence and uniqueness of the generalized Nash equilibrium (GNE) for this characterization and develop a fully distributed equilibrium-learning algorithm that relies on only neighbor-to-neighbor information exchange. Provable convergence results are provided along with numerical tests which corroborate the robust convergence property of the proposed algorithm.

1.2SPJul 13, 2022

Federated Multi-Task Learning for THz Wideband Channel and DoA Estimation

Ahmet M. Elbir, Wei Shi, Kumar Vijay Mishra et al.

This paper addresses two major challenges in terahertz (THz) channel estimation: the beam-split phenomenon, i.e., beam misalignment because of frequency-independent analog beamformers, and computational complexity because of the usage of ultra-massive number of antennas to compensate propagation losses. Data-driven techniques are known to mitigate the complexity of this problem but usually require the transmission of the datasets from the users to a central server entailing huge communication overhead. In this work, we introduce a federated multi-task learning (FMTL), wherein the users transmit only the model parameters instead of the whole dataset, for THz channel and user direction-of-arrival (DoA) estimation to improve the communications-efficiency. We first propose a novel beamspace support alignment technique for channel estimation with beam-split correction. Then, the channel and DoA information are used as labels to train an FMTL model. By exploiting the sparsity of the THz channel, the proposed approach is implemented with fewer pilot signals than the traditional techniques. Compared to the previous works, our FMTL approach provides higher channel estimation accuracy as well as approximately 25 (32) times lower model (channel) training overhead, respectively.

1.2SPJun 24, 2022

Implicit Channel Learning for Machine Learning Applications in 6G Wireless Networks

Ahmet M. Elbir, Wei Shi, Kumar Vijay Mishra et al.

With the deployment of the fifth generation (5G) wireless systems gathering momentum across the world, possible technologies for 6G are under active research discussions. In particular, the role of machine learning (ML) in 6G is expected to enhance and aid emerging applications such as virtual and augmented reality, vehicular autonomy, and computer vision. This will result in large segments of wireless data traffic comprising image, video and speech. The ML algorithms process these for classification/recognition/estimation through the learning models located on cloud servers. This requires wireless transmission of data from edge devices to the cloud server. Channel estimation, handled separately from recognition step, is critical for accurate learning performance. Toward combining the learning for both channel and the ML data, we introduce implicit channel learning to perform the ML tasks without estimating the wireless channel. Here, the ML models are trained with channel-corrupted datasets in place of nominal data. Without channel estimation, the proposed approach exhibits approximately 60% improvement in image and speech classification tasks for diverse scenarios such as millimeter wave and IEEE 802.11p vehicular channels.

3.3ITApr 27, 2022Code

Supervised Contrastive CSI Representation Learning for Massive MIMO Positioning

Junquan Deng, Wei Shi, Jianzhao Zhang et al.

Similarity metric is crucial for massive MIMO positioning utilizing channel state information~(CSI). In this letter, we propose a novel massive MIMO CSI similarity learning method via deep convolutional neural network~(DCNN) and contrastive learning. A contrastive loss function is designed considering multiple positive and negative CSI samples drawn from a training dataset. The DCNN encoder is trained using the loss so that positive samples are mapped to points close to the anchor's encoding, while encodings of negative samples are kept away from the anchor's in the representation space. Evaluation results of fingerprint-based positioning on a real-world CSI dataset show that the learned similarity metric improves positioning accuracy significantly compared with other known state-of-the-art methods.

16.5LGMar 15, 2023

Efficient and Secure Federated Learning for Financial Applications

Tao Liu, Zhi Wang, Hui He et al.

The conventional machine learning (ML) and deep learning approaches need to share customers' sensitive information with an external credit bureau to generate a prediction model that opens the door to privacy leakage. This leakage risk makes financial companies face an enormous challenge in their cooperation. Federated learning is a machine learning setting that can protect data privacy, but the high communication cost is often the bottleneck of the federated systems, especially for large neural networks. Limiting the number and size of communications is necessary for the practical training of large neural structures. Gradient sparsification has received increasing attention as a method to reduce communication cost, which only updates significant gradients and accumulates insignificant gradients locally. However, the secure aggregation framework cannot directly use gradient sparsification. This article proposes two sparsification methods to reduce communication cost in federated learning. One is a time-varying hierarchical sparsification method for model parameter update, which solves the problem of maintaining model accuracy after high ratio sparsity. It can significantly reduce the cost of a single communication. The other is to apply the sparsification method to the secure aggregation framework. We sparse the encryption mask matrix to reduce the cost of communication while protecting privacy. Experiments show that under different Non-IID experiment settings, our method can reduce the upload communication cost to about 2.9% to 18.9% of the conventional federated learning algorithm when the sparse rate is 0.01.

29.3LGMar 11, 2025Code

Route Sparse Autoencoder to Interpret Large Language Models

Wei Shi, Sihang Li, Tao Liang et al.

Mechanistic interpretability of large language models (LLMs) aims to uncover the internal processes of information propagation and reasoning. Sparse autoencoders (SAEs) have demonstrated promise in this domain by extracting interpretable and monosemantic features. However, prior works primarily focus on feature extraction from a single layer, failing to effectively capture activations that span multiple layers. In this paper, we introduce Route Sparse Autoencoder (RouteSAE), a new framework that integrates a routing mechanism with a shared SAE to efficiently extract features from multiple layers. It dynamically assigns weights to activations from different layers, incurring minimal parameter overhead while achieving high interpretability and flexibility for targeted feature manipulation. We evaluate RouteSAE through extensive experiments on Llama-3.2-1B-Instruct. Specifically, under the same sparsity constraint of 64, RouteSAE extracts 22.5% more features than baseline SAEs while achieving a 22.3% higher interpretability score. These results underscore the potential of RouteSAE as a scalable and effective method for LLM interpretability, with applications in feature discovery and model intervention. Our codes are available at https://github.com/swei2001/RouteSAEs.

3.9CVFeb 7, 2023

PAMI: partition input and aggregate outputs for model interpretation

Wei Shi, Wentao Zhang, Weishi Zheng et al.

There is an increasing demand for interpretation of model predictions especially in high-risk applications. Various visualization approaches have been proposed to estimate the part of input which is relevant to a specific model prediction. However, most approaches require model structure and parameter details in order to obtain the visualization results, and in general much effort is required to adapt each approach to multiple types of tasks particularly when model backbone and input format change over tasks. In this study, a simple yet effective visualization framework called PAMI is proposed based on the observation that deep learning models often aggregate features from local regions for model predictions. The basic idea is to mask majority of the input and use the corresponding model output as the relative contribution of the preserved input part to the original model prediction. For each input, since only a set of model outputs are collected and aggregated, PAMI does not require any model detail and can be applied to various prediction tasks with different model backbones and input formats. Extensive experiments on multiple tasks confirm the proposed method performs better than existing visualization approaches in more precisely finding class-specific input regions, and when applied to different model backbones and input formats. The source code will be released publicly.

8.3CLJul 1, 2025Code

SAFER: Probing Safety in Reward Models with Sparse Autoencoder

Sihang Li, Wei Shi, Ziyuan Xie et al.

Reinforcement learning from human feedback (RLHF) is a key paradigm for aligning large language models (LLMs) with human values, yet the reward models at its core remain largely opaque. In this work, we present sparse Autoencoder For Enhanced Reward model (\textbf{SAFER}), a novel framework for interpreting and improving reward models through mechanistic analysis. Leveraging Sparse Autoencoders (SAEs), we uncover human-interpretable features in reward model activations, enabling insight into safety-relevant decision-making. We apply SAFER to safety-oriented preference datasets and quantify the salience of individual features by activation differences between chosen and rejected responses. Using these feature-level signals, we design targeted data poisoning and denoising strategies. Experiments show that SAFER can precisely degrade or enhance safety alignment with minimal data modification, without sacrificing general chat performance. Our approach contributes to interpreting, auditing and refining reward models in high-stakes LLM alignment tasks. Our codes are available at https://github.com/xzy-101/SAFER-code. \textit{This paper discusses topics related to large language model safety and may include discussions or examples that highlight potential risks or unsafe outcomes.}

21.1CVDec 5, 2021Code

Pose-guided Feature Disentangling for Occluded Person Re-identification Based on Transformer

Tao Wang, Hong Liu, Pinhao Song et al.

Occluded person re-identification is a challenging task as human body parts could be occluded by some obstacles (e.g. trees, cars, and pedestrians) in certain scenes. Some existing pose-guided methods solve this problem by aligning body parts according to graph matching, but these graph-based methods are not intuitive and complicated. Therefore, we propose a transformer-based Pose-guided Feature Disentangling (PFD) method by utilizing pose information to clearly disentangle semantic components (e.g. human body or joint parts) and selectively match non-occluded parts correspondingly. First, Vision Transformer (ViT) is used to extract the patch features with its strong capability. Second, to preliminarily disentangle the pose information from patch information, the matching and distributing mechanism is leveraged in Pose-guided Feature Aggregation (PFA) module. Third, a set of learnable semantic views are introduced in transformer decoder to implicitly enhance the disentangled body part features. However, those semantic views are not guaranteed to be related to the body without additional supervision. Therefore, Pose-View Matching (PVM) module is proposed to explicitly match visible body parts and automatically separate occlusion features. Fourth, to better prevent the interference of occlusions, we design a Pose-guided Push Loss to emphasize the features of visible body parts. Extensive experiments over five challenging datasets for two tasks (occluded and holistic Re-ID) demonstrate that our proposed PFD is superior promising, which performs favorably against state-of-the-art methods. Code is available at https://github.com/WangTaoAs/PFD_Net

27.7CLMay 10, 2023Code

Say What You Mean! Large Language Models Speak Too Positively about Negative Commonsense Knowledge

Jiangjie Chen, Wei Shi, Ziquan Fu et al.

Large language models (LLMs) have been widely studied for their ability to store and utilize positive knowledge. However, negative knowledge, such as "lions don't live in the ocean", is also ubiquitous in the world but rarely mentioned explicitly in the text. What do LLMs know about negative knowledge? This work examines the ability of LLMs to negative commonsense knowledge. We design a constrained keywords-to-sentence generation task (CG) and a Boolean question-answering task (QA) to probe LLMs. Our experiments reveal that LLMs frequently fail to generate valid sentences grounded in negative commonsense knowledge, yet they can correctly answer polar yes-or-no questions. We term this phenomenon the belief conflict of LLMs. Our further analysis shows that statistical shortcuts and negation reporting bias from language modeling pre-training cause this conflict.

4.4LGNov 26, 2021

Semi-supervised t-SNE for Millimeter-wave Wireless Localization

Junquan Deng, Wei Shi, Jian Hu et al.

We consider the mobile localization problem in future millimeter-wave wireless networks with distributed Base Stations (BSs) based on multi-antenna channel state information (CSI). For this problem, we propose a Semi-supervised tdistributed Stochastic Neighbor Embedding (St-SNE) algorithm to directly embed the high-dimensional CSI samples into the 2D geographical map. We evaluate the performance of St-SNE in a simulated urban outdoor millimeter-wave radio access network. Our results show that St-SNE achieves a mean localization error of 6.8 m with only 5% of labeled CSI samples in a 200*200 m^2 area with a ray-tracing channel model. St-SNE does not require accurate synchronization among multiple BSs, and is promising for future large-scale millimeter-wave localization.

20.3CRFeb 19, 2021

Obfuscation of Images via Differential Privacy: From Facial Images to General Images

William Croft, Jörg-Rüdiger Sack, Wei Shi

Due to the pervasiveness of image capturing devices in every-day life, images of individuals are routinely captured. Although this has enabled many benefits, it also infringes on personal privacy. A promising direction in research on obfuscation of facial images has been the work in the k-same family of methods which employ the concept of k-anonymity from database privacy. However, there are a number of deficiencies of k-anonymity that carry over to the k-same methods, detracting from their usefulness in practice. In this paper, we first outline several of these deficiencies and discuss their implications in the context of facial obfuscation. We then develop a framework through which we obtain a formal differentially private guarantee for the obfuscation of facial images in generative machine learning models. Our approach provides a provable privacy guarantee that is not susceptible to the outlined deficiencies of k-same obfuscation and produces photo-realistic obfuscated output. In addition, we demonstrate through experimental comparisons that our approach can achieve comparable utility to k-same obfuscation in terms of preservation of useful features in the images. Furthermore, we propose a method to achieve differential privacy for any image (i.e., without restriction to facial images) through the direct modification of pixel intensities. Although the addition of noise to pixel intensities does not provide the high visual quality obtained via generative machine learning models, it offers greater versatility by eliminating the need for a trained model. We demonstrate that our proposed use of the exponential mechanism in this context is able to provide superior visual quality to pixel-space obfuscation using the Laplace mechanism.

6.8CRFeb 12, 2019

Real Time Lateral Movement Detection based on Evidence Reasoning Network for Edge Computing Environment

Zhihong Tian, Wei Shi, Yuhang Wang et al.

Edge computing is providing higher class intelligent service and computing capabilities at the edge of the network. The aim is to ease the backhaul impacts and offer an improved user experience, however, the edge artificial intelligence exacerbates the security of the cloud computing environment due to the dissociation of data, access control and service stages. In order to prevent users from using the edge-cloud computing environment to carry out lateral movement attacks, we proposed a method named CloudSEC meaning real time lateral movement detection based on evidence reasoning network for the edge-cloud environment. The concept of vulnerability correlation is introduced. Based on the vulnerability knowledge and environmental information of the network system, the evidence reasoning network is constructed, and the lateral movement reasoning ability provided by the evidence reasoning network is used. CloudSEC realizes the reconfiguration of the efficient real-time attack process. The experiment shows that the results are complete and credible.

20.5OCSep 19, 2016

Geometrically Convergent Distributed Optimization with Uncoordinated Step-Sizes

Angelia Nedić, Alex Olshevsky, Wei Shi et al.

A recent algorithmic family for distributed optimization, DIGing's, have been shown to have geometric convergence over time-varying undirected/directed graphs. Nevertheless, an identical step-size for all agents is needed. In this paper, we study the convergence rates of the Adapt-Then-Combine (ATC) variation of the DIGing algorithm under uncoordinated step-sizes. We show that the ATC variation of DIGing algorithm converges geometrically fast even if the step-sizes are different among the agents. In addition, our analysis implies that the ATC structure can accelerate convergence compared to the distributed gradient descent (DGD) structure which has been used in the original DIGing algorithm.

1.2SYJul 23, 2016

Decentralized Dynamic Optimization for Power Network Voltage Control

Hao Jan Liu, Wei Shi, Hao Zhu

Voltage control in power distribution networks has been greatly challenged by the increasing penetration of volatile and intermittent devices. These devices can also provide limited reactive power resources that can be used to regulate the network-wide voltage. A decentralized voltage control strategy can be designed by minimizing a quadratic voltage mismatch error objective using gradient-projection (GP) updates. Coupled with the power network flow, the local voltage can provide the instantaneous gradient information. This paper aims to analyze the performance of this decentralized GP-based voltage control design under two dynamic scenarios: i) the nodes perform the decentralized update in an asynchronous fashion, and ii) the network operating condition is time-varying. For the asynchronous voltage control, we improve the existing convergence condition by recognizing that the voltage based gradient is always up-to-date. By modeling the network dynamics using an autoregressive process and considering time-varying resource constraints, we provide an error bound in tracking the instantaneous optimal solution to the quadratic error objective. This result can be extended to more general \textit{constrained dynamic optimization} problems with smooth strongly convex objective functions under stochastic processes that have bounded iterative changes. Extensive numerical tests have been performed to demonstrate and validate our analytical results for realistic power networks.