93.2ROApr 20Code
ST-$π$: Structured SpatioTemporal VLA for Robotic ManipulationChuanhao Ma, Hanyu Zhou, Shihan Peng et al.
Vision-language-action (VLA) models have achieved great success on general robotic tasks, but still face challenges in fine-grained spatiotemporal manipulation. Typically, existing methods mainly embed spatiotemporal knowledge into visual and action representations, and directly perform a cross-modal mapping for step-level action prediction. However, such spatiotemporal reasoning remains largely implicit, making it difficult to handle multiple sequential behaviors with explicit spatiotemporal boundaries. In this work, we propose ST-$π$, a structured spatiotemporal VLA model for robotic manipulation. Our model is guided by two key designs: 1) Spatiotemporal VLM. We encode 4D observations and task instructions into latent spaces, and feed them into the LLM to generate a sequence of causally ordered chunk-level action prompts consisting of sub-tasks, spatial grounding and temporal grounding. 2) Spatiotemporal action expert. Conditioned on chunk-level action prompts, we design a structured dual-generator guidance to jointly model spatial dependencies and temporal causality, thus predicting step-level action parameters. Within this structured framework, the VLM explicitly plans global spatiotemporal behavior, and the action expert further refines local spatiotemporal control. In addition, we propose a real-world robotic dataset with structured spatiotemporal annotations for fine-tuning. Extensive experiments have been conducted to demonstrate the effectiveness of our model. Our code link: https://github.com/chuanhaoma/ST-pi.
92.7LOMay 6
Continuations and Completeness in Proof-theoretic SemanticsTao Gu, David Pym, Eike Ritter et al.
This is a short paper about the relationship between logic and computation. More specifically, it is about a relationship between the completeness proof for intuitionistic propositional logic within the form of proof-theoretic semantics that is known as base-extension semantics and a fundamental idea from the theory of computation called continuation-passing semantics. The latter is explained herein both in terms of reduction in natural deduction and the lambda calculus and in terms of proof-search. The relationship between completeness and continuations is explored through an analysis of Sandqvist's proof of the completeness theorem as seen from the mathematical perspective of Kripke's and Heyting's semantics. Our analysis can be seen to reveal how syntactic representations of continuations embody intensional semantical intuitions about the relationship between their meaning and use. These intuitions are made precise using the tools of proof-theoretic semantics.
NIAug 26, 2024
Towards Battery-Free Wireless Sensing via Radio-Frequency Energy HarvestingTao Ni, Zehua Sun, Mingda Han et al.
Diverse Wi-Fi-based wireless applications have been proposed, ranging from daily activity recognition to vital sign monitoring. Despite their remarkable sensing accuracy, the high energy consumption and the requirement for customized hardware modification hinder the wide deployment of the existing sensing solutions. In this paper, we propose REHSense, an energy-efficient wireless sensing solution based on Radio-Frequency (RF) energy harvesting. Instead of relying on a power-hungry Wi-Fi receiver, REHSense leverages an RF energy harvester as the sensor and utilizes the voltage signals harvested from the ambient Wi-Fi signals to enable simultaneous context sensing and energy harvesting. We design and implement REHSense using a commercial-off-the-shelf (COTS) RF energy harvester. Extensive evaluation of three fine-grained wireless sensing tasks (i.e., respiration monitoring, human activity, and hand gesture recognition) shows that REHSense can achieve comparable sensing accuracy with conventional Wi-Fi-based solutions while adapting to different sensing environments, reducing the power consumption by 98.7% and harvesting up to 4.5mW of power from RF energy.
CVMar 4, 2024Code
OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-onYuhao Xu, Tao Gu, Weifeng Chen et al.
We present OOTDiffusion, a novel network architecture for realistic and controllable image-based virtual try-on (VTON). We leverage the power of pretrained latent diffusion models, designing an outfitting UNet to learn the garment detail features. Without a redundant warping process, the garment features are precisely aligned with the target human body via the proposed outfitting fusion in the self-attention layers of the denoising UNet. In order to further enhance the controllability, we introduce outfitting dropout to the training process, which enables us to adjust the strength of the garment features through classifier-free guidance. Our comprehensive experiments on the VITON-HD and Dress Code datasets demonstrate that OOTDiffusion efficiently generates high-quality try-on results for arbitrary human and garment images, which outperforms other VTON methods in both realism and controllability, indicating an impressive breakthrough in virtual try-on. Our source code is available at https://github.com/levihsu/OOTDiffusion.
ROFeb 4
HoRD: Robust Humanoid Control via History-Conditioned Reinforcement Learning and Online DistillationPuyue Wang, Jiawei Hu, Yan Gao et al.
Humanoid robots can suffer significant performance drops under small changes in dynamics, task specifications, or environment setup. We propose HoRD, a two-stage learning framework for robust humanoid control under domain shift. First, we train a high-performance teacher policy via history-conditioned reinforcement learning, where the policy infers latent dynamics context from recent state--action trajectories to adapt online to diverse randomized dynamics. Second, we perform online distillation to transfer the teacher's robust control capabilities into a transformer-based student policy that operates on sparse root-relative 3D joint keypoint trajectories. By combining history-conditioned adaptation with online distillation, HoRD enables a single policy to adapt zero-shot to unseen domains without per-domain retraining. Extensive experiments show HoRD outperforms strong baselines in robustness and transfer, especially under unseen domains and external perturbations. Code and project page are available at https://tonywang-0517.github.io/hord/.
CVApr 15, 2024Code
Magic Clothing: Controllable Garment-Driven Image SynthesisWeifeng Chen, Tao Gu, Yuhao Xu et al.
We propose Magic Clothing, a latent diffusion model (LDM)-based network architecture for an unexplored garment-driven image synthesis task. Aiming at generating customized characters wearing the target garments with diverse text prompts, the image controllability is the most critical issue, i.e., to preserve the garment details and maintain faithfulness to the text prompts. To this end, we introduce a garment extractor to capture the detailed garment features, and employ self-attention fusion to incorporate them into the pretrained LDMs, ensuring that the garment details remain unchanged on the target character. Then, we leverage the joint classifier-free guidance to balance the control of garment features and text prompts over the generated results. Meanwhile, the proposed garment extractor is a plug-in module applicable to various finetuned LDMs, and it can be combined with other extensions like ControlNet and IP-Adapter to enhance the diversity and controllability of the generated characters. Furthermore, we design Matched-Points-LPIPS (MP-LPIPS), a robust metric for evaluating the consistency of the target image to the source garment. Extensive experiments demonstrate that our Magic Clothing achieves state-of-the-art results under various conditional controls for garment-driven image synthesis. Our source code is available at https://github.com/ShineChen1024/MagicClothing.
LGOct 15, 2023
FLrce: Resource-Efficient Federated Learning with Early-Stopping StrategyZiru Niu, Hai Dong, A. Kai Qin et al.
Federated Learning (FL) achieves great popularity in the Internet of Things (IoT) as a powerful interface to offer intelligent services to customers while maintaining data privacy. Under the orchestration of a server, edge devices (also called clients in FL) collaboratively train a global deep-learning model without sharing any local data. Nevertheless, the unequal training contributions among clients have made FL vulnerable, as clients with heavily biased datasets can easily compromise FL by sending malicious or heavily biased parameter updates. Furthermore, the resource shortage issue of the network also becomes a bottleneck. Due to overwhelming computation overheads generated by training deep-learning models on edge devices, and significant communication overheads for transmitting deep-learning models across the network, enormous amounts of resources are consumed in the FL process. This encompasses computation resources like energy and communication resources like bandwidth. To comprehensively address these challenges, in this paper, we present FLrce, an efficient FL framework with a relationship-based client selection and early-stopping strategy. FLrce accelerates the FL process by selecting clients with more significant effects, enabling the global model to converge to a high accuracy in fewer rounds. FLrce also leverages an early stopping mechanism that terminates FL in advance to save communication and computation resources. Experiment results show that, compared with existing efficient FL frameworks, FLrce improves the computation and communication efficiency by at least 30% and 43% respectively.
82.2CVMay 8
ST-Gen4D: Embedding 4D Spatiotemporal Cognition into World Model for 4D GenerationHaonan Wang, Hanyu Zhou, Tao Gu et al.
Generative models have achieved success in producing apparently coherent 2D videos, but remain challenging in the physical world due to lack of 4D spatiotemporal scale. Typically, existing 4D generative models directly embed macro scale constraints to enhance overall spatiotemporal consistency. However, these methods only ensure global appearance coherence and fail to reveal the local dynamics of the physical world. Our insight is that global appearance structure and local dynamic topology empower 4D spatiotemporal cognition, thereby enabling 4D generation with spatiotemporal regularities. In this work, we propose ST-Gen4D, a 4D generation framework with 4D spatiotemporal cognition-based world model. Our model is guided by four key designs: 1) Spatiotemporal representation. We encode various modalities into multiple representations as a feature basis. 2) Spatiotemporal cognition. We sculpture these representations into global appearance graph and local dynamic graph, and fuse them via semantic-bridged spatiotemporal fusion to obtain a 4D cognition graph. 3) Spatiotemporal reasoning. We utilize a world model to derive future state based on the 4D cognition. 4) Spatiotemporal generation. We leverage the derived cognition as condition to guide latent diffusion for 4D Gaussian generation. By deeply integrating 4D intrinsic cognition with generative priors, our model guarantees the structural rationality and topological consistency of 4D generation. Moreover, we propose ST-4D datasets by aggregating public 4D datasets and self-built subset. Extensive experiments demonstrate the superiority of our ST-Gen4D across 3D and 4D generation tasks.
LGDec 29, 2025
Energy and Memory-Efficient Federated Learning With Ordered Layer FreezingZiru Niu, Hai Dong, A. K. Qin et al.
Federated Learning (FL) has emerged as a privacy-preserving paradigm for training machine learning models across distributed edge devices in the Internet of Things (IoT). By keeping data local and coordinating model training through a central server, FL effectively addresses privacy concerns and reduces communication overhead. However, the limited computational power, memory, and bandwidth of IoT edge devices pose significant challenges to the efficiency and scalability of FL, especially when training deep neural networks. Various FL frameworks have been proposed to reduce computation and communication overheads through dropout or layer freezing. However, these approaches often sacrifice accuracy or neglect memory constraints. To this end, in this work, we introduce Federated Learning with Ordered Layer Freezing (FedOLF). FedOLF consistently freezes layers in a predefined order before training, significantly mitigating computation and memory requirements. To further reduce communication and energy costs, we incorporate Tensor Operation Approximation (TOA), a lightweight alternative to conventional quantization that better preserves model accuracy. Experimental results demonstrate that over non-iid data, FedOLF achieves at least 0.3%, 6.4%, 5.81%, 4.4%, 6.27% and 1.29% higher accuracy than existing works respectively on EMNIST (with CNN), CIFAR-10 (with AlexNet), CIFAR-100 (with ResNet20 and ResNet44), and CINIC-10 (with ResNet20 and ResNet44), along with higher energy efficiency and lower memory footprint.
LGFeb 16
Traceable Latent Variable Discovery Based on Multi-Agent CollaborationHuaming Du, Tao Hu, Yijie Huang et al.
Revealing the underlying causal mechanisms in the real world is crucial for scientific and technological progress. Despite notable advances in recent decades, the lack of high-quality data and the reliance of traditional causal discovery algorithms (TCDA) on the assumption of no latent confounders, as well as their tendency to overlook the precise semantics of latent variables, have long been major obstacles to the broader application of causal discovery. To address this issue, we propose a novel causal modeling framework, TLVD, which integrates the metadata-based reasoning capabilities of large language models (LLMs) with the data-driven modeling capabilities of TCDA for inferring latent variables and their semantics. Specifically, we first employ a data-driven approach to construct a causal graph that incorporates latent variables. Then, we employ multi-LLM collaboration for latent variable inference, modeling this process as a game with incomplete information and seeking its Bayesian Nash Equilibrium (BNE) to infer the possible specific latent variables. Finally, to validate the inferred latent variables across multiple real-world web-based data sources, we leverage LLMs for evidence exploration to ensure traceability. We comprehensively evaluate TLVD on three de-identified real patient datasets provided by a hospital and two benchmark datasets. Extensive experimental results confirm the effectiveness and reliability of TLVD, with average improvements of 32.67% in Acc, 62.21% in CAcc, and 26.72% in ECit across the five datasets.
CVMar 6
Cog2Gen3D: Sculpturing 3D Semantic-Geometric Cognition for 3D GenerationHaonan Wang, Hanyu Zhou, Haoyue Liu et al.
Generative models have achieved success in producing semantically plausible 2D images, but it remains challenging in 3D generation due to the absence of spatial geometry constraints. Typically, existing methods utilize geometric features as conditions to enhance spatial awareness. However, these methods can only model relative relationships and are prone to scale inconsistency of absolute geometry. Thus, we argue that semantic information and absolute geometry empower 3D cognition, thereby enabling controllable 3D generation for the physical world. In this work, we propose Cog2Gen3D, a 3D cognition-guided diffusion framework for 3D generation. Our model is guided by three key designs: 1) Cognitive Feature Embeddings. We encode different modalities into semantic and geometric representations and further extract logical representations. 2) 3D Latent Cognition Graph. We structure different representations into dual-stream semantic-geometric graphs and fuse them via common-based cross-attention to obtain a 3D cognition graph. 3) Cognition-Guided Latent Diffusion. We leverage the fused 3D cognition graph as the condition to guide the latent diffusion process for 3D Gaussian generation. Under this unified framework, the 3D cognition graph ensures the physical plausibility and structural rationality of 3D generation. Moreover, we construct a validation subset based on the Marble World Labs. Extensive experiments demonstrate that our Cog2Gen3D significantly outperforms existing methods in both semantic fidelity and geometric plausibility.
CVJun 28, 2025
XTransfer: Modality-Agnostic Few-Shot Model Transfer for Human Sensing at the EdgeYu Zhang, Xi Zhang, Hualin zhou et al.
Deep learning for human sensing on edge systems presents significant potential for smart applications. However, its training and development are hindered by the limited availability of sensor data and resource constraints of edge systems. While transferring pre-trained models to different sensing applications is promising, existing methods often require extensive sensor data and computational resources, resulting in high costs and poor adaptability in practice. In this paper, we propose XTransfer, a first-of-its-kind method enabling modality-agnostic, few-shot model transfer with resource-efficient design. XTransfer flexibly uses single or multiple pre-trained models and transfers knowledge across different modalities by (i) model repairing that safely mitigates modality shift by adapting pre-trained layers with only few sensor data, and (ii) layer recombining that efficiently searches and recombines layers of interest from source models in a layer-wise manner to create compact models. We benchmark various baselines across diverse human sensing datasets spanning different modalities. Comprehensive results demonstrate that XTransfer achieves state-of-the-art performance while significantly reducing the costs of sensor data collection, model training, and edge deployment.
DCMar 6, 2020
An Ontology-based Context Model in Intelligent EnvironmentsTao Gu, Xiao Hang Wang, Hung Keng Pung et al.
Computing becomes increasingly mobile and pervasive today; these changes imply that applications and services must be aware of and adapt to their changing contexts in highly dynamic environments. Today, building context-aware systems is a complex task due to lack of an appropriate infrastructure support in intelligent environments. A context-aware infrastructure requires an appropriate context model to represent, manipulate and access context information. In this paper, we propose a formal context model based on ontology using OWL to address issues including semantic context representation, context reasoning and knowledge sharing, context classification, context dependency and quality of context. The main benefit of this model is the ability to reason about various contexts. Based on our context model, we also present a Service-Oriented Context-Aware Middleware (SOCAM) architecture for building of context-aware services.
LGFeb 7, 2020
MDLdroid: a ChainSGD-reduce Approach to Mobile Deep Learning for Personal Mobile SensingYu Zhang, Tao Gu, Xi Zhang
Personal mobile sensing is fast permeating our daily lives to enable activity monitoring, healthcare and rehabilitation. Combined with deep learning, these applications have achieved significant success in recent years. Different from conventional cloud-based paradigms, running deep learning on devices offers several advantages including data privacy preservation and low-latency response for both model inference and update. Since data collection is costly in reality, Google's Federated Learning offers not only complete data privacy but also better model robustness based on multiple user data. However, personal mobile sensing applications are mostly user-specific and highly affected by environment. As a result, continuous local changes may seriously affect the performance of a global model generated by Federated Learning. In addition, deploying Federated Learning on a local server, e.g., edge server, may quickly reach the bottleneck due to resource constraint and serious failure by attacks. Towards pushing deep learning on devices, we present MDLdroid, a novel decentralized mobile deep learning framework to enable resource-aware on-device collaborative learning for personal mobile sensing applications. To address resource limitation, we propose a ChainSGD-reduce approach which includes a novel chain-directed Synchronous Stochastic Gradient Descent algorithm to effectively reduce overhead among multiple devices. We also design an agent-based multi-goal reinforcement learning mechanism to balance resources in a fair and efficient manner. Our evaluations show that our model training on off-the-shelf mobile devices achieves 2x to 3.5x faster than single-device training, and 1.5x faster than the master-slave approach.
HCMay 17, 2018
Interpretable Parallel Recurrent Neural Networks with Convolutional Attentions for Multi-Modality Activity ModelingKaixuan Chen, Lina Yao, Xianzhi Wang et al.
Multimodal features play a key role in wearable sensor-based human activity recognition (HAR). Selecting the most salient features adaptively is a promising way to maximize the effectiveness of multimodal sensor data. In this regard, we propose a "collect fully and select wisely" principle as well as an interpretable parallel recurrent model with convolutional attentions to improve the recognition performance. We first collect modality features and the relations between each pair of features to generate activity frames, and then introduce an attention mechanism to select the most prominent regions from activity frames precisely. The selected frames not only maximize the utilization of valid features but also reduce the number of features to be computed effectively. We further analyze the accuracy and interpretability of the proposed model based on extensive experiments. The results show that our model achieves competitive performance on two benchmarked datasets and works well in real life scenarios.
HCNov 21, 2017
Fullie and Wiselie: A Dual-Stream Recurrent Convolutional Attention Model for Activity RecognitionKaixuan Chen, Lina Yao, Tao Gu et al.
Multimodal features play a key role in wearable sensor based Human Activity Recognition (HAR). Selecting the most salient features adaptively is a promising way to maximize the effectiveness of multimodal sensor data. In this regard, we propose a "collect fully and select wisely (Fullie and Wiselie)" principle as well as a dual-stream recurrent convolutional attention model, Recurrent Attention and Activity Frame (RAAF), to improve the recognition performance. We first collect modality features and the relations between each pair of features to generate activity frames, and then introduce an attention mechanism to select the most prominent regions from activity frames precisely. The selected frames not only maximize the utilization of valid features but also reduce the number of features to be computed effectively. We further analyze the hyper-parameters, accuracy, interpretability, and annotation dependency of the proposed model based on extensive experiments. The results show that RAAF achieves competitive performance on two benchmarked datasets and works well in real life scenarios.
HCNov 16, 2017
MindID: Person Identification from Brain Waves through Attention-based Recurrent Neural NetworkXiang Zhang, Lina Yao, Salil S. Kanhere et al.
Person identification technology recognizes individuals by exploiting their unique, measurable physiological and behavioral characteristics. However, the state-of-the-art person identification systems have been shown to be vulnerable, e.g., contact lenses can trick iris recognition and fingerprint films can deceive fingerprint sensors. EEG (Electroencephalography)-based identification, which utilizes the users' brainwave signals for identification and offers a more resilient solution, draw a lot of attention recently. However, the accuracy still requires improvement and very little work is focusing on the robustness and adaptability of the identification system. We propose MindID, an EEG-based biometric identification approach, achieves higher accuracy and better characteristics. At first, the EEG data patterns are analyzed and the results show that the Delta pattern contains the most distinctive information for user identification. Then the decomposed Delta pattern is fed into an attention-based Encoder-Decoder RNNs (Recurrent Neural Networks) structure which assigns various attention weights to different EEG channels based on the importance of channels. The discriminative representations learned from the attention-based RNN are used to recognize the user identification through a boosting classifier. The proposed approach is evaluated over 3 datasets (two local and one public). One local dataset (EID-M) is used for performance assessment and the result illustrates that our model achieves an accuracy of 0.982 which outperforms the baselines and the state-of-the-art. Another local dataset (EID-S) and a public dataset (EEG-S) are utilized to demonstrate the robustness and adaptability, respectively. The results indicate that the proposed approach has the potential to be largely deployed in the practice environment.
HCSep 26, 2017
Multi-Person Brain Activity Recognition via Comprehensive EEG Signal AnalysisXiang Zhang, Lina Yao, Dalin Zhang et al.
An electroencephalography (EEG) based brain activity recognition is a fundamental field of study for a number of significant applications such as intention prediction, appliance control, and neurological disease diagnosis in smart home and smart healthcare domains. Existing techniques mostly focus on binary brain activity recognition for a single person, which limits their deployment in wider and complex practical scenarios. Therefore, multi-person and multi-class brain activity recognition has obtained popularity recently. Another challenge faced by brain activity recognition is the low recognition accuracy due to the massive noises and the low signal-to-noise ratio in EEG signals. Moreover, the feature engineering in EEG processing is time-consuming and highly re- lies on the expert experience. In this paper, we attempt to solve the above challenges by proposing an approach which has better EEG interpretation ability via raw Electroencephalography (EEG) signal analysis for multi-person and multi-class brain activity recognition. Specifically, we analyze inter-class and inter-person EEG signal characteristics, based on which to capture the discrepancy of inter-class EEG data. Then, we adopt an Autoencoder layer to automatically refine the raw EEG signals by eliminating various artifacts. We evaluate our approach on both a public and a local EEG datasets and conduct extensive experiments to explore the effect of several factors (such as normalization methods, training data size, and Autoencoder hidden neuron size) on the recognition results. The experimental results show that our approach achieves a high accuracy comparing to competitive state-of-the-art methods, indicating its potential in promoting future research on multi-person EEG recognition.
HCSep 26, 2017
Converting Your Thoughts to Texts: Enabling Brain Typing via Deep Feature Learning of EEG SignalsXiang Zhang, Lina Yao, Quan Z. Sheng et al.
An electroencephalography (EEG) based Brain Computer Interface (BCI) enables people to communicate with the outside world by interpreting the EEG signals of their brains to interact with devices such as wheelchairs and intelligent robots. More specifically, motor imagery EEG (MI-EEG), which reflects a subjects active intent, is attracting increasing attention for a variety of BCI applications. Accurate classification of MI-EEG signals while essential for effective operation of BCI systems, is challenging due to the significant noise inherent in the signals and the lack of informative correlation between the signals and brain activities. In this paper, we propose a novel deep neural network based learning framework that affords perceptive insights into the relationship between the MI-EEG data and brain activities. We design a joint convolutional recurrent neural network that simultaneously learns robust high-level feature presentations through low-dimensional dense embeddings from raw MI-EEG signals. We also employ an Autoencoder layer to eliminate various artifacts such as background activities. The proposed approach has been evaluated extensively on a large- scale public MI-EEG dataset and a limited but easy-to-deploy dataset collected in our lab. The results show that our approach outperforms a series of baselines and the competitive state-of-the- art methods, yielding a classification accuracy of 95.53%. The applicability of our proposed approach is further demonstrated with a practical BCI system for typing.
LGJun 6, 2017
DeepKey: An EEG and Gait Based Dual-Authentication SystemXiang Zhang, Lina Yao, Chaoran Huang et al.
Biometric authentication involves various technologies to identify individuals by exploiting their unique, measurable physiological and behavioral characteristics. However, traditional biometric authentication systems (e.g., face recognition, iris, retina, voice, and fingerprint) are facing an increasing risk of being tricked by biometric tools such as anti-surveillance masks, contact lenses, vocoder, or fingerprint films. In this paper, we design a multimodal biometric authentication system named Deepkey, which uses both Electroencephalography (EEG) and gait signals to better protect against such risk. Deepkey consists of two key components: an Invalid ID Filter Model to block unauthorized subjects and an identification model based on attention-based Recurrent Neural Network (RNN) to identify a subject`s EEG IDs and gait IDs in parallel. The subject can only be granted access while all the components produce consistent evidence to match the user`s proclaimed identity. We implement Deepkey with a live deployment in our university and conduct extensive empirical experiments to study its technical feasibility in practice. DeepKey achieves the False Acceptance Rate (FAR) and the False Rejection Rate (FRR) of 0 and 1.0%, respectively. The preliminary results demonstrate that Deepkey is feasible, show consistent superior performance compared to a set of methods, and has the potential to be applied to the authentication deployment in real world settings.
AIApr 29, 2016
"Knowing value" logic as a normal modal logicTao Gu, Yanjing Wang
Recent years witness a growing interest in nonstandard epistemic logics of "knowing whether", "knowing what", "knowing how", and so on. These logics are usually not normal, i.e., the standard axioms and reasoning rules for modal logic may be invalid. In this paper, we show that the conditional "knowing value" logic proposed by Wang and Fan \cite{WF13} can be viewed as a disguised normal modal logic by treating the negation of the Kv operator as a special diamond. Under this perspective, it turns out that the original first-order Kripke semantics can be greatly simplified by introducing a ternary relation $R_i^c$ in standard Kripke models, which associates one world with two $i$-accessible worlds that do not agree on the value of constant $c$. Under intuitive constraints, the modal logic based on such Kripke models is exactly the one studied by Wang and Fan (2013,2014}. Moreover, there is a very natural binary generalization of the "knowing value" diamond, which, surprisingly, does not increase the expressive power of the logic. The resulting logic with the binary diamond has a transparent normal modal system, which sharpens our understanding of the "knowing value" logic and simplifies some previously hard problems.