Bingzhang Hu

h-index: 5

6 papers

34 citations

Novelty: 53%

AI Score: 41

Ranked #68,524 of 194,257 authors (top 35%); #23,335 in CV (top 39%)

6 Papers

1.8 · LG · Nov 2, 2022

Knowing the Past to Predict the Future: Reinforcement Virtual Learning

Peng Zhang, Yawen Huang, Bingzhang Hu et al. · tencent-ai

Reinforcement Learning (RL)-based control systems have received considerable attention in recent decades. However, in many real-world problems, such as Batch Process Control, the environment is uncertain, and acquiring state and reward values requires expensive interaction. In this paper, we present a cost-efficient framework in which the RL model can evolve by itself in a Virtual Space, using predictive models built only from historical data. The proposed framework enables a step-by-step RL model to predict future states and select optimal actions for long-sighted decisions. The work focuses on two questions: 1) how to balance long-sighted and short-sighted rewards with an optimal strategy; and 2) how to make the virtual model interact with the real environment so that training converges to a final learned policy. In experiments on a Fed-Batch Process, our method consistently outperforms existing state-of-the-art methods.
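The idea of choosing actions by looking ahead in a model fitted only to historical data can be sketched as follows. This is a generic model-based lookahead, not the paper's framework: the tabular model, the exhaustive search, and the names `fit_tabular_model` and `plan` are hypothetical, and the `horizon` and discount `gamma` parameters merely stand in for the long-sight versus short-sight trade-off mentioned in the abstract.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, List, Optional, Tuple

State = Hashable
Action = Hashable
Model = Tuple[Dict[Tuple[State, Action], float], Dict[Tuple[State, Action], State]]


def fit_tabular_model(history: Iterable[Tuple[State, Action, float, State]]) -> Model:
    """Fit a hypothetical tabular 'virtual space' from logged (s, a, r, s') tuples."""
    rewards: Dict[Tuple[State, Action], List[float]] = defaultdict(list)
    successors: Dict[Tuple[State, Action], Counter] = defaultdict(Counter)
    for s, a, r, s_next in history:
        rewards[(s, a)].append(r)
        successors[(s, a)][s_next] += 1
    # Predicted reward = mean observed reward; predicted next state = most frequent successor.
    mean_reward = {k: sum(v) / len(v) for k, v in rewards.items()}
    next_state = {k: c.most_common(1)[0][0] for k, c in successors.items()}
    return mean_reward, next_state


def plan(state: State, actions: List[Action], model: Model,
         horizon: int, gamma: float) -> Tuple[Optional[Action], float]:
    """Exhaustive lookahead in the learned model; returns (best action, discounted value)."""
    mean_reward, next_state = model
    if horizon == 0:
        return None, 0.0
    best_action: Optional[Action] = None
    best_value = float("-inf")
    for a in actions:
        if (state, a) not in mean_reward:
            continue  # action never seen in this state in the historical data
        _, future = plan(next_state[(state, a)], actions, model, horizon - 1, gamma)
        value = mean_reward[(state, a)] + gamma * future
        if value > best_value:
            best_action, best_value = a, value
    if best_action is None:
        return None, 0.0  # no logged actions: treat the state as terminal
    return best_action, best_value


history = [
    ("s0", "greedy", 1.0, "bad"),
    ("s0", "patient", 0.0, "good"),
    ("bad", "greedy", 0.0, "bad"),
    ("good", "greedy", 5.0, "good"),
]
model = fit_tabular_model(history)
print(plan("s0", ["greedy", "patient"], model, horizon=1, gamma=0.9))  # ('greedy', 1.0)
print(plan("s0", ["greedy", "patient"], model, horizon=2, gamma=0.9))  # ('patient', 4.5)
```

With a horizon of 1 the planner picks the action with the larger immediate reward; a longer horizon lets it favor an action whose payoff only appears later.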

7.6 · CV · Jul 20, 2019 · Code

Order Matters: Shuffling Sequence Generation for Video Prediction

Junyan Wang, Bingzhang Hu, Yang Long et al.

Predicting future frames in natural video sequences is a new challenge that is receiving increasing attention in the computer vision community. However, existing models suffer from a severe loss of temporal information when the predicted sequence is long. Whereas previous methods focus on generating more realistic content, this paper extensively studies the importance of sequential order information for video generation. We propose a novel Shuffling sEquence gEneration network (SEE-Net) that learns to discriminate unnatural sequential orders by shuffling the video frames and comparing them to the real video sequence. Systematic experiments on three datasets, with both synthetic and real-world videos, demonstrate the effectiveness of shuffling sequence generation for video prediction and show state-of-the-art performance in both qualitative and quantitative evaluations. The source code is available at https://github.com/andrewjywang/SEENet.
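The shuffling step can be illustrated with a small sketch that builds one natural-order sequence and one shuffled copy, labeled as targets for an order discriminator. The function names and the 1/0 labeling convention are assumptions for illustration only; SEE-Net's actual sampling scheme and network are in the linked repository.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def shuffled_copy(frames: Sequence[T], rng: random.Random) -> List[T]:
    """Return the frames in a random order guaranteed to differ from the original order."""
    n = len(frames)
    if n < 2:
        raise ValueError("need at least two frames to shuffle")
    identity = list(range(n))
    order = list(identity)
    while order == identity:  # reject the identity permutation
        rng.shuffle(order)
    return [frames[i] for i in order]


def order_pairs(frames: Sequence[T], rng: random.Random) -> List[Tuple[List[T], int]]:
    """Hypothetical discriminator targets: 1 = natural order, 0 = shuffled order."""
    return [(list(frames), 1), (shuffled_copy(frames, rng), 0)]


clip = ["frame0", "frame1", "frame2", "frame3"]
for seq, label in order_pairs(clip, random.Random(0)):
    print(label, seq)
```

Rejecting the identity permutation ensures every negative example really is out of order, so the discriminator never sees a "shuffled" sequence labeled 0 that is actually natural.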

9.0 · CV · Mar 13

Mastering Negation: Boosting Grounding Models via Grouped Opposition-Based Learning

Zesheng Yang, Xi Jiang, Bingzhang Hu et al.

Current vision-language detection and grounding models predominantly focus on prompts with positive semantics and often struggle to accurately interpret and ground complex expressions containing negative semantics. A key reason for this limitation is the lack of high-quality training data that explicitly captures discriminative negative samples and negation-aware language descriptions. To address this challenge, we introduce D-Negation, a new dataset that provides objects annotated with both positive and negative semantic descriptions. Building upon the observation that negation reasoning frequently appears in natural language, we further propose a grouped opposition-based learning framework that learns negation-aware representations from limited samples. Specifically, our method organizes opposing semantic descriptions from D-Negation into structured groups and formulates two complementary loss functions that encourage the model to reason about negation and semantic qualifiers. We integrate the proposed dataset and learning strategy into a state-of-the-art language-based grounding model. By fine-tuning fewer than 10 percent of the model parameters, our approach achieves improvements of up to 4.4 mAP and 5.7 mAP on positive and negative semantic evaluations, respectively. These results demonstrate that explicitly modeling negation semantics can substantially enhance the robustness and localization accuracy of vision-language grounding models.
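One way to picture a grouped opposition objective is a margin ranking loss over groups of opposing descriptions: for each object, descriptions that match it should score higher than their opposites (for example, "a person with glasses" versus "a person without glasses"). This is a swapped-in hinge loss for illustration, not the paper's two loss functions; the name `grouped_opposition_loss`, the group layout, and the `margin` value are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple

# Each group: (scores of descriptions that match the object,
#              scores of opposing descriptions that do not match it).
Group = Tuple[Sequence[float], Sequence[float]]


def grouped_opposition_loss(groups: List[Group], margin: float = 0.2) -> float:
    """Hypothetical hinge loss: each matching score should beat each opposing score by `margin`."""
    terms = []
    for matching, opposing in groups:
        for s_match, s_oppose in product(matching, opposing):
            terms.append(max(0.0, margin - (s_match - s_oppose)))
    return sum(terms) / len(terms) if terms else 0.0


# "a person with glasses" (matches) vs "a person without glasses" (does not match)
well_separated = [([0.9], [0.1])]
confused = [([0.5], [0.45])]
print(grouped_opposition_loss(well_separated))  # 0.0
print(grouped_opposition_loss(confused))
```

Pairs that are already separated by more than the margin contribute nothing, so the loss concentrates on groups where the model confuses a description with its negation.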

0.9 · CV · Aug 7, 2019

Dual-reference Age Synthesis

Yuan Zhou, Bingzhang Hu, Jun He et al.

Age synthesis methods typically take a single image as input and use a specific number to control the age of the generated image. In this paper, we propose a novel framework, named dual-reference age synthesis (DRAS), that takes two images as inputs and approaches the task differently: instead of using "hard" age information, i.e., a fixed number, our model determines the target age in a "soft" way by employing a second reference image. Specifically, the proposed framework consists of an identity agent, an age agent, and a generative adversarial network. It takes two images as input (an identity reference and an age reference) and outputs a new image that shares the corresponding features with each. Experimental results on two benchmark datasets (UTKFace and CACD) demonstrate the appealing performance and flexibility of the proposed framework.

5.2 · CV · Nov 26, 2018

Robust Cross-View Gait Recognition with Evidence: A Discriminant Gait GAN (DiGGAN) Approach

BingZhang Hu, Yu Guan, Yan Gao et al.

Over the last few decades, gait as a biometric trait has attracted much attention in security and privacy applications such as identity recognition and authentication. Because it is a long-distance biometric trait, gait can be easily collected and used to identify individuals non-intrusively through CCTV cameras. However, developing robust automated gait recognition systems is very difficult, since gait may be affected by many covariate factors such as clothing, walking speed, and camera view angle. Of these, large changes in view angle are considered the most challenging factor, as they can substantially alter the overall gait appearance. Because of such view changes, existing gait recognition methods still fall well short of satisfactory performance. Furthermore, very few works have considered evidence, i.e., demonstrable information revealing the reliability of decisions, which is an important requirement in machine learning-based recognition and authentication applications. To address these issues, we propose a Discriminant Gait Generative Adversarial Network (DiGGAN), which can effectively extract view-invariant features for cross-view gait recognition and, more importantly, transfer gait images to different views, serving as evidence that shows how decisions have been made. Quantitative experiments on the two most popular cross-view gait datasets, OU-MVLP and CASIA-B, show that DiGGAN outperforms state-of-the-art methods. A qualitative analysis further demonstrates DiGGAN's capability to provide evidence.

1.7 · CV · Jun 2, 2017

Dual-reference Face Retrieval

BingZhang Hu, Feng Zheng, Ling Shao

Face retrieval has received much attention over the past few decades, and many efforts have been made to retrieve face images robustly under pose, illumination, and expression variations. However, conventional works fail to meet the requirements of a novel task: retrieving a person's face image at a specific age, especially when that age is not given as a numeral, i.e., 'retrieving someone's image at a similar age to that shown in another person's image'. To tackle this problem, we propose a dual-reference face retrieval framework in which the system takes two inputs: an identity reference image that indicates the target identity and an age reference image that reflects the target age. In our framework, the raw images are first projected onto a joint manifold that preserves both age and identity locality. Two similarity metrics, one for age and one for identity, are then optimized using our proposed quartet-based model. Experiments show promising results, with the framework outperforming hierarchical methods.
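The dual-reference query itself can be illustrated by ranking gallery images on a weighted mix of identity similarity to the identity reference and age similarity to the age reference. This is not the paper's joint-manifold and quartet-based method; the precomputed embeddings, cosine similarity, the weight `w_id`, and the name `dual_reference_rank` are all assumptions made for this sketch.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def dual_reference_rank(gallery: Dict[str, Tuple[Sequence[float], Sequence[float]]],
                        identity_ref: Sequence[float], age_ref: Sequence[float],
                        w_id: float = 0.5) -> List[str]:
    """Rank gallery images by w_id * identity similarity + (1 - w_id) * age similarity."""
    scores = {
        name: w_id * cosine(id_emb, identity_ref) + (1.0 - w_id) * cosine(age_emb, age_ref)
        for name, (id_emb, age_emb) in gallery.items()
    }
    return sorted(scores, key=scores.__getitem__, reverse=True)


# Toy 2-D embeddings: (identity embedding, age embedding) per gallery image.
gallery = {
    "personA_young": ([1.0, 0.0], [1.0, 0.0]),
    "personA_old": ([1.0, 0.0], [0.0, 1.0]),
    "personB_young": ([0.0, 1.0], [1.0, 0.0]),
}
print(dual_reference_rank(gallery, identity_ref=[1.0, 0.0], age_ref=[1.0, 0.0], w_id=0.7))
# ['personA_young', 'personA_old', 'personB_young']
```

Lowering `w_id` shifts the ranking toward age similarity; with `w_id=0.2`, personB_young moves above personA_old.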