Bincheng Wang

CV
h-index61
5papers
141citations
Novelty43%
AI Score40

5 Papers

AIMar 17, 2025
The Amazon Nova Family of Models: Technical Report and Model Card

Amazon AGI, Aaron Langford, Aayush Shah et al. · amazon-science

We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly-capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents and text. Amazon Nova Micro is a text-only model that delivers our lowest-latency responses at very low cost. Amazon Nova Canvas is an image generation model that creates professional grade images with rich customization controls. Amazon Nova Reel is a video generation model offering high-quality outputs, customization, and motion control. Our models were built responsibly and with a commitment to customer trust, security, and reliability. We report benchmarking results for core capabilities, agentic performance, long context, functional adaptation, runtime performance, and human evaluation.

94.4CVMar 19Code
AndroTMem: From Interaction Trajectories to Anchored Memory in Long-Horizon GUI Agents

Yibo Shi, Jungang Li, Linghao Zhang et al.

Long-horizon GUI agents are a key step toward real-world deployment, yet effective interaction memory under prevailing paradigms remains under-explored. Replaying full interaction sequences is redundant and amplifies noise, while summaries often erase dependency-critical information and traceability. We present AndroTMem, a diagnostic framework for anchored memory in long-horizon Android GUI agents. Its core benchmark, AndroTMem-Bench, comprises 1,069 tasks with 34,473 interaction steps (avg. 32.1 per task, max. 65). We evaluate agents with TCR (Task Complete Rate), focusing on tasks whose completion requires carrying forward critical intermediate state; AndroTMem-Bench is designed to enforce strong step-to-step causal dependencies, making sparse yet essential intermediate states decisive for downstream actions and centering interaction memory in evaluation. Across open- and closed-source GUI agents, we observe a consistent pattern: as interaction sequences grow longer, performance drops are driven mainly by within-task memory failures, not isolated perception errors or local action mistakes. Guided by this diagnosis, we propose Anchored State Memory (ASM), which represents interaction sequences as a compact set of causally linked intermediate-state anchors to enable subgoal-targeted retrieval and attribution-aware decision making. Across multiple settings and 12 evaluated GUI agents, ASM consistently outperforms full-sequence replay and summary-based baselines, improving TCR by 5%-30.16% and AMS by 4.93%-24.66%, indicating that anchored, structured memory effectively mitigates the interaction-memory bottleneck in long-horizon GUI tasks. The code, benchmark, and related resources are publicly available at [https://github.com/CVC2233/AndroTMem](https://github.com/CVC2233/AndroTMem).

CVJul 19, 2023
Hierarchical Spatio-Temporal Representation Learning for Gait Recognition

Lei Wang, Bo Liu, Fangfang Liang et al.

Gait recognition is a biometric technique that identifies individuals by their unique walking styles, which is suitable for unconstrained environments and has a wide range of applications. While current methods focus on exploiting body part-based representations, they often neglect the hierarchical dependencies between local motion patterns. In this paper, we propose a hierarchical spatio-temporal representation learning (HSTL) framework for extracting gait features from coarse to fine. Our framework starts with a hierarchical clustering analysis to recover multi-level body structures from the whole body to local details. Next, an adaptive region-based motion extractor (ARME) is designed to learn region-independent motion features. The proposed HSTL then stacks multiple ARMEs in a top-down manner, with each ARME corresponding to a specific partition level of the hierarchy. An adaptive spatio-temporal pooling (ASTP) module is used to capture gait features at different levels of detail to perform hierarchical feature mapping. Finally, a frame-level temporal aggregation (FTA) module is employed to reduce redundant information in gait sequences through multi-scale temporal downsampling. Extensive experiments on CASIA-B, OUMVLP, GREW, and Gait3D datasets demonstrate that our method outperforms the state-of-the-art while maintaining a reasonable balance between model accuracy and complexity.

CVSep 18, 2022
GaitMM: Multi-Granularity Motion Sequence Learning for Gait Recognition

Lei Wang, Bo Liu, Bincheng Wang et al.

Gait recognition aims to identify individual-specific walking patterns by observing the different periodic movements of each body part. However, most existing methods treat each part equally and fail to account for the data redundancy caused by the different step frequencies and sampling rates of gait sequences. In this study, we propose a multi-granularity motion representation network (GaitMM) for gait sequence learning. In GaitMM, we design a combined full-body and fine-grained sequence learning module (FFSL) to explore part-independent spatio-temporal representations. Moreover, we utilize a frame-wise compression strategy, referred to as multi-scale motion aggregation (MSMA), to capture discriminative information in the gait sequence. Experiments on two public datasets, CASIA-B and OUMVLP, show that our approach reaches state-of-the-art performances.

CRJan 1, 2021
PHOENIX: Device-Centric Cellular Network Protocol Monitoring using Runtime Verification

Mitziu Echeverria, Zeeshan Ahmed, Bincheng Wang et al.

End-user-devices in the current cellular ecosystem are prone to many different vulnerabilities across different generations and protocol layers. Fixing these vulnerabilities retrospectively can be expensive, challenging, or just infeasible. A pragmatic approach for dealing with such a diverse set of vulnerabilities would be to identify attack attempts at runtime on the device side, and thwart them with mitigating and corrective actions. Towards this goal, in the paper we propose a general and extendable approach called Phoenix for identifying n-day cellular network control-plane vulnerabilities as well as dangerous practices of network operators from the device vantage point. Phoenix monitors the device-side cellular network traffic for performing signature-based unexpected behavior detection through lightweight runtime verification techniques. Signatures in Phoenix can be manually-crafted by a cellular network security expert or can be automatically synthesized using an optional component of Phoenix, which reduces the signature synthesis problem to the language learning from the informant problem. Based on the corrective actions that are available to Phoenix when an undesired behavior is detected, different instantiations of Phoenix are possible: a full-fledged defense when deployed inside a baseband processor; a user warning system when deployed as a mobile application; a probe for identifying attacks in the wild. One such instantiation of Phoenix was able to identify all 15 representative n-day vulnerabilities and unsafe practices of 4G LTE networks considered in our evaluation with a high packet processing speed (~68000 packets/second) while inducing only a moderate amount of energy overhead (~4mW).