CV · Apr 30 · Code
FineState-Bench: Benchmarking State-Conditioned Grounding for Fine-grained GUI State Setting
Fengxian Ji, Jingpu Yang, Zirui Song et al.
Despite the rapid progress of large vision-language models (LVLMs), fine-grained, state-conditioned GUI interaction remains challenging. Current evaluations offer limited coverage, imprecise target-state definitions, and an overreliance on final-task success, obscuring where and why agents fail. To address this gap, we introduce FineState-Bench, a benchmark that evaluates whether an agent can correctly ground an instruction to the intended UI control and reach the exact target state. FineState-Bench comprises 2,209 instances across desktop, web, and mobile platforms, spanning four interaction families and 23 UI component types, with each instance explicitly specifying an exact target state for fine-grained state setting. We further propose FineState-Metrics, a four-stage diagnostic pipeline with stage-wise success rates: Localization Success Rate (SR@Loc), Interaction Success Rate (SR@Int), Exact State Success Rate at Locate (ES-SR@Loc), and Exact State Success Rate at Interact (ES-SR@Int). We also propose a plug-and-play Visual Diagnostic Assistant (VDA) that generates a Description and a bounding-box Localization Hint, so that visual-grounding failures can be diagnosed through controlled comparisons with and without the assistant. On FineState-Bench, exact goal-state success remains low: ES-SR@Int peaks at 32.8% on Web and 22.8% on average across platforms. With VDA localization hints, Gemini-2.5-Flash gains +14.9 ES-SR@Int points, suggesting substantial headroom from improved visual grounding, yet overall accuracy is still insufficient for reliable fine-grained state-conditioned interaction. GitHub: https://github.com/FengxianJi/FineState-Bench
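As a rough illustration of how stage-wise success rates like the four named above could be aggregated, the sketch below averages per-instance pass/fail flags. The abstract does not define how each stage is judged, so the `InstanceResult` record, its field names, and the `stage_rates` helper are hypothetical and are not taken from the FineState-Bench code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class InstanceResult:
    """Hypothetical per-instance record; field names are assumptions."""
    loc_ok: bool     # counted toward SR@Loc
    int_ok: bool     # counted toward SR@Int
    es_loc_ok: bool  # counted toward ES-SR@Loc
    es_int_ok: bool  # counted toward ES-SR@Int


def stage_rates(results: List[InstanceResult]) -> Dict[str, float]:
    """Return each stage's success rate as a percentage of all instances."""
    n = len(results)
    if n == 0:
        raise ValueError("need at least one instance")
    return {
        "SR@Loc": 100 * sum(r.loc_ok for r in results) / n,
        "SR@Int": 100 * sum(r.int_ok for r in results) / n,
        "ES-SR@Loc": 100 * sum(r.es_loc_ok for r in results) / n,
        "ES-SR@Int": 100 * sum(r.es_int_ok for r in results) / n,
    }


# Toy data only; these are not benchmark results.
toy = [
    InstanceResult(True, True, True, True),
    InstanceResult(True, True, False, False),
    InstanceResult(True, False, False, False),
    InstanceResult(False, False, False, False),
]
rates = stage_rates(toy)
```

Reporting all four rates side by side, rather than only the last one, is what lets a reader see at which stage instances drop out.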
CV · Aug 12, 2025 · Code
FineState-Bench: A Comprehensive Benchmark for Fine-Grained State Control in GUI Agents
Fengxian Ji, Jingpu Yang, Zirui Song et al.
With the rapid advancement of generative artificial intelligence technology, Graphical User Interface (GUI) agents have demonstrated tremendous potential for autonomously managing daily tasks through natural language instructions. However, current evaluation frameworks for GUI agents suffer from a fundamental flaw: existing benchmarks focus heavily on coarse-grained task completion while neglecting the fine-grained control capabilities crucial for real-world applications. To address this, we introduce FineState-Bench, the first evaluation and diagnostic standard for fine-grained GUI agent operations, designed to quantify fine-grained control. This multi-platform (desktop, web, mobile) framework includes 2,257 task benchmarks in four components and uses a four-phase indicator for comprehensive perception-to-control assessment. To analyze perception and positioning for refined operations, we developed the plug-and-play Visual Diagnostic Assistant (VDA), enabling the first quantitative decoupling analysis of these capabilities. Experimental results on our benchmark show that the most advanced models achieve only 32.8% fine-grained interaction accuracy. In controlled experiments that use the VDA to quantify the impact of visual capabilities, we show that ideal visual localization boosts Gemini-2.5-Flash's success rate by 14.9%. Our diagnostic framework confirms for the first time that the primary bottleneck for current GUI agents is basic visual positioning capability. All resources are fully open-source. GitHub: https://github.com/AnonymousThewarehouse/FineState-Bench Hugging Face: https://huggingface.co/datasets/Willtime2006/Static-FineBench
SP · Oct 29, 2024
Leveraging Recurrent Neural Networks for Predicting Motor Movements from Primate Motor Cortex Neural Recordings
Yuanxi Wang, Zuowen Wang, Shih-Chii Liu
This paper presents an efficient deep learning solution for decoding motor movements from neural recordings in non-human primates. An Autoencoder Gated Recurrent Unit (AEGRU) model was adopted as the model architecture for this task. The autoencoder is used only during the training stage to achieve better generalization. Together with the preprocessing techniques, our model achieved an $R^2$ score of 0.71, surpassing the baseline models in NeuroBench, and ranked first for $R^2$ in the IEEE BioCAS 2024 Grand Challenge on Neural Decoding. Model pruning is also applied, reducing the multiply-accumulate (MAC) operations by 41.4% with little change in the $R^2$ score compared to the unpruned model.
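For readers unfamiliar with the $R^2$ score quoted above, the sketch below computes the standard coefficient of determination, $R^2 = 1 - SS_{res}/SS_{tot}$, for a single output dimension. The function name `r2_score_1d` and the toy values are illustrative assumptions; how the challenge aggregates the score across output dimensions or recording sessions is not stated in the abstract and is not modeled here.

```python
from typing import Sequence


def r2_score_1d(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination for one output dimension."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Toy values only; these are not data from the paper.
target = [1.0, 2.0, 3.0, 4.0]
perfect = r2_score_1d(target, target)
mean_only = r2_score_1d(target, [2.5, 2.5, 2.5, 2.5])
partial = r2_score_1d(target, [1.5, 2.5, 2.5, 3.5])
```

A score of 1 means perfect prediction, while a score of 0 means the decoder does no better than always predicting the mean of the target.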