Yinghao Wang

CV
h-index: 1
3 papers
9 citations
Novelty: 45%
AI Score: 39

3 Papers

LG Mar 20, 2023
Optimized preprocessing and Tiny ML for Attention State Classification

Yinghao Wang, Rémi Nahon, Enzo Tartaglione et al.

In this paper, we present a new approach to mental state classification from EEG signals that combines signal processing techniques with machine learning (ML) algorithms. We evaluate the proposed method on a dataset of EEG recordings collected during a cognitive load task and compare it with other state-of-the-art methods. The results show that the proposed method classifies mental states with high accuracy and outperforms state-of-the-art methods in both classification accuracy and computational efficiency.

CV May 27, 2025 Code
Occlusion Boundary and Depth: Mutual Enhancement via Multi-Task Learning

Lintao Xu, Yinghao Wang, Chaohui Wang

Occlusion Boundary Estimation (OBE) identifies boundaries arising from both inter-object occlusions and self-occlusion within individual objects, distinguishing them from ordinary edges and semantic contours to support more accurate scene understanding. This task is closely related to Monocular Depth Estimation (MDE), which infers depth from a single image: Occlusion Boundaries (OBs) provide critical geometric cues for resolving depth ambiguities, while depth can in turn refine occlusion reasoning. In this paper, we propose MoDOT, a novel method that jointly estimates depth and OBs from a single image for the first time. MoDOT incorporates a new module, CASM, which combines cross-attention and multi-scale strip convolutions to leverage mid-level OB features for improved depth prediction. It also includes an occlusion-aware loss, OBDCL, which encourages more accurate boundaries in the predicted depth map. Extensive experiments demonstrate the mutual benefits of jointly estimating depth and OBs, and validate the effectiveness of MoDOT's design. Our method achieves state-of-the-art (SOTA) performance on two synthetic datasets and the widely used NYUD-v2 real-world dataset, significantly outperforming multi-task baselines. Furthermore, when trained solely on our synthetic dataset, MoDOT yields promising cross-domain results on real-world depth prediction, preserving sharp OBs in the predicted depth maps and demonstrating improved geometric fidelity compared to competitors. We will release our code, pre-trained models, and dataset at [link].

82.4 HC Mar 27
The Observability Gap: Why Output-Level Human Feedback Fails for LLM Coding Agents

Yinghao Wang, Cheng Wang

Large language model (LLM) multi-agent coding systems typically fix agent capabilities at design time. We study an alternative setting, earned autonomy, in which a coding agent starts with zero pre-defined functions and incrementally builds a reusable function library through lightweight human feedback on visual output alone. We evaluate this setup in a Blender-based 3D scene generation task requiring both spatial reasoning and programmatic geometric control. Although the agent rediscovered core utility functions comparable to a human reference implementation, it achieved 0% full-scene success under output-only feedback across multiple instruction granularities, where success required satisfying object completeness, ground contact, collision avoidance, and scale plausibility simultaneously. Our analysis identifies a structural observability gap: bugs originate in code logic and execution state, while human evaluation occurs only at the output layer, and the many-to-one mapping from internal states to visible outcomes prevents symptom-level feedback from reliably identifying root causes. This mismatch leads to persistent failure mode oscillation rather than convergence. A diagnostic intervention that injected minimal code-level knowledge restored convergence, strongly supporting the interpretation that the main bottleneck lies in feedback observability rather than programming competence. We formalize this phenomenon as a feedback paradox in domains with deep causal chains between internal code logic and perceptual outcomes, and argue that effective human-agent collaboration in such settings requires intermediate observability beyond output-only evaluation.