Yiyuan Wang

8 papers

110 citations

Novelty 49%

AI Score 53

Ranked #29,464 of 205,806 authors (top 14%); #12,015 in CV (top 20%)

8 Papers

LG Nov 24, 2022 Code

Learning with Partial Labels from Semi-supervised Perspective

Ximing Li, Yuanzhi Jiang, Changchun Li et al.

Partial Label (PL) learning refers to the task of learning from partially labeled data, where each training instance is ambiguously equipped with a set of candidate labels, only one of which is valid. Advances in the recent deep PL learning literature have shown that deep learning paradigms, e.g., self-training, contrastive learning, or class activation values, can achieve promising performance. Inspired by the impressive success of deep Semi-Supervised (SS) learning, we transform the PL learning problem into an SS learning problem and propose a novel PL learning method, namely Partial Label learning with Semi-supervised Perspective (PLSP). Specifically, we first form the pseudo-labeled dataset by selecting a small number of reliable pseudo-labeled instances with high-confidence prediction scores and treating the remaining instances as pseudo-unlabeled ones. Then we design an SS learning objective, consisting of a supervised loss for pseudo-labeled instances and a semantic consistency regularization for pseudo-unlabeled instances. We further introduce a complementary regularization for the non-candidate labels to constrain the model predictions on them to be as small as possible. Empirical results demonstrate that PLSP significantly outperforms the existing PL baseline methods, especially at high ambiguity levels. Code available: https://github.com/changchunli/PLSP.
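The selection step and the complementary term described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PLSP implementation: the function names, the fixed threshold `tau`, and the choice to restrict the arg-max to candidate labels are all hypothetical.

```python
def split_by_confidence(probs, candidates, tau=0.9):
    """Split instances into pseudo-labeled and pseudo-unlabeled sets.

    probs: per-instance lists of class probabilities.
    candidates: per-instance sets of candidate label indices.
    tau: hypothetical confidence threshold (an assumption for this sketch).
    """
    labeled, unlabeled = [], []
    for i, (p, cand) in enumerate(zip(probs, candidates)):
        # Only a candidate label can be the valid one.
        best = max(cand, key=lambda c: p[c])
        if p[best] >= tau:
            labeled.append((i, best))   # keep as a reliable pseudo-label
        else:
            unlabeled.append(i)         # treat as pseudo-unlabeled
    return labeled, unlabeled


def complementary_penalty(p, cand):
    """Probability mass placed on non-candidate labels (to be pushed toward 0)."""
    return sum(q for c, q in enumerate(p) if c not in cand)


# Toy predictions for two instances over three classes.
probs = [[0.95, 0.03, 0.02], [0.40, 0.35, 0.25]]
candidates = [{0, 1}, {0, 2}]
labeled, unlabeled = split_by_confidence(probs, candidates)
penalty = complementary_penalty(probs[1], candidates[1])
```

In this toy run the first instance clears the threshold and is pseudo-labeled, while the second falls into the pseudo-unlabeled set and would instead contribute to a consistency term.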

21.1 HC Mar 30

From Passersby to Placemaking: Designing Autonomous Vehicle-Pedestrian Encounters for an Urban Shared Space

Yiyuan Wang, Martin Tomitsch, Marius Hoggenmüller et al.

Autonomous vehicles (AVs) tend to disrupt the atmosphere and pedestrian experience in urban shared spaces, undermining the focus of these spaces on people and placemaking. We investigate how external human-machine interfaces (eHMIs) supporting AV-pedestrian interaction can be extended to consider the characteristics of an urban shared space. Inspired by urban HCI, we devised three place-based eHMI designs that (i) enhance a conventional intent eHMI and (ii) exhibit content and physical integration with the space. In an evaluation study, 25 participants experienced the eHMIs in an immersive simulation of the space via virtual reality and shared their impressions through think-aloud, interviews, and questionnaires. Results showed that the place-based eHMIs had a notable effect on influencing the perception of AV interaction, including aspects like visual aesthetics and sense of reassurance, and on fostering a sense of place, such as social interactivity and the intentionality to coexist. In measuring qualities of pedestrian experience, we found that perceived safety significantly correlated with user experience and affect, including the attractiveness of eHMIs and feelings of pleasantness. The paper opens the avenue for exploring how eHMIs may contribute to the placemaking goals of pedestrian-centric spaces and improve the experience of people encountering AVs within these environments.

75.9 CV Mar 18 Code

S-VGGT: Structure-Aware Subscene Decomposition for Scalable 3D Foundation Models

Xinze Li, Pengxu Chen, Yiyuan Wang et al.

Feed-forward 3D foundation models face a key challenge: the quadratic computational cost introduced by global attention, which severely limits scalability as input length increases. Concurrent acceleration methods, such as token merging, operate at the token level. While they offer local savings, the required nearest-neighbor searches introduce undesirable overhead. Consequently, these techniques fail to tackle the fundamental issue of structural redundancy dominant in dense capture data. In this work, we introduce S-VGGT, a novel approach that addresses redundancy at the structural frame level, drastically shifting the optimization focus. We first leverage the initial features to build a dense scene graph, which characterizes structural scene redundancy and guides the subsequent scene partitioning. Using this graph, we softly assign frames to a small number of subscenes, guaranteeing balanced groups and smooth geometric transitions. The core innovation lies in designing the subscenes to share a common reference frame, establishing a parallel geometric bridge that enables independent and highly efficient processing without explicit geometric alignment. This structural reorganization provides strong intrinsic acceleration by cutting the global attention cost at its source. Crucially, S-VGGT is entirely orthogonal to token-level acceleration methods, allowing the two to be seamlessly combined for compounded speedups without compromising reconstruction fidelity. Code is available at https://github.com/Powertony102/S-VGGT.
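To see why splitting frames into subscenes reduces the quadratic attention cost, a back-of-the-envelope count of pairwise token interactions may help. This sketch assumes a hard, even split of frames (the abstract describes a soft, graph-guided assignment) and one shared reference frame per subscene; the function names and the numbers are illustrative only.

```python
def global_attention_cost(num_frames, tokens_per_frame):
    """Pairwise token interactions when every frame attends to every frame."""
    n = num_frames * tokens_per_frame
    return n * n


def subscene_attention_cost(num_frames, tokens_per_frame, num_subscenes):
    """Cost when the non-reference frames are split evenly into subscenes,
    each of which also contains the shared reference frame (assumption)."""
    others = num_frames - 1                      # all frames except the reference
    base, extra = divmod(others, num_subscenes)  # balanced group sizes
    total = 0
    for s in range(num_subscenes):
        frames_here = base + (1 if s < extra else 0) + 1  # +1 for the reference
        n = frames_here * tokens_per_frame
        total += n * n
    return total


full = global_attention_cost(101, 100)
split = subscene_attention_cost(101, 100, 4)
ratio = full / split
```

With 101 frames of 100 tokens each and 4 subscenes, the interaction count drops from 102,010,000 to 27,040,000, a reduction of roughly 3.8x under these simplifying assumptions.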

62.9 CV May 6 Code

QuadBox: Accelerating 3D Gaussian Splatting with Geometry-Aware Boxes

Xinze Li, Bohan Yang, Pengxu Chen et al.

3D Gaussian Splatting (3DGS) has emerged as an advanced technique for real-time novel view synthesis by representing scene geometry and appearance using differentiable Gaussian primitives. However, efficiently computing precise Gaussian-tile intersections remains a critical task in the rasterization pipeline. To this end, we propose QuadBox, a method that leverages four axis-aligned bounding boxes to tightly encapsulate projected Gaussians in a discrete manner. First, we derive a geometry-aware stretching factor that enables the construction of a tile-aligned QuadBox, which covers the elliptical projection and largely excludes irrelevant tiles. Second, we introduce QPass, a single-pass tile traversal algorithm that exhaustively exploits the discrete nature of QuadBox, ensuring that the tile intersection check is performed with simple interval tests. Experiments on public datasets show that our method accelerates the rendering speed of 3DGS by 1.85×. Code is available at https://github.com/Powertony102/QuadBox.
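The benefit of covering an elongated footprint with several small axis-aligned boxes rather than one large one can be illustrated with plain interval arithmetic on tile indices. The sketch below does not reproduce the QuadBox construction or QPass: the boxes are hand-picked, tile-aligned placeholders rather than boxes derived from a projected Gaussian, and `tiles_for_box` is a hypothetical helper.

```python
def tiles_for_box(x_min, y_min, x_max, y_max, tile=16):
    """Tile indices overlapped by one axis-aligned box.

    Coordinates are integer pixels; the max bounds are exclusive.
    """
    tx0, ty0 = x_min // tile, y_min // tile
    tx1, ty1 = (x_max - 1) // tile, (y_max - 1) // tile
    return {(tx, ty)
            for ty in range(ty0, ty1 + 1)
            for tx in range(tx0, tx1 + 1)}


# Four hand-picked boxes following a diagonal footprint (illustrative only).
boxes = [(0, 0, 32, 16), (16, 16, 48, 32), (32, 32, 64, 48), (48, 48, 80, 64)]
multi_box_tiles = set().union(*(tiles_for_box(*b) for b in boxes))

# A single bounding box around the same footprint.
single_box_tiles = tiles_for_box(0, 0, 80, 64)
```

Here the four boxes touch 8 tiles while the single enclosing box touches 20, which is the kind of saving a tighter discrete cover aims for.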

39.8 HC Mar 11

Conversational AI-Enhanced Exploration System to Query Large-Scale Digitised Collections of Natural History Museums

Yiyuan Wang, Andrew Johnston, Zoë Sadokierski et al.

Recent digitisation efforts in natural history museums have produced large volumes of collection data, yet their scale and scientific complexity often hinder public access and understanding. Conventional data management tools, such as databases, restrict exploration through keyword-based search or require specialised schema knowledge. This paper presents a system design that uses conversational AI to query nearly 1.7 million digitised specimen records from the life-science collections of the Australian Museum. Designed and developed through a human-centred design process, the system contains an interactive map for visual-spatial exploration and a natural-language conversational agent that retrieves detailed specimen data and answers collection-specific questions. The system leverages function-calling capabilities of contemporary large language models to dynamically retrieve structured data from external APIs, enabling fast, real-time interaction with extensive yet frequently updated datasets. Our work provides a new approach to connecting large museum collections with natural language-based queries and informs future designs of scientific AI agents for natural history museums.
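The function-calling pattern mentioned in this abstract generally works by having the model emit a structured call that the application routes to a data-fetching function. The sketch below shows only that routing step, under loud assumptions: the tool name `search_specimens`, its parameters, the JSON shape, and the in-memory records are all hypothetical stand-ins, not the system's actual API or data.

```python
import json

# Made-up records standing in for an external specimen API (illustrative only).
_SPECIMENS = [
    {"id": "demo-1", "taxon": "Ornithorhynchus anatinus", "state": "NSW"},
    {"id": "demo-2", "taxon": "Phascolarctos cinereus", "state": "QLD"},
    {"id": "demo-3", "taxon": "Phascolarctos cinereus", "state": "NSW"},
]


def search_specimens(taxon=None, state=None):
    """Hypothetical tool: filter records by optional taxon and state."""
    return [r for r in _SPECIMENS
            if (taxon is None or r["taxon"] == taxon)
            and (state is None or r["state"] == state)]


# Registry mapping tool names the model may request to Python callables.
TOOLS = {"search_specimens": search_specimens}


def dispatch(tool_call_json):
    """Route a model-emitted call of the assumed form
    {"name": ..., "arguments": {...}} to the matching function."""
    call = json.loads(tool_call_json)
    fn = TOOLS.get(call["name"])
    if fn is None:
        raise ValueError(f"unknown tool: {call['name']}")
    return fn(**call.get("arguments", {}))


result = dispatch('{"name": "search_specimens", "arguments": {"state": "NSW"}}')
```

Because the data is fetched at call time rather than baked into the model, a design like this can stay current with a frequently updated collection.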

LG Oct 23, 2020

Generating Long Financial Report using Conditional Variational Autoencoders with Knowledge Distillation

Yunpeng Ren, Ziao Wang, Yiyuan Wang et al.

Automatically generating a financial report from a piece of news is quite a challenging task. The difficulty of this task lies in the lack of sufficient background knowledge to effectively generate a long financial report. To address this issue, this paper proposes a conditional variational autoencoder (CVAE) based approach which distills external knowledge from a corpus of news-report data. In particular, we choose Bi-GRU as the encoder and decoder component of the CVAE, and learn the latent variable distribution from input news. A higher-level latent variable distribution is learnt from a corpus set of news-report data, respectively extracted for each input news, to provide background knowledge to the previously learnt latent variable distribution. Then, a teacher-student network is employed to distill knowledge to refine the output of the decoder component. To evaluate the model performance of the proposed approach, extensive experiments are performed on a public dataset, and two widely adopted evaluation criteria, i.e., BLEU and ROUGE, are chosen in the experiment. The promising experimental results demonstrate that the proposed approach is superior to the other compared methods.

CR Dec 10, 2019

Lightweight Sybil-Resilient Multi-Robot Networks by Multipath Manipulation

Yong Huang, Wei Wang, Yiyuan Wang et al.

Wireless networking opens up many opportunities to facilitate miniaturized robots in collaborative tasks, while the openness of the wireless medium exposes robots to the threats of Sybil attackers, who can break the fundamental trust assumption in robotic collaboration by forging a large number of fictitious robots. Recent advances advocate the adoption of bulky multi-antenna systems to passively obtain fine-grained physical layer signatures, rendering them unaffordable to miniaturized robots. To overcome this conundrum, this paper presents ScatterID, a lightweight system that attaches featherlight and batteryless backscatter tags to single-antenna robots to defend against Sybil attacks. Instead of passively "observing" signatures, ScatterID actively "manipulates" multipath propagation by using backscatter tags to intentionally create rich multipath features obtainable by a single-antenna robot. These features are used to construct a distinct profile to detect the real signal source, even when the attacker is mobile and power-scaling. We implement ScatterID on the iRobot Create platform and evaluate it in typical indoor and outdoor environments. The experimental results show that our system achieves a high AUROC of 0.988 and an overall accuracy of 96.4% for identity verification.

AI Feb 15, 2017

Local Search for Minimum Weight Dominating Set with Two-Level Configuration Checking and Frequency Based Scoring Function

Yiyuan Wang, Shaowei Cai, Minghao Yin

The Minimum Weight Dominating Set (MWDS) problem is an important generalization of the Minimum Dominating Set (MDS) problem with extensive applications. This paper proposes a new local search algorithm for the MWDS problem, which is based on two new ideas. The first idea is a heuristic called two-level configuration checking (CC2), which is a new variant of a recent powerful configuration checking strategy (CC) for effectively avoiding recently visited search paths. The second idea is a novel scoring function based on how frequently vertices are left uncovered. Our algorithm is called CC2FS, according to the names of the two ideas. The experimental results show that CC2FS performs much better than some state-of-the-art algorithms in terms of solution quality on a broad range of MWDS benchmarks.
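The idea of a frequency-based score can be sketched on a toy graph. This is a simplified illustration, not the CC2FS algorithm: it omits configuration checking and the local search loop entirely, and the exact score formula used here (summed frequencies of newly covered vertices divided by vertex weight) is an assumption made for the example.

```python
def uncovered(graph, chosen):
    """Vertices that are neither chosen nor adjacent to a chosen vertex."""
    covered = set(chosen)
    for v in chosen:
        covered |= graph[v]
    return set(graph) - covered


def add_score(graph, weight, freq, chosen, v):
    """Assumed score for adding v: frequency mass newly covered per unit weight.

    freq[u] counts how often u was left uncovered in earlier search steps.
    """
    newly = ({v} | graph[v]) & uncovered(graph, chosen)
    return sum(freq[u] for u in newly) / weight[v]


# Path graph a - b - c - d with toy weights and frequencies.
graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
weight = {"a": 1, "b": 2, "c": 2, "d": 1}
freq = {"a": 1, "b": 1, "c": 1, "d": 3}

score_b = add_score(graph, weight, freq, set(), "b")
score_c = add_score(graph, weight, freq, set(), "c")
still_uncovered = uncovered(graph, {"c"})
```

Vertex `c` scores higher than `b` because it covers `d`, which has been left uncovered most often; after choosing `c`, only `a` remains undominated.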