IROct 26, 2022
A Transformer-based Framework for POI-level Social Post GeolocationMenglin Li, Kwan Hui Lim, Teng Guo et al.
POI-level geo-information of social posts is critical to many location-based applications and services. However, the multi-modality, complexity and diverse nature of social media data and their platforms limit the performance of inferring such fine-grained locations and their subsequent applications. To address this issue, we present a transformer-based general framework, which builds upon pre-trained language models and considers non-textual data, for social post geolocation at the POI level. To this end, inputs are categorized to handle different social data, and an optimal combination strategy is provided for feature representations. Moreover, a uniform representation of hierarchy is proposed to learn temporal information, and a concatenated version of encodings is employed to capture feature-wise positions better. Experimental results on various social datasets demonstrate that three variants of our proposed framework outperform multiple state-of-art baselines by a large margin in terms of accuracy and distance error metrics.
CVDec 11, 2023Code
SSPNet: Scale and Spatial Priors Guided Generalizable and Interpretable Pedestrian Attribute RecognitionJifeng Shen, Teng Guo, Xin Zuo et al.
Global feature based Pedestrian Attribute Recognition (PAR) models are often poorly localized when using Grad-CAM for attribute response analysis, which has a significant impact on the interpretability, generalizability and performance. Previous researches have attempted to improve generalization and interpretation through meticulous model design, yet they often have neglected or underutilized effective prior information crucial for PAR. To this end, a novel Scale and Spatial Priors Guided Network (SSPNet) is proposed for PAR, which is mainly composed of the Adaptive Feature Scale Selection (AFSS) and Prior Location Extraction (PLE) modules. The AFSS module learns to provide reasonable scale prior information for different attribute groups, allowing the model to focus on different levels of feature maps with varying semantic granularity. The PLE module reveals potential attribute spatial prior information, which avoids unnecessary attention on irrelevant areas and lowers the risk of model over-fitting. More specifically, the scale prior in AFSS is adaptively learned from different layers of feature pyramid with maximum accuracy, while the spatial priors in PLE can be revealed from part feature with different granularity (such as image blocks, human pose keypoint and sparse sampling points). Besides, a novel IoU based attribute localization metric is proposed for Weakly-supervised Pedestrian Attribute Localization (WPAL) based on the improved Grad-CAM for attribute response mask. The experimental results on the intra-dataset and cross-dataset evaluations demonstrate the effectiveness of our proposed method in terms of mean accuracy (mA). Furthermore, it also achieves superior performance on the PCS dataset for attribute localization in terms of IoU. Code will be released at https://github.com/guotengg/SSPNet.
CRNov 6, 2025
P-MIA: A Profiled-Based Membership Inference Attack on Cognitive Diagnosis ModelsMingliang Hou, Yinuo Wang, Teng Guo et al.
Cognitive diagnosis models (CDMs) are pivotal for creating fine-grained learner profiles in modern intelligent education platforms. However, these models are trained on sensitive student data, raising significant privacy concerns. While membership inference attacks (MIA) have been studied in various domains, their application to CDMs remains a critical research gap, leaving their privacy risks unquantified. This paper is the first to systematically investigate MIA against CDMs. We introduce a novel and realistic grey box threat model that exploits the explainability features of these platforms, where a model's internal knowledge state vectors are exposed to users through visualizations such as radar charts. We demonstrate that these vectors can be accurately reverse-engineered from such visualizations, creating a potent attack surface. Based on this threat model, we propose a profile-based MIA (P-MIA) framework that leverages both the model's final prediction probabilities and the exposed internal knowledge state vectors as features. Extensive experiments on three real-world datasets against mainstream CDMs show that our grey-box attack significantly outperforms standard black-box baselines. Furthermore, we showcase the utility of P-MIA as an auditing tool by successfully evaluating the efficacy of machine unlearning techniques and revealing their limitations.
LGNov 6, 2025
PrivacyCD: Hierarchical Unlearning for Protecting Student Privacy in Cognitive DiagnosisMingliang Hou, Yinuo Wang, Teng Guo et al.
The need to remove specific student data from cognitive diagnosis (CD) models has become a pressing requirement, driven by users' growing assertion of their "right to be forgotten". However, existing CD models are largely designed without privacy considerations and lack effective data unlearning mechanisms. Directly applying general purpose unlearning algorithms is suboptimal, as they struggle to balance unlearning completeness, model utility, and efficiency when confronted with the unique heterogeneous structure of CD models. To address this, our paper presents the first systematic study of the data unlearning problem for CD models, proposing a novel and efficient algorithm: hierarchical importanceguided forgetting (HIF). Our key insight is that parameter importance in CD models exhibits distinct layer wise characteristics. HIF leverages this via an innovative smoothing mechanism that combines individual and layer, level importance, enabling a more precise distinction of parameters associated with the data to be unlearned. Experiments on three real world datasets show that HIF significantly outperforms baselines on key metrics, offering the first effective solution for CD models to respond to user data removal requests and for deploying high-performance, privacy preserving AI systems
CVSep 29, 2025Code
UI-UG: A Unified MLLM for UI Understanding and GenerationHao Yang, Weijie Qiu, Ru Zhang et al.
Although Multimodal Large Language Models (MLLMs) have been widely applied across domains, they are still facing challenges in domain-specific tasks, such as User Interface (UI) understanding accuracy and UI generation quality. In this paper, we introduce UI-UG (a unified MLLM for UI Understanding and Generation), integrating both capabilities. For understanding tasks, we employ Supervised Fine-tuning (SFT) combined with Group Relative Policy Optimization (GRPO) to enhance fine-grained understanding on the modern complex UI data. For generation tasks, we further use Direct Preference Optimization (DPO) to make our model generate human-preferred UIs. In addition, we propose an industrially effective workflow, including the design of an LLM-friendly domain-specific language (DSL), training strategies, rendering processes, and evaluation metrics. In experiments, our model achieves state-of-the-art (SOTA) performance on understanding tasks, outperforming both larger general-purpose MLLMs and similarly-sized UI-specialized models. Our model is also on par with these larger MLLMs in UI generation performance at a fraction of the computational cost. We also demonstrate that integrating understanding and generation tasks can improve accuracy and quality for both tasks. Code and Model: https://github.com/neovateai/UI-UG
CVJun 20, 2025Code
RGBTrack: Fast, Robust Depth-Free 6D Pose Estimation and TrackingTeng Guo, Jingjin Yu
We introduce a robust framework, RGBTrack, for real-time 6D pose estimation and tracking that operates solely on RGB data, thereby eliminating the need for depth input for such dynamic and precise object pose tracking tasks. Building on the FoundationPose architecture, we devise a novel binary search strategy combined with a render-and-compare mechanism to efficiently infer depth and generate robust pose hypotheses from true-scale CAD models. To maintain stable tracking in dynamic scenarios, including rapid movements and occlusions, RGBTrack integrates state-of-the-art 2D object tracking (XMem) with a Kalman filter and a state machine for proactive object pose recovery. In addition, RGBTrack's scale recovery module dynamically adapts CAD models of unknown scale using an initial depth estimate, enabling seamless integration with modern generative reconstruction techniques. Extensive evaluations on benchmark datasets demonstrate that RGBTrack's novel depth-free approach achieves competitive accuracy and real-time performance, making it a promising practical solution candidate for application areas including robotics, augmented reality, and computer vision. The source code for our implementation will be made publicly available at https://github.com/GreatenAnoymous/RGBTrack.git.
ROFeb 3, 2022Code
Interleaving Monte Carlo Tree Search and Self-Supervised Learning for Object Retrieval in ClutterBaichuan Huang, Teng Guo, Abdeslam Boularias et al.
In this study, working with the task of object retrieval in clutter, we have developed a robot learning framework in which Monte Carlo Tree Search (MCTS) is first applied to enable a Deep Neural Network (DNN) to learn the intricate interactions between a robot arm and a complex scene containing many objects, allowing the DNN to partially clone the behavior of MCTS. In turn, the trained DNN is integrated into MCTS to help guide its search effort. We call this approach learning-guided Monte Carlo tree search for Object REtrieval (MORE), which delivers significant computational efficiency gains and added solution optimality. MORE is a self-supervised robotics framework/pipeline capable of working in the real world that successfully embodies the System 2 to System 1 learning philosophy proposed by Kahneman, where learned knowledge, used properly, can help greatly speed up a time-consuming decision process over time. Videos and supplementary material can be found at https://github.com/arc-l/more
ROJun 20, 2025
Monocular One-Shot Metric-Depth Alignment for RGB-Based Robot GraspingTeng Guo, Baichuan Huang, Jingjin Yu
Accurate 6D object pose estimation is a prerequisite for successfully completing robotic prehensile and non-prehensile manipulation tasks. At present, 6D pose estimation for robotic manipulation generally relies on depth sensors based on, e.g., structured light, time-of-flight, and stereo-vision, which can be expensive, produce noisy output (as compared with RGB cameras), and fail to handle transparent objects. On the other hand, state-of-the-art monocular depth estimation models (MDEMs) provide only affine-invariant depths up to an unknown scale and shift. Metric MDEMs achieve some successful zero-shot results on public datasets, but fail to generalize. We propose a novel framework, Monocular One-shot Metric-depth Alignment (MOMA), to recover metric depth from a single RGB image, through a one-shot adaptation building on MDEM techniques. MOMA performs scale-rotation-shift alignments during camera calibration, guided by sparse ground-truth depth points, enabling accurate depth estimation without additional data collection or model retraining on the testing setup. MOMA supports fine-tuning the MDEM on transparent objects, demonstrating strong generalization capabilities. Real-world experiments on tabletop 2-finger grasping and suction-based bin-picking applications show MOMA achieves high success rates in diverse tasks, confirming its effectiveness.
AIFeb 18, 2022
Surf or sleep? Understanding the influence of bedtime patterns on campusTeng Guo, Linhong Li, Dongyu Zhang et al.
Poor sleep habits may cause serious problems of mind and body, and it is a commonly observed issue for college students due to study workload as well as peer and social influence. Understanding its impact and identifying students with poor sleep habits matters a lot in educational management. Most of the current research is either based on self-reports and questionnaires, suffering from a small sample size and social desirability bias, or the methods used are not suitable for the education system. In this paper, we develop a general data-driven method for identifying students' sleep patterns according to their Internet access pattern stored in the education management system and explore its influence from various aspects. First, we design a Possion-based probabilistic mixture model to cluster students according to the distribution of bedtime and identify students who are used to staying up late. Second, we profile students from five aspects (including eight dimensions) based on campus-behavior data and build Bayesian networks to explore the relationship between behavioral characteristics and sleeping habits. Finally, we test the predictability of sleeping habits. This paper not only contributes to the understanding of student sleep from a cognitive and behavioral perspective but also presents a new approach that provides an effective framework for various educational institutions to detect the sleeping patterns of students.
ROJan 22, 2022
Sub-1.5 Time-Optimal Multi-Robot Path Planning on Grids in Polynomial TimeTeng Guo, Jingjin Yu
Graph-based multi-robot path planning (MRPP) is NP-hard to optimally solve. In this work, we propose the first low polynomial-time algorithm for MRPP achieving 1--1.5 asymptotic optimality guarantees on makespan for random instances under very high robot density, with high probability. The dual guarantee on computational efficiency and solution optimality suggests our proposed general method is promising in significantly scaling up multi-robot applications for logistics, e.g., at large robotic warehouses. Specifically, on an $m_1\times m_2$ gird, $m_1 \ge m_2$, our RTH (Rubik Table with Highways) algorithm computes solutions for routing up to $\frac{m_1m_2}{3}$ robots with uniformly randomly distributed start and goal configurations with a makespan of $m_1 + 2m_2 + o(m_1)$, with high probability. Because the minimum makespan for such instances is $m_1 + m_2 - o(m_1)$, also with high probability, RTH guarantees $\frac{m_1+2m_2}{m_1+m_2}$ optimality as $m_1 \to \infty$ for random instances with up to $\frac{1}{3}$ robot density, with high probability. $\frac{m_1+2m_2}{m_1+m_2} \in (1, 1.5]$. Alongside this key result, we also establish a series of related results supporting even higher robot densities and environments with regularly distributed obstacles, which directly map to real-world parcel sorting scenarios. Building on the baseline methods with provable guarantees, we have developed effective, principled heuristics that further improve the computed optimality of the RTH algorithms. In extensive numerical evaluations, RTH and its variants demonstrate exceptional scalability as compared with methods including ECBS and DDM, scaling to over $450 \times 300$ grids with $45,000$ robots, and consistently achieves makespan around $1.5$ optimal or better, as predicted by our theoretical analysis.
ROMar 25, 2021
Spatial and Temporal Splitting Heuristics for Multi-Robot Motion PlanningTeng Guo, Shuai D. Han, Jingjin Yu
In this work, we systematically examine the application of spatio-temporal splitting heuristics to the Multi-Robot Motion Planning (MRMP) problem in a graph-theoretic setting: a problem known to be NP-hard to optimally solve. Following the divide-and-conquer principle, we design multiple spatial and temporal splitting schemes that can be applied to any existing MRMP algorithm, including integer programming solvers and Enhanced Conflict Based Search, in an orthogonal manner. The combination of a good baseline MRMP algorithm with a proper splitting heuristic proves highly effective, allowing the resolution of problems 10+ times than what is possible previously, as corroborated by extensive numerical evaluations. Notably, spatial partition of problem fusing with the temporal splitting heuristic and the enhanced conflict based search (ECBS) algorithm increases the scalability of ECBS on large and challenging DAO maps by 5--15 folds with negligible impact on solution optimality.
LGDec 27, 2019
Graduate Employment Prediction with BiasTeng Guo, Feng Xia, Shihao Zhen et al.
The failure of landing a job for college students could cause serious social consequences such as drunkenness and suicide. In addition to academic performance, unconscious biases can become one key obstacle for hunting jobs for graduating students. Thus, it is necessary to understand these unconscious biases so that we can help these students at an early stage with more personalized intervention. In this paper, we develop a framework, i.e., MAYA (Multi-mAjor emploYment stAtus) to predict students' employment status while considering biases. The framework consists of four major components. Firstly, we solve the heterogeneity of student courses by embedding academic performance into a unified space. Then, we apply a generative adversarial network (GAN) to overcome the class imbalance problem. Thirdly, we adopt Long Short-Term Memory (LSTM) with a novel dropout mechanism to comprehensively capture sequential information among semesters. Finally, we design a bias-based regularization to capture the job market biases. We conduct extensive experiments on a large-scale educational dataset and the results demonstrate the effectiveness of our prediction framework.