AIMay 5, 2022
Multi-Graph based Multi-Scenario Recommendation in Large-scale Online Video ServicesFan Zhang, Qiuying Peng, Yulin Wu et al.
Recently, industrial recommendation services have been boosted by the continual upgrade of deep learning methods. However, they still face de-biasing challenges such as exposure bias and cold-start problem, where circulations of machine learning training on human interaction history leads algorithms to repeatedly suggest exposed items while ignoring less-active ones. Additional problems exist in multi-scenario platforms, e.g. appropriate data fusion from subsidiary scenarios, which we observe could be alleviated through graph structured data integration via message passing. In this paper, we present a multi-graph structured multi-scenario recommendation solution, which encapsulates interaction data across scenarios with multi-graph and obtains representation via graph learning. Extensive offline and online experiments on real-world datasets are conducted where the proposed method demonstrates an increase of 0.63% and 0.71% in CTR and Video Views per capita on new users over deployed set of baselines and outperforms regular method in increasing the number of outer-scenario videos by 25% and video watches by 116%, validating its superiority in activating cold videos and enriching target recommendation.
LGJul 26, 2023
HyperFed: Hyperbolic Prototypes Exploration with Consistent Aggregation for Non-IID Data in Federated LearningXinting Liao, Weiming Liu, Chaochao Chen et al.
Federated learning (FL) collaboratively models user data in a decentralized way. However, in the real world, non-identical and independent data distributions (non-IID) among clients hinder the performance of FL due to three issues, i.e., (1) the class statistics shifting, (2) the insufficient hierarchical information utilization, and (3) the inconsistency in aggregating clients. To address the above issues, we propose HyperFed which contains three main modules, i.e., hyperbolic prototype Tammes initialization (HPTI), hyperbolic prototype learning (HPL), and consistent aggregation (CA). Firstly, HPTI in the server constructs uniformly distributed and fixed class prototypes, and shares them with clients to match class statistics, further guiding consistent feature representation for local clients. Secondly, HPL in each client captures the hierarchical information in local data with the supervision of shared class prototypes in the hyperbolic model space. Additionally, CA in the server mitigates the impact of the inconsistent deviations from clients to server. Extensive studies of four datasets prove that HyperFed is effective in enhancing the performance of FL under the non-IID set.
CVAug 30, 2024
OG-Mapping: Octree-based Structured 3D Gaussians for Online Dense MappingMeng Wang, Junyi Wang, Changqun Xia et al.
3D Gaussian splatting (3DGS) has recently demonstrated promising advancements in RGB-D online dense mapping. Nevertheless, existing methods excessively rely on per-pixel depth cues to perform map densification, which leads to significant redundancy and increased sensitivity to depth noise. Additionally, explicitly storing 3D Gaussian parameters of room-scale scene poses a significant storage challenge. In this paper, we introduce OG-Mapping, which leverages the robust scene structural representation capability of sparse octrees, combined with structured 3D Gaussian representations, to achieve efficient and robust online dense mapping. Moreover, OG-Mapping employs an anchor-based progressive map refinement strategy to recover the scene structures at multiple levels of detail. Instead of maintaining a small number of active keyframes with a fixed keyframe window as previous approaches do, a dynamic keyframe window is employed to allow OG-Mapping to better tackle false local minima and forgetting issues. Experimental results demonstrate that OG-Mapping delivers more robust and superior realism mapping results than existing Gaussian-based RGB-D online mapping methods with a compact model, and no additional post-processing is required.
CVSep 6, 2024
SCARF: Scalable Continual Learning Framework for Memory-efficient Multiple Neural Radiance FieldsYuze Wang, Junyi Wang, Chen Wang et al.
This paper introduces a novel continual learning framework for synthesising novel views of multiple scenes, learning multiple 3D scenes incrementally, and updating the network parameters only with the training data of the upcoming new scene. We build on Neural Radiance Fields (NeRF), which uses multi-layer perceptron to model the density and radiance field of a scene as the implicit function. While NeRF and its extensions have shown a powerful capability of rendering photo-realistic novel views in a single 3D scene, managing these growing 3D NeRF assets efficiently is a new scientific problem. Very few works focus on the efficient representation or continuous learning capability of multiple scenes, which is crucial for the practical applications of NeRF. To achieve these goals, our key idea is to represent multiple scenes as the linear combination of a cross-scene weight matrix and a set of scene-specific weight matrices generated from a global parameter generator. Furthermore, we propose an uncertain surface knowledge distillation strategy to transfer the radiance field knowledge of previous scenes to the new model. Representing multiple 3D scenes with such weight matrices significantly reduces memory requirements. At the same time, the uncertain surface distillation strategy greatly overcomes the catastrophic forgetting problem and maintains the photo-realistic rendering quality of previous scenes. Experiments show that the proposed approach achieves state-of-the-art rendering quality of continual learning NeRF on NeRF-Synthetic, LLFF, and TanksAndTemples datasets while preserving extra low storage cost.
GRSep 25, 2025Code
ArchGPT: Understanding the World's Architectures with Large Multimodal ModelsYuze Wang, Luo Yang, Junyi Wang et al.
Architecture embodies aesthetic, cultural, and historical values, standing as a tangible testament to human civilization. Researchers have long leveraged virtual reality (VR), mixed reality (MR), and augmented reality (AR) to enable immersive exploration and interpretation of architecture, enhancing accessibility, public understanding, and creative workflows around architecture in education, heritage preservation, and professional design practice. However, existing VR/MR/AR systems are often developed case-by-case, relying on hard-coded annotations and task-specific interactions that do not scale across diverse built environments. In this work, we present ArchGPT, a multimodal architectural visual question answering (VQA) model, together with a scalable data-construction pipeline for curating high-quality, architecture-specific VQA annotations. This pipeline yields Arch-300K, a domain-specialized dataset of approximately 315,000 image-question-answer triplets. Arch-300K is built via a multi-stage process: first, we curate architectural scenes from Wikimedia Commons and filter unconstrained tourist photo collections using a novel coarse-to-fine strategy that integrates 3D reconstruction and semantic segmentation to select occlusion-free, structurally consistent architectural images. To mitigate noise and inconsistency in raw textual metadata, we propose an LLM-guided text verification and knowledge-distillation pipeline to generate reliable, architecture-specific question-answer pairs. Using these curated images and refined metadata, we further synthesize formal analysis annotations-including detailed descriptions and aspect-guided conversations-to provide richer semantic variety while remaining faithful to the data. We perform supervised fine-tuning of an open-source multimodal backbone ,ShareGPT4V-7B, on Arch-300K, yielding ArchGPT.
CLMay 28, 2025Code
ICH-Qwen: A Large Language Model Towards Chinese Intangible Cultural HeritageWenhao Ye, Tiansheng Zheng, Yue Qi et al.
The intangible cultural heritage (ICH) of China, a cultural asset transmitted across generations by various ethnic groups, serves as a significant testament to the evolution of human civilization and holds irreplaceable value for the preservation of historical lineage and the enhancement of cultural self-confidence. However, the rapid pace of modernization poses formidable challenges to ICH, including threats damage, disappearance and discontinuity of inheritance. China has the highest number of items on the UNESCO Intangible Cultural Heritage List, which is indicative of the nation's abundant cultural resources and emphasises the pressing need for ICH preservation. In recent years, the rapid advancements in large language modelling have provided a novel technological approach for the preservation and dissemination of ICH. This study utilises a substantial corpus of open-source Chinese ICH data to develop a large language model, ICH-Qwen, for the ICH domain. The model employs natural language understanding and knowledge reasoning capabilities of large language models, augmented with synthetic data and fine-tuning techniques. The experimental results demonstrate the efficacy of ICH-Qwen in executing tasks specific to the ICH domain. It is anticipated that the model will provide intelligent solutions for the protection, inheritance and dissemination of intangible cultural heritage, as well as new theoretical and practical references for the sustainable development of intangible cultural heritage. Furthermore, it is expected that the study will open up new paths for digital humanities research.
LGMay 19, 2023Code
Graph Propagation Transformer for Graph Representation LearningZhe Chen, Hao Tan, Tao Wang et al.
This paper presents a novel transformer architecture for graph representation learning. The core insight of our method is to fully consider the information propagation among nodes and edges in a graph when building the attention module in the transformer blocks. Specifically, we propose a new attention mechanism called Graph Propagation Attention (GPA). It explicitly passes the information among nodes and edges in three ways, i.e. node-to-node, node-to-edge, and edge-to-node, which is essential for learning graph-structured data. On this basis, we design an effective transformer architecture named Graph Propagation Transformer (GPTrans) to further help learn graph data. We verify the performance of GPTrans in a wide range of graph learning experiments on several benchmark datasets. These results show that our method outperforms many state-of-the-art transformer-based graph models with better performance. The code will be released at https://github.com/czczup/GPTrans.
CVDec 25, 2024
Generative Landmarks Guided Eyeglasses Removal 3D Face ReconstructionDapeng Zhao, Yue Qi
Single-view 3D face reconstruction is a fundamental Computer Vision problem of extraordinary difficulty. Current systems often assume the input is unobstructed faces which makes their method not suitable for in-the-wild conditions. We present a method for performing a 3D face that removes eyeglasses from a single image. Existing facial reconstruction methods fail to remove eyeglasses automatically for generating a photo-realistic 3D face "in-the-wild".The innovation of our method lies in a process for identifying the eyeglasses area robustly and remove it intelligently. In this work, we estimate the 2D face structure of the reasonable position of the eyeglasses area, which is used for the construction of 3D texture. An excellent anti-eyeglasses face reconstruction method should ensure the authenticity of the output, including the topological structure between the eyes, nose, and mouth. We achieve this via a deep learning architecture that performs direct regression of a 3DMM representation of the 3D facial geometry from a single 2D image. We also demonstrate how the related face parsing task can be incorporated into the proposed framework and help improve reconstruction quality. We conduct extensive experiments on existing 3D face reconstruction tasks as concrete examples to demonstrate the method's superior regulation ability over existing methods often break down.
IVApr 20, 2024
SSVT: Self-Supervised Vision Transformer For Eye Disease Diagnosis Based On Fundus ImagesJiaqi Wang, Mengtian Kang, Yong Liu et al.
Machine learning-based fundus image diagnosis technologies trigger worldwide interest owing to their benefits such as reducing medical resource power and providing objective evaluation results. However, current methods are commonly based on supervised methods, bringing in a heavy workload to biomedical staff and hence suffering in expanding effective databases. To address this issue, in this article, we established a label-free method, name 'SSVT',which can automatically analyze un-labeled fundus images and generate high evaluation accuracy of 97.0% of four main eye diseases based on six public datasets and two datasets collected by Beijing Tongren Hospital. The promising results showcased the effectiveness of the proposed unsupervised learning method, and the strong application potential in biomedical resource shortage regions to improve global eye health.
GRJul 26, 2025
Taking Language Embedded 3D Gaussian Splatting into the WildYuze Wang, Yue Qi
Recent advances in leveraging large-scale Internet photo collections for 3D reconstruction have enabled immersive virtual exploration of landmarks and historic sites worldwide. However, little attention has been given to the immersive understanding of architectural styles and structural knowledge, which remains largely confined to browsing static text-image pairs. Therefore, can we draw inspiration from 3D in-the-wild reconstruction techniques and use unconstrained photo collections to create an immersive approach for understanding the 3D structure of architectural components? To this end, we extend language embedded 3D Gaussian splatting (3DGS) and propose a novel framework for open-vocabulary scene understanding from unconstrained photo collections. Specifically, we first render multiple appearance images from the same viewpoint as the unconstrained image with the reconstructed radiance field, then extract multi-appearance CLIP features and two types of language feature uncertainty maps-transient and appearance uncertainty-derived from the multi-appearance features to guide the subsequent optimization process. Next, we propose a transient uncertainty-aware autoencoder, a multi-appearance language field 3DGS representation, and a post-ensemble strategy to effectively compress, learn, and fuse language features from multiple appearances. Finally, to quantitatively evaluate our method, we introduce PT-OVS, a new benchmark dataset for assessing open-vocabulary segmentation performance on unconstrained photo collections. Experimental results show that our method outperforms existing methods, delivering accurate open-vocabulary segmentation and enabling applications such as interactive roaming with open-vocabulary queries, architectural style pattern recognition, and 3D scene editing.
IVApr 20, 2024
Diagnosis of Multiple Fundus Disorders Amidst a Scarcity of Medical Experts Via Self-supervised Machine LearningYong Liu, Mengtian Kang, Shuo Gao et al.
Fundus diseases are major causes of visual impairment and blindness worldwide, especially in underdeveloped regions, where the shortage of ophthalmologists hinders timely diagnosis. AI-assisted fundus image analysis has several advantages, such as high accuracy, reduced workload, and improved accessibility, but it requires a large amount of expert-annotated data to build reliable models. To address this dilemma, we propose a general self-supervised machine learning framework that can handle diverse fundus diseases from unlabeled fundus images. Our method's AUC surpasses existing supervised approaches by 15.7%, and even exceeds performance of a single human expert. Furthermore, our model adapts well to various datasets from different regions, races, and heterogeneous image sources or qualities from multiple cameras or devices. Our method offers a label-free general framework to diagnose fundus diseases, which could potentially benefit telehealth programs for early screening of people at risk of vision loss.
CVDec 25, 2024
3D Face Reconstruction With Geometry Details From a Single Color Image Under Occluded ScenesDapeng Zhao, Yue Qi
3D face reconstruction technology aims to generate a face stereo model naturally and realistically. Previous deep face reconstruction approaches are typically designed to generate convincing textures and cannot generalize well to multiple occluded scenarios simultaneously. By introducing bump mapping, we successfully added mid-level details to coarse 3D faces. More innovatively, our method takes into account occlusion scenarios. Thus on top of common 3D face reconstruction approaches, we in this paper propose a unified framework to handle multiple types of obstruction simultaneously (e.g., hair, palms and glasses et al.).Extensive experiments and comparisons demonstrate that our method can generate high-quality reconstruction results with geometry details from captured facial images under occluded scenes.
CVDec 25, 2024
Generative Face Parsing Map Guided 3D Face Reconstruction Under Occluded ScenesDapeng Zhao, Yue Qi
Over the past few years, single-view 3D face reconstruction methods can produce beautiful 3D models. Nevertheless,the input of these works is unobstructed faces.We describe a system designed to reconstruct convincing face texture in the case of occlusion.Motivated by parsing facial features,we propose a complete face parsing map generation method guided by landmarks.We estimate the 2D face structure of the reasonable position of the occlusion area,which is used for the construction of 3D texture.An excellent anti-occlusion face reconstruction method should ensure the authenticity of the output,including the topological structure between the eyes,nose, and mouth. We extensively tested our method and its components, qualitatively demonstrating the rationality of our estimated facial structure. We conduct extensive experiments on general 3D face reconstruction tasks as concrete examples to demonstrate the method's superior regulation ability over existing methods often break down.We further provide numerous quantitative examples showing that our method advances both the quality and the robustness of 3D face reconstruction under occlusion scenes.
CVJun 4, 2024
WE-GS: An In-the-wild Efficient 3D Gaussian Representation for Unconstrained Photo CollectionsYuze Wang, Junyi Wang, Yue Qi
Novel View Synthesis (NVS) from unconstrained photo collections is challenging in computer graphics. Recently, 3D Gaussian Splatting (3DGS) has shown promise for photorealistic and real-time NVS of static scenes. Building on 3DGS, we propose an efficient point-based differentiable rendering framework for scene reconstruction from photo collections. Our key innovation is a residual-based spherical harmonic coefficients transfer module that adapts 3DGS to varying lighting conditions and photometric post-processing. This lightweight module can be pre-computed and ensures efficient gradient propagation from rendered images to 3D Gaussian attributes. Additionally, we observe that the appearance encoder and the transient mask predictor, the two most critical parts of NVS from unconstrained photo collections, can be mutually beneficial. We introduce a plug-and-play lightweight spatial attention module to simultaneously predict transient occluders and latent appearance representation for each image. After training and preprocessing, our method aligns with the standard 3DGS format and rendering pipeline, facilitating seamlessly integration into various 3DGS applications. Extensive experiments on diverse datasets show our approach outperforms existing approaches on the rendering quality of novel view and appearance synthesis with high converge and rendering speed.
ITAug 13, 2020
Secure Transmission in MIMO-NOMA NetworksYue Qi, Mojtaba Vaezi
This letter focuses on the physical layer security over two-user multiple-input multiple-output (MIMO) non-orthogonal multiple access (NOMA) networks. A linear precoding technique is designed to ensure the confidentiality of the message of each user from its counterpart. This technique first splits the base station power between the two users and, based on that, decomposes the secure MIMO-NOMA channel into two MIMO wiretap channels, and designs the transmit covariance matrix for each channel separately. The proposed method substantially enlarges the secrecy rate compared to existing linear precoding methods and strikes a balance between performance and computation cost. Simulation results verify the effectiveness of the proposed method.
COMP-PHJun 1, 2020
Wavelet Scattering Networks for Atomistic Systems with Extrapolation of Material PropertiesPaul Sinz, Michael W. Swift, Xavier Brumwell et al.
The dream of machine learning in materials science is for a model to learn the underlying physics of an atomic system, allowing it to move beyond interpolation of the training set to the prediction of properties that were not present in the original training data. In addition to advances in machine learning architectures and training techniques, achieving this ambitious goal requires a method to convert a 3D atomic system into a feature representation that preserves rotational and translational symmetry, smoothness under small perturbations, and invariance under re-ordering. The atomic orbital wavelet scattering transform preserves these symmetries by construction, and has achieved great success as a featurization method for machine learning energy prediction. Both in small molecules and in the bulk amorphous $\text{Li}_α\text{Si}$ system, machine learning models using wavelet scattering coefficients as features have demonstrated a comparable accuracy to Density Functional Theory at a small fraction of the computational cost. In this work, we test the generalizability of our $\text{Li}_α\text{Si}$ energy predictor to properties that were not included in the training set, such as elastic constants and migration barriers. We demonstrate that statistical feature selection methods can reduce over-fitting and lead to remarkable accuracy in these extrapolation tasks.
COMP-PHNov 21, 2018
Steerable Wavelet Scattering for 3D Atomic Systems with Application to Li-Si Energy PredictionXavier Brumwell, Paul Sinz, Kwang Jin Kim et al.
A general machine learning architecture is introduced that uses wavelet scattering coefficients of an inputted three dimensional signal as features. Solid harmonic wavelet scattering transforms of three dimensional signals were previously introduced in a machine learning framework for the regression of properties of small organic molecules. Here this approach is extended for general steerable wavelets which are equivariant to translations and rotations, resulting in a sparse model of the target function. The scattering coefficients inherit from the wavelets invariance to translations and rotations. As an illustration of this approach a linear regression model is learned for the formation energy of amorphous lithium-silicon material states trained over a database generated using plane-wave Density Functional Theory methods. State-of-the-art results are produced as compared to other machine learning approaches over similarly generated databases.
CRJun 19, 2013
An Advanced Survey on Secure Energy-Efficient Hierarchical Routing Protocols in Wireless Sensor NetworksAbdoulaye Diop, Yue Qi, Qin Wang et al.
Wireless Sensor Networks (WSNs) are often deployed in hostile environments, which make such networks highly vulnerable and increase the risk of attacks against this type of network. WSN comprise of large number of sensor nodes with different hardware abilities and functions. Due to the limited memory resources and energy constraints, complex security algorithms cannot be used in sensor networks. Therefore, it is necessary to balance between the security level and the associated energy consumption overhead to mitigate the security risks. Hierarchical routing protocol is more energy-efficient than other routing protocols in WSNs. Many secure cluster-based routing protocols have been proposed in the literature to overcome these constraints. In this paper, we discuss Secure Energy-Efficient Hierarchical Routing Protocols in WSNs and compare them in terms of security, performance and efficiency. Security issues for WSNs and their solutions are also discussed.