GR · Jun 3
PureLight: Learning Complex Luminaires with Light Tracing
Pedro Figueiredo, Zixuan Li, Beibei Wang et al.
We propose a neural formulation for estimating the appearance of complex luminaires. We focus on challenging luminaires with complex light transport (e.g., small emitters enclosed by multiple specular layers) that are difficult for (bidirectional) path tracing. To this end, we use light tracing to construct paths from emitters to the exit surfaces and formulate appearance estimation as a distribution learning problem. Specifically, we model the probability density function (pdf) of outgoing radiance on the exit surfaces using a large normalizing flow network, and recover the outgoing radiance as the product of the estimated pdf and flux. To enable efficient inference, we distill the learned appearance into a lightweight MLP that directly estimates radiance on the exit surfaces. We additionally train a sampling network for effective direct illumination computation from the luminaire, and a blending network to composite the luminaire into the scene. Our formulation makes it feasible to render challenging luminaires using low sample counts in arbitrary scenes.
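The "radiance as the product of the estimated pdf and flux" idea can be illustrated with a toy 1D sketch. This is not the paper's method: a plain histogram stands in for the normalizing flow, the exit surface is reduced to the interval [0, 1), geometric terms are ignored, and the names `radiance_from_samples`, `samples` and `total_flux` are hypothetical.

```python
import random


def radiance_from_samples(samples, total_flux, num_bins=20, lo=0.0, hi=1.0):
    """Estimate a 1D outgoing profile as (estimated pdf) * (total flux).

    samples: exit positions of light-traced paths (toy data, assumed in [lo, hi]).
    A histogram density estimate replaces the learned pdf here.
    """
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for x in samples:
        # Clamp so that a sample exactly at `hi` falls into the last bin.
        idx = min(int((x - lo) / width), num_bins - 1)
        counts[idx] += 1
    n = len(samples)
    # Histogram density: integrates to 1 over [lo, hi).
    pdf = [c / (n * width) for c in counts]
    # Scale the density by the emitted flux to get the outgoing profile.
    return [total_flux * p for p in pdf]


# Toy "light tracing" result: exit positions drawn from a skewed distribution.
rng = random.Random(0)
samples = [rng.betavariate(2.0, 5.0) for _ in range(10000)]
profile = radiance_from_samples(samples, total_flux=3.0)

# Integrating the profile over the exit surface recovers the total flux.
integral = sum(profile) * (1.0 / 20)
print(round(integral, 6))
```

Because the density integrates to one, the scaled profile integrates back to the total flux, which is the property that makes the density-learning formulation usable for appearance estimation.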
LG · Oct 27, 2023
A Survey on Trustworthy Edge Intelligence: From Security and Reliability To Transparency and Sustainability
Xiaojie Wang, Beibei Wang, Yu Wu et al.
Edge Intelligence (EI) integrates Edge Computing (EC) and Artificial Intelligence (AI) to push the capabilities of AI to the network edge for real-time, efficient and secure intelligent decision-making and computation. However, EI faces various challenges due to resource constraints, heterogeneous network environments, and diverse service requirements of different applications, which together affect the trustworthiness of EI in the eyes of stakeholders. This survey comprehensively summarizes the characteristics, architecture, technologies, and solutions of trustworthy EI. Specifically, we first emphasize the need for trustworthy EI in the context of the trend toward large models. We then provide an initial definition of trustworthy EI, explore its key characteristics and give a multi-layered architecture for trustworthy EI. Then, we summarize several important issues that hinder the achievement of trustworthy EI. Subsequently, we present enabling technologies for trustworthy EI systems and provide an in-depth literature review of the state-of-the-art solutions for realizing the trustworthiness of EI. Finally, we discuss the corresponding research challenges and open issues.
CV · Jun 1
Beyond Low-Rank: Low-Rank Sparse Prompting via Spiking Neural Network and Prompt Factorization
Yumiao Zhao, Bo Jiang, Beibei Wang et al.
Visual Prompting (VP) has emerged as an efficient paradigm for adapting large-scale pre-trained vision models to downstream tasks by incorporating learnable prompts at the input level. However, existing VP methods typically employ dense pixel-level prompts, which often suffer from redundant perturbations, limited generalization and energy inefficiency. To overcome these limitations, we propose to integrate brain-inspired spiking learning into visual prompt learning tasks. Spiking neurons are known to perform inexpensive information processing by converting the input data into discrete spike trains and returning sparse outputs. Inspired by this, we propose Low-Rank visual Spike Prompting (LoRSP), a novel framework that learns dynamic low-rank sparse visual prompts via a spiking neuron learning mechanism. The core idea of LoRSP is to exploit the brain-inspired sparse firing mechanism of spiking neurons to generate a pixel-level sparse prompt for each instance. To be specific, we first construct a series of prompt factors via low-rank factorization to capture distinct prompt subspaces. These prompt factors are then fed into an SNN architecture, which performs the integrate-and-fire process to emit spikes. As a result, our LoRSP generates a sparse visual prompt while maintaining the low-rank constraint. This design enables instance-specific selective prompting, leading to more compact and robust adaptation across diverse downstream tasks. Extensive experiments on five heterogeneous vision backbones and multiple benchmarks demonstrate that LoRSP achieves competitive performance while requiring fewer tunable parameters compared to existing VP methods.
GR · May 17
Macrofacet Theory for Gaussian Process Statistical Surfaces
Minghao Huang, Yuang Cui, Beibei Wang et al.
We present macrofacet theory to extend microfacet theory from the micro-space to the macro-space. This is achieved by transforming surfaces into volumetric representations that preserve microfacet characteristics. Therefore, we formulate a macroscopic microfacet model using a classic exponential participating medium. Meanwhile, we observe that traditional microfacet models are equivalent to Gaussian processes by definition but ignore the correlation along the geometric normal of the macro-surface. We extend microfacet theory to address this limitation. Our formulation represents Gaussian process implicit surfaces in a statistical manner, which we refer to as Gaussian process statistical surfaces. As a result, our approach converts Gaussian process statistical surfaces into classic exponential media to render surfaces, volumes and in-betweens without realizations. This enables efficient rendering and improves performance compared to realization-based approaches, while theoretically bridging microfacet models and Gaussian processes. Moreover, our approach is easy to implement.
GR · May 25
F-RNG: Feed-Forward Relightable Neural Gaussians
Guangming Fu, Jiahui Fan, Jian Yang et al.
Capturing relightable 3D assets from real-world objects is a widely researched problem. Several per-scene optimization-based methods, based on 3D Gaussian splatting (3DGS), support relighting; however, they usually require dense input views, and their overfitting nature makes it difficult to generalize across scenes. Unlike per-scene optimization methods, generalized feed-forward models can directly reconstruct Gaussians from sparse input views. However, the resulting assets have baked-in illumination and cannot be easily used for relighting. In this paper, we present F-RNG, a feed-forward framework that directly generates relightable 3DGS assets from sparse-view inputs. Training such a model from scratch can require massive data and computing resources, and it is especially challenging to generate relightable assets in a feed-forward manner at an acceptable cost. We build F-RNG upon an existing large reconstruction model (LRM) to extract relightable representations, while also utilizing priors from an intrinsic decomposition model (IDM). Specifically, we first introduce a latent-interpolated fine-grained geometry synthesis to enhance the LRM's geometry representation. Second, we propose a prior-guided relightable appearance distillation to extract relightable neural representations by incorporating IDM priors. Finally, a universal neural renderer enables flexible and high-fidelity relighting. F-RNG requires neither re-training nor fine-tuning of the underlying LRMs, and can thus automatically benefit from better LRMs and IDMs in the future. With only small networks that can be trained with affordable data and computational resources, F-RNG avoids the repetitive inference of large models under different light conditions. Compared with the state-of-the-art LRM-based relighting method, F-RNG achieves ~25x faster relighting, as well as superior quality (~+2.0 dB).
GR · May 6
PureSample: Neural Materials Learned by Sampling Microgeometry
Zixuan Li, Zixiong Wang, Jian Yang et al.
Traditional physically-based material models rely on analytically derived bidirectional reflectance distribution functions (BRDFs), typically by considering statistics of micro-primitives such as facets, flakes, or spheres, sometimes combined with multi-bounce interactions such as layering and multiple scattering. These derivations are often complex and model-specific. Once an analytic BRDF evaluation is defined, one still needs to design an importance sampling method for it and evaluate the probability density function (pdf) of that sampling distribution, requiring further model-specific derivations. We present PureSample: a novel neural BRDF representation that allows learning a material's appearance purely by sampling forward random walks on the microgeometry, which is usually straightforward to implement. Our representation allows for efficient BRDF evaluation, importance sampling, and pdf evaluation, for homogeneous as well as spatially varying materials. We achieve this by two learnable components: first, the sampling distribution is modeled using a flow matching neural network, which allows both importance sampling and pdf evaluation; second, we introduce a view-dependent albedo term, captured by a lightweight neural network, which allows for converting a pdf value to a BRDF value for any pair of view and light directions. We demonstrate PureSample on challenging materials, including various microgeometries, multi-layered materials, and multiple-scattering microfacet materials.
CV · Oct 25, 2023
EdgeCalib: Multi-Frame Weighted Edge Features for Automatic Targetless LiDAR-Camera Calibration
Xingchen Li, Yifan Duan, Beibei Wang et al.
In multimodal perception systems, achieving precise extrinsic calibration between LiDAR and camera is of critical importance. Previous calibration methods often required specific targets or manual adjustments, making them both labor-intensive and costly. Online calibration methods based on features have been proposed, but these methods encounter challenges such as imprecise feature extraction, unreliable cross-modality associations, and high scene-specific requirements. To address this, we introduce an edge-based approach for automatic online calibration of LiDAR and cameras in real-world scenarios. The edge features, which are prevalent in various environments, are aligned in both images and point clouds to determine the extrinsic parameters. Specifically, stable and robust image edge features are extracted using a SAM-based method and the edge features extracted from the point cloud are weighted through a multi-frame weighting strategy for feature filtering. Finally, accurate extrinsic parameters are optimized based on edge correspondence constraints. We conducted evaluations on both the KITTI dataset and our dataset. The results show a state-of-the-art rotation accuracy of 0.086° and a translation accuracy of 0.977 cm, outperforming existing edge-based calibration methods in both precision and robustness.
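To make the role of the extrinsic parameters concrete, the sketch below projects a single LiDAR point into the image with a standard pinhole model. It is an illustration only, not EdgeCalib's pipeline: the function name `project_lidar_point`, the row-major 3x3 rotation layout and the intrinsics `fx`, `fy`, `cx`, `cy` are assumptions made for the example.

```python
def project_lidar_point(point, rotation, translation, fx, fy, cx, cy):
    """Project a 3D LiDAR point into pixel coordinates.

    point:       (x, y, z) in the LiDAR frame.
    rotation:    3x3 row-major nested list (LiDAR -> camera), the rotational extrinsic.
    translation: length-3 list (LiDAR -> camera), the translational extrinsic.
    fx, fy, cx, cy: pinhole intrinsics (assumed known and distortion-free).
    Returns (u, v), or None if the point lies behind the camera.
    """
    # Rigid transform into the camera frame: p_cam = R * p + t.
    cam = [
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]
    if cam[2] <= 0.0:
        return None  # Not visible: behind or on the image plane.
    # Perspective division followed by the intrinsic mapping.
    u = fx * cam[0] / cam[2] + cx
    v = fy * cam[1] / cam[2] + cy
    return (u, v)


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pixel = project_lidar_point((1.0, 2.0, 4.0), identity, [0.0, 0.0, 0.0],
                            fx=100.0, fy=100.0, cx=50.0, cy=60.0)
print(pixel)  # (75.0, 110.0)
```

An edge-based calibrator would evaluate candidate rotations and translations by projecting LiDAR edge points this way and measuring how well they land on image edges.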
CV · May 23
SRUG: Shadow-Guided Relightable Urban Scene with Generation Model
Yonghao Zhao, Zexin Yin, Jian Yang et al.
Creating relightable urban scenes from images or videos is widely useful but highly ill-posed. Urban environments are typically unbounded and extend beyond the visible regions. As a result, many portions of the scene remain unobserved, yet these invisible regions can cast shadows onto visible areas. Reasonably modeling shadows cast by such invisible regions is challenging and poses a significant obstacle to creating relightable urban scenes. At the same time, sparse input views and complex illumination conditions further complicate relighting, as they introduce severe ambiguities in material decomposition. In this paper, we propose Shadow-guided Relightable Urban Scene with Generation model (SRUG), a novel framework designed to address relighting challenges in urban scenes. SRUG leverages shadows to guide a 3D completion model for recovering the geometry of invisible regions, promoting the synthesis of physically reasonable shadows. In addition, SRUG employs an iterative material decomposition scheme that applies the large material model (LMM) to provide material supervision and iteratively decompose the scene's material properties, enabling robust material decomposition. Building upon these components, we introduce a physically-based lighting model that captures the complex illumination of urban scenes and supports reliable relighting. Extensive quantitative evaluations and visual comparisons demonstrate that our method outperforms existing approaches in both novel view synthesis and relighting tasks.
CV · May 7
Relit-LiVE: Relight Video by Jointly Learning Environment Video
Weiqing Xiao, Hong Li, Xiuyu Yang et al.
Recent advances have shown that large-scale video diffusion models can be repurposed as neural renderers by first decomposing videos into intrinsic scene representations and then performing forward rendering under novel illumination. While promising, this paradigm fundamentally relies on accurate intrinsic decomposition, which remains highly unreliable for real-world videos and often leads to distorted appearances, broken materials, and accumulated temporal artifacts during relighting. In this work, we present Relit-LiVE, a novel video relighting framework that produces physically consistent, temporally stable results without requiring prior knowledge of camera pose. Our key insight is to explicitly introduce raw reference images into the rendering process, enabling the model to recover critical scene cues that are inevitably lost or corrupted in intrinsic representations. Furthermore, we propose a novel environment video prediction formulation that simultaneously generates relit videos and per-frame environment maps aligned with each camera viewpoint in a single diffusion process. This joint prediction enforces strong geometric-illumination alignment and naturally supports dynamic lighting and camera motion, significantly improving physical consistency in video relighting while easing the requirement of known per-frame camera pose. Extensive experiments demonstrate that Relit-LiVE consistently outperforms state-of-the-art video relighting and neural rendering methods across synthetic and real-world benchmarks. Beyond relighting, our framework naturally supports a wide range of downstream applications, including scene-level rendering, material editing, object insertion, and streaming video relighting. The Project is available at https://github.com/zhuxing0/Relit-LiVE.
CV · Sep 29, 2024
RNG: Relightable Neural Gaussians
Jiahui Fan, Fujun Luan, Jian Yang et al.
3D Gaussian Splatting (3DGS) has shown impressive results for the novel view synthesis task, where lighting is assumed to be fixed. However, creating relightable 3D assets, especially for objects with ill-defined shapes (fur, fabric, etc.), remains a challenging task. The decomposition between light, geometry, and material is ambiguous, especially if either smooth surface assumptions or surface-based analytical shading models do not apply. We propose Relightable Neural Gaussians (RNG), a novel 3DGS-based framework that enables the relighting of objects with either hard surfaces or soft boundaries, while avoiding assumptions on the shading model. We condition the radiance at each point on both view and light directions. We also introduce a shadow cue, as well as a depth refinement network to improve shadow accuracy. Finally, we propose a hybrid forward-deferred fitting strategy to balance geometry and appearance quality. Our method achieves significantly faster training (1.3 hours) and rendering (60 frames per second) compared to a prior method based on neural radiance fields and produces higher-quality shadows than a concurrent 3DGS-based method. Project page: https://www.whois-jiahui.fun/project_pages/RNG.
LG · Apr 13
Unified Graph Prompt Learning via Low-Rank Graph Message Prompting
Beibei Wang, Bo Jiang, Ziyan Zhang et al.
Graph Data Prompt (GDP), which introduces specific prompts in graph data for efficiently adapting pre-trained GNNs, has become a mainstream approach to the graph fine-tuning problem. However, existing GDPs have each been designed for a distinct graph component (e.g., node features, edge features, edge weights) and thus operate within limited prompt spaces for graph data. To the best of our knowledge, a unified prompter that targets all graph components simultaneously is still lacking. To address this challenge, we first propose to reinterpret a wide range of existing GDPs from the perspective of the Graph Message Prompt (GMP) paradigm. Based on GMP, we then introduce a novel graph prompt learning approach, termed Low-Rank GMP (LR-GMP), which leverages a low-rank prompt representation to achieve effective and compact graph prompt learning. Unlike traditional GDPs that target distinct graph components separately, LR-GMP concurrently performs prompting on all graph components in a unified manner, thereby achieving significantly superior generalization and robustness on diverse downstream tasks. Extensive experiments on several graph benchmark datasets demonstrate the effectiveness and advantages of our proposed LR-GMP.
NE · Sep 5, 2022
Spiking GATs: Learning Graph Attentions via Spiking Neural Network
Beibei Wang, Bo Jiang
Graph Attention Networks (GATs) have been intensively studied and widely used in graph data learning tasks. Existing GATs generally adopt the self-attention mechanism to conduct graph edge attention learning, requiring expensive computation. It is known that Spiking Neural Networks (SNNs) can perform inexpensive computation by converting the input signal data into discrete spike trains and can also return sparse outputs. Inspired by the merits of SNNs, in this work, we propose a novel Graph Spiking Attention Network (GSAT) for graph data representation and learning. In contrast to the self-attention mechanism in existing GATs, the proposed GSAT adopts an SNN module architecture, which is clearly energy-efficient. Moreover, GSAT naturally returns sparse attention coefficients and can thus perform feature aggregation on selected neighbors, which makes GSAT robust to graph edge noise. Experimental results on several datasets demonstrate the effectiveness, energy efficiency and robustness of the proposed GSAT model.
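How a spiking neuron turns a continuous input into a sparse, discrete spike train can be seen in a minimal integrate-and-fire sketch. This is a generic textbook-style neuron with a hard reset, not the GSAT module itself; the function name `integrate_and_fire` and the default threshold are assumptions for illustration.

```python
def integrate_and_fire(inputs, threshold=1.0):
    """Simulate one integrate-and-fire neuron over a sequence of time steps.

    inputs:    per-step input currents.
    threshold: membrane potential at which the neuron fires.
    Returns a list of 0/1 spikes, one per time step.
    """
    potential = 0.0
    spikes = []
    for current in inputs:
        potential += current  # Integrate the incoming signal.
        if potential >= threshold:
            spikes.append(1)  # Fire a spike...
            potential = 0.0   # ...and hard-reset the membrane potential.
        else:
            spikes.append(0)
    return spikes


# A weak constant input fires only occasionally, giving a sparse output.
train = integrate_and_fire([0.25] * 10)
print(train)  # [0, 0, 0, 1, 0, 0, 0, 1, 0, 0]
```

Weak inputs rarely cross the threshold, so most outputs are zero; this is the sparsity that spiking attention relies on to select only some neighbors.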
GR · Jul 21, 2025
Gaussian Splatting with Discretized SDF for Relightable Assets
Zuo-Liang Zhu, Jian Yang, Beibei Wang
3D Gaussian splatting (3DGS) has shown its detailed expressive ability and highly efficient rendering speed in the novel view synthesis (NVS) task. The application to inverse rendering still faces several challenges, as the discrete nature of Gaussian primitives makes it difficult to apply geometry constraints. Recent works introduce the signed distance field (SDF) as an extra continuous representation to regularize the geometry defined by Gaussian primitives. It improves the decomposition quality, at the cost of increasing memory usage and complicating training. Unlike these works, we introduce a discretized SDF to represent the continuous SDF in a discrete manner by encoding it within each Gaussian using a sampled value. This approach allows us to link the SDF with the Gaussian opacity through an SDF-to-opacity transformation, enabling rendering the SDF via splatting and avoiding the computational cost of ray marching. The key challenge is to regularize the discrete samples to be consistent with the underlying SDF, as the discrete representation can hardly apply the gradient-based constraints (e.g., Eikonal loss). For this, we project Gaussians onto the zero-level set of SDF and enforce alignment with the surface from splatting, namely a projection-based consistency loss. Thanks to the discretized SDF, our method achieves higher relighting quality, while requiring no extra memory beyond GS and avoiding complex manually designed optimization. The experiments reveal that our method outperforms existing Gaussian-based inverse rendering methods. Our code is available at https://github.com/NK-CS-ZZL/DiscretizedSDF.
LG · Sep 3, 2025
Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers
Xingyue Huang, Rishabh, Gregor Franke et al.
Recent advances in Large Language Models (LLMs) have shown that their reasoning capabilities can be significantly improved through Reinforcement Learning with Verifiable Reward (RLVR), particularly in domains like mathematics and programming, where ground-truth correctness can be automatically evaluated. However, extending this success to other reasoning-intensive domains remains challenging due to the scarcity of high-quality, verifiable datasets and the high cost of human supervision. In this work, we introduce the Loong Project: an open-source framework for scalable synthetic data generation and verification across a diverse range of reasoning-intensive domains. The framework consists of two key components: (1) LoongBench, a curated seed dataset containing 8,729 human-vetted examples across 12 domains (e.g., Advanced Mathematics, Chemistry, Logic), each paired with executable code and rich metadata; and (2) LoongEnv, a modular synthetic data generation environment that supports multiple prompting strategies to produce new question-answer-code triples. Together, these components form an agent-environment loop that enables reinforcement learning, where an LLM-based agent is rewarded for generating Chain-of-Thought (CoT) solutions that align with code-executed answers. Empirically, we benchmark LoongBench on a broad suite of both open-source and proprietary LLMs to evaluate domain coverage and reveal performance bottlenecks. In addition, we conduct a comprehensive analysis of synthetic data generated by LoongEnv, examining correctness, difficulty, and diversity. Code and documentation are available at https://github.com/camel-ai/loong.
CV · May 4, 2024
Woven Fabric Capture with a Reflection-Transmission Photo Pair
Yingjie Tang, Zixuan Li, Miloš Hašan et al.
Digitizing woven fabrics would be valuable for many applications, from digital humans to interior design. Previous work introduces a lightweight woven fabric acquisition approach by capturing a single reflection image and estimating the fabric parameters with a differentiable geometric and shading model. The renderings of the estimated fabric parameters can closely match the photo; however, the captured reflection image is insufficient to fully characterize the fabric sample reflectance. For instance, fabrics with different thicknesses might have similar reflection images but lead to significantly different transmission. We propose to recover the woven fabric parameters from two captured images: reflection and transmission. At the core of our method is a differentiable bidirectional scattering distribution function (BSDF) model, handling reflection and transmission, including single and multiple scattering. We propose a two-layer model, where the single scattering uses an SGGX phase function as in previous work, and multiple scattering uses a new azimuthally-invariant microflake definition, which we term ASGGX. This new fabric BSDF model closely matches real woven fabrics in both reflection and transmission. We use a simple setup for capturing reflection and transmission photos with a cell phone camera and two point lights, and estimate the fabric parameters via a lightweight network, together with a differentiable optimization. We also model the out-of-focus effects explicitly with a simple solution to match the thin-lens camera better. As a result, the renderings of the estimated parameters can agree with the input images on both reflection and transmission for the first time. The code for this paper is at https://github.com/lxtyin/FabricBTDF-Recovery.
CV · Jul 7, 2025
UGG-ReID: Uncertainty-Guided Graph Model for Multi-Modal Object Re-Identification
Xixi Wan, Aihua Zheng, Bo Jiang et al.
Multi-modal object Re-IDentification (ReID) has gained considerable attention with the goal of retrieving specific targets across cameras using heterogeneous visual data sources. At present, multi-modal object ReID faces two core challenges: (1) learning robust features under fine-grained local noise caused by occlusion, frame loss, and other disruptions; and (2) effectively integrating heterogeneous modalities to enhance multi-modal representation. To address the above challenges, we propose a robust approach named Uncertainty-Guided Graph model for multi-modal object ReID (UGG-ReID). UGG-ReID is designed to mitigate noise interference and facilitate effective multi-modal fusion by estimating both local and sample-level aleatoric uncertainty and explicitly modeling their dependencies. Specifically, we first propose the Gaussian patch-graph representation model that leverages uncertainty to quantify fine-grained local cues and capture their structural relationships. This process boosts the expressiveness of modal-specific information, ensuring that the generated embeddings are both more informative and robust. Subsequently, we design an uncertainty-guided mixture of experts strategy that dynamically routes samples to experts exhibiting low uncertainty. This strategy effectively suppresses noise-induced instability, leading to enhanced robustness. Meanwhile, we design an uncertainty-guided routing to strengthen the multi-modal interaction, improving the performance. UGG-ReID is comprehensively evaluated on five representative multi-modal object ReID datasets, encompassing diverse spectral modalities. Experimental results show that the proposed method achieves excellent performance on all datasets and is significantly better than current methods in terms of noise immunity. Our code is available at https://github.com/wanxixi11/UGG-ReID.
LG · Mar 29, 2025
MSNGO: multi-species protein function annotation based on 3D protein structure and network propagation
Beibei Wang, Boyue Cui, Shiqu Chen et al.
Motivation: In recent years, protein function prediction has broken through the bottleneck of sequence features, significantly improving prediction accuracy using high-precision protein structures predicted by AlphaFold2. While single-species protein function prediction methods have achieved remarkable success, multi-species protein function prediction methods are still in the stage of using PPI networks and sequence features. Providing effective cross-species label propagation for species with sparse protein annotations remains a challenging issue. To address this problem, we propose the MSNGO model, which integrates structural features and network propagation methods. Our validation shows that using structural features can significantly improve the accuracy of multi-species protein function prediction. Results: We employ graph representation learning techniques to extract amino acid representations from protein structure contact maps and train a structural model using a graph convolution pooling module to derive protein-level structural features. After incorporating the sequence features from ESM-2, we apply a network propagation algorithm to aggregate information and update node representations within a heterogeneous network. The results demonstrate that MSNGO outperforms previous multi-species protein function prediction methods that rely on sequence features and PPI networks. Availability: https://github.com/blingbell/MSNGO.
RO · May 7
Toward Visually Realistic Simulation: A Benchmark for Evaluating Robot Manipulation in Simulation
Yixin Zhu, Zixiong Wang, Jian Yang et al.
Reliable simulation evaluation of robot manipulation policies serves as a high-fidelity proxy for real-world performance. Although existing benchmarks cover a wide range of task categories, they lack visual realism, creating a large domain gap between simulation and reality. This undermines the reliability of simulation-based evaluation in predicting real-world performance. To mitigate the sim-to-real visual gap, we conduct a systematic analysis to isolate the effects of lighting and material. Our results show that these factors play a critical role in geometric reasoning and spatial grounding, yet are largely overlooked in existing benchmarks. Motivated by the analysis, we propose VISER, a visually realistic benchmark for evaluating robot manipulation in simulation. VISER features a high-fidelity dataset of over 1,000 3D assets with physically-based rendering (PBR) materials, along with 3D scenes created from these assets through curated layouts or generation. To this end, we propose an automated pipeline leveraging Multi-modal Large Language Models (MLLMs) for material-aware part segmentation and material retrieval, enabling scalable generation of physically plausible assets. Building on the high-fidelity 3D asset dataset, we construct diverse evaluation tasks, such as grasping, placing, and long-horizon tasks, enabling scalable and reproducible assessment of Vision-Language-Action (VLA) models. Our benchmark shows a strong correlation between simulation and real-world performance, achieving an average Pearson correlation coefficient of 0.92 across different policies.
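The Pearson correlation coefficient used above to compare simulation and real-world performance can be computed as in the sketch below. The success rates in the example are invented placeholders, not results from the benchmark, and the function name `pearson` is an assumption.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and standard deviations; the shared 1/n factor cancels out.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if std_x == 0.0 or std_y == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (std_x * std_y)


# Hypothetical per-policy success rates (placeholders for illustration only).
sim_success = [0.20, 0.45, 0.60, 0.80]
real_success = [0.15, 0.40, 0.65, 0.70]
r = pearson(sim_success, real_success)
print(round(r, 3))
```

A value close to 1 means that policies ranking well in simulation also tend to rank well on the real robot, which is what makes a simulated benchmark a usable proxy.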
CV · May 1
GOR-IS: 3D Gaussian Object Removal in the Intrinsic Space
Yonghao Zhao, Yupeng Gao, Jian Yang et al.
Recent advances in Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) have made it standard practice to reconstruct 3D scenes from multi-view images. Removing objects from such 3D representations is a fundamental editing task that requires complete and seamless inpainting of occluded regions, ensuring consistency in geometry and appearance. Although existing methods have made notable progress in improving inpainting consistency, they often neglect global lighting effects, leading to physically implausible results. Moreover, these methods struggle with view-dependent non-Lambertian surfaces, where appearance varies across viewpoints, leading to unreliable inpainting. In this paper, we present 3D Gaussian Object Removal in the Intrinsic Space (GOR-IS), a novel framework for physically consistent and visually coherent 3D object removal. Our approach decomposes the scene into intrinsic components and explicitly models light transport to maintain global lighting effects consistency. Furthermore, we introduce an intrinsic-space inpainting module that operates directly in the material and lighting domains, effectively addressing the challenges posed by non-Lambertian surfaces. Extensive experiments on both synthetic and real-world datasets demonstrate that our framework substantially improves the physical consistency and visual coherence of object removal, outperforming existing methods by 13% in perceptual similarity (LPIPS) and 2dB in peak signal-to-noise ratio (PSNR). Code is publicly available at https://applezyh.github.io/GOR-IS-project-page/
CV · Apr 4, 2024
CORP: A Multi-Modal Dataset for Campus-Oriented Roadside Perception Tasks
Beibei Wang, Shuang Meng, Lu Zhang et al.
Numerous roadside perception datasets have been introduced to propel advancements in autonomous driving and intelligent transportation systems research and development. However, the majority of them concentrate on urban arterial roads, inadvertently overlooking residential areas such as parks and campuses that exhibit entirely distinct characteristics. In light of this gap, we propose CORP, the first public benchmark dataset tailored for multi-modal roadside perception tasks under campus scenarios. Collected on a university campus, CORP consists of over 205k images plus 102k point clouds captured from 18 cameras and 9 LiDAR sensors. These sensors with different configurations are mounted on roadside utility poles to provide diverse viewpoints within the campus region. The annotations of CORP encompass multi-dimensional information beyond 2D and 3D bounding boxes, providing extra support for 3D seamless tracking and instance segmentation with unique IDs and pixel masks for identifying targets, to enhance the understanding of objects and their behaviors distributed across the campus premises. Unlike other roadside datasets about urban traffic, CORP extends the spectrum to highlight the challenges for multi-modal perception in campuses and other residential areas.
CVMay 4, 2024
Real-time Neural Woven Fabric RenderingXiang Chen, Lu Wang, Beibei Wang
Woven fabrics are widely used in applications of realistic rendering, where real-time capability is also essential. However, rendering realistic woven fabrics in real time is challenging due to their complex structure and optical appearance, which cause aliasing and noise without many samples. The core of this issue is a multi-scale representation of the fabric shading model, which allows for a fast range query. Some previous neural methods deal with the issue at the cost of training on each material, which limits their practicality. In this paper, we propose a lightweight neural network to represent different types of woven fabrics at different scales. Thanks to the regularity and repetitiveness of woven fabric patterns, our network can encode fabric patterns and parameters as a small latent vector, which is later interpreted by a small decoder, enabling the representation of different types of fabrics. By applying the pixel's footprint as input, our network achieves multi-scale representation. Moreover, our network is fast and occupies little storage because of its lightweight structure. As a result, our method achieves rendering and editing woven fabrics at nearly 60 frames per second on an RTX 3090, showing a quality close to the ground truth and being free from visible aliasing and noise.
CVFeb 1, 2024
mmID: High-Resolution mmWave Imaging for Human IdentificationSakila S. Jayaweera, Sai Deepika Regani, Yuqian Hu et al.
Achieving accurate human identification through RF imaging has been a persistent challenge, primarily attributed to the limited aperture size and its consequent impact on imaging resolution. The existing imaging solution enables tasks such as pose estimation, activity recognition, and human tracking based on deep neural networks by estimating skeleton joints. In contrast to estimating joints, this paper proposes to improve imaging resolution by estimating the human figure as a whole using conditional generative adversarial networks (cGAN). In order to reduce training complexity, we use an estimated spatial spectrum using the MUltiple SIgnal Classification (MUSIC) algorithm as input to the cGAN. Our system generates environmentally independent, high-resolution images that can extract unique physical features useful for human identification. We use a simple convolution layers-based classification network to obtain the final identification result. From the experimental results, we show that resolution of the image produced by our trained generator is high enough to enable human identification. Our finding indicates high-resolution accuracy with 5% mean silhouette difference to the Kinect device. Extensive experiments in different environments on multiple testers demonstrate that our system can achieve 93% overall test accuracy in unseen environments for static human target identification.
CVMay 22, 2024
GS-ROR$^2$: Bidirectional-guided 3DGS and SDF for Reflective Object Relighting and ReconstructionZuo-Liang Zhu, Beibei Wang, Jian Yang
3D Gaussian Splatting (3DGS) has shown a powerful capability for novel view synthesis due to its detailed expressive ability and highly efficient rendering speed. Unfortunately, creating relightable 3D assets and reconstructing faithful geometry with 3DGS is still problematic, particularly for reflective objects, as its discontinuous representation raises difficulties in constraining geometries. Volumetric signed distance field (SDF) methods provide robust geometry reconstruction, while the expensive ray marching hinders their real-time application and slows the training. Besides, these methods struggle to capture sharp geometric details. To this end, we propose to guide 3DGS and SDF bidirectionally in a complementary manner, including an SDF-aided Gaussian splatting for efficient optimization of the relighting model and a GS-guided SDF enhancement for high-quality geometry reconstruction. At the core of our SDF-aided Gaussian splatting is the mutual supervision of the depth and normal between blended Gaussians and SDF, which avoids the expensive volume rendering of SDF. Thanks to this mutual supervision, the learned blended Gaussians are well-constrained with a minimal time cost. As the Gaussians are rendered in a deferred shading mode, the alpha-blended Gaussians are smooth, while individual Gaussians may still be outliers, yielding floater artifacts. Therefore, we introduce an SDF-aware pruning strategy to remove Gaussian outliers located far from the surface defined by the SDF, avoiding the floater issue. This way, our GS framework provides reasonable normals and achieves realistic relighting, while the mesh from depth is still problematic. Therefore, we design a GS-guided SDF refinement, which utilizes the blended normal from Gaussians to finetune the SDF. With this enhancement, our method can further provide high-quality meshes for reflective objects at the cost of 17% extra training time.
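The SDF-aware pruning idea mentioned in the abstract above can be illustrated with a minimal sketch. It assumes an analytic unit-sphere SDF in place of a learned one, and a fixed distance threshold; the names `sphere_sdf`, `prune_by_sdf`, and `max_distance` are hypothetical and are not taken from the paper.

```python
import math


def sphere_sdf(point, radius=1.0):
    """Signed distance from a 3D point to a sphere centered at the origin.

    Stand-in for a learned SDF (assumption for illustration only).
    """
    return math.sqrt(sum(c * c for c in point)) - radius


def prune_by_sdf(centers, sdf, max_distance):
    """Keep only Gaussian centers whose absolute SDF value is within max_distance.

    Centers far from the zero level set are treated as floaters and dropped.
    """
    return [c for c in centers if abs(sdf(c)) <= max_distance]


# Hypothetical Gaussian centers: two near the surface, two far from it.
centers = [
    (1.0, 0.0, 0.0),   # on the surface
    (0.0, 1.05, 0.0),  # slightly outside
    (0.0, 0.0, 2.0),   # far outside (floater)
    (0.1, 0.0, 0.0),   # deep inside
]
kept = prune_by_sdf(centers, sphere_sdf, max_distance=0.1)
```

Whether the actual method thresholds on absolute distance, and how the threshold is chosen, is not stated in the abstract.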
SPMay 28, 2025
CSI-Bench: A Large-Scale In-the-Wild Dataset for Multi-task WiFi SensingGuozhen Zhu, Yuqian Hu, Weihang Gao et al.
WiFi sensing has emerged as a compelling contactless modality for human activity monitoring by capturing fine-grained variations in Channel State Information (CSI). Its ability to operate continuously and non-intrusively while preserving user privacy makes it particularly suitable for health monitoring. However, existing WiFi sensing systems struggle to generalize in real-world settings, largely due to datasets collected in controlled environments with homogeneous hardware and fragmented, session-based recordings that fail to reflect continuous daily activity. We present CSI-Bench, a large-scale, in-the-wild benchmark dataset collected using commercial WiFi edge devices across 26 diverse indoor environments with 35 real users. Spanning over 461 hours of effective data, CSI-Bench captures realistic signal variability under natural conditions. It includes task-specific datasets for fall detection, breathing monitoring, localization, and motion source recognition, as well as a co-labeled multitask dataset with joint annotations for user identity, activity, and proximity. To support the development of robust and generalizable models, CSI-Bench provides standardized evaluation splits and baseline results for both single-task and multi-task learning. CSI-Bench offers a foundation for scalable, privacy-preserving WiFi sensing systems in health and broader human-centric applications.
CVApr 9, 2025
SVG-IR: Spatially-Varying Gaussian Splatting for Inverse RenderingHanxiao Sun, YuPeng Gao, Jin Xie et al.
Reconstructing 3D assets from images, known as inverse rendering (IR), remains a challenging task due to its ill-posed nature. 3D Gaussian Splatting (3DGS) has demonstrated impressive capabilities for novel view synthesis (NVS) tasks. Methods apply it to relighting by separating radiance into BRDF parameters and lighting, yet produce inferior relighting quality with artifacts and unnatural indirect illumination due to the limited capability of each Gaussian, which has constant material parameters and normal, alongside the absence of physical constraints for indirect lighting. In this paper, we present a novel framework called Spatially-varying Gaussian Inverse Rendering (SVG-IR), aimed at enhancing both NVS and relighting quality. To this end, we propose a new representation, Spatially-varying Gaussian (SVG), that allows per-Gaussian spatially varying parameters. This enhanced representation is complemented by an SVG splatting scheme akin to vertex/fragment shading in traditional graphics pipelines. Furthermore, we integrate a physically-based indirect lighting model, enabling more realistic relighting. The proposed SVG-IR framework significantly improves rendering quality, outperforming state-of-the-art NeRF-based methods by 2.5 dB in peak signal-to-noise ratio (PSNR) and surpassing existing Gaussian-based techniques by 3.5 dB in relighting tasks, all while maintaining a real-time rendering speed.
LGOct 29, 2024
Reliable and Compact Graph Fine-tuning via GraphSparse PromptingBo Jiang, Hao Wu, Beibei Wang et al.
Recently, graph prompt learning has garnered increasing attention in adapting pre-trained GNN models for downstream graph learning tasks. However, existing works generally conduct prompting over all graph elements (e.g., nodes, edges, node attributes, etc.), which is suboptimal and obviously redundant. To address this issue, we propose exploiting sparse representation theory for graph prompting and present Graph Sparse Prompting (GSP). GSP aims to adaptively and sparsely select the optimal elements (e.g., certain node attributes) to achieve compact prompting for downstream tasks. Specifically, we propose two kinds of GSP models, termed Graph Sparse Feature Prompting (GSFP) and Graph Sparse multi-Feature Prompting (GSmFP). Both GSFP and GSmFP provide a general scheme for tuning any specific pre-trained GNNs that can achieve attribute selection and compact prompt learning simultaneously. A simple yet effective algorithm has been designed for solving GSFP and GSmFP models. Experiments on 16 widely-used benchmark datasets validate the effectiveness and advantages of the proposed GSFPs.
CVJan 4
Language as Prior, Vision as Calibration: Metric Scale Recovery for Monocular Depth EstimationMingxing Zhan, Li Zhang, Beibei Wang et al.
Relative-depth foundation models transfer well, yet monocular metric depth remains ill-posed due to unidentifiable global scale and heightened domain-shift sensitivity. Under a frozen-backbone calibration setting, we recover metric depth via an image-specific affine transform in inverse depth and train only lightweight calibration heads while keeping the relative-depth backbone and the CLIP text encoder fixed. Since captions provide coarse but noisy scale cues that vary with phrasing and missing objects, we use language to predict an uncertainty-aware envelope that bounds feasible calibration parameters in an unconstrained space, rather than committing to a text-only point estimate. We then use pooled multi-scale frozen visual features to select an image-specific calibration within this envelope. During training, a closed-form least-squares oracle in inverse depth provides per-image supervision for learning the envelope and the selected calibration. Experiments on NYUv2 and KITTI improve in-domain accuracy, while zero-shot transfer to SUN-RGBD and DDAD demonstrates improved robustness over strong language-only baselines.
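The closed-form least-squares fit in inverse depth mentioned above can be sketched as ordinary linear regression with two unknowns. This is a minimal illustration under stated assumptions: the relative prediction is given as per-pixel inverse depth, every pixel has valid ground truth, and the function names are hypothetical rather than the paper's.

```python
def fit_inverse_depth_affine(rel_inv_depth, metric_depth):
    """Return (scale, shift) minimizing sum((scale * r + shift - 1/d)^2).

    rel_inv_depth: relative inverse-depth values r (one per pixel).
    metric_depth:  ground-truth metric depths d (positive, same length).
    """
    targets = [1.0 / d for d in metric_depth]
    n = len(rel_inv_depth)
    mean_x = sum(rel_inv_depth) / n
    mean_y = sum(targets) / n
    var_x = sum((x - mean_x) ** 2 for x in rel_inv_depth)
    if var_x == 0.0:
        raise ValueError("relative inverse depth must not be constant")
    cov_xy = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(rel_inv_depth, targets)
    )
    scale = cov_xy / var_x
    shift = mean_y - scale * mean_x
    return scale, shift


def apply_calibration(rel_inv_depth, scale, shift):
    """Map relative inverse depth to metric depth with the affine calibration."""
    return [1.0 / (scale * x + shift) for x in rel_inv_depth]


# Synthetic check: build metric depth from a known affine transform.
rel = [0.1, 0.4, 0.7, 1.0]
true_scale, true_shift = 0.5, 0.05
metric = [1.0 / (true_scale * x + true_shift) for x in rel]
scale, shift = fit_inverse_depth_affine(rel, metric)
recovered = apply_calibration(rel, scale, shift)
```

On real data the fit would normally be restricted to pixels with valid depth, and the resulting per-image pair would serve only as a supervision target, as the abstract describes.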
CVMar 7
FabricGen: Microstructure-Aware Woven Fabric GenerationYingjie Tang, Di Luo, Zixiong Wang et al.
Woven fabric materials are widely used in rendering applications, yet designing realistic examples typically involves multiple stages, requiring expertise in weaving principles and texture authoring. Recent advances have explored diffusion models to streamline this process; however, pre-trained diffusion models often struggle to generate intricate yarn-level details that conform to weaving rules. To address this, we present FabricGen, an end-to-end framework for generating high-quality woven fabric materials from textual descriptions. A key insight of our method is the decomposition of macro-scale textures and micro-scale weaving patterns. To generate macro-scale textures free from microstructures, we fine-tune pre-trained diffusion models on a collected dataset of microstructure-free fabrics. As for micro-scale weaving patterns, we develop an enhanced procedural geometric model capable of synthesizing natural yarn-level geometry with yarn sliding and flyaway fibers. The procedural model is driven by a specialized large language model, WeavingLLM, which is fine-tuned on an annotated dataset of formatted weaving drafts, and prompt-tuned with domain-specific fabric expertise. Through fine-tuning and prompt tuning, WeavingLLM learns to design weaving drafts and fabric parameters from textual prompts, enabling the procedural model to produce diverse weaving patterns that stick to weaving principles. The generated macro-scale texture, along with the micro-scale geometry, can be used for fabric rendering. Consequently, our framework produces materials with significantly richer detail and realism compared to prior generative models.
LGNov 24, 2025
Robust and Generalizable GNN Fine-Tuning via Uncertainty-aware Adapter LearningBo Jiang, Weijun Zhao, Beibei Wang et al.
Recently, fine-tuning large-scale pre-trained GNNs has attracted remarkable attention in adapting pre-trained GNN models for downstream graph learning tasks. One representative fine-tuning method is to exploit an adapter (termed AdapterGNN), which aims to 'augment' the pre-trained model by inserting a lightweight module to make the 'augmented' model better adapt to the downstream tasks. However, graph data may contain various types of noise in downstream tasks, such as noisy edges and ambiguous node attributes. Existing AdapterGNNs are often prone to graph noise and exhibit limited generalizability. How to enhance the robustness and generalization ability of GNN fine-tuning remains an open problem. In this paper, we show that the above problem can be well addressed by integrating uncertainty learning into the GNN adapter. We propose the Uncertainty-aware Adapter (UAdapterGNN) that fortifies pre-trained GNN models against noisy graph data in the fine-tuning process. Specifically, in contrast to regular AdapterGNN, our UAdapterGNN exploits a Gaussian probabilistic adapter to augment the pre-trained GNN model. In this way, when the graph contains various kinds of noise, our method can automatically absorb the effects of changes in the variances of the Gaussian distribution, thereby significantly enhancing the model's robustness. Also, UAdapterGNN can further improve the generalization ability of the model on the downstream tasks. Extensive experiments on several benchmarks demonstrate the effectiveness, robustness and high generalization ability of the proposed UAdapterGNN method.
CVNov 21, 2025
MatPedia: A Universal Generative Foundation for High-Fidelity Material SynthesisDi Luo, Shuhui Yang, Mingxin Yang et al.
Physically-based rendering (PBR) materials are fundamental to photorealistic graphics, yet their creation remains labor-intensive and requires specialized expertise. While generative models have advanced material synthesis, existing methods lack a unified representation bridging natural image appearance and PBR properties, leading to fragmented task-specific pipelines and inability to leverage large-scale RGB image data. We present MatPedia, a foundation model built upon a novel joint RGB-PBR representation that compactly encodes materials into two interdependent latents: one for RGB appearance and one for the four PBR maps encoding complementary physical properties. By formulating them as a 5-frame sequence and employing video diffusion architectures, MatPedia naturally captures their correlations while transferring visual priors from RGB generation models. This joint representation enables a unified framework handling multiple material tasks--text-to-material generation, image-to-material generation, and intrinsic decomposition--within a single architecture. Trained on MatHybrid-410K, a mixed corpus combining PBR datasets with large-scale RGB images, MatPedia achieves native $1024\times1024$ synthesis that substantially surpasses existing approaches in both quality and diversity.
CVAug 9, 2025
HiMat: DiT-based Ultra-High Resolution SVBRDF GenerationZixiong Wang, Jian Yang, Yiwei Hu et al.
Creating ultra-high-resolution spatially varying bidirectional reflectance functions (SVBRDFs) is critical for photorealistic 3D content creation, to faithfully represent fine-scale surface details required for close-up rendering. However, achieving 4K generation faces two key challenges: (1) the need to synthesize multiple reflectance maps at full resolution, which multiplies the pixel budget and imposes prohibitive memory and computational cost, and (2) the requirement to maintain strong pixel-level alignment across maps at 4K, which is particularly difficult when adapting pretrained models designed for the RGB image domain. We introduce HiMat, a diffusion-based framework tailored for efficient and diverse 4K SVBRDF generation. To address the first challenge, HiMat performs generation in a high-compression latent space via DC-AE, and employs a pretrained diffusion transformer with linear attention to improve per-map efficiency. To address the second challenge, we propose CrossStitch, a lightweight convolutional module that enforces cross-map consistency without incurring the cost of global attention. Our experiments show that HiMat achieves high-fidelity 4K SVBRDF generation with superior efficiency, structural consistency, and diversity compared to prior methods. Beyond materials, our framework also generalizes to related applications such as intrinsic decomposition.
CVAug 9, 2025
WeatherDiffusion: Weather-Guided Diffusion Model for Forward and Inverse RenderingYixin Zhu, Zuoliang Zhu, Miloš Hašan et al.
Forward and inverse rendering have emerged as key techniques for enabling understanding and reconstruction in the context of autonomous driving (AD). However, complex weather and illumination pose great challenges to this task. The emergence of large diffusion models has shown promise in achieving reasonable results through learning from 2D priors, but these models are difficult to control and lack robustness. In this paper, we introduce WeatherDiffusion, a diffusion-based framework for forward and inverse rendering on AD scenes with various weather and lighting conditions. Our method enables authentic estimation of material properties, scene geometry, and lighting, and further supports controllable weather and illumination editing through the use of predicted intrinsic maps guided by text descriptions. We observe that different intrinsic maps should correspond to different regions of the original image. Based on this observation, we propose Intrinsic map-aware attention (MAA) to enable high-quality inverse rendering. Additionally, we introduce a synthetic dataset (i.e., WeatherSynthetic) and a real-world dataset (i.e., WeatherReal) for forward and inverse rendering on AD scenes with diverse weather and lighting. Extensive experiments show that our WeatherDiffusion outperforms state-of-the-art methods on several benchmarks. Moreover, our method demonstrates significant value in downstream tasks for AD, enhancing the robustness of object detection and image segmentation in challenging weather scenarios.
CVJul 14, 2025
Beyond Graph Model: Reliable VLM Fine-Tuning via Random Graph AdapterBo Jiang, Xueyang Ze, Beibei Wang et al.
Textual adapter-based tuning methods have shown significant potential in transferring knowledge from pre-trained Vision-Language Models (VLMs) to downstream tasks. Existing works generally employ a deterministic textual feature adapter to refine each category's textual representation. However, due to inherent factors such as different attributes and contexts, there exists significant diversity in textual descriptions for each category. Such description diversity offers rich discriminative semantic knowledge that can benefit downstream visual learning tasks. The traditional deterministic adapter model cannot adequately capture this varied semantic information. Also, it is desirable to exploit the inter-class relationships in the VLM adapter. To address these issues, we propose to incorporate a random graph model into the VLM adapter and develop a novel Vertex Random Graph Adapter (VRGAdapter). VRGAdapter first models the inherent diverse descriptions of each category and inter-class relationships of different categories simultaneously by leveraging a Vertex Random Knowledge Graph (VRKG) model. Then, it employs probabilistic message propagation on VRKG to learn a context-aware distribution representation for each class node. Finally, it adopts a reparameterized sampling function to achieve textual adapter learning. Note that VRGAdapter provides a more general adapter solution that encompasses the traditional graph-based adapter as a special case. In addition, to enable more robust performance for downstream tasks, we also introduce a new Uncertainty-guided Multi-branch Fusion (UMF) scheme that dynamically integrates multiple pre-trained models for ensemble prediction. Extensive experiments on multiple benchmark datasets demonstrate the effectiveness of our approach.
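The reparameterized sampling step named in the abstract above is commonly written as z = mean + sigma * eps with eps drawn from a standard normal. The sketch below shows that generic trick for a diagonal Gaussian; the log-variance parameterization and the name `reparameterized_sample` are assumptions for illustration, not details confirmed by the abstract.

```python
import math
import random


def reparameterized_sample(mean, log_var, rng):
    """Draw z = mean + sigma * eps for a diagonal Gaussian.

    mean, log_var: per-dimension mean and log-variance (same length).
    rng: a random.Random instance supplying eps ~ N(0, 1).
    Randomness is isolated in eps, so in an autodiff framework gradients
    could flow through mean and log_var.
    """
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mean, log_var)
    ]


rng = random.Random(0)
mean = [0.2, -1.0, 0.5]
log_var = [-2.0, -2.0, -2.0]  # sigma = exp(-1), roughly 0.37

samples = [reparameterized_sample(mean, log_var, rng) for _ in range(2000)]
empirical_mean = [
    sum(s[i] for s in samples) / len(samples) for i in range(len(mean))
]

# With a vanishing variance the sample collapses onto the mean,
# which recovers a deterministic adapter as a limiting case.
collapsed = reparameterized_sample(mean, [-200.0] * 3, rng)
```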
CVApr 21, 2025
VistaDepth: Improving far-range Depth Estimation with Spectral Modulation and Adaptive ReweightingMingxia Zhan, Li Zhang, Yingjie Wang et al.
Monocular depth estimation (MDE) aims to infer per-pixel depth from a single RGB image. While diffusion models have advanced MDE with impressive generalization, they often exhibit limitations in accurately reconstructing far-range regions. This difficulty arises from two key challenges. First, the implicit multi-scale processing in standard spatial-domain models can be insufficient for preserving the fine-grained, high-frequency details crucial for distant structures. Second, the intrinsic long-tail distribution of depth data imposes a strong training bias towards more prevalent near-range regions. To address these, we propose VistaDepth, a novel diffusion framework designed for balanced and accurate depth perception. We introduce two key innovations. First, the Latent Frequency Modulation (LFM) module enhances the model's ability to represent high-frequency details. It operates by having a lightweight network predict a dynamic, content-aware spectral filter to refine latent features, thereby improving the reconstruction of distant structures. Second, our BiasMap mechanism introduces an adaptive reweighting of the diffusion loss strategically scaled across diffusion timesteps. It further aligns the supervision with the progressive denoising process, establishing a more consistent learning signal. As a result, it mitigates data bias without sacrificing training stability. Experiments show that VistaDepth achieves state-of-the-art performance for diffusion-based MDE, particularly excelling in reconstructing detailed and accurate depth in far-range regions.
LGJun 21, 2024
Graph Edge Representation via Tensor Product Graph Convolutional RepresentationBo Jiang, Sheng Ge, Ziyan Zhang et al.
Graph Convolutional Networks (GCNs) have been widely studied. The core of GCNs is the definition of convolution operators on graphs. However, existing Graph Convolution (GC) operators are mainly defined on adjacency matrix and node features and generally focus on obtaining effective node embeddings which cannot be utilized to address the graphs with (high-dimensional) edge features. To address this problem, by leveraging tensor contraction representation and tensor product graph diffusion theories, this paper analogously defines an effective convolution operator on graphs with edge features which is named as Tensor Product Graph Convolution (TPGC). The proposed TPGC aims to obtain effective edge embeddings. It provides a complementary model to traditional graph convolutions (GCs) to address the more general graph data analysis with both node and edge features. Experimental results on several graph learning tasks demonstrate the effectiveness of the proposed TPGC.
LGJun 15, 2024
A Unified Graph Selective Prompt Learning for Graph Neural NetworksBo Jiang, Hao Wu, Ziyan Zhang et al.
In recent years, graph prompt learning/tuning has garnered increasing attention in adapting pre-trained models for graph representation learning. As a kind of universal graph prompt learning method, Graph Prompt Feature (GPF) has achieved remarkable success in adapting pre-trained models for Graph Neural Networks (GNNs). By fixing the parameters of a pre-trained GNN model, the aim of GPF is to modify the input graph data by adding some (learnable) prompt vectors into graph node features to better align with the downstream tasks on the smaller dataset. However, existing GPFs generally suffer from two main limitations. First, GPFs generally focus on node prompt learning and ignore prompting for graph edges. Second, existing GPFs generally conduct prompt learning on all nodes equally, which fails to capture the importance of different nodes and may be sensitive to noisy nodes when aligning with the downstream tasks. To address these issues, in this paper, we propose a new unified Graph Selective Prompt Feature learning (GSPF) for GNN fine-tuning. The proposed GSPF integrates prompt learning on both graph nodes and edges, thus providing a unified prompt model for the graph data. Moreover, it conducts prompt learning selectively on nodes and edges by concentrating on the important nodes and edges for prompting, which makes our model more reliable and compact. Experimental results on many benchmark datasets demonstrate the effectiveness and advantages of the proposed GSPF method.
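The contrast between uniform and selective node prompting described above can be made concrete with a small sketch. It assumes a single shared prompt vector and a per-node importance weight in [0, 1]; how GSPF actually computes node and edge importance is not given in the abstract, so `node_weights` here is a hypothetical input.

```python
def apply_selective_prompt(node_features, prompt, node_weights):
    """Return prompted features x_i + w_i * p for every node i.

    node_features: list of feature vectors, one per node.
    prompt:        shared prompt vector p (same length as a feature vector).
    node_weights:  per-node importance w_i; all ones gives uniform
                   GPF-style prompting, zero leaves a node untouched.
    """
    return [
        [x + w * p for x, p in zip(features, prompt)]
        for features, w in zip(node_features, node_weights)
    ]


features = [[1.0, 2.0], [3.0, 4.0]]
prompt = [0.5, -0.5]

uniform = apply_selective_prompt(features, prompt, [1.0, 1.0])
selective = apply_selective_prompt(features, prompt, [1.0, 0.0])
```

In this toy setup the second node is skipped entirely under selective prompting, which is the kind of behavior that could reduce sensitivity to noisy nodes.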
LGFeb 18, 2022
Generalizing Aggregation Functions in GNNs: High-Capacity GNNs via Nonlinear Neighborhood AggregatorsBeibei Wang, Bo Jiang
Graph neural networks (GNNs) have achieved great success in many graph learning tasks. The main aspect powering existing GNNs is the multi-layer network architecture to learn the nonlinear graph representations for the specific learning tasks. The core operation in GNNs is message propagation in which each node updates its representation by aggregating its neighbors' representations. Existing GNNs mainly adopt either linear neighborhood aggregation (mean, sum) or max aggregator in their message propagation. (1) For linear aggregators, the overall nonlinearity and capacity of GNNs are generally limited because deeper GNNs usually suffer from the over-smoothing issue. (2) The max aggregator usually fails to capture the detailed information of node representations within a neighborhood. To overcome these issues, we re-think the message propagation mechanism in GNNs and aim to develop general nonlinear aggregators for neighborhood information aggregation in GNNs. One main aspect of our proposed nonlinear aggregators is that they provide the optimally balanced aggregators between max and mean/sum aggregations. Thus, our aggregators can inherit both (i) high nonlinearity that increases the network's capacity and (ii) detail-sensitivity that preserves the detailed information of representations together in GNNs' message propagation. Promising experiments on several datasets show the effectiveness of the proposed nonlinear aggregators.
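One well-known way to interpolate between mean and max aggregation is a softmax-weighted average with a temperature parameter. The sketch below uses that generic construction as an illustration; whether it matches the aggregators proposed in the paper is an assumption, and `softmax_aggregate` and `beta` are hypothetical names.

```python
import math


def softmax_aggregate(values, beta):
    """Aggregate scalar neighbor values with weights proportional to exp(beta * v).

    beta = 0 reduces to the mean; as beta grows the result approaches the max.
    The maximum is subtracted inside exp for numerical stability.
    """
    if not values:
        raise ValueError("need at least one neighbor value")
    peak = max(values)
    weights = [math.exp(beta * (v - peak)) for v in values]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


neighbors = [1.0, 2.0, 4.0]
as_mean = softmax_aggregate(neighbors, beta=0.0)
in_between = softmax_aggregate(neighbors, beta=1.0)
near_max = softmax_aggregate(neighbors, beta=50.0)
```

For vector-valued node features the same function would be applied per feature dimension.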
GRNov 6, 2021
Neural BRDFs: Representation and OperationsJiahui Fan, Beibei Wang, Miloš Hašan et al.
Bidirectional reflectance distribution functions (BRDFs) are pervasively used in computer graphics to produce realistic physically-based appearance. In recent years, several works explored using neural networks to represent BRDFs, taking advantage of neural networks' high compression rate and their ability to fit highly complex functions. However, once represented, the BRDFs will be fixed and therefore lack flexibility to take part in follow-up operations. In this paper, we present a form of "Neural BRDF algebra", and focus on both representation and operations of BRDFs at the same time. We propose a representation neural network to compress BRDFs into latent vectors, which is able to represent BRDFs accurately. We further propose several operations that can be applied solely in the latent space, such as layering and interpolation. Spatial variation is straightforward to achieve by using textures of latent vectors. Furthermore, our representation can be efficiently evaluated and sampled, providing a competitive solution to more expensive Monte Carlo layering approaches.
CVOct 29, 2021
SVBRDF Recovery From a Single Image With Highlights using a Pretrained Generative Adversarial NetworkTao Wen, Beibei Wang, Lei Zhang et al.
Spatially-varying bi-directional reflectance distribution functions (SVBRDFs) are crucial for designers to incorporate new materials in virtual scenes, making them look more realistic. Reconstruction of SVBRDFs is a long-standing problem. Existing methods either rely on an extensive acquisition system or require huge datasets which are nontrivial to acquire. We aim to recover SVBRDFs from a single image, without any datasets. A single image contains incomplete information about the SVBRDF, making the reconstruction task highly ill-posed. It is also difficult to separate the changes in color that are caused by the material from those caused by the illumination, without the prior knowledge learned from the dataset. In this paper, we use an unsupervised generative adversarial neural network (GAN) to recover SVBRDF maps with a single image as input. To better separate the effects due to illumination from the effects due to the material, we add the hypothesis that the material is stationary and introduce a new loss function based on Fourier coefficients to enforce this stationarity. For efficiency, we train the network in two stages: reusing a trained model to initialize the SVBRDFs and fine-tuning it based on the input image. Our method generates high-quality SVBRDF maps from a single input photograph, and provides more vivid rendering results compared to previous work. The two-stage training boosts runtime performance, making it 8 times faster than previous work.
SPAug 6, 2021
RadioMic: Sound Sensing via mmWave SignalsMuhammed Zahid Ozturk, Chenshu Wu, Beibei Wang et al.
Voice interfaces have become an integral part of our lives, with the proliferation of smart devices. Today, IoT devices mainly rely on microphones to sense sound. Microphones, however, have fundamental limitations, such as weak source separation, limited range in the presence of acoustic insulation, and being prone to multiple side-channel attacks. In this paper, we propose RadioMic, a radio-based sound sensing system to mitigate these issues and enrich sound applications. RadioMic constructs sound based on tiny vibrations on active sources (e.g., a speaker or human throat) or object surfaces (e.g., paper bag), and can work through walls, even a soundproof one. To convert the extremely weak sound vibration in the radio signals into sound signals, RadioMic introduces radio acoustics, and presents training-free approaches for robust sound detection and high-fidelity sound recovery. It then exploits a neural network to further enhance the recovered sound by expanding the recoverable frequencies and reducing noise. RadioMic translates massive amounts of online audio to synthesized data to train the network, and thus minimizes the need for RF data. We thoroughly evaluate RadioMic under different scenarios using a commodity mmWave radar. The results show RadioMic outperforms the state-of-the-art systems significantly. We believe RadioMic provides new horizons for sound sensing and inspires attractive sensing capabilities of mmWave sensing devices.
CHEM-PHNov 6, 2020
Physics-informed Neural-Network Software for Molecular Dynamics ApplicationsTaufeq Mohammed Razakh, Beibei Wang, Shane Jackson et al.
We have developed a novel differential equation solver software called PND based on the physics-informed neural network for molecular dynamics simulators. Based on the automatic differentiation provided by PyTorch, our software allows users to flexibly implement equations of atomic motion, initial and boundary conditions, and conservation laws as loss functions to train the network. PND comes with a parallel molecular dynamics (MD) engine so that users can examine and optimize the loss function design, conservation laws, boundary conditions, and hyperparameters, thereby accelerating PINN-based development for molecular applications.
CVSep 4, 2019
GmCN: Graph Mask Convolutional NetworkBo Jiang, Beibei Wang, Jin Tang et al.
Graph Convolutional Networks (GCNs) have proven very powerful for graph data representation and learning tasks. Existing GCNs usually conduct feature aggregation on a fixed neighborhood graph in which each node computes its representation by aggregating the feature representations of all its neighbors, which is biased by its own representation. However, this fixed aggregation strategy is not guaranteed to be optimal for GCN-based graph learning and can also be affected by graph structure noise, such as incorrect or undesired edge connections. To address these issues, we propose a novel Graph mask Convolutional Network (GmCN) in which nodes can adaptively select the optimal neighbors in their feature aggregation to better serve GCN learning. GmCN can be theoretically interpreted by a regularization framework, based on which we derive a simple update algorithm to determine the optimal mask adaptively during GmCN training. Experiments on several datasets validate the effectiveness of GmCN.
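The idea of letting each node select neighbors through a mask can be sketched as a masked, normalized neighborhood average. This is an illustration under assumptions: the mask is supplied as a fixed matrix with entries in [0, 1] rather than learned, and the normalization is a simple weighted mean, neither of which is specified in the abstract. The name `masked_aggregate` is hypothetical.

```python
def masked_aggregate(adjacency, mask, features):
    """Average neighbor features using edge weights adjacency[i][j] * mask[i][j].

    adjacency: n x n matrix of 0/1 edge indicators.
    mask:      n x n matrix in [0, 1]; 0 removes an edge from aggregation.
    features:  list of n feature vectors.
    A node with no surviving neighbors keeps its own features.
    """
    n = len(features)
    dim = len(features[0])
    output = []
    for i in range(n):
        weights = [adjacency[i][j] * mask[i][j] for j in range(n)]
        total = sum(weights)
        if total == 0:
            output.append(list(features[i]))
            continue
        output.append(
            [
                sum(weights[j] * features[j][k] for j in range(n)) / total
                for k in range(dim)
            ]
        )
    return output


adjacency = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
features = [[0.0], [2.0], [10.0]]

all_ones = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
drop_edge = [[1, 1, 0], [1, 1, 1], [1, 1, 1]]  # node 0 ignores node 2

plain = masked_aggregate(adjacency, all_ones, features)
masked = masked_aggregate(adjacency, drop_edge, features)
```

With the all-ones mask this reduces to a fixed-neighborhood mean; zeroing one entry shows how an undesired edge stops influencing a node's representation.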