Jia Lu

CV
h-index 35
12 papers
73 citations
Novelty 55%
AI Score 54

12 Papers

CR · Apr 10
S3CDM: A secret-sharing-scheme-based cyberattack detection model and its simulation implementation

Chi Sing Chum, Jia Lu, Claire Tang et al.

We design and develop a secret-sharing-scheme-based cyberattack detection model (S3CDM) that can detect unauthorized or illegal activities (especially insider attacks) and protect sensitive information within the complex network infrastructures of large organizations. The model splits a secret among a group of legitimate participants or components for authentication, integration, and detection of unauthorized activities. The model uses both the traditional Shamir polynomial-interpolation-based scheme and our own hash-function-based scheme; both are practical and efficient, ensuring that communications between components are secure and that any unauthorized activity can be detected. The model offers a flexible multi-factor authentication method to enhance overall system security. Probability analysis [3] shows that a multiple-component model is more resistant to cyberattacks than a single-component one. To demonstrate feasibility, we implement S3CDM in three parts on Google Cloud Platform: the front-end UI (user interface) running on an HTTP server, the back-end individual services written in Python, and a PostgreSQL database. Docker is used to manage the start and stop of individual services and their URLs. We demonstrate in detail how to use the UI with a use case that simulates a broken path.
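
The abstract above mentions Shamir's polynomial-interpolation scheme as one of the two secret-sharing methods. As a rough illustration of that general technique only (not the S3CDM implementation, and not the authors' hash-based scheme), a minimal (t, n) threshold sketch might look like the following. The field prime, the function names `split_secret` and `recover_secret`, and the share format are assumptions made for this example.

```python
import secrets

# Assumption for this sketch: arithmetic is done modulo the Mersenne prime 2^127 - 1.
PRIME = 2**127 - 1


def split_secret(secret, threshold, num_shares, prime=PRIME):
    """Split `secret` into `num_shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < prime:
        raise ValueError("secret must lie in [0, prime)")
    if not 1 <= threshold <= num_shares:
        raise ValueError("need 1 <= threshold <= num_shares")
    # Random polynomial of degree threshold - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(prime) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of the polynomial at x
            y = (y * x + c) % prime
        shares.append((x, y))
    return shares


def recover_secret(shares, prime=PRIME):
    """Recover the constant term by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % prime
                den = (den * (xi - xj)) % prime
        # pow(den, -1, prime) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, prime)) % prime
    return secret


if __name__ == "__main__":
    shares = split_secret(123456789, threshold=3, num_shares=5)
    print(recover_secret(shares[:3]) == 123456789)
```

Any subset of at least `threshold` shares reconstructs the secret; how S3CDM maps shares to components and authentication factors is described in the paper, not here.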

SP · Apr 2, 2025 · Code
Decoding Covert Speech from EEG Using a Functional Areas Spatio-Temporal Transformer

Muyun Jiang, Yi Ding, Wei Zhang et al.

Covert speech involves imagining speaking without audible sound or any movements. Decoding covert speech from electroencephalogram (EEG) is challenging due to a limited understanding of neural pronunciation mapping and the low signal-to-noise ratio of the signal. In this study, we developed a large-scale multi-utterance speech EEG dataset from 57 right-handed native English-speaking subjects, each performing covert and overt speech tasks by repeating the same word in five utterances within a ten-second duration. Given the spatio-temporal nature of the neural activation process during speech pronunciation, we developed a Functional Areas Spatio-temporal Transformer (FAST), an effective framework for converting EEG signals into tokens and utilizing transformer architecture for sequence encoding. Our results reveal distinct and interpretable speech neural features by the visualization of FAST-generated activation maps across frontal and temporal brain regions with each word being covertly spoken, providing new insights into the discriminative features of the neural representation of covert speech. This is the first report of such a study, which provides interpretable evidence for speech decoding from EEG. The code for this work has been made public at https://github.com/Jiang-Muyun/FAST

AI · May 9
SynerDiff: Synergetic Continuous Batching for Fast and Parallel Diffusion Model Inference

Ziqi Zhou, Peng Yang, Yuxin Liang et al.

The expansion of Artificial Intelligence-generated content services requires diffusion model serving to simultaneously achieve high throughput and low task end-to-end (E2E) latency. However, existing continuous batching methods suffer from severe resource contention during UNet-VAE concurrency, leading to latency spikes. Furthermore, concurrent multi-task scheduling entails a trade-off between UNet throughput and VAE latency across varying scheduling strategies. To address these issues, we propose SynerDiff, an efficient continuous batching system built on intra-inter level synergy. At the intra-concurrency level, SynerDiff alleviates resource contention by pruning component-specific resource bottlenecks via VAE Chunking and Adaptive Skip-CFG. At the inter-concurrency level, leveraging components' differential sensitivity to scheduling granularities, a threshold-aware scheduler plans concurrent sequences and tunes intra-concurrency decisions to minimize VAE latency while keeping UNet within a high-throughput threshold. Additionally, a feedback controller dynamically adjusts this threshold based on queue loads to raise the system's capacity ceiling. Experimental results show that SynerDiff improves throughput by 1.6× and decreases both average E2E and P99 tail latencies by up to 78.7%, compared to benchmarks, while guaranteeing high image fidelity.

RO · Feb 12
ABot-N0: Technical Report on the VLA Foundation Model for Versatile Embodied Navigation

Zedong Chu, Shichao Xie, Xiaolong Wu et al.

Embodied navigation has long been fragmented by task-specific architectures. We introduce ABot-N0, a unified Vision-Language-Action (VLA) foundation model that achieves a "Grand Unification" across 5 core tasks: Point-Goal, Object-Goal, Instruction-Following, POI-Goal, and Person-Following. ABot-N0 utilizes a hierarchical "Brain-Action" architecture, pairing an LLM-based Cognitive Brain for semantic reasoning with a Flow Matching-based Action Expert for precise, continuous trajectory generation. To support large-scale learning, we developed the ABot-N0 Data Engine, curating 16.9M expert trajectories and 5.0M reasoning samples across 7,802 high-fidelity 3D scenes (10.7 km²). ABot-N0 achieves new SOTA performance across 7 benchmarks, significantly outperforming specialized models. Furthermore, our Agentic Navigation System integrates a planner with hierarchical topological memory, enabling robust, long-horizon missions in dynamic real-world environments.

CV · Sep 29, 2025
UniLat3D: Geometry-Appearance Unified Latents for Single-Stage 3D Generation

Guanjun Wu, Jiemin Fang, Chen Yang et al.

High-fidelity 3D asset generation is crucial for various industries. While recent 3D pretrained models show strong capability in producing realistic content, most are built upon diffusion models and follow a two-stage pipeline that first generates geometry and then synthesizes appearance. Such a decoupled design tends to produce geometry-texture misalignment and non-negligible cost. In this paper, we propose UniLat3D, a unified framework that encodes geometry and appearance in a single latent space, enabling direct single-stage generation. Our key contribution is a geometry-appearance Unified VAE, which compresses high-resolution sparse features into a compact latent representation -- UniLat. UniLat integrates structural and visual information into a dense low-resolution latent, which can be efficiently decoded into diverse 3D formats, e.g., 3D Gaussians and meshes. Based on this unified representation, we train a single flow-matching model to map Gaussian noise directly into UniLat, eliminating redundant stages. Trained solely on public datasets, UniLat3D produces high-quality 3D assets in seconds from a single image, achieving superior appearance fidelity and geometric quality. More demos & code are available at https://unilat3d.github.io/

GR · Aug 20, 2025
Snap-Snap: Taking Two Images to Reconstruct 3D Human Gaussians in Milliseconds

Jia Lu, Taoran Yi, Jiemin Fang et al.

Reconstructing 3D human bodies from sparse views has been an appealing topic and is crucial to broadening related applications. In this paper, we propose a challenging but valuable task: reconstructing the human body from only two images, i.e., the front and back views, which can largely lower the barrier for users to create their own 3D digital humans. The main challenges lie in the difficulty of building 3D consistency and recovering missing information from the highly sparse input. We redesign a geometry reconstruction model based on foundation reconstruction models and train it on extensive human data, so that it predicts consistent point clouds even when the input images have scarce overlap. Furthermore, an enhancement algorithm is applied to supplement the missing color information, yielding complete colored human point clouds, which are directly transformed into 3D Gaussians for better rendering quality. Experiments show that our method can reconstruct the entire human in 190 ms on a single NVIDIA RTX 4090, with two images at a resolution of 1024x1024, demonstrating state-of-the-art performance on the THuman2.0 and cross-domain datasets. Additionally, our method can complete human reconstruction even with images captured by low-cost mobile devices, reducing the requirements for data collection. Demos and code are available at https://hustvl.github.io/Snap-Snap/.

CV · Oct 24, 2025
WorldGrow: Generating Infinite 3D World

Sikuang Li, Chen Yang, Jiemin Fang et al.

We tackle the challenge of generating the infinitely extendable 3D world -- large, continuous environments with coherent geometry and realistic appearance. Existing methods face key challenges: 2D-lifting approaches suffer from geometric and appearance inconsistencies across views, 3D implicit representations are hard to scale up, and current 3D foundation models are mostly object-centric, limiting their applicability to scene-level generation. Our key insight is leveraging strong generation priors from pre-trained 3D models for structured scene block generation. To this end, we propose WorldGrow, a hierarchical framework for unbounded 3D scene synthesis. Our method features three core components: (1) a data curation pipeline that extracts high-quality scene blocks for training, making the 3D structured latent representations suitable for scene generation; (2) a 3D block inpainting mechanism that enables context-aware scene extension; and (3) a coarse-to-fine generation strategy that ensures both global layout plausibility and local geometric/textural fidelity. Evaluated on the large-scale 3D-FRONT dataset, WorldGrow achieves SOTA performance in geometry reconstruction, while uniquely supporting infinite scene generation with photorealistic and structurally consistent outputs. These results highlight its capability for constructing large-scale virtual environments and potential for building future world models.

LG · May 31, 2025
BatteryBERT for Realistic Battery Fault Detection Using Point-Masked Signal Modeling

Songqi Zhou, Ruixue Liu, Yixing Wang et al.

Accurate fault detection in lithium-ion batteries is essential for the safe and reliable operation of electric vehicles and energy storage systems. However, existing methods often struggle to capture complex temporal dependencies and cannot fully leverage abundant unlabeled data. Although large language models (LLMs) exhibit strong representation capabilities, their architectures are not directly suited to the numerical time-series data common in industrial settings. To address these challenges, we propose a novel framework that adapts BERT-style pretraining for battery fault detection by extending the standard BERT architecture with a customized time-series-to-token representation module and a point-level Masked Signal Modeling (point-MSM) pretraining task tailored to battery applications. This approach enables self-supervised learning on sequential current, voltage, and other charge-discharge cycle data, yielding distributionally robust, context-aware temporal embeddings. We then concatenate these embeddings with battery metadata and feed them into a downstream classifier for accurate fault classification. Experimental results on a large-scale real-world dataset show that models initialized with our pretrained parameters significantly improve both representation quality and classification accuracy, achieving an AUROC of 0.945 and substantially outperforming existing approaches. These findings validate the effectiveness of BERT-style pretraining for time-series fault detection.
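
The point-level masked signal modeling idea in the abstract above can be pictured as hiding individual samples of a time series and scoring a model only on those hidden points. The sketch below is a generic illustration of that idea, assuming a 15% mask ratio, a constant mask value, and a mean-squared-error objective; the function names `mask_points` and `masked_mse` are hypothetical, and none of these choices are taken from the BatteryBERT paper.

```python
import random


def mask_points(series, mask_ratio=0.15, mask_value=0.0, seed=None):
    """Replace a random subset of individual samples with `mask_value`.

    Returns the masked copy and the sorted indices that were hidden.
    """
    if not series:
        raise ValueError("series must be non-empty")
    rng = random.Random(seed)
    # Always hide at least one point so the pretraining target is non-empty.
    n_mask = max(1, round(len(series) * mask_ratio))
    masked_idx = sorted(rng.sample(range(len(series)), n_mask))
    masked = list(series)
    for i in masked_idx:
        masked[i] = mask_value
    return masked, masked_idx


def masked_mse(prediction, target, masked_idx):
    """Mean squared reconstruction error over the hidden positions only."""
    return sum((prediction[i] - target[i]) ** 2 for i in masked_idx) / len(masked_idx)


if __name__ == "__main__":
    # Illustrative voltage readings (made-up values for the example).
    voltage = [3.70, 3.72, 3.75, 3.79, 3.84, 3.90, 3.97, 4.05, 4.12, 4.18]
    masked, idx = mask_points(voltage, mask_ratio=0.3, seed=0)
    print(idx, masked)
```

In a real pretraining loop an encoder would receive `masked` and be trained to minimize `masked_mse` against the original series; the encoder itself is outside the scope of this sketch.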

CV · Jun 21, 2024
You Only Acquire Sparse-channel (YOAS): A Unified Framework for Dense-channel EEG Generation

Hongyu Chen, Weiming Zeng, Luhui Cai et al.

High-precision acquisition of dense-channel electroencephalogram (EEG) signals is often impeded by the cost and limited portability of the equipment. In contrast, generating dense-channel EEG signals effectively from sparse channels shows promise and economic viability. However, sparse-channel EEG poses challenges such as reduced spatial resolution, information loss, signal mixing, and heightened susceptibility to noise and interference. To address these challenges, we first theoretically formulate the dense-channel EEG generation problem as the optimization of a set of cross-channel EEG signal generation problems. Then, we propose the YOAS framework for generating dense-channel data from sparse-channel EEG signals. YOAS consists of four sequential stages: Data Preparation, Data Preprocessing, Biased-EEG Generation, and Synthetic EEG Generation. Data Preparation and Preprocessing carefully consider the distribution of EEG electrodes and the low signal-to-noise ratio of EEG signals. Biased-EEG Generation includes the sub-modules BiasEEGGanFormer and BiasEEGDiffFormer, which facilitate long-term feature extraction with attention and generate signals by combining electrode position alignment with a diffusion model, respectively. Synthetic EEG Generation synthesizes the final signals, employing a deduction paradigm for multi-channel EEG generation. Extensive experiments confirmed YOAS's feasibility, efficiency, and theoretical validity, and even showed remarkably enhanced data discernibility. This breakthrough in dense-channel EEG signal generation from sparse-channel data opens new avenues for exploration in EEG signal processing and application.

LG · Apr 13, 2020
MLPSVM: A new parallel support vector machine to multi-label learning

Yanghong Liu, Jia Lu, Tingting Li

Multi-label learning has attracted the attention of the machine learning community. The problem transformation method Binary Relevance converts a familiar single-label algorithm into a multi-label one. The binary relevance method is widely used because of its simple structure and efficient algorithm. However, binary relevance does not consider the links between labels, making it cumbersome for some tasks. This paper proposes a multi-label learning algorithm that can also be used for single-label classification. It is based on standard support vector machines and changes the original single decision hyperplane into two parallel decision hyperplanes; we call it the multi-label parallel support vector machine (MLPSVM). At the end of the article, MLPSVM is compared with other multi-label learning algorithms. The experimental results show that the algorithm performs well on data sets.
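
For context, the Binary Relevance baseline that the abstract above contrasts against can be sketched as one independent binary classifier per label. The example below illustrates that baseline only, not MLPSVM; the toy nearest-centroid classifier and the class names `CentroidClassifier` and `BinaryRelevance` are assumptions chosen to keep the sketch dependency-free.

```python
class CentroidClassifier:
    """Toy binary classifier: predicts the class whose mean feature vector is closer."""

    def fit(self, X, y):
        self.centroids = {}
        for label in (0, 1):
            rows = [x for x, t in zip(X, y) if t == label]
            if rows:  # skip a class that never occurs in the training data
                self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_one(self, x):
        def sq_dist(centroid):
            return sum((a - b) ** 2 for a, b in zip(x, centroid))

        return min(self.centroids, key=lambda label: sq_dist(self.centroids[label]))


class BinaryRelevance:
    """Trains one independent binary classifier per label column."""

    def __init__(self, make_classifier):
        self.make_classifier = make_classifier

    def fit(self, X, Y):
        n_labels = len(Y[0])
        # Each label is learned in isolation, so correlations between labels are ignored.
        self.models = [
            self.make_classifier().fit(X, [row[k] for row in Y])
            for k in range(n_labels)
        ]
        return self

    def predict(self, X):
        return [[model.predict_one(x) for model in self.models] for x in X]


if __name__ == "__main__":
    X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
    Y = [[0, 0], [0, 1], [1, 0], [1, 1]]  # label k is on when feature k is high
    model = BinaryRelevance(CentroidClassifier).fit(X, Y)
    print(model.predict([[0.9, 0.1], [0.1, 0.9]]))
```

Because each label gets its own model, nothing in this construction can exploit dependencies between labels, which is the limitation the abstract points to.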

CV · Dec 3, 2018
ZerNet: Convolutional Neural Networks on Arbitrary Surfaces via Zernike Local Tangent Space Estimation

Zhiyu Sun, Ethan Rooke, Jerome Charton et al.

In this paper, we propose a novel formulation to extend CNNs to two-dimensional (2D) manifolds using orthogonal basis functions, called Zernike polynomials. In many areas, geometric features play a key role in understanding scientific phenomena. Thus, an ability to codify geometric features into a mathematical quantity can be critical. Recently, convolutional neural networks (CNNs) have demonstrated the promising capability of extracting and codifying features from visual information. However, the progress has been concentrated in computer vision applications where there exists an inherent grid-like structure. In contrast, many geometry processing problems are defined on curved surfaces, and the generalization of CNNs is not quite trivial. The difficulties are rooted in the lack of key ingredients such as the canonical grid-like representation, the notion of consistent orientation, and a compatible local topology across the domain. In this paper, we prove that the convolution of two functions can be represented as a simple dot product between Zernike polynomial coefficients; and the rotation of a convolution kernel is essentially a set of 2-by-2 rotation matrices applied to the coefficients. As such, the key contribution of this work resides in a concise but rigorous mathematical generalization of the CNN building blocks.
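
The two properties stated in the abstract above follow a familiar pattern for orthogonal bases, which can be sketched as below. This is a generic derivation under stated assumptions, not the paper's own formulation: it assumes an orthonormal Zernike basis on the unit disk $D$, treats the pointwise correlation of a function patch with a kernel as the "convolution", and uses the symbols $a_i$, $b_i$, $\alpha$ purely for illustration. The paper's normalization and sign conventions may differ.

```latex
% Assumption: \{Z_i\} is an orthonormal Zernike basis on the unit disk D.
f = \sum_i a_i Z_i, \qquad g = \sum_i b_i Z_i, \qquad
\int_D Z_i Z_j \, dA = \delta_{ij}
\;\Longrightarrow\;
\int_D f \, g \, dA = \sum_i a_i b_i = \mathbf{a}^{\top} \mathbf{b}.

% Rotating f by an angle \alpha, i.e. f(r, \theta) \mapsto f(r, \theta - \alpha),
% mixes only the coefficient pair (a, b) of
% Z_n^{m} = R_n^m(r)\cos(m\theta) and Z_n^{-m} = R_n^m(r)\sin(m\theta):
\begin{pmatrix} a' \\ b' \end{pmatrix}
=
\begin{pmatrix}
\cos m\alpha & -\sin m\alpha \\
\sin m\alpha & \phantom{-}\cos m\alpha
\end{pmatrix}
\begin{pmatrix} a \\ b \end{pmatrix}.
```

The second identity comes from expanding $\cos m(\theta - \alpha)$ and $\sin m(\theta - \alpha)$ with the angle-difference formulas; terms with $m = 0$ are unchanged by rotation.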

LG · Jun 20, 2018
Wall Stress Estimation of Cerebral Aneurysm based on Zernike Convolutional Neural Networks

Zhiyu Sun, Jia Lu, Stephen Baek

Convolutional neural networks (ConvNets) have demonstrated an exceptional capacity to discern visual patterns from digital images and signals. Unfortunately, such powerful ConvNets do not generalize well to arbitrary-shaped manifolds, where data representation does not fit into a tensor-like grid. Hence, many fields of science and engineering, where data points possess some manifold structure, cannot enjoy the full benefits of the recent advances in ConvNets. The aneurysm wall stress estimation problem introduced in this paper is one of many such problems. The problem is well known to be of paramount clinical importance, yet traditional ConvNets cannot be applied due to the manifold structure of the data, nor do state-of-the-art geometric ConvNets perform well. Motivated by this, we propose a new geometric ConvNet method named ZerNet, which builds upon our novel mathematical generalization of convolution and pooling operations on manifolds. Our study shows that ZerNet outperforms the other state-of-the-art geometric ConvNets in terms of accuracy.