69.7 · RO · May 27
World Models for Robotic Manipulation: A Survey · Fangyuan Wang, Ziyuan Wang, Guorui Pei et al.
Robotic manipulation depends on the ability to anticipate how actions reshape objects, contacts, and scene geometry before execution. Learned world models provide this capability by predicting task-relevant future evolution under robot intervention, yet the term now spans latent dynamics models, action-conditioned video generators, three- and four-dimensional scene predictors, physics-informed simulators, and predictive modules inside vision-language-action systems. This breadth has fragmented the literature and obscured the design choices that matter for manipulation. We survey world models for robotic manipulation through three questions: what future representation is predicted, how prediction is connected to action, and when prediction is used in the robot-learning pipeline. We operationally define a world model as an action-conditioned predictive system and distinguish it from perception modules, inverse models, policies, rewards, and value functions. We then organize existing work into five representation families, develop a functional taxonomy that separates integrated prediction-action models from explicit predictive planners, and characterize infrastructure roles including synthetic experience generation, candidate filtering, search-based evaluation, learned environments, and outcome verification. We further map these roles across pretraining, post-training, and inference adaptation, review 34 manipulation datasets, and synthesize evaluation protocols for predictive fidelity, task performance, and simulator reliability. This survey shows that world models are evolving from task-specific dynamics predictors into predictive infrastructure for robot learning, while exposing open challenges in contact modeling, hallucination control, action alignment, and benchmarking under closed-loop use.
73.6 · LG · May 29
Self-Certifying Transport MCMC via Dual Spectral-Gap Certificates · Jun Hu
We propose CerT-MCMC, a framework that equips learned-transport Markov chain Monte Carlo with automatic, rigorous convergence certificates. A normalising flow maps a Gaussian reference to an approximation of the target posterior; the same flow then serves as both the independence Metropolis-Hastings proposal and the basis for a computable spectral-gap bound. We develop two complementary certificates. The covering certificate bounds the weight-ratio oscillation over the full proposal support via finite-sample covering arguments, yielding full-support spectral-gap bounds when a conservative gradient bound is available; its correction term scales as O(n^{-1/D}), making it rapidly weak and eventually vacuous as dimension increases. We prove a matching Omega(n^{-1/D}) lower bound, establishing that this barrier is intrinsic to pointwise Lipschitz certification. The quantile-core certificate restricts attention to a high-probability residual core on which the oscillation is controlled by one-dimensional empirical quantiles, with a finite-sample probability slack of O(n^{-1/2}), independent of the ambient dimension. On synthetic targets (D=2-20), structural-engineering posteriors (D=6,8), real-data logistic regression on the Heart Disease data set (D=13), and synthetic Bayesian logistic regression (D=20), the quantile-core certificate delivers non-vacuous spectral-gap bounds where the covering certificate is vacuous, and its spectral-gap proxy tracks empirical effective sample sizes within 7%. A negative control experiment confirms that the certificate discriminates flow quality by a factor exceeding 10x, whereas acceptance rates differ by only 1.15x. To our knowledge, the dual-certificate framework is the first to provide automatic, dimension-aware convergence certificates for learned-transport MCMC, distinguishing genuine transport failure from proof-technique limitations.
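As a minimal sketch of the sampler this abstract builds on, the Python below runs an independence Metropolis-Hastings chain, in which every proposal is drawn without reference to the current state. A wide Gaussian stands in for the learned flow, and the names `independence_mh` and `demo` are illustrative assumptions, not the CerT-MCMC code. The last returned value uses the classical result for independence samplers (a weight ratio bounded by M gives total-variation convergence at rate 1 - 1/M); it is not the covering or quantile-core certificate described above.

```python
import numpy as np


def independence_mh(log_target, log_proposal, sample_proposal, n_steps, rng):
    """Independence Metropolis-Hastings: each proposal ignores the current state."""
    x = sample_proposal(rng)
    log_w_x = log_target(x) - log_proposal(x)  # log importance weight of current state
    chain = np.empty(n_steps)
    accepted = 0
    for i in range(n_steps):
        y = sample_proposal(rng)
        log_w_y = log_target(y) - log_proposal(y)
        # Accept with probability min(1, w(y) / w(x)).
        if np.log(rng.uniform()) < log_w_y - log_w_x:
            x, log_w_x = y, log_w_y
            accepted += 1
        chain[i] = x
    return chain, accepted / n_steps


def demo(seed=0, n_steps=20_000, sigma=1.5):
    """Toy run: standard normal target, wider Gaussian proposal (stand-in for a flow)."""
    rng = np.random.default_rng(seed)
    log_target = lambda x: -0.5 * x**2 - 0.5 * np.log(2 * np.pi)
    log_proposal = lambda x: (
        -0.5 * (x / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)
    )
    sample_proposal = lambda r: sigma * r.standard_normal()
    chain, acc_rate = independence_mh(
        log_target, log_proposal, sample_proposal, n_steps, rng
    )
    # For sigma > 1 the weight pi/q is maximised at x = 0, where it equals sigma.
    w_sup = sigma
    return chain, acc_rate, 1.0 / w_sup
```

Calling `demo()` returns the chain, its acceptance rate, and the quantity 1/M for this toy pair. In this one-dimensional case M is known in closed form, which is exactly what is unavailable for a learned flow and what the certificates in the abstract aim to bound from samples.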
64.5 · LG · Jun 3
Folded Transport MCMC: Certifiable Quotient Posterior Computation for Symmetric Bayesian Models · Jun Hu
Bayesian models with finite symmetry - mixture models with exchangeable components, structural identification with closely-spaced modes - define posteriors that are invariant under a group of label permutations, creating redundant multimodality that degrades MCMC convergence diagnostics. We introduce Folded Transport MCMC (FolT-MCMC), which performs inference directly on the quotient posterior by constructing an independence sampler on the fundamental domain of the symmetry group. The quotient proposal is formed by symmetrising a learned normalising flow over the group orbits. We prove that the LCNF oscillation-based certification framework transfers to the quotient metric with a stabiliser-corrected ball-mass bound and improved covering radius, and that the quantile-core certified lower bound improves whenever the unfolded flow exhibits cross-mode proposal deficiency. On Gaussian mixtures (d = 2 - 20), label-switching targets (up to 24 equivalent modes), and a standard Bayesian three-component mixture posterior, the quantile-core certified improvement ratio ranges from 2x to 145x, with the folded certificate empirically nearly dimension-free. On real accelerometer data from a supertall building during Typhoon Mangkhut, FolT-MCMC yields a non-vacuous quantile-core certificate where the unfolded certificate is vacuous.
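The folding idea can be illustrated with a small Python sketch. It assumes the symmetry group is the full permutation group acting on a vector of K scalar labels, and that the fundamental domain is the sorted cone x_1 <= ... <= x_K; both are simplifying assumptions for illustration, and the function names are hypothetical rather than taken from FolT-MCMC. Under those assumptions, folding a draw from a base proposal q amounts to sorting it, and the folded density on the domain is the sum of q over the orbit of the point.

```python
import itertools
import numpy as np


def fold(x):
    """Map a point to the fundamental domain {x_1 <= ... <= x_K} by sorting."""
    return np.sort(x)


def log_folded_density(log_q, x):
    """Log-density of the folded proposal: log of the sum of q over the orbit of x."""
    orbit = [x[list(p)] for p in itertools.permutations(range(len(x)))]
    return np.logaddexp.reduce(np.array([log_q(z) for z in orbit]))


def sample_folded(sample_q, rng):
    """Draw from the base proposal, then fold the draw into the fundamental domain."""
    return fold(sample_q(rng))


def demo(seed=0):
    rng = np.random.default_rng(seed)
    mu = np.array([-1.0, 2.0])  # base proposal concentrated on one labelling
    log_q = lambda z: -0.5 * np.sum((z - mu) ** 2) - 0.5 * len(mu) * np.log(2 * np.pi)
    sample_q = lambda r: mu + r.standard_normal(len(mu))
    x = sample_folded(sample_q, rng)
    # The folded density is permutation-invariant, so both values should agree.
    return x, log_folded_density(log_q, x), log_folded_density(log_q, x[::-1])
```

The orbit sum costs K! evaluations, so this brute-force form is only practical for a handful of exchangeable components.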
NA · Jan 15, 2017
A Butterfly-Based Direct Integral Equation Solver Using Hierarchical LU Factorization for Analyzing Scattering from Electrically Large Conducting Objects · Han Guo, Yang Liu, Jun Hu et al.
A butterfly-based direct combined-field integral equation (CFIE) solver for analyzing scattering from electrically large, perfect electrically conducting objects is presented. The proposed solver leverages the butterfly scheme to compress blocks of the hierarchical LU-factorized discretized CFIE operator and uses randomized butterfly reconstruction schemes to expedite the factorization. The memory requirements and computational cost of the direct butterfly-CFIE solver scale as $O(N\mathrm{log}^2N)$ and $O(N^{1.5}\mathrm{log}N)$, respectively. These scaling estimates permit significant memory and CPU savings when compared to those realized by low-rank (LR) decomposition-based solvers. The efficacy and accuracy of the proposed solver are demonstrated through its application to the analysis of scattering from canonical and realistic objects involving up to 14 million unknowns.
NA · Jul 16, 2014
A family of symmetric mixed finite elements for linear elasticity on tetrahedral grids · Jun Hu, Shangyou Zhang
A family of stable mixed finite elements for linear elasticity on tetrahedral grids is constructed, where the stress is approximated by symmetric $H(\mathrm{div})$-$P_k$ polynomial tensors and the displacement is approximated by $C^{-1}$-$P_{k-1}$ polynomial vectors, for all $k\ge 4$. Numerical tests are provided.
NA · Jan 21, 2015
A family of conforming mixed finite elements for linear elasticity on triangular grids · Jun Hu, Shangyou Zhang
This paper presents a family of mixed finite elements on triangular grids for solving the classical Hellinger-Reissner mixed problem of the elasticity equations. In these elements, the matrix-valued stress field is approximated by the full $C^0$-$P_k$ space enriched by $(k-1)$ $H(\mathrm{div})$ edge bubble functions on each internal edge, while the displacement field is approximated by the full discontinuous $P_{k-1}$ vector-valued space, for polynomial degree $k\ge 3$. The main challenge is to find the correct stress finite element space matching the full $C^{-1}$-$P_{k-1}$ displacement space. The discrete stability analysis for the inf-sup condition does not rely on the usual Fortin operator, which is difficult to construct. Instead, it is established by characterizing the divergence of the local stress space, which covers the $P_{k-1}$ displacement space orthogonal to the local rigid motions. The well-posedness condition and the optimal a priori error estimate are proved for this family of finite elements. Numerical tests are presented to confirm the theoretical results.
CV · Feb 28, 2023
IntrinsicNGP: Intrinsic Coordinate based Hash Encoding for Human NeRF · Bo Peng, Jun Hu, Jingtao Zhou et al.
Recently, many works have been proposed to utilize the neural radiance field for novel view synthesis of human performers. However, most of these methods require hours of training, making them difficult for practical use. To address this challenging problem, we propose IntrinsicNGP, which can train from scratch and achieve high-fidelity results in a few minutes with videos of a human performer. To achieve this target, we introduce a continuous and optimizable intrinsic coordinate rather than the original explicit Euclidean coordinate in the hash encoding module of instant-NGP. With this novel intrinsic coordinate, IntrinsicNGP can aggregate inter-frame information for dynamic objects with the help of proxy geometry shapes. Moreover, the results trained with the given rough geometry shapes can be further refined with an optimizable offset field based on the intrinsic coordinate. Extensive experimental results on several datasets demonstrate the effectiveness and efficiency of IntrinsicNGP. We also illustrate our approach's ability to edit the shape of reconstructed subjects.
16.1 · LG · May 31
Non-Vacuous Certification of Transport MCMC via Oscillation-Controlled Normalizing Flows · Jun Hu
Transport MCMC trains a normalizing flow to precondition Metropolis-Hastings proposals, achieving high empirical efficiency on challenging posteriors; yet no prior work produces a numerically non-vacuous, rigorous spectral-gap bound for such samplers. We establish the first such bounds. For independence MH on the banana family we certify $\gamma^\ast = 0.828$ at $D = 2$ (covering in the original space) and $\gamma^\ast \ge 7.6\times 10^{-4}$ at $D = 5$ (covering in an analytically unwarped Gaussian space with a grid-certified gradient bound under the stated numerical Lipschitz certification), both rigorous at 95% confidence. The framework rests on three pillars: (i) spectral normalization with reduced scale clips constrains the flow Lipschitz constant from $10^{47}$ to $10^4$; (ii) a coverage-based empirical oscillation bound replaces the vacuous analytical bound with a data-dependent certificate; and (iii) oscillation-regularised training cuts the empirical oscillation by 60-90% at no cost to density fit, extending practical certificates through $D = 20$ ($\gamma^\ast \ge 1.7\times 10^{-4}$). Tests on four further targets (Gaussian mixture, shear-building, Neal's funnel, Bayesian logistic regression) identify three precise barriers: boundary curvature, target stiffness, and tail-coverage mismatch. An affine-vs-spline comparison shows that simpler architectures yield tighter certificates at identical NLL, inverting the usual expressiveness hierarchy.
SI · Apr 5, 2022
MGDCF: Distance Learning via Markov Graph Diffusion for Neural Collaborative Filtering · Jun Hu, Bryan Hooi, Shengsheng Qian et al.
Graph Neural Networks (GNNs) have recently been utilized to build Collaborative Filtering (CF) models to predict user preferences based on historical user-item interactions. However, there is relatively little understanding of how GNN-based CF models relate to some traditional Network Representation Learning (NRL) approaches. In this paper, we show the equivalence between some state-of-the-art GNN-based CF models and a traditional 1-layer NRL model based on context encoding. Based on a Markov process that trades off two types of distances, we present Markov Graph Diffusion Collaborative Filtering (MGDCF) to generalize some state-of-the-art GNN-based CF models. Instead of considering the GNN as a trainable black box that propagates learnable user/item vertex embeddings, we treat GNNs as an untrainable Markov process that can construct constant context features of vertices for a traditional NRL model that encodes context features with a fully-connected layer. Such simplification can help us to better understand how GNNs benefit CF models. Especially, it helps us realize that ranking losses play crucial roles in GNN-based CF tasks. With our proposed simple yet powerful ranking loss InfoBPR, the NRL model can still perform well without the context features constructed by GNNs. We conduct experiments to perform detailed analysis on MGDCF.
NA · May 21, 2016
Fast Auxiliary Space Preconditioner for Linear Elasticity in Mixed Form · Long Chen, Jun Hu, Xuehai Huang
A block diagonal preconditioner with the minimal residual method and a block triangular preconditioner with the generalized minimal residual method are developed for Hu-Zhang mixed finite element methods of linear elasticity. They are based on a new stability result of the saddle point system in mesh-dependent norms. The mesh-dependent norm for the stress corresponds to the mass matrix, which is easy to invert, while the norm for the displacement is spectrally equivalent to the Schur complement. A fast auxiliary space preconditioner based on the $H^1$ conforming linear element of the linear elasticity problem is then designed for solving the Schur complement. For both diagonal and triangular preconditioners, it is proved that the condition numbers of the preconditioned systems are bounded above by a constant independent of both the crucial Lamé constant and the mesh size. Numerical examples are presented to support the theoretical results. As byproducts, a new stabilized low order mixed finite element method is proposed and analyzed, and superconvergence results for the Hu-Zhang element are obtained.
CV · Oct 4, 2022
SelfNeRF: Fast Training NeRF for Human from Monocular Self-rotating Video · Bo Peng, Jun Hu, Jingtao Zhou et al.
In this paper, we propose SelfNeRF, an efficient neural radiance field based novel view synthesis method for human performance. Given monocular self-rotating videos of human performers, SelfNeRF can train from scratch and achieve high-fidelity results in about twenty minutes. Some recent works have utilized the neural radiance field for dynamic human reconstruction. However, most of these methods need multi-view inputs and require hours of training, which still makes them difficult to use in practice. To address this challenging problem, we introduce a surface-relative representation based on multi-resolution hash encoding that can greatly improve the training speed and aggregate inter-frame information. Extensive experimental results on several different datasets demonstrate the effectiveness and efficiency of SelfNeRF on challenging monocular videos.
NA · Oct 27, 2016
Stabilized mixed finite element methods for linear elasticity on simplicial grids in $\mathbb{R}^{n}$ · Long Chen, Jun Hu, Xuehai Huang
In this paper, we design two classes of stabilized mixed finite element methods for linear elasticity on simplicial grids. In the first class of elements, we use $\boldsymbol{H}(\mathbf{div}, \Omega; \mathbb{S})$-$P_k$ and $\boldsymbol{L}^2(\Omega; \mathbb{R}^n)$-$P_{k-1}$ to approximate the stress and displacement spaces, respectively, for $1\leq k\leq n$, and employ a stabilization technique in terms of the jump of the discrete displacement over the faces of the triangulation under consideration; in the second class of elements, we use $\boldsymbol{H}_0^1(\Omega; \mathbb{R}^n)$-$P_{k}$ to approximate the displacement space for $1\leq k\leq n$, and adopt the stabilization technique suggested by Brezzi, Fortin, and Marini. We establish the discrete inf-sup conditions, and consequently present the a priori error analysis for them. The main ingredients of the analysis are two special interpolation operators, which can be constructed using a crucial $\boldsymbol{H}(\mathbf{div})$ bubble function space of polynomials on each element. The feature of these methods is the low number of global degrees of freedom in the lowest order case. We present some numerical results to demonstrate the theoretical estimates.
LG · Jul 4, 2024
Generalizing Graph Transformers Across Diverse Graphs and Tasks via Pre-training · Yufei He, Zhenyu Hou, Yukuo Cen et al. · tsinghua
Graph pre-training has been concentrated on graph-level tasks involving small graphs (e.g., molecular graphs) or learning node representations on a fixed graph. Extending graph pre-trained models to web-scale graphs with billions of nodes in industrial scenarios, while avoiding negative transfer across graphs or tasks, remains a challenge. We aim to develop a general graph pre-trained model with inductive ability that can make predictions for unseen new nodes and even new graphs. In this work, we introduce a scalable transformer-based graph pre-training framework called PGT (Pre-trained Graph Transformer). Based on the masked autoencoder architecture, we design two pre-training tasks: one for reconstructing node features and the other for reconstructing local structures. Unlike the original autoencoder architecture where the pre-trained decoder is discarded, we propose a novel strategy that utilizes the decoder for feature augmentation. Our framework, tested on the publicly available ogbn-papers100M dataset with 111 million nodes and 1.6 billion edges, achieves state-of-the-art performance, showcasing scalability and efficiency. We have deployed our framework on Tencent's online game data, confirming its capability to pre-train on real-world graphs with over 540 million nodes and 12 billion edges and to generalize effectively across diverse static and dynamic downstream tasks.
NA · Dec 17, 2017
Nodal Finite Element de Rham Complexes · Snorre H. Christiansen, Jun Hu, Kaibo Hu
We construct 2D and 3D finite element de Rham sequences of arbitrary polynomial degrees with extra smoothness. Some of these elements have nodal degrees of freedom (DoFs) and can be considered as generalisations of scalar Hermite and Lagrange elements. Using the nodal values, the number of global degrees of freedom is reduced compared with the classical Nédélec and Brezzi-Douglas-Marini (BDM) finite elements, and the basis functions are more canonical and easier to construct. Our finite elements for ${H}(\mathrm{div})$ with regularity $r=2$ coincide with the nonstandard elements given by Stenberg (Numer Math 115(1): 131-139, 2010). We show how regularity decreases in the finite element complexes, so that they branch into known complexes. The standard de Rham complexes of Whitney forms and their higher order version can be regarded as the family with the lowest regularity. The construction of the new families is motivated by the finite element systems.
NA · Dec 24, 2017
Multigrid Methods for Hellan-Herrmann-Johnson Mixed Method of Kirchhoff Plate Bending Problems · Long Chen, Jun Hu, Xuehai Huang
A V-cycle multigrid method for the Hellan-Herrmann-Johnson (HHJ) discretization of the Kirchhoff plate bending problems is developed in this paper. It is shown that the contraction number of the V-cycle multigrid HHJ mixed method is bounded away from one uniformly with respect to the mesh size. The uniform convergence is achieved for the V-cycle multigrid method with only one smoothing step and without full elliptic regularity. The key is a stable decomposition of the kernel space, which is derived from an exact sequence of the HHJ mixed method, together with the strengthened Cauchy-Schwarz inequality. Some numerical experiments are provided to confirm the proposed V-cycle multigrid method. The exact sequences of the HHJ mixed method and the corresponding commutative diagram are of some interest independent of the current context.
NA · May 11, 2017
Residual-Based A Posteriori Error Estimates for Symmetric Conforming Mixed Finite Elements for Linear Elasticity Problems · Long Chen, Jun Hu, Xuehai Huang et al.
A posteriori error estimators for the symmetric mixed finite element methods for linear elasticity problems of Dirichlet and mixed boundary conditions are proposed. Stability and efficiency of the estimators are proved. Finally, we provide numerical examples to verify the theoretical results.
34.0 · CV · Apr 18 · Code
Adaptive receptive field-based spatial-frequency feature reconstruction network for few-shot fine-grained image classification · Linyue Zhang, Wenyi Zeng, Zicheng Pan et al.
Feature reconstruction techniques are widely applied for few-shot fine-grained image classification (FSFGIC). Our research indicates that one of the main challenges facing existing feature-based FSFGIC methods is how to choose the size of the receptive field to extract feature descriptors (including spatial and frequency feature descriptors) from different category input images, thereby better performing the FSFGIC tasks. To address this, an adaptive receptive field-based spatial-frequency feature reconstruction network (ARF-SFR-Net) is proposed. The designed ARF-SFR-Net has the capability to adaptively determine receptive field sizes for obtaining spatial and frequency features, and effectively fuse them for reconstruction and FSFGIC tasks. The designed ARF-SFR-Net can be easily embedded into a given episodic training mechanism for end-to-end training from scratch. Extensive experiments on multiple FSFGIC benchmarks demonstrate the effectiveness and superiority of the proposed ARF-SFR-Net over state-of-the-art approaches. The code is available at: https://github.com/ICL-SUST/ARF-SFR-Net.git.
CV · Jun 10, 2022
Real-time Hyper-Dimensional Reconfiguration at the Edge using Hardware Accelerators · Indhumathi Kandaswamy, Saurabh Farkya, Zachary Daniels et al.
In this paper we present Hyper-Dimensional Reconfigurable Analytics at the Tactical Edge (HyDRATE) using low-SWaP embedded hardware that can perform real-time reconfiguration at the edge leveraging non-MAC (free of floating-point MultiplyACcumulate operations) deep neural nets (DNN) combined with hyperdimensional (HD) computing accelerators. We describe the algorithm, trained quantized model generation, and simulated performance of a feature extractor free of multiply-accumulates feeding a hyperdimensional logic-based classifier. Then we show how performance increases with the number of hyperdimensions. We describe the realized low-SWaP FPGA hardware and embedded software system compared to traditional DNNs and detail the implemented hardware accelerators. We discuss the measured system latency and power, noise robustness due to use of learnable quantization and HD computing, actual versus simulated system performance for a video activity classification task and demonstration of reconfiguration on this same dataset. We show that reconfigurability in the field is achieved by retraining only the feed-forward HD classifier without gradient descent backpropagation (gradient-free), using few-shot learning of new classes at the edge. Initial work performed used LRCN DNN and is currently extended to use Two-stream DNN with improved performance.
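To make the gradient-free reconfiguration step concrete, here is a minimal Python sketch of a hyperdimensional classifier that learns a new class from a few examples by bundling (summing) hypervectors into a class prototype. The random sign projection used as the encoder, the dimension of 2000, and every name below are assumptions chosen for illustration; the sketch does not reproduce the HyDRATE feature extractor, quantization, or FPGA accelerators.

```python
import numpy as np


class HDClassifier:
    """Toy hyperdimensional classifier with bipolar hypervectors."""

    def __init__(self, n_features, n_dims=2000, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed random +/-1 projection; it is never trained.
        self.proj = rng.choice([-1.0, 1.0], size=(n_features, n_dims))
        self.prototypes = {}

    def encode(self, X):
        """Project features to n_dims and binarise to +/-1."""
        H = np.sign(np.atleast_2d(X) @ self.proj)
        H[H == 0] = 1.0
        return H

    def fit_class(self, label, X):
        """Bundle the encoded examples into the prototype for `label` (no gradients)."""
        bundle = self.encode(X).sum(axis=0)
        self.prototypes[label] = self.prototypes.get(label, 0) + bundle

    def predict(self, X):
        """Assign each sample to the prototype with the highest cosine-style score."""
        labels = list(self.prototypes)
        P = np.stack([self.prototypes[l] for l in labels])
        P = P / np.linalg.norm(P, axis=1, keepdims=True)
        scores = self.encode(X) @ P.T
        return [labels[i] for i in scores.argmax(axis=1)]


def demo(seed=0, n_features=16, shots=5, n_test=50):
    rng = np.random.default_rng(seed)
    centers = 5.0 * np.eye(4, n_features)  # four well-separated toy classes
    draw = lambda c, n: centers[c] + 0.3 * rng.standard_normal((n, n_features))
    clf = HDClassifier(n_features, seed=seed)
    for c in range(3):
        clf.fit_class(c, draw(c, shots))
    clf.fit_class(3, draw(3, shots))  # new class added later; old prototypes untouched
    X_test = np.concatenate([draw(c, n_test) for c in range(4)])
    y_test = np.repeat(np.arange(4), n_test)
    return float(np.mean(np.array(clf.predict(X_test)) == y_test))
```

Adding the fourth class touches only its own prototype, which is the property that makes few-shot reconfiguration cheap in this style of classifier.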
NA · Jan 21, 2015
A new family of efficient conforming mixed finite elements on both rectangular and cuboid meshes for linear elasticity in the symmetric formulation · Jun Hu
A new family of mixed finite elements is proposed for solving the classical Hellinger-Reissner mixed problem of the elasticity equations. For two dimensions, the normal stress of the matrix-valued stress field is approximated by an enriched Brezzi-Douglas-Fortin-Marini element of order $k$, the shear stress by the serendipity element of order $k$, and the displacement field by an enriched discontinuous vector-valued $P_{k-1}$ element. The lowest order element, which is of first order, has $10$ plus $4$ degrees of freedom on each element. For three dimensions, the normal stress is approximated by an enriched Raviart-Thomas element of order $k$, each component of the shear stress by a product space of the serendipity element space of two variables and the space of polynomials of degree $\leq k-1$ with respect to the remaining variable, and the displacement field by an enriched discontinuous vector-valued $Q_{k-1}$ element. The lowest order element, which is of first order, has $21$ plus $6$ degrees of freedom on each element. A family of reduced elements is also proposed by dropping some interior bubble functions of the stress and employing the discontinuous vector-valued $P_{k-1}$ (resp. $Q_{k-1}$) element for the displacement field on each element. As a result, the lowest order elements have $8$ plus $2$ and $18$ plus $3$ degrees of freedom on each element for two and three dimensions, respectively. The well-posedness condition and the optimal a priori error estimate are proved for this family of finite elements. Numerical tests are presented to confirm the theoretical results.
NA · Aug 6, 2014
Superconvergence of both the Crouzeix-Raviart and Morley elements · Jun Hu, Rui Ma
In this paper, a new method is proposed to prove the superconvergence of both the Crouzeix-Raviart and Morley elements. The main idea is to fully employ equivalences with the first order Raviart-Thomas element and the first order Hellan-Herrmann-Johnson element, respectively. In this way, some special conformity of discrete stresses is explored and superconvergence of mixed elements can be used to analyze superconvergence of nonconforming elements. Finally, a half order superconvergence by postprocessing is proved for both nonconforming elements.
LG · Aug 3, 2023
Efficient Model Adaptation for Continual Learning at the Edge · Zachary A. Daniels, Jun Hu, Michael Lomnitz et al.
Most machine learning (ML) systems assume stationary and matching data distributions during training and deployment. This is often a false assumption. When ML models are deployed on real devices, data distributions often shift over time due to changes in environmental factors, sensor characteristics, and task-of-interest. While it is possible to have a human-in-the-loop to monitor for distribution shifts and engineer new architectures in response to these shifts, such a setup is not cost-effective. Instead, non-stationary automated ML (AutoML) models are needed. This paper presents the Encoder-Adaptor-Reconfigurator (EAR) framework for efficient continual learning under domain shifts. The EAR framework uses a fixed deep neural network (DNN) feature encoder and trains shallow networks on top of the encoder to handle novel data. The EAR framework is capable of 1) detecting when new data is out-of-distribution (OOD) by combining DNNs with hyperdimensional computing (HDC), 2) identifying low-parameter neural adaptors to adapt the model to the OOD data using zero-shot neural architecture search (ZS-NAS), and 3) minimizing catastrophic forgetting on previous tasks by progressively growing the neural architecture as needed and dynamically routing data through the appropriate adaptors and reconfigurators for handling domain-incremental and class-incremental continual learning. We systematically evaluate our approach on several benchmark datasets for domain adaptation and demonstrate strong performance compared to state-of-the-art algorithms for OOD detection and few-/zero-shot NAS.
AI · Jan 30 · Code
EvoClinician: A Self-Evolving Agent for Multi-Turn Medical Diagnosis via Test-Time Evolutionary Learning · Yufei He, Juncheng Liu, Zhiyuan Hu et al.
Prevailing medical AI operates on an unrealistic "one-shot" model, diagnosing from a complete patient file. However, real-world diagnosis is an iterative inquiry where clinicians sequentially ask questions and order tests to strategically gather information while managing cost and time. To address this, we first propose Med-Inquire, a new benchmark designed to evaluate an agent's ability to perform multi-turn diagnosis. Built upon a dataset of real-world clinical cases, Med-Inquire simulates the diagnostic process by hiding a complete patient file behind specialized Patient and Examination agents. They force the agent to proactively ask questions and order tests to gather information piece by piece. To tackle the challenges posed by Med-Inquire, we then introduce EvoClinician, a self-evolving agent that learns efficient diagnostic strategies at test time. Its core is a "Diagnose-Grade-Evolve" loop: an Actor agent attempts a diagnosis; a Process Grader agent performs credit assignment by evaluating each action for both clinical yield and resource efficiency; finally, an Evolver agent uses this feedback to update the Actor's strategy by evolving its prompt and memory. Our experiments show EvoClinician outperforms continual learning baselines and other self-evolving agents like memory agents. The code is available at https://github.com/yf-he/EvoClinician
NA · Feb 3, 2017
High Order Hierarchical Divergence-free Constrained Transport $H(div)$ Finite Element Method for Magnetic Induction Equation · Wei Cai, Jun Hu, Shangyou Zhang
In this paper, we use the interior functions of a hierarchical basis for high order $BDM_p$ elements to enforce the divergence-free condition of a magnetic field $B$ approximated by the H(div) $BDM_p$ basis. The resulting constrained finite element method can be used to solve the magnetic induction equation in the MHD equations. The proposed procedure is based on the fact that the scalar $(p-1)$-th order polynomial space on each element can be decomposed as an orthogonal sum of the subspace defined by the divergence of the interior functions of the $p$-th order $BDM_p$ basis and the constant function. Therefore, the interior functions can be used to remove, element by element, all higher order terms except the constant in the divergence error of the finite element solution of the $B$-field. The constant terms from each element can then be easily corrected using a first order H(div) basis globally. Numerical results for a 3-D magnetic induction equation show the effectiveness of the proposed method in enforcing the divergence-free condition of the magnetic field.
LG · Oct 23, 2023
Efficient Heterogeneous Graph Learning via Random Projection · Jun Hu, Bryan Hooi, Bingsheng He
Heterogeneous Graph Neural Networks (HGNNs) are powerful tools for deep learning on heterogeneous graphs. Typical HGNNs require repetitive message passing during training, limiting efficiency for large-scale real-world graphs. Recent pre-computation-based HGNNs use one-time message passing to transform a heterogeneous graph into regular-shaped tensors, enabling efficient mini-batch training. Existing pre-computation-based HGNNs can be mainly categorized into two styles, which differ in how much information loss is allowed and efficiency. We propose a hybrid pre-computation-based HGNN, named Random Projection Heterogeneous Graph Neural Network (RpHGNN), which combines the benefits of one style's efficiency with the low information loss of the other style. To achieve efficiency, the main framework of RpHGNN consists of propagate-then-update iterations, where we introduce a Random Projection Squashing step to ensure that complexity increases only linearly. To achieve low information loss, we introduce a Relation-wise Neighbor Collection component with an Even-odd Propagation Scheme, which aims to collect information from neighbors in a finer-grained way. Experimental results indicate that our approach achieves state-of-the-art results on seven small and large benchmark datasets while also being 230% faster compared to the most effective baseline. Surprisingly, our approach not only surpasses pre-processing-based baselines but also outperforms end-to-end methods.
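The role of a squashing step inside propagate-then-update iterations can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not RpHGNN itself: it uses a single node set with one dense adjacency matrix per relation, mean aggregation, and a fresh Gaussian random projection per iteration, and the function names are hypothetical. Each iteration concatenates one propagated copy of the features per relation, then projects the result back to a fixed width so the representation does not grow with depth.

```python
import numpy as np


def row_normalize(A):
    """Turn an adjacency matrix into a mean-aggregation operator."""
    deg = A.sum(axis=1, keepdims=True)
    return A / np.maximum(deg, 1.0)


def propagate_then_squash(adjs, X, n_iters, dim, rng):
    """Alternate relation-wise propagation with a random projection to width `dim`."""
    H = X
    for _ in range(n_iters):
        # Propagate: one neighbour-averaged copy of H per relation, concatenated.
        gathered = np.concatenate([row_normalize(A) @ H for A in adjs], axis=1)
        # Squash: Gaussian random projection; the 1/sqrt(dim) scaling keeps
        # squared row norms unchanged in expectation.
        R = rng.standard_normal((gathered.shape[1], dim)) / np.sqrt(dim)
        H = gathered @ R
    return H


def demo(seed=0, n_nodes=30, n_feats=8, n_relations=3, dim=16, n_iters=3):
    rng = np.random.default_rng(seed)
    adjs = [
        (rng.random((n_nodes, n_nodes)) < 0.1).astype(float)
        for _ in range(n_relations)
    ]
    X = rng.standard_normal((n_nodes, n_feats))
    H = propagate_then_squash(adjs, X, n_iters, dim, rng)
    # Width the features would reach if concatenation were never squashed.
    unsquashed_width = n_feats * n_relations**n_iters
    return H.shape, unsquashed_width
```

In this toy setting the output stays at width 16 after three iterations, whereas plain concatenation would reach 8 x 3^3 = 216 columns, which is the growth the squashing step is meant to avoid.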
NA · Jan 2, 2016
A unified analysis of quasi-optimal convergence for adaptive mixed finite element methods · Jun Hu, Guozhu Yu
In this paper, we present a unified analysis of both convergence and optimality of adaptive mixed finite element methods for a class of problems when the finite element spaces and corresponding a posteriori error estimates under consideration satisfy five hypotheses. We prove that these five conditions are sufficient for convergence and optimality of the adaptive algorithms under consideration. The main ingredient for the analysis is a new method to analyze both discrete reliability and quasi-orthogonality. This new method arises from an appropriate and natural choice of the norms for both the discrete displacement and stress spaces, namely, a mesh-dependent discrete $H^1$ norm for the former and an $L^2$ norm for the latter, and a newly defined projection operator from the discrete stress space on the coarser mesh onto the discrete divergence free space on the finer mesh. As applications, we prove these five hypotheses for the Raviart-Thomas and Brezzi-Douglas-Marini elements of the Poisson and Stokes problems in both 2D and 3D.
NAApr 19, 2013
The Lower Bounds for Eigenvalues of Elliptic Operators --By Nonconforming Finite Element MethodsJun Hu, Yunqing Huang, Qun Lin
The aim of the paper is to introduce a new systematic method that can produce lower bounds for eigenvalues. The main idea is to use nonconforming finite element methods. The general conclusion herein is that if local approximation properties of nonconforming finite element spaces $V_h$ are better than global continuity properties of $V_h$, corresponding methods will produce lower bounds for eigenvalues. More precisely, under three conditions on continuity and approximation properties of nonconforming finite element spaces we first show abstract error estimates of approximate eigenvalues and eigenfunctions. Subsequently, we propose one more condition and prove that it is sufficient to guarantee that nonconforming finite element methods produce lower bounds for eigenvalues of symmetric elliptic operators. As one application, we show that this condition holds for most nonconforming elements in the literature. As another important application, this condition provides guidance for modifying known nonconforming elements in the literature and for proposing new nonconforming elements. In fact, we enrich locally the Crouzeix-Raviart element such that the new element satisfies the condition; we propose a new nonconforming element for second order elliptic operators and prove that it will yield lower bounds for eigenvalues. Finally, we prove the saturation condition for most nonconforming elements.
AINov 7, 2023Code
The NeurIPS 2022 Neural MMO Challenge: A Massively Multiagent Competition with Specialization and TradeEnhong Liu, Joseph Suarez, Chenhui You et al.
In this paper, we present the results of the NeurIPS-2022 Neural MMO Challenge, which attracted 500 participants and received over 1,600 submissions. Like the previous IJCAI-2022 Neural MMO Challenge, it involved agents from 16 populations surviving in procedurally generated worlds by collecting resources and defeating opponents. This year's competition runs on the latest v1.6 Neural MMO, which introduces new equipment, combat, trading, and a better scoring system. These elements combine to pose additional robustness and generalization challenges not present in previous competitions. This paper summarizes the design and results of the challenge, explores the potential of this environment as a benchmark for learning methods, and presents some practical reinforcement learning training approaches for complex tasks with sparse rewards. Additionally, we have open-sourced our baselines, including environment wrappers, benchmarks, and visualization tools for future research.
NAJul 29, 2013
The lower bound of the error estimate in the L2 norm for the Adini element of the biharmonic equationJun Hu, Zhongci Shi
This paper is devoted to the $L^2$ norm error estimate of the Adini element for the biharmonic equation. Surprisingly, a lower bound is established which proves that the $L^2$ norm convergence rate cannot be higher than that in the energy norm. This proves the conjecture of [Lascaux and Lesaint, Some nonconforming finite elements for the plate bending problem, RAIRO Anal. Numer. 9 (1975), pp. 9--53.] that the convergence rates in both $L^2$ and $H^1$ norms cannot be higher than that in the energy norm for this element.
NAJan 11, 2015
Superconvergence of both two and three dimensional rectangular Morley elements for biharmonic equationsJun Hu, Zhongci Shi, Xueqin Yang
In the present paper, superconvergence of second order, after an appropriate postprocessing, is achieved for both the two and three dimensional first order rectangular Morley elements of biharmonic equations. The analysis is dependent on superconvergence of second order for the consistency error and a corrected canonical interpolation operator, which help to establish supercloseness of second order for the corrected canonical interpolation. Then the final superconvergence follows a standard postprocessing. For first order nonconforming finite element methods of both two and three dimensional fourth order elliptic problems, it is the first time that full superconvergence of second order is obtained without an extra boundary condition imposed on exact solutions. It is also the first time that superconvergence is established for nonconforming finite element methods of three dimensional fourth order elliptic problems. Numerical results are presented to demonstrate the theoretical results.
NAMar 16, 2018
Two low-order nonconforming finite element methods for the Stokes flow in 3DJun Hu, Mira Schedensack
In this paper, we propose two low order nonconforming finite element methods (FEMs) for the three-dimensional Stokes flow that generalize the non-conforming FEM of Kouhia and Stenberg (1995, Comput. Methods Appl. Mech. Engrg.). The finite element spaces proposed in this paper consist of two globally continuous components (one piecewise affine and one enriched component) and one component that is continuous at the midpoints of interior faces. We prove that the discrete Korn inequality and a discrete inf-sup condition hold uniformly in the meshsize and also for a non-empty Neumann boundary. Based on these two results, we show the well-posedness of the discrete problem. Two counterexamples prove that there is no direct generalization of the Kouhia-Stenberg FEM to three space dimensions: The finite element space with one non-conforming and two conforming piecewise affine components does not satisfy a discrete inf-sup condition with piecewise constant pressure approximations, while finite element functions with two non-conforming and one conforming component do not satisfy a discrete Korn inequality.
LGAug 1, 2024
Neural Graph Matching for Video Retrieval in Large-Scale Video-driven E-commerceHouye Ji, Ye Tang, Zhaoxin Chen et al.
With the rapid development of the short video industry, traditional e-commerce has encountered a new paradigm, video-driven e-commerce, which leverages attractive videos for product showcases and provides both video and item services for users. Benefitting from the dynamic and visualized introduction of items, video-driven e-commerce has shown huge potential in stimulating consumer confidence and promoting sales. In this paper, we focus on the video retrieval task, facing the following challenges: (1) How to handle the heterogeneities among users, items, and videos? (2) How to mine the complementarity between items and videos for better user understanding? We first leverage the dual graph to model the coexistence of user-video and user-item interactions in video-driven e-commerce and innovatively reduce user preference understanding to a graph matching problem. To solve it, we further propose a novel bi-level Graph Matching Network (GMN), which mainly consists of node- and preference-level graph matching. Given a user, node-level graph matching aims to match videos and items, while preference-level graph matching aims to match multiple user preferences extracted from both videos and items. Then the proposed GMN can generate and improve user embedding by aggregating matched nodes or preferences from the dual graph in a bi-level manner. Comprehensive experiments show the superiority of the proposed GMN with significant improvements over state-of-the-art approaches (e.g., AUC+1.9% and CTR+7.15%). We have developed it on a well-known video-driven e-commerce platform, serving hundreds of millions of users every day.
NAFeb 6, 2018
High accuracy methods for eigenvalues of elliptic operators by nonconforming elementsJun Hu, Limin Ma
In this paper, three high-accuracy methods for eigenvalues of second order elliptic operators are proposed by using the nonconforming Crouzeix-Raviart (CR for short) element and the nonconforming enriched Crouzeix-Raviart (ECR for short) element. They are based on a crucial full one order superconvergence of the first order mixed Raviart-Thomas (RT for short) element. The main ingredient of such a superconvergence analysis is to employ a discrete Helmholtz decomposition of the difference between the canonical interpolation and the finite element solution of the RT element. In particular, it allows for some vital cancellation between terms in one key sum of boundary terms. Consequently, a full one order superconvergence follows from a special relation between the CR element and the RT element, and the equivalence between the ECR element and the RT element for these two nonconforming elements. These superconvergence results improve those in literature from a half order to a full one order for the RT element, the CR element and the ECR element. Based on the aforementioned superconvergence of the RT element, asymptotic expansions of eigenvalues are established and employed to achieve high accuracy extrapolation methods for these two nonconforming elements. In contrast to a classic analysis in literature, the novelty herein is to use not only the canonical interpolations of these nonconforming elements but also that of the RT element to analyze such asymptotic expansions. Based on the superconvergence of these nonconforming elements, asymptotically exact a posteriori error estimators of eigenvalues are constructed and analyzed for them. Finally, two post-processing methods are proposed to improve accuracy of approximate eigenvalues by employing these a posteriori error estimators. Numerical tests are provided to justify and compare the performance of the aforementioned methods.
NAApr 27, 2016
Conforming mixed triangular prism and nonconforming mixed tetrahedral elements for the linear elasticity problemJun Hu, Rui Ma
We propose two families of mixed finite elements for solving the classical Hellinger-Reissner mixed problem of the linear elasticity equations in three dimensions. First, a family of conforming mixed triangular prism elements is constructed by product of elements on triangular meshes and elements in one dimension. The well-posedness is established for all elements with $k\geq1$, which are of $k+1$ order convergence for both the stress and displacement. Besides, a family of reduced stress spaces is proposed by dropping the degrees of polynomial functions associated with faces. As a result, the lowest order conforming mixed triangular prism element has 93 plus 33 degrees of freedom on each element. Second, we construct a new family of nonconforming mixed tetrahedral elements. The shape function spaces of our stress spaces are different from those of the elements in literature.
IRJan 16
Cross-Modal Attention Network with Dual Graph Learning in Multimodal RecommendationJi Dai, Quan Fang, Jun Hu et al.
Multimedia recommendation systems leverage user-item interactions and multimodal information to capture user preferences, enabling more accurate and personalized recommendations. Despite notable advancements, existing approaches still face two critical limitations: first, shallow modality fusion often relies on simple concatenation, failing to exploit rich synergistic intra- and inter-modal relationships; second, asymmetric feature treatment, where users are characterized only by interaction IDs while items benefit from rich multimodal content, hinders the learning of a shared semantic space. To address these issues, we propose a Cross-modal Recursive Attention Network with dual graph Embedding (CRANE). To tackle shallow fusion, we design a core Recursive Cross-Modal Attention (RCA) mechanism that iteratively refines modality features based on cross-correlations in a joint latent space, effectively capturing high-order intra- and inter-modal dependencies. For symmetric multimodal learning, we explicitly construct users' multimodal profiles by aggregating features of their interacted items. Furthermore, CRANE integrates a symmetric dual-graph framework, comprising a heterogeneous user-item interaction graph and a homogeneous item-item semantic graph, unified by a self-supervised contrastive learning objective to fuse behavioral and semantic signals. Despite these complex modeling capabilities, CRANE maintains high computational efficiency. Theoretical and empirical analyses confirm its scalability and high practical efficiency, achieving faster convergence on small datasets and superior performance ceilings on large-scale ones. Comprehensive experiments on four public real-world datasets validate an average 5% improvement in key metrics over state-of-the-art baselines.
LGMar 7, 2025Code
Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUsLing Team, Binwei Zeng, Chao Huang et al.
In this technical report, we tackle the challenges of training large-scale Mixture of Experts (MoE) models, focusing on overcoming cost inefficiency and resource limitations prevalent in such systems. To address these issues, we present two differently sized MoE large language models (LLMs), namely Ling-Lite and Ling-Plus (referred to as "Bailing" in Chinese, spelled Bǎilíng in Pinyin). Ling-Lite contains 16.8 billion parameters with 2.75 billion activated parameters, while Ling-Plus boasts 290 billion parameters with 28.8 billion activated parameters. Both models exhibit comparable performance to leading industry benchmarks. This report offers actionable insights to improve the efficiency and accessibility of AI development in resource-constrained settings, promoting more scalable and sustainable technologies. Specifically, to reduce training costs for large-scale MoE models, we propose innovative methods for (1) optimization of model architecture and training processes, (2) refinement of training anomaly handling, and (3) enhancement of model evaluation efficiency. Additionally, leveraging high-quality data generated from knowledge graphs, our models demonstrate superior capabilities in tool use compared to other models. Ultimately, our experimental findings demonstrate that a 300B MoE LLM can be effectively trained on lower-performance devices while achieving comparable performance to models of a similar scale, including dense and MoE models. Compared to high-performance devices, utilizing a lower-specification hardware system during the pre-training phase demonstrates significant cost savings, reducing computing costs by approximately 20%. The models can be accessed at https://huggingface.co/inclusionAI.
LGFeb 1, 2023
Experimental observation on a low-rank tensor model for eigenvalue problemsJun Hu, Pengzhan Jin
Here we utilize a low-rank tensor model (LTM) as a function approximator, combined with the gradient descent method, to solve eigenvalue problems including the Laplacian operator and the harmonic oscillator. Experimental results show the superiority of the polynomial-based low-rank tensor model (PLTM) compared to the tensor neural network (TNN). We also test such low-rank architectures for the classification problem on the MNIST dataset.
CVMar 10, 2025Code
Effective and Efficient Masked Image Generation ModelsZebin You, Jingyang Ou, Xiaolu Zhang et al.
Although masked image generation models and masked diffusion models are designed with different motivations and objectives, we observe that they can be unified within a single framework. Building upon this insight, we carefully explore the design space of training and sampling, identifying key factors that contribute to both performance and efficiency. Based on the improvements observed during this exploration, we develop our model, referred to as eMIGM. Empirically, eMIGM demonstrates strong performance on ImageNet generation, as measured by Fréchet Inception Distance (FID). In particular, on ImageNet 256x256, with a similar number of function evaluations (NFEs) and model parameters, eMIGM outperforms the seminal VAR. Moreover, as NFE and model parameters increase, eMIGM achieves performance comparable to the state-of-the-art continuous diffusion models while requiring less than 40% of the NFE. Additionally, on ImageNet 512x512, with only about 60% of the NFE, eMIGM outperforms the state-of-the-art continuous diffusion models. Code is available at https://github.com/ML-GSAI/eMIGM.
LGNov 14, 2025
Echoless Label-Based Pre-computation for Memory-Efficient Heterogeneous Graph LearningJun Hu, Shangheng Chen, Yufei He et al.
Heterogeneous Graph Neural Networks (HGNNs) are widely used for deep learning on heterogeneous graphs. Typical end-to-end HGNNs require repetitive message passing during training, limiting efficiency for large-scale real-world graphs. Pre-computation-based HGNNs address this by performing message passing only once during preprocessing, collecting neighbor information into regular-shaped tensors, which enables efficient mini-batch training. Label-based pre-computation methods collect neighbors' label information but suffer from training label leakage, where a node's own label information propagates back to itself during multi-hop message passing - the echo effect. Existing mitigation strategies are memory-inefficient on large graphs or suffer from compatibility issues with advanced message passing methods. We propose Echoless Label-based Pre-computation (Echoless-LP), which eliminates training label leakage with Partition-Focused Echoless Propagation (PFEP). PFEP partitions target nodes and performs echoless propagation, where nodes in each partition collect label information only from neighbors in other partitions, avoiding echo while remaining memory-efficient and compatible with any message passing method. We also introduce an Asymmetric Partitioning Scheme (APS) and a PostAdjust mechanism to address information loss from partitioning and distributional shifts across partitions. Experiments on public datasets demonstrate that Echoless-LP achieves superior performance and maintains memory efficiency compared to baselines.
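The echo effect and the partition idea described above can be illustrated on a toy graph. The sketch below is a simplified reading of the abstract, not the authors' PFEP: it hides the labels of one partition at a time, runs two hops of sum aggregation, and keeps the result only for nodes in the hidden partition. The graph, partitioning, and function names are hypothetical, and the Asymmetric Partitioning Scheme and PostAdjust mechanism are omitted.

```python
def propagate(adj, feats):
    """One hop of sum aggregation: each node sums its neighbors' vectors."""
    dim = len(next(iter(feats.values())))
    return {v: [sum(feats[u][k] for u in nbrs) for k in range(dim)]
            for v, nbrs in adj.items()}

def two_hop(adj, labels):
    """Naive two-hop label propagation; a node's label can travel back to itself."""
    return propagate(adj, propagate(adj, labels))

def echoless_two_hop(adj, labels, partitions):
    """For each partition, hide its labels, propagate, and keep its nodes' outputs."""
    dim = len(next(iter(labels.values())))
    out = {}
    for part in partitions:
        masked = {v: ([0.0] * dim if v in part else y) for v, y in labels.items()}
        result = two_hop(adj, masked)
        for v in part:
            out[v] = result[v]
    return out

# Toy path graph 0 - 1 - 2 - 3 with one-hot labels over two classes.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
labels = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 0.0], 3: [0.0, 1.0]}
partitions = [{0, 1}, {2, 3}]

flipped = dict(labels)
flipped[0] = [0.0, 1.0]  # change only node 0's own label

# Naive propagation leaks node 0's label into its own two-hop feature.
naive_changed = two_hop(adj, labels)[0] != two_hop(adj, flipped)[0]
# Partitioned propagation leaves node 0's feature independent of its own label.
echoless_changed = (echoless_two_hop(adj, labels, partitions)[0]
                    != echoless_two_hop(adj, flipped, partitions)[0])
```

In this sketch `naive_changed` is true and `echoless_changed` is false, which is the property the abstract calls avoiding echo. The cost is that labels from same-partition neighbors are also dropped, the information loss that the abstract says the partitioning scheme is meant to address.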
CLJan 30
Autonomous Chain-of-Thought Distillation for Graph-Based Fraud DetectionYuan Li, Jun Hu, Bryan Hooi et al.
Graph-based fraud detection on text-attributed graphs (TAGs) requires jointly modeling rich textual semantics and relational dependencies. However, existing LLM-enhanced GNN approaches are constrained by predefined prompting and decoupled training pipelines, limiting reasoning autonomy and weakening semantic-structural alignment. We propose FraudCoT, a unified framework that advances TAG-based fraud detection through autonomous, graph-aware chain-of-thought (CoT) reasoning and scalable LLM-GNN co-training. To address the limitations of predefined prompts, we introduce a fraud-aware selective CoT distillation mechanism that generates diverse reasoning paths and enhances semantic-structural understanding. These distilled CoTs are integrated into node texts, providing GNNs with enriched, multi-hop semantic and structural cues for fraud detection. Furthermore, we develop an efficient asymmetric co-training strategy that enables end-to-end optimization while significantly reducing the computational cost of naive joint training. Extensive experiments on public and industrial benchmarks demonstrate that FraudCoT achieves up to 8.8% AUPRC improvement over state-of-the-art methods and delivers up to 1,066x speedup in training throughput, substantially advancing both detection performance and efficiency.
CVSep 5, 2024
TG-LMM: Enhancing Medical Image Segmentation Accuracy through Text-Guided Large Multi-Modal ModelYihao Zhao, Enhao Zhong, Cuiyun Yuan et al.
We propose TG-LMM (Text-Guided Large Multi-Modal Model), a novel approach that leverages textual descriptions of organs to enhance segmentation accuracy in medical images. Existing medical image segmentation methods face several challenges: current medical automatic segmentation models do not effectively utilize prior knowledge, such as descriptions of organ locations; previous text-visual models focus on identifying the target rather than improving the segmentation accuracy; prior models attempt to use prior knowledge to enhance accuracy but do not incorporate pre-trained models. To address these issues, TG-LMM integrates prior knowledge, specifically expert descriptions of the spatial locations of organs, into the segmentation process. Our model utilizes pre-trained image and text encoders to reduce the number of training parameters and accelerate the training process. Additionally, we designed a comprehensive image-text information fusion structure to ensure thorough integration of the two modalities of data. We evaluated TG-LMM on three authoritative medical image datasets, encompassing the segmentation of various parts of the human body. Our method demonstrated superior performance compared to existing approaches, such as MedSAM, SAM and nnUnet.
CLFeb 14, 2025
Large Language Diffusion ModelsShen Nie, Fengqi Zhu, Zebin You et al.
The capabilities of large language models (LLMs) are widely regarded as relying on autoregressive models (ARMs). We challenge this notion by introducing LLaDA, a diffusion model trained from scratch under the pre-training and supervised fine-tuning (SFT) paradigm. LLaDA employs a forward data masking process and a reverse generation process, parameterized by a Transformer to predict masked tokens. It provides a principled generative approach for probabilistic inference by optimizing a likelihood lower bound. Across extensive benchmarks on general tasks, math, code, and so on, LLaDA demonstrates strong scalability and performs comparably to our self-constructed ARM baselines. Remarkably, LLaDA 8B is competitive with strong LLMs like LLaMA3 8B in in-context learning and, after SFT, exhibits impressive instruction-following abilities in case studies such as multi-turn dialogue. Moreover, LLaDA addresses the reversal curse, surpassing GPT-4o in a reversal poem completion task. Our findings show the promise of diffusion models for language modeling at scale and challenge the common assumption that core LLM capabilities discussed above inherently depend on ARMs. Project page and codes: https://ml-gsai.github.io/LLaDA-demo/.
CVMar 11, 2022
DRTAM: Dual Rank-1 Tensor Attention ModuleHanxing Chi, Baihong Lin, Jun Hu et al.
Recently, attention mechanisms have been extensively investigated in computer vision, but few of them show excellent performance on both large and mobile networks. This paper proposes Dual Rank-1 Tensor Attention Module (DRTAM), a novel residual-attention-learning-guided attention module for feed-forward convolutional neural networks. Given a 3D feature tensor map, DRTAM first generates three 2D feature descriptors along three axes. Then, using the three descriptors, DRTAM sequentially infers two rank-1 tensor attention maps, the initial attention map and the complement attention map, combines them, and multiplies the result with the input feature map for adaptive feature refinement (see Fig. 1(c)). To generate the two attention maps, DRTAM introduces a rank-1 tensor attention module (RTAM) and a residual descriptors extraction module (RDEM): RTAM divides each 2D feature descriptor into several chunks and generates three factor vectors of a rank-1 tensor attention map by employing strip pooling on each chunk, so that local and long-range contextual information can be captured along the three dimensions respectively; RDEM generates three 2D feature descriptors of the residual feature to produce the complement attention map, using the three factor vectors of the initial attention map and the three descriptors of the input feature. Extensive experimental results on ImageNet-1K, MS COCO and PASCAL VOC demonstrate that DRTAM achieves competitive performance on both large and mobile networks compared with other state-of-the-art attention modules.
SIDec 18, 2024Code
Modality-Independent Graph Neural Networks with Global Transformers for Multimodal RecommendationJun Hu, Bryan Hooi, Bingsheng He et al.
Multimodal recommendation systems can learn users' preferences from existing user-item interactions as well as the semantics of multimodal data associated with items. Many existing methods model this through a multimodal user-item graph, approaching multimodal recommendation as a graph learning task. Graph Neural Networks (GNNs) have shown promising performance in this domain. Prior research has capitalized on GNNs' capability to capture neighborhood information within certain receptive fields (typically denoted by the number of hops, $K$) to enrich user and item semantics. We observe that the optimal receptive fields for GNNs can vary across different modalities. In this paper, we propose GNNs with Modality-Independent Receptive Fields, which employ separate GNNs with independent receptive fields for different modalities to enhance performance. Our results indicate that the optimal $K$ for certain modalities on specific datasets can be as low as 1 or 2, which may restrict the GNNs' capacity to capture global information. To address this, we introduce a Sampling-based Global Transformer, which utilizes uniform global sampling to effectively integrate global information for GNNs. We conduct comprehensive experiments that demonstrate the superiority of our approach over existing methods. Our code is publicly available at https://github.com/CrawlScript/MIG-GT.
LGDec 2, 2021Code
Contrastive Adaptive Propagation Graph Neural Networks for Efficient Graph LearningJun Hu, Shengsheng Qian, Quan Fang et al.
Graph Neural Networks (GNNs) have achieved great success in processing graph data by extracting and propagating structure-aware features. Existing GNN research designs various propagation schemes to guide the aggregation of neighbor information. Recently the field has advanced from local propagation schemes that focus on local neighbors towards extended propagation schemes that can directly deal with extended neighbors consisting of both local and high-order neighbors. Despite the impressive performance, existing approaches are still insufficient to build an efficient and learnable extended propagation scheme that can adaptively adjust the influence of local and high-order neighbors. This paper proposes an efficient yet effective end-to-end framework, namely Contrastive Adaptive Propagation Graph Neural Networks (CAPGNN), to address these issues by combining Personalized PageRank and attention techniques. CAPGNN models the learnable extended propagation scheme with a polynomial of a sparse local affinity matrix, where the polynomial relies on Personalized PageRank to provide superior initial coefficients. In order to adaptively adjust the influence of both local and high-order neighbors, a coefficient-attention model is introduced to learn to adjust the coefficients of the polynomial. In addition, we leverage self-supervised learning techniques and design a negative-free entropy-aware contrastive loss to explicitly take advantage of unlabeled data for training. We implement CAPGNN as two different versions named CAPGCN and CAPGAT, which use static and dynamic sparse local affinity matrices, respectively. Experiments on graph benchmark datasets suggest that CAPGNN can consistently outperform or match state-of-the-art baselines. The source code is publicly available at https://github.com/hujunxianligong/CAPGNN.
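A propagation scheme written as a polynomial of a local affinity matrix with Personalized PageRank coefficients can be sketched as follows. This is a minimal illustration under stated assumptions: the coefficients use the standard truncated PPR form $\alpha(1-\alpha)^k$ with the tail folded into the last term, the affinity matrix is a row-normalized adjacency with self-loops, and the learnable coefficient-attention adjustment described in the abstract is left out. None of the names come from the CAPGNN code.

```python
def ppr_coefficients(alpha, num_hops):
    """Truncated PPR weights; the last term absorbs the tail so they sum to 1."""
    coeffs = [alpha * (1 - alpha) ** k for k in range(num_hops)]
    coeffs.append((1 - alpha) ** num_hops)
    return coeffs

def row_normalize(adj):
    """Divide each row by its sum so every row sums to 1."""
    return [[a / sum(row) for a in row] for row in adj]

def matvec(mat, vec):
    return [sum(m * x for m, x in zip(row, vec)) for row in mat]

def polynomial_propagate(adj_norm, x, coeffs):
    """Compute sum_k coeffs[k] * A^k x using repeated matrix-vector products."""
    out = [0.0] * len(x)
    power = list(x)  # A^0 x
    for k, c in enumerate(coeffs):
        if k > 0:
            power = matvec(adj_norm, power)  # advance to A^k x
        out = [o + c * p for o, p in zip(out, power)]
    return out

# Toy 3-node graph with self-loops (hypothetical data).
adj = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
coeffs = ppr_coefficients(alpha=0.1, num_hops=3)
smoothed = polynomial_propagate(row_normalize(adj), [1.0, 0.0, 0.0], coeffs)
```

Evaluating the polynomial through repeated products with the sparse local matrix avoids forming any dense power $A^k$, which is why high-order neighbors can be reached at roughly the cost of local propagation.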
IRNov 19, 2021Code
GRecX: An Efficient and Unified Benchmark for GNN-based RecommendationDesheng Cai, Jun Hu, Quan Zhao et al.
In this paper, we present GRecX, an open-source TensorFlow framework for benchmarking GNN-based recommendation models in an efficient and unified way. GRecX consists of core libraries for building GNN-based recommendation benchmarks, as well as the implementations of popular GNN-based recommendation models. The core libraries provide essential components for building efficient and unified benchmarks, including FastMetrics (efficient metrics computation libraries), VectorSearch (efficient similarity search libraries for dense vectors), BatchEval (efficient mini-batch evaluation libraries), and DataManager (unified dataset management libraries). In particular, to provide a unified benchmark for the fair comparison of different complex GNN-based recommendation models, we design a new metric GRMF-X and integrate it into the FastMetrics component. Based on a TensorFlow GNN library tf_geometric, GRecX carefully implements a variety of popular GNN-based recommendation models. We carefully implement these baseline models to reproduce the performance reported in the literature, and our implementations are usually more efficient and friendly. In conclusion, GRecX enables users to train and benchmark GNN-based recommendation baselines in an efficient and unified way. We conduct experiments with GRecX, and the experimental results show that GRecX allows us to train and benchmark GNN-based recommendation baselines in an efficient and unified way. The source code of GRecX is available at https://github.com/maenzhier/GRecX.
94.7NAApr 20
A Coupling Method of Mixed and Lagrange Finite Elements for Linear Elasticity ProblemWei Chen, Jun Hu, Limin Ma et al.
This paper proposes a finite element method that couples mixed and Lagrange finite elements to efficiently capture stress concentrations in elasticity problems. The method employs conforming mixed finite elements in regions with stress concentration, while standard Lagrange elements are used elsewhere, achieving a balance between stress accuracy and computational efficiency. The well-posedness of the coupled formulation and optimal a priori error estimates are established, even when the size of the mixed finite element subregion is $O(h)$. Numerical experiments are presented to verify the theoretical convergence rates and to demonstrate the effectiveness and efficiency of the proposed method.
LGMay 25, 2025
LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion ModelsFengqi Zhu, Rongzhen Wang, Shen Nie et al.
While Masked Diffusion Models (MDMs), such as LLaDA, present a promising paradigm for language modeling, there has been relatively little effort in aligning these models with human preferences via reinforcement learning. The challenge primarily arises from the high variance in Evidence Lower Bound (ELBO)-based likelihood estimates required for preference optimization. To address this issue, we propose Variance-Reduced Preference Optimization (VRPO), a framework that formally analyzes the variance of ELBO estimators and derives bounds on both the bias and variance of preference optimization gradients. Building on this theoretical foundation, we introduce unbiased variance reduction strategies, including optimal Monte Carlo budget allocation and antithetic sampling, that significantly improve the performance of MDM alignment. We demonstrate the effectiveness of VRPO by applying it to LLaDA, and the resulting model, LLaDA 1.5, outperforms its SFT-only predecessor consistently and significantly across mathematical (GSM8K +4.7), code (HumanEval +3.0, MBPP +1.8), and alignment benchmarks (IFEval +4.0, Arena-Hard +4.3). Furthermore, LLaDA 1.5 demonstrates a highly competitive mathematical performance compared to strong language MDMs and ARMs. Project page: https://ml-gsai.github.io/LLaDA-1.5-Demo/.
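The abstract does not spell out how antithetic sampling is applied to ELBO estimates, so the sketch below shows only the textbook form of the technique on a toy integrand: each uniform draw $u$ is paired with its mirror $1-u$, which keeps the estimator unbiased while lowering its variance for monotone functions. The integrand, sample sizes, and function names are illustrative assumptions, not the VRPO procedure.

```python
import math
import random
import statistics

def independent_estimate(f, rng, num_pairs):
    """Plain Monte Carlo with 2 * num_pairs independent uniform draws."""
    draws = [f(rng.random()) for _ in range(2 * num_pairs)]
    return sum(draws) / len(draws)

def antithetic_estimate(f, rng, num_pairs):
    """Same budget, but each draw u is paired with its mirror 1 - u."""
    total = 0.0
    for _ in range(num_pairs):
        u = rng.random()
        total += f(u) + f(1.0 - u)
    return total / (2 * num_pairs)

rng = random.Random(0)
runs = 2000
# Both estimators target E[exp(U)] = e - 1 for U uniform on [0, 1].
plain = [independent_estimate(math.exp, rng, 8) for _ in range(runs)]
paired = [antithetic_estimate(math.exp, rng, 8) for _ in range(runs)]

var_plain = statistics.variance(plain)
var_paired = statistics.variance(paired)
```

Comparing `var_paired` with `var_plain` shows the reduction at an equal number of function evaluations. This is the sense in which a variance reduction strategy can be both unbiased and budget-neutral.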
LGMay 22, 2025
LLaDA-V: Large Language Diffusion Models with Visual Instruction TuningZebin You, Shen Nie, Xiaolu Zhang et al.
In this work, we introduce LLaDA-V, a purely diffusion-based Multimodal Large Language Model (MLLM) that integrates visual instruction tuning with masked diffusion models, representing a departure from the autoregressive paradigms dominant in current multimodal approaches. Built upon LLaDA, a representative large language diffusion model, LLaDA-V incorporates a vision encoder and MLP connector that projects visual features into the language embedding space, enabling effective multimodal alignment. Our empirical investigation reveals several intriguing results: First, LLaDA-V demonstrates promising multimodal performance despite its language model being weaker on purely textual tasks than counterparts like LLaMA3-8B and Qwen2-7B. When trained on the same instruction data, LLaDA-V is highly competitive with LLaMA3-V across multimodal tasks with better data scalability. It also narrows the performance gap to Qwen2-VL, suggesting the effectiveness of its architecture for multimodal tasks. Second, LLaDA-V achieves state-of-the-art performance in multimodal understanding compared to existing hybrid autoregressive-diffusion and purely diffusion-based MLLMs. Our findings suggest that large language diffusion models show promise in multimodal contexts and warrant further investigation in future research. Project page and codes: https://ml-gsai.github.io/LLaDA-V-demo/.
43.6 IR May 1
Robust Multimodal Recommendation via Graph Retrieval-Enhanced Modality Completion
Yuan Li, Jun Hu, Jiaxin Jiang et al.
Multimodal data plays a critical role in web-based recommendation systems, where information from diverse modalities such as vision and text enhances representation learning. However, real-world multimodal datasets often suffer from modality incompleteness due to sensor failures, annotation scarcity, or privacy constraints, which substantially degrades model performance and reliability. One effective remedy is modality completion, which reconstructs missing features to provide modality-complete graphs for downstream tasks. Given a query node with missing multimodal features, existing modality completion methods typically infer information from the node itself or its neighbors to reconstruct the missing modality. However, these methods may overlook semantically relevant context elsewhere in the graph, which contains valuable cues that are non-trivial to capture through simple methods like neighborhood aggregation. In this work, we propose GRE-MC, a Graph Retrieval-Enhanced Modality Completion framework, to overcome these limitations. By introducing a modality-aware subgraph retrieval mechanism, GRE-MC selects semantically relevant subgraphs from the entire graph, providing richer contextual information for completing missing modalities. Subsequently, a graph transformer jointly encodes the query node and the retrieved subgraph via global attention to complete the missing features, while a learnable sparse-routing codebook regularizes latent embeddings into compact bases for improved robustness. Extensive experiments on multimodal recommendation benchmarks demonstrate that GRE-MC consistently outperforms state-of-the-art methods, validating the effectiveness of subgraph retrieval and the joint-encoding graph transformer for robust modality completion.
NA Feb 11, 2024
A hybrid iterative method based on MIONet for PDEs: Theory and numerical examples
Jun Hu, Pengzhan Jin
We propose a hybrid iterative method based on MIONet for PDEs, which combines a traditional numerical iterative solver with a neural operator, a recent and powerful machine learning method. We systematically analyze its theoretical properties, including the convergence condition, the spectral behavior, and the convergence rate, in terms of the errors of the discretization and the model inference. We show the theoretical results for the frequently used smoothers, i.e., Richardson (damped Jacobi) and Gauss-Seidel. We give an upper bound on the convergence rate of the hybrid method with respect to the model correction period, which indicates an optimal period at which the hybrid iteration converges fastest. Several numerical examples, including the hybrid Richardson (Gauss-Seidel) iteration for the 1-d (2-d) Poisson equation, are presented to verify our theoretical results and show an excellent acceleration effect. As a meshless acceleration method, it has enormous potential for practical applications.
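The structure of such a hybrid iteration can be sketched on the 1-d Poisson problem. The code below is a minimal illustration, not the authors' method: the MIONet inference step is replaced by a hypothetical stand-in (`low_mode_correction`) that solves exactly for the lowest few discrete sine modes of the error, on the assumption that a learned operator mainly captures low-frequency components. The grid size, damping factor, mode count, and correction period are all illustrative choices.

```python
import math

def residual(u, f, h):
    # r = f - A u, with A = tridiag(-1, 2, -1) / h^2 and zero boundary values.
    n = len(u)
    r = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        r.append(f[i] - (2.0 * u[i] - left - right) / h ** 2)
    return r

def richardson_step(u, f, h, omega=2.0 / 3.0):
    # Damped Jacobi: u <- u + omega * D^{-1} r, where D = 2 / h^2.
    r = residual(u, f, h)
    return [ui + omega * (h ** 2 / 2.0) * ri for ui, ri in zip(u, r)]

def low_mode_correction(u, f, h, modes=4):
    # Stand-in for the model correction (assumption): remove the residual
    # components along the first `modes` sine eigenvectors of A exactly.
    n = len(u)
    r = residual(u, f, h)
    out = list(u)
    for j in range(1, modes + 1):
        v = [math.sin(j * math.pi * (i + 1) * h) for i in range(n)]
        norm2 = sum(x * x for x in v)
        lam = 4.0 / h ** 2 * math.sin(j * math.pi * h / 2.0) ** 2
        coeff = sum(x * y for x, y in zip(v, r)) / (norm2 * lam)
        out = [o + coeff * x for o, x in zip(out, v)]
    return out

def solve(n=31, iters=60, period=None):
    # Solve -u'' = 1 on (0, 1) with u(0) = u(1) = 0; return max-norm residual.
    h = 1.0 / (n + 1)
    f = [1.0] * n
    u = [0.0] * n
    for k in range(1, iters + 1):
        u = richardson_step(u, f, h)
        if period is not None and k % period == 0:
            u = low_mode_correction(u, f, h)
    return max(abs(x) for x in residual(u, f, h))

plain = solve(period=None)
hybrid = solve(period=10)
print(plain, hybrid)
```

The smoother damps high-frequency error quickly but low-frequency error slowly, so a periodic correction aimed at the low modes is what speeds up the combined iteration. A real neural operator would introduce its own inference error, which this exact stand-in does not model.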