Han Feng

LG
h-index34
23papers
324citations
Novelty48%
AI Score55

23 Papers

LGJun 7, 2023
Permutation Equivariant Graph Framelets for Heterophilous Graph Learning

Jianfei Li, Ruigang Zheng, Han Feng et al.

The nature of heterophilous graphs is significantly different from that of homophilous graphs, which causes difficulties in early graph neural network models and suggests aggregations beyond the 1-hop neighborhood. In this paper, we develop a new way to implement multi-scale extraction via constructing Haar-type graph framelets with desired properties of permutation equivariance, efficiency, and sparsity, for deep learning tasks on graphs. We further design a graph framelet neural network model PEGFAN (Permutation Equivariant Graph Framelet Augmented Network) based on our constructed graph framelets. The experiments are conducted on a synthetic dataset and 9 benchmark datasets to compare performance with other state-of-the-art models. The result shows that our model can achieve the best performance on certain datasets of heterophilous graphs (including the majority of heterophilous datasets with relatively larger sizes and denser connections) and competitive performance on the remaining.

LGOct 19, 2022
SignReLU neural network and its approximation ability

Jianfei Li, Han Feng, Ding-Xuan Zhou

Deep neural networks (DNNs) have garnered significant attention in various fields of science and technology in recent years. Activation functions define how neurons in DNNs process incoming signals for them. They are essential for learning non-linear transformations and for performing diverse computations among successive neuron layers. In the last few years, researchers have investigated the approximation ability of DNNs to explain their power and success. In this paper, we explore the approximation ability of DNNs using a different activation function, called SignReLU. Our theoretical results demonstrate that SignReLU networks outperform rational and ReLU networks in terms of approximation performance. Numerical experiments are conducted comparing SignReLU with the existing activations such as ReLU, Leaky ReLU, and ELU, which illustrate the competitive practical performance of SignReLU.

CVDec 15, 2025Code
Soul: Breathe Life into Digital Human for High-fidelity Long-term Multimodal Animation

Jiangning Zhang, Junwei Zhu, Zhenye Gan et al.

We propose a multimodal-driven framework for high-fidelity long-term digital human animation termed $\textbf{Soul}$, which generates semantically coherent videos from a single-frame portrait image, text prompts, and audio, achieving precise lip synchronization, vivid facial expressions, and robust identity preservation. We construct Soul-1M, containing 1 million finely annotated samples with a precise automated annotation pipeline (covering portrait, upper-body, full-body, and multi-person scenes) to mitigate data scarcity, and we carefully curate Soul-Bench for comprehensive and fair evaluation of audio-/text-guided animation methods. The model is built on the Wan2.2-5B backbone, integrating audio-injection layers and multiple training strategies together with threshold-aware codebook replacement to ensure long-term generation consistency. Meanwhile, step/CFG distillation and a lightweight VAE are used to optimize inference efficiency, achieving an 11.4$\times$ speedup with negligible quality loss. Extensive experiments show that Soul significantly outperforms current leading open-source and commercial models on video quality, video-text alignment, identity preservation, and lip-synchronization accuracy, demonstrating broad applicability in real-world scenarios such as virtual anchors and film production. Project page at https://zhangzjn.github.io/projects/Soul/

LGOct 14, 2022
Approximation analysis of CNNs from a feature extraction view

Jianfei Li, Han Feng, Ding-Xuan Zhou

Deep learning based on deep neural networks has been very successful in many practical applications, but it lacks enough theoretical understanding due to the network architectures and structures. In this paper we establish some analysis for linear feature extraction by a deep multi-channel convolutional neural networks (CNNs), which demonstrates the power of deep learning over traditional linear transformations, like Fourier, wavelets, redundant dictionary coding methods. Moreover, we give an exact construction presenting how linear features extraction can be conducted efficiently with multi-channel CNNs. It can be applied to lower the essential dimension for approximating a high dimensional function. Rates of function approximation by such deep networks implemented with channels and followed by fully-connected layers are investigated as well. Harmonic analysis for factorizing linear features into multi-resolution convolutions plays an essential role in our work. Nevertheless, a dedicate vectorization of matrices is constructed, which bridges 1D CNN and 2D CNN and allows us to have corresponding 2D analysis.

39.4LGApr 8
Sparse-Aware Neural Networks for Nonlinear Functionals: Mitigating the Exponential Dependence on Dimension

Jianfei Li, Shuo Huang, Han Feng et al.

Deep neural networks have emerged as powerful tools for learning operators defined over infinite-dimensional function spaces. However, existing theories frequently encounter difficulties related to dimensionality and limited interpretability. This work investigates how sparsity can help address these challenges in functional learning, a central ingredient in operator learning. We propose a framework that employs convolutional architectures to extract sparse features from a finite number of samples, together with deep fully connected networks to effectively approximate nonlinear functionals. Using universal discretization methods, we show that sparse approximators enable stable recovery from discrete samples. In addition, both the deterministic and the random sampling schemes are sufficient for our analysis. These findings lead to improved approximation rates and reduced sample sizes in various function spaces, including those with fast frequency decay and mixed smoothness. They also provide new theoretical insights into how sparsity can alleviate the curse of dimensionality in functional learning.

IVSep 29, 2022
Spherical Image Inpainting with Frame Transformation and Data-driven Prior Deep Networks

Jianfei Li, Chaoyan Huang, Raymond Chan et al.

Spherical image processing has been widely applied in many important fields, such as omnidirectional vision for autonomous cars, global climate modelling, and medical imaging. It is non-trivial to extend an algorithm developed for flat images to the spherical ones. In this work, we focus on the challenging task of spherical image inpainting with deep learning-based regularizer. Instead of a naive application of existing models for planar images, we employ a fast directional spherical Haar framelet transform and develop a novel optimization framework based on a sparsity assumption of the framelet transform. Furthermore, by employing progressive encoder-decoder architecture, a new and better-performed deep CNN denoiser is carefully designed and works as an implicit regularizer. Finally, we use a plug-and-play method to handle the proposed optimization model, which can be implemented efficiently by training the CNN denoiser prior. Numerical experiments are conducted and show that the proposed algorithms can greatly recover damaged spherical images and achieve the best performance over purely using deep learning denoiser and plug-and-play model.

LGAug 10, 2024
Convergence Analysis for Deep Sparse Coding via Convolutional Neural Networks

Jianfei Li, Han Feng, Ding-Xuan Zhou

In this work, we explore the intersection of sparse coding theory and deep learning to enhance our understanding of feature extraction capabilities in advanced neural network architectures. We begin by introducing a novel class of Deep Sparse Coding (DSC) models and establish a thorough theoretical analysis of their uniqueness and stability properties. By applying iterative algorithms to these DSC models, we derive convergence rates for convolutional neural networks (CNNs) in their ability to extract sparse features. This provides a strong theoretical foundation for the use of CNNs in sparse feature-learning tasks. We additionally extend this convergence analysis to more general neural network architectures, including those with diverse activation functions, as well as self-attention and transformer-based models. This broadens the applicability of our findings to a wide range of deep learning methods for the extraction of deep-sparse features. Inspired by the strong connection between sparse coding and CNNs, we also explore training strategies to encourage neural networks to learn sparser features. Through numerical experiments, we demonstrate the effectiveness of these approaches, providing valuable insight for the design of efficient and interpretable deep learning models.

LGJul 1, 2024
Bridging Smoothness and Approximation: Theoretical Insights into Over-Smoothing in Graph Neural Networks

Guangrui Yang, Jianfei Li, Ming Li et al.

In this paper, we explore the approximation theory of functions defined on graphs. Our study builds upon the approximation results derived from the $K$-functional. We establish a theoretical framework to assess the lower bounds of approximation for target functions using Graph Convolutional Networks (GCNs) and examine the over-smoothing phenomenon commonly observed in these networks. Initially, we introduce the concept of a $K$-functional on graphs, establishing its equivalence to the modulus of smoothness. We then analyze a typical type of GCN to demonstrate how the high-frequency energy of the output decays, an indicator of over-smoothing. This analysis provides theoretical insights into the nature of over-smoothing within GCNs. Furthermore, we establish a lower bound for the approximation of target functions by GCNs, which is governed by the modulus of smoothness of these functions. This finding offers a new perspective on the approximation capabilities of GCNs. In our numerical experiments, we analyze several widely applied GCNs and observe the phenomenon of energy decay. These observations corroborate our theoretical results on exponential decay order.

CVJul 7, 2025Code
Identity-Preserving Text-to-Video Generation Guided by Simple yet Effective Spatial-Temporal Decoupled Representations

Yuji Wang, Moran Li, Xiaobin Hu et al.

Identity-preserving text-to-video (IPT2V) generation, which aims to create high-fidelity videos with consistent human identity, has become crucial for downstream applications. However, current end-to-end frameworks suffer a critical spatial-temporal trade-off: optimizing for spatially coherent layouts of key elements (e.g., character identity preservation) often compromises instruction-compliant temporal smoothness, while prioritizing dynamic realism risks disrupting the spatial coherence of visual structures. To tackle this issue, we propose a simple yet effective spatial-temporal decoupled framework that decomposes representations into spatial features for layouts and temporal features for motion dynamics. Specifically, our paper proposes a semantic prompt optimization mechanism and stage-wise decoupled generation paradigm. The former module decouples the prompt into spatial and temporal components. Aligned with the subsequent stage-wise decoupled approach, the spatial prompts guide the text-to-image (T2I) stage to generate coherent spatial features, while the temporal prompts direct the sequential image-to-video (I2V) stage to ensure motion consistency. Experimental results validate that our approach achieves excellent spatiotemporal consistency, demonstrating outstanding performance in identity preservation, text relevance, and video quality. By leveraging this simple yet robust mechanism, our algorithm secures the runner-up position in 2025 ACM MultiMedia Challenge. Our code is available at https://github.com/rain152/IPVG.

68.4CVMay 12
CRAFT: Clinical Reward-Aligned Finetuning for Medical Image Synthesis

Yunsung Chung, Alex El Darzi, Carlo El Khoury et al.

Foundation diffusion models can generate photorealistic natural images, but adapting them to medical imaging remains challenging. In medical adaptation, limited labeled data can exacerbate hallucination-like and clinically implausible synthesis, while existing metrics such as FID or Inception Score do not quantify per-image alignment with pathology-relevant criteria. We introduce the Clinical Alignment Score (CAS), a foundation-model-based proxy for clinical alignment that evaluates generated images along four complementary dimensions beyond visual fidelity. Building on CAS, we propose Clinical Reward-Aligned Finetuning (CRAFT), a reward-based adaptation framework that transfers medical knowledge from multimodal large language models and vision-language models through label-conditioned prompt enrichment, clinical checklists, and differentiable reward optimization. Across four diverse modalities, CRAFT improves CAS and downstream classification performance over strong adaptation baselines. Beyond average CAS gains, CRAFT reduces the empirical low-alignment tail below a real-image reference threshold by 5.5-34.7% points relative to the strongest baseline, corresponding to a 20.4% average relative reduction across datasets. These results indicate fewer hallucination-like generations under CAS, and are corroborated by out-of-family evaluator evaluation, structured checklist auditing, memorization analysis, and a blinded physician preference study on CheXpert.

CVAug 3, 2025Code
StrandDesigner: Towards Practical Strand Generation with Sketch Guidance

Na Zhang, Moran Li, Chengming Xu et al.

Realistic hair strand generation is crucial for applications like computer graphics and virtual reality. While diffusion models can generate hairstyles from text or images, these inputs lack precision and user-friendliness. Instead, we propose the first sketch-based strand generation model, which offers finer control while remaining user-friendly. Our framework tackles key challenges, such as modeling complex strand interactions and diverse sketch patterns, through two main innovations: a learnable strand upsampling strategy that encodes 3D strands into multi-scale latent spaces, and a multi-scale adaptive conditioning mechanism using a transformer with diffusion heads to ensure consistency across granularity levels. Experiments on several benchmark datasets show our method outperforms existing approaches in realism and precision. Qualitative results further confirm its effectiveness. Code will be released at [GitHub](https://github.com/fighting-Zhang/StrandDesigner).

LGJan 29
Understanding Diffusion Models via Ratio-Based Function Approximation with SignReLU Networks

Luwei Sun, Dongrui Shen, Jianfe Li et al.

Motivated by challenges in conditional generative modeling, where the target conditional density takes the form of a ratio f1 over f2, this paper develops a theoretical framework for approximating such ratio-type functionals. Here, f1 and f2 are kernel-based marginal densities that capture structured interactions, a setting central to diffusion-based generative models. We provide a concise proof for approximating these ratio-type functionals using deep neural networks with the SignReLU activation function, leveraging the activation's piecewise structure. Under standard regularity assumptions, we establish L^p(Omega) approximation bounds and convergence rates. Specializing to Denoising Diffusion Probabilistic Models (DDPMs), we construct a SignReLU-based neural estimator for the reverse process and derive bounds on the excess Kullback-Leibler (KL) risk between the generated and true data distributions. Our analysis decomposes this excess risk into approximation and estimation error components. These results provide generalization guarantees for finite-sample training of diffusion-based generative models.

LGJul 16, 2024
Theoretical Insights into CycleGAN: Analyzing Approximation and Estimation Errors in Unpaired Data Generation

Luwei Sun, Dongrui Shen, Han Feng

In this paper, we focus on analyzing the excess risk of the unpaired data generation model, called CycleGAN. Unlike classical GANs, CycleGAN not only transforms data between two unpaired distributions but also ensures the mappings are consistent, which is encouraged by the cycle-consistency term unique to CycleGAN. The increasing complexity of model structure and the addition of the cycle-consistency term in CycleGAN present new challenges for error analysis. By considering the impact of both the model architecture and training procedure, the risk is decomposed into two terms: approximation error and estimation error. These two error terms are analyzed separately and ultimately combined by considering the trade-off between them. Each component is rigorously analyzed; the approximation error through constructing approximations of the optimal transport maps, and the estimation error through establishing an upper bound using Rademacher complexity. Our analysis not only isolates these errors but also explores the trade-offs between them, which provides a theoretical insights of how CycleGAN's architecture and training procedures influence its performance.

LGOct 11, 2024
Deeper Insights into Deep Graph Convolutional Networks: Stability and Generalization

Guangrui Yang, Ming Li, Han Feng et al.

Graph convolutional networks (GCNs) have emerged as powerful models for graph learning tasks, exhibiting promising performance in various domains. While their empirical success is evident, there is a growing need to understand their essential ability from a theoretical perspective. Existing theoretical research has primarily focused on the analysis of single-layer GCNs, while a comprehensive theoretical exploration of the stability and generalization of deep GCNs remains limited. In this paper, we bridge this gap by delving into the stability and generalization properties of deep GCNs, aiming to provide valuable insights by characterizing rigorously the associated upper bounds. Our theoretical results reveal that the stability and generalization of deep GCNs are influenced by certain key factors, such as the maximum absolute eigenvalue of the graph filter operators and the depth of the network. Our theoretical studies contribute to a deeper understanding of the stability and generalization properties of deep GCNs, potentially paving the way for developing more reliable and well-performing models.

LGMar 25, 2024
On the rates of convergence for learning with convolutional neural networks

Yunfei Yang, Han Feng, Ding-Xuan Zhou

We study approximation and learning capacities of convolutional neural networks (CNNs) with one-side zero-padding and multiple channels. Our first result proves a new approximation bound for CNNs with certain constraint on the weights. Our second result gives new analysis on the covering number of feed-forward neural networks with CNNs as special cases. The analysis carefully takes into account the size of the weights and hence gives better bounds than the existing literature in some situations. Using these two results, we are able to derive rates of convergence for estimators based on CNNs in many learning problems. In particular, we establish minimax optimal convergence rates of the least squares based on CNNs for learning smooth functions in the nonparametric regression setting. For binary classification, we derive convergence rates for CNN classifiers with hinge loss and logistic loss. It is also shown that the obtained rates for classification are minimax optimal in some common settings.

CVDec 15, 2025
Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model

Team Seedance, Heyi Chen, Siyan Chen et al.

Recent strides in video generation have paved the way for unified audio-visual generation. In this work, we present Seedance 1.5 pro, a foundational model engineered specifically for native, joint audio-video generation. Leveraging a dual-branch Diffusion Transformer architecture, the model integrates a cross-modal joint module with a specialized multi-stage data pipeline, achieving exceptional audio-visual synchronization and superior generation quality. To ensure practical utility, we implement meticulous post-training optimizations, including Supervised Fine-Tuning (SFT) on high-quality datasets and Reinforcement Learning from Human Feedback (RLHF) with multi-dimensional reward models. Furthermore, we introduce an acceleration framework that boosts inference speed by over 10X. Seedance 1.5 pro distinguishes itself through precise multilingual and dialect lip-syncing, dynamic cinematic camera control, and enhanced narrative coherence, positioning it as a robust engine for professional-grade content creation. Seedance 1.5 pro is now accessible on Volcano Engine at https://console.volcengine.com/ark/region:ark+cn-beijing/experience/vision?type=GenVideo.

LGFeb 2
Weighted Temporal Decay Loss for Learning Wearable PPG Data with Sparse Clinical Labels

Yunsung Chung, Keum San Chun, Migyeong Gwak et al.

Advances in wearable computing and AI have increased interest in leveraging PPG for health monitoring over the past decade. One of the biggest challenges in developing health algorithms based on such biosignals is the sparsity of clinical labels, which makes biosignals temporally distant from lab draws less reliable for supervision. To address this problem, we introduce a simple training strategy that learns a biomarker-specific decay of sample weight over the time gap between a segment and its ground truth label and uses this weight in the loss with a regularizer to prevent trivial solutions. On smartwatch PPG from 450 participants across 10 biomarkers, the approach improves over baselines. In the subject-wise setting, the proposed approach averages 0.715 AUPRC, compared to 0.674 for a fine-tuned self-supervised baseline and 0.626 for a feature-based Random Forest. A comparison of four decay families shows that a simple linear decay function is most robust on average. Beyond accuracy, the learned decay rates summarize how quickly each biomarker's PPG evidence becomes stale, providing an interpretable view of temporal sensitivity.

LGJan 18, 2025
HOPS: High-order Polynomials with Self-supervised Dimension Reduction for Load Forecasting

Pengyang Song, Han Feng, Shreyashi Shukla et al.

Load forecasting is a fundamental task in smart grid. Many techniques have been applied to developing load forecasting models. Due to the challenges such as the Curse of Dimensionality, overfitting, and limited computing resources, multivariate higher-order polynomial models have received limited attention in load forecasting, despite their desirable mathematical foundations and optimization properties. In this paper, we propose low rank approximation and self-supervised dimension reduction to address the aforementioned issues. To further improve computational efficiency, we also utilize a fast Conjugate Gradient based algorithm for the proposed polynomial models. Based on the load datasets from the ISO New England, the proposed method high-order polynomials with self-supervised dimension reduction (HOPS) demonstrates higher forecasting accuracy over several competitive models. Additionally, experimental results indicate that our approach alleviates redundant variable construction, achieving better forecasts with fewer input variables.

MLMay 31, 2023
Fine-grained Analysis of Non-parametric Estimation for Pairwise Learning

Junyu Zhou, Shuo Huang, Han Feng et al.

In this paper, we are concerned with the generalization performance of non-parametric estimation for pairwise learning. Most of the existing work requires the hypothesis space to be convex or a VC-class, and the loss to be convex. However, these restrictive assumptions limit the applicability of the results in studying many popular methods, especially kernel methods and neural networks. We significantly relax these restrictive assumptions and establish a sharp oracle inequality of the empirical minimizer with a general hypothesis space for the Lipschitz continuous pairwise losses. As an example, we apply our general results to study pairwise least squares regression and derive an excess population risk bound that matches the minimax lower bound for the pointwise least squares regression. The key novelty lies in constructing a structured deep ReLU neural network to approximate the true predictor, and in designing a targeted hypothesis space composed of networks with this structure and controllable complexity. Experiments validate the effectiveness of the proposed method. This example demonstrates that the obtained general results indeed help us to explore the generalization performance on a variety of problems that cannot be handled by existing approaches.

SPJan 17, 2022
Convolutional Neural Networks for Spherical Signal Processing via Spherical Haar Tight Framelets

Jianfei Li, Han Feng, Xiaosheng Zhuang

In this paper, we develop a general theoretical framework for constructing Haar-type tight framelets on any compact set with a hierarchical partition. In particular, we construct a novel area-regular hierarchical partition on the 2-sphere and establish its corresponding spherical Haar tight framelets with directionality. We conclude by evaluating and illustrating the effectiveness of our area-regular spherical Haar tight framelets in several denoising experiments. Furthermore, we propose a convolutional neural network (CNN) model for spherical signal denoising which employs the fast framelet decomposition and reconstruction algorithms. Experiment results show that our proposed CNN model outperforms threshold methods, and processes strong generalization and robustness properties.

LGDec 5, 2021
Radial Basis Function Approximation with Distributively Stored Data on Spheres

Han Feng, Shao-Bo Lin, Ding-Xuan Zhou

This paper proposes a distributed weighted regularized least squares algorithm (DWRLS) based on spherical radial basis functions and spherical quadrature rules to tackle spherical data that are stored across numerous local servers and cannot be shared with each other. Via developing a novel integral operator approach, we succeed in deriving optimal approximation rates for DWRLS and theoretically demonstrate that DWRLS performs similarly as running a weighted regularized least squares algorithm with the whole data on a large enough machine. This interesting finding implies that distributed learning is capable of sufficiently exploiting potential values of distributively stored spherical data, even though every local server cannot access all the data.

CYNov 8, 2021
Identifying Best Fair Intervention

Ruijiang Gao, Han Feng

We study the problem of best arm identification with a fairness constraint in a given causal model. The goal is to find a soft intervention on a given node to maximize the outcome while meeting a fairness constraint by counterfactual estimation with only partial knowledge of the causal model. The problem is motivated by ensuring fairness on an online marketplace. We provide theoretical guarantees on the probability of error and empirically examine the effectiveness of our algorithm with a two-stage baseline.

LGJul 28, 2020
Theory of Deep Convolutional Neural Networks II: Spherical Analysis

Zhiying Fang, Han Feng, Shuo Huang et al.

Deep learning based on deep neural networks of various structures and architectures has been powerful in many practical applications, but it lacks enough theoretical verifications. In this paper, we consider a family of deep convolutional neural networks applied to approximate functions on the unit sphere $\mathbb{S}^{d-1}$ of $\mathbb{R}^d$. Our analysis presents rates of uniform approximation when the approximated function lies in the Sobolev space $W^r_\infty (\mathbb{S}^{d-1})$ with $r>0$ or takes an additive ridge form. Our work verifies theoretically the modelling and approximation ability of deep convolutional neural networks followed by downsampling and one fully connected layer or two. The key idea of our spherical analysis is to use the inner product form of the reproducing kernels of the spaces of spherical harmonics and then to apply convolutional factorizations of filters to realize the generated linear features.