Hong Jiang

CV
h-index: 26
Papers: 21
Citations: 407
Novelty: 48%
AI Score: 48

21 Papers

98.4 CL May 2
The grip of grammar on meaning uncertainty: cross-linguistic evidence, neural correlates, and clinical relevance

Rui He, Claudio Palominos, Samuele Vallisa et al.

Isolated word meanings are inherently uncertain. This uncertainty reduces when they are combined and anchored in context. We propose that grammar compresses meaning uncertainty cross-linguistically, which is reflected in the brain and selectively disrupted in disorders. Compression was operationalized as the relative difference between non-contextual surprisal estimated from lexical frequency and contextual surprisal from grammar-sensitive models. In narratives from 20 languages, contextual surprisal reduced frequency-based surprisal. This reduction closely tracked the surprisal cost of reversing word order, and scaled with richer, non-redundant lexis as organized by more complex but optimal dependency structure. During fMRI, surprisal and its reduction explained BOLD activity for comprehension and production in overlapping but distinct regions. Uncertainty reduction was significantly attenuated in aphasia, dementia, and schizophrenia, but remained intact where the primary deficit is not language. These findings position uncertainty reduction via grammar as a foundational concept that illuminates principles, brain basis, and disruptions of language.
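As a rough illustration of the compression measure described in this abstract, the sketch below computes a frequency-based and a contextual surprisal for a single token and takes their relative difference. The function names, the choice to normalize by the frequency-based surprisal, and the probabilities are assumptions made for illustration; the abstract does not give the exact formula.

```python
import math


def surprisal(p: float) -> float:
    """Surprisal, in bits, of an event with probability p."""
    return -math.log2(p)


def compression(p_freq: float, p_ctx: float) -> float:
    """Relative reduction from frequency-based to contextual surprisal.

    Assumption: the reduction is normalized by the frequency-based
    surprisal; the source only says "relative difference".
    """
    s_freq = surprisal(p_freq)
    s_ctx = surprisal(p_ctx)
    return (s_freq - s_ctx) / s_freq


# Hypothetical token: unigram probability 1/1024 (10 bits),
# in-context probability 1/4 (2 bits).
print(compression(1 / 1024, 1 / 4))  # 0.8
```

Under this assumed normalization, a value of 0.8 means context removed 80% of the token's frequency-based surprisal.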

NA Feb 8, 2013
A Stochastic Conjugate Gradient Method for Approximation of Functions

Hong Jiang, Paul Wilford

A stochastic conjugate gradient method for approximation of a function is proposed. The proposed method avoids computing and storing the covariance matrix in the normal equations for the least squares solution. In addition, the method performs the conjugate gradient steps by using an inner product that is based on stochastic sampling. Theoretical analysis shows that the method is convergent in probability. The method has applications in such fields as predistortion for the linearization of power amplifiers.

97.9 CV Apr 19
UniGeo: Unifying Geometric Guidance for Camera-Controllable Image Editing via Video Models

Hong Jiang, Wensong Song, Zongxing Yang et al.

Camera-controllable image editing aims to synthesize novel views of a given scene under varying camera poses while strictly preserving cross-view geometric consistency. However, existing methods typically rely on fragmented geometric guidance, such as only injecting point clouds at the representation level despite models containing multiple levels, and are mainly based on image diffusion models that operate on discrete view mappings. These two limitations jointly lead to geometric drift and structural degradation under continuous camera motion. We observe that while leveraging video models provides continuous viewpoint priors for camera-controllable image editing, they still struggle to form stable geometric understanding if geometric guidance remains fragmented. To systematically address this, we inject unified geometric guidance across three levels that jointly determine the generative output: representation, architecture, and loss function. To this end, we propose UniGeo, a novel camera-controllable editing framework. Specifically, at the representation level, UniGeo incorporates a frame-decoupled geometric reference injection mechanism to provide robust cross-view geometry context. At the architecture level, it introduces geometric anchor attention to align multi-view features. At the loss function level, it proposes a trajectory-endpoint geometric supervision strategy to explicitly reinforce the structural fidelity of target views. Comprehensive experiments across multiple public benchmarks, encompassing both extensive and limited camera motion settings, demonstrate that UniGeo significantly outperforms existing methods in both visual quality and geometric consistency.

28.0 DC Mar 17
ODIN-Based CPU-GPU Architecture with Replay-Driven Simulation and Emulation

Nij Dorairaj, Debabrata Chatterjee, Hong Wang et al.

Integration of CPU and GPU technologies is a key enabler for modern AI and graphics workloads, combining control-oriented processing with massive parallel compute capability. As systems evolve toward chiplet-based architectures, pre-silicon validation of tightly coupled CPU-GPU subsystems becomes increasingly challenging due to complex validation framework setup, large design scale, high concurrency, non-deterministic execution, and intricate protocol interactions at chiplet boundaries, often resulting in long integration cycles. This paper presents a replay-driven validation methodology developed during the integration of a CPU subsystem, multiple Xe GPU cores, and a configurable Network-on-Chip (NoC) within a foundational SoC building block targeting the ODIN integrated chiplet architecture. By leveraging deterministic waveform capture and replay across both simulation and emulation using a single design database, complex GPU workloads and protocol sequences can be reproduced reliably at the system level. This approach significantly accelerates debug, improves integration confidence, and enables end-to-end system boot and workload execution within a single quarter, demonstrating the effectiveness of replay-based validation as a scalable methodology for chiplet-based systems.

CV Apr 21, 2025
Insert Anything: Image Insertion via In-Context Editing in DiT

Wensong Song, Hong Jiang, Zongxing Yang et al.

This work presents Insert Anything, a unified framework for reference-based image insertion that seamlessly integrates objects from reference images into target scenes under flexible, user-specified control guidance. Instead of training separate models for individual tasks, our approach is trained once on our new AnyInsertion dataset--comprising 120K prompt-image pairs covering diverse tasks such as person, object, and garment insertion--and effortlessly generalizes to a wide range of insertion scenarios. Such a challenging setting requires capturing both identity features and fine-grained details, while allowing versatile local adaptations in style, color, and texture. To this end, we propose to leverage the multimodal attention of the Diffusion Transformer (DiT) to support both mask- and text-guided editing. Furthermore, we introduce an in-context editing mechanism that treats the reference image as contextual information, employing two prompting strategies to harmonize the inserted elements with the target scene while faithfully preserving their distinctive features. Extensive experiments on AnyInsertion, DreamBooth, and VTON-HD benchmarks demonstrate that our method consistently outperforms existing alternatives, underscoring its great potential in real-world applications such as creative content generation, virtual try-on, and scene composition.

LG Mar 31, 2025
ElaLoRA: Elastic & Learnable Low-Rank Adaptation for Efficient Model Fine-Tuning

Huandong Chang, Zicheng Ma, Mingyuan Ma et al.

Low-Rank Adaptation (LoRA) has become a widely adopted technique for fine-tuning large-scale pre-trained models with minimal parameter updates. However, existing methods rely on fixed ranks or focus solely on either rank pruning or expansion, failing to adapt ranks dynamically to match the importance of different layers during training. In this work, we propose ElaLoRA, an adaptive low-rank adaptation framework that dynamically prunes and expands ranks based on gradient-derived importance scores. To the best of our knowledge, ElaLoRA is the first method that enables both rank pruning and expansion during fine-tuning. Experiments across multiple benchmarks demonstrate that ElaLoRA consistently outperforms existing PEFT methods across different parameter budgets. Furthermore, our studies validate that layers receiving higher rank allocations contribute more significantly to model performance, providing theoretical justification for our adaptive strategy. By introducing a principled and adaptive rank allocation mechanism, ElaLoRA offers a scalable and efficient fine-tuning solution, particularly suited for resource-constrained environments.
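To make the notion of a per-layer rank budget concrete, the sketch below counts trainable parameters for a standard LoRA update, where a weight matrix of shape d_out x d_in is adapted by the product of a d_out x r matrix and an r x d_in matrix. It does not reproduce ElaLoRA's importance scores or its pruning and expansion schedule, which the abstract does not specify; the layer names, the dimension 768, and the rank allocation are hypothetical.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Parameters in a dense d_out x d_in weight matrix."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters of a standard LoRA adapter:
    B is d_out x rank and A is rank x d_in."""
    return rank * (d_out + d_in)


# Hypothetical non-uniform rank allocation across three square layers.
d = 768
ranks = {"layer_0": 2, "layer_1": 8, "layer_2": 4}
total = sum(lora_params(d, d, r) for r in ranks.values())

print(full_params(d, d))  # 589824 parameters in one dense layer
print(total)              # 21504 adapter parameters across all three layers
```

Because the adapter cost is linear in the rank, moving rank from one layer to another keeps the total parameter budget fixed, which is the setting in which adaptive rank allocation operates.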

LG Oct 4, 2019
Lipschitz Learning for Signal Recovery

Hong Jiang, Jong-Hoon Ahn, Xiaoyang Wang

We consider the recovery of signals from their observations, which are samples of a transform of the signals rather than the signals themselves, by using machine learning (ML). We will develop a theoretical framework to characterize the signals that can be robustly recovered from their observations by an ML algorithm, and establish a Lipschitz condition on signals and observations that is both necessary and sufficient for the existence of a robust recovery. We will compare the Lipschitz condition with the well-known restricted isometry property of the sparse recovery of compressive sensing, and show the former is more general and less restrictive. For linear observations, our work also suggests an ML method in which the output space is reduced to the lowest possible dimension.

CV Jan 19, 2017
Block-wise Lensless Compressive Camera

Xin Yuan, Gang Huang, Hong Jiang et al.

The existing lensless compressive camera ($\text{L}^2\text{C}^2$) \cite{Huang13ICIP} suffers from low capture rates, resulting in low-resolution images when acquired over a short time. In this work, we propose a new regime to mitigate these drawbacks. We replace the global compressive sensing used in the existing $\text{L}^2\text{C}^2$ with local block (patch) based compressive sensing. We use a single sensor for each block, rather than for the entire image, thus forming an $\text{L}^2\text{C}^2$ with multiple, spatially parallel sensors. This new camera retains the advantages of the existing $\text{L}^2\text{C}^2$ while leading to the following additional benefits: 1) Since each block can be very small, e.g., $8\times 8$ pixels, we only need to capture $\sim 10$ measurements to achieve reasonable reconstruction. Therefore the capture time can be reduced significantly. 2) The coding patterns used in each block can be the same, so the sensing matrix is only of the block size, compared to the entire image size in the existing $\text{L}^2\text{C}^2$. This reduces the memory required for the sensing matrix and speeds up the reconstruction. 3) Patch-based image reconstruction is fast, and since real-time stitching algorithms exist, we can perform real-time reconstruction. 4) These small blocks can be integrated in any desired number, leading to ultra-high-resolution images while retaining a fast capture rate and fast reconstruction. We develop multiple geometries of this block-wise $\text{L}^2\text{C}^2$ in this paper. We have built prototypes of the proposed block-wise $\text{L}^2\text{C}^2$ and demonstrated excellent results on real data.
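The memory argument in benefit 2 can be checked with simple arithmetic. The sketch below compares the number of entries in a sensing matrix shared by all 8 x 8 blocks with that of a global sensing matrix covering the whole image at the same measurement ratio. The block size and the figure of 10 measurements per block come from the abstract; the 512 x 512 image size and the function name are assumptions for illustration.

```python
def sensing_matrix_entries(num_measurements: int, num_pixels: int) -> int:
    """Entries in a dense num_measurements x num_pixels sensing matrix."""
    return num_measurements * num_pixels


block_pixels = 8 * 8          # 64 pixels per block (from the abstract)
meas_per_block = 10           # about 10 measurements per block (from the abstract)

image_pixels = 512 * 512      # assumed image size, for illustration only
num_blocks = image_pixels // block_pixels
global_meas = num_blocks * meas_per_block  # same measurement ratio, global sensing

block_entries = sensing_matrix_entries(meas_per_block, block_pixels)
global_entries = sensing_matrix_entries(global_meas, image_pixels)

print(block_entries)                    # 640
print(global_entries)                   # 10737418240
print(global_entries // block_entries)  # 16777216
```

Under these assumptions the shared block-sized matrix is smaller than the global one by a factor of 2^24, since the same 10 x 64 pattern is reused for every block.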

CV Feb 18, 2016
Multi-resolution Compressive Sensing Reconstruction

Adriana Gonzalez, Hong Jiang, Gang Huang et al.

We consider the problem of reconstructing an image from compressive measurements using a multi-resolution grid. In this context, the reconstructed image is divided into multiple regions, each one with a different resolution. This problem arises in situations where the image to reconstruct contains a certain region of interest (RoI) that is more important than the rest. Through a theoretical analysis and simulation experiments we show that the multi-resolution reconstruction provides a higher quality of the RoI compared to the traditional single-resolution approach.

ML Aug 27, 2015
Compressive Sensing via Low-Rank Gaussian Mixture Models

Xin Yuan, Hong Jiang, Gang Huang et al.

We develop a new compressive sensing (CS) inversion algorithm by utilizing the Gaussian mixture model (GMM). While the compressive sensing is performed globally on the entire image as implemented in our lensless camera, a low-rank GMM is imposed on the local image patches. This low-rank GMM is derived via eigenvalue thresholding of the GMM trained on the projection of the measurement data, and is thus learned in situ. The GMM and the projection of the measurement data are updated iteratively during the reconstruction. Our GMM algorithm reduces to the piecewise linear estimator (PLE) if each patch is represented by a single Gaussian model. Inspired by this, a low-rank PLE algorithm is also developed for CS inversion, constituting an additional contribution of this paper. Extensive results on both simulation data and real data captured by the lensless camera demonstrate the efficacy of the proposed algorithm. Furthermore, we compare the CS reconstruction results of our algorithm with JPEG compression. Simulation results demonstrate that when limited bandwidth is available (a small number of measurements), our algorithm can achieve results comparable to JPEG.

CV Aug 14, 2015
Lensless Compressive Imaging

Xin Yuan, Hong Jiang, Gang Huang et al.

We develop a lensless compressive imaging architecture, which consists of an aperture assembly and a single sensor, without using any lens. An anytime algorithm is proposed to reconstruct images from the compressive measurements; the algorithm produces a sequence of solutions that monotonically converge to the true signal (thus, anytime). The algorithm is developed based on the sparsity of local overlapping patches (in the transformation domain) and state-of-the-art results have been obtained. Experiments on real data demonstrate that encouraging results are obtained when the number of compressive measurements is about 10% of the number of image pixels. The reconstruction results of the proposed algorithm are compared with JPEG compression (based on file sizes), and the reconstructed image quality is close to that of JPEG compression, in particular at a high compression rate.

NA Aug 30, 2015
Constrained and Preconditioned Stochastic Gradient Method

Hong Jiang, Gang Huang, Paul Wilford et al.

We consider stochastic approximations which arise from such applications as data communications and image processing. We demonstrate why constraints are needed in a stochastic approximation and how a constrained approximation can be incorporated into a preconditioning technique to derive the preconditioned stochastic gradient method (PSGM). We perform convergence analysis to show that the PSGM converges to the theoretical best approximation under some simple assumptions on the preconditioner and on the independence of samples drawn from a stochastic process. Simulation results are presented to demonstrate the effectiveness of the constrained and preconditioned stochastic gradient method.

CV Feb 12, 2014
Noise Analysis for Lensless Compressive Imaging

Hong Jiang, Gang Huang, Paul Wilford

We analyze the signal to noise ratio (SNR) in a recently proposed lensless compressive imaging architecture. The architecture consists of a sensor of a single detector element and an aperture assembly of an array of aperture elements, each of which has a programmable transmittance. This lensless compressive imaging architecture can be used in conjunction with compressive sensing to capture images in a compressed form of compressive measurements. In this paper, we perform noise analysis of this lensless compressive imaging architecture and compare it with pinhole aperture imaging and lens aperture imaging. We will show that the SNR in the lensless compressive imaging is independent of the image resolution, while that in either pinhole aperture imaging or lens aperture imaging decreases as the image resolution increases. Consequently, the SNR in the lensless compressive imaging can be much higher if the image resolution is large enough.

CV Feb 4, 2014
Signal to Noise Ratio in Lensless Compressive Imaging

Hong Jiang, Gang Huang, Paul Wilford

We analyze the signal to noise ratio (SNR) in a lensless compressive imaging (LCI) architecture. The architecture consists of a sensor of a single detecting element and an aperture assembly of an array of programmable elements. LCI can be used in conjunction with compressive sensing to capture images in a compressed form of compressive measurements. In this paper, we perform SNR analysis of the LCI and compare it with imaging with a pinhole or a lens. We will show that the SNR in the LCI is independent of the image resolution, while the SNR in either pinhole aperture imaging or lens aperture imaging decreases as the image resolution increases. Consequently, the SNR in the LCI is much higher if the image resolution is large enough.

IT Jun 17, 2013
Multi-view in Lensless Compressive Imaging

Hong Jiang, Gang Huang, Paul Wilford

Multi-view images are acquired by a lensless compressive imaging architecture, which consists of an aperture assembly and multiple sensors. The aperture assembly consists of a two-dimensional array of aperture elements whose transmittance can be individually controlled to implement a compressive sensing matrix. For each transmittance pattern of the aperture assembly, each of the sensors takes a measurement. The measurement vectors from the multiple sensors represent multi-view images of the same scene. We present a theoretical framework for multi-view reconstruction and experimental results on enhancing image quality using multi-view.

CV May 30, 2013
Lensless Imaging by Compressive Sensing

Gang Huang, Hong Jiang, Kim Matthews et al.

In this paper, we propose a lensless compressive imaging architecture. The architecture consists of two components, an aperture assembly and a sensor. No lens is used. The aperture assembly consists of a two-dimensional array of aperture elements. The transmittance of each aperture element is independently controllable. The sensor is a single detection element. A compressive sensing matrix is implemented by adjusting the transmittance of the individual aperture elements according to the values of the sensing matrix. The proposed architecture is simple and reliable because no lens is used. The architecture can be used for capturing images in the visible spectrum and other spectra, such as infrared or millimeter waves, in surveillance applications for detecting anomalies or extracting features such as the speed of moving objects. Multiple sensors may be used with a single aperture assembly to capture multi-view images simultaneously. A prototype was built by using an LCD panel and a photoelectric sensor for capturing images in the visible spectrum.

SD Feb 28, 2013
Sound localization using compressive sensing

Hong Jiang, Boyd Mathews, Paul Wilford

In a sensor network with remote sensor devices, it is important to have a method that can accurately localize a sound event with a small amount of data transmitted from the sensors. In this paper, we propose a novel method for localization of a sound source using compressive sensing. Instead of sampling a large amount of data at the Nyquist sampling rate in the time domain, the acoustic sensors take compressive measurements integrated in time. The compressive measurements can be used to accurately compute the location of a sound source.

MM Feb 8, 2013
A new compressive video sensing framework for mobile broadcast

Chengbo Li, Hong Jiang, Paul Wilford et al.

A new video coding method based on compressive sampling is proposed. In this method, a video is coded using compressive measurements on video cubes. Video reconstruction is performed by minimization of total variation (TV) of the pixelwise DCT coefficients along the temporal direction. A new reconstruction algorithm is developed from TVAL3, an efficient TV minimization algorithm based on the alternating minimization and augmented Lagrangian methods. Video coding with this method is inherently scalable, and has applications in mobile broadcast.

CV Feb 8, 2013
Surveillance Video Processing Using Compressive Sensing

Hong Jiang, Wei Deng, Zuowei Shen

A compressive sensing method combined with decomposition of a matrix formed with image frames of a surveillance video into low rank and sparse matrices is proposed to segment the background and extract moving objects in a surveillance video. The video is acquired by compressive measurements, and the measurements are used to reconstruct the video by a low rank and sparse decomposition of the matrix. The low rank component represents the background, and the sparse component is used to identify moving objects in the surveillance video. The decomposition is performed by an augmented Lagrangian alternating direction method. Experiments are carried out to demonstrate that moving objects can be reliably extracted with a small number of measurements.

CV Feb 7, 2013
Lensless Compressive Sensing Imaging

Gang Huang, Hong Jiang, Kim Matthews et al.

In this paper, we propose a lensless compressive sensing imaging architecture. The architecture consists of two components, an aperture assembly and a sensor. No lens is used. The aperture assembly consists of a two-dimensional array of aperture elements. The transmittance of each aperture element is independently controllable. The sensor is a single detection element, such as a single photo-conductive cell. Each aperture element together with the sensor defines a cone of a bundle of rays, and the cones of the aperture assembly define the pixels of an image. Each pixel value of an image is the integration of the bundle of rays in a cone. The sensor is used for taking compressive measurements. Each measurement is the integration of rays in the cones modulated by the transmittance of the aperture elements. A compressive sensing matrix is implemented by adjusting the transmittance of the individual aperture elements according to the values of the sensing matrix. The proposed architecture is simple and reliable because no lens is used. Furthermore, the sharpness of an image from our device is limited only by the resolution of the aperture assembly, and is not affected by blurring due to defocus. The architecture can be used for capturing images in visible light and in other spectra, such as infrared or millimeter waves. Such devices may be used in surveillance applications for detecting anomalies or extracting features such as the speed of moving objects. Multiple sensors may be used with a single aperture assembly to capture multi-view images simultaneously. A prototype was built by using an LCD panel and a photoelectric sensor for capturing images in the visible spectrum.
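The measurement process described in this abstract amounts to one weighted sum per transmittance pattern: each row of the sensing matrix sets the transmittance of the aperture elements, and the single sensor records the sum of pixel values weighted by that row. The sketch below is a minimal illustration under that reading; the function name, the 2 x 2 scene, the integer pixel values, and the binary patterns are all hypothetical, and no reconstruction step is shown.

```python
def measure(transmittance, pixels):
    """One compressive measurement: pixel values summed after being
    weighted by the transmittance of the corresponding aperture element."""
    return sum(t * p for t, p in zip(transmittance, pixels))


# Hypothetical 2 x 2 scene, flattened to four pixel values.
pixels = [2, 9, 4, 7]

# Hypothetical sensing matrix: three binary transmittance patterns
# (1 = element open, 0 = element closed), one row per measurement.
sensing_matrix = [
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 1],
]

measurements = [measure(row, pixels) for row in sensing_matrix]
print(measurements)  # [6, 13, 18]
```

Three measurements are taken for four pixels here, so the scene is captured in compressed form; recovering the pixels from the measurements requires a compressive sensing reconstruction algorithm.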

IT Feb 6, 2013
Adaptive low rank and sparse decomposition of video using compressive sensing

Fei Yang, Hong Jiang, Zuowei Shen et al.

We address the problem of reconstructing and analyzing surveillance videos using compressive sensing. We develop a new method that performs video reconstruction by low rank and sparse decomposition adaptively. Background subtraction becomes part of the reconstruction. In our method, a background model is used in which the background is learned adaptively as the compressive measurements are processed. The adaptive method has low latency, and is more robust than previous methods. We will present experimental results to demonstrate the advantages of the proposed method.