Hao Jin

h-index13

12papers

242citations

Novelty49%

AI Score40

Ranked #97,413 of 205,806 authors (top 47%)#32,064 in CV (top 54%)

12 Papers

LGApr 6, 2022

Federated Reinforcement Learning with Environment Heterogeneity

Hao Jin, Yang Peng, Wenhao Yang et al.

We study a Federated Reinforcement Learning (FedRL) problem in which $n$ agents collaboratively learn a single policy without sharing the trajectories they collected during agent-environment interaction. We stress the constraint of environment heterogeneity, which means $n$ environments corresponding to these $n$ agents have different state transitions. To obtain a value function or a policy function which optimizes the overall performance in all environments, we propose two federated RL algorithms, \texttt{QAvg} and \texttt{PAvg}. We theoretically prove that these algorithms converge to suboptimal solutions, while such suboptimality depends on how heterogeneous these $n$ environments are. Moreover, we propose a heuristic that achieves personalization by embedding the $n$ environments into $n$ vectors. The personalization heuristic not only improves the training but also allows for better generalization to new environments.

25.6CVMar 15

DualTSR: Unified Dual-Diffusion Transformer for Scene Text Image Super-Resolution

Axi Niu, Kang Zhang, Qingsen Yan et al.

Scene Text Image Super-Resolution (STISR) aims to restore high-resolution details in low-resolution text images, which is crucial for both human readability and machine recognition. Existing methods, however, often depend on external Optical Character Recognition (OCR) models for textual priors or rely on complex multi-component architectures that are difficult to train and reproduce. In this paper, we introduce DualTSR, a unified end-to-end framework that addresses both issues. DualTSR employs a single multimodal transformer backbone trained with a dual diffusion objective. It simultaneously models the continuous distribution of high-resolution images via Conditional Flow Matching and the discrete distribution of textual content via discrete diffusion. This shared design enables visual and textual information to interact at every layer, allowing the model to infer text priors internally instead of relying on an external OCR module. Compared with prior multi-branch diffusion systems, DualTSR offers a simpler end-to-end formulation with fewer hand-crafted components. Experiments on synthetic Chinese benchmarks and a curated real-world evaluation protocol show that DualTSR achieves strong perceptual quality and text fidelity.

CVNov 3, 2024

RS-MoE: A Vision-Language Model with Mixture of Experts for Remote Sensing Image Captioning and Visual Question Answering

Hui Lin, Danfeng Hong, Shuhang Ge et al.

Remote Sensing Image Captioning (RSIC) presents unique challenges and plays a critical role in applications. Traditional RSIC methods often struggle to produce rich and diverse descriptions. Recently, with advancements in VLMs, efforts have emerged to integrate these models into the remote sensing domain and to introduce descriptive datasets specifically designed to enhance VLM training. This paper proposes RS-MoE, a first Mixture of Expert based VLM specifically customized for remote sensing domain. Unlike traditional MoE models, the core of RS-MoE is the MoE Block, which incorporates a novel Instruction Router and multiple lightweight Large Language Models (LLMs) as expert models. The Instruction Router is designed to generate specific prompts tailored for each corresponding LLM, guiding them to focus on distinct aspects of the RSIC task. This design not only allows each expert LLM to concentrate on a specific subset of the task, thereby enhancing the specificity and accuracy of the generated captions, but also improves the scalability of the model by facilitating parallel processing of sub-tasks. Additionally, we present a two-stage training strategy for tuning our RS-MoE model to prevent performance degradation due to sparsity. We fine-tuned our model on the RSICap dataset using our proposed training strategy. Experimental results on the RSICap dataset, along with evaluations on other traditional datasets where no additional fine-tuning was applied, demonstrate that our model achieves state-of-the-art performance in generating precise and contextually relevant captions. Notably, our RS-MoE-1B variant achieves performance comparable to 13B VLMs, demonstrating the efficiency of our model design. Moreover, our model demonstrates promising generalization capabilities by consistently achieving state-of-the-art performance on the Remote Sensing Visual Question Answering (RSVQA) task.

CVDec 1, 2024

Sketch-Guided Motion Diffusion for Stylized Cinemagraph Synthesis

Hao Jin, Hengyuan Chang, Xiaoxuan Xie et al.

Designing stylized cinemagraphs is challenging due to the difficulty in customizing complex and expressive flow motions. To achieve intuitive and detailed control of the generated cinemagraphs, freehand sketches can provide a better solution to convey personalized design requirements than only text inputs. In this paper, we propose Sketch2Cinemagraph, a sketch-guided framework that enables the conditional generation of stylized cinemagraphs from freehand sketches. Sketch2Cinemagraph adopts text prompts for initial content generation and provides hand-drawn sketch controls for both spatial and motion cues. The latent diffusion model is adopted to generate target stylized landscape images along with realistic versions. Then, a pre-trained object detection model is utilized to segment and obtain masks for the flow regions. We proposed a novel latent motion diffusion model to estimate the motion field in the fluid regions of the generated landscape images. The input motion sketches serve as the conditions to control the generated vector fields in the masked fluid regions with the prompt. To synthesize the cinemagraph frames, the pixels within fluid regions are subsequently warped to the target locations for each timestep using a frame generator. The results verified that Sketch2Cinemagraph can generate high-fidelity and aesthetically appealing stylized cinemagraphs with continuous temporal flow from intuitive sketch inputs. We showcase the advantages of Sketch2Cinemagraph through quantitative comparisons against the state-of-the-art generation approaches.

MLMay 7, 2024

Federated Control in Markov Decision Processes

Hao Jin, Yang Peng, Liangyu Zhang et al.

We study problems of federated control in Markov Decision Processes. To solve an MDP with large state space, multiple learning agents are introduced to collaboratively learn its optimal policy without communication of locally collected experience. In our settings, these agents have limited capabilities, which means they are restricted within different regions of the overall state space during the training process. In face of the difference among restricted regions, we firstly introduce concepts of leakage probabilities to understand how such heterogeneity affects the learning process, and then propose a novel communication protocol that we call Federated-Q protocol (FedQ), which periodically aggregates agents' knowledge of their restricted regions and accordingly modifies their learning problems for further training. In terms of theoretical analysis, we justify the correctness of FedQ as a communication protocol, then give a general result on sample complexity of derived algorithms FedQ-X with the RL oracle , and finally conduct a thorough study on the sample complexity of FedQ-SynQ. Specifically, FedQ-X has been shown to enjoy linear speedup in terms of sample complexity when workload is uniformly distributed among agents. Moreover, we carry out experiments in various environments to justify the efficiency of our methods.

LGMay 6, 2024

Federated Reinforcement Learning with Constraint Heterogeneity

Hao Jin, Liangyu Zhang, Zhihua Zhang

We study a Federated Reinforcement Learning (FedRL) problem with constraint heterogeneity. In our setting, we aim to solve a reinforcement learning problem with multiple constraints while $N$ training agents are located in $N$ different environments with limited access to the constraint signals and they are expected to collaboratively learn a policy satisfying all constraint signals. Such learning problems are prevalent in scenarios of Large Language Model (LLM) fine-tuning and healthcare applications. To solve the problem, we propose federated primal-dual policy optimization methods based on traditional policy gradient methods. Specifically, we introduce $N$ local Lagrange functions for agents to perform local policy updates, and these agents are then scheduled to periodically communicate on their local policies. Taking natural policy gradient (NPG) and proximal policy optimization (PPO) as policy optimization methods, we mainly focus on two instances of our algorithms, ie, {FedNPG} and {FedPPO}. We show that FedNPG achieves global convergence with an $\tilde{O}(1/\sqrt{T})$ rate, and FedPPO efficiently solves complicated learning tasks with the use of deep neural networks.

SEOct 13, 2021

Constrained Detecting Arrays: Mathematical Structures for Fault Identification in Combinatorial Interaction Testing

Hao Jin, Ce Shi, Tatsuhiro Tsuchiya

Context: Detecting arrays are mathematical structures aimed at fault identification in combinatorial interaction testing. However, they cannot be directly applied to systems that have constraints among test parameters. Such constraints are prevalent in real-world systems. Objectives: This paper proposes Constrained Detecting Arrays (CDAs), an extension of detecting arrays, which can be used for systems with constraints. Methods: The paper examines the properties and capabilities of CDAs with rigorous arguments. The paper also proposes two algorithms for constructing CDAs: One is aimed at generating minimum CDAs and the other is a heuristic algorithm aimed at fast generation of CDAs. The algorithms are evaluated through experiments using a benchmark dataset. Results: Experimental results show that the first algorithm can generate minimum CDAs if a sufficiently long generation time is allowed, and the second algorithm can generate minimum or near-minimum CDAs in a reasonable time. Conclusion: CDAs enhance detecting arrays to be applied to systems with constraints. The two proposed algorithms have different advantages with respect to the array size and generation time

LGApr 12, 2021

Meta-Regularization: An Approach to Adaptive Choice of the Learning Rate in Gradient Descent

Guangzeng Xie, Hao Jin, Dachao Lin et al.

We propose \textit{Meta-Regularization}, a novel approach for the adaptive choice of the learning rate in first-order gradient descent methods. Our approach modifies the objective function by adding a regularization term on the learning rate, and casts the joint updating process of parameters and learning rates into a maxmin problem. Given any regularization term, our approach facilitates the generation of practical algorithms. When \textit{Meta-Regularization} takes the $\varphi$-divergence as a regularizer, the resulting algorithms exhibit comparable theoretical convergence performance with other first-order gradient-based algorithms. Furthermore, we theoretically prove that some well-designed regularizers can improve the convergence performance under the strong-convexity condition of the objective function. Numerical experiments on benchmark problems demonstrate the effectiveness of algorithms derived from some common $\varphi$-divergence in full batch as well as online learning settings.

CLSep 15, 2019

Natural Language Adversarial Defense through Synonym Encoding

Xiaosen Wang, Hao Jin, Yichen Yang et al.

In the area of natural language processing, deep learning models are recently known to be vulnerable to various types of adversarial perturbations, but relatively few works are done on the defense side. Especially, there exists few effective defense method against the successful synonym substitution based attacks that preserve the syntactic structure and semantic information of the original text while fooling the deep learning models. We contribute in this direction and propose a novel adversarial defense method called Synonym Encoding Method (SEM). Specifically, SEM inserts an encoder before the input layer of the target model to map each cluster of synonyms to a unique encoding and trains the model to eliminate possible adversarial perturbations without modifying the network architecture or adding extra data. Extensive experiments demonstrate that SEM can effectively defend the current synonym substitution based attacks and block the transferability of adversarial examples. SEM is also easy and efficient to scale to large models and big datasets.

MLAug 18, 2019

Towards Better Generalization: BP-SVRG in Training Deep Neural Networks

Hao Jin, Dachao Lin, Zhihua Zhang

Stochastic variance-reduced gradient (SVRG) is a classical optimization method. Although it is theoretically proved to have better convergence performance than stochastic gradient descent (SGD), the generalization performance of SVRG remains open. In this paper we investigate the effects of some training techniques, mini-batching and learning rate decay, on the generalization performance of SVRG, and verify the generalization performance of Batch-SVRG (B-SVRG). In terms of the relationship between optimization and generalization, we believe that the average norm of gradients on each training sample as well as the norm of average gradient indicate how flat the landscape is and how well the model generalizes. Based on empirical observations of such metrics, we perform a sign switch on B-SVRG and derive a practical algorithm, BatchPlus-SVRG (BP-SVRG), which is numerically shown to enjoy better generalization performance than B-SVRG, even SGD in some scenarios of deep neural networks.

CRJan 4, 2019

Practical Verifiable In-network Filtering for DDoS defense

Deli Gong, Muoi Tran, Shweta Shinde et al.

In light of ever-increasing scale and sophistication of modern DDoS attacks, it is time to revisit in-network filtering or the idea of empowering DDoS victims to install in-network traffic filters in the upstream transit networks. Recent proposals show that filtering DDoS traffic at a handful of large transit networks can handle volumetric DDoS attacks effectively. However, the innetwork filtering primitive can also be misused. Transit networks can use the in-network filtering service as an excuse for any arbitrary packet drops made for their own benefit. For example, transit networks may intentionally execute filtering services poorly or unfairly to discriminate their competing neighbor ASes while claiming that they drop packets for the sake of DDoS defense. We argue that it is due to the lack of verifiable filtering - i.e., no one can check if a transit network executes the filter rules correctly as requested by the DDoS victims. To make in-network filtering a more robust defense primitive, in this paper, we propose a verifiable in-network filtering, called VIF, that exploits emerging hardware-based trusted execution environments (TEEs) and offers filtering verifiability to DDoS victims and neighbor ASes. Our proof of concept demonstrates that a VIF filter implementation on commodity servers with TEE support can handle traffic at line rate (e.g., 10 Gb/s) and execute up to 3,000 filter rules. We show that VIF can easily scale to handle larger traffic volume (e.g., 500 Gb/s) and more complex filtering operations (e.g., 150,000 filter rules) by parallelizing the TEE-based filters. As a practical deployment model, we suggest that Internet exchange points (IXPs) are the ideal candidates for the early adopters of our verifiable filters due to their central locations and flexible software-defined architecture.

SEDec 6, 2017

Constrained locating arrays for combinatorial interaction testing

Hao Jin, Tatsuhiro Tsuchiya

This paper introduces the notion of Constrained Locating Arrays (CLAs), mathematical objects which can be used for fault localization in software testing. CLAs extend ordinary locating arrays to make them applicable to testing of systems that have constraints on test parameters. Such constraints are common in real-world systems; thus CLA enhances the applicability of locating arrays to practical testing problems. The paper also proposes an algorithm for constructing CLAs. Experimental results show that the proposed algorithm scales to problems of practical sizes.