NA Mar 6, 2011
Strong Predictor-Corrector Euler-Maruyama Methods for Stochastic Differential Equations with Markovian Switching
Jun Ye, Haibo Li, Lili Xiao
In this paper, numerical methods for solving stochastic differential equations with Markovian switching (SDEwMSs) are developed by pathwise approximation. The proposed family of strong predictor-corrector Euler-Maruyama methods is designed to overcome the propagation of errors during the simulation of an approximate path. This paper not only shows the strong convergence of the numerical solution to the exact solution but also reveals the order of the error under some conditions on the coefficient functions. A natural analogue of the $p$-stability criterion is studied. Numerical examples are given to illustrate the computational efficiency of the new predictor-corrector Euler-Maruyama approximation.
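For orientation, here is a minimal sketch of a plain (explicit) Euler-Maruyama step for a scalar SDE with Markovian switching. It is not the predictor-corrector family proposed in the paper. The linear test equation $dX = \mu_r X\,dt + \sigma_r X\,dW$, the function name `euler_maruyama_switching`, and the first-order regime transition rule $I + Q\,\Delta t$ are all assumptions chosen for illustration; the step size must be small enough that the diagonal entries of $I + Q\,\Delta t$ stay non-negative.

```python
import math
import random


def euler_maruyama_switching(x0, r0, mu, sigma, q, t_end, n_steps, rng):
    """Simulate one path of dX = mu[r] X dt + sigma[r] X dW, where r is a
    continuous-time Markov chain with generator matrix q (rows sum to zero)."""
    dt = t_end / n_steps
    x, r = x0, r0
    xs, rs = [x], [r]
    for _ in range(n_steps):
        # Brownian increment over one step: N(0, dt).
        dw = rng.gauss(0.0, math.sqrt(dt))
        # Euler-Maruyama update using the coefficients of the current regime.
        x = x + mu[r] * x * dt + sigma[r] * x * dw
        # Regime update from the first-order transition matrix I + q*dt.
        u, cum, new_r = rng.random(), 0.0, r
        for j in range(len(q)):
            cum += q[r][j] * dt + (1.0 if j == r else 0.0)
            if u < cum:
                new_r = j
                break
        r = new_r
        xs.append(x)
        rs.append(r)
    return xs, rs
```

Calling it with, for example, `q = [[-1.0, 1.0], [2.0, -2.0]]` and a seeded `random.Random` returns the simulated path together with the regime visited at each grid point.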
CV Nov 6, 2025
DINOv2 Driven Gait Representation Learning for Video-Based Visible-Infrared Person Re-identification
Yujie Yang, Shuang Li, Jun Ye et al.
Video-based Visible-Infrared person re-identification (VVI-ReID) aims to retrieve the same pedestrian across visible and infrared modalities from video sequences. Existing methods tend to exploit modality-invariant visual features but largely overlook gait features, which are not only modality-invariant but also rich in temporal dynamics, thus limiting their ability to model the spatiotemporal consistency essential for cross-modal video matching. To address these challenges, we propose a DINOv2-Driven Gait Representation Learning (DinoGRL) framework that leverages the rich visual priors of DINOv2 to learn gait features complementary to appearance cues, facilitating robust sequence-level representations for cross-modal retrieval. Specifically, we introduce a Semantic-Aware Silhouette and Gait Learning (SASGL) model, which generates and enhances silhouette representations with general-purpose semantic priors from DINOv2 and jointly optimizes them with the ReID objective to achieve semantically enriched and task-adaptive gait feature learning. Furthermore, we develop a Progressive Bidirectional Multi-Granularity Enhancement (PBMGE) module, which progressively refines feature representations by enabling bidirectional interactions between gait and appearance streams across multiple spatial granularities, fully leveraging their complementarity to enhance global representations with rich local details and produce highly discriminative features. Extensive experiments on HITSZ-VCM and BUPT datasets demonstrate the superiority of our approach, significantly outperforming existing state-of-the-art methods.
RO May 9
ECHO: Continuous Hierarchical Memory for Vision-Language-Action Models
Yanbin Hu, Jin Cui, Jiayi Lu et al.
Memory capacity is a critical factor determining the performance of Vision-Language-Action (VLA) models in long-horizon manipulation tasks. Existing memory-augmented architectures primarily rely on linear or flat storage, lacking structural priors for manipulation categories and hierarchical organization. This deficiency hinders efficient experience retrieval and limits generalization to unseen long-horizon task compositions. Inspired by the hierarchical organization of human experience, we propose ECHO (Experience Consolidation and Hierarchical Organization), a novel memory framework operating within a Continuous Hierarchical Space. By employing a hyperbolic autoencoder, ECHO maps VLA hidden states into this space. Leveraging hyperbolic metrics and entailment constraint mechanisms, experience vectors are organized into a semantic memory tree that supports efficient top-down retrieval. In parallel, a background consolidation mechanism continuously refines the memory tree through geometric interpolation and structural splitting, supporting virtual memory synthesis in the continuous space. We integrate ECHO into the $π_0$ foundation model. Evaluations on LIBERO and preliminary real-world experiments demonstrate the effectiveness of our approach, notably achieving a 12.8% absolute improvement in execution success rate over the $π_0$ baseline on LIBERO-Long, while improving compositional generalization on cross-suite unseen long-horizon tasks.
LG Oct 31, 2017 Code
Accelerate RNN-based Training with Importance Sampling
Fei Wang, Xiaofeng Gao, Guihai Chen et al.
Importance sampling (IS), an elegant and efficient variance reduction (VR) technique for accelerating stochastic optimization, has attracted considerable research attention recently. Unlike the uniform sampling commonly adopted in stochastic optimization, IS-integrated algorithms sample training data at each iteration according to a weighted sampling probability distribution $P$, which is constructed from precomputed importance factors. Previous experimental results show that IS has achieved remarkable progress in accelerating training convergence. Unfortunately, the calculation of the sampling probability distribution $P$ imposes a major limitation on IS: it requires the input data to be well-structured, i.e., the feature vector must be properly defined. Consequently, recurrent neural networks (RNNs), a popular class of learning algorithms, cannot enjoy the benefits of IS because their raw input data, i.e., the training sequences, are often unstructured, which makes the calculation of $P$ impossible. Considering the popularity of RNN-based learning applications and their relatively long training time, we are interested in accelerating them through IS. This paper proposes a novel Fast-Importance-Mining algorithm to calculate the importance factors for unstructured data, which makes the application of IS in RNN-based applications possible. Our experimental evaluation on popular open-source RNN-based learning applications validates the effectiveness of IS in improving the convergence rate of RNNs.
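The generic idea of sampling from a weighted distribution $P$ can be sketched as follows. This is not the paper's Fast-Importance-Mining algorithm: the importance factors are simply given as input, and the helper names `sampling_distribution` and `is_estimate` are hypothetical. Each drawn item is reweighted by $1/(N p_i)$ so that the estimate of the mean stays unbiased.

```python
import random


def sampling_distribution(importance):
    """Normalize non-negative importance factors into probabilities P."""
    total = sum(importance)
    return [w / total for w in importance]


def is_estimate(values, probs, n_draws, rng):
    """Estimate the mean of `values` by drawing indices from `probs` and
    reweighting each draw by 1 / (N * p_i) to keep the estimate unbiased."""
    n = len(values)
    idx = rng.choices(range(n), weights=probs, k=n_draws)
    return sum(values[i] / (n * probs[i]) for i in idx) / n_draws
```

When the probabilities are proportional to the (positive) values themselves, every reweighted draw equals the true mean, which is the zero-variance limit that motivates choosing $P$ carefully.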
NA Mar 10, 2011
Numerical Solutions of Jump Diffusions with Markovian Switching
Jun Ye, Kai Li
In this paper we consider the numerical solutions for a class of jump diffusions with Markovian switching. After briefly reviewing the necessary notions, a new, efficient jump-adapted algorithm based on the Euler scheme is constructed for approximating the exact solution. Under some general conditions, it is proved that the numerical solution obtained through this scheme converges to the exact solution. Moreover, the order of the error between the numerical solution and the exact solution is also derived. Numerical experiments are carried out to show the computational efficiency of the approximation.
CV Dec 1, 2024
Enhancing Skin Lesion Classification Generalization with Active Domain Adaptation
Jun Ye
We propose a method to improve the generalization of skin lesion classification models by combining self-supervised learning (SSL) and active domain adaptation (ADA). The main steps of the approach include selection of an SSL pre-trained model on natural image datasets, subsequent SSL retraining on all available skin-lesion datasets, fine-tuning of the model on source domain data with labels, and application of ADA methods on target domain data. The efficacy of the proposed approach is assessed in ten skin lesion datasets with five different ADA methods, demonstrating its potential to improve generalization in settings with different amounts of domain shifts.
MED-PH Feb 4, 2022
Breath analysis by ultra-sensitive broadband laser spectroscopy detects SARS-CoV-2 infection
Qizhong Liang, Ya-Chu Chan, Jutta Toscano et al.
Rapid testing is essential to fighting pandemics such as COVID-19, the disease caused by the SARS-CoV-2 virus. Exhaled human breath contains multiple volatile molecules providing powerful potential for non-invasive diagnosis of diverse medical conditions. We investigated breath detection of SARS-CoV-2 infection using cavity-enhanced direct frequency comb spectroscopy (CE-DFCS), a state-of-the-art laser spectroscopic technique capable of a real-time massive collection of broadband molecular absorption features at ro-vibrational quantum state resolution and at parts-per-trillion volume detection sensitivity. Using a total of 170 individual breath samples (83 positive and 87 negative with SARS-CoV-2 based on Reverse Transcription Polymerase Chain Reaction tests), we report excellent discrimination capability for SARS-CoV-2 infection with an area under the Receiver-Operating-Characteristics curve of 0.849(4). Our results support the development of CE-DFCS as an alternative, rapid, non-invasive test for COVID-19 and highlight its remarkable potential for optical diagnoses of diverse biological conditions and disease states.
SOFT Nov 21, 2018
Machine learning enables polymer cloud-point engineering via inverse design
Jatin N. Kumar, Qianxiao Li, Karen Y. T. Tang et al.
Inverse design is an outstanding challenge in disordered systems with multiple length scales such as polymers, particularly when designing polymers with desired phase behavior. We demonstrate high-accuracy tuning of poly(2-oxazoline) cloud point via machine learning. With a design space of four repeating units and a range of molecular masses, we achieve an accuracy of 4 °C root mean squared error (RMSE) in a temperature range of 24-90 °C, employing gradient boosting with decision trees. The RMSE is >3x better than linear and polynomial regression. We perform inverse design via particle-swarm optimization, predicting and synthesizing 17 polymers with constrained design at 4 target cloud points from 37 to 80 °C. Our approach challenges the status quo in polymer design with a machine learning algorithm that is capable of fast and systematic discovery of new polymers.
CV Jun 6, 2015
First-Take-All: Temporal Order-Preserving Hashing for 3D Action Videos
Jun Ye, Hao Hu, Kai Li et al.
With the prevalence of commodity depth cameras, the new paradigm of user interfaces based on 3D motion capture and recognition has dramatically changed the way humans interact with computers. Human action recognition, as one of the key components in these devices, plays an important role in guaranteeing the quality of user experience. Although model-driven methods have achieved huge success, they cannot provide a scalable solution for efficiently storing, retrieving and recognizing actions in large-scale applications. These models are also vulnerable to temporal translation and warping, as well as to variations in motion scales and execution rates. To address these challenges, we propose to treat 3D human action recognition as a video-level hashing problem and propose a novel First-Take-All (FTA) Hashing algorithm capable of hashing the entire video into hash codes of fixed length. We demonstrate that this FTA algorithm produces a compact representation of the video that is invariant to the above-mentioned variations, through which action recognition can be solved by an efficient nearest neighbor search using the Hamming distance between the FTA hash codes. Experiments on public 3D human action datasets show that the FTA algorithm can reach a recognition accuracy higher than 80%, with about 15 bits per frame, considering that there are 65 frames per video across the datasets.
LG Mar 19, 2015
Rank Subspace Learning for Compact Hash Codes
Kai Li, Guojun Qi, Jun Ye et al.
The era of Big Data has spawned unprecedented interest in developing hashing algorithms for efficient storage and fast nearest neighbor search. Most existing works learn hash functions that are numeric quantizations of feature values in a projected feature space. In this work, we propose a novel hash learning framework that encodes the rank orders of features instead of their numeric values in a number of optimal low-dimensional ranking subspaces. We formulate the ranking subspace learning problem as the optimization of a piece-wise linear convex-concave function and present two versions of our algorithm: one with independent optimization of each hash bit and the other exploiting a sequential learning framework. Our work is a generalization of the Winner-Take-All (WTA) hash family and naturally enjoys all the numeric stability benefits of rank correlation measures while being optimized to achieve high precision at very short code length. We compare with several state-of-the-art hashing algorithms in both supervised and unsupervised domains, showing superior performance on a number of data sets.
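As background for the Winner-Take-All (WTA) family that this work generalizes, here is a minimal sketch of the basic WTA hash with random (not learned) index subsets. It does not implement the rank subspace learning described in the abstract. The helper names `make_permutations` and `wta_hash` and the parameter `window` are hypothetical: each code is the position of the largest entry among the first `window` elements of a random permutation of the feature vector.

```python
import random


def make_permutations(dim, n_codes, window, rng):
    """Draw, for each code, the first `window` indices of a random permutation."""
    return [rng.sample(range(dim), window) for _ in range(n_codes)]


def wta_hash(x, perms):
    """Return one code per permutation: the position (within the window)
    of the largest selected feature. Only rank order matters."""
    codes = []
    for perm in perms:
        best = max(range(len(perm)), key=lambda j: x[perm[j]])
        codes.append(best)
    return codes
```

Because only the rank order inside each window is used, any strictly increasing transformation of the feature values leaves the codes unchanged, which illustrates the numeric stability of rank-based measures mentioned in the abstract.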