SY · Jun 6, 2018
PID2018 Benchmark Challenge: Model Predictive Control With Conditional Integral Control Using A General Purpose Optimal Control Problem Solver - RIOTS
Sina Dehghan, Tiebiao Zhao, Yang Zhao et al.
This paper presents a multi-variable Model Predictive Control (MPC) based controller for the one-staged refrigeration cycle model described in the PID2018 Benchmark Challenge. This model represents a two-input, two-output system with strong nonlinearities and high coupling between its variables. A general purpose optimal control problem (OCP) solver Matlab toolbox called RIOTS is used as the OCP solver for the proposed MPC scheme which allows for straightforward implementation of the method and for solving a wide range of constrained linear and nonlinear optimal control problems. A conditional integral (CI) compensator is embedded in the controller to compensate for the small steady state errors. This method shows significant improvements in performance compared to both discrete decentralized control (C1) and multi-variable PID controller (C2) originally given in PID2018 Benchmark Challenge as a baseline. Our solution is introduced in detail in this paper and our final results using the overall relative index, $J$, are 0.2 over C1 and 0.3 over C2, respectively. In other words, we achieved 80% improvement over C1 and 70% improvement over C2. We expect to achieve further improvements when some optimized searching efforts are used for MPC and CI parameter tuning.
RO · Aug 30, 2024
Efficient Camera Exposure Control for Visual Odometry via Deep Reinforcement Learning
Shuyang Zhang, Jinhao He, Yilong Zhu et al.
The stability of visual odometry (VO) systems is undermined by degraded image quality, especially in environments with significant illumination changes. This study employs a deep reinforcement learning (DRL) framework to train agents for exposure control, aiming to enhance imaging performance in challenging conditions. A lightweight image simulator is developed to facilitate the training process, enabling the diversification of image exposure and sequence trajectory. This setup enables completely offline training, eliminating the need for direct interaction with camera hardware and the real environments. Different levels of reward functions are crafted to enhance the VO systems, equipping the DRL agents with varying intelligence. Extensive experiments have shown that our exposure control agents achieve superior efficiency, with an average inference duration of 1.58 ms per frame on a CPU, and respond more quickly than traditional feedback control schemes. By choosing an appropriate reward function, agents acquire an intelligent understanding of motion trends and anticipate future illumination changes. This predictive capability allows VO systems to deliver more stable and precise odometry results. The codes and datasets are available at https://github.com/ShuyangUni/drl_exposure_ctrl.
SY · May 31, 2018
PID2018 Benchmark Challenge: Model-based Feedforward Compensator with A Conditional Integrator
Jie Yuan, Abdullah Ates, Sina Dehghan et al.
Since proportional-integral-derivative (PID) controllers absolutely dominate the control engineering, numbers of different control structures and theories have been developed to enhance the efficiency of PID controllers. Thus, it is essential and inspiring to operate different PID control strategies to the PID2018 Benchmark Challenge. In this paper, a novel control strategy is designed for this refrigeration system, where a feedforward compensator and a conditional integrator are utilized to compensate the disturbances and remove the steady-state error in the benchmark problem, respectively. The simulation results given in the benchmark problem show the straightforward effectiveness of the proposed control structure compared with the existing control methods.
SY · Jun 4, 2018
PID2018 Benchmark Challenge: Multi-Objective Stochastic Optimization Algorithm
Abdullah Ates, Jie Yuan, Sina Dehghan et al.
This paper presents a multi-objective stochastic optimization method for tuning of the controller parameters of Refrigeration Systems based on Vapour Compression. Stochastic Multi Parameter Divergence Optimization (SMDO) algorithm is modified for minimization of the Multi Objective function for optimization process. System control performance is improved by tuning of the PI controller parameters according to discrete time model of the refrigeration system with multi objective function by adding conditional integral structure that is preferred to reduce the steady state error of the system. Simulations are compared with existing results via many graphical and numerical solutions.
SY · May 30, 2018
PID2018 Benchmark Challenge: learning feedforward control
Yang Zhao, Sina Dehghan, Abdullah Ates et al.
The design and application of learning feedforward controllers (LFFC) for the one-staged refrigeration cycle model described in the PID2018 Benchmark Challenge is presented, and its effectiveness is evaluated. The control system consists of two components: 1) a preset PID component and 2) a learning feedforward component which is a function approximator that is adapted on the basis of the feedback signal. A B-spline network based LFFC and a low-pass filter based LFFC are designed to track the desired outlet temperature of evaporator secondary flux and the superheating degree of refrigerant at evaporator outlet. Encouraging simulation results are included. Qualitative and quantitative comparison results evaluations show that, with little effort, a high-performance control system can be obtained with this approach. Our initial simple attempt of low-pass filter based LFFC and B-spline network based LFFC give J=0.4902 and J=0.6536 relative to the decentralized PID controller, respectively. Besides, the initial attempt of a combination controller of our optimized PI controller and low-pass filter LFFC gives J=0.6947 relative to the multi-variable PID controller.
MA · Dec 3, 2022
DACOM: Learning Delay-Aware Communication for Multi-Agent Reinforcement Learning
Tingting Yuan, Hwei-Ming Chung, Jie Yuan et al.
Communication is supposed to improve multi-agent collaboration and overall performance in cooperative Multi-agent reinforcement learning (MARL). However, such improvements are prevalently limited in practice since most existing communication schemes ignore communication overheads (e.g., communication delays). In this paper, we demonstrate that ignoring communication delays has detrimental effects on collaborations, especially in delay-sensitive tasks such as autonomous driving. To mitigate this impact, we design a delay-aware multi-agent communication model (DACOM) to adapt communication to delays. Specifically, DACOM introduces a component, TimeNet, that is responsible for adjusting the waiting time of an agent to receive messages from other agents such that the uncertainty associated with delay can be addressed. Our experiments reveal that DACOM has a non-negligible performance improvement over other mechanisms by making a better trade-off between the benefits of communication and the costs of waiting for messages.
CV · Nov 9, 2023
Self-similarity Prior Distillation for Unsupervised Remote Physiological Measurement
Xinyu Zhang, Weiyu Sun, Hao Lu et al.
Remote photoplethysmography (rPPG) is a noninvasive technique that aims to capture subtle variations in facial pixels caused by changes in blood volume resulting from cardiac activities. Most existing unsupervised methods for rPPG tasks focus on the contrastive learning between samples while neglecting the inherent self-similar prior in physiological signals. In this paper, we propose a Self-Similarity Prior Distillation (SSPD) framework for unsupervised rPPG estimation, which capitalizes on the intrinsic self-similarity of cardiac activities. Specifically, we first introduce a physical-prior embedded augmentation technique to mitigate the effect of various types of noise. Then, we tailor a self-similarity-aware network to extract more reliable self-similar physiological features. Finally, we develop a hierarchical self-distillation paradigm to assist the network in disentangling self-similar physiological patterns from facial videos. Comprehensive experiments demonstrate that the unsupervised SSPD framework achieves comparable or even superior performance compared to the state-of-the-art supervised methods. Meanwhile, SSPD maintains the lowest inference time and computation cost among end-to-end models.
LG · Apr 17
Corner Reflector Array Jamming Discrimination Using Multi-Dimensional Micro-Motion Features with Frequency Agile Radar
Jie Yuan, Lei Wang, Yanhao Wang et al.
This paper introduces a robust discrimination method for distinguishing real ship targets from corner-reflector-array jamming with frequency-agile radar. The key idea is to exploit the multidimensional micro-motion signatures that separate rigid ships from non-rigid decoys. From Range-Velocity maps we derive two new hand-crafted descriptors, mean weighted residual (MWR) and complementary contrast factor (CCF), and fuse them with deep features learned by a lightweight CNN. An XGBoost classifier then gives the final decision. Extensive simulations show that the hybrid feature set consistently outperforms state-of-the-art alternatives, confirming the superiority of the proposed approach.
CV · Apr 11, 2024
Resolve Domain Conflicts for Generalizable Remote Physiological Measurement
Weiyu Sun, Xinyu Zhang, Hao Lu et al.
Remote photoplethysmography (rPPG) technology has become increasingly popular due to its non-invasive monitoring of various physiological indicators, making it widely applicable in multimedia interaction, healthcare, and emotion analysis. Existing rPPG methods utilize multiple datasets for training to enhance the generalizability of models. However, they often overlook the underlying conflict issues across different datasets, such as (1) label conflict resulting from different phase delays between physiological signal labels and face videos at the instance level, and (2) attribute conflict stemming from distribution shifts caused by head movements, illumination changes, skin types, etc. To address this, we introduce the DOmain-HArmonious framework (DOHA). Specifically, we first propose a harmonious phase strategy to eliminate uncertain phase delays and preserve the temporal variation of physiological signals. Next, we design a harmonious hyperplane optimization that reduces irrelevant attribute shifts and encourages the model's optimization towards a global solution that fits more valid scenarios. Our experiments demonstrate that DOHA significantly improves the performance of existing methods under multiple protocols. Our code is available at https://github.com/SWY666/rPPG-DOHA.
CL · Jan 13
WISE-Flow: Workflow-Induced Structured Experience for Self-Evolving Conversational Service Agents
Yuqing Zhou, Zhuoer Wang, Jie Yuan et al.
Large language model (LLM)-based agents are widely deployed in user-facing services but remain error-prone in new tasks, tend to repeat the same failure patterns, and show substantial run-to-run variability. Fixing failures via environment-specific training or manual patching is costly and hard to scale. To enable self-evolving agents in user-facing service environments, we propose WISE-Flow, a workflow-centric framework that converts historical service interactions into reusable procedural experience by inducing workflows with prerequisite-augmented action blocks. At deployment, WISE-Flow aligns the agent's execution trajectory to retrieved workflows and performs prerequisite-aware feasibility reasoning to achieve state-grounded next actions. Experiments on ToolSandbox and $τ^2$-bench show consistent improvement across base models.
CL · May 14, 2024
SpeechVerse: A Large-scale Generalizable Audio Language Model
Nilaksh Das, Saket Dingliwal, Srikanth Ronanki et al. · amazon-science
Large language models (LLMs) have shown incredible proficiency in performing tasks that require semantic understanding of natural language instructions. Recently, many works have further expanded this capability to perceive multimodal audio and text inputs, but their capabilities are often limited to specific fine-tuned tasks such as automatic speech recognition and translation. We therefore develop SpeechVerse, a robust multi-task training and curriculum learning framework that combines pre-trained speech and text foundation models via a small set of learnable parameters, while keeping the pre-trained models frozen during training. The models are instruction finetuned using continuous latent representations extracted from the speech foundation model to achieve optimal zero-shot performance on a diverse range of speech processing tasks using natural language instructions. We perform extensive benchmarking that includes comparing our model performance against traditional baselines across several datasets and tasks. Furthermore, we evaluate the model's capability for generalized instruction following by testing on out-of-domain datasets, novel prompts, and unseen tasks. Our empirical experiments reveal that our multi-task SpeechVerse model is even superior to conventional task-specific baselines on 9 out of the 11 tasks.
CV · Mar 25, 2025
High-Quality Spatial Reconstruction and Orthoimage Generation Using Efficient 2D Gaussian Splatting
Qian Wang, Zhihao Zhan, Jialei He et al.
Highly accurate geometric precision and dense image features characterize True Digital Orthophoto Maps (TDOMs), which are in great demand for applications such as urban planning, infrastructure management, and environmental monitoring. Traditional TDOM generation methods need sophisticated processes, such as Digital Surface Models (DSM) and occlusion detection, which are computationally expensive and prone to errors. This work presents an alternative technique rooted in 2D Gaussian Splatting (2DGS), free of explicit DSM and occlusion detection. With depth map generation, spatial information for every pixel within the TDOM is retrieved and can reconstruct the scene with high precision. Divide-and-conquer strategy achieves excellent GS training and rendering with high-resolution TDOMs at a lower resource cost, which preserves higher quality of rendering on complex terrain and thin structure without a decrease in efficiency. Experimental results demonstrate the efficiency of large-scale scene reconstruction and high-precision terrain modeling. This approach provides accurate spatial data, which assists users in better planning and decision-making based on maps.
IV · Nov 25, 2024
Real-time volumetric free-hand ultrasound imaging for large-sized organs: A study of imaging the whole spine
Caozhe Li, Enxiang Shen, Haoyang Wang et al.
Three-dimensional (3D) ultrasound imaging can overcome the limitations of conventional two-dimensional (2D) ultrasound imaging in structural observation and measurement. However, conducting volumetric ultrasound imaging for large-sized organs still faces difficulties including long acquisition time, inevitable patient movement, and 3D feature recognition. In this study, we proposed a real-time volumetric free-hand ultrasound imaging system optimized for the above issues and applied it to the clinical diagnosis of scoliosis. This study employed an incremental imaging method coupled with algorithmic acceleration to enable real-time processing and visualization of the large amounts of data generated when scanning large-sized organs. Furthermore, to deal with the difficulty of image feature recognition, we proposed two tissue segmentation algorithms to reconstruct and visualize the spinal anatomy in 3D space by approximating the depth at which the bone structures are located and segmenting the ultrasound images at different depths. We validated the adaptability of our system by deploying it to multiple models of ultrasound equipment and conducting experiments using different types of ultrasound probes. We also conducted experiments on 6 scoliosis patients and 10 normal volunteers to evaluate the performance of our proposed method. Ultrasound imaging of a volunteer spine from shoulder to crotch (more than 500 mm) was performed in 2 minutes, and the 3D imaging results displayed in real-time were compared with the corresponding X-ray images with a correlation coefficient of 0.96 in spinal curvature. Our proposed volumetric ultrasound imaging system might hold the potential to be clinically applied to other large-sized organs.
CV · Mar 3, 2025
A Multi-Sensor Fusion Approach for Rapid Orthoimage Generation in Large-Scale UAV Mapping
Jialei He, Zhihao Zhan, Zhituo Tu et al.
Rapid generation of large-scale orthoimages from Unmanned Aerial Vehicles (UAVs) has been a long-standing focus of research in the field of aerial mapping. A multi-sensor UAV system, integrating the Global Positioning System (GPS), Inertial Measurement Unit (IMU), 4D millimeter-wave radar and camera, can provide an effective solution to this problem. In this paper, we utilize multi-sensor data to overcome the limitations of conventional orthoimage generation methods in terms of temporal performance, system robustness, and geographic reference accuracy. A prior-pose-optimized feature matching method is introduced to enhance matching speed and accuracy, reducing the number of required features and providing precise references for the Structure from Motion (SfM) process. The proposed method exhibits robustness in low-texture scenes like farmlands, where feature matching is difficult. Experiments show that our approach achieves accurate feature matching orthoimage generation in a short time. The proposed drone system effectively aids in farmland detection and management.
LG · Feb 5, 2021
Machine Learning Applications on Neuroimaging for Diagnosis and Prognosis of Epilepsy: A Review
Jie Yuan, Xuming Ran, Keyin Liu et al.
Machine learning is playing an increasingly important role in medical image analysis, spawning new advances in the clinical application of neuroimaging. There have been some reviews on machine learning and epilepsy before, and they mainly focused on electrophysiological signals such as electroencephalography (EEG) and stereo electroencephalography (SEEG), while neglecting the potential of neuroimaging in epilepsy research. Neuroimaging has its important advantages in confirming the range of the epileptic region, which is essential in presurgical evaluation and assessment after surgery. However, it is difficult for EEG to locate the accurate epilepsy lesion region in the brain. In this review, we emphasize the interaction between neuroimaging and machine learning in the context of epilepsy diagnosis and prognosis. We start with an overview of epilepsy and typical neuroimaging modalities used in epilepsy clinics, MRI, DWI, fMRI, and PET. Then, we elaborate two approaches in applying machine learning methods to neuroimaging data: i) the conventional machine learning approach combining manual feature engineering and classifiers, ii) the deep learning approach, such as the convolutional neural networks and autoencoders. Subsequently, the application of machine learning on epilepsy neuroimaging, such as segmentation, localization, and lateralization tasks, as well as tasks directly related to diagnosis and prognosis are looked into in detail. Finally, we discuss the current achievements, challenges, and potential future directions in this field, hoping to pave the way for computer-aided diagnosis and prognosis of epilepsy.
CL · Oct 21, 2020
Probing and Fine-tuning Reading Comprehension Models for Few-shot Event Extraction
Rui Feng, Jie Yuan, Chao Zhang
We study the problem of event extraction from text data, which requires both detecting target event types and their arguments. Typically, both the event detection and argument detection subtasks are formulated as supervised sequence labeling problems. We argue that the event extraction models so trained are inherently label-hungry, and can generalize poorly across domains and text genres. We propose a reading comprehension framework for event extraction. Specifically, we formulate event detection as a textual entailment prediction problem, and argument detection as a question answering problem. By constructing proper query templates, our approach can effectively distill rich knowledge about tasks and label semantics from pretrained reading comprehension models. Moreover, our model can be fine-tuned with a small amount of data to boost its performance. Our experiment results show that our method performs strongly for zero-shot and few-shot event extraction, and it achieves state-of-the-art performance on the ACE 2005 benchmark when trained with full supervision.
SY · Apr 16, 2019
Fractional order [PI] Controller and Smith-like Predictor Design for A Class of High Order Systems
Zhenlong Wu, Jie Yuan, Yuquan Chen et al.
To handle the control difficulties caused by high-order dynamics, a control structure based on fractional order [proportional integral] (PI) controller and fractional order Smith-like predictor for a class of high order systems in the type of $K/(Ts+1)^n$ is proposed in this paper. The analysis of the tracking and disturbance rejection is illustrated based on the terminal value theorem and shows that the proposed control structure can ensure that the closed-loop system converges to the set point without static error and the closed-loop system recovers to its original state when the input disturbance occurs. Then, simulations about the influence on the control performance and control signal with different are carried out based on multi-objective genetic algorithm (MO-GA). The results show that the control performance can be improved and the energy of the control signal can be reduced simultaneously when the order is chosen no more than one. This can verify that the fractional order Smith-like predictor with has an advantage over that of the integral order Smith-like predictor.
ML · Jul 17, 2018
Item Recommendation with Variational Autoencoders and Heterogenous Priors
Giannis Karamanolakis, Kevin Raji Cherian, Ananth Ravi Narayan et al.
In recent years, Variational Autoencoders (VAEs) have been shown to be highly effective in both standard collaborative filtering applications and extensions such as incorporation of implicit feedback. We extend VAEs to collaborative filtering with side information, for instance when ratings are combined with explicit text feedback from the user. Instead of using a user-agnostic standard Gaussian prior, we incorporate user-dependent priors in the latent VAE space to encode users' preferences as functions of the review text. Taking into account both the rating and the text information to represent users in this multimodal latent space is promising to improve recommendation quality. Our proposed model is shown to outperform the existing VAE models for collaborative filtering (up to 29.41% relative improvement in ranking metric) along with other baselines that incorporate both user ratings and text for item recommendation.