AI · Jun 3
Agents' Last Exam
Yiyou Sun, Xinyang Han, Weichen Zhang et al.
Recent AI systems have achieved strong results on a wide range of benchmarks, yet these gains have not translated into economically meaningful deployment across many professional domains. We argue that this gap is largely an evaluation problem: widely used benchmarks lack sustained performance measurement on real and economically valuable workflows. This paper introduces Agents' Last Exam (ALE), a benchmark designed to evaluate AI agents on long-horizon, economically valuable, real-world tasks with verifiable outcomes. Developed in collaboration with 250+ industry experts, ALE covers non-physical industries defined with reference to O*NET / SOC 2018 (the U.S. federal occupational taxonomy). It is organized around a task taxonomy with 55 subfields grouped into 13 industry clusters covering 1K+ tasks. Current results show that the hardest tier remains far from saturated: across mainstream harness and backbone configurations, the average full pass rate is 2.6%. ALE is designed as a living benchmark: its task pool grows continuously as new workflows and industries are onboarded. More broadly, ALE is intended not merely as another leaderboard, but as an instrument for closing the gap between benchmark success and GDP-relevant impact.
SY · Jan 28, 2021
Gaussian Process-Based Model Predictive Control of Blood Glucose for Patients with Type 1 Diabetes Mellitus
Lukas Ortmann, Dawei Shi, Eyal Dassau et al.
The insulin sensitivity (IS) of the human body changes with a circadian rhythm. This adds to the time-varying nature of the glucose metabolism process and poses challenges for the blood glucose (BG) control of patients with Type 1 Diabetes Mellitus. This paper presents a Model Predictive Controller that takes the periodic IS into account in order to enhance BG control. The future effect of the IS is predicted using a machine learning technique, namely a customized Gaussian Process (GP), based on historical training data. The training data for the GP is continuously updated during closed-loop control, which enables the control scheme to learn and adapt to intra-individual and inter-individual changes of the circadian IS rhythm. The necessary state information is provided by an Unscented Kalman Filter. The closed-loop performance of the proposed control scheme is evaluated for different scenarios (including fasting, announced meals and skipped meals) through in silico studies on simulation models of Göttingen Minipigs.
SY · Apr 13, 2023
Multi-kernel Correntropy-based Orientation Estimation of IMUs: Gradient Descent Methods
Shilei Li, Lijing Li, Dawei Shi et al.
This paper presents two computationally efficient algorithms for the orientation estimation of inertial measurement units (IMUs): the correntropy-based gradient descent (CGD) and the correntropy-based decoupled orientation estimation (CDOE). Traditional methods, such as gradient descent (GD) and decoupled orientation estimation (DOE), rely on the mean squared error (MSE) criterion, making them vulnerable to external acceleration and magnetic interference. To address this issue, we demonstrate that the multi-kernel correntropy loss (MKCL) is an optimal objective function for maximum likelihood estimation (MLE) when the noise follows a type of heavy-tailed distribution. In certain situations, the estimation error of the MKCL is bounded even in the presence of arbitrarily large outliers. By replacing the standard MSE cost function with MKCL, we develop the CGD and CDOE algorithms. We evaluate the effectiveness of our proposed methods by comparing them with existing algorithms in various situations. Experimental results indicate that our proposed methods (CGD and CDOE) outperform their conventional counterparts (GD and DOE), especially when faced with external acceleration and magnetic disturbances. Furthermore, the new algorithms demonstrate significantly lower computational complexity than Kalman filter-based approaches, making them suitable for applications with low-cost microprocessors.
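The boundedness claim can be illustrated with a minimal sketch. It assumes one common form of a multi-kernel correntropy loss, in which each error channel gets its own Gaussian kernel bandwidth; the function names `mkcl` and `mse` and the exact scaling are illustrative assumptions, not taken from the paper.

```python
import math


def mkcl(errors, bandwidths):
    """Assumed multi-kernel correntropy loss: one Gaussian kernel per channel.

    Each term s^2 * (1 - exp(-e^2 / (2 s^2))) behaves like e^2 / 2 for small
    errors and saturates at s^2 for large ones.
    """
    return sum(
        s ** 2 * (1.0 - math.exp(-(e ** 2) / (2.0 * s ** 2)))
        for e, s in zip(errors, bandwidths)
    )


def mse(errors):
    """Quadratic loss with the matching 1/2 scaling, for comparison."""
    return sum(0.5 * e ** 2 for e in errors)


if __name__ == "__main__":
    bandwidths = [1.0, 2.0]
    for errors in ([0.01, 0.01], [1.0, 1.0], [1e6, 0.0]):
        print(errors, mkcl(errors, bandwidths), mse(errors))
```

For small errors the two losses nearly coincide, while for an arbitrarily large outlier the assumed correntropy loss stays below the sum of the squared bandwidths and the quadratic loss grows without bound.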
IT · May 8
Variational Robust Kalman Filters: A Unified Framework
Shilei Li, Dawei Shi, Hao Yu et al.
Robustness and adaptivity are two competing objectives in Kalman filters (KF). Robustness involves temporarily inflating prior estimates of noise covariances, while adaptivity updates prior beliefs by exploiting measurements. In practical applications, both process and measurement noise can be influenced by outliers, be time-varying, or both. In this work, we propose a variational robust Kalman filter, built on a Student's $t$-distribution induced loss function and variational inference, and solved in a computationally efficient manner. We demonstrate that robustness can be understood as a prerequisite for adaptivity, making it possible to merge the above two competing goals into a single framework through a probabilistic switching rule. Additionally, our proposed filter can recover the conventional KF, robust KF, and adaptive KF by tuning parameters, and can suppress the effects of both imperfect process noise and imperfect measurement noise, enabling it to perform well in complex noise environments. Simulations verify the effectiveness of the proposed method.
LG · Jun 27, 2025
Thompson Sampling-Based Learning and Control for Unknown Dynamic Systems
Kaikai Zheng, Dawei Shi, Yang Shi et al.
Thompson sampling (TS) is an effective method to explore parametric uncertainties and can therefore be used for active learning-based controller design. However, TS relies on finite parametric representations, which limits its applicability to the more general function spaces commonly encountered in control system design. To address this issue, this work proposes a parameterization method for control law learning using reproducing kernel Hilbert spaces and designs a data-driven active learning control approach. Specifically, the proposed method treats the control law as an element in a function space, allowing the design of control laws without imposing restrictions on the system structure or the form of the controller. A TS framework is proposed in this work to explore potential optimal control laws, and convergence guarantees are provided for the learning process. Theoretical analysis shows that the proposed method learns the relationship between control laws and closed-loop performance metrics at an exponential rate, and an upper bound on the control regret is also derived. Numerical experiments on controlling unknown nonlinear systems validate the effectiveness of the proposed method.
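For context, the finite-parametric form of TS that the abstract contrasts against can be sketched with the classical Beta-Bernoulli bandit. This is not the paper's kernel-based method; the function name `thompson_bernoulli`, the reward probabilities, and the uniform Beta(1, 1) priors are assumptions chosen for illustration.

```python
import random


def thompson_bernoulli(success_probs, rounds, seed=0):
    """Classical Thompson sampling on a Bernoulli bandit (illustrative only).

    Each arm keeps a Beta(alpha, beta) posterior over its unknown success
    probability. Every round, one value is sampled from each posterior and
    the arm with the largest sample is played.
    """
    rng = random.Random(seed)
    n = len(success_probs)
    alpha = [1.0] * n  # 1 + observed successes per arm
    beta = [1.0] * n   # 1 + observed failures per arm
    pulls = [0] * n
    for _ in range(rounds):
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n)]
        arm = max(range(n), key=samples.__getitem__)
        reward = 1 if rng.random() < success_probs[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    # Hypothetical arms: the second one is better and should dominate the pulls.
    print(thompson_bernoulli([0.2, 0.8], rounds=2000))
```

Here the uncertainty lives in a finite set of scalar parameters, one per arm, which is the restriction the abstract says the function-space formulation is meant to lift.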
RO · Nov 9, 2020
Posture Adjustment for a Wheel-legged Robotic System via Leg Force Control with Prescribed Transient Performance
Dongchen Liu, Junzheng Wang, Shoukun Wang et al.
This work proposes a force control strategy with prescribed transient performance for the legs of a wheel-legged robotic system to achieve posture adjustment on uneven roads. A dynamic model of the robotic system is established with the body postures as inputs and the leg forces as outputs, such that the desired forces for the wheel-legs are calculated from the posture reference and feedback. Based on the funnel control scheme, the legs realize force tracking with prescribed transient performance. To improve the robustness of the force control system, an event-based mechanism is designed for the online segment of the funnel function. As a result, the force tracking error of the wheel-leg evolves inside the performance funnel with proven convergence. The absence of Zeno behavior for the event-triggering condition is also guaranteed. The proposed control scheme is applied to a physical wheel-legged prototype to evaluate force tracking and posture adjustment. Multiple comparative experimental results are presented to validate the stability and effectiveness of the proposed methodology.
SY · Aug 5, 2016
Quickest Change Detection in Adaptive Censoring Sensor Networks
Xiaoqiang Ren, Karl H. Johansson, Dawei Shi et al.
The problem of quickest change detection with communication rate constraints is studied. A network of wireless sensors with limited computation capability monitors the environment and sends observations to a fusion center via wireless channels. At an unknown time instant, the distributions of observations at all the sensor nodes change simultaneously. Due to limited energy, the sensors cannot transmit at all the time instants. The objective is to detect the change at the fusion center as quickly as possible, subject to constraints on false detection and average communication rate between the sensors and the fusion center. A minimax formulation is proposed. The cumulative sum (CuSum) algorithm is used at the fusion center and censoring strategies are used at the sensor nodes. The censoring strategies, which are adaptive to the CuSum statistic, are fed back by the fusion center. The sensors only send observations that fall into prescribed sets to the fusion center. This CuSum adaptive censoring (CuSum-AC) algorithm is proved to be an equalizer rule and to be globally asymptotically optimal for any positive communication rate constraint, as the average run length to false alarm goes to infinity. It is also shown, by numerical examples, that the CuSum-AC algorithm provides a suitable trade-off between the detection performance and the communication rate.
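The CuSum recursion used at the fusion center can be sketched as follows. This is a minimal illustration of the standard CuSum test only, assuming a known Gaussian mean shift; the adaptive censoring at the sensors is not modeled, and the function names, parameters, and sample data are hypothetical.

```python
def gaussian_llr(x, mu0, mu1, sigma):
    """Log-likelihood ratio log f1(x)/f0(x) for a mean shift mu0 -> mu1."""
    return (mu1 - mu0) / sigma ** 2 * (x - 0.5 * (mu0 + mu1))


def cusum_alarm_index(observations, threshold, mu0=0.0, mu1=1.0, sigma=1.0):
    """Return the first index at which the CuSum statistic reaches the threshold.

    The statistic accumulates log-likelihood ratios and is clipped at zero,
    so evidence against a change never builds up a negative balance.
    Returns None if no alarm is raised.
    """
    stat = 0.0
    for k, x in enumerate(observations):
        stat = max(0.0, stat + gaussian_llr(x, mu0, mu1, sigma))
        if stat >= threshold:
            return k
    return None


if __name__ == "__main__":
    # Hypothetical noise-free data: the mean jumps from 0 to 1 at index 3.
    observations = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0]
    print(cusum_alarm_index(observations, threshold=1.5))
```

Raising the threshold lengthens the average run length to false alarm at the cost of a longer detection delay, which is the trade-off the asymptotic optimality result refers to.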