Anqi Chen

8 papers

53 citations

Novelty: 39%

AI Score: 44

Ranked #75,438 of 205,806 authors (top 37%); #406 in NA (top 12%)

8 Papers

99.3 | CV | May 25 | Code

ERNIE-Image Technical Report

Jiaxiang Liu, Zhida Feng, Pengyu Zou et al.

We introduce ERNIE-Image, an open-source text-to-image generation model built upon an 8B single-stream DiT architecture. ERNIE-Image aims to bridge the gap between current open-source models and leading closed-source systems through more effective mining of large-scale pre-training data and improved supervision quality throughout training. During pre-training, we adopt a bottom-up data construction pipeline that combines fine-grained image categorization, rich caption annotation, aesthetic assessment, and hierarchical sampling. This strategy reduces data noise while preserving long-tail concepts and detailed real-world knowledge, providing a stronger foundation for complex generation tasks. In the post-training stage, we use a top-down data construction pipeline for high-demand scenarios, diversify prompt annotations to better match real user inputs, and apply a stabilized DPO strategy to align the model with human aesthetic preferences. We further train ERNIE-Image-Turbo for efficient 8-NFE generation and propose MT-DMD to mitigate capability drift during distillation. To make the model easier to use in practical scenarios, we equip it with a lightweight Prompt Enhancer that expands concise user intents into structured visual descriptions. In addition, we develop ERNIE-Image-Aes, an industrial-grade aesthetic model, together with ERNIE-Image-Aes-1K, a human-annotated benchmark for realistic aesthetic evaluation. Extensive qualitative and quantitative experiments show that ERNIE-Image achieves leading performance among open-source models and approaches top-tier commercial models in instruction following, text rendering, and aesthetic quality. We release the trained models and aesthetic resources to facilitate further academic research and technical progress in the AIGC community.
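For orientation, the generic DPO objective that a preference-alignment stage of this kind builds on can be written as follows. This is only the standard textbook form: the abstract does not say what the "stabilized" variant changes, nor how the loss is adapted to a diffusion model, so none of this should be read as ERNIE-Image's actual loss. The symbols are assumptions for illustration: $x$ is a prompt, $y_w$ and $y_l$ are the preferred and dispreferred outputs, $\pi_\theta$ is the model being trained, $\pi_{\mathrm{ref}}$ is a frozen reference model, $\sigma$ is the logistic function, and $\beta > 0$ controls how far the model may drift from the reference.

```latex
\mathcal{L}_{\mathrm{DPO}}(\theta)
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}
    \left[
      \log \sigma\!\left(
        \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
        - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
      \right)
    \right]
```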

96.0 | CV | Apr 19 | Code

Instinct vs. Reflection: Unifying Token and Verbalized Confidence in Multimodal Large Models

Yunkai Dang, Yifan Jiang, Yizhu Jiang et al.

Multimodal Large Language Models (MLLMs) have demonstrated exceptional capabilities in various perception and reasoning tasks. Despite this success, ensuring their reliability in practical deployment necessitates robust confidence estimation. Prior works have predominantly focused on text-only LLMs, often relying on computationally expensive self-consistency sampling. In this paper, we extend this to multimodal settings and conduct a comprehensive evaluation of MLLMs' response confidence estimation. Our analysis reveals a significant instinct-reflection misalignment: the model's implicit token-level support frequently diverges from its verbal self-assessment confidence. To address this misalignment, we propose a monotone confidence fusion framework to merge dual-channel signals and cross-channel consistency to estimate correctness. Subsequently, an order-preserving mean alignment step is applied to correct global bias, which improves calibration while preserving the risk-coverage trade-off for selective prediction. Experiments on diverse open-source and closed-source MLLMs show that our method consistently yields more reliable confidence estimates and improves both calibration and failure prediction. Code will be available at https://github.com/Yunkaidang/Instinct-vs.-Reflection.
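The two steps named in the abstract, a monotone fusion of two confidence channels followed by an order-preserving mean alignment, can be illustrated with a minimal sketch. Everything below is an assumption for illustration and not the authors' method: the fusion is a plain convex combination (monotone in both inputs), the alignment adds one constant to all logits so that the mean confidence matches a target accuracy, and the names `fuse`, `mean_align`, and `w` are hypothetical. The cross-channel consistency term mentioned in the abstract is not modelled here.

```python
import math


def fuse(p_token, p_verbal, w=0.5):
    """Convex combination of two confidences; non-decreasing in both inputs.

    Assumption: both inputs are already in [0, 1] and 0 <= w <= 1.
    """
    return w * p_token + (1.0 - w) * p_verbal


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def _logit(p, eps=1e-6):
    p = min(max(p, eps), 1.0 - eps)  # keep the logit finite
    return math.log(p / (1.0 - p))


def mean_align(scores, target_mean, iters=60):
    """Shift all logits by one constant so the mean score hits target_mean.

    A constant logit shift is strictly increasing, so the ranking of the
    scores (and hence any risk-coverage curve built from it) is unchanged.
    The shift is found by bisection on [-20, 20].
    """
    logits = [_logit(s) for s in scores]
    lo, hi = -20.0, 20.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        mean = sum(_sigmoid(z + mid) for z in logits) / len(logits)
        if mean < target_mean:
            lo = mid
        else:
            hi = mid
    shift = (lo + hi) / 2.0
    return [_sigmoid(z + shift) for z in logits]


# Hypothetical data: token-level and verbalized confidences for four answers.
token_conf = [0.9, 0.6, 0.8, 0.3]
verbal_conf = [0.95, 0.9, 0.85, 0.9]
fused = [fuse(t, v) for t, v in zip(token_conf, verbal_conf)]
aligned = mean_align(fused, target_mean=0.5)
```

Because the alignment only shifts logits, it can correct a global over- or under-confidence bias without changing which answers would be rejected first under selective prediction.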

NA | Jan 17, 2018

An Ultra-Weak Discontinuous Galerkin Method for Schrödinger Equation in One Dimension

Anqi Chen, Fengyan Li, Yingda Cheng

In this paper, we develop an ultra-weak discontinuous Galerkin (DG) method to solve the one-dimensional nonlinear Schrödinger equation. Stability conditions and error estimates are derived for the scheme with a general class of numerical fluxes. The error estimates are based on a detailed analysis of the projection operator associated with each individual flux choice. Depending on the parameters, we find that in some cases the projection can be defined element-wise, which facilitates the analysis. In most cases, the projection is global, and its analysis depends on the resulting $2\times2$ block-circulant matrix structures. For a large class of parameter choices, optimal $\textit{a priori}$ $L^2$ error estimates can be obtained. Numerical examples are provided to verify the theoretical results.
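The term "ultra-weak" refers to moving all spatial derivatives onto the test function by integrating by parts twice. A sketch of the resulting cell-wise formulation is given below, assuming the equation is written as $i u_t + u_{xx} + f(|u|^2)u = 0$ on cells $I_j = (x_{j-1/2}, x_{j+1/2})$. The symbols $\hat{u}$ and $\widetilde{u_x}$ stand for generic numerical fluxes, and $v^\pm$ for the right and left traces of a piecewise-polynomial test function $v$. The paper's exact notation, sign conventions, and flux definitions may differ.

```latex
% Integrating by parts twice on one cell:
\begin{align*}
\int_{I_j} u_{xx}\, v \, dx
  &= \int_{I_j} u\, v_{xx} \, dx
   + \bigl[\, u_x v - u\, v_x \,\bigr]_{x_{j-1/2}}^{x_{j+1/2}} .
\end{align*}
% Replacing the interface values by numerical fluxes gives the scheme:
\begin{align*}
i \int_{I_j} (u_h)_t\, v \, dx
  &+ \int_{I_j} u_h\, v_{xx} \, dx
   + \widetilde{u_x}\, v^- \big|_{j+1/2}
   - \widetilde{u_x}\, v^+ \big|_{j-1/2} \\
  &- \hat{u}\, v_x^- \big|_{j+1/2}
   + \hat{u}\, v_x^+ \big|_{j-1/2}
   + \int_{I_j} f(|u_h|^2)\, u_h\, v \, dx = 0 .
\end{align*}
```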

HC | Aug 27, 2024

Cross-subject Brain Functional Connectivity Analysis for Multi-task Cognitive State Evaluation

Jun Chen, Anqi Chen, Bingkun Jiang et al.

Cognition refers to the function of information perception and processing, which is the fundamental psychological essence of human beings. It is responsible for reasoning and decision-making, and its evaluation is significant for the aviation domain in mitigating potential safety risks. Existing studies use varied methods for cognitive state evaluation, yet they have limitations in timeliness, generalisation, and interpretability. Accordingly, this study adopts brain functional connectivity with electroencephalography signals to capture associations between brain regions across multiple subjects for evaluating real-time cognitive states. Specifically, a virtual reality-based flight platform with multiple embedded screens is constructed. Three distinctive cognitive tasks are designed, each with three degrees of difficulty. Thirty subjects are recruited for analysis and evaluation. The results are interpreted from different perspectives, including inner-subject and cross-subject, for task-wise and gender-wise underlying brain functional connectivity. Additionally, this study incorporates questionnaire-based, task performance-based, and physiological measure-based approaches to fairly label the trials. A multi-class cognitive state evaluation is further conducted with the active brain connections. Benchmarking results demonstrate that the identified brain regions have considerable influence on cognition, with a multi-class accuracy rate of 95.83% surpassing existing studies. The derived findings contribute to understanding the dynamic relationships among human brain functional regions, cross-subject cognitive behaviours, and decision-making, which have promising practical application value.

NA | May 20, 2019

Superconvergence of ultra-weak discontinuous Galerkin methods for the linear Schrödinger equation in one dimension

Anqi Chen, Yingda Cheng, Yong Liu et al.

We analyze the superconvergence properties of ultra-weak discontinuous Galerkin (UWDG) methods with various choices of flux parameters for the one-dimensional linear Schrödinger equation. In our previous work [10], stability and optimal convergence rates were established for a large class of flux parameters. In this paper, depending on the flux choices and on whether the polynomial degree $k$ is even or odd, we prove a $2k$ or $(2k-1)$-th order superconvergence rate for cell averages and the numerical flux of the function, as well as $(2k-1)$ or $(2k-2)$-th order for the numerical flux of the derivative. In addition, we prove superconvergence of $(k+2)$ or $(k+3)$-th order of the DG solution towards a special projection. At a class of special points, the function values and the first and second order derivatives of the DG solution are superconvergent with order $k+2$, $k+1$, $k$, respectively. The proof relies on the correction function techniques initiated in [8] and applied in [6] to direct DG (DDG) methods for diffusion problems. Compared with [6], the Schrödinger equation poses unique challenges for the superconvergence proof because the equation lacks a dissipation mechanism. One major highlight of our proof is that we introduce specially chosen test functions in the error equation and show the superconvergence of the second derivative and jump across the cell interfaces of the difference between the numerical solution and the projected exact solution. This technique was originally proposed in [12] and is essential to elevate the convergence order in our analysis. Finally, by negative norm estimates, we apply the post-processing technique and show that the accuracy of our scheme can be enhanced to order $2k$. Theoretical results are verified by numerical experiments.

CR | Jul 18, 2023

Runtime Stealthy Perception Attacks against DNN-based Adaptive Cruise Control Systems

Xugui Zhou, Anqi Chen, Maxfield Kouzel et al.

Adaptive Cruise Control (ACC) is a widely used driver assistance technology for maintaining the desired speed and a safe distance to the leading vehicle. This paper evaluates the security of deep neural network (DNN) based ACC systems under runtime stealthy perception attacks that strategically inject perturbations into camera data to cause forward collisions. We present a context-aware strategy for the selection of the most critical times for triggering the attacks and a novel optimization-based method for the adaptive generation of image perturbations at runtime. We evaluate the effectiveness of the proposed attack using an actual vehicle, a publicly available driving dataset, and a realistic simulation platform with the control software from a production ACC system, a physical-world driving simulator, and interventions by the human driver and safety features such as Advanced Emergency Braking System (AEBS). Experimental results show that the proposed attack achieves a 142.9 times higher success rate in causing hazards and an 82.6% higher evasion rate than baselines, while being stealthy and robust to real-world factors and dynamic changes in the environment. This study highlights the role of human drivers and basic safety mechanisms in preventing attacks.

NA | Jan 15, 2019

Sparse Grid Central Discontinuous Galerkin Method for Linear Hyperbolic Systems in High Dimensions

Zhanjing Tao, Anqi Chen, Mengping Zhang et al.

In this paper, we develop a sparse grid central discontinuous Galerkin (CDG) scheme for linear hyperbolic systems with variable coefficients in high dimensions. The scheme combines the CDG framework with the sparse grid approach, with the aim of breaking the curse of dimensionality. A new hierarchical representation of piecewise polynomials on the dual mesh is introduced and analyzed, resulting in a sparse finite element space that can be used for non-periodic problems. Theoretical results, such as $L^2$ stability and error estimates, are obtained for scalar problems. CFL conditions are studied numerically, comparing discontinuous Galerkin (DG), CDG, sparse grid DG, and sparse grid CDG methods. Numerical results, including scalar linear equations and acoustic and elastic waves, are provided.
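To make the "curse of dimensionality" concrete, the sketch below counts degrees of freedom for a full tensor-product space versus a standard sparse grid space of piecewise polynomials. It assumes the usual hierarchical construction on a primal mesh: in one dimension, level $n$ has $(k+1)2^n$ degrees of freedom, and the sparse space keeps only those tensor-product increments whose levels sum to at most $N$. The paper's dual-mesh CDG space is not reproduced here, and the function names are hypothetical.

```python
from itertools import product


def increment_dim(level, k):
    """Dimension of the 1D hierarchical increment at a given level.

    Assumption: the 1D space at level n has (k + 1) * 2**n degrees of
    freedom, so the increment adds (k + 1) * 2**(level - 1) for level >= 1.
    """
    return (k + 1) if level == 0 else (k + 1) * 2 ** (level - 1)


def full_grid_dofs(N, d, k):
    """Degrees of freedom of the full tensor-product space at level N."""
    return ((k + 1) * 2 ** N) ** d


def sparse_grid_dofs(N, d, k):
    """Degrees of freedom when only increments with |levels|_1 <= N are kept."""
    total = 0
    for levels in product(range(N + 1), repeat=d):
        if sum(levels) <= N:
            dim = 1
            for level in levels:
                dim *= increment_dim(level, k)
            total += dim
    return total


# Compare growth with dimension for piecewise linears (k = 1) at level N = 4.
for d in (1, 2, 3, 4):
    print(d, full_grid_dofs(4, d, 1), sparse_grid_dofs(4, d, 1))
```

In one dimension the two counts coincide. As $d$ grows, the full count scales like $2^{Nd}$, while the sparse count grows only like $2^N N^{d-1}$ up to constants.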

RO | Sep 3, 2021

Communicating Inferred Goals with Passive Augmented Reality and Active Haptic Feedback

James F. Mullen, Josh Mosier, Sounak Chakrabarti et al.

Robots learn as they interact with humans. Consider a human teleoperating an assistive robot arm: as the human guides and corrects the arm's motion, the robot gathers information about the human's desired task. But how does the human know what their robot has inferred? Today's approaches often focus on conveying intent: for instance, using legible motions or gestures to indicate what the robot is planning. However, closing the loop on robot inference requires more than just revealing the robot's current policy: the robot should also display the alternatives it thinks are likely, and prompt the human teacher when additional guidance is necessary. In this paper we propose a multimodal approach for communicating robot inference that combines both passive and active feedback. Specifically, we leverage information-rich augmented reality to passively visualize what the robot has inferred, and attention-grabbing haptic wristbands to actively prompt and direct the human's teaching. We apply our system to shared autonomy tasks where the robot must infer the human's goal in real time. Within this context, we integrate passive and active modalities into a single algorithmic framework that determines when and which type of feedback to provide. Combining both passive and active feedback experimentally outperforms single-modality baselines; during an in-person user study, we demonstrate that our integrated approach increases how efficiently humans teach the robot while simultaneously decreasing the amount of time humans spend interacting with the robot. Videos here: https://youtu.be/swq_u4iIP-g
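One simple way to picture a framework that decides "when and which type of feedback to provide" is a belief over candidate goals with an uncertainty threshold. The sketch below is purely hypothetical and is not the authors' algorithm: it assumes a discrete set of goals, a Bayesian belief update from per-goal likelihoods of the latest human input, and a rule that triggers an active haptic prompt when normalized entropy is high and otherwise relies on the passive AR display. The function names and the threshold value are illustrative.

```python
import math


def update_belief(belief, likelihoods):
    """Bayes rule over a discrete set of goals: posterior ∝ prior * likelihood."""
    posterior = [b * l for b, l in zip(belief, likelihoods)]
    total = sum(posterior)
    return [p / total for p in posterior]


def entropy(belief):
    """Shannon entropy (in nats) of a discrete belief."""
    return -sum(p * math.log(p) for p in belief if p > 0)


def choose_feedback(belief, threshold=0.5):
    """Hypothetical rule: prompt actively only when the robot is uncertain.

    Entropy is normalized by its maximum, log(number of goals), so the
    threshold does not depend on how many goals there are.
    """
    normalized = entropy(belief) / math.log(len(belief))
    return "active_haptic" if normalized > threshold else "passive_ar"


# Three candidate goals, initially equally likely.
belief = [1 / 3, 1 / 3, 1 / 3]
first = choose_feedback(belief)

# Hypothetical likelihoods of the human's latest correction under each goal.
belief = update_belief(belief, [0.9, 0.05, 0.05])
second = choose_feedback(belief)
```

With a uniform belief the rule asks for guidance; once one goal dominates, it falls back to passively showing what has been inferred.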