CL · Oct 13, 2023 · Code
xDial-Eval: A Multilingual Open-Domain Dialogue Evaluation Benchmark
Chen Zhang, Luis Fernando D'Haro, Chengguang Tang et al.
Recent advancements in reference-free learned metrics for open-domain dialogue evaluation have been driven by the progress in pre-trained language models and the availability of dialogue data with high-quality human annotations. However, current studies predominantly concentrate on English dialogues, and the generalization of these metrics to other languages has not been fully examined. This is largely due to the absence of a multilingual dialogue evaluation benchmark. To address the issue, we introduce xDial-Eval, built on top of open-source English dialogue evaluation datasets. xDial-Eval includes 12 turn-level and 6 dialogue-level English datasets, comprising 14930 annotated turns and 8691 annotated dialogues respectively. The English dialogue data are extended to nine other languages with commercial machine translation systems. On xDial-Eval, we conduct comprehensive analyses of previous BERT-based metrics and the recently-emerged large language models. Lastly, we establish strong self-supervised and multilingual baselines. In terms of average Pearson correlations over all datasets and languages, the best baseline outperforms OpenAI's ChatGPT by absolute improvements of 6.5% and 4.6% at the turn and dialogue levels respectively, albeit with much fewer parameters. The data and code are publicly available at https://github.com/e0397123/xDial-Eval.
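The headline comparison above is an average Pearson correlation between metric scores and human annotations over all datasets and languages. A minimal sketch of that aggregation follows; the function names and the `(dataset, language)` keying are illustrative assumptions, not the benchmark's actual code.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


def average_correlation(results):
    """Average Pearson correlation over all (dataset, language) pairs.

    `results` maps a hypothetical (dataset, language) key to a pair
    (metric_scores, human_scores).
    """
    values = [pearson(metric, human) for metric, human in results.values()]
    return sum(values) / len(values)


# Toy data only: two made-up evaluation sets.
toy = {
    ("dataset_a", "en"): ([0.1, 0.4, 0.9], [1, 3, 5]),
    ("dataset_a", "zh"): ([0.2, 0.3, 0.8], [2, 2, 4]),
}
print(average_correlation(toy))
```

Averaging per-dataset correlations, rather than pooling all scores, keeps datasets with different annotation scales from dominating the result.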
NA · Feb 23, 2016
An HDG method for linear elasticity with strong symmetric stresses
Weifeng Qiu, Jiguang Shen, Ke Shi
This paper presents a new hybridizable discontinuous Galerkin (HDG) method for linear elasticity on general polyhedral meshes, based on a strong symmetric stress formulation. The key feature of this new HDG method is the use of a special form of the numerical trace of the stresses, which makes the error analysis different from the projection-based error analyses used for most other HDG methods. For arbitrary polyhedral elements, we approximate the stress by using polynomials of degree k>=1 and the displacement by using polynomials of degree k+1. In contrast, to approximate the numerical trace of the displacement on the faces, we use polynomials of degree k only. This allows for a very efficient implementation of the method, since the numerical trace of the displacement is the only globally-coupled unknown, but does not degrade the convergence properties of the method. Indeed, we prove optimal orders of convergence for both the stresses and displacements on the elements. In the almost incompressible case, we show the error of the stress is also optimal in the standard L2-norm. These optimal results are possible thanks to a special superconvergence property of the numerical traces of the displacement, and thanks to the use of a crucial elementwise Korn's inequality. Finally, several numerical results are presented to support our theoretical findings.
NA · Nov 27, 2015
A superconvergent HDG method for the Incompressible Navier-Stokes Equations on general polyhedral meshes
Weifeng Qiu, Ke Shi
We present a superconvergent hybridizable discontinuous Galerkin (HDG) method for the steady-state incompressible Navier-Stokes equations on general polyhedral meshes. For an arbitrary conforming polyhedral mesh, we use polynomials of degree k+1, k, k to approximate the velocity, velocity gradient and pressure, respectively. In contrast, we only use polynomials of degree k to approximate the numerical trace of the velocity on the interfaces. Since the numerical trace of the velocity field is the only globally coupled unknown, this scheme allows a very efficient implementation of the method. For the stationary case, and under the usual smallness condition for the source term, we prove that the method is well defined and that the global L2-norm of the error in each of the above-mentioned variables and the discrete H1-norm of the error in the velocity converge with the order of k+1 for k>=0. We also show that for k>=1, the global L2-norm of the error in velocity converges with the order of k+2. In terms of the degrees of freedom of the globally coupled unknown (the numerical trace), this method achieves optimal convergence for all the above-mentioned variables in the L2-norm for k>=0, superconvergence for the velocity in the discrete H1-norm without postprocessing for k>=0, and superconvergence for the velocity in the L2-norm without postprocessing for k>=1.
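Convergence orders such as k+1 and k+2 are usually checked numerically by comparing errors on successively refined meshes. The sketch below uses the standard estimate log(e_coarse/e_fine) / log(h_coarse/h_fine). The error values are synthetic, generated to mimic an O(h^{k+2}) rate with k=1, and are not results from the paper.

```python
from math import log


def observed_orders(mesh_sizes, errors):
    """Estimate the convergence order between consecutive mesh refinements."""
    pairs = list(zip(mesh_sizes, errors))
    orders = []
    for (h_coarse, e_coarse), (h_fine, e_fine) in zip(pairs, pairs[1:]):
        orders.append(log(e_coarse / e_fine) / log(h_coarse / h_fine))
    return orders


# Synthetic errors behaving like C * h**(k+2) with k = 1 (illustrative only).
k = 1
mesh_sizes = [0.5, 0.25, 0.125, 0.0625]
errors = [0.7 * h ** (k + 2) for h in mesh_sizes]
print(observed_orders(mesh_sizes, errors))
```

In practice the observed orders only approach the theoretical rate as the mesh is refined, so the last entries of the list are the ones to compare against k+1 or k+2.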
CL · Jun 22, 2023
Overview of Robust and Multilingual Automatic Evaluation Metrics for Open-Domain Dialogue Systems at DSTC 11 Track 4
Mario Rodríguez-Cantelar, Chen Zhang, Chengguang Tang et al.
The advent and fast development of neural networks have revolutionized the research on dialogue systems and subsequently have triggered various challenges regarding their automatic evaluation. Automatic evaluation of open-domain dialogue systems as an open challenge has been the center of the attention of many researchers. Despite the consistent efforts to improve automatic metrics' correlations with human evaluation, there have been very few attempts to assess their robustness over multiple domains and dimensions. Also, their focus is mainly on the English language. All of these challenges prompt the development of automatic evaluation metrics that are reliable in various domains, dimensions, and languages. This track in the 11th Dialogue System Technology Challenge (DSTC11) is part of the ongoing effort to promote robust and multilingual automatic evaluation metrics. This article describes the datasets and baselines provided to participants and discusses the submission and result details of the two proposed subtasks.
NA · May 9, 2016
A superconvergent HDG method for the Maxwell equations
Huangxin Chen, Weifeng Qiu, Ke Shi
We present and analyze a new hybridizable discontinuous Galerkin (HDG) method for the steady state Maxwell equations. In order to make the problem well-posed, a divergence condition is imposed on the electric field. Then a Lagrange multiplier $p$ is introduced, and the problem becomes the solution of a mixed curl-curl formulation of Maxwell's problem. We use polynomials of degree $k+1$, $k$, $k$ to approximate $\boldsymbol{u}$, $\nabla \times \boldsymbol{u}$ and $p$ respectively. In contrast, we only use a non-trivial subspace of polynomials of degree $k+1$ to approximate the numerical tangential trace of the electric field and polynomials of degree $k+1$ to approximate the numerical trace of the Lagrange multiplier on the faces. On simplicial meshes, a special choice of the stabilization parameters is applied, and the HDG system is shown to be well-posed. Moreover, we show that the convergence rates for $\boldsymbol{u}$ and $\nabla \times \boldsymbol{u}$ are independent of the Lagrange multiplier $p$. If we assume the dual operator of the Maxwell equation on the domain has adequate regularity, we show that the convergence rate for $\boldsymbol{u}$ is $O(h^{k+2})$. In terms of the degrees of freedom of the globally coupled unknown (the numerical trace), this HDG method achieves superconvergence for the electric field without postprocessing. Finally, we show that on general polyhedral elements, again with a particular choice of the stabilization parameters, the HDG system is also well-posed and the superconvergence of the HDG method is derived.
NA · Dec 13, 2018
A Mixed DG method and an HDG method for incompressible magnetohydrodynamics
Weifeng Qiu, Ke Shi
In this paper we propose and analyze a mixed DG method and an HDG method for the stationary Magnetohydrodynamics (MHD) equations with two types of boundary (or constraint) conditions. The mixed DG method is based on a recent work proposed by Houston et al. for the linearized MHD. With two novel discrete Sobolev embedding type estimates for the discontinuous polynomials, we provide a priori error estimates for the method on the nonlinear MHD equations. In the smooth case, we have optimal convergence rates for the velocity, magnetic field and pressure in the energy norm, while the Lagrange multiplier only has a suboptimal convergence order. With the minimal regularity assumption on the exact solution, the approximation is optimal for all unknowns. To the best of our knowledge, these are the first a priori error estimates of DG methods for nonlinear MHD equations. In addition, we also propose and analyze the first divergence-free HDG method for the problem, which has several unique features compared with the mixed DG method.
CL · May 27
SuperValid: Capability-Aligned OOD Validation for Generalizable Downstream Scaling
Quanen Sun, Changxin Tian, Ke Shi et al.
Scaling laws guide large language model training by relating compute to cross-entropy loss, and recent work further extends them to predict downstream benchmark performance. However, prior approaches face generalization limitations from two aspects: focusing on benchmark-level performance introduces scenario-specific artifacts, while relying on IID validation loss fails to track capability improvements when training distributions vary. In this work, we argue that downstream scaling should be studied at the capability level, which captures shared skill factors across related tasks while abstracting away benchmark-specific noise. We propose SuperValid, a framework that synthesizes OOD (out-of-distribution), capability-aligned validation data by distilling core concepts from benchmarks within a capability domain and expanding them into diverse, knowledge-rich texts. Extensive experiments spanning 17 benchmarks grouped into 6 capability domains show that SuperValid loss exhibits strong and stable correlation with downstream performance across models of different architectures, scales, and training data distributions. As a training-free metric computable during training without benchmark evaluation, SuperValid enables effective model selection, early stopping, and scaling decisions.
CL · Apr 27 · Code
DPEPO: Diverse Parallel Exploration Policy Optimization for LLM-based Agents
Junshuo Zhang, Chengrui Huang, Feng Guo et al.
Large language model (LLM) agents that follow the sequential "reason-then-act" paradigm have achieved superior performance in many complex tasks. However, these methods suffer from limited exploration and incomplete environmental understanding, as they interact with only a single environment per step. In this paper, we first introduce a novel paradigm that enables an agent to interact with multiple environments simultaneously and share cross-trajectory experiences. Building upon this paradigm, we further propose DPEPO, a reinforcement learning (RL) algorithm that encourages the agent to perform diverse parallel exploration. There are two stages in DPEPO: initial supervised fine-tuning (SFT) imparts basic parallel reasoning and action generation, followed by a reinforcement learning stage with a hierarchical reward scheme. We design a parallel trajectory-level success reward and two step-level rewards, Diverse Action Reward and Diverse State Transition Reward, which actively penalize behavioral redundancy and promote broad exploration. Extensive experiments on ALFWorld and ScienceWorld show that DPEPO achieves state-of-the-art (SOTA) success rates, while maintaining comparable efficiency to strong sequential baselines. (Code is available at https://github.com/LePanda026/Code-for-DPEPO)
CL · Feb 12
PACE: Prefix-Protected and Difficulty-Aware Compression for Efficient Reasoning
Ruixiang Feng, Yuntao Wen, Silin Zhou et al.
Language Reasoning Models (LRMs) achieve strong performance by scaling test-time computation but often suffer from "overthinking", producing excessively long reasoning traces that increase latency and memory usage. Existing LRMs typically enforce conciseness with uniform length penalties, which over-compress crucial early deduction steps at the sequence level and indiscriminately penalize all queries at the group level. To address these limitations, we propose PACE, a dual-level framework for prefix-protected and difficulty-aware compression under hierarchical supervision. At the sequence level, prefix-protected optimization employs decaying mixed rollouts to maintain valid reasoning paths while promoting conciseness. At the group level, difficulty-aware penalty dynamically scales length constraints based on query complexity, maintaining exploration for harder questions while curbing redundancy on easier ones. Extensive experiments on DeepSeek-R1-Distill-Qwen (1.5B/7B) demonstrate that PACE achieves a substantial reduction in token usage (up to 55.7%) while simultaneously improving accuracy (up to 4.1%) on math benchmarks, with generalization ability to code, science, and general domains.
IR · Nov 1, 2025
Structurally Refined Graph Transformer for Multimodal Recommendation
Ke Shi, Yan Zhang, Miao Zhang et al.
Multimodal recommendation systems utilize various types of information, including images and text, to enhance the effectiveness of recommendations. The key challenge is predicting user purchasing behavior from the available data. Current recommendation models prioritize extracting multimodal information while neglecting the distinction between redundant and valuable data. They also rely heavily on a single semantic framework (e.g., local or global semantics), resulting in an incomplete or biased representation of user preferences, particularly those less expressed in prior interactions. Furthermore, these approaches fail to capture the complex interactions between users and items, limiting the model's ability to meet the needs of diverse users. To address these challenges, we present SRGFormer, a structurally optimized multimodal recommendation model. By modifying the transformer for better integration into our model, we capture the overall behavior patterns of users. Then, we enhance structural information by embedding multimodal information into a hypergraph structure to aid in learning the local structures between users and items. Meanwhile, applying self-supervised tasks to user-item collaborative signals enhances the integration of multimodal information, thereby revealing the representational features inherent to the data's modality. Extensive experiments on three public datasets reveal that SRGFormer surpasses previous benchmark models, achieving an average performance improvement of 4.47 percent on the Sports dataset. The code is publicly available online.
STAT-MECH · Apr 3
Zero-Freeness of the Hard-Core Model with Bounded Connective Constant
Yuan Chen, Shuai Shao, Ke Shi
We study the zero-free regions of the partition function of the hard-core model on finite graphs and their implications for the analyticity of the free energy on infinite lattices. Classically, zero-freeness results have been established up to the tree uniqueness threshold $λ_c(Δ-1)$ determined by the maximum degree $Δ$. However, for many graph classes, such as regular lattices, the connective constant provides a more precise measure of structural complexity than the maximum degree. While recent approximation algorithms based on correlation decay and Markov chain Monte Carlo have successfully exploited the connective constant to improve the threshold to $λ_c$ evaluated at the connective constant, analogous results for complex zero-freeness have been lacking. In this paper, we bridge this gap by introducing a proper definition of the connective constant for finite graphs based on a lower bound on the number of $k$-depth self-avoiding walks. We prove that for any graph family with a lower connective constant $μ$, the partition function is zero-free in a complex neighborhood of the interval $[0, λ]$ for all $λ< λ_c(μ)$. As a direct consequence, we establish the uniqueness and analyticity of the free energy density for infinite lattices up to the connective constant threshold, extending the known regions derived from maximum degree bounds. Our proof utilizes a block contraction technique that lifts the correlation decay property from a real interval to a strip-like complex neighborhood.
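To make the objects in this abstract concrete, the sketch below evaluates the hard-core partition function, the sum over independent sets I of λ^|I|, by brute force on a small graph, and computes the standard tree uniqueness threshold λ_c(d) = d^d / (d-1)^(d+1) for branching factor d. Reading the abstract's λ_c in this form is an assumption, and the function names are illustrative.

```python
from itertools import combinations
from math import sqrt


def hard_core_partition(n_vertices, edges, lam):
    """Brute-force hard-core partition function; lam may be real or complex."""
    edge_set = {frozenset(edge) for edge in edges}
    total = 0
    for size in range(n_vertices + 1):
        for subset in combinations(range(n_vertices), size):
            # Keep the subset only if no two chosen vertices are adjacent.
            if all(frozenset(pair) not in edge_set
                   for pair in combinations(subset, 2)):
                total += lam ** size
    return total


def tree_uniqueness_threshold(d):
    """Standard threshold lambda_c(d) = d**d / (d - 1)**(d + 1), for d > 1."""
    if d <= 1:
        raise ValueError("branching factor must exceed 1")
    return d ** d / (d - 1) ** (d + 1)


# 4-cycle: Z(lam) = 1 + 4*lam + 2*lam**2.
cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(hard_core_partition(4, cycle4, 1))            # 7 independent sets
print(abs(hard_core_partition(4, cycle4, -1 + sqrt(2) / 2)))  # near 0: a zero
print(tree_uniqueness_threshold(3))                  # 27/16
```

For the 4-cycle both zeros of the polynomial lie on the negative real axis, so the partition function does not vanish near the positive interval, which is the kind of zero-free region the abstract is about. The brute-force enumeration is exponential in the number of vertices and only suitable for toy graphs.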
IR · Apr 6
FAVE: Flow-based Average Velocity Establishment for Sequential Recommendation
Ke Shi, Yao Zhang, Feng Guo et al.
Generative recommendation has emerged as a transformative paradigm for capturing the dynamic evolution of user intents in sequential recommendation. While flow-based methods improve the efficiency of diffusion models, they remain hindered by the "Noise-to-Data" paradigm, which introduces two critical inefficiencies: prior mismatch, where generation starts from uninformative noise, forcing a lengthy recovery trajectory; and linear redundancy, where iterative solvers waste computation on modeling deterministic preference transitions. To address these limitations, we propose a Flow-based Average Velocity Establishment (Fave) framework for one-step generative recommendation that learns a direct trajectory from an informative prior to the target distribution. Fave is structured via a progressive two-stage training strategy. In Stage 1, we establish a stable preference space through dual-end semantic alignment, applying constraints at both the source (user history) and target (next item) to prevent representation collapse. In Stage 2, we directly resolve the efficiency bottlenecks by introducing a semantic anchor prior, which initializes the flow with a masked embedding from the user's interaction history, providing an informative starting point. Then we learn a global average velocity, consolidating the multi-step trajectory into a single displacement vector, and enforce trajectory straightness via a JVP-based consistency constraint to ensure one-step generation. Extensive experiments on three benchmarks demonstrate that Fave not only achieves state-of-the-art recommendation performance but also delivers an order-of-magnitude improvement in inference efficiency, making it practical for latency-sensitive scenarios.
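The idea of consolidating a multi-step trajectory into a single displacement can be illustrated with a toy comparison: when the trajectory from prior to target is straight, one step with the average velocity lands where a multi-step Euler solver does. This is only a sketch under that straightness assumption. The vectors, function names, and constant velocity field are hypothetical and do not reproduce the Fave model.

```python
def euler_generate(x0, velocity, steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        v = velocity(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def one_step_generate(x0, average_velocity):
    """Single displacement: x1 = x0 + (average velocity over [0, 1])."""
    return [xi + ui for xi, ui in zip(x0, average_velocity)]


# Hypothetical 3-d embeddings: an informative prior and a target item.
prior = [0.2, -0.1, 0.5]
target = [1.0, 0.3, -0.4]

# Straight trajectory: the velocity is constant and equals the displacement.
displacement = [t - p for t, p in zip(target, prior)]
multi_step = euler_generate(prior, lambda x, t: displacement, steps=50)
single_step = one_step_generate(prior, displacement)
print(multi_step, single_step)
```

When the underlying trajectory is curved, the instantaneous velocity at the start no longer equals the average velocity, which is why a straightness constraint matters for one-step generation.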
CL · Oct 25, 2025
Every Activation Boosted: Scaling General Reasoner to 1 Trillion Open Language Foundation
Ling Team, Ang Li, Ben Liu et al.
We introduce Ling 2.0, a series of reasoning-oriented language foundation models built upon the principle that every activation boosts reasoning capability. Designed to scale from tens of billions to one trillion parameters under a unified Mixture-of-Experts (MoE) paradigm, Ling 2.0 emphasizes high sparsity, cross-scale consistency, and efficiency guided by empirical scaling laws. The series includes three non-thinking (instruct) models, Ling-mini-2.0, Ling-flash-2.0, and Ling-1T, ranging from 16B to 1T total parameters and achieving up to 7-fold active-compute efficiency compared with dense counterparts. Ling 2.0 integrates coordinated innovations across model architecture, pre-training, post-training, and infrastructure: a high-sparsity MoE with MTP for efficient reasoning, reasoning-oriented data and mid-training CoT activation, reinforcement-based fine-tuning (DFT, Evo-CoT), and full-scale FP8 training with fine-grained heterogeneous pipelines. At the trillion scale, Ling-1T establishes a new Pareto frontier of reasoning accuracy versus computational efficiency, demonstrating that sparse activation, when properly aligned with reasoning objectives, enables scalable and efficient intelligence. Collectively, Ling 2.0 provides a coherent, open, and efficient foundation for advancing future reasoning and thinking models, including the Ring series built upon the same base.
MED-PH · Jun 18, 2025
Unsupervised deep learning model for fast energy layer pre-selection of delivery-efficient proton arc therapy plan optimization of nasopharyngeal carcinoma
Bohan Yang, Gang Liu, Yang Zhong et al.
Proton arc therapy (PAT) is an emerging and promising modality in radiotherapy, offering improved dose distribution and treatment robustness over intensity-modulated proton therapy. Yet, identifying the optimal energy layer (EL) sequence remains challenging due to the intensive computational demand and prolonged treatment delivery time. This study proposes an unsupervised deep learning model for fast EL pre-selection that minimizes EL switch (ELS) time while maintaining high plan quality. We introduce a novel data representation method, spot-count representation, which encodes the number of proton spots intersecting the target and organs at risk (OAR) in a matrix structured by sorted gantry angles and energy layers. This representation serves as the input of a U-Net-style architecture, SPArc_dl, which is trained using a tri-objective function: maximizing spot-counts on target, minimizing spot-counts on OAR, and reducing ELS time. The model is evaluated on 35 nasopharyngeal cancer cases, and its performance is compared to SPArc_particle_swarm (SPArc_ps). SPArc_dl produces EL pre-selection that significantly improves both plan quality and delivery efficiency. Compared to SPArc_ps, it enhances the conformity index by 0.1 (p<0.01), reduces the homogeneity index by 0.71 (p<0.01), lowers the brainstem mean dose by 0.25 (p<0.01), and shortens the ELS time by 37.2% (p<0.01). As an incidental finding, the results reveal that employing unchanged ELS is more time-efficient than descending ELS. SPArc_dl's inference time is within 1 second. However, SPArc_dl plans show limited robustness. The proposed spot-count representation lays a foundation for incorporating unsupervised deep learning approaches into the EL pre-selection task. SPArc_dl is a fast tool for generating high-quality PAT plans by strategically pre-selecting EL to reduce delivery time while maintaining excellent dosimetric performance.
AI · Apr 28, 2025
Proceedings of 1st Workshop on Advancing Artificial Intelligence through Theory of Mind
Mouad Abrini, Omri Abend, Dina Acklin et al. · cambridge
This volume includes a selection of papers presented at the Workshop on Advancing Artificial Intelligence through Theory of Mind held at AAAI 2025 in Philadelphia US on 3rd March 2025. The purpose of this volume is to provide an open access and curated anthology for the ToM and AI research community.
CL · Oct 9, 2021
DMRST: A Joint Framework for Document-Level Multilingual RST Discourse Segmentation and Parsing
Zhengyuan Liu, Ke Shi, Nancy F. Chen
Text discourse parsing plays an important role in understanding information flow and argumentative structure in natural language, making it beneficial for downstream tasks. While previous work has significantly improved the performance of RST discourse parsing, existing parsers are not readily applicable to practical use cases: (1) EDU segmentation is not integrated into most existing tree parsing frameworks, so it is not straightforward to apply such models to new data. (2) Most parsers cannot be used in multilingual scenarios, because they are developed only in English. (3) Parsers trained from single-domain treebanks do not generalize well on out-of-domain inputs. In this work, we propose a document-level multilingual RST discourse parsing framework, which conducts EDU segmentation and discourse tree parsing jointly. Moreover, we propose a cross-translation augmentation strategy to enable the framework to support multilingual parsing and improve its domain generality. Experimental results show that our model achieves state-of-the-art performance on document-level multilingual RST parsing in all sub-tasks.
CL · Jul 8, 2021
Multilingual Speech Evaluation: Case Studies on English, Malay and Tamil
Huayun Zhang, Ke Shi, Nancy F. Chen
Speech evaluation is an essential component in computer-assisted language learning (CALL). While speech evaluation on English has been popular, automatic speech scoring on low resource languages remains challenging. Work in this area has focused on monolingual specific designs and handcrafted features stemming from resource-rich languages like English. Such approaches are often difficult to generalize to other languages, especially if we also want to consider suprasegmental qualities such as rhythm. In this work, we examine three different languages that possess distinct rhythm patterns: English (stress-timed), Malay (syllable-timed), and Tamil (mora-timed). We exploit robust feature representations inspired by music processing and vector representation learning. Empirical validations show consistent gains for all three languages when predicting pronunciation, rhythm and intonation performance.
CL · Jun 16, 2021
Coreference-Aware Dialogue Summarization
Zhengyuan Liu, Ke Shi, Nancy F. Chen
Summarizing conversations via neural approaches has been gaining research traction lately, yet it is still challenging to obtain practical solutions. Examples of such challenges include unstructured information exchange in dialogues, informal interactions between speakers, and dynamic role changes of speakers as the dialogue evolves. Many of such challenges result in complex coreference links. Therefore, in this work, we investigate different approaches to explicitly incorporate coreference information in neural abstractive dialogue summarization models to tackle the aforementioned challenges. Experimental results show that the proposed approaches achieve state-of-the-art performance, implying it is useful to utilize coreference information in dialogue summarization. Evaluation results on factual correctness suggest such coreference-aware models are better at tracing the information flow among interlocutors and associating accurate status/actions with the corresponding interlocutors and person mentions.
CL · Dec 21, 2020
An End-to-End Document-Level Neural Discourse Parser Exploiting Multi-Granularity Representations
Ke Shi, Zhengyuan Liu, Nancy F. Chen
Document-level discourse parsing, in accordance with the Rhetorical Structure Theory (RST), remains notoriously challenging. Challenges include the deep structure of document-level discourse trees, the requirement of subtle semantic judgments, and the lack of large-scale training corpora. To address such challenges, we propose to exploit robust representations derived from multiple levels of granularity across syntax and semantics, and in turn incorporate such representations in an end-to-end encoder-decoder neural architecture for more resourceful discourse processing. In particular, we first use a pre-trained contextual language model that embodies high-order and long-range dependency to enable finer-grain semantic, syntactic, and organizational representations. We further encode such representations with boundary and hierarchical information to obtain more refined modeling for document-level discourse processing. Experimental results show that our parser achieves the state-of-the-art performance, approaching human-level performance on the benchmarked RST dataset.
CL · Dec 3, 2020
Multilingual Neural RST Discourse Parsing
Zhengyuan Liu, Ke Shi, Nancy F. Chen
Text discourse parsing plays an important role in understanding information flow and argumentative structure in natural language. Previous research under the Rhetorical Structure Theory (RST) has mostly focused on inducing and evaluating models from the English treebank. However, the parsing tasks for other languages such as German, Dutch, and Portuguese are still challenging due to the shortage of annotated data. In this work, we investigate two approaches to establish a neural, cross-lingual discourse parser via: (1) utilizing multilingual vector representations; and (2) adopting segment-level translation of the source content. Experiment results show that both methods are effective even with limited training data, and achieve state-of-the-art performance on cross-lingual, document-level discourse parsing on all sub-tasks.
CL · Apr 29, 2020
Conditional Neural Generation using Sub-Aspect Functions for Extractive News Summarization
Zhengyuan Liu, Ke Shi, Nancy F. Chen
Much progress has been made in text summarization, fueled by neural architectures using large-scale training corpora. However, in the news domain, neural models easily overfit by leveraging position-related features due to the prevalence of the inverted pyramid writing style. In addition, there is an unmet need to generate a variety of summaries for different users. In this paper, we propose a neural framework that can flexibly control summary generation by introducing a set of sub-aspect functions (i.e. importance, diversity, position). These sub-aspect functions are regulated by a set of control codes to decide which sub-aspect to focus on during summary generation. We demonstrate that extracted summaries with minimal position bias are comparable with those generated by standard models that take advantage of position preference. We also show that news summaries generated with a focus on diversity can be preferred by human raters. These results suggest that a more flexible neural summarization framework providing more control options could be desirable in tailoring to different user preferences, which is useful since it is often impractical to articulate such preferences for different applications a priori.
NA · May 2, 2019
Convergence of a $B$-$E$ based finite element method for MHD models on Lipschitz domains
Kaibo Hu, Weifeng Qiu, Ke Shi
We discuss a class of magnetic-electric fields based finite element schemes for stationary magnetohydrodynamics (MHD) systems with two types of boundary conditions. We establish a key $L^{3}$ estimate for divergence-free finite element functions for a new type of boundary conditions. With this estimate and a similar one in [Hu&Xu,2018], we rigorously prove the convergence of Picard iterations and the finite element schemes with weak regularity assumptions. These results demonstrate the convergence of the finite element methods for singular solutions.
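Picard iteration linearizes a nonlinear problem by freezing the nonlinear coefficient at the previous iterate and solving the resulting linear problem repeatedly. The scalar toy below shows that pattern on (1 + u^2) u = f. The equation, function name, and tolerances are illustrative assumptions and have nothing to do with the actual MHD discretization.

```python
def picard_solve(f, u0=0.0, tol=1e-12, max_iter=200):
    """Solve (1 + u**2) * u = f by Picard (fixed-point) iteration.

    Each step freezes the coefficient at the current iterate and solves
    the linear problem (1 + u_n**2) * u_next = f.
    """
    u = u0
    for iteration in range(1, max_iter + 1):
        u_next = f / (1.0 + u * u)
        if abs(u_next - u) < tol:
            return u_next, iteration
        u = u_next
    raise RuntimeError("Picard iteration did not converge")


solution, iterations = picard_solve(0.5)
residual = (1.0 + solution ** 2) * solution - 0.5
print(solution, iterations, residual)
```

For f = 0.5 the update map is a contraction, so the iteration converges from any starting value. For large data this toy can fail to converge, loosely mirroring the smallness or regularity assumptions that convergence proofs for Picard iterations typically require.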