14.5 CE Apr 22
An Explainable Approach to Document-level Translation Evaluation with Topic Modeling
Hyeokmin Lee, Youngkyu Kim, Byounghyun Yoo
The advent of NMT has expanded the scope of translation beyond isolated sentences, enabling context to be preserved across paragraphs and documents. However, current evaluation metrics largely remain restricted to the sentence level and typically depend on reference translations. Without references, existing metrics cannot provide a clear basis for their quality assessments. To address these limitations, we propose an evaluation framework that independently extracts and compares latent topic structures within source and translated texts. The framework uses several topic modelling techniques, including LSA, LDA and BERTopic. Our methodology captures statistical frequency information and semantic context, providing a comprehensive evaluation of the entire document. It aligns key topic tokens across languages using a bilingual dictionary and quantifies thematic consistency via cosine similarity. This allows us to evaluate how faithfully the translation maintains the thematic integrity of the source text, even in the absence of reference translations. To this end, we used a large-scale dataset of 9.38 million Korean-to-English sentence pairs from AI Hub, which includes pre-evaluated BLEU scores. We also calculated CometKiwi, a state-of-the-art, reference-free metric, for this dataset in order to conduct a comparative analysis with our proposed topic-based framework. Through this analysis, we confirmed that, unlike existing metrics, our framework evaluates a distinct attribute, namely thematic consistency at the document level. Furthermore, visualising the key tokens that underpin the quantitative evaluation score provides clear insight into translation quality. Consequently, this study complements existing translation evaluation metrics by proposing a new metric that intuitively identifies whether the document's theme has been preserved.
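The alignment and scoring steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `align_tokens` and `cosine_similarity`, the toy topic weights, and the three-entry bilingual dictionary are all hypothetical, and the topic weights are assumed to come from some upstream topic model (LSA, LDA or BERTopic) that is not shown here.

```python
import math


def align_tokens(source_topic, bilingual_dict):
    """Map source-language topic tokens into the target language.

    Weights of tokens that translate to the same target word are summed;
    tokens missing from the dictionary are dropped (an assumption of this sketch).
    """
    aligned = {}
    for token, weight in source_topic.items():
        translated = bilingual_dict.get(token)
        if translated is None:
            continue
        aligned[translated] = aligned.get(translated, 0.0) + weight
    return aligned


def cosine_similarity(a, b):
    """Cosine similarity between two sparse token-weight dictionaries."""
    shared = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical topic weights for a Korean source and its English translation.
source_topic = {"경제": 0.5, "시장": 0.3, "정부": 0.2}
target_topic = {"economy": 0.5, "market": 0.3, "government": 0.2}
# Hypothetical bilingual dictionary (Korean -> English).
bilingual_dict = {"경제": "economy", "시장": "market", "정부": "government"}

aligned_source = align_tokens(source_topic, bilingual_dict)
score = cosine_similarity(aligned_source, target_topic)
off_topic_score = cosine_similarity(aligned_source, {"weather": 1.0})
```

In this toy case the aligned source and target topics coincide, so `score` is 1 up to floating-point rounding, while a translation whose topic shares no tokens with the source scores 0. The shared tokens that contribute to the dot product are also the natural candidates for the kind of key-token visualisation the abstract mentions.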
LG May 28, 2025
Defining Foundation Models for Computational Science: A Call for Clarity and Rigor
Youngsoo Choi, Siu Wun Cheung, Youngkyu Kim et al.
The widespread success of foundation models in natural language processing and computer vision has inspired researchers to extend the concept to scientific machine learning and computational science. However, this position paper argues that, because the term "foundation model" is still evolving, it is increasingly applied in computational science without a universally accepted definition, potentially creating confusion and diluting its precise scientific meaning. In this paper, we address this gap by proposing a formal definition of foundation models in computational science, grounded in the core values of generality, reusability, and scalability. We articulate a set of essential and desirable characteristics that such models must exhibit, drawing parallels with traditional foundational methods, like the finite element and finite volume methods. Furthermore, we introduce the Data-Driven Finite Element Method (DD-FEM), a framework that fuses the modular structure of classical FEM with the representational power of data-driven learning. We demonstrate how DD-FEM addresses many of the key challenges in realizing foundation models for computational science, including scalability, adaptability, and physics consistency. By bridging traditional numerical methods with modern AI paradigms, this work provides a rigorous foundation for evaluating and developing novel approaches toward future foundation models in computational science.
NA Nov 13, 2020
Efficient nonlinear manifold reduced order model
Youngkyu Kim, Youngsoo Choi, David Widemann et al.
Traditional linear subspace reduced order models (LS-ROMs) are able to accelerate physical simulations, in which the intrinsic solution space falls into a subspace with a small dimension, i.e., the solution space has a small Kolmogorov n-width. However, for physical phenomena not of this type, such as advection-dominated flow phenomena, a low-dimensional linear subspace poorly approximates the solution. To address cases such as these, we have developed an efficient nonlinear manifold ROM (NM-ROM), which can better approximate high-fidelity model solutions with a smaller latent space dimension than the LS-ROMs. Our method takes advantage of the existing numerical methods that are used to solve the corresponding full order models (FOMs). The efficiency is achieved by developing a hyper-reduction technique in the context of the NM-ROM. Numerical results show that neural networks can learn a more efficient latent space representation on advection-dominated data from 2D Burgers' equations with a high Reynolds number. A speed-up of up to 11.7 for 2D Burgers' equations is achieved with an appropriate treatment of the nonlinear terms through a hyper-reduction technique.
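The claim that advection-dominated solutions are poorly captured by a low-dimensional linear subspace can be illustrated with a small NumPy sketch. This is an assumption-laden toy, not the paper's experiment: it uses a linearly translating Gaussian pulse rather than the 2D Burgers' equations, and the helper `modes_for_energy`, the grid sizes, and the 99.9% energy threshold are all hypothetical choices. It counts how many singular vectors (POD modes) of a snapshot matrix are needed to retain a given fraction of the snapshot energy.

```python
import numpy as np


def modes_for_energy(snapshots, energy=0.999):
    """Smallest number of POD modes whose squared singular values
    capture at least the given fraction of the total snapshot energy."""
    s = np.linalg.svd(snapshots, compute_uv=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(cumulative, energy) + 1)


x = np.linspace(0.0, 1.0, 200)      # spatial grid
times = np.linspace(0.0, 1.0, 50)   # snapshot times
width = 0.02                        # pulse standard deviation


def pulse(center):
    """Gaussian pulse centered at the given position."""
    return np.exp(-((x - center) ** 2) / (2.0 * width**2))


# Case 1: a stationary pulse whose amplitude decays in time.
# Every snapshot is a multiple of one profile, so the data are rank one.
decaying = np.stack([np.exp(-t) * pulse(0.5) for t in times], axis=1)

# Case 2: the same pulse translating across the domain (pure advection).
advecting = np.stack([pulse(0.2 + 0.6 * t) for t in times], axis=1)

n_decaying = modes_for_energy(decaying)
n_advecting = modes_for_energy(advecting)
print("modes needed (decaying): ", n_decaying)
print("modes needed (advecting):", n_advecting)
```

The decaying case needs a single mode, while the translating pulse needs many more because its singular values decay slowly. That slow decay is the practical symptom of a large Kolmogorov n-width, and it is the situation in which a nonlinear manifold representation is expected to need a smaller latent dimension than a linear subspace.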
NA Sep 25, 2020
A fast and accurate physics-informed neural network reduced order model with shallow masked autoencoder
Youngkyu Kim, Youngsoo Choi, David Widemann et al.
Traditional linear subspace reduced order models (LS-ROMs) are able to accelerate physical simulations, in which the intrinsic solution space falls into a subspace with a small dimension, i.e., the solution space has a small Kolmogorov n-width. However, for physical phenomena not of this type, e.g., any advection-dominated flow phenomena, such as in traffic flow, atmospheric flows, and air flow over vehicles, a low-dimensional linear subspace poorly approximates the solution. To address cases such as these, we have developed a fast and accurate physics-informed neural network ROM, namely nonlinear manifold ROM (NM-ROM), which can better approximate high-fidelity model solutions with a smaller latent space dimension than the LS-ROMs. Our method takes advantage of the existing numerical methods that are used to solve the corresponding full order models. The efficiency is achieved by developing a hyper-reduction technique in the context of the NM-ROM. Numerical results show that neural networks can learn a more efficient latent space representation on advection-dominated data from 1D and 2D Burgers' equations. A speedup of up to 2.6 for 1D Burgers' and a speedup of 11.7 for 2D Burgers' equations are achieved with an appropriate treatment of the nonlinear terms through a hyper-reduction technique. Finally, a posteriori error bounds for the NM-ROMs are derived that take account of the hyper-reduced operators.