Dong Sun

CV
h-index45
11papers
146citations
Novelty43%
AI Score48

11 Papers

NANov 2, 2012
Efficient and Long-Time Accurate Second-Order Methods for Stokes-Darcy System

Wenbin Chen, Max Gunzburger, Dong Sun et al.

We propose and study two second-order in time implicit-explicit (IMEX) methods for the coupled Stokes-Darcy system that governs flows in karst aquifers. The first is a combination of a second-order backward differentiation formula and the second-order Gear's extrapolation approach. The second is a combination of the second-order Adams-Moulton and second-order Adams-Bashforth methods. Both algorithms only require the solution of two decoupled problems at each time step, one Stokes and the other Darcy. Hence, these schemes are very efficient and can be easily implemented using legacy codes. We establish the unconditional and uniform in time stability for both schemes. The uniform in time stability leads to uniform in time control of the error which is highly desirable for modeling physical processes, e.g., contaminant sequestration and release, that occur over very long time scales. Error estimates for fully-discretized schemes using finite element spatial discretizations are derived. Numerical examples are provided that illustrate the accuracy, efficiency, and long-time stability of the two schemes.

ROOct 9, 2023
STOPNet: Multiview-based 6-DoF Suction Detection for Transparent Objects on Production Lines

Yuxuan Kuang, Qin Han, Danshi Li et al. · pku

In this work, we present STOPNet, a framework for 6-DoF object suction detection on production lines, with a focus on but not limited to transparent objects, which is an important and challenging problem in robotic systems and modern industry. Current methods requiring depth input fail on transparent objects due to depth cameras' deficiency in sensing their geometry, while we proposed a novel framework to reconstruct the scene on the production line depending only on RGB input, based on multiview stereo. Compared to existing works, our method not only reconstructs the whole 3D scene in order to obtain high-quality 6-DoF suction poses in real time but also generalizes to novel environments, novel arrangements and novel objects, including challenging transparent objects, both in simulation and the real world. Extensive experiments in simulation and the real world show that our method significantly surpasses the baselines and has better generalizability, which caters to practical industrial needs.

CLFeb 4
ERNIE 5.0 Technical Report

Haifeng Wang, Hua Wu, Tian Wu et al.

In this report, we introduce ERNIE 5.0, a natively autoregressive foundation model desinged for unified multimodal understanding and generation across text, image, video, and audio. All modalities are trained from scratch under a unified next-group-of-tokens prediction objective, based on an ultra-sparse mixture-of-experts (MoE) architecture with modality-agnostic expert routing. To address practical challenges in large-scale deployment under diverse resource constraints, ERNIE 5.0 adopts a novel elastic training paradigm. Within a single pre-training run, the model learns a family of sub-models with varying depths, expert capacities, and routing sparsity, enabling flexible trade-offs among performance, model size, and inference latency in memory- or time-constrained scenarios. Moreover, we systematically address the challenges of scaling reinforcement learning to unified foundation models, thereby guaranteeing efficient and stable post-training under ultra-sparse MoE architectures and diverse multimodal settings. Extensive experiments demonstrate that ERNIE 5.0 achieves strong and balanced performance across multiple modalities. To the best of our knowledge, among publicly disclosed models, ERNIE 5.0 represents the first production-scale realization of a trillion-parameter unified autoregressive model that supports both multimodal understanding and generation. To facilitate further research, we present detailed visualizations of modality-agnostic expert routing in the unified model, alongside comprehensive empirical analysis of elastic training, aiming to offer profound insights to the community.

CVJan 27, 2023
Dual-View Selective Instance Segmentation Network for Unstained Live Adherent Cells in Differential Interference Contrast Images

Fei Pan, Yutong Wu, Kangning Cui et al.

Despite recent advances in data-independent and deep-learning algorithms, unstained live adherent cell instance segmentation remains a long-standing challenge in cell image processing. Adherent cells' inherent visual characteristics, such as low contrast structures, fading edges, and irregular morphology, have made it difficult to distinguish from one another, even by human experts, let alone computational methods. In this study, we developed a novel deep-learning algorithm called dual-view selective instance segmentation network (DVSISN) for segmenting unstained adherent cells in differential interference contrast (DIC) images. First, we used a dual-view segmentation (DVS) method with pairs of original and rotated images to predict the bounding box and its corresponding mask for each cell instance. Second, we used a mask selection (MS) method to filter the cell instances predicted by the DVS to keep masks closest to the ground truth only. The developed algorithm was trained and validated on our dataset containing 520 images and 12198 cells. Experimental results demonstrate that our algorithm achieves an AP_segm of 0.555, which remarkably overtakes a benchmark by a margin of 23.6%. This study's success opens up a new possibility of using rotated images as input for better prediction in cell images.

LGJan 21
Robustness of Mixtures of Experts to Feature Noise

Dong Sun, Rahul Nittala, Rebekka Burkholz

Despite their practical success, it remains unclear why Mixture of Experts (MoE) models can outperform dense networks beyond sheer parameter scaling. We study an iso-parameter regime where inputs exhibit latent modular structure but are corrupted by feature noise, a proxy for noisy internal activations. We show that sparse expert activation acts as a noise filter: compared to a dense estimator, MoEs achieve lower generalization error under feature noise, improved robustness to perturbations, and faster convergence speed. Empirical results on synthetic data and real-world language tasks corroborate the theoretical insights, demonstrating consistent robustness and efficiency gains from sparse modular computation.

77.4SEMay 18
ProcBench: Evaluating Process-Level Defects and Control Preservation in LLM Coding Agents

Jiawei He, Jie Jia, Chenbo Liu et al.

Existing benchmarks for LLM coding agents mainly evaluate final outcomes, such as task completion, compilation success, and test pass rates. While these metrics are useful for measuring end-task capability, they provide limited visibility into how an execution unfolds and often miss recurrent process-level failures that arise during multi-step operation. We present ProcBench, a benchmark-oriented framework for evaluating coding-agent trajectories through process defects and control preservation. ProcBench organizes execution failures into a reusable ontology, standardizes heterogeneous logs into a unified trajectory representation, and reports calibrated risk-based scorecards instead of relying only on final outcomes. We instantiate ProcBench on an annotated set of 200 trajectories and apply it across three coding-agent benchmarks: AndroidBench, TerminalBench, and SWE-bench-Verified. Our results suggest that ProcBench can be instantiated with useful reliability, that calibration improves the empirical interpretability of defect findings relative to direct thresholding, and that process-aware scorecards provide diagnostic distinctions beyond conventional outcome-based evaluation. We also discuss limitations, including annotation dependence, partial observability for some defect classes, and the need for broader external validation.

IVApr 2, 2024Code
Contextual Embedding Learning to Enhance 2D Networks for Volumetric Image Segmentation

Zhuoyuan Wang, Dong Sun, Xiangyun Zeng et al.

The segmentation of organs in volumetric medical images plays an important role in computer-aided diagnosis and treatment/surgery planning. Conventional 2D convolutional neural networks (CNNs) can hardly exploit the spatial correlation of volumetric data. Current 3D CNNs have the advantage to extract more powerful volumetric representations but they usually suffer from occupying excessive memory and computation nevertheless. In this study we aim to enhance the 2D networks with contextual information for better volumetric image segmentation. Accordingly, we propose a contextual embedding learning approach to facilitate 2D CNNs capturing spatial information properly. Our approach leverages the learned embedding and the slice-wisely neighboring matching as a soft cue to guide the network. In such a way, the contextual information can be transferred slice-by-slice thus boosting the volumetric representation of the network. Experiments on challenging prostate MRI dataset (PROMISE12) and abdominal CT dataset (CHAOS) show that our contextual embedding learning can effectively leverage the inter-slice context and improve segmentation performance. The proposed approach is a plug-and-play, and memory-efficient solution to enhance the 2D networks for volumetric segmentation. Our code is publicly available at https://github.com/JuliusWang-7/CE_Block.

SEOct 11, 2012Code
A lightweight forum-based distributed requirement elicitation process for open source community

Han Lai, Rong Peng, Dong Sun et al.

Nowadays, lots of open source communities adopt forum to acquire scattered stakeholders' requirements. But the requirements collection process always suffers from the unformatted description and unfocused discussions. In this paper, we establish a framework ReqForum to define the metamodel of the requirement elicitation forum. Based on it, we propose a lightweight forum-based requirements elicitation process which includes six steps: template-based requirements creation, opinions collection, requirements collection, requirements management, capability identification and the incentive mechanism. According to the proposed process, the prototype SKLSEForum is established by composing the Discuz and its existed pulg-ins. The implementation indicates that the process is feasible and the cost is economic.

CVOct 8, 2021
Stereo Dense Scene Reconstruction and Accurate Localization for Learning-Based Navigation of Laparoscope in Minimally Invasive Surgery

Ruofeng Wei, Bin Li, Hangjie Mo et al.

Objective: The computation of anatomical information and laparoscope position is a fundamental block of surgical navigation in Minimally Invasive Surgery (MIS). Recovering a dense 3D structure of surgical scene using visual cues remains a challenge, and the online laparoscopic tracking primarily relies on external sensors, which increases system complexity. Methods: Here, we propose a learning-driven framework, in which an image-guided laparoscopic localization with 3D reconstructions of complex anatomical structures is obtained. To reconstruct the 3D structure of the whole surgical environment, we first fine-tune a learning-based stereoscopic depth perception method, which is robust to the texture-less and variant soft tissues, for depth estimation. Then, we develop a dense visual reconstruction algorithm to represent the scene by surfels, estimate the laparoscope poses and fuse the depth maps into a unified reference coordinate for tissue reconstruction. To estimate poses of new laparoscope views, we achieve a coarse-to-fine localization method, which incorporates our reconstructed 3D model. Results: We evaluate the reconstruction method and the localization module on three datasets, namely, the stereo correspondence and reconstruction of endoscopic data (SCARED), the ex-vivo phantom and tissue data collected with Universal Robot (UR) and Karl Storz Laparoscope, and the in-vivo DaVinci robotic surgery dataset, where the reconstructed 3D structures have rich details of surface texture with an accuracy error under 1.71 mm and the localization module can accurately track the laparoscope with only images as input. Conclusions: Experimental results demonstrate the superior performance of the proposed method in 3D anatomy reconstruction and laparoscopic localization. Significance: The proposed framework can be potentially extended to the current surgical navigation system.

HCMay 7, 2020
DFSeer: A Visual Analytics Approach to Facilitate Model Selection for Demand Forecasting

Dong Sun, Zezheng Feng, Yuanzhe Chen et al.

Selecting an appropriate model to forecast product demand is critical to the manufacturing industry. However, due to the data complexity, market uncertainty and users' demanding requirements for the model, it is challenging for demand analysts to select a proper model. Although existing model selection methods can reduce the manual burden to some extent, they often fail to present model performance details on individual products and reveal the potential risk of the selected model. This paper presents DFSeer, an interactive visualization system to conduct reliable model selection for demand forecasting based on the products with similar historical demand. It supports model comparison and selection with different levels of details. Besides, it shows the difference in model performance on similar products to reveal the risk of model selection and increase users' confidence in choosing a forecasting model. Two case studies and interviews with domain experts demonstrate the effectiveness and usability of DFSeer.

HCJul 29, 2019
PlanningVis: A Visual Analytics Approach to Production Planning in Smart Factories

Dong Sun, Renfei Huang, Yuanzhe Chen et al.

Production planning in the manufacturing industry is crucial for fully utilizing factory resources (e.g., machines, raw materials and workers) and reducing costs. With the advent of industry 4.0, plenty of data recording the status of factory resources have been collected and further involved in production planning, which brings an unprecedented opportunity to understand, evaluate and adjust complex production plans through a data-driven approach. However, developing a systematic analytics approach for production planning is challenging due to the large volume of production data, the complex dependency between products, and unexpected changes in the market and the plant. Previous studies only provide summarized results and fail to show details for comparative analysis of production plans. Besides, the rapid adjustment to the plan in the case of an unanticipated incident is also not supported. In this paper, we propose PlanningVis, a visual analytics system to support the exploration and comparison of production plans with three levels of details: a plan overview presenting the overall difference between plans, a product view visualizing various properties of individual products, and a production detail view displaying the product dependency and the daily production details in related factories. By integrating an automatic planning algorithm with interactive visual explorations, PlanningVis can facilitate the efficient optimization of daily production planning as well as support a quick response to unanticipated incidents in manufacturing. Two case studies with real-world data and carefully designed interviews with domain experts demonstrate the effectiveness and usability of PlanningVis.