CLOct 19, 2023Code
NameGuess: Column Name Expansion for Tabular DataJiani Zhang, Zhengyuan Shen, Balasubramaniam Srinivasan et al.
Recent advances in large language models have revolutionized many sectors, including the database industry. One common challenge when dealing with large volumes of tabular data is the pervasive use of abbreviated column names, which can negatively impact performance on various data search, access, and understanding tasks. To address this issue, we introduce a new task, called NameGuess, to expand column names (used in database schema) as a natural language generation problem. We create a training dataset of 384K abbreviated-expanded column pairs using a new data fabrication method and a human-annotated evaluation benchmark that includes 9.2K examples from real-world tables. To tackle the complexities associated with polysemy and ambiguity in NameGuess, we enhance auto-regressive language models by conditioning on table content and column header names -- yielding a fine-tuned model (with 2.7B parameters) that matches human performance. Furthermore, we conduct a comprehensive analysis (on multiple LLMs) to validate the effectiveness of table content in NameGuess and identify promising future opportunities. Code has been made available at https://github.com/amazon-science/nameguess.
LGApr 20, 2023
Train Your Own GNN Teacher: Graph-Aware Distillation on Textual GraphsCostas Mavromatis, Vassilis N. Ioannidis, Shen Wang et al. · amazon-science
How can we learn effective node representations on textual graphs? Graph Neural Networks (GNNs) that use Language Models (LMs) to encode textual information of graphs achieve state-of-the-art performance in many node classification tasks. Yet, combining GNNs with LMs has not been widely explored for practical deployments due to its scalability issues. In this work, we tackle this challenge by developing a Graph-Aware Distillation framework (GRAD) to encode graph structures into an LM for graph-free, fast inference. Different from conventional knowledge distillation, GRAD jointly optimizes a GNN teacher and a graph-free student over the graph's nodes via a shared LM. This encourages the graph-free student to exploit graph information encoded by the GNN teacher while at the same time, enables the GNN teacher to better leverage textual information from unlabeled nodes. As a result, the teacher and the student models learn from each other to improve their overall performance. Experiments in eight node classification benchmarks in both transductive and inductive settings showcase GRAD's superiority over existing distillation approaches for textual graphs.
CLMay 18, 2022
Entailment Tree Explanations via Iterative Retrieval-Generation ReasonerDanilo Ribeiro, Shen Wang, Xiaofei Ma et al. · amazon-science
Large language models have achieved high performance on various question answering (QA) benchmarks, but the explainability of their output remains elusive. Structured explanations, called entailment trees, were recently suggested as a way to explain and inspect a QA system's answer. In order to better generate such entailment trees, we propose an architecture called Iterative Retrieval-Generation Reasoner (IRGR). Our model is able to explain a given hypothesis by systematically generating a step-by-step explanation from textual premises. The IRGR model iteratively searches for suitable premises, constructing a single entailment step at a time. Contrary to previous approaches, our method combines generation steps and retrieval of premises, allowing the model to leverage intermediate conclusions, and mitigating the input size limit of baseline encoder-decoder models. We conduct experiments using the EntailmentBank dataset, where we outperform existing benchmarks on premise retrieval and entailment tree generation, with around 300% gain in overall correctness.
CLFeb 13, 2023
STREET: A Multi-Task Structured Reasoning and Explanation BenchmarkDanilo Ribeiro, Shen Wang, Xiaofei Ma et al. · amazon-science
We introduce STREET, a unified multi-task and multi-domain natural language reasoning and explanation benchmark. Unlike most existing question-answering (QA) datasets, we expect models to not only answer questions, but also produce step-by-step structured explanations describing how premises in the question are used to produce intermediate conclusions that can prove the correctness of a certain answer. We perform extensive evaluation with popular language models such as few-shot prompting GPT-3 and fine-tuned T5. We find that these models still lag behind human performance when producing such structured reasoning steps. We believe this work will provide a way for the community to better train and test systems on multi-step reasoning and explanations in natural language.
99.6CLApr 20Code
ErrorRadar: Benchmarking Complex Mathematical Reasoning of Multimodal Large Language Models Via Error DetectionYibo Yan, Shen Wang, Jiahao Huo et al.
As the field of Multimodal Large Language Models (MLLMs) continues to evolve, their potential to revolutionize artificial intelligence is particularly promising, especially in addressing mathematical reasoning tasks. Current mathematical benchmarks predominantly focus on evaluating MLLMs' problem-solving ability, yet there is a crucial gap in addressing more complex scenarios such as error detection, for enhancing reasoning capability in complicated settings. To fill this gap, we formally formulate the new task: multimodal error detection, and introduce ErrorRadar, the first benchmark designed to assess MLLMs' capabilities in such a task. ErrorRadar evaluates two sub-tasks: error step identification and error categorization, providing a comprehensive framework for evaluating MLLMs' complex mathematical reasoning ability. It consists of 2,500 high-quality multimodal K-12 mathematical problems, collected from real-world student interactions in an educational organization, with rigorous annotation and rich metadata such as problem type and error category. Through extensive experiments, we evaluated both open-source and closed-source representative MLLMs, benchmarking their performance against educational expert evaluators. Results indicate significant challenges still remain, as GPT-4o with best performance is still around 10% behind human evaluation.
LGSep 10, 2023Code
AVARS -- Alleviating Unexpected Urban Road Traffic Congestion using UAVsJiaying Guo, Michael R. Jones, Soufiene Djahel et al.
Reducing unexpected urban traffic congestion caused by en-route events (e.g., road closures, car crashes, etc.) often requires fast and accurate reactions to choose the best-fit traffic signals. Traditional traffic light control systems, such as SCATS and SCOOT, are not efficient as their traffic data provided by induction loops has a low update frequency (i.e., longer than 1 minute). Moreover, the traffic light signal plans used by these systems are selected from a limited set of candidate plans pre-programmed prior to unexpected events' occurrence. Recent research demonstrates that camera-based traffic light systems controlled by deep reinforcement learning (DRL) algorithms are more effective in reducing traffic congestion, in which the cameras can provide high-frequency high-resolution traffic data. However, these systems are costly to deploy in big cities due to the excessive potential upgrades required to road infrastructure. In this paper, we argue that Unmanned Aerial Vehicles (UAVs) can play a crucial role in dealing with unexpected traffic congestion because UAVs with onboard cameras can be economically deployed when and where unexpected congestion occurs. Then, we propose a system called "AVARS" that explores the potential of using UAVs to reduce unexpected urban traffic congestion using DRL-based traffic light signal control. This approach is validated on a widely used open-source traffic simulator with practical UAV settings, including its traffic monitoring ranges and battery lifetime. Our simulation results show that AVARS can effectively recover the unexpected traffic congestion in Dublin, Ireland, back to its original un-congested level within the typical battery life duration of a UAV.
SYApr 5, 2018
Blockchain-Assisted Crowdsourced Energy SystemsShen Wang, Ahmad Taha, Jianhui Wang
Crowdsourcing relies on people's contributions to meet product- or system-level objectives. Crowdsourcing-based methods have been implemented in various cyber-physical systems and realtime markets. This paper explores a framework for Crowdsourced Energy Systems (CES), where small-scale energy generation or energy trading is crowdsourced from distributed energy resources, electric vehicles, and shapable loads. The merits/pillars of energy crowdsourcing are discussed. Then, an operational model for CESs in distribution networks with different types of crowdsourcees is proposed. The model yields a market equilibrium depicting traditional and distributed generator and load setpoints. Given these setpoints, crowdsourcing incentives are designed to steer crowdsourcees to the equilibrium. As the number of crowdsourcees and energy trading transactions scales up, a secure energy trading platform is required. To that end, the presented framework is integrated with a lightweight Blockchain implementation and smart contracts. Numerical tests are provided to showcase the overall implementation.
NIApr 27, 2022
A Survey on XAI for 5G and Beyond Security: Technical Aspects, Challenges and Research DirectionsThulitha Senevirathna, Vinh Hoa La, Samuel Marchal et al.
With the advent of 5G commercialization, the need for more reliable, faster, and intelligent telecommunication systems is envisaged for the next generation beyond 5G (B5G) radio access technologies. Artificial Intelligence (AI) and Machine Learning (ML) are immensely popular in service layer applications and have been proposed as essential enablers in many aspects of 5G and beyond networks, from IoT devices and edge computing to cloud-based infrastructures. However, existing 5G ML-based security surveys tend to emphasize AI/ML model performance and accuracy more than the models' accountability and trustworthiness. In contrast, this paper explores the potential of Explainable AI (XAI) methods, which would allow stakeholders in 5G and beyond to inspect intelligent black-box systems used to secure next-generation networks. The goal of using XAI in the security domain of 5G and beyond is to allow the decision-making processes of ML-based security systems to be transparent and comprehensible to 5G and beyond stakeholders, making the systems accountable for automated actions. In every facet of the forthcoming B5G era, including B5G technologies such as ORAN, zero-touch network management, and end-to-end slicing, this survey emphasizes the role of XAI in them that the general users would ultimately enjoy. Furthermore, we presented the lessons from recent efforts and future research directions on top of the currently conducted projects involving XAI.
IRMay 18, 2022
Debiasing Neural Retrieval via In-batch Balancing RegularizationYuantong Li, Xiaokai Wei, Zijian Wang et al. · amazon-science, stanford
People frequently interact with information retrieval (IR) systems, however, IR models exhibit biases and discrimination towards various demographics. The in-processing fair ranking methods provide a trade-offs between accuracy and fairness through adding a fairness-related regularization term in the loss function. However, there haven't been intuitive objective functions that depend on the click probability and user engagement to directly optimize towards this. In this work, we propose the In-Batch Balancing Regularization (IBBR) to mitigate the ranking disparity among subgroups. In particular, we develop a differentiable \textit{normed Pairwise Ranking Fairness} (nPRF) and leverage the T-statistics on top of nPRF over subgroups as a regularization to improve fairness. Empirical results with the BERT-based neural rankers on the MS MARCO Passage Retrieval dataset with the human-annotated non-gendered queries benchmark \citep{rekabsaz2020neural} show that our IBBR method with nPRF achieves significantly less bias with minimal degradation in ranking performance compared with the baseline.
LGJan 31, 2023
OrthoReg: Improving Graph-regularized MLPs via Orthogonality RegularizationHengrui Zhang, Shen Wang, Vassilis N. Ioannidis et al.
Graph Neural Networks (GNNs) are currently dominating in modeling graph-structure data, while their high reliance on graph structure for inference significantly impedes them from widespread applications. By contrast, Graph-regularized MLPs (GR-MLPs) implicitly inject the graph structure information into model weights, while their performance can hardly match that of GNNs in most tasks. This motivates us to study the causes of the limited performance of GR-MLPs. In this paper, we first demonstrate that node embeddings learned from conventional GR-MLPs suffer from dimensional collapse, a phenomenon in which the largest a few eigenvalues dominate the embedding space, through empirical observations and theoretical analysis. As a result, the expressive power of the learned node representations is constrained. We further propose OrthoReg, a novel GR-MLP model to mitigate the dimensional collapse issue. Through a soft regularization loss on the correlation matrix of node embeddings, OrthoReg explicitly encourages orthogonal node representations and thus can naturally avoid dimensionally collapsed representations. Experiments on traditional transductive semi-supervised classification tasks and inductive node classification for cold-start scenarios demonstrate its effectiveness and superiority.
OCFeb 16, 2019
Geometric Programming-Based Control for Nonlinear, DAE-Constrained Water Distribution NetworksShen Wang, Ahmad F. Taha, Nikolaos Gatsis et al.
Control of water distribution networks (WDNs) can be represented by an optimization problem with hydraulic models describing the nonlinear relationship between head loss, water flow, and demand. The problem is difficult to solve due to the non-convexity in the equations governing water flow. Previous methods used to obtain WDN controls (i.e., operational schedules for pumps and valves) have adopted simplified hydraulic models. One common assumption found in the literature is the modification of WDN topology to exclude loops and assume a known water flow direction. In this paper, we present a new geometric programming-based model predictive control approach, designed to solve the water flow equations and obtain WDN controls. The paper considers the nonlinear difference algebraic equation (DAE) form of the WDN dynamics, and the GP approach amounts to solving a series of convex optimization problems and requires neither the knowledge of water flow direction nor does it restrict the water network topology. A case study is presented to illustrate the performance of the proposed method.
CVAug 6, 2022Code
Analyzing Deep Learning Based Brain Tumor Segmentation with Missing MRI ModalitiesBenteng Ma, Yushi Wang, Shen Wang
This technical report presents a comparative analysis of existing deep learning (DL) based approaches for brain tumor segmentation with missing MRI modalities. Approaches evaluated include the Adversarial Co-training Network (ACN) and a combination of mmGAN and DeepMedic. A more stable and easy-to-use version of mmGAN is also open-sourced at a GitHub repository. Using the BraTS2018 dataset, this work demonstrates that the state-of-the-art ACN performs better especially when T1c is missing. While a simple combination of mmGAN and DeepMedic also shows strong potentials when only one MRI modality is missing. Additionally, this work initiated discussions with future research directions for brain tumor segmentation with missing MRI modalities.
CVSep 14, 2024
AI-Driven Virtual Teacher for Enhanced Educational Efficiency: Leveraging Large Pretrain Models for Autonomous Error Analysis and CorrectionTianlong Xu, Yi-Fan Zhang, Zhendong Chu et al.
Students frequently make mistakes while solving mathematical problems, and traditional error correction methods are both time-consuming and labor-intensive. This paper introduces an innovative \textbf{V}irtual \textbf{A}I \textbf{T}eacher system designed to autonomously analyze and correct student \textbf{E}rrors (VATE). Leveraging advanced large language models (LLMs), the system uses student drafts as a primary source for error analysis, which enhances understanding of the student's learning process. It incorporates sophisticated prompt engineering and maintains an error pool to reduce computational overhead. The AI-driven system also features a real-time dialogue component for efficient student interaction. Our approach demonstrates significant advantages over traditional and machine learning-based error correction methods, including reduced educational costs, high scalability, and superior generalizability. The system has been deployed on the Squirrel AI learning platform for elementary mathematics education, where it achieves 78.3\% accuracy in error analysis and shows a marked improvement in student learning efficiency. Satisfaction surveys indicate a strong positive reception, highlighting the system's potential to transform educational practices.
IVSep 15, 2023
MIML: Multiplex Image Machine Learning for High Precision Cell Classification via Mechanical Traits within Microfluidic SystemsKhayrul Islam, Ratul Paul, Shen Wang et al.
Label-free cell classification is advantageous for supplying pristine cells for further use or examination, yet existing techniques frequently fall short in terms of specificity and speed. In this study, we address these limitations through the development of a novel machine learning framework, Multiplex Image Machine Learning (MIML). This architecture uniquely combines label-free cell images with biomechanical property data, harnessing the vast, often underutilized morphological information intrinsic to each cell. By integrating both types of data, our model offers a more holistic understanding of the cellular properties, utilizing morphological information typically discarded in traditional machine learning models. This approach has led to a remarkable 98.3\% accuracy in cell classification, a substantial improvement over models that only consider a single data type. MIML has been proven effective in classifying white blood cells and tumor cells, with potential for broader application due to its inherent flexibility and transfer learning capability. It's particularly effective for cells with similar morphology but distinct biomechanical properties. This innovative approach has significant implications across various fields, from advancing disease diagnostics to understanding cellular behavior.
IRFeb 26
From Agnostic to Specific: Latent Preference Diffusion for Multi-Behavior Sequential RecommendationRuochen Yang, Xiaodong Li, Jiawei Sheng et al.
Multi-behavior sequential recommendation (MBSR) aims to learn the dynamic and heterogeneous interactions of users' multi-behavior sequences, so as to capture user preferences under target behavior for the next interacted item prediction. Unlike previous methods that adopt unidirectional modeling by mapping auxiliary behaviors to target behavior, recent concerns are shifting from behavior-fixed to behavior-specific recommendation. However, these methods still ignore the user's latent preference that underlying decision-making, leading to suboptimal solutions. Meanwhile, due to the asymmetric deterministic between items and behaviors, discriminative paradigm based on preference scoring is unsuitable to capture the uncertainty from low-entropy behaviors to high-entropy items, failing to provide efficient and diverse recommendation. To address these challenges, we propose \textbf{FatsMB}, a framework based diffusion model that guides preference generation \textit{\textbf{F}rom Behavior-\textbf{A}gnostic \textbf{T}o Behavior-\textbf{S}pecific} in latent spaces, enabling diverse and accurate \textit{\textbf{M}ulti-\textbf{B}ehavior Sequential Recommendation}. Specifically, we design a Multi-Behavior AutoEncoder (MBAE) to construct a unified user latent preference space, facilitating interaction and collaboration across Behaviors, within Behavior-aware RoPE (BaRoPE) employed for multiple information fusion. Subsequently, we conduct target behavior-specific preference transfer in the latent space, enriching with informative priors. A Multi-Condition Guided Layer Normalization (MCGLN) is introduced for the denoising. Extensive experiments on real-world datasets demonstrate the effectiveness of our model.
CVFeb 9, 2025Code
Linear Attention Modeling for Learned Image CompressionDonghui Feng, Zhengxue Cheng, Shen Wang et al.
Recent years, learned image compression has made tremendous progress to achieve impressive coding efficiency. Its coding gain mainly comes from non-linear neural network-based transform and learnable entropy modeling. However, most studies focus on a strong backbone, and few studies consider a low complexity design. In this paper, we propose LALIC, a linear attention modeling for learned image compression. Specially, we propose to use Bi-RWKV blocks, by utilizing the Spatial Mix and Channel Mix modules to achieve more compact feature extraction, and apply the Conv based Omni-Shift module to adapt to two-dimensional latent representation. Furthermore, we propose a RWKV-based Spatial-Channel ConTeXt model (RWKV-SCCTX), that leverages the Bi-RWKV to modeling the correlation between neighboring features effectively. To our knowledge, our work is the first work to utilize efficient Bi-RWKV models with linear attention for learned image compression. Experimental results demonstrate that our method achieves competitive RD performances by outperforming VTM-9.1 by -15.26%, -15.41%, -17.63% in BD-rate on Kodak, CLIC and Tecnick datasets. The code is available at https://github.com/sjtu-medialab/RwkvCompress .
CLMar 26, 2024
Large Language Models for Education: A Survey and OutlookShen Wang, Tianlong Xu, Hang Li et al.
The advent of Large Language Models (LLMs) has brought in a new era of possibilities in the realm of education. This survey paper summarizes the various technologies of LLMs in educational settings from multifaceted perspectives, encompassing student and teacher assistance, adaptive learning, and commercial tools. We systematically review the technological advancements in each perspective, organize related datasets and benchmarks, and identify the risks and challenges associated with deploying LLMs in education. Furthermore, we outline future research opportunities, highlighting the potential promising directions. Our survey aims to provide a comprehensive technological picture for educators, researchers, and policymakers to harness the power of LLMs to revolutionize educational practices and foster a more effective personalized learning environment.
CVMay 30, 2025Code
Decoupled Competitive Framework for Semi-supervised Medical Image SegmentationJiahe Chen, Jiahe Ying, Shen Wang et al.
Confronting the critical challenge of insufficiently annotated samples in medical domain, semi-supervised medical image segmentation (SSMIS) emerges as a promising solution. Specifically, most methodologies following the Mean Teacher (MT) or Dual Students (DS) architecture have achieved commendable results. However, to date, these approaches face a performance bottleneck due to two inherent limitations, \textit{e.g.}, the over-coupling problem within MT structure owing to the employment of exponential moving average (EMA) mechanism, as well as the severe cognitive bias between two students of DS structure, both of which potentially lead to reduced efficacy, or even model collapse eventually. To mitigate these issues, a Decoupled Competitive Framework (DCF) is elaborated in this work, which utilizes a straightforward competition mechanism for the update of EMA, effectively decoupling students and teachers in a dynamical manner. In addition, the seamless exchange of invaluable and precise insights is facilitated among students, guaranteeing a better learning paradigm. The DCF introduced undergoes rigorous validation on three publicly accessible datasets, which encompass both 2D and 3D datasets. The results demonstrate the superiority of our method over previous cutting-edge competitors. Code will be available at https://github.com/JiaheChen2002/DCF.
IRJun 11, 2021Code
Modeling Sequences as Distributions with Uncertainty for Sequential RecommendationZiwei Fan, Zhiwei Liu, Lei Zheng et al.
The sequential patterns within the user interactions are pivotal for representing the user's preference and capturing latent relationships among items. The recent advancements of sequence modeling by Transformers advocate the community to devise more effective encoders for the sequential recommendation. Most existing sequential methods assume users are deterministic. However, item-item transitions might fluctuate significantly in several item aspects and exhibit randomness of user interests. This \textit{stochastic characteristics} brings up a solid demand to include uncertainties in representing sequences and items. Additionally, modeling sequences and items with uncertainties expands users' and items' interaction spaces, thus further alleviating cold-start problems. In this work, we propose a Distribution-based Transformer for Sequential Recommendation (DT4SR), which injects uncertainties into sequential modeling. We use Elliptical Gaussian distributions to describe items and sequences with uncertainty. We describe the uncertainty in items and sequences as Elliptical Gaussian distribution. And we adopt Wasserstein distance to measure the similarity between distributions. We devise two novel Trans-formers for modeling mean and covariance, which guarantees the positive-definite property of distributions. The proposed method significantly outperforms the state-of-the-art methods. The experiments on three benchmark datasets also demonstrate its effectiveness in alleviating cold-start issues. The code is available inhttps://github.com/DyGRec/DT4SR.
CVJul 10, 2020Code
Context-Aware Refinement Network Incorporating Structural Connectivity Prior for Brain Midline DelineationShen Wang, Kongming Liang, Yiming Li et al.
Brain midline delineation can facilitate the clinical evaluation of brain midline shift, which plays an important role in the diagnosis and prognosis of various brain pathology. Nevertheless, there are still great challenges with brain midline delineation, such as the largely deformed midline caused by the mass effect and the possible morphological failure that the predicted midline is not a connected curve. To address these challenges, we propose a context-aware refinement network (CAR-Net) to refine and integrate the feature pyramid representation generated by the UNet. Consequently, the proposed CAR-Net explores more discriminative contextual features and a larger receptive field, which is of great importance to predict largely deformed midline. For keeping the structural connectivity of the brain midline, we introduce a novel connectivity regular loss (CRL) to punish the disconnectivity between adjacent coordinates. Moreover, we address the ignored prerequisite of previous regression-based methods that the brain CT image must be in the standard pose. A simple pose rectification network is presented to align the source input image to the standard pose image. Extensive experimental results on the CQ dataset and one inhouse dataset show that the proposed method requires fewer parameters and outperforms three state-of-the-art methods in terms of four evaluation metrics. Code is available at https://github.com/ShawnBIT/Brain-Midline-Detection.
CLDec 16, 2024
A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & ChallengesYibo Yan, Jiamin Su, Jianxiang He et al.
Mathematical reasoning, a core aspect of human cognition, is vital across many domains, from educational problem-solving to scientific advancements. As artificial general intelligence (AGI) progresses, integrating large language models (LLMs) with mathematical reasoning tasks is becoming increasingly significant. This survey provides the first comprehensive analysis of mathematical reasoning in the era of multimodal large language models (MLLMs). We review over 200 studies published since 2021, and examine the state-of-the-art developments in Math-LLMs, with a focus on multimodal settings. We categorize the field into three dimensions: benchmarks, methodologies, and challenges. In particular, we explore multimodal mathematical reasoning pipeline, as well as the role of (M)LLMs and the associated methodologies. Finally, we identify five major challenges hindering the realization of AGI in this domain, offering insights into the future direction for enhancing multimodal reasoning capabilities. This survey serves as a critical resource for the research community in advancing the capabilities of LLMs to tackle complex multimodal reasoning tasks.
CYMar 14, 2025
LLM Agents for Education: Advances and ApplicationsZhendong Chu, Shen Wang, Jian Xie et al.
Large Language Model (LLM) agents have demonstrated remarkable capabilities in automating tasks and driving innovation across diverse educational applications. In this survey, we provide a systematic review of state-of-the-art research on LLM agents in education, categorizing them into two broad classes: (1) \emph{Pedagogical Agents}, which focus on automating complex pedagogical tasks to support both teachers and students; and (2) \emph{Domain-Specific Educational Agents}, which are tailored for specialized fields such as science education, language learning, and professional development. We comprehensively examine the technological advancements underlying these LLM agents, including key datasets, benchmarks, and algorithmic frameworks that drive their effectiveness. Furthermore, we discuss critical challenges such as privacy, bias and fairness concerns, hallucination mitigation, and integration with existing educational ecosystems. This survey aims to provide a comprehensive technological overview of LLM agents for education, fostering further research and collaboration to enhance their impact for the greater good of learners and educators alike.
CLFeb 5, 2025
Position: Multimodal Large Language Models Can Significantly Advance Scientific ReasoningYibo Yan, Shen Wang, Jiahao Huo et al.
Scientific reasoning, the process through which humans apply logic, evidence, and critical thinking to explore and interpret scientific phenomena, is essential in advancing knowledge reasoning across diverse fields. However, despite significant progress, current scientific reasoning models still struggle with generalization across domains and often fall short of multimodal perception. Multimodal Large Language Models (MLLMs), which integrate text, images, and other modalities, present an exciting opportunity to overcome these limitations and enhance scientific reasoning. Therefore, this position paper argues that MLLMs can significantly advance scientific reasoning across disciplines such as mathematics, physics, chemistry, and biology. First, we propose a four-stage research roadmap of scientific reasoning capabilities, and highlight the current state of MLLM applications in scientific reasoning, noting their ability to integrate and reason over diverse data types. Second, we summarize the key challenges that remain obstacles to achieving MLLM's full potential. To address these challenges, we propose actionable insights and suggestions for the future. Overall, our work offers a novel perspective on MLLM integration with scientific reasoning, providing the LLM community with a valuable vision for achieving Artificial General Intelligence (AGI).
ITDec 8, 2025
Radiance-Field Reinforced Pretraining: Scaling Localization Models with Unlabeled Wireless SignalsGuosheng Wang, Shen Wang, Lei Yang
Radio frequency (RF)-based indoor localization offers significant promise for applications such as indoor navigation, augmented reality, and pervasive computing. While deep learning has greatly enhanced localization accuracy and robustness, existing localization models still face major challenges in cross-scene generalization due to their reliance on scene-specific labeled data. To address this, we introduce Radiance-Field Reinforced Pretraining (RFRP). This novel self-supervised pretraining framework couples a large localization model (LM) with a neural radio-frequency radiance field (RF-NeRF) in an asymmetrical autoencoder architecture. In this design, the LM encodes received RF spectra into latent, position-relevant representations, while the RF-NeRF decodes them to reconstruct the original spectra. This alignment between input and output enables effective representation learning using large-scale, unlabeled RF data, which can be collected continuously with minimal effort. To this end, we collected RF samples at 7,327,321 positions across 100 diverse scenes using four common wireless technologies--RFID, BLE, WiFi, and IIoT. Data from 75 scenes were used for training, and the remaining 25 for evaluation. Experimental results show that the RFRP-pretrained LM reduces localization error by over 40% compared to non-pretrained models and by 21% compared to those pretrained using supervised learning.
CLMar 23, 2025
MathAgent: Leveraging a Mixture-of-Math-Agent Framework for Real-World Multimodal Mathematical Error DetectionYibo Yan, Shen Wang, Jiahao Huo et al.
Mathematical error detection in educational settings presents a significant challenge for Multimodal Large Language Models (MLLMs), requiring a sophisticated understanding of both visual and textual mathematical content along with complex reasoning capabilities. Though effective in mathematical problem-solving, MLLMs often struggle with the nuanced task of identifying and categorizing student errors in multimodal mathematical contexts. Therefore, we introduce MathAgent, a novel Mixture-of-Math-Agent framework designed specifically to address these challenges. Our approach decomposes error detection into three phases, each handled by a specialized agent: an image-text consistency validator, a visual semantic interpreter, and an integrative error analyzer. This architecture enables more accurate processing of mathematical content by explicitly modeling relationships between multimodal problems and student solution steps. We evaluate MathAgent on real-world educational data, demonstrating approximately 5% higher accuracy in error step identification and 3% improvement in error categorization compared to baseline models. Besides, MathAgent has been successfully deployed in an educational platform that has served over one million K-12 students, achieving nearly 90% student satisfaction while generating significant cost savings by reducing manual error detection.
DCJan 2, 2025
Deep Reinforcement Learning for Job Scheduling and Resource Management in Cloud Computing: An Algorithm-Level ReviewYan Gu, Zhaoze Liu, Shuhong Dai et al.
Cloud computing has revolutionized the provisioning of computing resources, offering scalable, flexible, and on-demand services to meet the diverse requirements of modern applications. At the heart of efficient cloud operations are job scheduling and resource management, which are critical for optimizing system performance and ensuring timely and cost-effective service delivery. However, the dynamic and heterogeneous nature of cloud environments presents significant challenges for these tasks, as workloads and resource availability can fluctuate unpredictably. Traditional approaches, including heuristic and meta-heuristic algorithms, often struggle to adapt to these real-time changes due to their reliance on static models or predefined rules. Deep Reinforcement Learning (DRL) has emerged as a promising solution to these challenges by enabling systems to learn and adapt policies based on continuous observations of the environment, facilitating intelligent and responsive decision-making. This survey provides a comprehensive review of DRL-based algorithms for job scheduling and resource management in cloud computing, analyzing their methodologies, performance metrics, and practical applications. We also highlight emerging trends and future research directions, offering valuable insights into leveraging DRL to advance both job scheduling and resource management in cloud computing.
CLFeb 8, 2025
Position: LLMs Can be Good Tutors in English EducationJingheng Ye, Shen Wang, Deqing Zou et al.
While recent efforts have begun integrating large language models (LLMs) into English education, they often rely on traditional approaches to learning tasks without fully embracing educational methodologies, thus lacking adaptability to language learning. To address this gap, we argue that LLMs have the potential to serve as effective tutors in English Education. Specifically, LLMs can play three critical roles: (1) as data enhancers, improving the creation of learning materials or serving as student simulations; (2) as task predictors, serving as learner assessment or optimizing learning pathway; and (3) as agents, enabling personalized and inclusive education. We encourage interdisciplinary research to explore these roles, fostering innovation while addressing challenges and risks, ultimately advancing English Education through the thoughtful integration of LLMs.
29.8ROApr 13
HECTOR: Human-centric Hierarchical Coordination and Supervision of Robotic Fleets under Continual Temporal TasksShen Wang, Yinhang Luo, Jie Li et al.
Robotic fleets can be extremely efficient when working concurrently and collaboratively, e.g., for delivery, surveillance, search and rescue. However, it can be demanding or even impractical for an operator to directly control each robot. Thus, autonomy of the fleet and its online interaction with the operator are both essential, particularly in dynamic and partially unknown environments. The operator might need to add new tasks, cancel some tasks, change priorities and modify planning results. How to design the procedure for these interactions and efficient algorithms to fulfill these needs have been mostly neglected in the related literature. Thus, this work proposes a human-centric coordination and supervision scheme (HECTOR) for large-scale robotic fleets under continual and uncertain temporal tasks. It consists of three hierarchical layers: (I) the bidirectional and multimodal protocol of online human-fleet interaction, where the operator interacts with and supervises the whole fleet; (II) the rolling assignment of currently-known tasks to teams within a certain horizon, and (III) the dynamic coordination within a team given the detected subtasks during online execution. The overall mission can be as general as temporal logic formulas over collaborative actions. Such hierarchical structure allows human interaction and supervision at different granularities and triggering conditions, to both improve computational efficiency and reduce human effort. Extensive human-in-the-loop simulations are performed over heterogeneous fleets under various temporal tasks and environmental uncertainties.
CLMar 26, 2025
UniEDU: A Unified Language and Vision Assistant for Education ApplicationsZhendong Chu, Jian Xie, Shen Wang et al.
Education materials for K-12 students often consist of multiple modalities, such as text and images, posing challenges for models to fully understand nuanced information in these materials. In this paper, we propose a unified language and vision assistant UniEDU designed for various educational applications, including knowledge recommendation, knowledge tracing, time cost prediction, and user answer prediction, all within a single model. Unlike conventional task-specific models, UniEDU offers a unified solution that excels across multiple educational tasks while maintaining strong generalization capabilities. Its adaptability makes it well-suited for real-world deployment in diverse learning environments. Furthermore, UniEDU is optimized for industry-scale deployment by significantly reducing computational overhead-achieving approximately a 300\% increase in efficiency-while maintaining competitive performance with minimal degradation compared to fully fine-tuned models. This work represents a significant step toward creating versatile AI systems tailored to the evolving demands of education.
CLFeb 21, 2025
Corrections Meet Explanations: A Unified Framework for Explainable Grammatical Error CorrectionJingheng Ye, Shang Qin, Yinghui Li et al.
Grammatical Error Correction (GEC) faces a critical challenge concerning explainability, notably when GEC systems are designed for language learners. Existing research predominantly focuses on explaining grammatical errors extracted in advance, thus neglecting the relationship between explanations and corrections. To address this gap, we introduce EXGEC, a unified explainable GEC framework that integrates explanation and correction tasks in a generative manner, advocating that these tasks mutually reinforce each other. Experiments have been conducted on EXPECT, a recent human-labeled dataset for explainable GEC, comprising around 20k samples. Moreover, we detect significant noise within EXPECT, potentially compromising model training and evaluation. Therefore, we introduce an alternative dataset named EXPECT-denoised, ensuring a more objective framework for training and evaluation. Results on various NLP models (BART, T5, and Llama3) show that EXGEC models surpass single-task baselines in both tasks, demonstrating the effectiveness of our approach.
CLNov 28, 2025
FEANEL: A Benchmark for Fine-Grained Error Analysis in K-12 English WritingJingheng Ye, Shen Wang, Jiaqi Chen et al.
Large Language Models (LLMs) have transformed artificial intelligence, offering profound opportunities for educational applications. However, their ability to provide fine-grained educational feedback for K-12 English writing remains underexplored. In this paper, we challenge the error analysis and pedagogical skills of LLMs by introducing the problem of Fine-grained Error Analysis for English Learners and present the Fine-grained Error ANalysis for English Learners (FEANEL) Benchmark. The benchmark comprises 1,000 essays written by elementary and secondary school students, and a well-developed English writing error taxonomy. Each error is annotated by language education experts and categorized by type, severity, and explanatory feedback, using a part-of-speech-based taxonomy they co-developed. We evaluate state-of-the-art LLMs on the FEANEL Benchmark to explore their error analysis and pedagogical abilities. Experimental results reveal significant gaps in current LLMs' ability to perform fine-grained error analysis, highlighting the need for advancements in particular methods for educational applications.
CRJun 10, 2024
An LLM-Assisted Easy-to-Trigger Backdoor Attack on Code Completion Models: Injecting Disguised Vulnerabilities against Strong DetectionShenao Yan, Shen Wang, Yue Duan et al.
Large Language Models (LLMs) have transformed code completion tasks, providing context-based suggestions to boost developer productivity in software engineering. As users often fine-tune these models for specific applications, poisoning and backdoor attacks can covertly alter the model outputs. To address this critical security challenge, we introduce CodeBreaker, a pioneering LLM-assisted backdoor attack framework on code completion models. Unlike recent attacks that embed malicious payloads in detectable or irrelevant sections of the code (e.g., comments), CodeBreaker leverages LLMs (e.g., GPT-4) for sophisticated payload transformation (without affecting functionalities), ensuring that both the poisoned data for fine-tuning and generated code can evade strong vulnerability detection. CodeBreaker stands out with its comprehensive coverage of vulnerabilities, making it the first to provide such an extensive set for evaluation. Our extensive experimental evaluations and user studies underline the strong attack performance of CodeBreaker across various settings, validating its superiority over existing approaches. By integrating malicious payloads directly into the source code with minimal transformation, CodeBreaker challenges current security measures, underscoring the critical need for more robust defenses for code completion.
AIJan 31, 2022
CoTV: Cooperative Control for Traffic Light Signals and Connected Autonomous Vehicles using Deep Reinforcement LearningJiaying Guo, Long Cheng, Shen Wang
The target of reducing travel time only is insufficient to support the development of future smart transportation systems. To align with the United Nations Sustainable Development Goals (UN-SDG), a further reduction of fuel and emissions, improvements of traffic safety, and the ease of infrastructure deployment and maintenance should also be considered. Different from existing work focusing on the optimization of the control in either traffic light signal (to improve the intersection throughput), or vehicle speed (to stabilize the traffic), this paper presents a multi-agent Deep Reinforcement Learning (DRL) system called CoTV, which Cooperatively controls both Traffic light signals and Connected Autonomous Vehicles (CAV). Therefore, our CoTV can well balance the achievement of the reduction of travel time, fuel, and emissions. In the meantime, CoTV can also be easy to deploy by cooperating with only one CAV that is the nearest to the traffic light controller on each incoming road. This enables more efficient coordination between traffic light controllers and CAV, thus leading to the convergence of training CoTV under the large-scale multi-agent scenario that is traditionally difficult to converge. We give the detailed system design of CoTV and demonstrate its effectiveness in a simulation study using SUMO under various grid maps and realistic urban scenarios with mixed-autonomy traffic.
CEDec 13, 2021
Stacked Generative Machine Learning Models for Fast Approximations of Steady-State Navier-Stokes EquationsShen Wang, Mehdi Nikfar, Joshua C. Agar et al.
Computational fluid dynamics (CFD) simulations are broadly applied in engineering and physics. A standard description of fluid dynamics requires solving the Navier-Stokes (N-S) equations in different flow regimes. However, applications of CFD simulations are computationally-limited by the availability, speed, and parallelism of high-performance computing. To improve computational efficiency, machine learning techniques have been used to create accelerated data-driven approximations for CFD. A majority of such approaches rely on large labeled CFD datasets that are expensive to obtain at the scale necessary to build robust data-driven models. We develop a weakly-supervised approach to solve the steady-state N-S equations under various boundary conditions, using a multi-channel input with boundary and geometric conditions. We achieve state-of-the-art results without any labeled simulation data, but using a custom data-driven and physics-informed loss function by using and small-scale solutions to prime the model to solve the N-S equations. To improve the resolution and predictability, we train stacked models of increasing complexity generating the numerical solutions for N-S equations. Without expensive computations, our model achieves high predictability with a variety of obstacles and boundary conditions. Given its high flexibility, the model can generate a solution on a 64 x 64 domain within 5 ms on a regular desktop computer which is 1000 times faster than a regular CFD solver. Translation of interactive CFD simulation on local consumer computing hardware enables new applications in real-time predictions on the internet of things devices where data transfer is prohibitive and can increase the scale, speed, and computational cost of boundary-value fluid problems.
NIDec 9, 2021
Applications of Explainable AI for 6G: Technical Aspects, Use Cases, and Research ChallengesShen Wang, M. Atif Qureshi, Luis Miralles-Pechuán et al.
When 5G began its commercialisation journey around 2020, the discussion on the vision of 6G also surfaced. Researchers expect 6G to have higher bandwidth, coverage, reliability, energy efficiency, lower latency, and an integrated "human-centric" network system powered by artificial intelligence (AI). Such a 6G network will lead to an excessive number of automated decisions made in real-time. These decisions can range widely, from network resource allocation to collision avoidance for self-driving cars. However, the risk of losing control over decision-making may increase due to high-speed, data-intensive AI decision-making beyond designers' and users' comprehension. The promising explainable AI (XAI) methods can mitigate such risks by enhancing the transparency of the black-box AI decision-making process. This paper surveys the application of XAI towards the upcoming 6G age in every aspect, including 6G technologies (e.g., intelligent radio, zero-touch network management) and 6G use cases (e.g., industry 5.0). Moreover, we summarised the lessons learned from the recent attempts and outlined important research challenges in applying XAI for 6G in the near future.
CVOct 19, 2021
Fine-Grained Control of Artistic Styles in Image GenerationXin Miao, Huayan Wang, Jun Fu et al.
Recent advances in generative models and adversarial training have enabled artificially generating artworks in various artistic styles. It is highly desirable to gain more control over the generated style in practice. However, artistic styles are unlike object categories -- there are a continuous spectrum of styles distinguished by subtle differences. Few works have been explored to capture the continuous spectrum of styles and apply it to a style generation task. In this paper, we propose to achieve this by embedding original artwork examples into a continuous style space. The style vectors are fed to the generator and discriminator to achieve fine-grained control. Our method can be used with common generative adversarial networks (such as StyleGAN). Experiments show that our method not only precisely controls the fine-grained artistic style but also improves image quality over vanilla StyleGAN as measured by FID.
CLOct 16, 2021
Knowledge Enhanced Pretrained Language Models: A Compreshensive SurveyXiaokai Wei, Shen Wang, Dejiao Zhang et al.
Pretrained Language Models (PLM) have established a new paradigm through learning informative contextualized representations on large-scale text corpus. This new paradigm has revolutionized the entire field of natural language processing, and set the new state-of-the-art performance for a wide variety of NLP tasks. However, though PLMs could store certain knowledge/facts from training corpus, their knowledge awareness is still far from satisfactory. To address this issue, integrating knowledge into PLMs have recently become a very active research area and a variety of approaches have been developed. In this paper, we provide a comprehensive survey of the literature on this emerging and fast-growing field - Knowledge Enhanced Pretrained Language Models (KE-PLMs). We introduce three taxonomies to categorize existing work. Besides, we also survey the various NLU and NLG applications on which KE-PLM has demonstrated superior performance over vanilla PLMs. Finally, we discuss challenges that face KE-PLMs and also promising directions for future research.
LGSep 6, 2021
Delving into Macro Placement with Reinforcement LearningZixuan Jiang, Ebrahim Songhori, Shen Wang et al.
In physical design, human designers typically place macros via trial and error, which is a Markov decision process. Reinforcement learning (RL) methods have demonstrated superhuman performance on the macro placement. In this paper, we propose an extension to this prior work (Mirhoseini et al., 2020). We first describe the details of the policy and value network architecture. We replace the force-directed method with DREAMPlace for placing standard cells in the RL environment. We also compare our improved method with other academic placers on public benchmarks.
CRMar 10, 2021
Anti-Counterfeiting for Polymer Banknotes Based on Polymer Substrate FingerprintingShen Wang, Ehsan Toreini, Feng Hao
Polymer banknotes are the trend for printed currency and have been adopted by more than fifty countries worldwide. However, over the past years, the quantity of polymer counterfeits has been increasing, so has the quality of counterfeits. This shows that the initial advantage of bringing a new polymer technology to fight against counterfeiting is reducing. To maintain one step ahead of counterfeiters, we propose a novel anti-counterfeiting technique called Polymer Substrate Fingerprinting (PSF). Our technique is built based on the observation that the opacity coating, a critical step during the production of polymer notes, is a stochastic manufacturing process, leaving uneven thickness in the coating layer and the random dispersion of impurities from the ink. The imperfections in the coating layer result in random translucent patterns when a polymer banknote is back-lit by a light source. We show these patterns can be reliably captured by a commodity negative-film scanner and processed into a compact fingerprint to uniquely identify each banknote. Using an extensive dataset of 6,200 sample images collected from 340 UK banknotes, we show that our method can reliably authenticate banknotes, and is robust against rough daily handling of banknotes. Furthermore, we show the extracted fingerprints contain around 900 bits of entropy, which makes it extremely scalable to identify every polymer note circulated globally. As compared with previous or existing anti-counterfeiting mechanisms for banknotes, our method has a distinctive advantage: it ensures that even in the extreme case when counterfeiters have procured the same printing equipment and ink as used by a legitimate government, counterfeiting banknotes remains infeasible because of the difficulty to replicate a stochastic manufacturing process.
LGOct 21, 2020
Transferable Graph Optimizers for ML CompilersYanqi Zhou, Sudip Roy, Amirali Abdolrashidi et al.
Most compilers for machine learning (ML) frameworks need to solve many correlated optimization problems to generate efficient machine code. Current ML compilers rely on heuristics based algorithms to solve these optimization problems one at a time. However, this approach is not only hard to maintain but often leads to sub-optimal solutions especially for newer model architectures. Existing learning based approaches in the literature are sample inefficient, tackle a single optimization problem, and do not generalize to unseen graphs making them infeasible to be deployed in practice. To address these limitations, we propose an end-to-end, transferable deep reinforcement learning method for computational graph optimization (GO), based on a scalable sequential attention mechanism over an inductive graph neural network. GO generates decisions on the entire graph rather than on each individual node autoregressively, drastically speeding up the search compared to prior methods. Moreover, we propose recurrent attention layers to jointly optimize dependent graph optimization tasks and demonstrate 33%-60% speedup on three graph optimization tasks compared to TensorFlow default optimization. On a diverse set of representative graphs consisting of up to 80,000 nodes, including Inception-v3, Transformer-XL, and WaveNet, GO achieves on average 21% improvement over human experts and 18% improvement over the prior state of the art with 15x faster convergence, on a device placement task evaluated in real systems.
LGJun 23, 2020
Attentional Graph Convolutional Networks for Knowledge Concept Recommendation in MOOCs in a Heterogeneous ViewShen Wang, Jibing Gong, Jinlong Wang et al.
Massive open online courses are becoming a modish way for education, which provides a large-scale and open-access learning opportunity for students to grasp the knowledge. To attract students' interest, the recommendation system is applied by MOOCs providers to recommend courses to students. However, as a course usually consists of a number of video lectures, with each one covering some specific knowledge concepts, directly recommending courses overlook students'interest to some specific knowledge concepts. To fill this gap, in this paper, we study the problem of knowledge concept recommendation. We propose an end-to-end graph neural network-based approach calledAttentionalHeterogeneous Graph Convolutional Deep Knowledge Recommender(ACKRec) for knowledge concept recommendation in MOOCs. Like other recommendation problems, it suffers from sparsity issues. To address this issue, we leverage both content information and context information to learn the representation of entities via graph convolution network. In addition to students and knowledge concepts, we consider other types of entities (e.g., courses, videos, teachers) and construct a heterogeneous information network to capture the corresponding fruitful semantic relationships among different types of entities and incorporate them into the representation learning process. Specifically, we use meta-path on the HIN to guide the propagation of students' preferences. With the help of these meta-paths, the students' preference distribution with respect to a candidate knowledge concept can be captured. Furthermore, we propose an attention mechanism to adaptively fuse the context information from different meta-paths, in order to capture the different interests of different students. The promising experiment results show that the proposedACKRecis able to effectively recommend knowledge concepts to students pursuing online learning in MOOCs.
LGApr 22, 2020
Chip Placement with Deep Reinforcement LearningAzalia Mirhoseini, Anna Goldie, Mustafa Yazgan et al.
In this work, we present a learning-based approach to chip placement, one of the most complex and time-consuming stages of the chip design process. Unlike prior methods, our approach has the ability to learn from past experience and improve over time. In particular, as we train over a greater number of chip blocks, our method becomes better at rapidly generating optimized placements for previously unseen chip blocks. To achieve these results, we pose placement as a Reinforcement Learning (RL) problem and train an agent to place the nodes of a chip netlist onto a chip canvas. To enable our RL policy to generalize to unseen blocks, we ground representation learning in the supervised task of predicting placement quality. By designing a neural architecture that can accurately predict reward across a wide variety of netlists and their placements, we are able to generate rich feature embeddings of the input netlists. We then use this architecture as the encoder of our policy and value networks to enable transfer learning. Our objective is to minimize PPA (power, performance, and area), and we show that, in under 6 hours, our method can generate placements that are superhuman or comparable on modern accelerator netlists, whereas existing baselines require human experts in the loop and take several weeks.
SPMar 20, 2020
BusTime: Which is the Right Prediction Model for My Bus Arrival Time?Dairui Liu, Jingxiang Sun, Shen Wang
With the rise of big data technologies, many smart transportation applications have been rapidly developed in recent years including bus arrival time predictions. This type of applications help passengers to plan trips more efficiently without wasting unpredictable amount of waiting time at bus stops. Many studies focus on improving the prediction accuracy of various machine learning and statistical models, while much less work demonstrate their applicability of being deployed and used in realistic urban settings. This paper tries to fill this gap by proposing a general and practical evaluation framework for analysing various widely used prediction models (i.e. delay, k-nearest-neighbour, kernel regression, additive model, and recurrent neural network using long short term memory) for bus arrival time. In particular, this framework contains a raw bus GPS data pre-processing method that needs much less number of input data points while still maintain satisfactory prediction results. This pre-processing method enables various models to predict arrival time at bus stops only, by using a KD-tree based nearest point search method. Based on this framework, using raw bus GPS dataset in different scales from the city of Dublin, Ireland, we also present preliminary results for city managers by analysing the practical strengths and weaknesses in both training and predicting stages of commonly used prediction models.
IVFeb 27, 2020
Segmentation-based Method combined with Dynamic Programming for Brain Midline DelineationShen Wang, Kongming Liang, Chengwei Pan et al.
The midline related pathological image features are crucial for evaluating the severity of brain compression caused by stroke or traumatic brain injury (TBI). The automated midline delineation not only improves the assessment and clinical decision making for patients with stroke symptoms or head trauma but also reduces the time of diagnosis. Nevertheless, most of the previous methods model the midline by localizing the anatomical points, which are hard to detect or even missing in severe cases. In this paper, we formulate the brain midline delineation as a segmentation task and propose a three-stage framework. The proposed framework firstly aligns an input CT image into the standard space. Then, the aligned image is processed by a midline detection network (MD-Net) integrated with the CoordConv Layer and Cascade AtrousCconv Module to obtain the probability map. Finally, we formulate the optimal midline selection as a pathfinding problem to solve the problem of the discontinuity of midline delineation. Experimental results show that our proposed framework can achieve superior performance on one in-house dataset and one public dataset.
LGNov 26, 2019
Generative Temporal Link Prediction via Self-tokenized Sequence ModelingYue Wang, Chenwei Zhang, Shen Wang et al.
We formalize networks with evolving structures as temporal networks and propose a generative link prediction model, Generative Link Sequence Modeling (GLSM), to predict future links for temporal networks. GLSM captures the temporal link formation patterns from the observed links with a sequence modeling framework and has the ability to generate the emerging links by inferring from the probability distribution on the potential future links. To avoid overfitting caused by treating each link as a unique token, we propose a self-tokenization mechanism to transform each raw link in the network to an abstract aggregation token automatically. The self-tokenization is seamlessly integrated into the sequence modeling framework, which allows the proposed GLSM model to have the generalization capability to discover link formation patterns beyond raw link sequences. We compare GLSM with the existing state-of-art methods on five real-world datasets. The experimental results demonstrate that GLSM obtains future positive links effectively in a generative fashion while achieving the best performance (2-10\% improvements on AUC) among other alternatives.
CROct 17, 2019
Heterogeneous Graph Matching NetworksShen Wang, Zhengzhang Chen, Xiao Yu et al.
Information systems have widely been the target of malware attacks. Traditional signature-based malicious program detection algorithms can only detect known malware and are prone to evasion techniques such as binary obfuscation, while behavior-based approaches highly rely on the malware training samples and incur prohibitively high training cost. To address the limitations of existing techniques, we propose MatchGNet, a heterogeneous Graph Matching Network model to learn the graph representation and similarity metric simultaneously based on the invariant graph modeling of the program's execution behaviors. We conduct a systematic evaluation of our model and show that it is accurate in detecting malicious program behavior and can help detect malware attacks with less false positives. MatchGNet outperforms the state-of-the-art algorithms in malware detection by generating 50% less false positives while keeping zero false negatives.
LGMay 9, 2019
Adversarial Defense Framework for Graph Neural NetworkShen Wang, Zhengzhang Chen, Jingchao Ni et al.
Graph neural network (GNN), as a powerful representation learning model on graph data, attracts much attention across various disciplines. However, recent studies show that GNN is vulnerable to adversarial attacks. How to make GNN more robust? What are the key vulnerabilities in GNN? How to address the vulnerabilities and defense GNN against the adversarial attacks? In this paper, we propose DefNet, an effective adversarial defense framework for GNNs. In particular, we first investigate the latent vulnerabilities in every layer of GNNs and propose corresponding strategies including dual-stage aggregation and bottleneck perceptron. Then, to cope with the scarcity of training data, we propose an adversarial contrastive learning method to train the GNN in a conditional GAN manner by leveraging the high-level graph representation. Extensive experiments on three public datasets demonstrate the effectiveness of DefNet in improving the robustness of popular GNN variants, such as Graph Convolutional Network and GraphSAGE, under various types of adversarial attacks.
CRDec 10, 2018
Attentional Heterogeneous Graph Neural Network: Application to Program ReidentificationShen Wang, Zhengzhang Chen, Ding Li et al.
Program or process is an integral part of almost every IT/OT system. Can we trust the identity/ID (e.g., executable name) of the program? To avoid detection, malware may disguise itself using the ID of a legitimate program, and a system tool (e.g., PowerShell) used by the attackers may have the fake ID of another common software, which is less sensitive. However, existing intrusion detection techniques often overlook this critical program reidentification problem (i.e., checking the program's identity). In this paper, we propose an attentional heterogeneous graph neural network model (DeepHGNN) to verify the program's identity based on its system behaviors. The key idea is to leverage the representation learning of the heterogeneous program behavior graph to guide the reidentification process. We formulate the program reidentification as a graph classification problem and develop an effective attentional heterogeneous graph embedding algorithm to solve it. Extensive experiments --- using real-world enterprise monitoring data and real attacks --- demonstrate the effectiveness of DeepHGNN across multiple popular metrics and the robustness to the normal dynamic changes like program version upgrades.