LGJun 15, 2022
Learning Large-scale Subsurface Simulations with a Hybrid Graph Network SimulatorTailin Wu, Qinchen Wang, Yinan Zhang et al. · mit, stanford
Subsurface simulations use computational models to predict the flow of fluids (e.g., oil, water, gas) through porous media. These simulations are pivotal in industrial applications such as petroleum production, where fast and accurate models are needed for high-stake decision making, for example, for well placement optimization and field development planning. Classical finite difference numerical simulators require massive computational resources to model large-scale real-world reservoirs. Alternatively, streamline simulators and data-driven surrogate models are computationally more efficient by relying on approximate physics models, however they are insufficient to model complex reservoir dynamics at scale. Here we introduce Hybrid Graph Network Simulator (HGNS), which is a data-driven surrogate model for learning reservoir simulations of 3D subsurface fluid flows. To model complex reservoir dynamics at both local and global scale, HGNS consists of a subsurface graph neural network (SGNN) to model the evolution of fluid flows, and a 3D-U-Net to model the evolution of pressure. HGNS is able to scale to grids with millions of cells per time step, two orders of magnitude higher than previous surrogate models, and can accurately predict the fluid flow for tens of time steps (years into the future). Using an industry-standard subsurface flow dataset (SPE-10) with 1.1 million cells, we demonstrate that HGNS is able to reduce the inference time up to 18 times compared to standard subsurface simulators, and that it outperforms other learning-based models by reducing long-term prediction errors by up to 21%.
95.6AIMay 27
CubePart: An Open-Vocabulary Part-Controllable 3D GeneratorYiheng Zhu, Kangle Deng, Jean-Philippe Fauconnier et al.
Interactive 3D assets used in games and simulation are typically decomposed into specific semantic parts to support animation, physics, and scripted behaviors, yet most generative 3D models produce either monolithic meshes or arbitrary part decompositions that cannot be aligned with application-specific requirements. We present CubePart, a generative framework for open-vocabulary, part-controllable 3D mesh generation that exposes part structure as an explicit inference-time control signal. Given a global text prompt and a user-defined parts schema expressed as an open-ended list of part names, our method generates a set of meshes - one per schema element - that assemble into a coherent object while respecting the specified semantic structure. To enable this capability, we introduce a scalable data pipeline to construct a large open-vocabulary, part-labeled 3D dataset, along with a two-stage generative architecture that separates global shape synthesis from part-level decoding. We demonstrate that the resulting assets can be directly integrated into game engines and driven by animation and behavior scripts without manual post-processing. Project Page: https://cubepart.github.io/
IRAug 17, 2023
Capturing Popularity Trends: A Simplistic Non-Personalized Approach for Enhanced Item RecommendationJiazheng Jing, Yinan Zhang, Xin Zhou et al.
Recommender systems have been gaining increasing research attention over the years. Most existing recommendation methods focus on capturing users' personalized preferences through historical user-item interactions, which may potentially violate user privacy. Additionally, these approaches often overlook the significance of the temporal fluctuation in item popularity that can sway users' decision-making. To bridge this gap, we propose Popularity-Aware Recommender (PARE), which makes non-personalized recommendations by predicting the items that will attain the highest popularity. PARE consists of four modules, each focusing on a different aspect: popularity history, temporal impact, periodic impact, and side information. Finally, an attention layer is leveraged to fuse the outputs of four modules. To our knowledge, this is the first work to explicitly model item popularity in recommendation systems. Extensive experiments show that PARE performs on par or even better than sophisticated state-of-the-art recommendation methods. Since PARE prioritizes item popularity over personalized user preferences, it can enhance existing recommendation methods as a complementary component. Our experiments demonstrate that integrating PARE with existing recommendation methods significantly surpasses the performance of standalone models, highlighting PARE's potential as a complement to existing recommendation methods. Furthermore, the simplicity of PARE makes it immensely practical for industrial applications and a valuable baseline for future research.
IRApr 8, 2022
IA-GCN: Interactive Graph Convolutional Network for RecommendationYinan Zhang, Pei Wang, Congcong Liu et al.
Recently, Graph Convolutional Network (GCN) has become a novel state-of-art for Collaborative Filtering (CF) based Recommender Systems (RS). It is a common practice to learn informative user and item representations by performing embedding propagation on a user-item bipartite graph, and then provide the users with personalized item suggestions based on the representations. Despite effectiveness, existing algorithms neglect precious interactive features between user-item pairs in the embedding process. When predicting a user's preference for different items, they still aggregate the user tree in the same way, without emphasizing target-related information in the user neighborhood. Such a uniform aggregation scheme easily leads to suboptimal user and item representations, limiting the model expressiveness to some extent. In this work, we address this problem by building bilateral interactive guidance between each user-item pair and proposing a new model named IA-GCN (short for InterActive GCN). Specifically, when learning the user representation from its neighborhood, we assign higher attention weights to those neighbors similar to the target item. Correspondingly, when learning the item representation, we pay more attention to those neighbors resembling the target user. This leads to interactive and interpretable features, effectively distilling target-specific information through each graph convolutional operation. Our model is built on top of LightGCN, a state-of-the-art GCN model for CF, and can be combined with various GCN-based CF architectures in an end-to-end fashion. Extensive experiments on three benchmark datasets demonstrate the effectiveness and robustness of IA-GCN.
96.8AIMar 11
Emulating Clinician Cognition via Self-Evolving Deep Clinical ResearchRuiyang Ren, Yuhao Wang, Yunsen Liang et al.
Clinical diagnosis is a complex cognitive process, grounded in dynamic cue acquisition and continuous expertise accumulation. Yet most current artificial intelligence (AI) systems are misaligned with this reality, treating diagnosis as single-pass retrospective prediction while lacking auditable mechanisms for governed improvement. We developed DxEvolve, a self-evolving diagnostic agent that bridges these gaps through an interactive deep clinical research workflow. The framework autonomously requisitions examinations and continually externalizes clinical experience from increasing encounter exposure as diagnostic cognition primitives. On the MIMIC-CDM benchmark, DxEvolve improved diagnostic accuracy by 11.2% on average over backbone models and reached 90.4% on a reader-study subset, comparable to the clinician reference (88.8%). DxEvolve improved accuracy on an independent external cohort by 10.2% (categories covered by the source cohort) and 17.1% (uncovered categories) compared to the competitive method. By transforming experience into a governable learning asset, DxEvolve supports an accountable pathway for the continual evolution of clinical AI.
LGDec 22, 2025
RP-CATE: Recurrent Perceptron-based Channel Attention Transformer Encoder for Industrial Hybrid ModelingHaoran Yang, Yinan Zhang, Wenjie Zhang et al.
Nowadays, industrial hybrid modeling which integrates both mechanistic modeling and machine learning-based modeling techniques has attracted increasing interest from scholars due to its high accuracy, low computational cost, and satisfactory interpretability. Nevertheless, the existing industrial hybrid modeling methods still face two main limitations. First, current research has mainly focused on applying a single machine learning method to one specific task, failing to develop a comprehensive machine learning architecture suitable for modeling tasks, which limits their ability to effectively represent complex industrial scenarios. Second, industrial datasets often contain underlying associations (e.g., monotonicity or periodicity) that are not adequately exploited by current research, which can degrade model's predictive performance. To address these limitations, this paper proposes the Recurrent Perceptron-based Channel Attention Transformer Encoder (RP-CATE), with three distinctive characteristics: 1: We developed a novel architecture by replacing the self-attention mechanism with channel attention and incorporating our proposed Recurrent Perceptron (RP) Module into Transformer, achieving enhanced effectiveness for industrial modeling tasks compared to the original Transformer. 2: We proposed a new data type called Pseudo-Image Data (PID) tailored for channel attention requirements and developed a cyclic sliding window method for generating PID. 3: We introduced the concept of Pseudo-Sequential Data (PSD) and a method for converting industrial datasets into PSD, which enables the RP Module to capture the underlying associations within industrial dataset more effectively. An experiment aimed at hybrid modeling in chemical engineering was conducted by using RP-CATE and the experimental results demonstrate that RP-CATE achieves the best performance compared to other baseline models.
AIDec 9, 2025
Beyond Traditional Diagnostics: Transforming Patient-Side Information into Predictive Insights with Knowledge Graphs and PrototypesYibowen Zhao, Yinan Zhang, Zhixiang Su et al.
Predicting diseases solely from patient-side information, such as demographics and self-reported symptoms, has attracted significant research attention due to its potential to enhance patient awareness, facilitate early healthcare engagement, and improve healthcare system efficiency. However, existing approaches encounter critical challenges, including imbalanced disease distributions and a lack of interpretability, resulting in biased or unreliable predictions. To address these issues, we propose the Knowledge graph-enhanced, Prototype-aware, and Interpretable (KPI) framework. KPI systematically integrates structured and trusted medical knowledge into a unified disease knowledge graph, constructs clinically meaningful disease prototypes, and employs contrastive learning to enhance predictive accuracy, which is particularly important for long-tailed diseases. Additionally, KPI utilizes large language models (LLMs) to generate patient-specific, medically relevant explanations, thereby improving interpretability and reliability. Extensive experiments on real-world datasets demonstrate that KPI outperforms state-of-the-art methods in predictive accuracy and provides clinically valid explanations that closely align with patient narratives, highlighting its practical value for patient-centered healthcare delivery.
CVMar 19, 2025
Cube: A Roblox View of 3D IntelligenceFoundation AI Team, Kiran Bhat, Nishchaie Khanna et al.
Foundation models trained on vast amounts of data have demonstrated remarkable reasoning and generation capabilities in the domains of text, images, audio and video. Our goal at Roblox is to build such a foundation model for 3D intelligence, a model that can support developers in producing all aspects of a Roblox experience, from generating 3D objects and scenes to rigging characters for animation to producing programmatic scripts describing object behaviors. We discuss three key design requirements for such a 3D foundation model and then present our first step towards building such a model. We expect that 3D geometric shapes will be a core data type and describe our solution for 3D shape tokenizer. We show how our tokenization scheme can be used in applications for text-to-shape generation, shape-to-text generation and text-to-scene generation. We demonstrate how these applications can collaborate with existing large language models (LLMs) to perform scene analysis and reasoning. We conclude with a discussion outlining our path to building a fully unified foundation model for 3D intelligence.
SEMar 24, 2025
ModiGen: A Large Language Model-Based Workflow for Multi-Task Modelica Code GenerationJiahui Xiang, Tong Ye, Peiyu Liu et al.
Modelica is a widely adopted language for simulating complex physical systems, yet effective model creation and optimization require substantial domain expertise. Although large language models (LLMs) have demonstrated promising capabilities in code generation, their application to modeling remains largely unexplored. To address this gap, we have developed benchmark datasets specifically designed to evaluate the performance of LLMs in generating Modelica component models and test cases. Our evaluation reveals substantial limitations in current LLMs, as the generated code often fails to simulate successfully. To overcome these challenges, we propose a specialized workflow that integrates supervised fine-tuning, graph retrieval-augmented generation, and feedback optimization to improve the accuracy and reliability of Modelica code generation. The evaluation results demonstrate significant performance gains: the maximum improvement in pass@1 reached 0.3349 for the component generation task and 0.2457 for the test case generation task. This research underscores the potential of LLMs to advance intelligent modeling tools and offers valuable insights for future developments in system modeling and engineering applications.
CLMay 5, 2024
Enabling Patient-side Disease Prediction via the Integration of Patient NarrativesZhixiang Su, Yinan Zhang, Jiazheng Jing et al.
Disease prediction holds considerable significance in modern healthcare, because of its crucial role in facilitating early intervention and implementing effective prevention measures. However, most recent disease prediction approaches heavily rely on laboratory test outcomes (e.g., blood tests and medical imaging from X-rays). Gaining access to such data for precise disease prediction is often a complex task from the standpoint of a patient and is always only available post-patient consultation. To make disease prediction available from patient-side, we propose Personalized Medical Disease Prediction (PoMP), which predicts diseases using patient health narratives including textual descriptions and demographic information. By applying PoMP, patients can gain a clearer comprehension of their conditions, empowering them to directly seek appropriate medical specialists and thereby reducing the time spent navigating healthcare communication to locate suitable doctors. We conducted extensive experiments using real-world data from Haodf to showcase the effectiveness of PoMP.
LGAug 23, 2025
UM3: Unsupervised Map to Map MatchingChaolong Ying, Yinan Zhang, Lei Zhang et al.
Map-to-map matching is a critical task for aligning spatial data across heterogeneous sources, yet it remains challenging due to the lack of ground truth correspondences, sparse node features, and scalability demands. In this paper, we propose an unsupervised graph-based framework that addresses these challenges through three key innovations. First, our method is an unsupervised learning approach that requires no training data, which is crucial for large-scale map data where obtaining labeled training samples is challenging. Second, we introduce pseudo coordinates that capture the relative spatial layout of nodes within each map, which enhances feature discriminability and enables scale-invariant learning. Third, we design an mechanism to adaptively balance feature and geometric similarity, as well as a geometric-consistent loss function, ensuring robustness to noisy or incomplete coordinate data. At the implementation level, to handle large-scale maps, we develop a tile-based post-processing pipeline with overlapping regions and majority voting, which enables parallel processing while preserving boundary coherence. Experiments on real-world datasets demonstrate that our method achieves state-of-the-art accuracy in matching tasks, surpassing existing methods by a large margin, particularly in high-noise and large-scale scenarios. Our framework provides a scalable and practical solution for map alignment, offering a robust and efficient alternative to traditional approaches.
CVJan 20, 2024
Large-scale Reinforcement Learning for Diffusion ModelsYinan Zhang, Eric Tzeng, Yilun Du et al.
Text-to-image diffusion models are a class of deep generative models that have demonstrated an impressive capacity for high-quality image generation. However, these models are susceptible to implicit biases that arise from web-scale text-image training pairs and may inaccurately model aspects of images we care about. This can result in suboptimal samples, model bias, and images that do not align with human ethics and preferences. In this paper, we present an effective scalable algorithm to improve diffusion models using Reinforcement Learning (RL) across a diverse set of reward functions, such as human preference, compositionality, and fairness over millions of images. We illustrate how our approach substantially outperforms existing methods for aligning diffusion models with human preferences. We further illustrate how this substantially improves pretrained Stable Diffusion (SD) models, generating samples that are preferred by humans 80.3% of the time over those from the base SD model while simultaneously improving both the composition and diversity of generated samples.
IRJun 9, 2021
Initialization Matters: Regularizing Manifold-informed Initialization for Neural Recommendation SystemsYinan Zhang, Boyang Li, Yong Liu et al.
Proper initialization is crucial to the optimization and the generalization of neural networks. However, most existing neural recommendation systems initialize the user and item embeddings randomly. In this work, we propose a new initialization scheme for user and item embeddings called Laplacian Eigenmaps with Popularity-based Regularization for Isolated Data (LEPORID). LEPORID endows the embeddings with information regarding multi-scale neighborhood structures on the data manifold and performs adaptive regularization to compensate for high embedding variance on the tail of the data distribution. Exploiting matrix sparsity, LEPORID embeddings can be computed efficiently. We evaluate LEPORID in a wide range of neural recommendation models. In contrast to the recent surprising finding that the simple K-nearest-neighbor (KNN) method often outperforms neural recommendation systems, we show that existing neural systems initialized with LEPORID often perform on par or better than KNN. To maximize the effects of the initialization, we propose the Dual-Loss Residual Recommendation (DLR2) network, which, when initialized with LEPORID, substantially outperforms both traditional and state-of-the-art neural recommender systems.
CVDec 17, 2020
Neural Radiance Flow for 4D View Synthesis and Video ProcessingYilun Du, Yinan Zhang, Hong-Xing Yu et al.
We present a method, Neural Radiance Flow (NeRFlow),to learn a 4D spatial-temporal representation of a dynamic scene from a set of RGB images. Key to our approach is the use of a neural implicit representation that learns to capture the 3D occupancy, radiance, and dynamics of the scene. By enforcing consistency across different modalities, our representation enables multi-view rendering in diverse dynamic scenes, including water pouring, robotic interaction, and real images, outperforming state-of-the-art methods for spatial-temporal view synthesis. Our approach works even when inputs images are captured with only one camera. We further demonstrate that the learned representation can serve as an implicit scene prior, enabling video processing tasks such as image super-resolution and de-noising without any additional supervision.
CVOct 18, 2020
Image GANs meet Differentiable Rendering for Inverse Graphics and Interpretable 3D Neural RenderingYuxuan Zhang, Wenzheng Chen, Huan Ling et al.
Differentiable rendering has paved the way to training neural networks to perform "inverse graphics" tasks such as predicting 3D geometry from monocular photographs. To train high performing models, most of the current approaches rely on multi-view imagery which are not readily available in practice. Recent Generative Adversarial Networks (GANs) that synthesize images, in contrast, seem to acquire 3D knowledge implicitly during training: object viewpoints can be manipulated by simply manipulating the latent codes. However, these latent codes often lack further physical interpretation and thus GANs cannot easily be inverted to perform explicit 3D reasoning. In this paper, we aim to extract and disentangle 3D knowledge learned by generative models by utilizing differentiable renderers. Key to our approach is to exploit GANs as a multi-view data generator to train an inverse graphics network using an off-the-shelf differentiable renderer, and the trained inverse graphics network as a teacher to disentangle the GAN's latent code into interpretable 3D properties. The entire architecture is trained iteratively using cycle consistency losses. We show that our approach significantly outperforms state-of-the-art inverse graphics networks trained on existing datasets, both quantitatively and via user studies. We further showcase the disentangled GAN as a controllable 3D "neural renderer", complementing traditional graphics renderers.
IRApr 24, 2020
Learning Hierarchical Review Graph Representations for RecommendationYong Liu, Susen Yang, Yinan Zhang et al.
The user review data have been demonstrated to be effective in solving different recommendation problems. Previous review-based recommendation methods usually employ sophisticated compositional models, such as Recurrent Neural Networks (RNN) and Convolutional Neural Networks (CNN), to learn semantic representations from the review data for recommendation. However, these methods mainly capture the local dependency between neighbouring words in a word window, and they treat each review equally. Therefore, they may not be effective in capturing the global dependency between words, and tend to be easily biased by noise review information. In this paper, we propose a novel review-based recommendation model, named Review Graph Neural Network (RGNN). Specifically, RGNN builds a specific review graph for each individual user/item, which provides a global view about the user/item properties to help weaken the biases caused by noise review information. A type-aware graph attention mechanism is developed to learn semantic embeddings of words. Moreover, a personalized graph pooling operator is proposed to learn hierarchical representations of the review graph to form the semantic representation for each user/item. We compared RGNN with state-of-the-art review-based recommendation approaches on two real-world datasets. The experimental results indicate that RGNN consistently outperforms baseline methods, in terms of Mean Square Error (MSE).
IRFeb 20, 2020
Towards a Soft Faceted Browsing Scheme for Information AccessYinan Zhang, Parikshit Sondhi, Anjan Goswami et al.
Faceted browsing is a commonly supported feature of user interfaces for access to information. Existing interfaces generally treat facet values selected by a user as hard filters and respond to the user by only displaying information items strictly satisfying the filters and in their original ranking order. We propose a novel alternative strategy for faceted browsing, called soft faceted browsing, where the system also includes some possibly relevant items outside the selected filter in a non-intrusive way and re-ranks the items to better satisfy the user's information need. Such a soft faceted browsing strategy can be beneficial when the user does not have a very confident and strict preference for the selected facet values, and is especially appropriate for applications such as e-commerce search where the user would like to explore a larger space before finalizing a purchasing decision. We propose a probabilistic framework for modeling and solving the soft faceted browsing problem, and apply the framework to study the case of facet filter selection in e-commerce search engines. Preliminary experiment results demonstrate the soft faceted browsing scheme is better than the traditional faceted browsing scheme in terms of its efficiency in helping users navigate in the information space.
CLNov 7, 2019
Explicit Pairwise Word Interaction Modeling Improves Pretrained Transformers for English Semantic Similarity TasksYinan Zhang, Raphael Tang, Jimmy Lin
In English semantic similarity tasks, classic word embedding-based approaches explicitly model pairwise "interactions" between the word representations of a sentence pair. Transformer-based pretrained language models disregard this notion, instead modeling pairwise word interactions globally and implicitly through their self-attention mechanism. In this paper, we hypothesize that introducing an explicit, constrained pairwise word interaction mechanism to pretrained language models improves their effectiveness on semantic similarity tasks. We validate our hypothesis using BERT on four tasks in semantic textual similarity and answer sentence selection. We demonstrate consistent improvements in quality by adding an explicit pairwise word interaction module to BERT.
ROJun 20, 2019
PuzzleFlex: kinematic motion of chains with loose jointsSamuel Lensgraf, Karim Itani, Yinan Zhang et al.
This paper presents a method of computing free motions of a planar assembly of rigid bodies connected by loose joints. Joints are modeled using local distance constraints, which are then linearized with respect to configuration space velocities, yielding a linear programming formulation that allows analysis of systems with thousands of rigid bodies. Potential applications include analysis of collections of modular robots, structural stability perturbation analysis, tolerance analysis for mechanical systems, and formation control of mobile robots.
IRMar 19, 2019
Diversity-Promoting Deep Reinforcement Learning for Interactive RecommendationYong Liu, Yinan Zhang, Qiong Wu et al.
Interactive recommendation that models the explicit interactions between users and the recommender system has attracted a lot of research attentions in recent years. Most previous interactive recommendation systems only focus on optimizing recommendation accuracy while overlooking other important aspects of recommendation quality, such as the diversity of recommendation results. In this paper, we propose a novel recommendation model, named \underline{D}iversity-promoting \underline{D}eep \underline{R}einforcement \underline{L}earning (D$^2$RL), which encourages the diversity of recommendation results in interaction recommendations. More specifically, we adopt a Determinantal Point Process (DPP) model to generate diverse, while relevant item recommendations. A personalized DPP kernel matrix is maintained for each user, which is constructed from two parts: a fixed similarity matrix capturing item-item similarity, and the relevance of items dynamically learnt through an actor-critic reinforcement learning framework. We performed extensive offline experiments as well as simulated online experiments with real world datasets to demonstrate the effectiveness of the proposed model.
LGJan 19, 2019
Towards Physically Safe Reinforcement Learning under SupervisionYinan Zhang, Devin Balkcom, Haoxiang Li
This paper addresses the question of how a previously available control policy $π_s$ can be used as a supervisor to more quickly and safely train a new learned control policy $π_L$ for a robot. A weighted average of the supervisor and learned policies is used during trials, with a heavier weight initially on the supervisor, in order to allow safe and useful physical trials while the learned policy is still ineffective. During the process, the weight is adjusted to favor the learned policy. As weights are adjusted, the learned network must compensate so as to give safe and reasonable outputs under the different weights. A pioneer network is introduced that pre-learns a policy that performs similarly to the current learned policy under the planned next step for new weights; this pioneer network then replaces the currently learned network in the next set of trials. Experiments in OpenAI Gym demonstrate the effectiveness of the proposed method.
AIJun 21, 2018
Reductive Clustering: An Efficient Linear-time Graph-based Divisive Cluster Analysis ApproachChing Tarn, Yinan Zhang, Ye Feng
We propose an efficient linear-time graph-based divisive cluster analysis approach called Reductive Clustering. The approach tries to reveal the hierarchical structural information through reducing the graph into a more concise one repeatedly. With the reductions, the original graph can be divided into subgraphs recursively, and a lite informative dendrogram is constructed based on the divisions. The reduction consists of three steps: selection, connection, and partition. First a subset of vertices of the graph are selected as representatives to build a concise graph. The representatives are re-connected to maintain a consistent structure with the previous graph. If possible, the concise graph is divided into subgraphs, and each subgraph is further reduced recursively until the termination condition is met. We discuss the approach, along with several selection and connection methods, in detail both theoretically and experimentally in this paper. Our implementations run in linear time and achieve outstanding performance on various types of datasets. Experimental results show that they outperform state-of-the-art clustering algorithms with significantly less computing resource requirements.