Jingling Li

LG
h-index47
12papers
853citations
Novelty64%
AI Score49

12 Papers

LGJul 13, 2022
Hindsight Learning for MDPs with Exogenous Inputs

Sean R. Sinclair, Felipe Frujeri, Ching-An Cheng et al.

Many resource management problems require sequential decision-making under uncertainty, where the only uncertainty affecting the decision outcomes are exogenous variables outside the control of the decision-maker. We model these problems as Exo-MDPs (Markov Decision Processes with Exogenous Inputs) and design a class of data-efficient algorithms for them termed Hindsight Learning (HL). Our HL algorithms achieve data efficiency by leveraging a key insight: having samples of the exogenous variables, past decisions can be revisited in hindsight to infer counterfactual consequences that can accelerate policy improvements. We compare HL against classic baselines in the multi-secretary and airline revenue management problems. We also scale our algorithms to a business-critical cloud resource management problem -- allocating Virtual Machines (VMs) to physical machines, and simulate their performance with real datasets from a large public cloud provider. We find that HL algorithms outperform domain-specific heuristics, as well as state-of-the-art reinforcement learning methods.

LGAug 14, 2024
How to Solve Contextual Goal-Oriented Problems with Offline Datasets?

Ying Fan, Jingling Li, Adith Swaminathan et al.

We present a novel method, Contextual goal-Oriented Data Augmentation (CODA), which uses commonly available unlabeled trajectories and context-goal pairs to solve Contextual Goal-Oriented (CGO) problems. By carefully constructing an action-augmented MDP that is equivalent to the original MDP, CODA creates a fully labeled transition dataset under training contexts without additional approximation error. We conduct a novel theoretical analysis to demonstrate CODA's capability to solve CGO problems in the offline data setup. Empirical results also showcase the effectiveness of CODA, which outperforms other baseline methods across various context-goal relationships of CGO problem. This approach offers a promising direction to solving CGO problems using offline datasets.

AIFeb 16
Position: Introspective Experience from Conversational Environments as a Path to Better Learning

Claudiu Cristian Musat, Jackson Tolins, Diego Antognini et al.

Current approaches to AI training treat reasoning as an emergent property of scale. We argue instead that robust reasoning emerges from linguistic self-reflection, itself internalized from high-quality social interaction. Drawing on Vygotskian developmental psychology, we advance three core positions centered on Introspection. First, we argue for the Social Genesis of the Private Mind: learning from conversational environments rises to prominence as a new way to make sense of the world; the friction of aligning with another agent, internal or not, refines and crystallizes the reasoning process. Second, we argue that dialogically scaffolded introspective experiences allow agents to engage in sense-making that decouples learning from immediate data streams, transforming raw environmental data into rich, learnable narratives. Finally, we contend that Dialogue Quality is the New Data Quality: the depth of an agent's private reasoning, and its efficiency regarding test-time compute, is determined by the diversity and rigor of the dialogues it has mastered. We conclude that optimizing these conversational scaffolds is the primary lever for the next generation of general intelligence.

IRAug 23, 2024
CSRec: Rethinking Sequential Recommendation from A Causal Perspective

Xiaoyu Liu, Jiaxin Yuan, Yuhang Zhou et al.

The essence of sequential recommender systems (RecSys) lies in understanding how users make decisions. Most existing approaches frame the task as sequential prediction based on users' historical purchase records. While effective in capturing users' natural preferences, this formulation falls short in accurately modeling actual recommendation scenarios, particularly in accounting for how unsuccessful recommendations influence future purchases. Furthermore, the impact of the RecSys itself on users' decisions has not been appropriately isolated and quantitatively analyzed. To address these challenges, we propose a novel formulation of sequential recommendation, termed Causal Sequential Recommendation (CSRec). Instead of predicting the next item in the sequence, CSRec aims to predict the probability of a recommended item's acceptance within a sequential context and backtrack how current decisions are made. Critically, CSRec facilitates the isolation of various factors that affect users' final decisions, especially the influence of the recommender system itself, thereby opening new avenues for the design of recommender systems. CSRec can be seamlessly integrated into existing methodologies. Experimental evaluations on both synthetic and real-world datasets demonstrate that the proposed implementation significantly improves upon state-of-the-art baselines.

AIFeb 17
Improving Interactive In-Context Learning from Natural Language Feedback

Martin Klissarov, Jonathan Cook, Diego Antognini et al.

Adapting one's thought process based on corrective feedback is an essential ability in human learning, particularly in collaborative settings. In contrast, the current large language model training paradigm relies heavily on modeling vast, static corpora. While effective for knowledge acquisition, it overlooks the interactive feedback loops essential for models to adapt dynamically to their context. In this work, we propose a framework that treats this interactive in-context learning ability not as an emergent property, but as a distinct, trainable skill. We introduce a scalable method that transforms single-turn verifiable tasks into multi-turn didactic interactions driven by information asymmetry. We first show that current flagship models struggle to integrate corrective feedback on hard reasoning tasks. We then demonstrate that models trained with our approach dramatically improve the ability to interactively learn from language feedback. More specifically, the multi-turn performance of a smaller model nearly reaches that of a model an order of magnitude larger. We also observe robust out-of-distribution generalization: interactive training on math problems transfers to diverse domains like coding, puzzles and maze navigation. Our qualitative analysis suggests that this improvement is due to an enhanced in-context plasticity. Finally, we show that this paradigm offers a unified path to self-improvement. By training the model to predict the teacher's critiques, effectively modeling the feedback environment, we convert this external signal into an internal capability, allowing the model to self-correct even without a teacher.

CLMar 13, 2024
Prompting Fairness: Integrating Causality to Debias Large Language Models

Jingling Li, Zeyu Tang, Xiaoyu Liu et al. · stanford

Large language models (LLMs), despite their remarkable capabilities, are susceptible to generating biased and discriminatory responses. As LLMs increasingly influence high-stakes decision-making (e.g., hiring and healthcare), mitigating these biases becomes critical. In this work, we propose a causality-guided debiasing framework to tackle social biases, aiming to reduce the objectionable dependence between LLMs' decisions and the social information in the input. Our framework introduces a novel perspective to identify how social information can affect an LLM's decision through different causal pathways. Leveraging these causal insights, we outline principled prompting strategies that regulate these pathways through selection mechanisms. This framework not only unifies existing prompting-based debiasing techniques, but also opens up new directions for reducing bias by encouraging the model to prioritize fact-based reasoning over reliance on biased social cues. We validate our framework through extensive experiments on real-world datasets across multiple domains, demonstrating its effectiveness in debiasing LLM decisions, even with only black-box access to the model.

LGOct 27, 2021
VQ-GNN: A Universal Framework to Scale up Graph Neural Networks using Vector Quantization

Mucong Ding, Kezhi Kong, Jingling Li et al.

Most state-of-the-art Graph Neural Networks (GNNs) can be defined as a form of graph convolution which can be realized by message passing between direct neighbors or beyond. To scale such GNNs to large graphs, various neighbor-, layer-, or subgraph-sampling techniques are proposed to alleviate the "neighbor explosion" problem by considering only a small subset of messages passed to the nodes in a mini-batch. However, sampling-based methods are difficult to apply to GNNs that utilize many-hops-away or global context each layer, show unstable performance for different tasks and datasets, and do not speed up model inference. We propose a principled and fundamentally different approach, VQ-GNN, a universal framework to scale up any convolution-based GNNs using Vector Quantization (VQ) without compromising the performance. In contrast to sampling-based techniques, our approach can effectively preserve all the messages passed to a mini-batch of nodes by learning and updating a small number of quantized reference vectors of global node representations, using VQ within each GNN layer. Our framework avoids the "neighbor explosion" problem of GNNs using quantized representations combined with a low-rank version of the graph convolution matrix. We show that such a compact low-rank version of the gigantic convolution matrix is sufficient both theoretically and experimentally. In company with VQ, we design a novel approximated message passing algorithm and a nontrivial back-propagation rule for our framework. Experiments on various types of GNN backbones demonstrate the scalability and competitive performance of our framework on large-graph node classification and link prediction benchmarks.

LGDec 23, 2020
How Does a Neural Network's Architecture Impact Its Robustness to Noisy Labels?

Jingling Li, Mozhi Zhang, Keyulu Xu et al.

Noisy labels are inevitable in large real-world datasets. In this work, we explore an area understudied by previous works -- how the network's architecture impacts its robustness to noisy labels. We provide a formal framework connecting the robustness of a network to the alignments between its architecture and target/noise functions. Our framework measures a network's robustness via the predictive power in its representations -- the test performance of a linear model trained on the learned representations using a small set of clean labels. We hypothesize that a network is more robust to noisy labels if its architecture is more aligned with the target function than the noise. To support our hypothesis, we provide both theoretical and empirical evidence across various neural network architectures and different domains. We also find that when the network is well-aligned with the target function, its predictive power in representations could improve upon state-of-the-art (SOTA) noisy-label-training methods in terms of test accuracy and even outperform sophisticated methods that use clean labels.

LGSep 24, 2020
How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks

Keyulu Xu, Mozhi Zhang, Jingling Li et al.

We study how neural networks trained by gradient descent extrapolate, i.e., what they learn outside the support of the training distribution. Previous works report mixed empirical results when extrapolating with neural networks: while feedforward neural networks, a.k.a. multilayer perceptrons (MLPs), do not extrapolate well in certain simple tasks, Graph Neural Networks (GNNs) -- structured networks with MLP modules -- have shown some success in more complex tasks. Working towards a theoretical explanation, we identify conditions under which MLPs and GNNs extrapolate well. First, we quantify the observation that ReLU MLPs quickly converge to linear functions along any direction from the origin, which implies that ReLU MLPs do not extrapolate most nonlinear functions. But, they can provably learn a linear target function when the training distribution is sufficiently "diverse". Second, in connection to analyzing the successes and limitations of GNNs, these results suggest a hypothesis for which we provide theoretical and empirical evidence: the success of GNNs in extrapolating algorithmic tasks to new data (e.g., larger graphs or edge weights) relies on encoding task-specific non-linearities in the architecture or features. Our theoretical analysis builds on a connection of over-parameterized networks to the neural tangent kernel. Empirically, our theory holds across different training settings.

LGJan 14, 2020
Understanding Generalization in Deep Learning via Tensor Methods

Jingling Li, Yanchao Sun, Jiahao Su et al.

Deep neural networks generalize well on unseen data though the number of parameters often far exceeds the number of training examples. Recently proposed complexity measures have provided insights to understanding the generalizability in neural networks from perspectives of PAC-Bayes, robustness, overparametrization, compression and so on. In this work, we advance the understanding of the relations between the network's architecture and its generalizability from the compression perspective. Using tensor analysis, we propose a series of intuitive, data-dependent and easily-measurable properties that tightly characterize the compressibility and generalizability of neural networks; thus, in practice, our generalization bound outperforms the previous compression-based ones, especially for neural networks using tensors as their weight kernels (e.g. CNNs). Moreover, these intuitive measurements provide further insights into designing neural network architectures with properties favorable for better/guaranteed generalizability. Our experimental results demonstrate that through the proposed measurable properties, our generalization error bound matches the trend of the test error well. Our theoretical analysis further provides justifications for the empirical success and limitations of some widely-used tensor-based compression approaches. We also discover the improvements to the compressibility and robustness of current neural networks when incorporating tensor operations via our proposed layer-wise structure.

LGMay 30, 2019
What Can Neural Networks Reason About?

Keyulu Xu, Jingling Li, Mozhi Zhang et al.

Neural networks have succeeded in many reasoning tasks. Empirically, these tasks require specialized network structures, e.g., Graph Neural Networks (GNNs) perform well on many such tasks, but less structured networks fail. Theoretically, there is limited understanding of why and when a network structure generalizes better than others, although they have equal expressive power. In this paper, we develop a framework to characterize which reasoning tasks a network can learn well, by studying how well its computation structure aligns with the algorithmic structure of the relevant reasoning process. We formally define this algorithmic alignment and derive a sample complexity bound that decreases with better alignment. This framework offers an explanation for the empirical success of popular reasoning models, and suggests their limitations. As an example, we unify seemingly different reasoning tasks, such as intuitive physics, visual question answering, and shortest paths, via the lens of a powerful algorithmic paradigm, dynamic programming (DP). We show that GNNs align with DP and thus are expected to solve these tasks. On several reasoning tasks, our theory is supported by empirical results.

MLMay 25, 2018
Tensorial Neural Networks: Generalization of Neural Networks and Application to Model Compression

Jiahao Su, Jingling Li, Bobby Bhattacharjee et al.

We propose tensorial neural networks (TNNs), a generalization of existing neural networks by extending tensor operations on low order operands to those on high order ones. The problem of parameter learning is challenging, as it corresponds to hierarchical nonlinear tensor decomposition. We propose to solve the learning problem using stochastic gradient descent by deriving nontrivial backpropagation rules in generalized tensor algebra we introduce. Our proposed TNNs has three advantages over existing neural networks: (1) TNNs naturally apply to high order input object and thus preserve the multi-dimensional structure in the input, as there is no need to flatten the data. (2) TNNs interpret designs of existing neural network architectures. (3) Mapping a neural network to TNNs with the same expressive power results in a TNN of fewer parameters. TNN based compression of neural network improves existing low-rank approximation based compression methods as TNNs exploit two other types of invariant structures, periodicity and modulation, in addition to the low rankness. Experiments on LeNet-5 (MNIST), ResNet-32 (CIFAR10) and ResNet-50 (ImageNet) demonstrate that our TNN based compression outperforms (5% test accuracy improvement universally on CIFAR10) the state-of-the-art low-rank approximation based compression methods under the same compression rate, besides achieving orders of magnitude faster convergence rates due to the efficiency of TNNs.