Faramarz Fekri

LG
h-index38
34papers
667citations
Novelty63%
AI Score61

34 Papers

LGJan 4, 2023
NODAGS-Flow: Nonlinear Cyclic Causal Structure Learning

Muralikrishnna G. Sethuraman, Romain Lopez, Rahul Mohan et al. · berkeley, gatech

Learning causal relationships between variables is a well-studied problem in statistics, with many important applications in science. However, modeling real-world systems remain challenging, as most existing algorithms assume that the underlying causal graph is acyclic. While this is a convenient framework for developing theoretical developments about causal reasoning and inference, the underlying modeling assumption is likely to be violated in real systems, because feedback loops are common (e.g., in biological systems). Although a few methods search for cyclic causal models, they usually rely on some form of linearity, which is also limiting, or lack a clear underlying probabilistic model. In this work, we propose a novel framework for learning nonlinear cyclic causal graphical models from interventional data, called NODAGS-Flow. We perform inference via direct likelihood optimization, employing techniques from residual normalizing flows for likelihood estimation. Through synthetic experiments and an application to single-cell high-content perturbation screening data, we show significant performance improvements with our approach compared to state-of-the-art methods with respect to structure recovery and predictive performance.

CLSep 2, 2024Code
The Compressor-Retriever Architecture for Language Model OS

Yuan Yang, Siheng Xiong, Ehsan Shareghi et al.

Recent advancements in large language models (LLMs) have significantly enhanced their capacity to aggregate and process information across multiple modalities, enabling them to perform a wide range of tasks such as multimodal data querying, tool usage, web interactions, and handling long documents. These capabilities pave the way for transforming LLMs from mere chatbots into general-purpose agents capable of interacting with the real world. This paper explores the concept of using a language model as the core component of an operating system (OS), effectively acting as a CPU that processes data stored in a context window, which functions as RAM. A key challenge in realizing such an LM OS is managing the life-long context and ensuring statefulness across sessions, a feature limited by the current session-based interaction paradigm due to context window size limit. To address this, we introduce compressor-retriever, a model-agnostic architecture designed for life-long context management. Unlike other long-context solutions such as retrieval-augmented generation, our approach exclusively uses the base model's forward function to compress and retrieve context, ensuring end-to-end differentiability. Preliminary experiments demonstrate the effectiveness of this architecture in in-context learning tasks, marking a step towards the development of a fully stateful LLM OS. Project repo available at: https://github.com/gblackout/LM-OS

LGMar 21
RECLAIM: Cyclic Causal Discovery Amid Measurement Noise

Muralikrishnna G. Sethuraman, Faramarz Fekri · gatech

Uncovering causal relationships is a fundamental problem across science and engineering. However, most existing causal discovery methods assume acyclicity and direct access to the system variables -- assumptions that fail to hold in many real-world settings. For instance, in genomics, cyclic regulatory networks are common, and measurements are often corrupted by instrumental noise. To address these challenges, we propose RECLAIM, a causal discovery framework that natively handles both cycles and measurement noise. RECLAIM learns the causal graph structure by maximizing the likelihood of the observed measurements via expectation-maximization (EM), using residual normalizing flows for tractable likelihood computation. We consider two measurement models: (i) Gaussian additive noise, and (ii) a linear measurement system with additive Gaussian noise. We provide theoretical consistency guarantees for both the settings. Experiments on synthetic data and real-world protein signaling datasets demonstrate the efficacy of the proposed method.

ITMar 17, 2022
A Density Evolution framework for Preferential Recovery of Covariance and Causal Graphs from Compressed Measurements

Muralikrishnna G. Sethuraman, Hang Zhang, Faramarz Fekri · gatech

In this paper, we propose a general framework for designing sensing matrix $\boldsymbol{A} \in \mathbb{R}^{d\times p}$, for estimation of sparse covariance matrix from compressed measurements of the form $\boldsymbol{y} = \boldsymbol{A}\boldsymbol{x} + \boldsymbol{n}$, where $\boldsymbol{y}, \boldsymbol{n} \in \mathbb{R}^d$, and $\boldsymbol{x} \in \mathbb{R}^p$. By viewing covariance recovery as inference over factor graphs via message passing algorithm, ideas from coding theory, such as \textit{Density Evolution} (DE), are leveraged to construct a framework for the design of the sensing matrix. The proposed framework can handle both (1) regular sensing, i.e., equal importance is given to all entries of the covariance, and (2) preferential sensing, i.e., higher importance is given to a part of the covariance matrix. Through experiments, we show that the sensing matrix designed via density evolution can match the state-of-the-art for covariance recovery in the regular sensing paradigm and attain improved performance in the preferential sensing regime. Additionally, we study the feasibility of causal graph structure recovery using the estimated covariance matrix obtained from the compressed measurements.

LGJun 9, 2022
Temporal Inductive Logic Reasoning over Hypergraphs

Yuan Yang, Siheng Xiong, Ali Payani et al.

Inductive logic reasoning is a fundamental task in graph analysis, which aims to generalize patterns from data. This task has been extensively studied for traditional graph representations, such as knowledge graphs (KGs), using techniques like inductive logic programming (ILP). Existing ILP methods assume learning from KGs with static facts and binary relations. Beyond KGs, graph structures are widely present in other applications such as procedural instructions, scene graphs, and program executions. While ILP is beneficial for these applications, applying it to those graphs is nontrivial: they are more complex than KGs, which usually involve timestamps and n-ary relations, effectively a type of hypergraph with temporal events. In this work, we propose temporal inductive logic reasoning (TILR), an ILP method that reasons on temporal hypergraphs. To enable hypergraph reasoning, we introduce the multi-start random B-walk, a novel graph traversal method for hypergraphs. By combining it with a path-consistency algorithm, TILR learns logic rules by generalizing from both temporal and relational data. To address the lack of hypergraph benchmarks, we create and release two temporal hypergraph datasets: YouCook2-HG and nuScenes-HG. Experiments on these benchmarks demonstrate that TILR achieves superior reasoning capability over various strong baselines.

CLFeb 2
Scaling Search-Augmented LLM Reasoning via Adaptive Information Control

Siheng Xiong, Oguzhan Gungordu, Blair Johnson et al.

Search-augmented reasoning agents interleave multi-step reasoning with external information retrieval, but uncontrolled retrieval often leads to redundant evidence, context saturation, and unstable learning. Existing approaches rely on outcome-based reinforcement learning (RL), which provides limited guidance for regulating information acquisition. We propose DeepControl, a framework for adaptive information control based on a formal notion of information utility, which measures the marginal value of retrieved evidence under a given reasoning state. Building on this utility, we introduce retrieval continuation and granularity control mechanisms that selectively regulate when to continue and stop retrieval, and how much information to expand. An annealed control strategy enables the agent to internalize effective information acquisition behaviors during training. Extensive experiments across seven benchmarks demonstrate that our method consistently outperforms strong baselines. In particular, our approach achieves average performance improvements of 9.4% and 8.6% on Qwen2.5-7B and Qwen2.5-3B, respectively, over strong outcome-based RL baselines, and consistently outperforms both retrieval-free and retrieval-based reasoning methods without explicit information control. These results highlight the importance of adaptive information control for scaling search-augmented reasoning agents to complex, real-world information environments.

AIDec 8, 2022
Generalizing LTL Instructions via Future Dependent Options

Duo Xu, Faramarz Fekri

In many real-world applications of control system and robotics, linear temporal logic (LTL) is a widely-used task specification language which has a compositional grammar that naturally induces temporally extended behaviours across tasks, including conditionals and alternative realizations. An important problem in RL with LTL tasks is to learn task-conditioned policies which can zero-shot generalize to new LTL instructions not observed in the training. However, because symbolic observation is often lossy and LTL tasks can have long time horizon, previous works can suffer from issues such as training sampling inefficiency and infeasibility or sub-optimality of the found solutions. In order to tackle these issues, this paper proposes a novel multi-task RL algorithm with improved learning efficiency and optimality. To achieve the global optimality of task completion, we propose to learn options dependent on the future subgoals via a novel off-policy approach. In order to propagate the rewards of satisfying future subgoals back more efficiently, we propose to train a multi-step value function conditioned on the subgoal sequence which is updated with Monte Carlo estimates of multi-step discounted returns. In experiments on three different domains, we evaluate the LTL generalization capability of the agent trained by the proposed method, showing its advantage over previous representative methods.

AIJan 28
PathWise: Planning through World Model for Automated Heuristic Design via Self-Evolving LLMs

Oguzhan Gungordu, Siheng Xiong, Faramarz Fekri

Large Language Models (LLMs) have enabled automated heuristic design (AHD) for combinatorial optimization problems (COPs), but existing frameworks' reliance on fixed evolutionary rules and static prompt templates often leads to myopic heuristic generation, redundant evaluations, and limited reasoning about how new heuristics should be derived. We propose a novel multi-agent reasoning framework, referred to as Planning through World Model for Automated Heuristic Design via Self-Evolving LLMs (PathWise), which formulates heuristic generation as a sequential decision process over an entailment graph serving as a compact, stateful memory of the search trajectory. This approach allows the system to carry forward past decisions and reuse or avoid derivation information across generations. A policy agent plans evolutionary actions, a world model agent generates heuristic rollouts conditioned on those actions, and critic agents provide routed reflections summarizing lessons from prior steps, shifting LLM-based AHD from trial-and-error evolution toward state-aware planning through reasoning. Experiments across diverse COPs show that PathWise converges faster to better heuristics, generalizes across different LLM backbones, and scales to larger problem sizes.

MLMay 6, 2022
Structure Learning in Graphical Models from Indirect Observations

Hang Zhang, Afshin Abdi, Faramarz Fekri

This paper considers learning of the graphical structure of a $p$-dimensional random vector $X \in R^p$ using both parametric and non-parametric methods. Unlike the previous works which observe $x$ directly, we consider the indirect observation scenario in which samples $y$ are collected via a sensing matrix $A \in R^{d\times p}$, and corrupted with some additive noise $w$, i.e, $Y = AX + W$. For the parametric method, we assume $X$ to be Gaussian, i.e., $x\in R^p\sim N(μ, Σ)$ and $Σ\in R^{p\times p}$. For the first time, we show that the correct graphical structure can be correctly recovered under the indefinite sensing system ($d < p$) using insufficient samples ($n < p$). In particular, we show that for the exact recovery, we require dimension $d = Ω(p^{0.8})$ and sample number $n = Ω(p^{0.8}\log^3 p)$. For the nonparametric method, we assume a nonparanormal distribution for $X$ rather than Gaussian. Under mild conditions, we show that our graph-structure estimator can obtain the correct structure. We derive the minimum sample number $n$ and dimension $d$ as $n\gtrsim (deg)^4 \log^4 n$ and $d \gtrsim p + (deg\cdot\log(d-p))^{β/4}$, respectively, where deg is the maximum Markov blanket in the graphical model and $β> 0$ is some fixed positive constant. Additionally, we obtain a non-asymptotic uniform bound on the estimation error of the CDF of $X$ from indirect observations with inexact knowledge of the noise distribution. To the best of our knowledge, this bound is derived for the first time and may serve as an independent interest. Numerical experiments on both real-world and synthetic data are provided confirm the theoretical results.

LGMay 15
SCOUT: Cyclic Causal Discovery Under Soft Interventions with Unknown Targets

Alpar Turkoglu, Muralikrishnna G. Sethuraman, Faramarz Fekri

Learning causal relationships between variables from data is a fundamental research area with many applications across disciplines. Most existing causal discovery algorithms rely on the assumptions that (i) the underlying system is acyclic, (ii) the exogenous noise variables are Gaussian, and (iii) the intervention targets for the data-generating experiments are known. While these assumptions simplify the analysis, they are violated in real-life systems. Most existing methods that address these issues either assume the underlying model is linear or are constrained to operate in limited interventional settings. To that end, we propose SCOUT, a novel causal discovery framework for learning nonlinear cyclic causal relationships from soft interventional data with unknown targets. Our approach maximizes the data log-likelihood to recover the graph structure, using two normalizing-flow architectures: contractive residual flows and neural spline flows. Through experiments on synthetic and real-world data, we show that SCOUT outperforms state-of-the-art methods in both causal graph recovery and unknown target recovery across various interventional and noise settings.

CLJan 12, 2024
Large Language Models Can Learn Temporal Reasoning

Siheng Xiong, Ali Payani, Ramana Kompella et al.

While large language models (LLMs) have demonstrated remarkable reasoning capabilities, they are not without their flaws and inaccuracies. Recent studies have introduced various methods to mitigate these limitations. Temporal reasoning (TR), in particular, presents a significant challenge for LLMs due to its reliance on diverse temporal concepts and intricate temporal logic. In this paper, we propose TG-LLM, a novel framework towards language-based TR. Instead of reasoning over the original context, we adopt a latent representation, temporal graph (TG) that enhances the learning of TR. A synthetic dataset (TGQA), which is fully controllable and requires minimal supervision, is constructed for fine-tuning LLMs on this text-to-TG translation task. We confirmed in experiments that the capability of TG translation learned on our dataset can be transferred to other TR tasks and benchmarks. On top of that, we teach LLM to perform deliberate reasoning over the TGs via Chain-of-Thought (CoT) bootstrapping and graph data augmentation. We observed that those strategies, which maintain a balance between usefulness and diversity, bring more reliable CoTs and final results than the vanilla CoT distillation.

CLJun 19, 2024Code
Can LLMs Reason in the Wild with Programs?

Yuan Yang, Siheng Xiong, Ali Payani et al.

Large Language Models (LLMs) have shown superior capability to solve reasoning problems with programs. While being a promising direction, most of such frameworks are trained and evaluated in settings with a prior knowledge of task requirements. However, as LLMs become more capable, it is necessary to assess their reasoning abilities in more realistic scenarios where many real-world problems are open-ended with ambiguous scope, and often require multiple formalisms to solve. To investigate this, we introduce the task of reasoning in the wild, where an LLM is tasked to solve a reasoning problem of unknown type by identifying the subproblems and their corresponding formalisms, and writing a program to solve each subproblem, guided by a tactic. We create a large tactic-guided trajectory dataset containing detailed solutions to a diverse set of reasoning problems, ranging from well-defined single-form reasoning (e.g., math, logic), to ambiguous and hybrid ones (e.g., commonsense, combined math and logic). This allows us to test various aspects of LLMs reasoning at the fine-grained level such as the selection and execution of tactics, and the tendency to take undesired shortcuts. In experiments, we highlight that existing LLMs fail significantly on problems with ambiguous and mixed scope, revealing critical limitations and overfitting issues (e.g. accuracy on GSM8K drops by at least 50\%). We further show the potential of finetuning a local LLM on the tactic-guided trajectories in achieving better performance. Project repo is available at github.com/gblackout/Reason-in-the-Wild

CLMay 24, 2023Code
Harnessing the Power of Large Language Models for Natural Language to First-Order Logic Translation

Yuan Yang, Siheng Xiong, Ali Payani et al.

Translating natural language sentences to first-order logic (NL-FOL translation) is a longstanding challenge in the NLP and formal logic literature. This paper introduces LogicLLaMA, a LLaMA-7B model fine-tuned for NL-FOL translation using LoRA on a single GPU. LogicLLaMA is capable of directly translating natural language into FOL rules, which outperforms GPT-3.5. LogicLLaMA is also equipped to correct FOL rules predicted by GPT-3.5, and can achieve similar performance as GPT-4 with a fraction of the cost. This correction ability was achieved by a novel supervised fine-tuning (SFT) + reinforcement learning with human feedback (RLHF) framework, which initially trains on synthetically perturbed NL-FOL pairs to encourage chain-of-thought reasoning and then fine-tunes with RLHF on GPT-3.5 outputs using a FOL verifier as the reward model. To train LogicLLaMA, we present MALLS (large language $\textbf{M}$odel gener$\textbf{A}$ted N$\textbf{L}$-FO$\textbf{L}$ pair$\textbf{S}$), a dataset of 34K high-quality and diverse sentence-level NL-FOL pairs collected from GPT-4. The dataset was created by implementing a pipeline that prompts GPT-4 for pairs, and dynamically adjusts the prompts to ensure the collection of pairs with rich and diverse contexts at different levels of complexity, and verifies the validity of the generated FOL rules. Codes, weights, and data are available at $\href{https://github.com/gblackout/LogicLLaMA}{\small \text{https://github.com/gblackout/LogicLLaMA}}$.

CLFeb 19, 2024
TILP: Differentiable Learning of Temporal Logical Rules on Knowledge Graphs

Siheng Xiong, Yuan Yang, Faramarz Fekri et al.

Compared with static knowledge graphs, temporal knowledge graphs (tKG), which can capture the evolution and change of information over time, are more realistic and general. However, due to the complexity that the notion of time introduces to the learning of the rules, an accurate graph reasoning, e.g., predicting new links between entities, is still a difficult problem. In this paper, we propose TILP, a differentiable framework for temporal logical rules learning. By designing a constrained random walk mechanism and the introduction of temporal operators, we ensure the efficiency of our model. We present temporal features modeling in tKG, e.g., recurrence, temporal order, interval between pair of relations, and duration, and incorporate it into our learning process. We compare TILP with state-of-the-art methods on two benchmark datasets. We show that our proposed framework can improve upon the performance of baseline methods while providing interpretable results. In particular, we consider various scenarios in which training samples are limited, data is biased, and the time range between training and inference are different. In all these cases, TILP works much better than the state-of-the-art methods.

CLDec 25, 2023
TEILP: Time Prediction over Knowledge Graphs via Logical Reasoning

Siheng Xiong, Yuan Yang, Ali Payani et al.

Conventional embedding-based models approach event time prediction in temporal knowledge graphs (TKGs) as a ranking problem. However, they often fall short in capturing essential temporal relationships such as order and distance. In this paper, we propose TEILP, a logical reasoning framework that naturally integrates such temporal elements into knowledge graph predictions. We first convert TKGs into a temporal event knowledge graph (TEKG) which has a more explicit representation of time in term of nodes of the graph. The TEKG equips us to develop a differentiable random walk approach to time prediction. Finally, we introduce conditional probability density functions, associated with the logical rules involving the query interval, using which we arrive at the time prediction. We compare TEILP with state-of-the-art methods on five benchmark datasets. We show that our model achieves a significant improvement over baselines while providing interpretable explanations. In particular, we consider several scenarios where training samples are limited, event types are imbalanced, and forecasting the time of future events based on only past events is desired. In all these cases, TEILP outperforms state-of-the-art methods in terms of robustness.

ITApr 21
Goal-Oriented Semantic Communication for Logical Decision Making

Ahmet Faruk Saz, Faramarz Fekri

This paper develops a principled foundation for goal-oriented semantic communication for logical decision-making. Consider a setting where autonomous agents engage in collaborative perception. In such settings, the volume of sensory data and limited bandwidth often make transmission of raw observations infeasible, requiring intelligent selection of task-relevant information. Because these scenarios are safety-critical, the selection and decision processes must also be transparent and verifiable. To address this, we propose an explainable semantic communication framework grounded in a First-Order Logic (FOL) hierarchical representation of the world. We define semantic information, entropy, conditional entropy, and mutual information by assigning an inductive logical probability measure over semantic structures in the language. Based on these definitions, we formulate a goal-oriented semantic communication objective through semantic rate-distortion theory and, equivalently, through the semantic information bottleneck principle. In this framework, task rules are represented as goal-oriented states, defined as a layer over the world states to capture decision-relevant abstractions. The resulting principle selects evidence that is most informative about these states, aiming to transmit only those FOL clauses most critical for decision-making while preserving logical verifiability. We demonstrate the effectiveness of the approach in a deduction-based safe path-following task within an FOL-based urban environment simulator with multiple dynamic agents.

AIOct 13, 2024
Generalization of Compositional Tasks with Logical Specification via Implicit Planning

Duo Xu, Faramarz Fekri

In this study, we address the challenge of learning generalizable policies for compositional tasks defined by logical specifications. These tasks consist of multiple temporally extended sub-tasks. Due to the sub-task inter-dependencies and sparse reward issue in long-horizon tasks, existing reinforcement learning (RL) approaches, such as task-conditioned and goal-conditioned policies, continue to struggle with slow convergence and sub-optimal performance in generalizing to compositional tasks. To overcome these limitations, we introduce a new hierarchical RL framework that enhances the efficiency and optimality of task generalization. At the high level, we present an implicit planner specifically designed for generalizing compositional tasks. This planner selects the next sub-task and estimates the multi-step return for completing the remaining task to complete from the current state. It learns a latent transition model and performs planning in the latent space by using a graph neural network (GNN). Subsequently, the high-level planner's selected sub-task guides the low-level agent to effectively handle long-horizon tasks, while the multi-step return encourages the low-level policy to account for future sub-task dependencies, enhancing its optimality. We conduct comprehensive experiments to demonstrate the framework's advantages over previous methods in terms of both efficiency and optimality.

MLFeb 23, 2024
Learning Cyclic Causal Models from Incomplete Data

Muralikrishnna G. Sethuraman, Faramarz Fekri · gatech

Causal learning is a fundamental problem in statistics and science, offering insights into predicting the effects of unseen treatments on a system. Despite recent advances in this topic, most existing causal discovery algorithms operate under two key assumptions: (i) the underlying graph is acyclic, and (ii) the available data is complete. These assumptions can be problematic as many real-world systems contain feedback loops (e.g., biological systems), and practical scenarios frequently involve missing data. In this work, we propose a novel framework, named MissNODAGS, for learning cyclic causal graphs from partially missing data. Under the additive noise model, MissNODAGS learns the causal graph by alternating between imputing the missing data and maximizing the expected log-likelihood of the visible part of the data in each training step, following the principles of the expectation-maximization (EM) framework. Through synthetic experiments and real-world single-cell perturbation data, we demonstrate improved performance when compared to using state-of-the-art imputation techniques followed by causal learning on partially missing interventional data.

CLOct 28, 2025
Long-Context Modeling with Dynamic Hierarchical Sparse Attention for On-Device LLMs

Siheng Xiong, Joe Zou, Faramarz Fekri et al.

The quadratic cost of attention hinders the scalability of long-context LLMs, especially in resource-constrained settings. Existing static sparse methods such as sliding windows or global tokens utilizes the sparsity of attention to reduce the cost of attention, but poorly adapts to the content-dependent variations in attention due to their staticity. While previous work has proposed several dynamic approaches to improve flexibility, they still depend on predefined templates or heuristic mechanisms. Such strategies reduce generality and prune tokens that remain contextually important, limiting their accuracy across diverse tasks. To tackle these bottlenecks of existing methods for long-context modeling, we introduce Dynamic Hierarchical Sparse Attention (DHSA), a data-driven framework that dynamically predicts attention sparsity online without retraining. Our proposed DHSA adaptively segments sequences into variable-length chunks, then computes chunk representations by aggregating the token embeddings within each chunk. To avoid the bias introduced by varying chunk lengths, we apply length-normalized aggregation that scales the averaged embeddings by the square root of the chunk size. Finally, DHSA upsamples the chunk-level similarity scores to token level similarities to calculate importance scores that determine which token-level interactions should be preserved. Our experiments on Gemma2 with Needle-in-a-Haystack Test and LongBench show that DHSA matches dense attention in accuracy, while reducing prefill latency by 20-60% and peak memory usage by 35%. Compared to other representative baselines such as block sparse attention, DHSA achieves consistently higher accuracy (6-18% relative gains) with comparable or lower cost, offering an efficient and adaptable solution for long-context on-device LLMs.

CLOct 13, 2025
Enhancing Long Chain-of-Thought Reasoning through Multi-Path Plan Aggregation

Siheng Xiong, Ali Payani, Faramarz Fekri

Inference-time scaling enhances the reasoning ability of a language model (LM) by extending its chain-of-thought (CoT). However, existing approaches typically generate the entire reasoning chain in a single forward pass, which often leads to CoT derailment, i.e., the reasoning trajectory drifting off course due to compounding errors. This problem is particularly severe for smaller LMs with long CoTs due to their limited capacity. To address this, we analyze raw long CoTs and uncover a reasoning hierarchy consisting of planning and execution steps. Our analysis reveals that most reasoning errors stem from incorrect planning. Motivated by this observation, we propose Multi-Path Plan Aggregation (MPPA), a framework that augments single-pass reasoning with plan exploration and aggregation. Following a variable interval schedule based on the token position, MPPA generates multiple candidate plans and aggregates them into a refined planning step. To maintain efficiency, we adopt a minimal design in which the base LM serves as the primary policy, while a lightweight LoRA module implements the plan aggregation policy. We further observe that outcome-reward RL is inefficient for long trajectories (e.g., exceeding 4K tokens). To overcome this, we introduce online Step-DPO, a process-level preference optimization scheme that leverages Twisted Sequential Monte Carlo (TSMC) to provide scalable stepwise supervision using small LMs. This yields more efficient training, improved stability, and higher accuracy. Extensive experiments on challenging math, science, and logical reasoning benchmarks demonstrate that, with only 10% SFT data and 5% of preference pairs, our method outperforms both the DeepSeek-R1 distillation baseline and the outcome-reward RL baseline across multiple base models and tasks.

LGAug 11, 2025
Differentiable Cyclic Causal Discovery Under Unmeasured Confounders

Muralikrishnna G. Sethuraman, Faramarz Fekri · gatech

Understanding causal relationships between variables is fundamental across scientific disciplines. Most causal discovery algorithms rely on two key assumptions: (i) all variables are observed, and (ii) the underlying causal graph is acyclic. While these assumptions simplify theoretical analysis, they are often violated in real-world systems, such as biological networks. Existing methods that account for confounders either assume linearity or struggle with scalability. To address these limitations, we propose DCCD-CONF, a novel framework for differentiable learning of nonlinear cyclic causal graphs in the presence of unmeasured confounders using interventional data. Our approach alternates between optimizing the graph structure and estimating the confounder distribution by maximizing the log-likelihood of the data. Through experiments on synthetic data and real-world gene perturbation datasets, we show that DCCD-CONF outperforms state-of-the-art methods in both causal graph recovery and confounder identification. Additionally, we also provide consistency guarantees for our framework, reinforcing its theoretical soundness.

AIAug 8, 2025
GLIDR: Graph-Like Inductive Logic Programming with Differentiable Reasoning

Blair Johnson, Clayton Kerce, Faramarz Fekri

Differentiable inductive logic programming (ILP) techniques have proven effective at finding approximate rule-based solutions to link prediction and node classification problems on knowledge graphs; however, the common assumption of chain-like rule structure can hamper the performance and interpretability of existing approaches. We introduce GLIDR, a differentiable rule learning method that models the inference of logic rules with more expressive syntax than previous methods. GLIDR uses a differentiable message passing inference algorithm that generalizes previous chain-like rule learning methods to allow rules with features like branches and cycles. GLIDR has a simple and expressive rule search space which is parameterized by a limit on the maximum number of free variables that may be included in a rule. Explicit logic rules can be extracted from the weights of a GLIDR model for use with symbolic solvers. We demonstrate that GLIDR can significantly outperform existing rule learning methods on knowledge graph completion tasks and even compete with embedding methods despite the inherent disadvantage of being a structure-only prediction method. We show that rules extracted from GLIDR retain significant predictive performance, and that GLIDR is highly robust to training data noise. Finally, we demonstrate that GLIDR can be chained with deep neural networks and optimized end-to-end for rule learning on arbitrary data modalities.

LGNov 3, 2024
Learning Hidden Subgoals under Temporal Ordering Constraints in Reinforcement Learning

Duo Xu, Faramarz Fekri

In real-world applications, the success of completing a task is often determined by multiple key steps which are distant in time steps and have to be achieved in a fixed time order. For example, the key steps listed on the cooking recipe should be achieved one-by-one in the right time order. These key steps can be regarded as subgoals of the task and their time orderings are described as temporal ordering constraints. However, in many real-world problems, subgoals or key states are often hidden in the state space and their temporal ordering constraints are also unknown, which make it challenging for previous RL algorithms to solve this kind of tasks. In order to address this issue, in this work we propose a novel RL algorithm for {\bf l}earning hidden {\bf s}ubgoals under {\bf t}emporal {\bf o}rdering {\bf c}onstraints (LSTOC). We propose a new contrastive learning objective which can effectively learn hidden subgoals (key states) and their temporal orderings at the same time, based on first-occupancy representation and temporal geometric sampling. In addition, we propose a sample-efficient learning strategy to discover subgoals one-by-one following their temporal order constraints by building a subgoal tree to represent discovered subgoals and their temporal ordering relationships. Specifically, this tree can be used to improve the sample efficiency of trajectory collection, fasten the task solving and generalize to unseen tasks. The LSTOC framework is evaluated on several environments with image-based observations, showing its significant improvement over baseline methods.

MLOct 24, 2024
MissNODAG: Differentiable Cyclic Causal Graph Learning from Incomplete Data

Muralikrishnna G. Sethuraman, Razieh Nabi, Faramarz Fekri · gatech

Causal discovery in real-world systems, such as biological networks, is often complicated by feedback loops and incomplete data. Standard algorithms, which assume acyclic structures or fully observed data, struggle with these challenges. To address this gap, we propose MissNODAG, a differentiable framework for learning both the underlying cyclic causal graph and the missingness mechanism from partially observed data, including data missing not at random. Our framework integrates an additive noise model with an expectation-maximization procedure, alternating between imputing missing values and optimizing the observed data likelihood, to uncover both the cyclic structures and the missingness mechanism. We demonstrate the effectiveness of MissNODAG through synthetic experiments and an application to real-world gene perturbation data.

LGJan 24, 2022
A Machine Learning Framework for Distributed Functional Compression over Wireless Channels in IoT

Yashas Malur Saidutta, Afshin Abdi, Faramarz Fekri

IoT devices generating enormous data and state-of-the-art machine learning techniques together will revolutionize cyber-physical systems. In many diverse fields, from autonomous driving to augmented reality, distributed IoT devices compute specific target functions without simple forms like obstacle detection, object recognition, etc. Traditional cloud-based methods that focus on transferring data to a central location either for training or inference place enormous strain on network resources. To address this, we develop, to the best of our knowledge, the first machine learning framework for distributed functional compression over both the Gaussian Multiple Access Channel (GMAC) and orthogonal AWGN channels. Due to the Kolmogorov-Arnold representation theorem, our machine learning framework can, by design, compute any arbitrary function for the desired functional compression task in IoT. Importantly the raw sensory data are never transferred to a central node for training or inference, thus reducing communication. For these algorithms, we provide theoretical convergence guarantees and upper bounds on communication. Our simulations show that the learned encoders and decoders for functional compression perform significantly better than traditional approaches, are robust to channel condition changes and sensor outages. Compared to the cloud-based scenario, our algorithms reduce channel use by two orders of magnitude.

CVNov 8, 2021
Visual Question Answering based on Formal Logic

Muralikrishnna G. Sethuraman, Ali Payani, Faramarz Fekri et al.

Visual question answering (VQA) has been gaining a lot of traction in the machine learning community in the recent years due to the challenges posed in understanding information coming from multiple modalities (i.e., images, language). In VQA, a series of questions are posed based on a set of images and the task at hand is to arrive at the answer. To achieve this, we take a symbolic reasoning based approach using the framework of formal logic. The image and the questions are converted into symbolic representations on which explicit reasoning is performed. We propose a formal logic framework where (i) images are converted to logical background facts with the help of scene graphs, (ii) the questions are translated to first-order predicate logic clauses using a transformer based deep learning model, and (iii) perform satisfiability checks, by using the background knowledge and the grounding of predicate clauses, to obtain the answer. Our proposed method is highly interpretable and each step in the pipeline can be easily analyzed by a human. We validate our approach on the CLEVR and the GQA dataset. We achieve near perfect accuracy of 99.6% on the CLEVR dataset comparable to the state of art models, showcasing that formal logic is a viable tool to tackle visual question answering. Our model is also data efficient, achieving 99.1% accuracy on CLEVR dataset when trained on just 10% of the training data.

LGJun 21, 2021
Interpretable Model-based Hierarchical Reinforcement Learning using Inductive Logic Programming

Duo Xu, Faramarz Fekri

Recently deep reinforcement learning has achieved tremendous success in wide ranges of applications. However, it notoriously lacks data-efficiency and interpretability. Data-efficiency is important as interacting with the environment is expensive. Further, interpretability can increase the transparency of the black-box-style deep RL models and hence gain trust from the users. In this work, we propose a new hierarchical framework via symbolic RL, leveraging a symbolic transition model to improve the data-efficiency and introduce the interpretability for learned policy. This framework consists of a high-level agent, a subtask solver and a symbolic transition model. Without assuming any prior knowledge on the state transition, we adopt inductive logic programming (ILP) to learn the rules of symbolic state transitions, introducing interpretability and making the learned behavior understandable to users. In empirical experiments, we confirmed that the proposed framework offers approximately between 30\% to 40\% more data efficiency over previous methods.

LGMar 22, 2021
Improving Actor-Critic Reinforcement Learning via Hamiltonian Monte Carlo Method

Duo Xu, Faramarz Fekri

The actor-critic RL is widely used in various robotic control tasks. By viewing the actor-critic RL from the perspective of variational inference (VI), the policy network is trained to obtain the approximate posterior of actions given the optimality criteria. However, in practice, the actor-critic RL may yield suboptimal policy estimates due to the amortization gap and insufficient exploration. In this work, inspired by the previous use of Hamiltonian Monte Carlo (HMC) in VI, we propose to integrate the policy network of actor-critic RL with HMC, which is termed as {\it Hamiltonian Policy}. As such we propose to evolve actions from the base policy according to HMC, and our proposed method has many benefits. First, HMC can improve the policy distribution to better approximate the posterior and hence reduce the amortization gap. Second, HMC can also guide the exploration more to the regions of action spaces with higher Q values, enhancing the exploration efficiency. Further, instead of directly applying HMC into RL, we propose a new leapfrog operator to simulate the Hamiltonian dynamics. Finally, in safe RL problems, we find that the proposed method can not only improve the achieved return, but also reduce safety constraint violations by discarding potentially unsafe actions. With comprehensive empirical experiments on continuous control baselines, including MuJoCo and PyBullet Roboschool, we show that the proposed approach is a data-efficient and easy-to-implement improvement over previous actor-critic methods.

LGAug 19, 2020
Restructuring, Pruning, and Adjustment of Deep Models for Parallel Distributed Inference

Afshin Abdi, Saeed Rashidi, Faramarz Fekri et al.

Using multiple nodes and parallel computing algorithms has become a principal tool to improve training and execution times of deep neural networks as well as effective collective intelligence in sensor networks. In this paper, we consider the parallel implementation of an already-trained deep model on multiple processing nodes (a.k.a. workers) where the deep model is divided into several parallel sub-models, each of which is executed by a worker. Since latency due to synchronization and data transfer among workers negatively impacts the performance of the parallel implementation, it is desirable to have minimum interdependency among parallel sub-models. To achieve this goal, we propose to rearrange the neurons in the neural network and partition them (without changing the general topology of the neural network), such that the interdependency among sub-models is minimized under the computations and communications constraints of the workers. We propose RePurpose, a layer-wise model restructuring and pruning technique that guarantees the performance of the overall parallelized model. To efficiently apply RePurpose, we propose an approach based on $\ell_0$ optimization and the Munkres assignment algorithm. We show that, compared to the existing methods, RePurpose significantly improves the efficiency of the distributed inference via parallel implementation, both in terms of communication and computational complexity.

NEJun 30, 2020
Accelerating Reinforcement Learning Agent with EEG-based Implicit Human Feedback

Duo Xu, Mohit Agarwal, Ekansh Gupta et al.

Providing Reinforcement Learning (RL) agents with human feedback can dramatically improve various aspects of learning. However, previous methods require human observer to give inputs explicitly (e.g., press buttons, voice interface), burdening the human in the loop of RL agent's learning process. Further, it is sometimes difficult or impossible to obtain the explicit human advise (feedback), e.g., autonomous driving, disabled rehabilitation, etc. In this work, we investigate capturing human's intrinsic reactions as implicit (and natural) feedback through EEG in the form of error-related potentials (ErrP), providing a natural and direct way for humans to improve the RL agent learning. As such, the human intelligence can be integrated via implicit feedback with RL algorithms to accelerate the learning of RL agent. We develop three reasonably complex 2D discrete navigational games to experimentally evaluate the overall performance of the proposed work. Major contributions of our work are as follows, (i) we propose and experimentally validate the zero-shot learning of ErrPs, where the ErrPs can be learned for one game, and transferred to other unseen games, (ii) we propose a novel RL framework for integrating implicit human feedbacks via ErrPs with RL agent, improving the label efficiency and robustness to human mistakes, and (iii) compared to prior works, we scale the application of ErrPs to reasonably complex environments, and demonstrate the significance of our approach for accelerated learning through real user experiments.

LGMar 23, 2020
Incorporating Relational Background Knowledge into Reinforcement Learning via Differentiable Inductive Logic Programming

Ali Payani, Faramarz Fekri

Relational Reinforcement Learning (RRL) can offers various desirable features. Most importantly, it allows for incorporating expert knowledge into the learning, and hence leading to much faster learning and better generalization compared to the standard deep reinforcement learning. However, most of the existing RRL approaches are either incapable of incorporating expert background knowledge (e.g., in the form of explicit predicate language) or are not able to learn directly from non-relational data such as image. In this paper, we propose a novel deep RRL based on a differentiable Inductive Logic Programming (ILP) that can effectively learn relational information from image and present the state of the environment as first order logic predicates. Additionally, it can take the expert background knowledge and incorporate it into the learning problem using appropriate predicates. The differentiable ILP allows an end to end optimization of the entire framework for learning the policy in RRL. We show the efficacy of this novel RRL framework using environments such as BoxWorld, GridWorld as well as relational reasoning for the Sort-of-CLEVR dataset.

AIJun 8, 2019
Inductive Logic Programming via Differentiable Deep Neural Logic Networks

Ali Payani, Faramarz Fekri

We propose a novel paradigm for solving Inductive Logic Programming (ILP) problems via deep recurrent neural networks. This proposed ILP solver is designed based on differentiable implementation of the deduction via forward chaining. In contrast to the majority of past methods, instead of searching through the space of possible first-order logic rules by using some restrictive rule templates, we directly learn the symbolic logical predicate rules by introducing a novel differentiable Neural Logic (dNL) network. The proposed dNL network is able to learn and represent Boolean functions efficiently and in an explicit manner. We show that the proposed dNL-ILP solver supports desirable features such as recursion and predicate invention. Further, we investigate the performance of the proposed ILP solver in classification tasks involving benchmark relational datasets. In particular, we show that our proposed method outperforms the state of the art ILP solvers in classification tasks for Mutagenesis, Cora and IMDB datasets.

LGApr 2, 2019
Learning Algorithms via Neural Logic Networks

Ali Payani, Faramarz Fekri

We propose a novel learning paradigm for Deep Neural Networks (DNN) by using Boolean logic algebra. We first present the basic differentiable operators of a Boolean system such as conjunction, disjunction and exclusive-OR and show how these elementary operators can be combined in a simple and meaningful way to form Neural Logic Networks (NLNs). We examine the effectiveness of the proposed NLN framework in learning Boolean functions and discrete-algorithmic tasks. We demonstrate that, in contrast to the implicit learning in MLP approach, the proposed neural logic networks can learn the logical functions explicitly that can be verified and interpreted by human. In particular, we propose a new framework for learning the inductive logic programming (ILP) problems by exploiting the explicit representational power of NLN. We show the proposed neural ILP solver is capable of feats such as predicate invention and recursion and can outperform the current state of the art neural ILP solvers using a variety of benchmark tasks such as decimal addition and multiplication, and sorting on ordered list.

LGSep 24, 2012
BPRS: Belief Propagation Based Iterative Recommender System

Erman Ayday, Arash Einolghozati, Faramarz Fekri

In this paper we introduce the first application of the Belief Propagation (BP) algorithm in the design of recommender systems. We formulate the recommendation problem as an inference problem and aim to compute the marginal probability distributions of the variables which represent the ratings to be predicted. However, computing these marginal probability functions is computationally prohibitive for large-scale systems. Therefore, we utilize the BP algorithm to efficiently compute these functions. Recommendations for each active user are then iteratively computed by probabilistic message passing. As opposed to the previous recommender algorithms, BPRS does not require solving the recommendation problem for all the users if it wishes to update the recommendations for only a single active. Further, BPRS computes the recommendations for each user with linear complexity and without requiring a training period. Via computer simulations (using the 100K MovieLens dataset), we verify that BPRS iteratively reduces the error in the predicted ratings of the users until it converges. Finally, we confirm that BPRS is comparable to the state of art methods such as Correlation-based neighborhood model (CorNgbr) and Singular Value Decomposition (SVD) in terms of rating and precision accuracy. Therefore, we believe that the BP-based recommendation algorithm is a new promising approach which offers a significant advantage on scalability while providing competitive accuracy for the recommender systems.