Khalid Qaraqe

h-index 46

5 papers

61 citations

Novelty 60%

AI Score 46

Ranked #35,751 of 194,257 authors (top 18%); #7,318 in CL (top 24%)

5 Papers

12.0 CL Jun 8, 2025 Code

Theorem-of-Thought: A Multi-Agent Framework for Abductive, Deductive, and Inductive Reasoning in Language Models

Samir Abdaljalil, Hasan Kurban, Khalid Qaraqe et al.

Large language models (LLMs) have shown strong performance across natural language reasoning tasks, yet their reasoning processes remain brittle and difficult to interpret. Prompting techniques like Chain-of-Thought (CoT) enhance reliability by eliciting intermediate reasoning steps or aggregating multiple outputs. However, they lack mechanisms for enforcing logical structure and assessing internal coherence. We introduce Theorem-of-Thought (ToTh), a novel framework that models reasoning as collaboration among three parallel agents, each simulating a distinct mode of inference: abductive, deductive, and inductive. Each agent produces a reasoning trace, which is structured into a formal reasoning graph. To evaluate consistency, we apply Bayesian belief propagation guided by natural language inference (NLI), assigning confidence scores to each step. The most coherent graph is selected to derive the final answer. Experiments on symbolic (WebOfLies) and numerical (MultiArith) reasoning benchmarks show that ToTh consistently outperforms CoT, Self-Consistency, and CoT-Decoding across multiple LLMs, while producing interpretable and logically grounded reasoning chains. Our findings suggest a promising direction for building more robust and cognitively inspired LLM reasoning. The implementation is available at https://github.com/KurbanIntelligenceLab/theorem-of-thought.
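The selection step described in this abstract (score each agent's reasoning trace for coherence, then keep the most coherent one) can be illustrated with a deliberately simplified sketch. It is not the authors' implementation: it replaces Bayesian belief propagation over a graph with a plain product of per-step entailment probabilities along a linear chain, and the probabilities, the `propagate_chain` and `coherence` helpers, and the mean-belief score are all assumptions made for illustration.

```python
# Simplified illustration only: a linear chain with multiplicative belief
# updates stands in for belief propagation over a full reasoning graph.

def propagate_chain(prior, entailment_probs):
    """Return the belief at each step, starting from `prior`.

    Each entry of `entailment_probs` is a hypothetical NLI probability that
    a step follows from the previous one.
    """
    beliefs = [prior]
    for p in entailment_probs:
        beliefs.append(beliefs[-1] * p)
    return beliefs


def coherence(beliefs):
    """Score a trace by its mean step belief (an assumed scoring rule)."""
    return sum(beliefs) / len(beliefs)


# Made-up per-step entailment probabilities for the three agents.
traces = {
    "abductive": [0.9, 0.6, 0.8],
    "deductive": [0.95, 0.9, 0.9],
    "inductive": [0.7, 0.7, 0.9],
}

scores = {
    name: coherence(propagate_chain(1.0, probs))
    for name, probs in traces.items()
}
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

With these made-up numbers the deductive trace wins because none of its steps is weakly supported; a single low-probability step drags down every later belief in a chain.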

2.7 CL Oct 11, 2025 Code

Audit-of-Understanding: Posterior-Constrained Inference for Mathematical Reasoning in Language Models

Samir Abdaljalil, Erchin Serpedin, Khalid Qaraqe et al.

Large language models (LLMs) often generate reasoning traces that appear coherent but rest on unsupported assumptions, leading to hallucinated conclusions. Prior work mainly addresses factual hallucinations or relies on post-hoc verification, leaving reasoning-induced hallucinations largely unaddressed. We propose Audit-of-Understanding (AoU), a framework that constrains inference to validated premises through three phases: (1) decomposing a query into candidate assumptions, (2) auditing their support, and (3) conditioning inference only on the validated subset. Formally, AoU is posterior-constrained inference, connecting to selective prediction and rejection learning. Our contributions are threefold: (i) theoretical guarantees under perfect validation, (ii) excess-risk bounds under imperfect audits, and (iii) tractability analysis. Empirically, AoU improves both accuracy and faithfulness on GSM8K, MultiArith, and SVAMP, achieving up to +30% gains on GSM8K, +45% on MultiArith, and consistent +20-28% improvements on SVAMP over Chain-of-Thought, Self-Consistency, and CoT-Decoding. Code is available at https://anonymous.4open.science/r/audit-of-understanding-E28B.
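The three phases named in this abstract (decompose, audit, condition on the validated subset) can be sketched in a toy setting. This is a minimal illustration under stated assumptions, not the AoU code: the candidate assumptions are hand-written rather than produced by a model, the audit is a simple lookup against facts stated in the problem, and the names `audit`, `answer_with_audit`, and the apple example are hypothetical.

```python
def audit(candidates, stated_facts):
    """Phase 2 (toy version): keep a candidate only if the problem states it."""
    validated, rejected = {}, {}
    for name, value in candidates.items():
        if stated_facts.get(name) == value:
            validated[name] = value
        else:
            rejected[name] = value
    return validated, rejected


def answer_with_audit(candidates, stated_facts, solver):
    """Phase 3: run the solver on validated premises only."""
    validated, rejected = audit(candidates, stated_facts)
    return solver(validated), sorted(rejected)


# Phase 1 (hand-written here): candidate assumptions for the query
# "Tom has 3 apples and buys 2 more. How many apples does he have?"
candidates = {"apples_start": 3, "apples_bought": 2, "apples_eaten": 1}

# Facts actually stated in the problem text.
stated_facts = {"apples_start": 3, "apples_bought": 2}

result, dropped = answer_with_audit(
    candidates,
    stated_facts,
    solver=lambda p: p["apples_start"] + p["apples_bought"],
)
print(result, dropped)
```

The unsupported assumption `apples_eaten` never reaches the solver, which is the point of conditioning on the validated subset: an invented premise cannot influence the final answer.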

6.7 CL Aug 20, 2025 Code

Evaluating Multilingual and Code-Switched Alignment in LLMs via Synthetic Natural Language Inference

Samir Abdaljalil, Erchin Serpedin, Khalid Qaraqe et al.

Large language models (LLMs) are increasingly applied in multilingual contexts, yet their capacity for consistent, logically grounded alignment across languages remains underexplored. We present a controlled evaluation framework for multilingual natural language inference (NLI) that generates synthetic, logic-based premise-hypothesis pairs and translates them into a typologically diverse set of languages. This design enables precise control over semantic relations and allows testing in both monolingual and mixed-language (code-switched) conditions. Surprisingly, code-switching does not degrade, and can even improve, performance, suggesting that translation-induced lexical variation may serve as a regularization signal. We validate semantic preservation through embedding-based similarity analyses and cross-lingual alignment visualizations, confirming the fidelity of translated pairs. Our findings expose both the potential and the brittleness of current LLM cross-lingual reasoning, and identify code-switching as a promising lever for improving multilingual robustness. Code available at: https://github.com/KurbanIntelligenceLab/nli-stress-testing

2.3 SP Mar 17, 2020

Spectrum Sensing and Signal Identification with Deep Learning based on Spectral Correlation Function

Kürşat Tekbıyık, Özkan Akbunar, Ali Rıza Ekti et al.

Spectrum sensing is one of the means of using the scarce resource of wireless spectrum efficiently. In this paper, a convolutional neural network (CNN) model employing the spectral correlation function, which is an effective characterization of the cyclostationarity property, is proposed for wireless spectrum sensing and signal identification. The proposed method classifies wireless signals without a priori information and is implemented in two different settings, CASE1 and CASE2. In CASE1, signals are jointly sensed and classified. In CASE2, sensing and classification are conducted sequentially. In contrast to classical spectrum sensing techniques, the proposed CNN method does not require a statistical decision process and does not need to know the distinct features of signals beforehand. Implementation of the method on measured over-the-air real-world signals in cellular bands indicates important performance gains compared with the signal-classifying deep learning networks available in the literature and with classical sensing methods. Even though the implementation herein is over cellular signals, the proposed approach can be extended to the detection and classification of any signal that exhibits cyclostationary features. Finally, the measurement-based dataset used to validate the method is shared for reproduction of the results and for further research and development.

5.2 LG Sep 6, 2018

Deep Recurrent Electricity Theft Detection in AMI Networks with Random Tuning of Hyper-parameters

Mahmoud Nabil, Muhammad Ismail, Mohamed Mahmoud et al.

Modern smart grids rely on advanced metering infrastructure (AMI) networks for monitoring and billing purposes. However, such an approach suffers from electricity theft cyberattacks. Unlike existing research that uses shallow, static, and customer-specific electricity theft detectors, this paper proposes a generalized deep recurrent neural network (RNN)-based electricity theft detector that can effectively thwart these cyberattacks. The proposed model exploits the time-series nature of the customers' electricity consumption to implement a gated recurrent unit (GRU)-RNN, thereby improving the detection performance. In addition, the proposed RNN-based detector adopts a random search analysis in its learning stage to appropriately fine-tune its hyper-parameters. Extensive test studies are carried out to investigate the detector's performance using publicly available real data of 107,200 energy consumption days from 200 customers. Simulation results demonstrate the superior performance of the proposed detector compared with state-of-the-art electricity theft detectors.
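The random search over hyper-parameters mentioned in this abstract can be sketched generically as follows. This is a minimal sketch under assumptions, not the paper's setup: the search space, the `toy_score` function (a stand-in for validation performance of a trained GRU detector), and the trial count are all hypothetical.

```python
import random

# Hypothetical search space; the paper's actual ranges are not given here.
SEARCH_SPACE = {
    "hidden_units": [32, 64, 128, 256],
    "num_layers": [1, 2, 3],
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "dropout": [0.0, 0.2, 0.5],
}


def sample_config(rng):
    """Draw one value per hyper-parameter, independently and uniformly."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def toy_score(config):
    """Stand-in for a validation metric; a real run would train and evaluate a model."""
    return (
        -abs(config["hidden_units"] - 128) / 128
        - abs(config["num_layers"] - 2)
        - abs(config["dropout"] - 0.2)
    )


def random_search(score_fn, n_trials, seed=0):
    """Evaluate `n_trials` random configurations and keep the best one."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = sample_config(rng)
        score = score_fn(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


best_config, best_score = random_search(toy_score, n_trials=20)
print(best_config, best_score)
```

Unlike grid search, the number of trials is fixed in advance and does not grow with the number of hyper-parameters, which is why random search is a common choice when each trial requires training a recurrent network.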