SPApr 2, 2023
Deep Graph Unfolding for Beamforming in MU-MIMO Interference NetworksArindam Chowdhury, Gunjan Verma, Ananthram Swami et al.
We develop an efficient and near-optimal solution for beamforming in multi-user multiple-input-multiple-output single-hop wireless ad-hoc interference networks. Inspired by the weighted minimum mean squared error (WMMSE) method, a classical approach to solving this problem, and the principle of algorithm unfolding, we present unfolded WMMSE (UWMMSE) for MU-MIMO. This method learns a parameterized functional transformation of key WMMSE parameters using graph neural networks (GNNs), where the channel and interference components of a wireless network constitute the underlying graph. These GNNs are trained through gradient descent on a network utility metric using multiple instances of the beamforming problem. Comprehensive experimental analyses illustrate the superiority of UWMMSE over the classical WMMSE and state-of-the-art learning-based methods in terms of performance, generalizability, and robustness.
IVAug 27, 2024
Automatic Detection of COVID-19 from Chest X-ray Images Using Deep Learning ModelAlloy Das, Rohit Agarwal, Rituparna Singh et al.
The infectious disease caused by novel corona virus (2019-nCoV) has been widely spreading since last year and has shaken the entire world. It has caused an unprecedented effect on daily life, global economy and public health. Hence this disease detection has life-saving importance for both patients as well as doctors. Due to limited test kits, it is also a daunting task to test every patient with severe respiratory problems using conventional techniques (RT-PCR). Thus implementing an automatic diagnosis system is urgently required to overcome the scarcity problem of Covid-19 test kits at hospital, health care systems. The diagnostic approach is mainly classified into two categories-laboratory based and Chest radiography approach. In this paper, a novel approach for computerized corona virus (2019-nCoV) detection from lung x-ray images is presented. Here, we propose models using deep learning to show the effectiveness of diagnostic systems. In the experimental result, we evaluate proposed models on publicly available data-set which exhibit satisfactory performance and promising results compared with other previous existing methods.
LGOct 7, 2025Code
When Does Global Attention Help? A Unified Empirical Study on Atomistic Graph LearningArindam Chowdhury, Massimiliano Lupo Pasini
Graph neural networks (GNNs) are widely used as surrogates for costly experiments and first-principles simulations to study the behavior of compounds at atomistic scale, and their architectural complexity is constantly increasing to enable the modeling of complex physics. While most recent GNNs combine more traditional message passing neural networks (MPNNs) layers to model short-range interactions with more advanced graph transformers (GTs) with global attention mechanisms to model long-range interactions, it is still unclear when global attention mechanisms provide real benefits over well-tuned MPNN layers due to inconsistent implementations, features, or hyperparameter tuning. We introduce the first unified, reproducible benchmarking framework - built on HydraGNN - that enables seamless switching among four controlled model classes: MPNN, MPNN with chemistry/topology encoders, GPS-style hybrids of MPNN with global attention, and fully fused local - global models with encoders. Using seven diverse open-source datasets for benchmarking across regression and classification tasks, we systematically isolate the contributions of message passing, global attention, and encoder-based feature augmentation. Our study shows that encoder-augmented MPNNs form a robust baseline, while fused local-global models yield the clearest benefits for properties governed by long-range interaction effects. We further quantify the accuracy - compute trade-offs of attention, reporting its overhead in memory. Together, these results establish the first controlled evaluation of global attention in atomistic graph learning and provide a reproducible testbed for future model development.
COMP-PHJun 10, 2025
Exploring the Capabilities of the Frontier Large Language Models for Nuclear Energy ResearchAhmed Almeldein, Mohammed Alnaggar, Rick Archibald et al.
The AI for Nuclear Energy workshop at Oak Ridge National Laboratory evaluated the potential of Large Language Models (LLMs) to accelerate fusion and fission research. Fourteen interdisciplinary teams explored diverse nuclear science challenges using ChatGPT, Gemini, Claude, and other AI models over a single day. Applications ranged from developing foundation models for fusion reactor control to automating Monte Carlo simulations, predicting material degradation, and designing experimental programs for advanced reactors. Teams employed structured workflows combining prompt engineering, deep research capabilities, and iterative refinement to generate hypotheses, prototype code, and research strategies. Key findings demonstrate that LLMs excel at early-stage exploration, literature synthesis, and workflow design, successfully identifying research gaps and generating plausible experimental frameworks. However, significant limitations emerged, including difficulties with novel materials designs, advanced code generation for modeling and simulation, and domain-specific details requiring expert validation. The successful outcomes resulted from expert-driven prompt engineering and treating AI as a complementary tool rather than a replacement for physics-based methods. The workshop validated AI's potential to accelerate nuclear energy research through rapid iteration and cross-disciplinary synthesis while highlighting the need for curated nuclear-specific datasets, workflow automation, and specialized model development. These results provide a roadmap for integrating AI tools into nuclear science workflows, potentially reducing development cycles for safer, more efficient nuclear energy systems while maintaining rigorous scientific standards.
SPJan 18, 2024
Learning Non-myopic Power Allocation in Constrained ScenariosArindam Chowdhury, Santiago Paternain, Gunjan Verma et al.
We propose a learning-based framework for efficient power allocation in ad hoc interference networks under episodic constraints. The problem of optimal power allocation -- for maximizing a given network utility metric -- under instantaneous constraints has recently gained significant popularity. Several learnable algorithms have been proposed to obtain fast, effective, and near-optimal performance. However, a more realistic scenario arises when the utility metric has to be optimized for an entire episode under time-coupled constraints. In this case, the instantaneous power needs to be regulated so that the given utility can be optimized over an entire sequence of wireless network realizations while satisfying the constraint at all times. Solving each instance independently will be myopic as the long-term constraint cannot modulate such a solution. Instead, we frame this as a constrained and sequential decision-making problem, and employ an actor-critic algorithm to obtain the constraint-aware power allocation at each step. We present experimental analyses to illustrate the effectiveness of our method in terms of superior episodic network-utility performance and its efficiency in terms of time and computational complexity.
SPOct 14, 2021
Stability Analysis of Unfolded WMMSE for Power AllocationArindam Chowdhury, Fernando Gama, Santiago Segarra
Power allocation is one of the fundamental problems in wireless networks and a wide variety of algorithms address this problem from different perspectives. A common element among these algorithms is that they rely on an estimation of the channel state, which may be inaccurate on account of hardware defects, noisy feedback systems, and environmental and adversarial disturbances. Therefore, it is essential that the output power allocation of these algorithms is stable with respect to input perturbations, to the extent that the variations in the output are bounded for bounded variations in the input. In this paper, we focus on UWMMSE -- a modern algorithm leveraging graph neural networks --, and illustrate its stability to additive input perturbations of bounded energy through both theoretical analysis and empirical validation.
LGOct 7, 2021
Label Propagation across Graphs: Node Classification using Graph Neural Tangent KernelsArtun Bayer, Arindam Chowdhury, Santiago Segarra
Graph neural networks (GNNs) have achieved superior performance on node classification tasks in the last few years. Commonly, this is framed in a transductive semi-supervised learning setup wherein the entire graph, including the target nodes to be labeled, is available for training. Driven in part by scalability, recent works have focused on the inductive case where only the labeled portion of a graph is available for training. In this context, our current work considers a challenging inductive setting where a set of labeled graphs are available for training while the unlabeled target graph is completely separate, i.e., there are no connections between labeled and unlabeled nodes. Under the implicit assumption that the testing and training graphs come from similar distributions, our goal is to develop a labeling function that generalizes to unobserved connectivity structures. To that end, we employ a graph neural tangent kernel (GNTK) that corresponds to infinitely wide GNNs to find correspondences between nodes in different graphs based on both the topology and the node features. We augment the capabilities of the GNTK with residual connections and empirically illustrate its performance gains on standard benchmarks.
LGFeb 20, 2021
GIST: Distributed Training for Large-Scale Graph Convolutional NetworksCameron R. Wolfe, Jingkang Yang, Arindam Chowdhury et al.
The graph convolutional network (GCN) is a go-to solution for machine learning on graphs, but its training is notoriously difficult to scale both in terms of graph size and the number of model parameters. Although some work has explored training on large-scale graphs (e.g., GraphSAGE, ClusterGCN, etc.), we pioneer efficient training of large-scale GCN models (i.e., ultra-wide, overparameterized models) with the proposal of a novel, distributed training framework. Our proposed training methodology, called GIST, disjointly partitions the parameters of a GCN model into several, smaller sub-GCNs that are trained independently and in parallel. In addition to being compatible with all GCN architectures and existing sampling techniques for efficient GCN training, GIST i) improves model performance, ii) scales to training on arbitrarily large graphs, iii) decreases wall-clock training time, and iv) enables the training of markedly overparameterized GCN models. Remarkably, with GIST, we train an astonishgly-wide 32,768-dimensional GraphSAGE model, which exceeds the capacity of a single GPU by a factor of 8x, to SOTA performance on the Amazon2M dataset.
SPNov 18, 2020
Efficient power allocation using graph neural networks and deep algorithm unfoldingArindam Chowdhury, Gunjan Verma, Chirag Rao et al.
We study the problem of optimal power allocation in a single-hop ad hoc wireless network. In solving this problem, we propose a hybrid neural architecture inspired by the algorithmic unfolding of the iterative weighted minimum mean squared error (WMMSE) method, that we denote as unfolded WMMSE (UWMMSE). The learnable weights within UWMMSE are parameterized using graph neural networks (GNNs), where the time-varying underlying graphs are given by the fading interference coefficients in the wireless network. These GNNs are trained through a gradient descent approach based on multiple instances of the power allocation problem. Once trained, UWMMSE achieves performance comparable to that of WMMSE while significantly reducing the computational complexity. This phenomenon is illustrated through numerical experiments along with the robustness and generalization to wireless networks of different densities and sizes.
CVNov 21, 2019
ChartNet: Visual Reasoning over Statistical Charts using MAC-NetworksMonika Sharma, Shikha Gupta, Arindam Chowdhury et al.
Despite the improvements in perception accuracies brought about via deep learning, developing systems combining accurate visual perception with the ability to reason over the visual percepts remains extremely challenging. A particular application area of interest from an accessibility perspective is that of reasoning over statistical charts such as bar and pie charts. To this end, we formulate the problem of reasoning over statistical charts as a classification task using MAC-Networks to give answers from a predefined vocabulary of generic answers. Additionally, we enhance the capabilities of MAC-Networks to give chart-specific answers to open-ended questions by replacing the classification layer by a regression layer to localize the textual answers present over the images. We call our network ChartNet, and demonstrate its efficacy on predicting both in vocabulary and out of vocabulary answers. To test our methods, we generated our own dataset of statistical chart images and corresponding question answer pairs. Results show that ChartNet consistently outperform other state-of-the-art methods on reasoning over these questions and may be a viable candidate for applications containing images of statistical charts.
CVDec 11, 2018
Reading Industrial Inspection Sheets by Inferring Visual RelationsRohit Rahul, Arindam Chowdhury, Animesh et al.
The traditional mode of recording faults in heavy factory equipment has been via hand marked inspection sheets, wherein a machine engineer manually marks the faulty machine regions on a paper outline of the machine. Over the years, millions of such inspection sheets have been recorded and the data within these sheets has remained inaccessible. However, with industries going digital and waking up to the potential value of fault data for machine health monitoring, there is an increased impetus towards digitization of these hand marked inspection records. To target this digitization, we propose a novel visual pipeline combining state of the art deep learning models, with domain knowledge and low level vision techniques, followed by inference of visual relationships. Our framework is robust to the presence of both static and non-static background in the document, variability in the machine template diagrams, unstructured shape of graphical objects to be identified and variability in the strokes of handwritten text. The proposed pipeline incorporates a capsule and spatial transformer network based classifier for accurate text reading, and a customized CTPN network for text detection in addition to hybrid techniques for arrow detection and dialogue cloud removal. We have tested our approach on a real world dataset of 50 inspection sheets for large containers and boilers. The results are visually appealing and the pipeline achieved an accuracy of 87.1% for text detection and 94.6% for text reading.
CVDec 11, 2018
Deep Reader: Information extraction from Document images via relation extraction and Natural LanguageVishwanath D, Rohit Rahul, Gunjan Sehgal et al.
Recent advancements in the area of Computer Vision with state-of-art Neural Networks has given a boost to Optical Character Recognition (OCR) accuracies. However, extracting characters/text alone is often insufficient for relevant information extraction as documents also have a visual structure that is not captured by OCR. Extracting information from tables, charts, footnotes, boxes, headings and retrieving the corresponding structured representation for the document remains a challenge and finds application in a large number of real-world use cases. In this paper, we propose a novel enterprise based end-to-end framework called DeepReader which facilitates information extraction from document images via identification of visual entities and populating a meta relational model across different entities in the document image. The model schema allows for an easy to understand abstraction of the entities detected by the deep vision models and the relationships between them. DeepReader has a suite of state-of-the-art vision algorithms which are applied to recognize handwritten and printed text, eliminate noisy effects, identify the type of documents and detect visual entities like tables, lines and boxes. Deep Reader maps the extracted entities into a rich relational schema so as to capture all the relevant relationships between entities (words, textboxes, lines etc) detected in the document. Relevant information and fields can then be extracted from the document by writing SQL queries on top of the relationship tables. A natural language based interface is added on top of the relationship schema so that a non-technical user, specifying the queries in natural language, can fetch the information with minimal effort. In this paper, we also demonstrate many different capabilities of Deep Reader and report results on a real-world use case.
CLJul 20, 2018
An Efficient End-to-End Neural Model for Handwritten Text RecognitionArindam Chowdhury, Lovekesh Vig
Offline handwritten text recognition from images is an important problem for enterprises attempting to digitize large volumes of handmarked scanned documents/reports. Deep recurrent models such as Multi-dimensional LSTMs have been shown to yield superior performance over traditional Hidden Markov Model based approaches that suffer from the Markov assumption and therefore lack the representational power of RNNs. In this paper we introduce a novel approach that combines a deep convolutional network with a recurrent Encoder-Decoder network to map an image to a sequence of characters corresponding to the text present in the image. The entire model is trained end-to-end using Focal Loss, an improvement over the standard Cross-Entropy loss that addresses the class imbalance problem, inherent to text recognition. To enhance the decoding capacity of the model, Beam Search algorithm is employed which searches for the best sequence out of a set of hypotheses based on a joint distribution of individual characters. Our model takes as input a downsampled version of the original image thereby making it both computationally and memory efficient. The experimental results were benchmarked against two publicly available datasets, IAM and RIMES. We surpass the state-of-the-art word level accuracy on the evaluation set of both datasets by 3.5% & 1.1%, respectively.