Sukhdeep Singh

CV
h-index37
14papers
328citations
Novelty40%
AI Score44

14 Papers

MANov 24, 2022
Melting Pot 2.0

John P. Agapiou, Alexander Sasha Vezhnevets, Edgar A. Duéñez-Guzmán et al. · deepmind

Multi-agent artificial intelligence research promises a path to develop intelligent technologies that are more human-like and more human-compatible than those produced by "solipsistic" approaches, which do not consider interactions between agents. Melting Pot is a research tool developed to facilitate work on multi-agent artificial intelligence, and provides an evaluation protocol that measures generalization to novel social partners in a set of canonical test scenarios. Each scenario pairs a physical environment (a "substrate") with a reference set of co-players (a "background population"), to create a social situation with substantial interdependence between the individuals involved. For instance, some scenarios were inspired by institutional-economics-based accounts of natural resource management and public-good-provision dilemmas. Others were inspired by considerations from evolutionary biology, game theory, and artificial life. Melting Pot aims to cover a maximally diverse set of interdependencies and incentives. It includes the commonly-studied extreme cases of perfectly-competitive (zero-sum) motivations and perfectly-cooperative (shared-reward) motivations, but does not stop with them. As in real-life, a clear majority of scenarios in Melting Pot have mixed incentives. They are neither purely competitive nor purely cooperative and thus demand successful agents be able to navigate the resulting ambiguity. Here we describe Melting Pot 2.0, which revises and expands on Melting Pot. We also introduce support for scenarios with asymmetric roles, and explain how to integrate them into the evaluation protocol. This report also contains: (1) details of all substrates and scenarios; (2) a complete description of all baseline algorithms and results. Our intention is for it to serve as a reference for researchers using Melting Pot 2.0.

LGMar 1, 2022
Learning Robust Real-Time Cultural Transmission without Human Data

Cultural General Intelligence Team, Avishkar Bhoopchand, Bethanie Brownfield et al.

Cultural transmission is the domain-general social skill that allows agents to acquire and use information from each other in real-time with high fidelity and recall. In humans, it is the inheritance process that powers cumulative cultural evolution, expanding our skills, tools and knowledge across generations. We provide a method for generating zero-shot, high recall cultural transmission in artificially intelligent agents. Our agents succeed at real-time cultural transmission from humans in novel contexts without using any pre-collected human data. We identify a surprisingly simple set of ingredients sufficient for generating cultural transmission and develop an evaluation methodology for rigorously assessing it. This paves the way for cultural evolution as an algorithm for developing artificial general intelligence.

CVSep 11, 2022
Lexicon and Attention based Handwritten Text Recognition System

Lalita Kumari, Sukhdeep Singh, VVS Rathore et al.

The handwritten text recognition problem is widely studied by the researchers of computer vision community due to its scope of improvement and applicability to daily lives, It is a sub-domain of pattern recognition. Due to advancement of computational power of computers since last few decades neural networks based systems heavily contributed towards providing the state-of-the-art handwritten text recognizers. In the same direction, we have taken two state-of-the art neural networks systems and merged the attention mechanism with it. The attention technique has been widely used in the domain of neural machine translations and automatic speech recognition and now is being implemented in text recognition domain. In this study, we are able to achieve 4.15% character error rate and 9.72% word error rate on IAM dataset, 7.07% character error rate and 16.14% word error rate on GW dataset after merging the attention and word beam search decoder with existing Flor et al. architecture. To analyse further, we have also used system similar to Shi et al. neural network system with greedy decoder and observed 23.27% improvement in character error rate from the base model.

CVMay 23, 2022
A Comprehensive Handwritten Paragraph Text Recognition System: LexiconNet

Lalita Kumari, Sukhdeep Singh, Vaibhav Varish Singh Rathore et al.

In this study, we have presented an efficient procedure using two state-of-the-art approaches from the literature of handwritten text recognition as Vertical Attention Network and Word Beam Search. The attention module is responsible for internal line segmentation that consequently processes a page in a line-by-line manner. At the decoding step, we have added a connectionist temporal classification-based word beam search decoder as a post-processing step. In this study, an end-to-end paragraph recognition system is presented with a lexicon decoder as a post-processing step. Our procedure reports state-of-the-art results on standard datasets. The reported character error rate is 3.24% on the IAM dataset with 27.19% improvement, 1.13% on RIMES with 40.83% improvement and 2.43% on the READ-16 dataset with 32.31% improvement from existing literature and the word error rate is 8.29% on IAM dataset with 43.02% improvement, 2.94% on RIMES dataset with 56.25% improvement and 7.35% on READ-2016 dataset with 47.27% improvement from the existing results. The character error rate and word error rate reported in this work surpass the results reported in the literature.

CVJul 11, 2022
A Lexicon and Depth-wise Separable Convolution Based Handwritten Text Recognition System

Lalita Kumari, Sukhdeep Singh, VVS Rathore et al.

Cursive handwritten text recognition is a challenging research problem in the domain of pattern recognition. The current state-of-the-art approaches include models based on convolutional recurrent neural networks and multi-dimensional long short-term memory recurrent neural networks techniques. These methods are highly computationally extensive as well model is complex at design level. In recent studies, combination of convolutional neural network and gated convolutional neural networks based models demonstrated less number of parameters in comparison to convolutional recurrent neural networks based models. In the direction to reduced the total number of parameters to be trained, in this work, we have used depthwise convolution in place of standard convolutions with a combination of gated-convolutional neural network and bidirectional gated recurrent unit to reduce the total number of parameters to be trained. Additionally, we have also included a lexicon based word beam search decoder at testing step. It also helps in improving the the overall accuracy of the model. We have obtained 3.84% character error rate and 9.40% word error rate on IAM dataset; 4.88% character error rate and 14.56% word error rate in George Washington dataset, respectively.

AIMay 10
Towards Conversational Medical AI with Eyes, Ears and a Voice

Meet Shah, Jason Gusdorf, Anil Palepu et al.

The practice of medicine relies not only upon skillful dialogue but also on the nuanced exchange and interpretation of rich auditory and visual cues between doctors and patients. Building on the low-latency voice and video processing capabilities of Gemini, we introduce AI co-clinician, a first-of-its-kind conversational AI system utilizing continuous streams of audio-visual data from live patient conversations to inform real-time clinical decisions. Its dual-agent architecture balances deep clinical reasoning with the low latency required for natural dialogue. To assess this system, we implemented a video-based interface emulating telemedicine consultations. We crafted 20 standardized outpatient scenarios requiring proactive real-time auditory and visual reasoning and designed "TelePACES" evaluation criteria alongside case-specific rubrics. In a randomized, interface-blinded, crossover simulation study (n = 120 encounters) with 10 internal medicine residents as patient actors, we compared AI co-clinician with primary care physicians (PCPs), GPT-Realtime, and a baseline agent. AI co-clinician approached PCPs in key TelePACES dimensions, including management plans and differential diagnosis, while significantly outperforming GPT-Realtime across all general criteria. While our agent demonstrated parity with PCPs in case-specific triage measures, physicians maintained superior overall performance in case-specific assessments. Although AI co-clinician marks a significant advance in real-time telemedical AI, gaps remain in physical examination and disease-specific reasoning. Our work shows that text-only approaches fail to capture the true challenges of medical consultation and suggests that high-stakes real-time diagnostic AI is most safely advanced in collaborative, triadic models where AI can be a supportive co-clinician for doctors and patients.

CVAug 15, 2021Code
HCR-Net: A deep learning based script independent handwritten character recognition network

Vinod Kumar Chauhan, Sukhdeep Singh, Anuj Sharma

Handwritten character recognition (HCR) remains a challenging pattern recognition problem despite decades of research, and lacks research on script independent recognition techniques. {\color{black}This is mainly because of similar character structures, different handwriting styles, diverse scripts, handcrafted feature extraction techniques, unavailability of data and code, and the development of script-specific deep learning techniques. To address these limitations, we have proposed a script independent deep learning network for HCR research, called HCR-Net, that sets a new research direction for the field. HCR-Net is based on a novel transfer learning approach for HCR, which \textit{partly utilizes} feature extraction layers of a pre-trained network.} Due to transfer learning and image augmentation, HCR-Net provides faster and computationally efficient training, better performance and generalizations, and can work with small datasets. HCR-Net is extensively evaluated on 40 publicly available datasets of Bangla, Punjabi, Hindi, English, Swedish, Urdu, Farsi, Tibetan, Kannada, Malayalam, Telugu, Marathi, Nepali and Arabic languages, and established 26 new benchmark results while performed close to the best results in the rest cases. HCR-Net showed performance improvements up to 11\% against the existing results and achieved a fast convergence rate showing up to 99\% of final performance in the very first epoch. HCR-Net significantly outperformed the state-of-the-art transfer learning techniques and also reduced the number of trainable parameters by 34\% as compared with the corresponding pre-trained network. To facilitate reproducibility and further advancements of HCR research, the complete code is publicly released at \url{https://github.com/jmdvinodjmd/HCR-Net}.

CVApr 10, 2024
An inclusive review on deep learning techniques and their scope in handwriting recognition

Sukhdeep Singh, Sudhir Rohilla, Anuj Sharma

Deep learning expresses a category of machine learning algorithms that have the capability to combine raw inputs into intermediate features layers. These deep learning algorithms have demonstrated great results in different fields. Deep learning has particularly witnessed for a great achievement of human level performance across a number of domains in computer vision and pattern recognition. For the achievement of state-of-the-art performances in diverse domains, the deep learning used different architectures and these architectures used activation functions to perform various computations between hidden and output layers of any architecture. This paper presents a survey on the existing studies of deep learning in handwriting recognition field. Even though the recent progress indicates that the deep learning methods has provided valuable means for speeding up or proving accurate results in handwriting recognition, but following from the extensive literature survey, the present study finds that the deep learning has yet to revolutionize more and has to resolve many of the most pressing challenges in this field, but promising advances have been made on the prior state of the art. Additionally, an inadequate availability of labelled data to train presents problems in this domain. Nevertheless, the present handwriting recognition survey foresees deep learning enabling changes at both bench and bedside with the potential to transform several domains as image processing, speech recognition, computer vision, machine translation, robotics and control, medical imaging, medical information processing, bio-informatics, natural language processing, cyber security, and many others.

CVApr 22, 2024
GatedLexiconNet: A Comprehensive End-to-End Handwritten Paragraph Text Recognition System

Lalita Kumari, Sukhdeep Singh, Vaibhav Varish Singh Rathore et al.

The Handwritten Text Recognition problem has been a challenge for researchers for the last few decades, especially in the domain of computer vision, a subdomain of pattern recognition. Variability of texts amongst writers, cursiveness, and different font styles of handwritten texts with degradation of historical text images make it a challenging problem. Recognizing scanned document images in neural network-based systems typically involves a two-step approach: segmentation and recognition. However, this method has several drawbacks. These shortcomings encompass challenges in identifying text regions, analyzing layout diversity within pages, and establishing accurate ground truth segmentation. Consequently, these processes are prone to errors, leading to bottlenecks in achieving high recognition accuracies. Thus, in this study, we present an end-to-end paragraph recognition system that incorporates internal line segmentation and gated convolutional layers based encoder. The gating is a mechanism that controls the flow of information and allows to adaptively selection of the more relevant features in handwritten text recognition models. The attention module plays an important role in performing internal line segmentation, allowing the page to be processed line-by-line. During the decoding step, we have integrated a connectionist temporal classification-based word beam search decoder as a post-processing step. In this work, we have extended existing LexiconNet by carefully applying and utilizing gated convolutional layers in the existing deep neural network. Our results at line and page levels also favour our new GatedLexiconNet. This study reported character error rates of 2.27% on IAM, 0.9% on RIMES, and 2.13% on READ-16, and word error rates of 5.73% on IAM, 2.76% on RIMES, and 6.52% on READ-2016 datasets.

LGMar 22, 2024
GTAGCN: Generalized Topology Adaptive Graph Convolutional Networks

Sukhdeep Singh, Anuj Sharma, Vinod Kumar Chauhan · oxford

Graph Neural Networks (GNN) have emerged as a popular and standard approach for learning from graph-structured data. The literature on GNN highlights the potential of this evolving research area and its widespread adoption in real-life applications. However, most of the approaches are either new in concept or derived from specific techniques. Therefore, the potential of more than one approach in hybrid form has not been studied extensively, which can be well utilized for sequenced data or static data together. We derive a hybrid approach based on two established techniques as generalized aggregation networks and topology adaptive graph convolution networks that solve our purpose to apply on both types of sequenced and static nature of data, effectively. The proposed method applies to both node and graph classification. Our empirical analysis reveals that the results are at par with literature results and better for handwritten strokes as sequenced data, where graph structures have not been explored.

NIOct 17, 2025
Agentic AI for Ultra-Modern Networks: Multi-Agent Framework for RAN Autonomy and Assurance

Sukhdeep Singh, Avinash Bhat, Shweta M et al.

The increasing complexity of Beyond 5G and 6G networks necessitates new paradigms for autonomy and assur- ance. Traditional O-RAN control loops rely heavily on RIC- based orchestration, which centralizes intelligence and exposes the system to risks such as policy conflicts, data drift, and unsafe actions under unforeseen conditions. In this work, we argue that the future of autonomous networks lies in a multi-agentic architecture, where specialized agents collaborate to perform data collection, model training, prediction, policy generation, verification, deployment, and assurance. By replacing tightly- coupled centralized RIC-based workflows with distributed agents, the framework achieves autonomy, resilience, explainability, and system-wide safety. To substantiate this vision, we design and evaluate a traffic steering use case under surge and drift conditions. Results across four KPIs: RRC connected users, IP throughput, PRB utilization, and SINR, demonstrate that a naive predictor-driven deployment improves local KPIs but destabilizes neighbors, whereas the agentic system blocks unsafe policies, preserving global network health. This study highlights multi- agent architectures as a credible foundation for trustworthy AI- driven autonomy in next-generation RANs.

CVMay 15, 2024
Graph Neural Network based Handwritten Trajectories Recognition

Anuj Sharma, Sukhdeep Singh, S Ratna

The graph neural networks has been proved to be an efficient machine learning technique in real life applications. The handwritten recognition is one of the useful area in real life use where both offline and online handwriting recognition are required. The chain code as feature extraction technique has shown significant results in literature and we have been able to use chain codes with graph neural networks. To the best of our knowledge, this work presents first time a novel combination of handwritten trajectories features as chain codes and graph neural networks together. The handwritten trajectories for offline handwritten text has been evaluated using recovery of drawing order, whereas online handwritten trajectories are directly used with chain codes. Our results prove that present combination surpass previous results and minimize error rate in few epochs only.

NIFeb 4, 2022
Predictive Closed-Loop Service Automation in O-RAN based Network Slicing

Joseph Thaliath, Solmaz Niknam, Sukhdeep Singh et al.

Network slicing provides introduces customized and agile network deployment for managing different service types for various verticals under the same infrastructure. To cater to the dynamic service requirements of these verticals and meet the required quality-of-service (QoS) mentioned in the service-level agreement (SLA), network slices need to be isolated through dedicated elements and resources. Additionally, allocated resources to these slices need to be continuously monitored and intelligently managed. This enables immediate detection and correction of any SLA violation to support automated service assurance in a closed-loop fashion. By reducing human intervention, intelligent and closed-loop resource management reduces the cost of offering flexible services. Resource management in a network shared among verticals (potentially administered by different providers), would be further facilitated through open and standardized interfaces. Open radio access network (O-RAN) is perhaps the most promising RAN architecture that inherits all the aforementioned features, namely intelligence, open and standard interfaces, and closed control loop. Inspired by this, in this article we provide a closed-loop and intelligent resource provisioning scheme for O-RAN slicing to prevent SLA violations. In order to maintain realism, a real-world dataset of a large operator is used to train a learning solution for optimizing resource utilization in the proposed closed-loop service automation process. Moreover, the deployment architecture and the corresponding flow that are cognizant of the O-RAN requirements are also discussed.

SPMay 17, 2020
Intelligent O-RAN for Beyond 5G and 6G Wireless Networks

Solmaz Niknam, Abhishek Roy, Harpreet S. Dhillon et al.

Building on the principles of openness and intelligence, there has been a concerted global effort from the operators towards enhancing the radio access network (RAN) architecture. The objective is to build an operator-defined RAN architecture (and associated interfaces) on open hardware that provides intelligent radio control for beyond fifth generation (5G) as well as future sixth generation (6G) wireless networks. Specifically, the open-radio access network (O-RAN) alliance has been formed by merging xRAN forum and C-RAN alliance to formally define the requirements that would help achieve this objective. Owing to the importance of O-RAN in the current wireless landscape, this article provides an introduction to the concepts, principles, and requirements of the Open RAN as specified by the O-RAN alliance. In order to illustrate the role of intelligence in O-RAN, we propose an intelligent radio resource management scheme to handle traffic congestion and demonstrate its efficacy on a real-world dataset obtained from a large operator. A high-level architecture of this deployment scenario that is compliant with the O-RAN requirements is also discussed. The article concludes with key technical challenges and open problems for future research and development.