Somya D. Mohanty

CR
h-index11
4papers
45citations
Novelty36%
AI Score30

4 Papers

4.9CLAug 4, 2025
Automated SNOMED CT Concept Annotation in Clinical Text Using Bi-GRU Neural Networks

Ali Noori, Pratik Devkota, Somya Mohanty et al.

Automated annotation of clinical text with standardized medical concepts is critical for enabling structured data extraction and decision support. SNOMED CT provides a rich ontology for labeling clinical entities, but manual annotation is labor-intensive and impractical at scale. This study introduces a neural sequence labeling approach for SNOMED CT concept recognition using a Bidirectional GRU model. Leveraging a subset of MIMIC-IV, we preprocess text with domain-adapted SpaCy and SciBERT-based tokenization, segmenting sentences into overlapping 19-token chunks enriched with contextual, syntactic, and morphological features. The Bi-GRU model assigns IOB tags to identify concept spans and achieves strong performance with a 90 percent F1-score on the validation set. These results surpass traditional rule-based systems and match or exceed existing neural models. Qualitative analysis shows effective handling of ambiguous terms and misspellings. Our findings highlight that lightweight RNN-based architectures can deliver high-quality clinical concept annotation with significantly lower computational cost than transformer-based models, making them well-suited for real-world deployment.

3.3SIJan 2, 2021
A multi-modal approach towards mining social media data during natural disasters -- a case study of Hurricane Irma

Somya D. Mohanty, Brown Biggers, Saed Sayedahmed et al.

Streaming social media provides a real-time glimpse of extreme weather impacts. However, the volume of streaming data makes mining information a challenge for emergency managers, policy makers, and disciplinary scientists. Here we explore the effectiveness of data learned approaches to mine and filter information from streaming social media data from Hurricane Irma's landfall in Florida, USA. We use 54,383 Twitter messages (out of 784K geolocated messages) from 16,598 users from Sept. 10 - 12, 2017 to develop 4 independent models to filter data for relevance: 1) a geospatial model based on forcing conditions at the place and time of each tweet, 2) an image classification model for tweets that include images, 3) a user model to predict the reliability of the tweeter, and 4) a text model to determine if the text is related to Hurricane Irma. All four models are independently tested, and can be combined to quickly filter and visualize tweets based on user-defined thresholds for each submodel. We envision that this type of filtering and visualization routine can be useful as a base model for data capture from noisy sources such as Twitter. The data can then be subsequently used by policy makers, environmental managers, emergency managers, and domain scientists interested in finding tweets with specific attributes to use during different stages of the disaster (e.g., preparedness, response, and recovery), or for detailed research.

4.2CROct 16, 2018
A Scalable, Trustworthy Infrastructure for Collaborative Container Repositories

Franklin Wei, Mahalingam Ramkumar, Somya D Mohanty

We present a scalable "Trustworthy Container Repository" (TCR) infrastructure for the storage of software container images, such as those used by Docker. Using an authenticated data structure based on index-ordered Merkle trees (IOMTs), TCR aims to provide assurances of 1) Integrity, 2) Availability, and 3) Confidentiality to its users, whose containers are stored in an untrusted environment. Trust within the TCR architecture is rooted in a low-complexity, tamper-resistant trusted module. The use of IOMTs allows such a module to efficiently track a virtually unlimited number of container images, and thus provide the desired assurances for the system's users. Using a simulated version of the proposed system, we demonstrate the scalability of platform by showing logarithmic time complexity up to $2^{25}$ (32 million) container images. This paper presents both algorithmic and proof-of-concept software implementations of the proposed TCR infrastructure.

5.8CRMay 3, 2018
What we learn from learning - Understanding capabilities and limitations of machine learning in botnet attacks

David Santana, Shan Suthaharan, Somya Mohanty

With a growing increase in botnet attacks, computer networks are constantly under threat from attacks that cripple cyber-infrastructure. Detecting these attacks in real-time proves to be a difficult and resource intensive task. One of the pertinent methods to detect such attacks is signature based detection using machine learning models. This paper explores the efficacy of these models at detecting botnet attacks, using data captured from large-scale network attacks. Our study provides a comprehensive overview of performance characteristics two machine learning models --- Random Forest and Multi-Layer Perceptron (Deep Learning) in such attack scenarios. Using Big Data analytics, the study explores the advantages, limitations, model/feature parameters, and overall performance of using machine learning in botnet attacks / communication. With insights gained from the analysis, this work recommends algorithms/models for specific attacks of botnets instead of a generalized model.