Shaikat M. Galib

h-index: 5

3 papers

9 citations

Novelty: 42%

AI Score: 27

Ranked #154,377 of 194,257 authors (top 79%); #50,204 in CV (top 85%)

3 Papers

9.6 | CV | Oct 17, 2024

H2OVL-Mississippi Vision Language Models Technical Report

Shaikat Galib, Shanshan Wang, Guanshuo Xu et al.

Smaller vision-language models (VLMs) are becoming increasingly important for privacy-focused, on-device applications due to their ability to run efficiently on consumer hardware for processing enterprise commercial documents and images. These models require strong language understanding and visual capabilities to enhance human-machine interaction. To address this need, we present H2OVL-Mississippi, a pair of small VLMs trained on 37 million image-text pairs using 240 hours of compute on 8 x H100 GPUs. H2OVL-Mississippi-0.8B is a tiny model with 0.8 billion parameters that specializes in text recognition, achieving state-of-the-art performance on the Text Recognition portion of OCRBench and surpassing much larger models in this area. Additionally, we are releasing H2OVL-Mississippi-2B, a 2 billion parameter model for general use cases, exhibiting highly competitive metrics across various academic benchmarks. Both models build upon our prior work with H2O-Danube language models, extending their capabilities into the visual domain. We release them under the Apache 2.0 license, making VLMs accessible to everyone and democratizing document AI and visual LLMs.

1.2 | LG | Dec 9, 2020

LSTM recurrent neural network assisted aircraft stall prediction for enhanced situational awareness

Tahsin Sejat Saniat, Tahiat Goni, Shaikat M. Galib

Since the dawn of powered flight, there have been multiple incidents that can be attributed to aircraft stalls. Most modern-day aircraft are equipped with advanced warning systems to warn the pilots about a potential stall, so that pilots may adopt the necessary recovery measures. But these warnings often have a short window before the aircraft actually enters a stall and require the pilots to act promptly to prevent it. In this paper, we propose a deep learning based approach to predict an impending stall well in advance, even before the stall warning is triggered. We leverage the capabilities of long short-term memory (LSTM) recurrent neural networks (RNN) and propose a novel approach to predict potential stalls from the sequential in-flight sensor data. Three different neural network architectures were explored. The neural network models, trained on 26400 seconds of simulator flight data, are able to predict a potential stall with > 95% accuracy, approximately 10 seconds in advance of the stall-warning trigger. This can significantly augment the pilot's preparedness to handle an unexpected stall and will add an additional layer of safety to the traditional stall warning systems.

1.1 | CV | Nov 29, 2016 | Code

Computer Aided Detection of Oral Lesions on CT Images

Shaikat Galib, Fahima Islam, Muhammad Abir et al.

Oral lesions are important findings on computed tomography (CT) images. In this study, a fully automatic method to detect oral lesions in the mandibular region from dental CT images is proposed. Two methods were developed to recognize two types of lesions, namely (1) close border (CB) lesions and (2) open border (OB) lesions, which cover most of the lesion types that can be found on CT images. For the detection of CB lesions, fifteen features were extracted from each initial lesion candidate and a multilayer perceptron (MLP) neural network was used to classify suspicious regions. OB lesions were detected using a rule-based image processing method, where no feature extraction or classification algorithm was used. The results were validated using a CT dataset of 52 patients, where 22 patients had abnormalities and 30 patients were normal. On a non-training dataset, the CB detection algorithm yielded 71% sensitivity with 0.31 false positives per patient. Furthermore, the OB detection algorithm achieved 100% sensitivity with 0.13 false positives per patient. The results suggest that the proposed framework, which consists of two methods, has the potential to be used in a clinical context and to assist radiologists in making better diagnoses.