CVApr 27, 2023Code
Exploiting Inductive Bias in Transformer for Point Cloud Classification and SegmentationZihao Li, Pan Gao, Hui Yuan et al.
Discovering inter-point connection for efficient high-dimensional feature extraction from point coordinate is a key challenge in processing point cloud. Most existing methods focus on designing efficient local feature extractors while ignoring global connection, or vice versa. In this paper, we design a new Inductive Bias-aided Transformer (IBT) method to learn 3D inter-point relations, which considers both local and global attentions. Specifically, considering local spatial coherence, local feature learning is performed through Relative Position Encoding and Attentive Feature Pooling. We incorporate the learned locality into the Transformer module. The local feature affects value component in Transformer to modulate the relationship between channels of each point, which can enhance self-attention mechanism with locality based channel interaction. We demonstrate its superiority experimentally on classification and segmentation tasks. The code is available at: https://github.com/jiamang/IBT
CVSep 4, 2022
Rice Leaf Disease Classification and Detection Using YOLOv5Md Ershadul Haque, Ashikur Rahman, Iftekhar Junaeid et al.
A staple food in more than a hundred nations worldwide is rice (Oryza sativa). The cultivation of rice is vital to global economic growth. However, the main issue facing the agricultural industry is rice leaf disease. The quality and quantity of the crops have declined, and this is the main cause. As farmers in any country do not have much knowledge about rice leaf disease, they cannot diagnose rice leaf disease properly. That's why they cannot take proper care of rice leaves. As a result, the production is decreasing. From literature survey, it has seen that YOLOv5 exhibit the better result compare to others deep learning method. As a result of the continual advancement of object detection technology, YOLO family algorithms, which have extraordinarily high precision and better speed have been used in various scene recognition tasks to build rice leaf disease monitoring systems. We have annotate 1500 collected data sets and propose a rice leaf disease classification and detection method based on YOLOv5 deep learning. We then trained and evaluated the YOLOv5 model. The simulation outcomes show improved object detection result for the augmented YOLOv5 network proposed in this article. The required levels of recognition precision, recall, mAP value, and F1 score are 90\%, 67\%, 76\%, and 81\% respectively are considered as performance metrics.
CVOct 19, 2022
Vision-Based Robust Lane Detection and Tracking under Different Challenging Environmental ConditionsSamia Sultana, Boshir Ahmed, Manoranjan Paul et al.
Lane marking detection is fundamental for both advanced driving assistance systems. However, detecting lane is highly challenging when the visibility of a road lane marking is low due to real-life challenging environment and adverse weather. Most of the lane detection methods suffer from four types of challenges: (i) light effects i.e., shadow, glare of light, reflection etc.; (ii) Obscured visibility of eroded, blurred, colored and cracked lane caused by natural disasters and adverse weather; (iii) lane marking occlusion by different objects from surroundings (wiper, vehicles etc.); and (iv) presence of confusing lane like lines inside the lane view e.g., guardrails, pavement marking, road divider etc. Here, we propose a robust lane detection and tracking method with three key technologies. First, we introduce a comprehensive intensity threshold range (CITR) to improve the performance of the canny operator in detecting low intensity lane edges. Second, we propose a two-step lane verification technique, the angle based geometric constraint (AGC) and length-based geometric constraint (LGC) followed by Hough Transform, to verify the characteristics of lane marking and to prevent incorrect lane detection. Finally, we propose a novel lane tracking technique, by defining a range of horizontal lane position (RHLP) along the x axis which will be updating with respect to the lane position of previous frame. It can keep track of the lane position when either left or right or both lane markings are partially and fully invisible. To evaluate the performance of the proposed method we used the DSDLDE [1] and SLD [2] dataset with 1080x1920 and 480x720 resolutions at 24 and 25 frames/sec respectively. Experimental results show that the average detection rate is 97.55%, and the average processing time is 22.33 msec/frame, which outperform the state of-the-art method.
IVJul 13, 2023
Full-resolution Lung Nodule Segmentation from Chest X-ray Images using Residual Encoder-Decoder NetworksMichael James Horry, Subrata Chakraborty, Biswajeet Pradhan et al.
Lung cancer is the leading cause of cancer death and early diagnosis is associated with a positive prognosis. Chest X-ray (CXR) provides an inexpensive imaging mode for lung cancer diagnosis. Suspicious nodules are difficult to distinguish from vascular and bone structures using CXR. Computer vision has previously been proposed to assist human radiologists in this task, however, leading studies use down-sampled images and computationally expensive methods with unproven generalization. Instead, this study localizes lung nodules using efficient encoder-decoder neural networks that process full resolution images to avoid any signal loss resulting from down-sampling. Encoder-decoder networks are trained and tested using the JSRT lung nodule dataset. The networks are used to localize lung nodules from an independent external CXR dataset. Sensitivity and false positive rates are measured using an automated framework to eliminate any observer subjectivity. These experiments allow for the determination of the optimal network depth, image resolution and pre-processing pipeline for generalized lung nodule localization. We find that nodule localization is influenced by subtlety, with more subtle nodules being detected in earlier training epochs. Therefore, we propose a novel self-ensemble model from three consecutive epochs centered on the validation optimum. This ensemble achieved a sensitivity of 85% in 10-fold internal testing with false positives of 8 per image. A sensitivity of 81% is achieved at a false positive rate of 6 following morphological false positive reduction. This result is comparable to more computationally complex systems based on linear and spatial filtering, but with a sub-second inference time that is faster than other methods. The proposed algorithm achieved excellent generalization results against an external dataset with sensitivity of 77% at a false positive rate of 7.6.
QUANT-PHDec 14, 2022
A novel state connection strategy for quantum computing to represent and compress digital imagesMd Ershadul Haque, Manoranjan Paul, Tanmoy Debnath
Quantum image processing draws a lot of attention due to faster data computation and storage compared to classical data processing systems. Converting classical image data into the quantum domain and state label preparation complexity is still a challenging issue. The existing techniques normally connect the pixel values and the state position directly. Recently, the EFRQI (efficient flexible representation of the quantum image) approach uses an auxiliary qubit that connects the pixel-representing qubits to the state position qubits via Toffoli gates to reduce state connection. Due to the twice use of Toffoli gates for each pixel connection still it requires a significant number of bits to connect each pixel value. In this paper, we propose a new SCMFRQI (state connection modification FRQI) approach for further reducing the required bits by modifying the state connection using a reset gate rather than repeating the use of the same Toffoli gate connection as a reset gate. Moreover, unlike other existing methods, we compress images using block-level for further reduction of required qubits. The experimental results confirm that the proposed method outperforms the existing methods in terms of both image representation and compression points of view.
CVApr 25, 2022
Dynamic Point Cloud Compression with Cross-Sectional ApproachFaranak Tohidi, Manoranjan Paul, Anwaar Ulhaq
The recent development of dynamic point clouds has introduced the possibility of mimicking natural reality, and greatly assisting quality of life. However, to broadcast successfully, the dynamic point clouds require higher compression due to their huge volume of data compared to the traditional video. Recently, MPEG finalized a Video-based Point Cloud Compression standard known as V-PCC. However, V-PCC requires huge computational time due to expensive normal calculation and segmentation, sacrifices some points to limit the number of 2D patches, and cannot occupy all spaces in the 2D frame. The proposed method addresses these limitations by using a novel cross-sectional approach. This approach reduces expensive normal estimation and segmentation, retains more points, and utilizes more spaces for 2D frame generation compared to the VPCC. The experimental results using standard video sequences show that the proposed technique can achieve better compression in both geometric and texture data compared to the V-PCC standard.
QUANT-PHJun 22, 2023
Efficient quantum image representation and compression circuit using zero-discarded state preparation approachMd Ershadul Haque, Manoranjan Paul, Anwaar Ulhaq et al.
Quantum image computing draws a lot of attention due to storing and processing image data faster than classical. With increasing the image size, the number of connections also increases, leading to the circuit complex. Therefore, efficient quantum image representation and compression issues are still challenging. The encoding of images for representation and compression in quantum systems is different from classical ones. In quantum, encoding of position is more concerned which is the major difference from the classical. In this paper, a novel zero-discarded state connection novel enhance quantum representation (ZSCNEQR) approach is introduced to reduce complexity further by discarding '0' in the location representation information. In the control operational gate, only input '1' contribute to its output thus, discarding zero makes the proposed ZSCNEQR circuit more efficient. The proposed ZSCNEQR approach significantly reduced the required bit for both representation and compression. The proposed method requires 11.76\% less qubits compared to the recent existing method. The results show that the proposed approach is highly effective for representing and compressing images compared to the two relevant existing methods in terms of rate-distortion performance.
CVSep 28, 2022
Analysis and prediction of heart stroke from ejection fraction and serum creatinine using LSTM deep learning approachMd Ershadul Haque, Salah Uddin, Md Ariful Islam et al.
The combination of big data and deep learning is a world-shattering technology that can greatly impact any objective if used properly. With the availability of a large volume of health care datasets and progressions in deep learning techniques, systems are now well equipped to predict the future trend of any health problems. From the literature survey, we found the SVM was used to predict the heart failure rate without relating objective factors. Utilizing the intensity of important historical information in electronic health records (EHR), we have built a smart and predictive model utilizing long short-term memory (LSTM) and predict the future trend of heart failure based on that health record. Hence the fundamental commitment of this work is to predict the failure of the heart using an LSTM based on the patient's electronic medicinal information. We have analyzed a dataset containing the medical records of 299 heart failure patients collected at the Faisalabad Institute of Cardiology and the Allied Hospital in Faisalabad (Punjab, Pakistan). The patients consisted of 105 women and 194 men and their ages ranged from 40 and 95 years old. The dataset contains 13 features, which report clinical, body, and lifestyle information responsible for heart failure. We have found an increasing trend in our analysis which will contribute to advancing the knowledge in the field of heart stroke prediction.
CVAug 17, 2022
Efficient dynamic point cloud coding using Slice-Wise SegmentationFaranak Tohidi, Manoranjan Paul, Anwaar Ulhaq
With the fast growth of immersive video sequences, achieving seamless and high-quality compressed 3D content is even more critical. MPEG recently developed a video-based point cloud compression (V-PCC) standard for dynamic point cloud coding. However, reconstructed point clouds using V-PCC suffer from different artifacts, including losing data during pre-processing before applying existing video coding techniques, e.g., High-Efficiency Video Coding (HEVC). Patch generations and self-occluded points in the 3D to the 2D projection are the main reasons for missing data using V-PCC. This paper proposes a new method that introduces overlapping slicing as an alternative to patch generation to decrease the number of patches generated and the amount of data lost. In the proposed method, the entire point cloud has been cross-sectioned into variable-sized slices based on the number of self-occluded points so that data loss can be minimized in the patch generation process and projection. For this, a variable number of layers are considered, partially overlapped to retain the self-occluded points. The proposed method's added advantage is to reduce the bits requirement and to encode geometric data using the slicing base position. The experimental results show that the proposed method is much more flexible than the standard V-PCC method, improves the rate-distortion performance, and decreases the data loss significantly compared to the standard V-PCC method.
CVAug 28, 2022
Efficient Motion Modelling with Variable-sized blocks from Hierarchical Cuboidal PartitioningPriyabrata Karmakar, Manzur Murshed, Manoranjan Paul et al.
Motion modelling with block-based architecture has been widely used in video coding where a frame is divided into fixed-sized blocks that are motion compensated independently. This often leads to coding inefficiency as fixed-sized blocks hardly align with the object boundaries. Although hierarchical block-partitioning has been introduced to address this, the increased number of motion vectors limits the benefit. Recently, approximate segmentation of images with cuboidal partitioning has gained popularity. Not only are the variable-sized rectangular segments (cuboids) readily amenable to block-based image/video coding techniques, but they are also capable of aligning well with the object boundaries. This is because cuboidal partitioning is based on a homogeneity constraint, minimising the sum of squared errors (SSE). In this paper, we have investigated the potential of cuboids in motion modelling against the fixed-sized blocks used in scalable video coding. Specifically, we have constructed motion-compensated current frame using the cuboidal partitioning information of the anchor frame in a group-of-picture (GOP). The predicted current frame has then been used as the base layer while encoding the current frame as an enhancement layer using the scalable HEVC encoder. Experimental results confirm 6.71%-10.90% bitrate savings on 4K video sequences.
AIJul 11, 2024
Predicting Heart Failure with Attention Learning Techniques Utilizing Cardiovascular DataErshadul Haque, Manoranjan Paul, Faranak Tohidi
Cardiovascular diseases (CVDs) encompass a group of disorders affecting the heart and blood vessels, including conditions such as coronary artery disease, heart failure, stroke, and hypertension. In cardiovascular diseases, heart failure is one of the main causes of death and also long-term suffering in patients worldwide. Prediction is one of the risk factors that is highly valuable for treatment and intervention to minimize heart failure. In this work, an attention learning-based heart failure prediction approach is proposed on EHR(electronic health record) cardiovascular data such as ejection fraction and serum creatinine. Moreover, different optimizers with various learning rate approaches are applied to fine-tune the proposed approach. Serum creatinine and ejection fraction are the two most important features to predict the patient's heart failure. The computational result shows that the RMSProp optimizer with 0.001 learning rate has a better prediction based on serum creatinine. On the other hand, the combination of SGD optimizer with 0.01 learning rate exhibits optimum performance based on ejection fraction features. Overall, the proposed attention learning-based approach performs very efficiently in predicting heart failure compared to the existing state-of-the-art such as LSTM approach.
CVJul 12, 2024
Global Attention-Guided Dual-Domain Point Cloud Feature Learning for Classification and SegmentationZihao Li, Pan Gao, Kang You et al.
Previous studies have demonstrated the effectiveness of point-based neural models on the point cloud analysis task. However, there remains a crucial issue on producing the efficient input embedding for raw point coordinates. Moreover, another issue lies in the limited efficiency of neighboring aggregations, which is a critical component in the network stem. In this paper, we propose a Global Attention-guided Dual-domain Feature Learning network (GAD) to address the above-mentioned issues. We first devise the Contextual Position-enhanced Transformer (CPT) module, which is armed with an improved global attention mechanism, to produce a global-aware input embedding that serves as the guidance to subsequent aggregations. Then, the Dual-domain K-nearest neighbor Feature Fusion (DKFF) is cascaded to conduct effective feature aggregation through novel dual-domain feature learning which appreciates both local geometric relations and long-distance semantic connections. Extensive experiments on multiple point cloud analysis tasks (e.g., classification, part segmentation, and scene semantic segmentation) demonstrate the superior performance of the proposed method and the efficacy of the devised modules.
LGOct 23, 2024Code
Att2CPC: Attention-Guided Lossy Attribute Compression of Point CloudsKai Liu, Kang You, Pan Gao et al.
With the great progress of 3D sensing and acquisition technology, the volume of point cloud data has grown dramatically, which urges the development of efficient point cloud compression methods. In this paper, we focus on the task of learned lossy point cloud attribute compression (PCAC). We propose an efficient attention-based method for lossy compression of point cloud attributes leveraging on an autoencoder architecture. Specifically, at the encoding side, we conduct multiple downsampling to best exploit the local attribute patterns, in which effective External Cross Attention (ECA) is devised to hierarchically aggregate features by intergrating attributes and geometry contexts. At the decoding side, the attributes of the point cloud are progressively reconstructed based on the multi-scale representation and the zero-padding upsampling tactic. To the best of our knowledge, this is the first approach to introduce attention mechanism to point-based lossy PCAC task. We verify the compression efficiency of our model on various sequences, including human body frames, sparse objects, and large-scale point cloud scenes. Experiments show that our method achieves an average improvement of 1.15 dB and 2.13 dB in BD-PSNR of Y channel and YUV channel, respectively, when comparing with the state-of-the-art point-based method Deep-PCAC. Codes of this paper are available at https://github.com/I2-Multimedia-Lab/Att2CPC.
LGJan 2
A Sparse-Attention Deep Learning Model Integrating Heterogeneous Multimodal Features for Parkinson's Disease Severity ProfilingDristi Datta, Tanmoy Debnath, Minh Chau et al.
Characterising the heterogeneous presentation of Parkinson's disease (PD) requires integrating biological and clinical markers within a unified predictive framework. While multimodal data provide complementary information, many existing computational models struggle with interpretability, class imbalance, or effective fusion of high-dimensional imaging and tabular clinical features. To address these limitations, we propose the Class-Weighted Sparse-Attention Fusion Network (SAFN), an interpretable deep learning framework for robust multimodal profiling. SAFN integrates MRI cortical thickness, MRI volumetric measures, clinical assessments, and demographic variables using modality-specific encoders and a symmetric cross-attention mechanism that captures nonlinear interactions between imaging and clinical representations. A sparsity-constrained attention-gating fusion layer dynamically prioritises informative modalities, while a class-balanced focal loss (beta = 0.999, gamma = 1.5) mitigates dataset imbalance without synthetic oversampling. Evaluated on 703 participants (570 PD, 133 healthy controls) from the Parkinson's Progression Markers Initiative using subject-wise five-fold cross-validation, SAFN achieves an accuracy of 0.98 plus or minus 0.02 and a PR-AUC of 1.00 plus or minus 0.00, outperforming established machine learning and deep learning baselines. Interpretability analysis shows a clinically coherent decision process, with approximately 60 percent of predictive weight assigned to clinical assessments, consistent with Movement Disorder Society diagnostic principles. SAFN provides a reproducible and transparent multimodal modelling paradigm for computational profiling of neurodegenerative disease.
CVAug 7, 2025
A Novel Image Similarity Metric for Scene Composition StructureMd Redwanul Haque, Manzur Murshed, Manoranjan Paul et al.
The rapid advancement of generative AI models necessitates novel methods for evaluating image quality that extend beyond human perception. A critical concern for these models is the preservation of an image's underlying Scene Composition Structure (SCS), which defines the geometric relationships among objects and the background, their relative positions, sizes, orientations, etc. Maintaining SCS integrity is paramount for ensuring faithful and structurally accurate GenAI outputs. Traditional image similarity metrics often fall short in assessing SCS. Pixel-level approaches are overly sensitive to minor visual noise, while perception-based metrics prioritize human aesthetic appeal, neither adequately capturing structural fidelity. Furthermore, recent neural-network-based metrics introduce training overheads and potential generalization issues. We introduce the SCS Similarity Index Measure (SCSSIM), a novel, analytical, and training-free metric that quantifies SCS preservation by exploiting statistical measures derived from the Cuboidal hierarchical partitioning of images, robustly capturing non-object-based structural relationships. Our experiments demonstrate SCSSIM's high invariance to non-compositional distortions, accurately reflecting unchanged SCS. Conversely, it shows a strong monotonic decrease for compositional distortions, precisely indicating when SCS has been altered. Compared to existing metrics, SCSSIM exhibits superior properties for structural evaluation, making it an invaluable tool for developing and evaluating generative models, ensuring the integrity of scene composition.
IVMay 24, 2025
ReflectGAN: Modeling Vegetation Effects for Soil Carbon Estimation from Satellite ImageryDristi Datta, Manoranjan Paul, Manzur Murshed et al.
Soil organic carbon (SOC) is a critical indicator of soil health, but its accurate estimation from satellite imagery is hindered in vegetated regions due to spectral contamination from plant cover, which obscures soil reflectance and reduces model reliability. This study proposes the Reflectance Transformation Generative Adversarial Network (ReflectGAN), a novel paired GAN-based framework designed to reconstruct accurate bare soil reflectance from vegetated soil satellite observations. By learning the spectral transformation between vegetated and bare soil reflectance, ReflectGAN facilitates more precise SOC estimation under mixed land cover conditions. Using the LUCAS 2018 dataset and corresponding Landsat 8 imagery, we trained multiple learning-based models on both original and ReflectGAN-reconstructed reflectance inputs. Models trained on ReflectGAN outputs consistently outperformed those using existing vegetation correction methods. For example, the best-performing model (RF) achieved an $R^2$ of 0.54, RMSE of 3.95, and RPD of 2.07 when applied to the ReflectGAN-generated signals, representing a 35\% increase in $R^2$, a 43\% reduction in RMSE, and a 43\% improvement in RPD compared to the best existing method (PMM-SU). The performance of the models with ReflectGAN is also better compared to their counterparts when applied to another dataset, i.e., Sentinel-2 imagery. These findings demonstrate the potential of ReflectGAN to improve SOC estimation accuracy in vegetated landscapes, supporting more reliable soil monitoring.
LGMar 13, 2024
FSDR: A Novel Deep Learning-based Feature Selection Algorithm for Pseudo Time-Series Data using Discrete RelaxationMohammad Rahman, Manzur Murshed, Shyh Wei Teng et al.
Conventional feature selection algorithms applied to Pseudo Time-Series (PTS) data, which consists of observations arranged in sequential order without adhering to a conventional temporal dimension, often exhibit impractical computational complexities with high dimensional data. To address this challenge, we introduce a Deep Learning (DL)-based feature selection algorithm: Feature Selection through Discrete Relaxation (FSDR), tailored for PTS data. Unlike the existing feature selection algorithms, FSDR learns the important features as model parameters using discrete relaxation, which refers to the process of approximating a discrete optimisation problem with a continuous one. FSDR is capable of accommodating a high number of feature dimensions, a capability beyond the reach of existing DL-based or traditional methods. Through testing on a hyperspectral dataset (i.e., a type of PTS data), our experimental results demonstrate that FSDR outperforms three commonly used feature selection algorithms, taking into account a balance among execution time, $R^2$, and $RMSE$.
IVFeb 11, 2022
Dilated convolutional neural network-based deep reference picture generation for video compressionHaoyue Tian, Pan Gao, Ran Wei et al.
Motion estimation and motion compensation are indispensable parts of inter prediction in video coding. Since the motion vector of objects is mostly in fractional pixel units, original reference pictures may not accurately provide a suitable reference for motion compensation. In this paper, we propose a deep reference picture generator which can create a picture that is more relevant to the current encoding frame, thereby further reducing temporal redundancy and improving video compression efficiency. Inspired by the recent progress of Convolutional Neural Network(CNN), this paper proposes to use a dilated CNN to build the generator. Moreover, we insert the generated deep picture into Versatile Video Coding(VVC) as a reference picture and perform a comprehensive set of experiments to evaluate the effectiveness of our network on the latest VVC Test Model VTM. The experimental results demonstrate that our proposed method achieves on average 9.7% bit saving compared with VVC under low-delay P configuration.
CVJan 24, 2022
Debiasing pipeline improves deep learning model generalization for X-ray based lung nodule detectionMichael Horry, Subrata Chakraborty, Biswajeet Pradhan et al.
Lung cancer is the leading cause of cancer death worldwide and a good prognosis depends on early diagnosis. Unfortunately, screening programs for the early diagnosis of lung cancer are uncommon. This is in-part due to the at-risk groups being located in rural areas far from medical facilities. Reaching these populations would require a scaled approach that combines mobility, low cost, speed, accuracy, and privacy. We can resolve these issues by combining the chest X-ray imaging mode with a federated deep-learning approach, provided that the federated model is trained on homogenous data to ensure that no single data source can adversely bias the model at any point in time. In this study we show that an image pre-processing pipeline that homogenizes and debiases chest X-ray images can improve both internal classification and external generalization, paving the way for a low-cost and accessible deep learning-based clinical system for lung cancer screening. An evolutionary pruning mechanism is used to train a nodule detection deep learning model on the most informative images from a publicly available lung nodule X-ray dataset. Histogram equalization is used to remove systematic differences in image brightness and contrast. Model training is performed using all combinations of lung field segmentation, close cropping, and rib suppression operators. We show that this pre-processing pipeline results in deep learning models that successfully generalize an independent lung nodule dataset using ablation studies to assess the contribution of each operator in this pipeline. In stripping chest X-ray images of known confounding variables by lung field segmentation, along with suppression of signal noise from the bone structure we can train a highly accurate deep learning lung nodule detection algorithm with outstanding generalization accuracy of 89% to nodule samples in unseen data.
CVApr 20, 2021
Systematic investigation into generalization of COVID-19 CT deep learning models with Gabor ensemble for lung involvement scoringMichael J. Horry, Subrata Chakraborty, Biswajeet Pradhan et al.
The COVID-19 pandemic has inspired unprecedented data collection and computer vision modelling efforts worldwide, focusing on diagnosis and stratification of COVID-19 from medical images. Despite this large-scale research effort, these models have found limited practical application due in part to unproven generalization of these models beyond their source study. This study investigates the generalizability of key published models using the publicly available COVID-19 Computed Tomography data through cross dataset validation. We then assess the predictive ability of these models for COVID-19 severity using an independent new dataset that is stratified for COVID-19 lung involvement. Each inter-dataset study is performed using histogram equalization, and contrast limited adaptive histogram equalization with and without a learning Gabor filter. The study shows high variability in the generalization of models trained on these datasets due to varied sample image provenances and acquisition processes amongst other factors. We show that under certain conditions, an internally consistent dataset can generalize well to an external dataset despite structural differences between these datasets with f1 scores up to 86%. Our best performing model shows high predictive accuracy for lung involvement score for an independent dataset for which expertly labelled lung involvement stratification is available. Creating an ensemble of our best model for disease positive prediction with our best model for disease negative prediction using a min-max function resulted in a superior model for lung involvement prediction with average predictive accuracy of 75% for zero lung involvement and 96% for 75-100% lung involvement with almost linear relationship between these stratifications.
IVFeb 2, 2021
Human-Machine Collaborative Video Coding Through Cuboidal PartitioningAshek Ahmmed, Manoranjan Paul, Manzur Murshed et al.
Video coding algorithms encode and decode an entire video frame while feature coding techniques only preserve and communicate the most critical information needed for a given application. This is because video coding targets human perception, while feature coding aims for machine vision tasks. Recently, attempts are being made to bridge the gap between these two domains. In this work, we propose a video coding framework by leveraging on to the commonality that exists between human vision and machine vision applications using cuboids. This is because cuboids, estimated rectangular regions over a video frame, are computationally efficient, has a compact representation and object centric. Such properties are already shown to add value to traditional video coding systems. Herein cuboidal feature descriptors are extracted from the current frame and then employed for accomplishing a machine vision task in the form of object detection. Experimental results show that a trained classifier yields superior average precision when equipped with cuboidal features oriented representation of the current test frame. Additionally, this representation costs $7\%$ less in bit rate if the captured frames are need be communicated to a receiver.
IVDec 10, 2020
Deep Mining Generation of Lung Cancer Malignancy Models from Chest X-ray ImagesMichael J. Horry, Subrata Chakraborty, Biswajeet Pradhan et al.
Lung cancer is the leading cause of cancer death and morbidity worldwide. Many studies have shown machine learning models to be effective at detecting lung nodules from chest X-ray images. However, these techniques have yet to be embraced by the medical community due to several practical, ethical, and regulatory constraints stemming from the black-box nature of deep learning models. Additionally, most lung nodules visible on chest X-ray are benign; therefore, the narrow task of computer vision-based lung nodule detection cannot be equated to automated lung cancer detection. Addressing both concerns, this study introduces a novel hybrid deep learning and decision tree-based computer vision model which presents lung cancer malignancy predictions as interpretable decision trees. The deep learning component of this process is trained using a large publicly available dataset on pathological biomarkers associated with lung cancer. These models are then used to inference biomarker scores for chest X-ray images from two, independent data sets for which malignancy metadata is available. We mine multi-variate predictive models by fitting shallow decision trees to the malignancy stratified datasets and interrogate a range of metrics to determine the best model. Our best decision tree model achieves sensitivity and specificity of 86.7% and 80.0% respectively with a positive predictive value of 92.9%. Decision trees mined using this method may be considered as a starting point for refinement into clinically useful multi-variate lung cancer malignancy models for implementation as a workflow augmentation tool to improve the efficiency of human radiologists.
IVNov 30, 2020
MAVIDH Score: A COVID-19 Severity Scoring using Chest X-Ray Pathology FeaturesDouglas P. S. Gomes, Michael J. Horry, Anwaar Ulhaq et al.
The application of computer vision for COVID-19 diagnosis is complex and challenging, given the risks associated with patient misclassifications. Arguably, the primary value of medical imaging for COVID-19 lies rather on patient prognosis. Radiological images can guide physicians assessing the severity of the disease, and a series of images from the same patient at different stages can help to gauge disease progression. Hence, a simple method based on lung-pathology interpretable features for scoring disease severity from Chest X-rays is proposed here. As the primary contribution, this method correlates well to patient severity in different stages of disease progression with competitive results compared to other existing, more complex methods. An original data selection approach is also proposed, allowing the simple model to learn the severity-related features. It is hypothesized that the resulting competitive performance presented here is related to the method being feature-based rather than reliant on lung involvement or opacity as others in the literature. A second contribution comes from the validation of the results, conceptualized as the scoring of patients groups from different stages of the disease. Besides performing such validation on an independent data set, the results were also compared with other proposed scoring methods in the literature. The results show that there is a significant correlation between the scoring system (MAVIDH) and patient outcome, which could potentially help physicians rating and following disease progression in COVID-19 patients.
IVSep 26, 2020
Potential Features of ICU Admission in X-ray Images of COVID-19 PatientsDouglas P. S. Gomes, Anwaar Ulhaq, Manoranjan Paul et al.
X-ray images may present non-trivial features with predictive information of patients that develop severe symptoms of COVID-19. If true, this hypothesis may have practical value in allocating resources to particular patients while using a relatively inexpensive imaging technique. The difficulty of testing such a hypothesis comes from the need for large sets of labelled data, which need to be well-annotated and should contemplate the post-imaging severity outcome. This paper presents an original methodology for extracting semantic features that correlate to severity from a data set with patient ICU admission labels through interpretable models. The methodology employs a neural network trained to recognise lung pathologies to extract the semantic features, which are then analysed with low-complexity models to limit overfitting while increasing interpretability. This analysis points out that only a few features explain most of the variance between patients that developed severe symptoms. When applied to an unrelated larger data set with pathology-related clinical notes, the method has shown to be capable of selecting images for the learned features, which could translate some information about their common locations in the lung. Besides attesting separability on patients that eventually develop severe symptoms, the proposed methods represent a statistical approach highlighting the importance of features related to ICU admission that may have been only qualitatively reported. While handling limited data sets, notable methodological aspects are adopted, such as presenting a state-of-the-art lung segmentation network and the use of low-complexity models to avoid overfitting. The code for methodology and experiments is also available.
TOJul 10, 2020
Hyperspectral Imaging to detect Age, Defects and Individual Nutrient Deficiency in Grapevine LeavesManoranjan Paul, Sourabhi Debnath, Tanmoy Debnath et al.
Hyperspectral (HS) imaging was successfully employed in the 380 nm to 1000 nm wavelength range to investigate the efficacy of detecting age, healthiness and individual nutrient deficiency of grapevine leaves collected from vineyards located in central west NSW, Australia. For age detection, the appearance of many healthy grapevine leaves has been examined. Then visually defective leaves were compared with healthy leaves. Control leaves and individual nutrient-deficient leaves (e.g. N, K and Mg) were also analysed. Several features were employed at various stages in the Ultraviolet (UV), Visible (VIS) and Near Infrared (NIR) regions to evaluate the experimental data: mean brightness, mean 1st derivative brightness, variation index, mean spectral ratio, normalised difference vegetation index (NDVI) and standard deviation (SD). Experiment results demonstrate that these features could be utilised with a high degree of effectiveness to compare age, identify unhealthy samples and not only to distinguish from control and nutrient deficiency but also to identify individual nutrient defects. Therefore, our work corroborated that HS imaging has excellent potential as a non-destructive as well as a non-contact method to detect age, healthiness and individual nutrient deficiencies of grapevine leaves
IVJul 10, 2020
Rain Streak Removal in a Video to Improve Visibility by TAWL AlgorithmMuhammad Rafiqul Islam, Manoranjan Paul
In computer vision applications, the visibility of the video content is crucial to perform analysis for better accuracy. The visibility can be affected by several atmospheric interferences in challenging weather-one of them is the appearance of rain streak. In recent time, rain streak removal achieves lots of interest to the researchers as it has some exciting applications such as autonomous car, intelligent traffic monitoring system, multimedia, etc. In this paper, we propose a novel and simple method by combining three novel extracted features focusing on temporal appearance, wide shape and relative location of the rain streak and we called it TAWL (Temporal Appearance, Width, and Location) method. The proposed TAWL method adaptively uses features from different resolutions and frame rates. Moreover, it progressively processes features from the up-coming frames so that it can remove rain in the real-time. The experiments have been conducted using video sequences with both real rains and synthetic rains to compare the performance of the proposed method against the relevant state-of-the-art methods. The experimental results demonstrate that the proposed method outperforms the state-of-the-art methods by removing more rain streaks while keeping other moving regions.
IVApr 15, 2020
Computer Vision For COVID-19 Control: A SurveyAnwaar Ulhaq, Asim Khan, Douglas Gomes et al.
The COVID-19 pandemic has triggered an urgent need to contribute to the fight against an immense threat to the human population. Computer Vision, as a subfield of Artificial Intelligence, has enjoyed recent success in solving various complex problems in health care and has the potential to contribute to the fight of controlling COVID-19. In response to this call, computer vision researchers are putting their knowledge base at work to devise effective ways to counter COVID-19 challenge and serve the global community. New contributions are being shared with every passing day. It motivated us to review the recent work, collect information about available research resources and an indication of future research directions. We want to make it available to computer vision researchers to save precious time. This survey paper is intended to provide a preliminary review of the available literature on the computer vision efforts against COVID-19 pandemic.
CVApr 15, 2020
A Practical Blockchain Framework using Image Hashing for Image AuthenticationCameron White, Manoranjan Paul, Subrata Chakraborty
Blockchain is a relatively new technology that can be seen as a decentralised database. Blockchain systems heavily rely on cryptographic hash functions to store their data, which makes it difficult to tamper with any data stored in the system. A topic that was researched along with blockchain is image authentication. Image authentication focuses on investigating and maintaining the integrity of images. As a blockchain system can be useful for maintaining data integrity, image authentication has the potential to be enhanced by blockchain. There are many techniques that can be used to authenticate images; the technique investigated by this work is image hashing. Image hashing is a technique used to calculate how similar two different images are. This is done by converting the images into hashes and then comparing them using a distance formula. To investigate the topic, an experiment involving a simulated blockchain was created. The blockchain acted as a database for images. This blockchain was made up of devices which contained their own unique image hashing algorithms. The blockchain was tested by creating modified copies of the images contained in the database, and then submitting them to the blockchain to see if it will return the original image. Through this experiment it was discovered that it is plausible to create an image authentication system using blockchain and image hashing. However, the design proposed by this work requires refinement, as it appears to struggle in some situations. This work shows that blockchain can be a suitable approach for authenticating images, particularly via image hashing. Other observations include that using multiple image hash algorithms at the same time can increase performance in some cases, as well as that each type of test done to the blockchain has its own unique pattern to its data.
CYApr 7, 2020
Automatically Assessing Quality of Online Health ArticlesFariha Afsana, Muhammad Ashad Kabir, Naeemul Hassan et al.
The information ecosystem today is overwhelmed by an unprecedented quantity of data on versatile topics are with varied quality. However, the quality of information disseminated in the field of medicine has been questioned as the negative health consequences of health misinformation can be life-threatening. There is currently no generic automated tool for evaluating the quality of online health information spanned over a broad range. To address this gap, in this paper, we applied a data mining approach to automatically assess the quality of online health articles based on 10 quality criteria. We have prepared a labeled dataset with 53012 features and applied different feature selection methods to identify the best feature subset with which our trained classifier achieved an accuracy of 84%-90% varied over 10 criteria. Our semantic analysis of features shows the underpinning associations between the selected features & assessment criteria and further rationalize our assessment approach. Our findings will help in identifying high-quality health articles and thus aiding users in shaping their opinion to make the right choice while picking health-related help from online.
CVMar 25, 2019
Enhanced Transfer Learning with ImageNet Trained Classification LayerTasfia Shermin, Shyh Wei Teng, Manzur Murshed et al.
Parameter fine tuning is a transfer learning approach whereby learned parameters from pre-trained source network are transferred to the target network followed by fine-tuning. Prior research has shown that this approach is capable of improving task performance. However, the impact of the ImageNet pre-trained classification layer in parameter fine-tuning is mostly unexplored in the literature. In this paper, we propose a fine-tuning approach with the pre-trained classification layer. We employ layer-wise fine-tuning to determine which layers should be frozen for optimal performance. Our empirical analysis demonstrates that the proposed fine-tuning performs better than traditional fine-tuning. This finding indicates that the pre-trained classification layer holds less category-specific or more global information than believed earlier. Thus, we hypothesize that the presence of this layer is crucial for growing network depth to adapt better to a new task. Our study manifests that careful normalization and scaling are essential for creating harmony between the pre-trained and new layers for target domain adaptation. We evaluate the proposed depth augmented networks for fine-tuning on several challenging benchmark datasets and show that they can achieve higher classification accuracy than contemporary transfer learning approaches.
MMFeb 12, 2019
A block-based inter-band predictor using multilayer propagation neural network for hyperspectral image compressionRui Dusselaar, Manoranjan Paul
In this paper, a block-based inter-band predictor (BIP) with multilayer propagation neural network model (MLPNN) is presented by a completely new framework. This predictor can combine with diversity entropy coding methods. Hyperspectral (HS) images are composed by a series high similarity spectral bands. Our assumption is to use trained MLPNN predict the succeeding bands based on current band information. The purpose is to explore whether BIP-MLPNN can provide better image predictive results with high efficiency. The algorithm also changed from the traditional compression methods encoding images pixel by pixel, the compression process only encodes the weights and the biases vectors of BIP-MLPNN which require few bits to transfer. The decoder will reconstruct a band by using the same structure of the network at the encoder side. The BIP-MLPNN decoder does not need to be trained as the weights and biases have already been transmitted. We can easily reconstruct the succeeding bands by using the BIP-MLPNN decoder. The experimental results indicate that BIP-MLPNN predictor outperforms the CCSDS-123 HS image coding standard. Due to a good approximation of the target band, the proposed method outperforms the CCSDS-123 by more than 2.0dB PSNR image quality in the predicted bands. Moreover, the proposed method provides high quality image e.g., 30 to 40dB PSNR at very low bit rate (less than 0.1 bpppb) and outperforms the existing methods e.g., JPEG, 3DSPECK, 3DSPIHT and in terms of rate-distortion performance.