Mohammed Eunus Ali

CL
h-index15
27papers
1,297citations
Novelty43%
AI Score53

27 Papers

LGSep 9, 2024Code
A Multi-Modal Deep Learning Based Approach for House Price Prediction

Md Hasebul Hasan, Md Abid Jahan, Mohammed Eunus Ali et al.

Accurate prediction of house price, a vital aspect of the residential real estate sector, is of substantial interest for a wide range of stakeholders. However, predicting house prices is a complex task due to the significant variability influenced by factors such as house features, location, neighborhood, and many others. Despite numerous attempts utilizing a wide array of algorithms, including recent deep learning techniques, to predict house prices accurately, existing approaches have fallen short of considering a wide range of factors such as textual and visual features. This paper addresses this gap by comprehensively incorporating attributes, such as features, textual descriptions, geo-spatial neighborhood, and house images, typically showcased in real estate listings in a house price prediction system. Specifically, we propose a multi-modal deep learning approach that leverages different types of data to learn more accurate representation of the house. In particular, we learn a joint embedding of raw house attributes, geo-spatial neighborhood, and most importantly from textual description and images representing the house; and finally use a downstream regression model to predict the house price from this jointly learned embedding vector. Our experimental results with a real-world dataset show that the text embedding of the house advertisement description and image embedding of the house pictures in addition to raw attributes and geo-spatial embedding, can significantly improve the house price prediction accuracy. The relevant source code and dataset are publicly accessible at the following URL: https://github.com/4P0N/mhpp

CLAug 20, 2022Code
BSpell: A CNN-Blended BERT Based Bangla Spell Checker

Chowdhury Rafeed Rahman, MD. Hasibur Rahman, Samiha Zakir et al.

Bangla typing is mostly performed using English keyboard and can be highly erroneous due to the presence of compound and similarly pronounced letters. Spelling correction of a misspelled word requires understanding of word typing pattern as well as the context of the word usage. A specialized BERT model named BSpell has been proposed in this paper targeted towards word for word correction in sentence level. BSpell contains an end-to-end trainable CNN sub-model named SemanticNet along with specialized auxiliary loss. This allows BSpell to specialize in highly inflected Bangla vocabulary in the presence of spelling errors. Furthermore, a hybrid pretraining scheme has been proposed for BSpell that combines word level and character level masking. Comparison on two Bangla and one Hindi spelling correction dataset shows the superiority of our proposed approach. BSpell is available as a Bangla spell checking tool via GitHub: https://github.com/Hasiburshanto/Bangla-Spell-Checker

CLAug 12, 2023
Generating Faithful Text From a Knowledge Graph with Noisy Reference Text

Tahsina Hashem, Weiqing Wang, Derry Tanti Wijaya et al.

Knowledge Graph (KG)-to-Text generation aims at generating fluent natural-language text that accurately represents the information of a given knowledge graph. While significant progress has been made in this task by exploiting the power of pre-trained language models (PLMs) with appropriate graph structure-aware modules, existing models still fall short of generating faithful text, especially when the ground-truth natural-language text contains additional information that is not present in the graph. In this paper, we develop a KG-to-text generation model that can generate faithful natural-language text from a given graph, in the presence of noisy reference text. Our framework incorporates two core ideas: Firstly, we utilize contrastive learning to enhance the model's ability to differentiate between faithful and hallucinated information in the text, thereby encouraging the decoder to generate text that aligns with the input graph. Secondly, we empower the decoder to control the level of hallucination in the generated text by employing a controllable text generation technique. We evaluate our model's performance through the standard quantitative metrics as well as a ChatGPT-based quantitative and qualitative analysis. Our evaluation demonstrates the superior performance of our model over state-of-the-art KG-to-text models on faithfulness.

CLMay 18, 2024Code
MapCoder: Multi-Agent Code Generation for Competitive Problem Solving

Md. Ashraful Islam, Mohammed Eunus Ali, Md Rizwan Parvez

Code synthesis, which requires a deep understanding of complex natural language problem descriptions, generation of code instructions for complex algorithms and data structures, and the successful execution of comprehensive unit tests, presents a significant challenge. While large language models (LLMs) demonstrate impressive proficiency in natural language processing, their performance in code generation tasks remains limited. In this paper, we introduce a new approach to code generation tasks leveraging multi-agent prompting that uniquely replicates the full cycle of program synthesis as observed in human developers. Our framework, MapCoder, consists of four LLM agents specifically designed to emulate the stages of this cycle: recalling relevant examples, planning, code generation, and debugging. After conducting thorough experiments, with multiple LLM ablations and analyses across eight challenging competitive problem-solving and program synthesis benchmarks, MapCoder showcases remarkable code generation capabilities, achieving new state-of-the-art results (pass@1) on HumanEval (93.9%), MBPP (83.1%), APPS (22.0%), CodeContests (28.5%), and xCodeEval (45.3%). Moreover, our method consistently delivers superior performance across various programming languages and varying problem difficulties. We open-source our framework at https://github.com/Md-Ashraful-Pramanik/MapCoder.

SPJul 22, 2023
Contrastive Self-Supervised Learning Based Approach for Patient Similarity: A Case Study on Atrial Fibrillation Detection from PPG Signal

Subangkar Karmaker Shanto, Shoumik Saha, Atif Hasan Rahman et al.

In this paper, we propose a novel contrastive learning based deep learning framework for patient similarity search using physiological signals. We use a contrastive learning based approach to learn similar embeddings of patients with similar physiological signal data. We also introduce a number of neighbor selection algorithms to determine the patients with the highest similarity on the generated embeddings. To validate the effectiveness of our framework for measuring patient similarity, we select the detection of Atrial Fibrillation (AF) through photoplethysmography (PPG) signals obtained from smartwatch devices as our case study. We present extensive experimentation of our framework on a dataset of over 170 individuals and compare the performance of our framework with other baseline methods on this dataset.

LGJun 16, 2022
Unsupervised Space Partitioning for Nearest Neighbor Search

Abrar Fahim, Mohammed Eunus Ali, Muhammad Aamir Cheema

Approximate Nearest Neighbor Search (ANNS) in high dimensional spaces is crucial for many real-life applications (e.g., e-commerce, web, multimedia, etc.) dealing with an abundance of data. This paper proposes an end-to-end learning framework that couples the partitioning (one critical step of ANNS) and learning-to-search steps using a custom loss function. A key advantage of our proposed solution is that it does not require any expensive pre-processing of the dataset, which is one of the critical limitations of the state-of-the-art approach. We achieve the above edge by formulating a multi-objective custom loss function that does not need ground truth labels to quantify the quality of a given data-space partition, making it entirely unsupervised. We also propose an ensembling technique by adding varying input weights to the loss function to train an ensemble of models to enhance the search quality. On several standard benchmarks for ANNS, we show that our method beats the state-of-the-art space partitioning method and the ubiquitous K-means clustering method while using fewer parameters and shorter offline training times. We also show that incorporating our space-partitioning strategy into state-of-the-art ANNS techniques such as ScaNN can improve their performance significantly. Finally, we present our unsupervised partitioning approach as a promising alternative to many widely used clustering methods, such as K-means clustering and DBSCAN.

CVSep 6, 2024
Generating Faithful and Salient Text from Multimodal Data

Tahsina Hashem, Weiqing Wang, Derry Tanti Wijaya et al.

While large multimodal models (LMMs) have obtained strong performance on many multimodal tasks, they may still hallucinate while generating text. Their performance on detecting salient features from visual data is also unclear. In this paper, we develop a framework to generate faithful and salient text from mixed-modal data, which includes images and structured data ( represented in knowledge graphs or tables). Specifically, we train a small vision critic model to identify hallucinated and non-salient features from the image modality. The critic model also generates a list of salient image features. This information is used in the post editing step to improve the generation quality. Experiments on two datasets show that our framework improves LMMs' generation quality on both faithfulness and saliency, outperforming recent techniques aimed at reducing hallucination.

CLFeb 8, 2025Code
CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging

Md. Ashraful Islam, Mohammed Eunus Ali, Md Rizwan Parvez

Large Language Models (LLMs) have made significant strides in code generation and problem solving. Current approaches employ external tool-based iterative debuggers that use compiler or other tool-based runtime feedback to refine coarse programs generated by various methods. However, the effectiveness of these approaches heavily relies on the quality of the initial code generation, which remains an open challenge. In this paper, we introduce CodeSim, a novel multi-agent code generation framework that comprehensively addresses the stages of program synthesis-planning, coding, and debugging-through a human-like perception approach. As human verifies their understanding of any algorithms through visual simulation, CodeSim uniquely features a method of plan verification and internal debugging through the step-by-step simulation of input/output. Extensive experiments across seven challenging competitive problem-solving and program synthesis benchmarks demonstrate CodeSim's remarkable code generation capabilities. Our framework achieves new state-of-the-art (pass@1) results-(HumanEval 95.1%, MBPP 90.7%, APPS 22%, and CodeContests 29.1%). Furthermore, our method shows potential for even greater enhancement when cascaded with external debuggers. To facilitate further research and development in this area, we have open-sourced our framework in this link (https://kagnlp.github.io/codesim.github.io/).

CLDec 31, 2024Code
MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models

Mahir Labib Dihan, Md Tanvir Hassan, Md Tanvir Parvez et al.

Recent advancements in foundation models have improved autonomous tool usage and reasoning, but their capabilities in map-based reasoning remain underexplored. To address this, we introduce MapEval, a benchmark designed to assess foundation models across three distinct tasks - textual, API-based, and visual reasoning - through 700 multiple-choice questions spanning 180 cities and 54 countries, covering spatial relationships, navigation, travel planning, and real-world map interactions. Unlike prior benchmarks that focus on simple location queries, MapEval requires models to handle long-context reasoning, API interactions, and visual map analysis, making it the most comprehensive evaluation framework for geospatial AI. On evaluation of 30 foundation models, including Claude-3.5-Sonnet, GPT-4o, and Gemini-1.5-Pro, none surpass 67% accuracy, with open-source models performing significantly worse and all models lagging over 20% behind human performance. These results expose critical gaps in spatial inference, as models struggle with distances, directions, route planning, and place-specific reasoning, highlighting the need for better geospatial AI to bridge the gap between foundation models and real-world navigation. All the resources are available at: https://mapeval.github.io/.

CLDec 30, 2024Code
MapQaTor: An Extensible Framework for Efficient Annotation of Map-Based QA Datasets

Mahir Labib Dihan, Mohammed Eunus Ali, Md Rizwan Parvez

Mapping and navigation services like Google Maps, Apple Maps, OpenStreetMap, are essential for accessing various location-based data, yet they often struggle to handle natural language geospatial queries. Recent advancements in Large Language Models (LLMs) show promise in question answering (QA), but creating reliable geospatial QA datasets from map services remains challenging. We introduce MapQaTor, an extensible open-source framework that streamlines the creation of reproducible, traceable map-based QA datasets. MapQaTor enables seamless integration with any maps API, allowing users to gather and visualize data from diverse sources with minimal setup. By caching API responses, the platform ensures consistent ground truth, enhancing the reliability of the data even as real-world information evolves. MapQaTor centralizes data retrieval, annotation, and visualization within a single platform, offering a unique opportunity to evaluate the current state of LLM-based geospatial reasoning while advancing their capabilities for improved geospatial understanding. Evaluation metrics show that, MapQaTor speeds up the annotation process by at least 30 times compared to manual methods, underscoring its potential for developing geospatial resources, such as complex map reasoning datasets. The website is live at: https://mapqator.github.io/ and a demo video is available at: https://youtu.be/bVv7-NYRsTw.

CLOct 25, 2023
Understanding Social Structures from Contemporary Literary Fiction using Character Interaction Graph -- Half Century Chronology of Influential Bengali Writers

Nafis Irtiza Tripto, Mohammed Eunus Ali

Social structures and real-world incidents often influence contemporary literary fiction. Existing research in literary fiction analysis explains these real-world phenomena through the manual critical analysis of stories. Conventional Natural Language Processing (NLP) methodologies, including sentiment analysis, narrative summarization, and topic modeling, have demonstrated substantial efficacy in analyzing and identifying similarities within fictional works. However, the intricate dynamics of character interactions within fiction necessitate a more nuanced approach that incorporates visualization techniques. Character interaction graphs (or networks) emerge as a highly suitable means for visualization and information retrieval from the realm of fiction. Therefore, we leverage character interaction graphs with NLP-derived features to explore a diverse spectrum of societal inquiries about contemporary culture's impact on the landscape of literary fiction. Our study involves constructing character interaction graphs from fiction, extracting relevant graph features, and exploiting these features to resolve various real-life queries. Experimental evaluation of influential Bengali fiction over half a century demonstrates that character interaction graphs can be highly effective in specific assessments and information retrieval from literary fiction. Our data and codebase are available at https://cutt.ly/fbMgGEM

AISep 7, 2025Code
MapAgent: A Hierarchical Agent for Geospatial Reasoning with Dynamic Map Tool Integration

Md Hasebul Hasan, Mahir Labib Dihan, Tanzima Hashem et al.

Agentic AI has significantly extended the capabilities of large language models (LLMs) by enabling complex reasoning and tool use. However, most existing frameworks are tailored to domains such as mathematics, coding, or web automation, and fall short on geospatial tasks that require spatial reasoning, multi-hop planning, and real-time map interaction. To address these challenges, we introduce MapAgent, a hierarchical multi-agent plug-and-play framework with customized toolsets and agentic scaffolds for map-integrated geospatial reasoning. Unlike existing flat agent-based approaches that treat tools uniformly-often overwhelming the LLM when handling similar but subtly different geospatial APIs-MapAgent decouples planning from execution. A high-level planner decomposes complex queries into subgoals, which are routed to specialized modules. For tool-heavy modules-such as map-based services-we then design a dedicated map-tool agent that efficiently orchestrates related APIs adaptively in parallel to effectively fetch geospatial data relevant for the query, while simpler modules (e.g., solution generation or answer extraction) operate without additional agent overhead. This hierarchical design reduces cognitive load, improves tool selection accuracy, and enables precise coordination across similar APIs. We evaluate MapAgent on four diverse geospatial benchmarks-MapEval-Textual, MapEval-API, MapEval-Visual, and MapQA-and demonstrate substantial gains over state-of-the-art tool-augmented and agentic baselines. We open-source our framwork at https://github.com/Hasebul/MapAgent.

CVNov 22, 2023
Towards Detecting, Recognizing, and Parsing the Address Information from Bangla Signboard: A Deep Learning-based Approach

Hasan Murad, Mohammed Eunus Ali

Retrieving textual information from natural scene images is an active research area in the field of computer vision with numerous practical applications. Detecting text regions and extracting text from signboards is a challenging problem due to special characteristics like reflecting lights, uneven illumination, or shadows found in real-life natural scene images. With the advent of deep learning-based methods, different sophisticated techniques have been proposed for text detection and text recognition from the natural scene. Though a significant amount of effort has been devoted to extracting natural scene text for resourceful languages like English, little has been done for low-resource languages like Bangla. In this research work, we have proposed an end-to-end system with deep learning-based models for efficiently detecting, recognizing, correcting, and parsing address information from Bangla signboards. We have created manually annotated datasets and synthetic datasets to train signboard detection, address text detection, address text recognition, address text correction, and address text parser models. We have conducted a comparative study among different CTC-based and Encoder-Decoder model architectures for Bangla address text recognition. Moreover, we have designed a novel address text correction model using a sequence-to-sequence transformer-based network to improve the performance of Bangla address text recognition model by post-correction. Finally, we have developed a Bangla address text parser using the state-of-the-art transformer-based pre-trained language model.

AIDec 14, 2025
WebOperator: Action-Aware Tree Search for Autonomous Agents in Web Environment

Mahir Labib Dihan, Tanzima Hashem, Mohammed Eunus Ali et al.

LLM-based agents often operate in a greedy, step-by-step manner, selecting actions solely based on the current observation without considering long-term consequences or alternative paths. This lack of foresight is particularly problematic in web environments, which are only partially observable-limited to browser-visible content (e.g., DOM and UI elements)-where a single misstep often requires complex and brittle navigation to undo. Without an explicit backtracking mechanism, agents struggle to correct errors or systematically explore alternative paths. Tree-search methods provide a principled framework for such structured exploration, but existing approaches lack mechanisms for safe backtracking, making them prone to unintended side effects. They also assume that all actions are reversible, ignoring the presence of irreversible actions-limitations that reduce their effectiveness in realistic web tasks. To address these challenges, we introduce WebOperator, a tree-search framework that enables reliable backtracking and strategic exploration. Our method incorporates a best-first search strategy that ranks actions by both reward estimates and safety considerations, along with a robust backtracking mechanism that verifies the feasibility of previously visited paths before replaying them, preventing unintended side effects. To further guide exploration, WebOperator generates action candidates from multiple, varied reasoning contexts to ensure diverse and robust exploration, and subsequently curates a high-quality action set by filtering out invalid actions pre-execution and merging semantically equivalent ones. Experimental results on WebArena and WebVoyager demonstrate the effectiveness of WebOperator. On WebArena, WebOperator achieves a state-of-the-art 54.6% success rate with gpt-4o, underscoring the critical advantage of integrating strategic foresight with safe execution.

CVFeb 3
SpatiaLab: Can Vision-Language Models Perform Spatial Reasoning in the Wild?

Azmine Toushik Wasi, Wahid Faisal, Abdur Rahman et al.

Spatial reasoning is a fundamental aspect of human cognition, yet it remains a major challenge for contemporary vision-language models (VLMs). Prior work largely relied on synthetic or LLM-generated environments with limited task designs and puzzle-like setups, failing to capture the real-world complexity, visual noise, and diverse spatial relationships that VLMs encounter. To address this, we introduce SpatiaLab, a comprehensive benchmark for evaluating VLMs' spatial reasoning in realistic, unconstrained contexts. SpatiaLab comprises 1,400 visual question-answer pairs across six major categories: Relative Positioning, Depth & Occlusion, Orientation, Size & Scale, Spatial Navigation, and 3D Geometry, each with five subcategories, yielding 30 distinct task types. Each subcategory contains at least 25 questions, and each main category includes at least 200 questions, supporting both multiple-choice and open-ended evaluation. Experiments across diverse state-of-the-art VLMs, including open- and closed-source models, reasoning-focused, and specialized spatial reasoning models, reveal a substantial gap in spatial reasoning capabilities compared with humans. In the multiple-choice setup, InternVL3.5-72B achieves 54.93% accuracy versus 87.57% for humans. In the open-ended setting, all models show a performance drop of around 10-25%, with GPT-5-mini scoring highest at 40.93% versus 64.93% for humans. These results highlight key limitations in handling complex spatial relationships, depth perception, navigation, and 3D geometry. By providing a diverse, real-world evaluation framework, SpatiaLab exposes critical challenges and opportunities for advancing VLMs' spatial reasoning, offering a benchmark to guide future research toward robust, human-aligned spatial understanding. SpatiaLab is available at: https://spatialab-reasoning.github.io/.

AIOct 8, 2025
CompassLLM: A Multi-Agent Approach toward Geo-Spatial Reasoning for Popular Path Query

Md. Nazmul Islam Ananto, Shamit Fatin, Mohammed Eunus Ali et al.

The popular path query - identifying the most frequented routes between locations from historical trajectory data - has important applications in urban planning, navigation optimization, and travel recommendations. While traditional algorithms and machine learning approaches have achieved success in this domain, they typically require model training, parameter tuning, and retraining when accommodating data updates. As Large Language Models (LLMs) demonstrate increasing capabilities in spatial and graph-based reasoning, there is growing interest in exploring how these models can be applied to geo-spatial problems. We introduce CompassLLM, a novel multi-agent framework that intelligently leverages the reasoning capabilities of LLMs into the geo-spatial domain to solve the popular path query. CompassLLM employs its agents in a two-stage pipeline: the SEARCH stage that identifies popular paths, and a GENERATE stage that synthesizes novel paths in the absence of an existing one in the historical trajectory data. Experiments on real and synthetic datasets show that CompassLLM demonstrates superior accuracy in SEARCH and competitive performance in GENERATE while being cost-effective.

CLOct 25, 2021
Paradigm Shift in Language Modeling: Revisiting CNN for Modeling Sanskrit Originated Bengali and Hindi Language

Chowdhury Rafeed Rahman, MD. Hasibur Rahman, Mohammad Rafsan et al.

Though there has been a large body of recent works in language modeling (LM) for high resource languages such as English and Chinese, the area is still unexplored for low resource languages like Bengali and Hindi. We propose an end to end trainable memory efficient CNN architecture named CoCNN to handle specific characteristics such as high inflection, morphological richness, flexible word order and phonetical spelling errors of Bengali and Hindi. In particular, we introduce two learnable convolutional sub-models at word and at sentence level that are end to end trainable. We show that state-of-the-art (SOTA) Transformer models including pretrained BERT do not necessarily yield the best performance for Bengali and Hindi. CoCNN outperforms pretrained BERT with 16X less parameters, and it achieves much better performance than SOTA LSTM models on multiple real-world datasets. This is the first study on the effectiveness of different architectures drawn from three deep learning paradigms - Convolution, Recurrent, and Transformer neural nets for modeling two widely used languages, Bengali and Hindi.

LGSep 8, 2021
DeepAltTrip: Top-k Alternative Itineraries for Trip Recommendation

Syed Md. Mukit Rashid, Mohammed Eunus Ali, Muhammad Aamir Cheema

Trip itinerary recommendation finds an ordered sequence of Points-of-Interest (POIs) from a large number of candidate POIs in a city. In this paper, we propose a deep learning-based framework, called DeepAltTrip, that learns to recommend top-k alternative itineraries for given source and destination POIs. These alternative itineraries would be not only popular given the historical routes adopted by past users but also dissimilar (or diverse) to each other. The DeepAltTrip consists of two major components: (i) Itinerary Net (ITRNet) which estimates the likelihood of POIs on an itinerary by using graph autoencoders and two (forward and backward) LSTMs; and (ii) a route generation procedure to generate k diverse itineraries passing through relevant POIs obtained using ITRNet. For the route generation step, we propose a novel sampling algorithm that can seamlessly handle a wide variety of user-defined constraints. To the best of our knowledge, this is the first work that learns from historical trips to provide a set of alternative itineraries to the users. Extensive experiments conducted on eight popular real-world datasets show the effectiveness and efficacy of our approach over state-of-the-art methods.

CVAug 7, 2021
Learning Indoor Layouts from Simple Point-Clouds

Md. Tareq Mahmood, Mohammed Eunus Ali

Reconstructing a layout of indoor spaces has been a crucial part of growing indoor location based services. One of the key challenges in the proliferation of indoor location based services is the unavailability of indoor spatial maps due to the complex nature of capturing an indoor space model (e.g., floor plan) of an existing building. In this paper, we propose a system to automatically generate floor plans that can recognize rooms from the point-clouds obtained through smartphones like Google's Tango. In particular, we propose two approaches - a Recurrent Neural Network based approach using Pointer Network and a Convolutional Neural Network based approach using Mask-RCNN to identify rooms (and thereby floor plans) from point-clouds. Experimental results on different datasets demonstrate approximately 0.80-0.90 Intersection-over-Union scores, which show that our models can effectively identify the rooms and regenerate the shapes of the rooms in heterogeneous environment.

IRNov 20, 2020
A Survey on Deep Learning Based Point-Of-Interest (POI) Recommendations

Md. Ashraful Islam, Mir Mahathir Mohammad, Sarkar Snigdha Sarathi Das et al.

Location-based Social Networks (LBSNs) enable users to socialize with friends and acquaintances by sharing their check-ins, opinions, photos, and reviews. Huge volume of data generated from LBSNs opens up a new avenue of research that gives birth to a new sub-field of recommendation systems, known as Point-of-Interest (POI) recommendation. A POI recommendation technique essentially exploits users' historical check-ins and other multi-modal information such as POI attributes and friendship network, to recommend the next set of POIs suitable for a user. A plethora of earlier works focused on traditional machine learning techniques by using hand-crafted features from the dataset. With the recent surge of deep learning research, we have witnessed a large variety of POI recommendation works utilizing different deep learning paradigms. These techniques largely vary in problem formulations, proposed techniques, used datasets, and features, etc. To the best of our knowledge, this work is the first comprehensive survey of all major deep learning-based POI recommendation works. Our work categorizes and critically analyzes the recent POI recommendation works based on different deep learning paradigms and other relevant features. This review can be considered a cookbook for researchers or practitioners working in the area of POI recommendation.

LGNov 2, 2020
BayesBeat: Reliable Atrial Fibrillation Detection from Noisy Photoplethysmography Data

Sarkar Snigdha Sarathi Das, Subangkar Karmaker Shanto, Masum Rahman et al.

Smartwatches or fitness trackers have garnered a lot of popularity as potential health tracking devices due to their affordable and longitudinal monitoring capabilities. To further widen their health tracking capabilities, in recent years researchers have started to look into the possibility of Atrial Fibrillation (AF) detection in real-time leveraging photoplethysmography (PPG) data, an inexpensive sensor widely available in almost all smartwatches. A significant challenge in AF detection from PPG signals comes from the inherent noise in the smartwatch PPG signals. In this paper, we propose a novel deep learning based approach, BayesBeat that leverages the power of Bayesian deep learning to accurately infer AF risks from noisy PPG signals, and at the same time provides an uncertainty estimate of the prediction. Extensive experiments on two publicly available dataset reveal that our proposed method BayesBeat outperforms the existing state-of-the-art methods. Moreover, BayesBeat is substantially more efficient having 40-200X fewer parameters than state-of-the-art baseline approaches making it suitable for deployment in resource constrained wearable devices.

LGSep 1, 2020
Boosting House Price Predictions using Geo-Spatial Network Embedding

Sarkar Snigdha Sarathi Das, Mohammed Eunus Ali, Yuan-Fang Li et al.

Real estate contributes significantly to all major economies around the world. In particular, house prices have a direct impact on stakeholders, ranging from house buyers to financing companies. Thus, a plethora of techniques have been developed for real estate price prediction. Most of the existing techniques rely on different house features to build a variety of prediction models to predict house prices. Perceiving the effect of spatial dependence on house prices, some later works focused on introducing spatial regression models for improving prediction performance. However, they fail to take into account the geo-spatial context of the neighborhood amenities such as how close a house is to a train station, or a highly-ranked school, or a shopping center. Such contextual information may play a vital role in users' interests in a house and thereby has a direct influence on its price. In this paper, we propose to leverage the concept of graph neural networks to capture the geo-spatial context of the neighborhood of a house. In particular, we present a novel method, the Geo-Spatial Network Embedding (GSNE), that learns the embeddings of houses and various types of Points of Interest (POIs) in the form of multipartite networks, where the houses and the POIs are represented as attributed nodes and the relationships between them as edges. Extensive experiments with a large number of regression techniques show that the embeddings produced by our proposed GSNE technique consistently and significantly improve the performance of the house price prediction task regardless of the downstream regression model.

DBJun 15, 2020
Comparing Alternative Route Planning Techniques: A Comparative User Study on Melbourne, Dhaka and Copenhagen Road Networks

Lingxiao Li, Muhammad Aamir Cheema, Hua Lu et al.

Many modern navigation systems and map-based services do not only provide the fastest route from a source location s to a target location t but also provide a few alternative routes to the users as more options to choose from. Consequently, computing alternative paths has received significant research attention. However, it is unclear which of the existing approaches generates alternative routes of better quality because the quality of these alternatives is mostly subjective. Motivated by this, in this paper, we present a user study conducted on the road networks of Melbourne, Dhaka and Copenhagen that compares the quality (as perceived by the users) of the alternative routes generated by four of the most popular existing approaches including the routes provided by Google Maps. We also present a web-based demo system that can be accessed using any internet-enabled device and allows users to see the alternative routes generated by the four approaches for any pair of selected source and target. We report the average ratings received by the four approaches and our statistical analysis shows that there is no credible evidence that the four approaches receive different ratings on average. We also discuss the limitations of this user study and recommend the readers to interpret these results with caution because certain factors may have affected the participants' ratings.

CVDec 12, 2019
CCCNet: An Attention Based Deep Learning Framework for Categorized Crowd Counting

Sarkar Snigdha Sarathi Das, Syed Md. Mukit Rashid, Mohammed Eunus Ali

Crowd counting problem that counts the number of people in an image has been extensively studied in recent years. In this paper, we introduce a new variant of crowd counting problem, namely "Categorized Crowd Counting", that counts the number of people sitting and standing in a given image. Categorized crowd counting has many real-world applications such as crowd monitoring, customer service, and resource management. The major challenges in categorized crowd counting come from high occlusion, perspective distortion and the seemingly identical upper body posture of sitting and standing persons. Existing density map based approaches perform well to approximate a large crowd, but lose important local information necessary for categorization. On the other hand, traditional detection-based approaches perform poorly in occluded environments, especially when the crowd size gets bigger. Hence, to solve the categorized crowd counting problem, we develop a novel attention-based deep learning framework that addresses the above limitations. In particular, our approach works in three phases: i) We first generate basic detection based sitting and standing density maps to capture the local information; ii) Then, we generate a crowd counting based density map as global counting feature; iii) Finally, we have a cross-branch segregating refinement phase that splits the crowd density map into final sitting and standing density maps using attention mechanism. Extensive experiments show the efficacy of our approach in solving the categorized crowd counting problem.

SIDec 4, 2019
Keyword Aware Influential Community Search in Large Attributed Graphs

Md. Saiful Islam, Mohammed Eunus Ali, Yong-Bin Kang et al.

We introduce a novel keyword-aware influential community query KICQ that finds the most influential communities from an attributed graph, where an influential community is defined as a closely connected group of vertices having some dominance over other groups of vertices with the expertise (a set of keywords) matching with the query terms (words or phrases). We first design the KICQ that facilitates users to issue an influential CS query intuitively by using a set of query terms, and predicates (AND or OR). In this context, we propose a novel word-embedding based similarity model that enables semantic community search, which substantially alleviates the limitations of exact keyword based community search. Next, we propose a new influence measure for a community that considers both the cohesiveness and influence of the community and eliminates the need for specifying values of internal parameters of a network. Finally, we propose two efficient algorithms for searching influential communities in large attributed graphs. We present detailed experiments and a case study to demonstrate the effectiveness and efficiency of the proposed approaches.

CVDec 3, 2018
Identification and Recognition of Rice Diseases and Pests Using Convolutional Neural Networks

Chowdhury Rafeed Rahman, Preetom Saha Arko, Mohammed Eunus Ali et al.

An accurate and timely detection of diseases and pests in rice plants can help farmers in applying timely treatment on the plants and thereby can reduce the economic losses substantially. Recent developments in deep learning based convolutional neural networks (CNN) have greatly improved the image classification accuracy. Being motivated by the success of CNNs in image classification, deep learning based approaches have been developed in this paper for detecting diseases and pests from rice plant images. The contribution of this paper is two fold: (i) State-of-the-art large scale architectures such as VGG16 and InceptionV3 have been adopted and fine tuned for detecting and recognizing rice diseases and pests. Experimental results show the effectiveness of these models with real datasets. (ii) Since large scale architectures are not suitable for mobile devices, a two-stage small CNN architecture has been proposed, and compared with the state-of-the-art memory efficient CNN architectures such as MobileNet, NasNet Mobile and SqueezeNet. Experimental results show that the proposed architecture can achieve the desired accuracy of 93.3\% with a significantly reduced model size (e.g., 99\% less size compared to that of VGG16).

CYMar 24, 2018
Can We Predict the Scenic Beauty of Locations from Geo-tagged Flickr Images?

Ch. Md. Rakin Haider, Mohammed Eunus Ali

In this work, we propose a novel technique to determine the aesthetic score of a location from social metadata of Flickr photos. In particular, we built machine learning classifiers to predict the class of a location where each class corresponds to a set of locations having equal aesthetic rating. These models are trained on two empirically build datasets containing locations in two different cities (Rome and Paris) where aesthetic ratings of locations were gathered from TripAdvisor.com. In this work we exploit the idea that in a location with higher aesthetic rating, it is more likely for an user to capture a photo and other users are more likely to interact with that photo. Our models achieved as high as 79.48% accuracy (78.60% precision and 79.27% recall) on Rome dataset and 73.78% accuracy(75.62% precision and 78.07% recall) on Paris dataset. The proposed technique can facilitate urban planning, tour planning and recommending aesthetically pleasing paths.