Ch. Md. Rakin Haider

DB
h-index59
4papers
18citations
Novelty43%
AI Score34

4 Papers

DBJul 1, 2022
The "AI+R"-tree: An Instance-optimized R-tree

Abdullah-Al-Mamun, Ch. Md. Rakin Haider, Jianguo Wang et al.

The emerging class of instance-optimized systems has shown potential to achieve high performance by specializing to a specific data and query workloads. Particularly, Machine Learning (ML) techniques have been applied successfully to build various instance-optimized components (e.g., learned indexes). This paper investigates to leverage ML techniques to enhance the performance of spatial indexes, particularly the R-tree, for a given data and query workloads. As the areas covered by the R-tree index nodes overlap in space, upon searching for a specific point in space, multiple paths from root to leaf may potentially be explored. In the worst case, the entire R-tree could be searched. In this paper, we define and use the overlap ratio to quantify the degree of extraneous leaf node accesses required by a range query. The goal is to enhance the query performance of a traditional R-tree for high-overlap range queries as they tend to incur long running-times. We introduce a new AI-tree that transforms the search operation of an R-tree into a multi-label classification task to exclude the extraneous leaf node accesses. Then, we augment a traditional R-tree to the AI-tree to form a hybrid "AI+R"-tree. The "AI+R"-tree can automatically differentiate between the high- and low-overlap queries using a learned model. Thus, the "AI+R"-tree processes high-overlap queries using the AI-tree, and the low-overlap queries using the R-tree. Experiments on real datasets demonstrate that the "AI+R"-tree can enhance the query performance over a traditional R-tree by up to 500%.

CLNov 30, 2025
When Safety Blocks Sense: Measuring Semantic Confusion in LLM Refusals

Riad Ahmed Anonto, Md Labid Al Nahiyan, Md Tanvir Hassan et al.

Safety-aligned language models often refuse prompts that are actually harmless. Current evaluations mostly report global rates such as false rejection or compliance. These scores treat each prompt alone and miss local inconsistency, where a model accepts one phrasing of an intent but rejects a close paraphrase. This gap limits diagnosis and tuning. We introduce "semantic confusion," a failure mode that captures such local inconsistency, and a framework to measure it. We build ParaGuard, a 10k-prompt corpus of controlled paraphrase clusters that hold intent fixed while varying surface form. We then propose three model-agnostic metrics at the token level: Confusion Index, Confusion Rate, and Confusion Depth. These metrics compare each refusal to its nearest accepted neighbors and use token embeddings, next-token probabilities, and perplexity signals. Experiments across diverse model families and deployment guards show that global false-rejection rate hides critical structure. Our metrics reveal globally unstable boundaries in some settings, localized pockets of inconsistency in others, and cases where stricter refusal does not increase inconsistency. We also show how confusion-aware auditing separates how often a system refuses from how sensibly it refuses. This gives developers a practical signal to reduce false refusals while preserving safety.

DBFeb 14, 2025
Tradeoffs in Processing Queries and Supporting Updates over an ML-Enhanced R-tree

Abdullah Al-Mamun, Ch. Md. Rakin Haider, Jianguo Wang et al.

Machine Learning (ML) techniques have been successfully applied to design various learned database index structures for both the one- and multi-dimensional spaces. Particularly, a class of traditional multi-dimensional indexes has been augmented with ML models to design ML-enhanced variants of their traditional counterparts. This paper focuses on the R-tree multi-dimensional index structure as it is widely used for indexing multi-dimensional data. The R-tree has been augmented with machine learning models to enhance the R-tree performance. The AI+R-tree is an ML-enhanced R-tree index structure that augments a traditional disk-based R-tree with an ML model to enhance the R-tree's query processing performance, mainly, to avoid navigating the overlapping branches of the R-tree that do not yield query results, e.g., in the presence of high-overlap among the rectangles of the R-tree nodes. We investigate the empirical tradeoffs in processing dynamic query workloads and in supporting updates over the AI+R-tree. Particularly, we investigate the impact of the choice of ML models over the AI+R-tree query processing performance. Moreover, we present a case study of designing a custom loss function for a neural network model tailored to the query processing requirements of the AI+R-tree. Furthermore, we present the design tradeoffs for adopting various strategies for supporting dynamic inserts, updates, and deletes with the vision of realizing a mutable AI+R-tree. Experiments on real datasets demonstrate that the AI+R-tree can enhance the query processing performance of a traditional R-tree for high-overlap range queries by up to 5.4X while achieving up to 99% average query recall.

CYMar 24, 2018
Can We Predict the Scenic Beauty of Locations from Geo-tagged Flickr Images?

Ch. Md. Rakin Haider, Mohammed Eunus Ali

In this work, we propose a novel technique to determine the aesthetic score of a location from social metadata of Flickr photos. In particular, we built machine learning classifiers to predict the class of a location where each class corresponds to a set of locations having equal aesthetic rating. These models are trained on two empirically build datasets containing locations in two different cities (Rome and Paris) where aesthetic ratings of locations were gathered from TripAdvisor.com. In this work we exploit the idea that in a location with higher aesthetic rating, it is more likely for an user to capture a photo and other users are more likely to interact with that photo. Our models achieved as high as 79.48% accuracy (78.60% precision and 79.27% recall) on Rome dataset and 73.78% accuracy(75.62% precision and 78.07% recall) on Paris dataset. The proposed technique can facilitate urban planning, tour planning and recommending aesthetically pleasing paths.