LG, Feb 27
FastODT: A tree-based framework for efficient continual learning
Daniel Bretsko, Piotr Walas, Devashish Khulbe et al.
Machine learning models deployed in real-world settings must operate under evolving data distributions and constrained computational resources. This challenge is particularly acute in non-stationary domains such as energy time series, weather monitoring, and environmental sensing. To remain effective, models must support adaptability, continuous learning, and long-term knowledge retention. This paper introduces an oblivious tree-based model whose growth is controlled by a Hoeffding bound. It combines rapid learning and inference with efficient memory management and robust knowledge preservation, thus allowing for online learning. Extensive experiments across energy and environmental sensing time-series benchmarks demonstrate that the proposed framework achieves performance competitive with, and in several cases surpassing, existing online and batch learning methods, while maintaining superior computational efficiency. Collectively, these results demonstrate that the proposed approach fulfills the core objectives of adaptability, continual updating, and efficient adaptation without full model retraining. The framework provides a scalable and resource-aware foundation for deployment in real-world non-stationary environments where resources are constrained and sustained adaptation is essential.
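As an illustration of how a Hoeffding bound can gate tree growth, the sketch below implements the standard split test used in Hoeffding trees: a node grows only when the observed gap between the best and second-best split scores exceeds the bound. The abstract does not give FastODT's exact growth rule, so the function names, the parameters (`value_range`, `delta`, `n`), and the rule itself are assumptions made for this example.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon such that, with probability at least 1 - delta, the
    empirical mean of n observations lies within epsilon of the true mean,
    for a quantity whose values span an interval of width value_range."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_grow(best_gain: float, second_gain: float,
                value_range: float, delta: float, n: int) -> bool:
    """Hypothetical growth rule: split only when the gap between the two
    best candidate scores is larger than the Hoeffding bound."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)
```

Because the bound shrinks as `n` increases, a node that cannot yet separate two candidate splits simply waits for more observations rather than committing early.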
LG, Jul 5, 2025
Commute Networks as a Signature of Urban Socioeconomic Performance: Evaluating Mobility Structures with Deep Learning Models
Devashish Khulbe, Alexander Belyi, Stanislav Sobolevsky
Urban socioeconomic modeling has predominantly concentrated on extensive location and neighborhood-based features, relying on the localized population footprint. However, networks are common in urban systems, and many urban modeling methods do not account for network-based effects. In this study, we propose using commute information records from the census as a reliable and comprehensive source to construct mobility networks across cities. Leveraging deep learning architectures, we employ these commute networks across U.S. metro areas for socioeconomic modeling. We show that mobility network structures provide significant predictive performance without considering any node features. Consequently, we use mobility networks to present a supervised learning framework to model a city's socioeconomic indicator directly, combining Graph Neural Network and Vanilla Neural Network models to learn all parameters in a single learning pipeline. Our experiments in 12 major U.S. cities show that the proposed model outperforms previous conventional machine learning models. This work provides urban researchers with methods to incorporate network effects in urban modeling and informs stakeholders of wider network-based effects in urban policymaking and planning.
CR, Dec 10, 2019
Deep Learning Based Android Malware Detection Framework
Soumya Sourav, Devashish Khulbe, Naman Kapoor
With the development of smartphones and the ever-growing user base of the Internet, software is left prone to many malicious activities such as pharming, phishing, ransomware, spam, spoofing, spyware, and eavesdropping. These threats have not spared smartphones, which are equally prone to them. In this work, we aim to detect such malware with accuracy and efficiency. This being essentially a classification problem, we use various machine learning methods for this task. We observe that across models, attention-based Artificial Neural Networks (ANNs), or broadly speaking, Deep Learning, are most suitable for this problem. Attention-based ANNs combine accuracy and efficiency, the crux of our work. The accuracy achieved by our model is around 96.75%. Our model runs the test on Android Package Files (APKs) to determine whether a particular application is malicious by performing behavior analysis on the Android application under consideration.
SI, Jun 26, 2019
Modeling Food Popularity Dependencies using Social Media data
Devashish Khulbe, Manu Pathak
The rise in popularity of major social media platforms has enabled people to share photos and textual information about their daily lives. One of the popular topics about which information is shared is food. Since much of the media about food is attributed to particular locations and restaurants, information such as the spatio-temporal popularity of various cuisines can be analyzed. Tracking the popularity of food types and retail locations across space and time can also be useful for business owners and restaurant investors. In this work, we present an approach using off-the-shelf machine learning techniques to identify trends and popularity of cuisine types in an area using geo-tagged data from social media, Google Images, and Yelp. After adjusting for time, we use Kernel Density Estimation to find hot spots across the location and model the dependencies among cuisine popularities using Bayesian Networks. We consider the Manhattan borough of New York City as the location for our analyses, but the approach can be used for any area with social media data and information about retail businesses.
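To make the hot-spot step concrete, the following is a minimal sketch of a two-dimensional Gaussian kernel density estimate over geo-tagged points. It is not the authors' implementation: the isotropic Gaussian kernel, the single fixed `bandwidth`, and the function name `gaussian_kde_2d` are assumptions chosen for illustration, and coordinates are treated as planar.

```python
import math


def gaussian_kde_2d(points, bandwidth):
    """Build a density function from (x, y) points using an isotropic
    Gaussian kernel with the given bandwidth (same units as the points)."""
    if not points:
        raise ValueError("points must be non-empty")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    n = len(points)
    # Normalising constant of a 2D Gaussian, averaged over n points.
    norm = 1.0 / (n * 2.0 * math.pi * bandwidth ** 2)

    def density(x, y):
        total = 0.0
        for px, py in points:
            squared_distance = (x - px) ** 2 + (y - py) ** 2
            total += math.exp(-squared_distance / (2.0 * bandwidth ** 2))
        return norm * total

    return density
```

Evaluating the returned `density` function on a grid and keeping the highest-valued cells gives candidate hot spots; the bandwidth controls how far each observation spreads its influence.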
LG, Jun 25, 2019
Modeling Severe Traffic Accidents With Spatial And Temporal Features
Devashish Khulbe, Soumya Sourav
We present an approach to estimate the severity of traffic-related accidents in aggregated (area-level) and disaggregated (point-level) data. Exploring spatial features, we measure the complexity of road networks using several area-level variables. Using temporal and other situational features from open data for New York City, we apply Gradient Boosting models for inference and for measuring feature importance, along with Gaussian Processes to model spatial dependencies in the data. The results show that road network complexity is a significant predictor in the aggregated model, alongside other features, which may be helpful in framing policies and targeting interventions for preventing severe traffic-related accidents and injuries.