Amit Kumar

CV
h-index: 32
45 papers
837 citations
Novelty: 45%
AI Score: 55

45 Papers

LG May 8, 2022
Online Algorithms with Multiple Predictions

Keerti Anand, Rong Ge, Amit Kumar et al.

This paper studies online algorithms augmented with multiple machine-learned predictions. While online algorithms augmented with a single prediction have been extensively studied in recent years, the literature for the multiple predictions setting is sparse. In this paper, we give a generic algorithmic framework for online covering problems with multiple predictions that obtains an online solution that is competitive against the performance of the best predictor. Our algorithm incorporates the use of predictions in the classic potential-based analysis of online algorithms. We apply our algorithmic framework to solve classical problems such as online set cover, (weighted) caching, and online facility location in the multiple predictions setting. Our algorithm can also be robustified, i.e., the algorithm can be simultaneously made competitive against the best prediction and the performance of the best online algorithm (without prediction).

LG May 18, 2022
A Regression Approach to Learning-Augmented Online Algorithms

Keerti Anand, Rong Ge, Amit Kumar et al.

The emerging field of learning-augmented online algorithms uses ML techniques to predict future input parameters and thereby improve the performance of online algorithms. Since these parameters are, in general, real-valued functions, a natural approach is to use regression techniques to make these predictions. We introduce this approach in this paper, and explore it in the context of a general online search framework that captures classic problems like (generalized) ski rental, bin packing, minimum makespan scheduling, etc. We show nearly tight bounds on the sample complexity of this regression problem, and extend our results to the agnostic setting. From a technical standpoint, we show that the key is to incorporate online optimization benchmarks in the design of the loss function for the regression problem, thereby diverging from the use of off-the-shelf regression tools with standard bounds on statistical error.

CV Mar 28, 2022
HIME: Efficient Headshot Image Super-Resolution with Multiple Exemplars

Xiaoyu Xiang, Jon Morton, Fitsum A Reda et al.

A promising direction for recovering the lost information in low-resolution headshot images is utilizing a set of high-resolution exemplars from the same identity. Complementary images in the reference set can improve the generated headshot quality across many different views and poses. However, it is challenging to make the best use of multiple exemplars: the quality and alignment of each exemplar cannot be guaranteed. Using low-quality and mismatched images as references will impair the output results. To overcome these issues, we propose an efficient Headshot Image Super-Resolution with Multiple Exemplars network (HIME) method. Compared with previous methods, our network can effectively handle the misalignment between the input and the reference without requiring facial priors and learn the aggregated reference set representation in an end-to-end manner. Furthermore, to reconstruct more detailed facial features, we propose a correlation loss that provides a rich representation of the local texture in a controllable spatial range. Experimental results demonstrate that the proposed framework not only has significantly lower computation cost than recent exemplar-guided methods but also achieves better qualitative and quantitative performance.

LG Jul 21, 2023
Random Separating Hyperplane Theorem and Learning Polytopes

Chiranjib Bhattacharyya, Ravindran Kannan, Amit Kumar

The Separating Hyperplane theorem is a fundamental result in Convex Geometry with myriad applications. Our first result, the Random Separating Hyperplane Theorem (RSH), is a strengthening of this for polytopes. RSH asserts that if the distance between $a$ and a polytope $K$ with $k$ vertices and unit diameter in $\Re^d$ is at least $δ$, where $δ$ is a fixed constant in $(0,1)$, then a randomly chosen hyperplane separates $a$ and $K$ with probability at least $1/\mathrm{poly}(k)$ and margin at least $Ω\left(δ/\sqrt{d}\right)$. An immediate consequence of our result is the first near optimal bound on the error increase in the reduction from a Separation oracle to an Optimization oracle over a polytope. RSH has algorithmic applications in learning polytopes. We consider a fundamental problem, denoted the "Hausdorff problem", of learning a unit diameter polytope $K$ within Hausdorff distance $δ$, given an optimization oracle for $K$. Using RSH, we show that with polynomially many random queries to the optimization oracle, $K$ can be approximated within error $O(δ)$. To our knowledge this is the first provable algorithm for the Hausdorff problem. Building on this result, we show that if the vertices of $K$ are well-separated, then an optimization oracle can be used to generate a list of points, each within Hausdorff distance $O(δ)$ of $K$, with the property that the list contains a point close to each vertex of $K$. Further, we show how to prune this list to generate a (unique) approximation to each vertex of the polytope. We prove that in many latent variable settings, e.g., topic modeling, LDA, optimization oracles do exist provided we project to a suitable SVD subspace. Thus, our work yields the first efficient algorithm for finding approximations to the vertices of the latent polytope under the well-separatedness assumption.

RO Dec 11, 2024 Code
Vision-based indoor localization of nano drones in controlled environment with its applications

Simranjeet Singh, Amit Kumar, Fayyaz Pocker Chemban et al.

Navigating unmanned aerial vehicles in environments where GPS signals are unavailable poses a compelling and intricate challenge. This challenge is further heightened when dealing with Nano Aerial Vehicles (NAVs) due to their compact size, payload restrictions, and limited computational capabilities. This paper proposes an approach for localization using off-board computing, an off-board monocular camera, and modified open-source algorithms. The proposed method uses three parallel proportional-integral-derivative controllers on the off-board computer to provide velocity corrections via wireless communication, stabilizing the NAV in a custom-controlled environment. With a 3.1 cm localization error and a modest setup cost of 50 USD, this approach is well suited to environments where cost considerations are paramount. It is especially well-suited for applications like teaching drone control in academic institutions, where the specified error margin is deemed acceptable. Various applications are designed to validate the proposed technique, such as landing the NAV on a moving ground vehicle, path planning in a 3D space, and localizing multiple NAVs. The created package is openly available at https://github.com/simmubhangu/eyantra_drone to foster research in this field.
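
The abstract above mentions three parallel proportional-integral-derivative controllers that produce velocity corrections. A minimal sketch of that idea follows, assuming one textbook discrete PID controller per spatial axis. The class name, function name, gain values, and axis assignment are hypothetical and are not taken from the paper or its package.

```python
from dataclasses import dataclass


@dataclass
class PID:
    """Textbook discrete PID controller (gains are hypothetical)."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def update(self, error: float, dt: float) -> float:
        # Accumulate the integral term and estimate the derivative
        # with a backward finite difference.
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def velocity_correction(target, estimate, controllers, dt):
    """Run one controller per axis and return the per-axis corrections."""
    return tuple(
        pid.update(t - e, dt) for pid, t, e in zip(controllers, target, estimate)
    )


# One independent controller per axis (assumed: x, y, z).
# Pure proportional gains keep the illustration easy to follow.
controllers = [PID(kp=1.0, ki=0.0, kd=0.0) for _ in range(3)]
correction = velocity_correction(
    target=(1.0, 2.0, 0.5),
    estimate=(0.0, 2.0, 1.0),
    controllers=controllers,
    dt=0.1,
)
```

In a setup like the one described, `estimate` would come from the off-board camera and `correction` would be sent to the vehicle over the wireless link; both of those steps are outside this sketch.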

CY Oct 26, 2023
Bias in Evaluation Processes: An Optimization-Based Model

L. Elisa Celis, Amit Kumar, Anay Mehrotra et al.

Biases with respect to socially-salient attributes of individuals have been well documented in evaluation processes used in settings such as admissions and hiring. We view such an evaluation process as a transformation of a distribution of the true utility of an individual for a task to an observed distribution and model it as a solution to a loss minimization problem subject to an information constraint. Our model has two parameters that have been identified as factors leading to biases: the resource-information trade-off parameter in the information constraint and the risk-averseness parameter in the loss function. We characterize the distributions that arise from our model and study the effect of the parameters on the observed distribution. The outputs of our model enrich the class of distributions that can be used to capture variation across groups in the observed evaluations. We empirically validate our model by fitting real-world datasets and use it to study the effect of interventions in a downstream selection task. These results contribute to an understanding of the emergence of bias in evaluation processes and provide tools to guide the deployment of interventions to mitigate biases.

DS Sep 7, 2024
Centralized Selection with Preferences in the Presence of Biases

L. Elisa Celis, Amit Kumar, Nisheeth K. Vishnoi et al.

This paper considers the scenario in which there are multiple institutions, each with a limited capacity for candidates, and candidates, each with preferences over the institutions. A central entity evaluates the utility of each candidate to the institutions, and the goal is to select candidates for each institution in a way that maximizes utility while also considering the candidates' preferences. The paper focuses on the setting in which candidates are divided into multiple groups and the observed utilities of candidates in some groups are biased, that is, systematically lower than their true utilities. The first result is that, in these biased settings, prior algorithms can lead to selections with sub-optimal true utility and significant discrepancies in the fraction of candidates from each group that get their preferred choices. Subsequently, an algorithm is presented along with proof that it produces selections that achieve near-optimal group fairness with respect to preferences while also nearly maximizing the true utility under distributional assumptions. Further, extensive empirical validation of these results in real-world and synthetic settings, in which the distributional assumptions may not hold, is presented.

DS Mar 18
Learning-Augmented Algorithms for $k$-median via Online Learning

Anish Hebbar, Rong Ge, Amit Kumar et al.

The field of learning-augmented algorithms seeks to use ML techniques on past instances of a problem to inform an algorithm designed for a future instance. In this paper, we introduce a novel model for learning-augmented algorithms inspired by online learning. In this model, we are given a sequence of instances of a problem and the goal of the learning-augmented algorithm is to use prior instances to propose a solution to a future instance of the problem. The performance of the algorithm is measured by its average performance across all the instances, where the performance on a single instance is the ratio between the cost of the algorithm's solution and that of an optimal solution for that instance. We apply this framework to the classic $k$-median clustering problem, and give an efficient learning algorithm that can approximately match the average performance of the best fixed $k$-median solution in hindsight across all the instances. We also experimentally evaluate our algorithm and show that its empirical performance is close to optimal, and also that it automatically adapts the solution to a dynamically changing sequence.
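
The performance measure described in this abstract (the average, over instances, of the ratio between the algorithm's cost and the optimal cost) can be written out directly. The sketch below is only an illustration of that measure. The function name and the cost values are made up, and it assumes every optimal cost is strictly positive.

```python
def average_ratio(alg_costs, opt_costs):
    """Mean over instances of (algorithm cost) / (optimal cost).

    Assumes both lists have the same non-zero length and that every
    optimal cost is strictly positive.
    """
    ratios = [a / o for a, o in zip(alg_costs, opt_costs)]
    return sum(ratios) / len(ratios)


# Three hypothetical instances with per-instance ratios 1.2, 1.0 and 1.5.
avg = average_ratio([12.0, 10.0, 30.0], [10.0, 10.0, 20.0])
```

A value of 1.0 would mean the algorithm matched the optimal solution on every instance.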

SP Aug 20, 2024
Deep Learning-based Classification of Dementia using Image Representation of Subcortical Signals

Shivani Ranjan, Ayush Tripathi, Harshal Shende et al.

Dementia is a neurological syndrome marked by cognitive decline. Alzheimer's disease (AD) and Frontotemporal dementia (FTD) are the common forms of dementia, each with distinct progression patterns. EEG, a non-invasive tool for recording brain activity, has shown potential in distinguishing AD from FTD and mild cognitive impairment (MCI). Previous studies have utilized various EEG features, such as subband power and connectivity patterns, to differentiate these conditions. However, artifacts in EEG signals can obscure crucial information, necessitating advanced signal processing techniques. This study aims to develop a deep learning-based classification system for dementia by analyzing scout time-series signals from deep brain regions, specifically the hippocampus, amygdala, and thalamus. The study utilizes scout time series extracted via the standardized low-resolution brain electromagnetic tomography (sLORETA) technique. The time series are converted to image representations using continuous wavelet transform (CWT) and fed as input to deep learning models. Two high-density EEG datasets are used to evaluate the efficacy of the proposed method: the online BrainLat dataset (comprising AD, FTD, and healthy controls (HC)) and the in-house IITD-AIIA dataset (including subjects with AD, MCI, and HC). Different classification strategies and classifier combinations have been utilized for the accurate mapping of classes on both datasets. The best results were achieved by using a product of probabilities from classifiers for left and right subcortical regions in conjunction with the DenseNet model architecture. It yields accuracies of 94.17% and 77.72% on the BrainLat and IITD-AIIA datasets, respectively. This highlights the potential of this approach for early and accurate differentiation of neurodegenerative disorders.
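
The "product of probabilities" combination mentioned in this abstract can be illustrated with a small sketch. It assumes each of the two classifiers (left and right regions) outputs one probability per class. The function name, the renormalisation step, and the probability values are assumptions made for illustration, not details from the paper.

```python
def fuse_by_product(p_left, p_right):
    """Multiply per-class probabilities from two classifiers and renormalise."""
    joint = [l * r for l, r in zip(p_left, p_right)]
    total = sum(joint)
    if total == 0:
        raise ValueError("classifiers assign zero joint probability to every class")
    return [j / total for j in joint]


labels = ["AD", "FTD", "HC"]
p_left = [0.5, 0.3, 0.2]   # hypothetical output of the left-region classifier
p_right = [0.2, 0.6, 0.2]  # hypothetical output of the right-region classifier

fused = fuse_by_product(p_left, p_right)
# Pick the class with the highest fused probability.
predicted = labels[max(range(len(fused)), key=fused.__getitem__)]
```

In this example the left classifier alone would pick AD, but the product favours FTD because the right classifier is far more confident in it.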

CL Mar 31, 2023
Exploiting Multilingualism in Low-resource Neural Machine Translation via Adversarial Learning

Amit Kumar, Ajay Pratap, Anil Kumar Singh

Generative Adversarial Networks (GAN) offer a promising approach for Neural Machine Translation (NMT). However, feeding multiple morphologically languages into a single model during training reduces the NMT's performance. In GAN, similar to bilingual models, multilingual NMT only considers one reference translation for each sentence during model training. This single reference translation limits the GAN model from learning sufficient information about the source sentence representation. Thus, in this article, we propose the Denoising Adversarial Auto-encoder-based Sentence Interpolation (DAASI) approach to perform sentence interpolation by learning the intermediate latent representation of the source and target sentences of multilingual language pairs. Apart from latent representation, we also use the Wasserstein-GAN approach for the multilingual NMT model by incorporating the model-generated sentences of multiple languages for reward computation. This computed reward optimizes the performance of the GAN-based multilingual model in an effective manner. We conduct experiments on low-resource language pairs and find that our approach outperforms the existing state-of-the-art approaches for multilingual NMT with a performance gain of up to 4 BLEU points. Moreover, we use our trained model on zero-shot language pairs under an unsupervised scenario and show the robustness of the proposed approach.

ST Aug 25, 2024
StockTime: A Time Series Specialized Large Language Model Architecture for Stock Price Prediction

Shengkun Wang, Taoran Ji, Linhan Wang et al.

The stock price prediction task holds a significant role in the financial domain and has been studied for a long time. Recently, large language models (LLMs) have brought new ways to improve these predictions. While recent financial large language models (FinLLMs) have shown considerable progress in financial NLP tasks compared to smaller pre-trained language models (PLMs), challenges persist in stock price forecasting. Firstly, effectively integrating the modalities of time series data and natural language to fully leverage these capabilities remains complex. Secondly, FinLLMs focus more on analysis and interpretability, which can overlook the essential features of time series data. Moreover, due to the abundance of false and redundant information in financial markets, models often produce less accurate predictions when faced with such input data. In this paper, we introduce StockTime, a novel LLM-based architecture that, unlike recent FinLLMs, is designed specifically for stock price time series data. It leverages the natural ability of LLMs to predict the next token by treating stock prices as consecutive tokens, extracting textual information such as stock correlations, statistical trends and timestamps directly from these stock prices. StockTime then integrates both textual and time series data into the embedding space. By fusing this multimodal data, StockTime effectively predicts stock prices across arbitrary look-back periods. Our experiments demonstrate that StockTime outperforms recent LLMs, as it gives more accurate predictions while reducing memory usage and runtime costs.

CL Mar 3, 2023
Exploiting Language Relatedness in Machine Translation Through Domain Adaptation Techniques

Amit Kumar, Rupjyoti Baruah, Ajay Pratap et al.

One of the significant challenges of Machine Translation (MT) is the scarcity of large amounts of data, mainly parallel sentence-aligned corpora. When evaluated as rigorously as for resource-rich languages, both Neural Machine Translation (NMT) and Statistical Machine Translation (SMT) can produce good results given such large amounts of data. However, it is challenging to improve the quality of MT output for low-resource languages, especially in NMT and SMT. To tackle these challenges, we present a novel approach, especially suited to related languages, that uses a scaled sentence similarity score based on a 5-gram KenLM language model with Kneser-Ney smoothing to filter in-domain data from out-of-domain corpora, boosting the translation quality of MT. Furthermore, we employ other domain adaptation techniques, such as the multi-domain, fine-tuning, and iterative back-translation approaches, to compare against our novel approach on the Hindi-Nepali language pair for NMT and SMT. Our approach yields gains of ~2 BLEU points on the multi-domain approach, ~3 BLEU points on fine-tuning for NMT, and ~2 BLEU points on the iterative back-translation approach.

SE Apr 30 Code
Multifaceted Hero Developers and Bug-Fixing Outcomes Across Severity

Amit Kumar, Mahen Gandhi, Meher Bhardwaj et al.

Open-source projects often rely on a small group of highly active contributors known as hero developers. Prior work shows that hero developers are common in many OSS and enterprise projects, yet who qualifies as a hero depends heavily on the chosen contribution metric. Code-based metrics identify implementation-focused developers, whereas discussion-based metrics highlight coordination and communication; these metrics capture distinct facets of contribution. We conducted a measurement-sensitive study of multifaceted heroism across 77 Apache Software Foundation projects using three technical measures (commit count, distinct files touched, churn) and two social measures (issue-comment count, number of distinct issues commented on). We examined hero prevalence, overlap among hero sets, and severity-wise bug-fixing outcomes via fix and reopen rates. Results show that hero projects are common under all measures, but identified heroes differ substantially across facets. The pooled Jaccard overlap between technical and social hero sets is only 0.10. Cross-facet asymmetry is evident: 71.4% of technical heroes exhibit strong social activity, while only 24.2% of social heroes show strong technical activity. Fix-rate and reopen-rate differences are modest, yet hero-category rankings vary across severity levels and outcome measures. These findings indicate that heroism is not a single, metric-independent role. A multifaceted perspective offers a more reliable understanding of key contributors and better supports developer prioritisation and severity-aware bug assignment.
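
The Jaccard overlap reported in this abstract is the size of the intersection of two sets divided by the size of their union. The sketch below shows the plain pairwise definition on made-up developer names. How the study pools the measure across projects is not stated in the abstract and is not reproduced here.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard index |a & b| / |a | b|, defined here as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical hero sets for a single project.
technical_heroes = {"alice", "bob", "carol"}
social_heroes = {"carol", "dave"}

# One shared developer out of four distinct developers.
overlap = jaccard(technical_heroes, social_heroes)
```

An overlap of 0.10, as reported above, means that only about one in ten developers in the combined set belongs to both hero sets.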

CV Sep 25, 2024
TalkinNeRF: Animatable Neural Fields for Full-Body Talking Humans

Aggelina Chatziagapi, Bindita Chaudhuri, Amit Kumar et al.

We introduce a novel framework that learns a dynamic neural radiance field (NeRF) for full-body talking humans from monocular videos. Prior work represents only the body pose or the face. However, humans communicate with their full body, combining body pose, hand gestures, as well as facial expressions. In this work, we propose TalkinNeRF, a unified NeRF-based network that represents the holistic 4D human motion. Given a monocular video of a subject, we learn corresponding modules for the body, face, and hands, that are combined together to generate the final result. To capture complex finger articulation, we learn an additional deformation field for the hands. Our multi-identity representation enables simultaneous training for multiple subjects, as well as robust animation under completely unseen poses. It can also generalize to novel identities, given only a short video as input. We demonstrate state-of-the-art performance for animating full-body talking humans, with fine-grained hand articulation and facial expressions.

DS Mar 15
Improved Online Hitting Set Algorithms for Structured and Geometric Set Systems

Sujoy Bhore, Anupam Gupta, Amit Kumar

In the online hitting set problem, sets arrive over time, and the algorithm has to maintain a subset of elements that hit all the sets seen so far. Alon, Awerbuch, Azar, Buchbinder, and Naor (SICOMP 2009) gave an algorithm with competitive ratio $O(\log n \log m)$ for the (general) online hitting set and set cover problems for $m$ sets and $n$ elements; this is known to be tight for efficient online algorithms. Given this barrier for general set systems, we ask: can we break this double-logarithmic phenomenon for online hitting set/set cover on structured and geometric set systems? We provide an $O(\log n \log\log n)$-competitive algorithm for the weighted online hitting set problem on set systems with linear shallow-cell complexity, replacing the double-logarithmic factor in the general result by effectively a single logarithmic term. As a consequence of our results we obtain the first bounds for weighted online hitting set for natural geometric set families, thereby answering open questions regarding the gap between general and geometric weighted online hitting set problems.

CL Feb 6
Inference-Time Rethinking with Latent Thought Vectors for Math Reasoning

Deqian Kong, Minglu Zhao, Aoyang Qin et al.

Standard chain-of-thought reasoning generates a solution in a single forward pass, committing irrevocably to each token and lacking a mechanism to recover from early errors. We introduce Inference-Time Rethinking, a generative framework that enables iterative self-correction by decoupling declarative latent thought vectors from procedural generation. We factorize reasoning into a continuous latent thought vector (what to reason about) and a decoder that verbalizes the trace conditioned on this vector (how to reason). Beyond serving as a declarative buffer, latent thought vectors compress the reasoning structure into a continuous representation that abstracts away surface-level token variability, making gradient-based optimization over reasoning strategies well-posed. Our prior model maps unstructured noise to a learned manifold of valid reasoning patterns, and at test time we employ a Gibbs-style procedure that alternates between generating a candidate trace and optimizing the latent vector to better explain that trace, effectively navigating the latent manifold to refine the reasoning strategy. Training a 0.2B-parameter model from scratch on GSM8K, our method with 30 rethinking iterations surpasses baselines with 10 to 15 times more parameters, including a 3B counterpart. This result demonstrates that effective mathematical reasoning can emerge from sophisticated inference-time computation rather than solely from massive parameter counts.

GN Nov 19, 2025 Code
CASPER: Cross-modal Alignment of Spatial and single-cell Profiles for Expression Recovery

Amit Kumar, Maninder Kaur, Raghvendra Mall et al.

Spatial Transcriptomics enables mapping of gene expression within its native tissue context, but current platforms measure only a limited set of genes due to experimental constraints and excessive costs. To overcome this, computational models integrate Single-Cell RNA Sequencing data with Spatial Transcriptomics to predict unmeasured genes. We propose CASPER, a cross-attention based framework that predicts unmeasured gene expression in Spatial Transcriptomics by leveraging centroid-level representations from Single-Cell RNA Sequencing. We performed rigorous testing over four state-of-the-art Spatial Transcriptomics/Single-Cell RNA Sequencing dataset pairs across four existing baseline models. CASPER shows significant improvement in nine out of the twelve metrics for our experiments. This work paves the way for further work in Spatial Transcriptomics to Single-Cell RNA Sequencing modality translation. The code for CASPER is available at https://github.com/AI4Med-Lab/CASPER.

SE Jan 24, 2020 Code
Advaita: Bug Duplicity Detection System

Amit Kumar, Manohar Madanu, Hari Prakash et al.

Bugs are prevalent in software development. To improve software quality, bugs are filed using a bug tracking system. The properties of a reported bug consist of a headline, description, project, product, the component affected by the bug, and the severity of the bug. The duplicate bug rate (% of duplicate bugs) ranges from single digits (1 to 9%) to double digits (40%), depending on the product maturity, the size of the code, and the number of engineers working on the project. Duplicate bug rates are between 9% and 39% in some open source projects such as Eclipse and Firefox. Detection of duplicity deals with identifying whether any two bugs convey the same meaning. This detection of duplicates helps in de-duplication. Detecting duplicate bugs helps reduce triaging efforts and saves time for developers in fixing the issues. Traditional natural language processing techniques are less accurate in identifying similarity between sentences. Using the bug data present in a bug tracking system, various approaches were explored, including several machine learning algorithms, to obtain a viable approach that can identify duplicate bugs, given a pair of sentences (i.e., the respective bug descriptions). This approach considers multiple sets of features, viz. basic text statistical features, semantic features and contextual features. These features are extracted from the headline, description and component and are subsequently used to train a classification algorithm.

CV May 9, 2019 Code
A Dual-Path Model With Adaptive Attention For Vehicle Re-Identification

Pirazh Khorramshahi, Amit Kumar, Neehar Peri et al.

In recent years, attention models have been extensively used for person and vehicle re-identification. Most re-identification methods are designed to focus attention on key-point locations. However, depending on the orientation, the contribution of each key-point varies. In this paper, we present a novel dual-path adaptive attention model for vehicle re-identification (AAVER). The global appearance path captures macroscopic vehicle features while the orientation-conditioned part appearance path learns to capture localized discriminative features by focusing attention on the most informative key-points. Through extensive experimentation, we show that the proposed AAVER method is able to accurately re-identify vehicles in unconstrained scenarios, yielding state-of-the-art results on the challenging dataset VeRi-776. As a byproduct, the proposed system is also able to accurately predict vehicle key-points and shows an improvement of more than 7% over the state of the art. The code for the key-point estimation model is available at https://github.com/Pirazh/Vehicle_Key_Point_Orientation_Estimation.

RO Jan 11, 2023
An Overview of Artificial Intelligence-based Soft Upper Limb Exoskeleton for Rehabilitation: A Descriptive Review

Sanjukta Halder, Amit Kumar

The upper limb robotic exoskeleton is an electromechanical device used in the rehabilitation field to help patients recover from motor dysfunction. It can provide repetitive, comprehensive, focused, positive, and precise training to restore the capability of joints and muscles. It has been shown that existing robotic exoskeletons generally use rigid motors and mechanical structures. Soft robotic devices can be a suitable substitute for rigid ones. Soft exosuits are flexible, portable, comfortable, user-friendly, low-cost, and travel-friendly. However, they need an expert or therapist to operate them. Also, they cannot adapt to different patients with non-identical physical parameters and various rehabilitation needs. For that reason, rehabilitation now calls for intelligent exoskeletons that learn from a patient's previous data and act on it in accordance with the patient's intention. There is also a large gap between the theory and the practical application of these exoskeletons. Most intelligent exoskeletons are still prototypes. To solve this problem, the robotic exoskeleton should be both ergonomic and portable. Exoskeletons must also have decision-making capability so that an expert need not be present. In this growing field, the present trend is to make the exoskeleton intelligent and more reliable for use in clinical practice.

LG Dec 10, 2024
Machine Learning Algorithms for Detecting Mental Stress in College Students

Ashutosh Singh, Khushdeep Singh, Amit Kumar et al.

In today's world, stress is a big problem that affects people's health and happiness. More and more people are feeling stressed out, which can lead to many health issues such as breathing problems, feeling overwhelmed, heart attack, and diabetes. This work endeavors to forecast stress and non-stress occurrences among college students by applying various machine learning algorithms: Decision Trees, Random Forest, Support Vector Machines, AdaBoost, Naive Bayes, Logistic Regression, and K-nearest Neighbors. The primary objective of this work is to leverage a research study to predict and mitigate stress and non-stress based on the collected questionnaire dataset. We conducted a workshop with the primary goal of studying the stress levels found among the students. This workshop was attended by approximately 843 students aged between 18 and 21 years. A questionnaire, validated under the guidance of experts from the All India Institute of Medical Sciences (AIIMS) Raipur, Chhattisgarh, India, was given to the students, and our dataset is based on it. The survey consists of 28 questions, aiming to comprehensively understand the multidimensional aspects of stress, including emotional well-being, physical health, academic performance, relationships, and leisure. This work finds that Support Vector Machines achieve the highest accuracy for stress detection, reaching 95%. The study contributes to a deeper understanding of stress determinants. It aims to improve college students' overall quality of life and academic success, addressing the multifaceted nature of stress.

CV Feb 11, 2025
KPIs 2024 Challenge: Advancing Glomerular Segmentation from Patch- to Slide-Level

Ruining Deng, Tianyuan Yao, Yucheng Tang et al.

Chronic kidney disease (CKD) is a major global health issue, affecting over 10% of the population and causing significant mortality. While kidney biopsy remains the gold standard for CKD diagnosis and treatment, the lack of comprehensive benchmarks for kidney pathology segmentation hinders progress in the field. To address this, we organized the Kidney Pathology Image Segmentation (KPIs) Challenge, introducing a dataset that incorporates preclinical rodent models of CKD with over 10,000 annotated glomeruli from 60+ Periodic Acid Schiff (PAS)-stained whole slide images. The challenge includes two tasks, patch-level segmentation and whole slide image segmentation and detection, evaluated using the Dice Similarity Coefficient (DSC) and F1-score. By encouraging innovative segmentation methods that adapt to diverse CKD models and tissue conditions, the KPIs Challenge aims to advance kidney pathology analysis, establish new benchmarks, and enable precise, large-scale quantification for disease research and diagnosis.
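
The two evaluation metrics named in this abstract have standard definitions, sketched below on flat binary masks and on raw detection counts. The function names, the toy inputs, and the conventions for empty inputs are assumptions made for illustration. The challenge's own evaluation code may differ in such details.

```python
def dice_coefficient(pred, truth):
    """Dice Similarity Coefficient 2|A & B| / (|A| + |B|) for flat binary masks."""
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    size_sum = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if size_sum == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2 * intersection / size_sum


def f1_score(tp, fp, fn):
    """F1 from true positives, false positives and false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy 4-pixel masks: one pixel overlaps, each mask has two foreground pixels.
dsc = dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0])

# Toy detection counts: 8 correct detections, 2 spurious, 2 missed.
f1 = f1_score(tp=8, fp=2, fn=2)
```

On binary masks the two formulas coincide when pixels are counted as true positives, false positives, and false negatives. Dice is usually quoted for segmentation and F1 for object-level detection.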

IR May 18, 2024
EnterpriseEM: Fine-tuned Embeddings for Enterprise Semantic Search

Kamalkumar Rathinasamy, Jayarama Nettar, Amit Kumar et al.

Enterprises grapple with the significant challenge of managing proprietary unstructured data, hindering efficient information retrieval. This has led to the emergence of AI-driven information retrieval solutions, designed to adeptly extract relevant insights to address employee inquiries. These solutions often leverage pre-trained embedding models and generative models as foundational components. While pre-trained embeddings may exhibit proximity or disparity based on their original training objectives, they might not fully align with the unique characteristics of enterprise-specific data, leading to suboptimal alignment with the retrieval goals of enterprise environments. In this paper, we propose a comprehensive methodology for contextualizing pre-trained embedding models to enterprise environments, covering the entire process from data preparation to model fine-tuning and evaluation. By adapting the embeddings to better suit the retrieval tasks prevalent in enterprises, we aim to enhance the performance of information retrieval solutions. We discuss the process of fine-tuning, its effect on retrieval accuracy, and the potential benefits for enterprise information management. Our findings demonstrate the efficacy of fine-tuned embedding models in improving the precision and relevance of search results in enterprise settings.

LGFeb 26, 2025
CryptoPulse: Short-Term Cryptocurrency Forecasting with Dual-Prediction and Cross-Correlated Market Indicators

Amit Kumar, Taoran Ji

Cryptocurrencies fluctuate in markets with high price volatility, posing significant challenges for investors. To aid in informed decision-making, systems predicting cryptocurrency market movements have been developed, typically focusing on historical patterns. However, these methods often overlook three critical factors influencing market dynamics: 1) the macro investing environment, reflected in major cryptocurrency fluctuations affecting collaborative investor behaviors; 2) overall market sentiment, heavily influenced by news impacting investor strategies; and 3) technical indicators, offering insights into overbought or oversold conditions, momentum, and market trends, which are crucial for short-term price movements. This paper proposes a dual prediction mechanism that forecasts the next day's closing price by incorporating macroeconomic fluctuations, technical indicators, and individual cryptocurrency price changes. Additionally, a novel refinement mechanism enhances predictions through market sentiment-based rescaling and fusion. Experiments demonstrate that the proposed model achieves state-of-the-art performance, consistently outperforming ten comparison methods.

SEApr 1
Leveraging Commit Size Context and Hyper Co-Change Graph Centralities for Defect Prediction

Amit Kumar, Ethari Hrishikesh, Sonali Agarwal

File-level defect prediction models traditionally rely on product and process metrics. While process metrics effectively complement product metrics, they often overlook commit size (the number of files changed per commit) despite its strong association with software quality. Network centrality measures on dependency graphs have also proven to be valuable product-level indicators. Motivated by this, we first redefine process metrics as commit-size-aware process metric vectors, transforming conventional scalar measures into 100-dimensional profiles that capture the distribution of changes across commit-size strata. We then model change history as a hyper co-change graph, where hyperedges naturally encode commit-size semantics. Vector centralities computed on these hypergraphs quantify size-aware node importance for source files. Experiments on nine long-lived Apache projects using five popular classifiers show that replacing scalar process metrics with the proposed commit-size-aware vectors, alongside product metrics, consistently improves predictive performance. These findings establish that commit-size-aware process metrics and hypergraph-based vector centralities capture higher-order change semantics, leading to more discriminative, better calibrated, and statistically superior defect prediction models.
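To make the idea of turning a scalar process metric into a commit-size profile concrete, here is a minimal sketch. It assumes one stratum per commit size from 1 to 100 (with larger commits capped into the last stratum) and uses "number of commits touching a file" as the scalar metric; both choices, along with the name `commit_size_profiles`, are assumptions for illustration, since the abstract does not specify how the strata are defined.

```python
from collections import defaultdict


def commit_size_profiles(commits, max_size=100):
    """Build a per-file vector of commit counts, bucketed by commit size.

    commits: iterable of lists of file paths changed together in one commit.
    Returns {file: vector}, where vector[s - 1] counts the commits of size s
    that touched the file (sizes above max_size fall into the last bucket).
    """
    profiles = defaultdict(lambda: [0] * max_size)
    for files in commits:
        changed = set(files)  # ignore duplicate paths within a commit
        if not changed:
            continue
        bucket = min(len(changed), max_size) - 1
        for path in changed:
            profiles[path][bucket] += 1
    return dict(profiles)


if __name__ == "__main__":
    history = [["a.py"], ["a.py", "b.py"], ["a.py", "b.py"]]
    vectors = commit_size_profiles(history)
    # Summing a vector recovers the conventional scalar revision count.
    print(vectors["a.py"][:3], sum(vectors["a.py"]))
```

Each commit in this representation is also a natural hyperedge over the files it touches, which is the structure a hyper co-change graph would be built from.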

DCMar 7
Uber's Failover Architecture: Reconciling Reliability and Efficiency in Hyperscale Microservice Infrastructure

Mayank Bansal, Milind Chabbi, Kenneth Bogh et al.

Operating a global, real-time platform at Uber's scale requires infrastructure that is both resilient and cost-efficient. Historically, reliability was ensured through a costly 2x capacity model--each service provisioned to handle global traffic independently across two regions--leaving half the fleet idle. We present Uber's Failover Architecture (UFA), which replaces the uniform 2x model with a differentiated architecture aligned to business criticality. Critical services retain failover guarantees, while non-critical services opportunistically use failover buffer capacity reserved for critical services during steady state. During rare "full-peak" failovers, non-critical services are selectively preempted and rapidly restored, with differentiated Service-Level Agreements (SLAs) using on-demand capacity. Automated safeguards, including dependency analysis and regression gates, ensure critical services continue to function even while non-critical services are unavailable. The quantitative impact is significant: UFA reduces steady-state provisioning from 2x to 1.3x, raising utilization from ~20% to ~30% while sustaining 99.97% availability. To date, UFA has hardened over 4,000 unsafe dependencies and eliminated over one million CPU cores from a baseline of about four million cores.
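The reported utilization figures can be sanity-checked with simple arithmetic. The sketch below assumes the workload stays fixed and that utilization scales inversely with provisioned capacity; the function name and this scaling model are assumptions for illustration, not something stated in the abstract.

```python
def utilization_after(util_before, factor_before, factor_after):
    """Utilization after changing the provisioning factor for a fixed workload.

    Assumes utilization = workload / provisioned capacity, so it scales by
    factor_before / factor_after when only the provisioning changes.
    """
    return util_before * factor_before / factor_after


if __name__ == "__main__":
    # Going from 2x to 1.3x provisioning at ~20% starting utilization.
    new_util = utilization_after(0.20, 2.0, 1.3)
    print(f"{new_util:.1%}")
    # Share of the ~4M-core baseline represented by ~1M eliminated cores.
    print(f"{1_000_000 / 4_000_000:.0%}")
```

Under these assumptions the move from 2x to 1.3x yields roughly 31% utilization, which is consistent with the ~30% figure quoted above.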

CVMar 23, 2025
Decorum: A Language-Based Approach For Style-Conditioned Synthesis of Indoor 3D Scenes

Kelly O. Marshall, Omid Poursaeed, Sergiu Oprea et al.

3D indoor scene generation is an important problem for the design of digital and real-world environments. To automate this process, a scene generation model should be able to not only generate plausible scene layouts, but also take into consideration visual features and style preferences. Existing methods for this task exhibit very limited control over these attributes, only allowing text inputs in the form of simple object-level descriptions or pairwise spatial relationships. Our proposed method Decorum enables users to control the scene generation process with natural language by adopting language-based representations at each stage. This enables us to harness recent advancements in Large Language Models (LLMs) to model language-to-language mappings. In addition, we show that using a text-based representation allows us to select furniture for our scenes using a novel object retrieval method based on multimodal LLMs. Evaluations on the benchmark 3D-FRONT dataset show that our methods achieve improvements over existing work in text-conditioned scene synthesis and object retrieval.

AIDec 18, 2023
Towards Fairness in Online Service with k Servers and its Application on Fair Food Delivery

Daman Deep Singh, Amit Kumar, Abhijnan Chakraborty

The k-SERVER problem is one of the most prominent problems in online algorithms with several variants and extensions. However, simplifying assumptions like instantaneous server movements and zero service time have hitherto limited its applicability to real-world problems. In this paper, we introduce a realistic generalization of k-SERVER without such assumptions - the k-FOOD problem, where requests with source-destination locations and an associated pickup time window arrive in an online fashion, and each has to be served by exactly one of the available k servers. The k-FOOD problem offers the versatility to model a variety of real-world use cases such as food delivery, ride sharing, and quick commerce. Moreover, motivated by the need for fairness in online platforms, we introduce the FAIR k-FOOD problem with the max-min objective. We establish that both k-FOOD and FAIR k-FOOD problems are strongly NP-hard and develop an optimal offline algorithm that arises naturally from a time-expanded flow network. Subsequently, we propose an online algorithm DOC4FOOD involving virtual movements of servers to the nearest request location. Experiments on a real-world food-delivery dataset, alongside synthetic datasets, establish the efficacy of the proposed algorithm against state-of-the-art fair food delivery algorithms.

DSMay 26, 2023
Universal Weak Coreset

Ragesh Jaiswal, Amit Kumar

Coresets for $k$-means and $k$-median problems yield a small summary of the data, which preserve the clustering cost with respect to any set of $k$ centers. Recently coresets have also been constructed for constrained $k$-means and $k$-median problems. However, the notion of coresets has the drawback that (i) they can only be applied in settings where the input points are allowed to have weights, and (ii) in general metric spaces, the size of the coresets can depend logarithmically on the number of points. The notion of weak coresets, which have less stringent requirements than coresets, has been studied in the context of classical $k$-means and $k$-median problems. A weak coreset is a pair $(J,S)$ of subsets of points, where $S$ acts as a summary of the point set and $J$ as a set of potential centers. This pair satisfies the properties that (i) $S$ is a good summary of the data as long as the $k$ centers are chosen from $J$ only, and (ii) there is a good choice of $k$ centers in $J$ with cost close to the optimal cost. We develop this framework, which we call universal weak coresets, for constrained clustering settings. In conjunction with recent coreset constructions for constrained settings, our designs give greater data compression, are conceptually simpler, and apply to a wide range of constrained $k$-median and $k$-means problems.

CLMay 21, 2023
Machine Translation by Projecting Text into the Same Phonetic-Orthographic Space Using a Common Encoding

Amit Kumar, Shantipriya Parida, Ajay Pratap et al.

The use of subword embedding has proved to be a major innovation in Neural Machine Translation (NMT). It helps NMT to learn better context vectors for Low Resource Languages (LRLs) so as to predict the target words by better modelling the morphologies of the two languages and also the morphosyntax transfer. Even so, their performance for translation in Indian-language-to-Indian-language scenarios is still not as good as for resource-rich languages. One reason for this is the relative morphological richness of Indian languages, while another is that most of them fall into the extremely low resource or zero-shot categories. Since most major Indian languages use Indic or Brahmi origin scripts, the text written in them is highly phonetic in nature and phonetically similar in terms of abstract letters and their arrangements. We use these characteristics of Indian languages and their scripts to propose an approach based on common multilingual Latin-based encodings (WX notation) that take advantage of language similarity while addressing the morphological complexity issue in NMT. These multilingual Latin-based encodings in NMT, together with Byte Pair Embedding (BPE), allow us to better exploit their phonetic and orthographic as well as lexical similarities to improve the translation quality by projecting different but similar languages on the same orthographic-phonetic character space. We verify the proposed approach by demonstrating experiments on similar language pairs (Gujarati-Hindi, Marathi-Hindi, Nepali-Hindi, Maithili-Hindi, Punjabi-Hindi, and Urdu-Hindi) under low resource conditions. The proposed approach shows an improvement in a majority of cases, in one case as much as ~10 BLEU points compared to baseline techniques for similar language pairs. We also get up to ~1 BLEU point improvement on distant and zero-shot language pairs.

CLDec 29, 2021
Application of Hierarchical Temporal Memory Theory for Document Categorization

Deven Shah, Pinak Ghate, Manali Paranjape et al.

The current work intends to study the performance of the Hierarchical Temporal Memory (HTM) theory for automated classification of text as well as documents. HTM is a biologically inspired theory based on the working principles of the human neocortex. The current study intends to provide an alternative framework for document categorization using the Spatial Pooler learning algorithm in the HTM theory. As HTM accepts only a stream of binary data as input, the Latent Semantic Indexing (LSI) technique is used for extracting the top features from the input and converting them into binary format. The Spatial Pooler algorithm converts the binary input into sparse patterns, with similar input text having overlapping spatial patterns, making it easy to classify the patterns into categories. The results obtained prove that HTM theory, although in its nascent stages, performs on par with most of the popular machine learning based classifiers.

CVDec 22, 2021
EyePAD++: A Distillation-based approach for joint Eye Authentication and Presentation Attack Detection using Periocular Images

Prithviraj Dhar, Amit Kumar, Kirsten Kaplan et al.

A practical eye authentication (EA) system targeted for edge devices needs to perform authentication and be robust to presentation attacks, all while remaining compute and latency efficient. However, existing eye-based frameworks a) perform authentication and Presentation Attack Detection (PAD) independently and b) involve significant pre-processing steps to extract the iris region. Here, we introduce a joint framework for EA and PAD using periocular images. While a deep Multitask Learning (MTL) network can perform both the tasks, MTL suffers from the forgetting effect since the training datasets for EA and PAD are disjoint. To overcome this, we propose Eye Authentication with PAD (EyePAD), a distillation-based method that trains a single network for EA and PAD while reducing the effect of forgetting. To further improve the EA performance, we introduce a novel approach called EyePAD++ that includes training an MTL network on both EA and PAD data, while distilling the 'versatility' of the EyePAD network through an additional distillation step. Our proposed methods outperform the SOTA in PAD and obtain near-SOTA performance in eye-to-eye verification, without any pre-processing. We also demonstrate the efficacy of EyePAD and EyePAD++ in user-to-user verification with PAD across network backbones and image quality.

DSDec 8, 2020
Algorithms for finding $k$ in $k$-means

Chiranjib Bhattacharyya, Ravindran Kannan, Amit Kumar

$k$-means clustering requires as input the exact value of $k$, the number of clusters. Two challenges are open: (i) Is there a data-determined definition of $k$ which is provably correct, and (ii) is there a polynomial time algorithm to find $k$ from data? This paper provides the first affirmative answers to both these questions. As is common in the literature, we assume that the data admits an unknown Ground Truth (GT) clustering with cluster centers separated. This assumption alone is not sufficient to answer yes to (i). We assume a novel but natural second constraint called no tight sub-cluster (NTSC), which stipulates that no substantially large subset of a GT cluster can be "tighter" (in a sense we define) than the cluster. Our affirmative answers to (i) and (ii) hold under these two deterministic assumptions. We also give a polynomial time algorithm to identify $k$. Our algorithm relies on NTSC to peel off one cluster at a time by identifying points which are tightly packed. We are also able to show that our algorithm(s) apply to data generated by mixtures of Gaussians and more generally to mixtures of sub-Gaussian pdf's and hence are able to find the number of components of the mixture from data. To our knowledge, previous results for these specialized settings also generally assume that $k$ is given besides the data.

CVDec 3, 2020
EVRNet: Efficient Video Restoration on Edge Devices

Sachin Mehta, Amit Kumar, Fitsum Reda et al.

Video transmission applications (e.g., conferencing) are gaining momentum, especially in times of a global health pandemic. Video signals are transmitted over lossy channels, resulting in low-quality received signals. To restore videos on recipient edge devices in real-time, we introduce an efficient video restoration network, EVRNet. EVRNet efficiently allocates parameters inside the network using alignment, differential, and fusion modules. With extensive experiments on video restoration tasks (deblocking, denoising, and super-resolution), we demonstrate that EVRNet delivers competitive performance to existing methods with significantly fewer parameters and MACs. For example, EVRNet has 260 times fewer parameters and 958 times fewer MACs than the enhanced deformable convolution-based video restoration network (EDVR) for 4 times video super-resolution, while its SSIM score is 0.018 less than EDVR's. We also evaluated the performance of EVRNet under multiple distortions on an unseen dataset to demonstrate its ability in modeling variable-length sequences under both camera and object motion.

AIMar 9, 2020
Integrating Acting, Planning and Learning in Hierarchical Operational Models

Sunandita Patra, James Mason, Amit Kumar et al.

We present new planning and learning algorithms for RAE, the Refinement Acting Engine. RAE uses hierarchical operational models to perform tasks in dynamically changing environments. Our planning procedure, UPOM, does a UCT-like search in the space of operational models in order to find a near-optimal method to use for the task and context at hand. Our learning strategies acquire, from online acting experiences and/or simulated planning results, a mapping from decision contexts to method instances as well as a heuristic function to guide UPOM. Our experimental results show that UPOM and our learning strategies significantly improve RAE's performance in four test domains using two different metrics: efficiency and success ratio.

CVJul 30, 2019
Landmark Detection in Low Resolution Faces with Semi-Supervised Learning

Amit Kumar, Rama Chellappa

Landmark detection algorithms trained on high resolution images perform poorly on datasets containing low resolution images. This degrades the performance of algorithms relying on quality landmarks, for example, face recognition. To the best of our knowledge, there does not exist any dataset consisting of low resolution face images along with their annotated landmarks, making supervised training infeasible. In this paper, we present a semi-supervised approach to predict landmarks on low resolution images by learning them from labeled high resolution images. The objective of this work is to show that predicting landmarks directly on low resolution images is more effective than the current practice of aligning images after rescaling or superresolution. In a two-step process, the proposed approach first learns to generate low resolution images by modeling the distribution of target low resolution images. In the second stage, the roles of generated images and real low resolution images are switched and the model learns to predict landmarks for real low resolution images from generated low resolution images. With extensive experimentation, we study the impact of each of the design choices and also show that prediction of landmarks directly on low resolution images improves the performance of important tasks such as face recognition in low resolution images.

CVFeb 19, 2018
Disentangling 3D Pose in A Dendritic CNN for Unconstrained 2D Face Alignment

Amit Kumar, Rama Chellappa

Heatmap regression has been used for landmark localization for quite a while now. Most of the methods use a very deep stack of bottleneck modules for the heatmap classification stage, followed by heatmap regression to extract the keypoints. In this paper, we present a single dendritic CNN, termed the Pose Conditioned Dendritic Convolution Neural Network (PCD-CNN), where a classification network is followed by a second and modular classification network, trained in an end-to-end fashion to obtain accurate landmark points. Following a Bayesian formulation, we disentangle the 3D pose of a face image explicitly by conditioning the landmark estimation on pose, making it different from multi-tasking approaches. Extensive experimentation shows that conditioning on pose reduces the localization error by making it agnostic to face pose. The proposed model can be extended to yield a variable number of landmark points, hence broadening its applicability to other datasets. Instead of increasing depth or width of the network, we train the CNN efficiently with Mask-Softmax Loss and hard sample mining to achieve up to $15\%$ reduction in error compared to state-of-the-art methods for extreme and medium pose face images from challenging datasets including AFLW, AFW, COFW and IBUG.

AIApr 28, 2017
Intelligent Personal Assistant with Knowledge Navigation

Amit Kumar, Rahul Dutta, Harbhajan Rai

An Intelligent Personal Agent (IPA) is an agent whose purpose is to help the user gain information from reliable resources with the help of knowledge navigation techniques, saving the time needed to search for the best content. The agent is also responsible for responding to chat-based queries with the help of a Conversation Corpus. We will be testing different methods for optimal query generation. To facilitate ease of use of the application, the agent will be able to accept input through Text (Keyboard), Voice (Speech Recognition) and Server (Facebook) and output responses using the same method. Existing chat bots reply by making changes in the input, but we will give responses based on multiple SRT files. The model will learn using the human dialogs dataset and will be able to respond in a human-like manner. Responses to queries about famous things (places, people, and words) can be provided using web scraping, which will enable the bot to have knowledge navigation features. The agent will even learn from its past experiences, supporting semi-supervised learning.

CVApr 6, 2017
A Convolution Tree with Deconvolution Branches: Exploiting Geometric Relationships for Single Shot Keypoint Detection

Amit Kumar, Rama Chellappa

Recently, Deep Convolution Networks (DCNNs) have been applied to the task of face alignment and have shown potential for learning improved feature representations. Although deeper layers can capture abstract concepts like pose, it is difficult to capture the geometric relationships among the keypoints in DCNNs. In this paper, we propose a novel convolution-deconvolution network for facial keypoint detection. Our model predicts the 2D locations of the keypoints and their individual visibility along with 3D head pose, while exploiting the spatial relationships among different keypoints. Different from existing approaches of modeling these relationships, we propose learnable transform functions which capture the relationships between keypoints at the feature level. However, due to extensive variations in pose, not all of these relationships act at once, and hence we propose a pose-based routing function which implicitly models the active relationships. Both transform functions and the routing function are implemented through convolutions in a multi-task framework. Our approach presents a single-shot keypoint detection method, making it different from many existing cascade regression-based methods. We also show that learning these relationships significantly improves the accuracy of keypoint detections for in-the-wild face images from challenging datasets such as AFW and AFLW.

CVFeb 16, 2017
KEPLER: Keypoint and Pose Estimation of Unconstrained Faces by Learning Efficient H-CNN Regressors

Amit Kumar, Azadeh Alavi, Rama Chellappa

Keypoint detection is one of the most important pre-processing steps in tasks such as face modeling, recognition and verification. In this paper, we present an iterative method for Keypoint Estimation and Pose prediction of unconstrained faces by Learning Efficient H-CNN Regressors (KEPLER) for addressing the face alignment problem. Recent state of the art methods have shown improvements in face keypoint detection by employing Convolution Neural Networks (CNNs). Although a simple feed forward neural network can learn the mapping between input and output spaces, it cannot learn the inherent structural dependencies. We present a novel architecture called H-CNN (Heatmap-CNN) which captures structured global and local features and thus favors accurate keypoint detection. H-CNN is jointly trained on the visibility, fiducials and 3D-pose of the face. As the iterations proceed, the error decreases, making the gradients small and thus requiring efficient training of DCNNs to mitigate this. KEPLER performs global corrections in pose and fiducials for the first four iterations followed by local corrections in the subsequent stage. As a by-product, KEPLER also provides 3D pose (pitch, yaw and roll) of the face accurately. In this paper, we show that without using any 3D information, KEPLER outperforms state of the art methods for alignment on challenging datasets such as AFW and AFLW.

CVMay 9, 2016
Unconstrained Still/Video-Based Face Verification with Deep Convolutional Neural Networks

Jun-Cheng Chen, Rajeev Ranjan, Swami Sankaranarayanan et al.

Over the last five years, methods based on Deep Convolutional Neural Networks (DCNNs) have shown impressive performance improvements for object detection and recognition problems. This has been made possible due to the availability of large annotated datasets, a better understanding of the non-linear mapping between input images and class labels as well as the affordability of GPUs. In this paper, we present the design details of a deep learning system for unconstrained face recognition, including modules for face detection, association, alignment and face verification. The quantitative performance evaluation is conducted using the IARPA Janus Benchmark A (IJB-A), the JANUS Challenge Set 2 (JANUS CS2), and the LFW dataset. The IJB-A dataset includes real-world unconstrained faces of 500 subjects with significant pose and illumination variations which are much harder than the Labeled Faces in the Wild (LFW) and Youtube Face (YTF) datasets. JANUS CS2 is the extended version of IJB-A which contains not only all the images/frames of IJB-A but also includes the original videos for evaluating the video-based face verification system. Some open issues regarding DCNNs for face verification problems are then discussed.

CVFeb 2, 2016
Head Pose Estimation of Occluded Faces using Regularized Regression

Amit Kumar, Rishabh Bindal, Soumya Indela et al.

This paper presents regression methods for estimation of head pose from occluded 2-D face images. The process primarily involves reconstructing a face from its occluded image, followed by classification. Typical methods for reconstruction assume that the pixel errors of the occluded regions are independent. However, such an assumption is not true in the case of occlusion, because of its inherent contiguous nature. Hence, we use the nuclear norm as a metric that can describe well the structure of the error. We also use LASSO regression based l1-regularization to improve reconstruction. Next, we implement Nuclear Norm Regularized Regression (NR), and also our proposed method, for reconstruction and subsequent classification. Finally, we compare the performance of the methods in terms of accuracy of head pose estimation of occluded faces.

CVJan 29, 2016
Face Alignment by Local Deep Descriptor Regression

Amit Kumar, Rajeev Ranjan, Vishal Patel et al.

We present an algorithm for extracting key-point descriptors using deep convolutional neural networks (CNN). Unlike many existing deep CNNs, our model computes local features around a given point in an image. We also present a face alignment algorithm based on regression using these local descriptors. The proposed method called Local Deep Descriptor Regression (LDDR) is able to localize face landmarks of varying sizes, poses and occlusions with high accuracy. Deep Descriptors presented in this paper are able to uniquely and efficiently describe every pixel in the image and therefore can potentially replace traditional descriptors such as SIFT and HOG. Extensive evaluations on five publicly available unconstrained face alignment datasets show that our deep descriptor network is able to capture strong local features around a given landmark and performs significantly better than many competitive and state-of-the-art face alignment algorithms.

CVJan 28, 2016
Towards the Design of an End-to-End Automated System for Image and Video-based Recognition

Rama Chellappa, Jun-Cheng Chen, Rajeev Ranjan et al.

Over many decades, researchers working in object recognition have longed for an end-to-end automated system that will simply accept 2D or 3D images or videos as inputs and output the labels of objects in the input data. Computer vision methods that use representations derived based on geometric, radiometric and neural considerations and statistical and structural matchers, and artificial neural network-based methods where a multi-layer network learns the mapping from inputs to class labels, have provided competing approaches for image recognition problems. Over the last four years, methods based on Deep Convolutional Neural Networks (DCNNs) have shown impressive performance improvements on object detection/recognition challenge problems. This has been made possible due to the availability of large annotated data, a better understanding of the non-linear mapping between image and class labels as well as the affordability of GPUs. In this paper, we present a brief history of developments in computer vision and artificial neural networks over the last forty years for the problem of image-based recognition. We then present the design details of a deep learning system for end-to-end unconstrained face verification/recognition. Some open issues regarding DCNNs for object recognition problems are then discussed. We caution the readers that the views expressed in this paper are from the authors and authors only!

CRNov 2, 2015
Network Security Threats and Protection Models

Amit Kumar, Santosh Malhotra

In a brave new age of global connectivity and e-commerce, interconnections via networks have heightened, creating for both individuals and organizations a state of complete dependence upon vulnerable systems for storage and transfer of information. Never before have so many people had power in their own hands: the power to deface websites, access personal mail accounts, and, worse still, the potential to bring down entire governments and financial corporations through openly documented software codes. This paper discusses the possible exploits on typical network components, cites real-life scenarios, and proposes practical measures that can be taken as safeguards. It then describes some of the key efforts by the research community to prevent such attacks, mainly by using firewalls and intrusion detection systems.