19.6SEApr 21
Revisiting Code Debloating with Ground Truth-based EvaluationMuhammad Bilal, Moiz Ali, Mohit Kumar et al.
Program debloating aims to remove unused code to reduce performance overhead, attack surfaces, and maintenance costs. Over time, debloating has evolved across multiple layers (container, library, and application), each building on the principles of application-level debloating. Despite its central role, application-level debloating continues to rely on imperfect proxies for measuring performance, such as test-case-driven evaluation for correctness, code size for runtime efficiency, and gadget count reduction for estimating security posture. While there is widespread skepticism about using such imperfect proxies, the community still lacks standardized methodologies or benchmarks to assess the true performance of application-level software debloating. This experience paper aims to address the gap. We revisit the foundations of application-level debloating through a ground-truth-based evaluation paradigm. Our analysis of eight state-of-the-art debloaters - Blade, Chisel, Cov, CovA, Lmcas, Trimmer, Occam, and Razor - uncovers insights previously unattainable through traditional evaluations. These tools collectively span the spectrum of source-to-source, IR-to-IR, and binary-to-binary transformation paradigms, characterizing a holistic reassessment across abstraction levels. Our analysis reveals that while dynamic analysis-based tools often remove up to 94% of code that should be retained, static analysis-based approaches exhibit the opposite behavior, showing high false retention rates due to coarse-grained dependency over-approximation. Additionally, static analyses may add code by introducing specialized variants of functions. False retentions and removals not only cause functional incorrectness but may also lead to systematic inconsistency, robustness failures, and exploitable vulnerabilities.
LGApr 3, 2023
On Mitigating the Utility-Loss in Differentially Private Learning: A new Perspective by a Geometrically Inspired Kernel ApproachMohit Kumar, Bernhard A. Moser, Lukas Fischer
Privacy-utility tradeoff remains as one of the fundamental issues of differentially private machine learning. This paper introduces a geometrically inspired kernel-based approach to mitigate the accuracy-loss issue in classification. In this approach, a representation of the affine hull of given data points is learned in Reproducing Kernel Hilbert Spaces (RKHS). This leads to a novel distance measure that hides privacy-sensitive information about individual data points and improves the privacy-utility tradeoff via significantly reducing the risk of membership inference attacks. The effectiveness of the approach is demonstrated through experiments on MNIST dataset, Freiburg groceries dataset, and a real biomedical dataset. It is verified that the approach remains computationally practical. The application of the approach to federated learning is considered and it is observed that the accuracy-loss due to data being distributed is either marginal or not significantly high.
LGApr 12, 2022
Membership-Mappings for Practical Secure Distributed Deep LearningMohit Kumar, Weiping Zhang, Lukas Fischer et al.
This study leverages the data representation capability of fuzzy based membership-mappings for practical secure distributed deep learning using fully homomorphic encryption. The impracticality issue of secure machine (deep) learning with fully homomorphic encrypted data, arising from large computational overhead, is addressed via applying fuzzy attributes. Fuzzy attributes are induced by globally convergent and robust variational membership-mappings based local deep models. Fuzzy attributes combine the local deep models in a robust and flexible manner such that the global model can be evaluated homomorphically in an efficient manner using a boolean circuit composed of bootstrapped binary gates. The proposed method, while preserving privacy in a distributed learning scenario, remains accurate, practical, and scalable. The method is evaluated through numerous experiments including demonstrations through MNIST dataset and Freiburg Groceries Dataset. Further, a biomedical application related to mental stress detection on individuals is considered.
LGNov 24, 2023
Comparative Analysis of Transformers for Modeling Tabular Data: A Casestudy using Industry Scale DatasetUsneek Singh, Piyush Arora, Shamika Ganesan et al.
We perform a comparative analysis of transformer-based models designed for modeling tabular data, specifically on an industry-scale dataset. While earlier studies demonstrated promising outcomes on smaller public or synthetic datasets, the effectiveness did not extend to larger industry-scale datasets. The challenges identified include handling high-dimensional data, the necessity for efficient pre-processing of categorical and numerical features, and addressing substantial computational requirements. To overcome the identified challenges, the study conducts an extensive examination of various transformer-based models using both synthetic datasets and the default prediction Kaggle dataset (2022) from American Express. The paper presents crucial insights into optimal data pre-processing, compares pre-training and direct supervised learning methods, discusses strategies for managing categorical and numerical features, and highlights trade-offs between computational resources and performance. Focusing on temporal financial data modeling, the research aims to facilitate the systematic development and deployment of transformer-based models in real-world scenarios, emphasizing scalability.
RONov 14, 2023
RoboSense At Edge: Detecting Slip, Crumple and Shape of the Object in Robotic Hand for TeleoprationsSudev Kumar Padhi, Mohit Kumar, Debanka Giri et al.
Slip and crumple detection is essential for performing robust manipulation tasks with a robotic hand (RH) like remote surgery. It has been one of the challenging problems in the robotics manipulation community. In this work, we propose a technique based on machine learning (ML) based techniques to detect the slip, and crumple as well as the shape of an object that is currently held in the robotic hand. We proposed ML model will detect the slip, crumple, and shape using the force/torque exerted and the angular positions of the actuators present in the RH. The proposed model would be integrated into the loop of a robotic hand(RH) and haptic glove(HG). This would help us to reduce the latency in case of teleoperation
LGJul 5, 2024
Geometrically Inspired Kernel Machines for Collaborative Learning Beyond Gradient DescentMohit Kumar, Alexander Valentinitsch, Magdalena Fuchs et al.
This paper develops a novel mathematical framework for collaborative learning by means of geometrically inspired kernel machines which includes statements on the bounds of generalisation and approximation errors, and sample complexity. For classification problems, this approach allows us to learn bounded geometric structures around given data points and hence solve the global model learning problem in an efficient way by exploiting convexity properties of the related optimisation problem in a Reproducing Kernel Hilbert Space (RKHS). In this way, we can reduce classification problems to determining the closest bounded geometric structure from a given data point. Further advantages that come with our solution is that our approach does not require clients to perform multiple epochs of local optimisation using stochastic gradient descent, nor require rounds of communication between client/server for optimising the global model. We highlight that numerous experiments have shown that the proposed method is a competitive alternative to the state-of-the-art.
17.8LGMay 1
Kernel Affine Hull Machines for Compute-Efficient Query-Side Semantic EncodingMohit Kumar, Somayeh Kargaran, Bernhard A. Moser et al.
Transformer-based semantic retrieval is highly effective, yet in many deployments the dominant cost lies in online query encoding rather than corpus indexing. We study the fixed-teacher query-adaptation problem and ask whether repeated neural inference can be replaced by a lightweight, analytically explicit estimator without degrading decision-relevant retrieval quality. We propose Kernel Affine Hull Machines (KAHMs), which map inexpensive lexical features into a frozen semantic embedding space by estimating prototype-mixture weights in a rigorously specified RKHS and refining prototypes via normalized least-mean-squares, yielding a transparent decomposition of encoding error into posterior-approximation, generalization, and teacher-noise components. On a controlled Austrian-law benchmark (5,000 queries; 84 laws; 10,762 units), KAHM attains the strongest teacher-space reconstruction among matched learned adapters (MSE 0.000091, R^2 0.9071, cosine 0.9536) and consistently leads rank-sensitive metrics, including mean reciprocal rank at 20 (MRR@20, the average inverse rank of the first relevant result within the top 20), Hit rate at 20 (Hit@20, the fraction of queries with at least one relevant result in the top 20), and Top-1 accuracy (the fraction of queries whose correct item is ranked first), with scores of 0.504, 0.694, and 0.411, respectively. It also reduces per-query latency by a factor of 8.5 relative to direct transformer encoding. These results demonstrate that, in fixed-teacher regimes, lightweight geometric estimators can substitute for online neural encoding, preserving retrieval performance while substantially improving efficiency and interpretability.
LGNov 30, 2025
Operator-Theoretic Framework for Gradient-Free Federated LearningMohit Kumar, Mathias Brucker, Alexander Valentinitsch et al.
Federated learning must address heterogeneity, strict communication and computation limits, and privacy while ensuring performance. We propose an operator-theoretic framework that maps the $L^2$-optimal solution into a reproducing kernel Hilbert space (RKHS) via a forward operator, approximates it using available data, and maps back with the inverse operator, yielding a gradient-free scheme. Finite-sample bounds are derived using concentration inequalities over operator norms, and the framework identifies a data-dependent hypothesis space with guarantees on risk, error, robustness, and approximation. Within this space we design efficient kernel machines leveraging the space folding property of Kernel Affine Hull Machines. Clients transfer knowledge via a scalar space folding measure, reducing communication and enabling a simple differentially private protocol: summaries are computed from noise-perturbed data matrices in one step, avoiding per-round clipping and privacy accounting. The induced global rule requires only integer minimum and equality-comparison operations per test point, making it compatible with fully homomorphic encryption (FHE). Across four benchmarks, the gradient-free FL method with fixed encoder embeddings matches or outperforms strong gradient-based fine-tuning, with gains up to 23.7 points. In differentially private experiments, kernel smoothing mitigates accuracy loss in high-privacy regimes. The global rule admits an FHE realization using $Q \times C$ encrypted minimum and $C$ equality-comparison operations per test point, with operation-level benchmarks showing practical latencies. Overall, the framework provides provable guarantees with low communication, supports private knowledge transfer via scalar summaries, and yields an FHE-compatible prediction rule offering a mathematically grounded alternative to gradient-based federated learning under heterogeneity.
MLDec 12, 2024
$(ε, δ)$-Differentially Private Partial Least Squares RegressionRamin Nikzad-Langerodi, Mohit Kumar, Du Nguyen Duy et al.
As data-privacy requirements are becoming increasingly stringent and statistical models based on sensitive data are being deployed and used more routinely, protecting data-privacy becomes pivotal. Partial Least Squares (PLS) regression is the premier tool for building such models in analytical chemistry, yet it does not inherently provide privacy guarantees, leaving sensitive (training) data vulnerable to privacy attacks. To address this gap, we propose an $(ε, δ)$-differentially private PLS (edPLS) algorithm, which integrates well-studied and theoretically motivated Gaussian noise-adding mechanisms into the PLS algorithm to ensure the privacy of the data underlying the model. Our approach involves adding carefully calibrated Gaussian noise to the outputs of four key functions in the PLS algorithm: the weights, scores, $X$-loadings, and $Y$-loadings. The noise variance is determined based on the global sensitivity of each function, ensuring that the privacy loss is controlled according to the $(ε, δ)$-differential privacy framework. Specifically, we derive the sensitivity bounds for each function and use these bounds to calibrate the noise added to the model components. Experimental results demonstrate that edPLS effectively renders privacy attacks, aimed at recovering unique sources of variability in the training data, ineffective. Application of edPLS to the NIR corn benchmark dataset shows that the root mean squared error of prediction (RMSEP) remains competitive even at strong privacy levels (i.e., $ε=1$), given proper pre-processing of the corresponding spectra. These findings highlight the practical utility of edPLS in creating privacy-preserving multivariate calibrations and for the analysis of their privacy-utility trade-offs.
AIFeb 8, 2022
Learning MAX-SAT from Contextual Examples for Combinatorial OptimisationMohit Kumar, Samuel Kolb, Stefano Teso et al.
Combinatorial optimisation problems are ubiquitous in artificial intelligence. Designing the underlying models, however, requires substantial expertise, which is a limiting factor in practice. The models typically consist of hard and soft constraints, or combine hard constraints with an objective function. We introduce a novel setting for learning combinatorial optimisation problems from contextual examples. These positive and negative examples show - in a particular context - whether the solutions are good enough or not. We develop our framework using the MAX-SAT formalism as it is simple yet powerful setting having these features. We study the learnability of MAX-SAT models. Our theoretical results show that high-quality MAX-SAT models can be learned from contextual examples in the realisable and agnostic settings, as long as the data satisfies an intuitive "representativeness" condition. We also contribute two implementations based on our theoretical results: one leverages ideas from syntax-guided synthesis while the other makes use of stochastic local search techniques. The two implementations are evaluated by recovering synthetic and benchmark models from contextual examples. The experimental results support our theoretical analysis, showing that MAX-SAT models can be learned from contextual examples. Among the two implementations, the stochastic local search learner scales much better than the syntax-guided implementation while providing comparable or better models.
LGAug 2, 2021
Few-shot calibration of low-cost air pollution (PM2.5) sensors using meta-learningKalpit Yadav, Vipul Arora, Sonu Kumar Jha et al.
Low-cost particulate matter sensors are transforming air quality monitoring because they have lower costs and greater mobility as compared to reference monitors. Calibration of these low-cost sensors requires training data from co-deployed reference monitors. Machine Learning based calibration gives better performance than conventional techniques, but requires a large amount of training data from the sensor, to be calibrated, co-deployed with a reference monitor. In this work, we propose novel transfer learning methods for quick calibration of sensors with minimal co-deployment with reference monitors. Transfer learning utilizes a large amount of data from other sensors along with a limited amount of data from the target sensor. Our extensive experimentation finds the proposed Model-Agnostic- Meta-Learning (MAML) based transfer learning method to be the most effective over other competitive baselines.
AIJul 15, 2021
Learning Mixed-Integer Linear Programs from Contextual ExamplesMohit Kumar, Samuel Kolb, Luc De Raedt et al.
Mixed-integer linear programs (MILPs) are widely used in artificial intelligence and operations research to model complex decision problems like scheduling and routing. Designing such programs however requires both domain and modelling expertise. In this paper, we study the problem of acquiring MILPs from contextual examples, a novel and realistic setting in which examples capture solutions and non-solutions within a specific context. The resulting learning problem involves acquiring continuous parameters -- namely, a cost vector and a feasibility polytope -- but has a distinctly combinatorial flavor. To solve this complex problem, we also contribute MISSLE, an algorithm for learning MILPs from contextual examples. MISSLE uses a variant of stochastic local search that is guided by the gradient of a continuous surrogate loss function. Our empirical evaluation on synthetic data shows that MISSLE acquires better MILPs faster than alternatives based on stochastic local search and gradient descent.
LGJun 6, 2021
Information Theoretic Evaluation of Privacy-Leakage, Interpretability, and Transferability for Trustworthy AIMohit Kumar, Bernhard A. Moser, Lukas Fischer et al.
In order to develop machine learning and deep learning models that take into account the guidelines and principles of trustworthy AI, a novel information theoretic trustworthy AI framework is introduced. A unified approach to "privacy-preserving interpretable and transferable learning" is considered for studying and optimizing the tradeoffs between privacy, interpretability, and transferability aspects. A variational membership-mapping Bayesian model is used for the analytical approximations of the defined information theoretic measures for privacy-leakage, interpretability, and transferability. The approach consists of approximating the information theoretic measures via maximizing a lower-bound using variational optimization. The study presents a unified information theoretic approach to study different aspects of trustworthy AI in a rigorous analytical manner. The approach is demonstrated through numerous experiments on benchmark datasets and a real-world biomedical application concerned with the detection of mental stress on individuals using heart rate variability analysis.
LGMay 10, 2021
Differentially Private Transferrable Deep Learning with Membership-MappingsMohit Kumar
This paper considers the problem of differentially private semi-supervised transfer and multi-task learning. The notion of \emph{membership-mapping} has been developed using measure theory basis to learn data representation via a fuzzy membership function. An alternative conception of deep autoencoder, referred to as \emph{Conditionally Deep Membership-Mapping Autoencoder (CDMMA)}, is considered for transferrable deep learning. Under practice-oriented settings, an analytical solution for the learning of CDMMA can be derived by means of variational optimization. The paper proposes a transfer and multi-task learning approach that combines CDMMA with a tailored noise adding mechanism to achieve a given level of privacy-loss bound with the minimum perturbation of the data. Numerous experiments were carried out using MNIST, USPS, Office, and Caltech256 datasets to verify the competitive robust performance of the proposed methodology.
LGApr 14, 2021
Membership-Mappings for Data Representation Learning: Measure Theoretic ConceptualizationMohit Kumar, Bernhard A. Moser, Lukas Fischer et al.
A fuzzy theoretic analytical approach was recently introduced that leads to efficient and robust models while addressing automatically the typical issues associated to parametric deep models. However, a formal conceptualization of the fuzzy theoretic analytical deep models is still not available. This paper introduces using measure theoretic basis the notion of \emph{membership-mapping} for representing data points through attribute values (motivated by fuzzy theory). A property of the membership-mapping, that can be exploited for data representation learning, is of providing an interpolation on the given data points in the data space. An analytical approach to the variational learning of a membership-mappings based data representation model is considered.
LGSep 21, 2020
Machine Guides, Human Supervises: Interactive Learning with Global ExplanationsTeodora Popordanoska, Mohit Kumar, Stefano Teso
We introduce explanatory guided learning (XGL), a novel interactive learning strategy in which a machine guides a human supervisor toward selecting informative examples for a classifier. The guidance is provided by means of global explanations, which summarize the classifier's behavior on different regions of the instance space and expose its flaws. Compared to other explanatory interactive learning strategies, which are machine-initiated and rely on local explanations, XGL is designed to be robust against cases in which the explanations supplied by the machine oversell the classifier's quality. Moreover, XGL leverages global explanations to open up the black-box of human-initiated interaction, enabling supervisors to select informative examples that challenge the learned model. By drawing a link to interactive machine teaching, we show theoretically that global explanations are a viable approach for guiding supervisors. Our simulations show that explanatory guided learning avoids overselling the model's quality and performs comparably or better than machine- and human-initiated interactive learning strategies in terms of model quality.
AIJul 20, 2020
Toward Machine-Guided, Human-Initiated Explanatory Interactive LearningTeodora Popordanoska, Mohit Kumar, Stefano Teso
Recent work has demonstrated the promise of combining local explanations with active learning for understanding and supervising black-box models. Here we show that, under specific conditions, these algorithms may misrepresent the quality of the model being learned. The reason is that the machine illustrates its beliefs by predicting and explaining the labels of the query instances: if the machine is unaware of its own mistakes, it may end up choosing queries on which it performs artificially well. This biases the "narrative" presented by the machine to the user.We address this narrative bias by introducing explanatory guided learning, a novel interactive learning strategy in which: i) the supervisor is in charge of choosing the query instances, while ii) the machine uses global explanations to illustrate its overall behavior and to guide the supervisor toward choosing challenging, informative instances. This strategy retains the key advantages of explanatory interaction while avoiding narrative bias and compares favorably to active learning in terms of sample complexity. An initial empirical evaluation with a clustering-based prototype highlights the promise of our approach.
IRAug 1, 2018
Did We Get It Right? Predicting Query Performance in E-commerce SearchRohan Kumar, Mohit Kumar, Neil Shah et al.
In this paper, we address the problem of evaluating whether results served by an e-commerce search engine for a query are good or not. This is a critical question in evaluating any e-commerce search engine. While this question is traditionally answered using simple metrics like query click-through rate (CTR), we observe that in e-commerce search, such metrics can be misleading. Upon inspection, we find cases where CTR is high but the results are poor and vice versa. Similar cases exist for other metrics like time to click which are often also used for evaluating search engines. We aim to learn the quality of the results served by the search engine based on users' interactions with the results. Although this problem has been studied in the web search context, this is the first study for e-commerce search, to the best of our knowledge. Despite certain commonalities with evaluating web search engines, there are several major differences such as underlying reasons for search failure, and availability of rich user interaction data with products (e.g. adding a product to the cart). We study large-scale user interaction logs from Flipkart's search engine, analyze behavioral patterns and build models to classify queries based on user behavior signals. We demonstrate the feasibility and efficacy of such models in accurately predicting query performance. Our classifier is able to achieve an average AUC of 0.75 on a held-out test set.
AIMay 29, 2018
Automating Personnel Rostering by Learning Constraints Using TensorsMohit Kumar, Stefano Teso, Luc De Raedt
Many problems in operations research require that constraints be specified in the model. Determining the right constraints is a hard and laborsome task. We propose an approach to automate this process using artificial intelligence and machine learning principles. So far there has been only little work on learning constraints within the operations research community. We focus on personnel rostering and scheduling problems in which there are often past schedules available and show that it is possible to automatically learn constraints from such examples. To realize this, we adapted some techniques from the constraint programming community and we have extended them in order to cope with multidimensional examples. The method uses a tensor representation of the example, which helps in capturing the dimensionality as well as the structure of the example, and applies tensor operations to find the constraints that are satisfied by the example. To evaluate the proposed algorithm, we used constraints from the Nurse Rostering Competition and generated solutions that satisfy these constraints; these solutions were then used as examples to learn constraints. Experiments demonstrate that the proposed algorithm is capable of producing human readable constraints that capture the underlying characteristics of the examples.
MLNov 22, 2017
Decomposition Strategies for Constructive Preference ElicitationPaolo Dragone, Stefano Teso, Mohit Kumar et al.
We tackle the problem of constructive preference elicitation, that is the problem of learning user preferences over very large decision problems, involving a combinatorial space of possible outcomes. In this setting, the suggested configuration is synthesized on-the-fly by solving a constrained optimization problem, while the preferences are learned itera tively by interacting with the user. Previous work has shown that Coactive Learning is a suitable method for learning user preferences in constructive scenarios. In Coactive Learning the user provides feedback to the algorithm in the form of an improvement to a suggested configuration. When the problem involves many decision variables and constraints, this type of interaction poses a significant cognitive burden on the user. We propose a decomposition technique for large preference-based decision problems relying exclusively on inference and feedback over partial configurations. This has the clear advantage of drastically reducing the user cognitive load. Additionally, part-wise inference can be (up to exponentially) less computationally demanding than inference over full configurations. We discuss the theoretical implications of working with parts and present promising empirical results on one synthetic and two realistic constructive problems.
SIMar 30, 2017
FairJudge: Trustworthy User Prediction in Rating PlatformsSrijan Kumar, Bryan Hooi, Disha Makhija et al.
Rating platforms enable large-scale collection of user opinion about items (products, other users, etc.). However, many untrustworthy users give fraudulent ratings for excessive monetary gains. In the paper, we present FairJudge, a system to identify such fraudulent users. We propose three metrics: (i) the fairness of a user that quantifies how trustworthy the user is in rating the products, (ii) the reliability of a rating that measures how reliable the rating is, and (iii) the goodness of a product that measures the quality of the product. Intuitively, a user is fair if it provides reliable ratings that are close to the goodness of the product. We formulate a mutually recursive definition of these metrics, and further address cold start problems and incorporate behavioral properties of users and products in the formulation. We propose an iterative algorithm, FairJudge, to predict the values of the three metrics. We prove that FairJudge is guaranteed to converge in a bounded number of iterations, with linear time complexity. By conducting five different experiments on five rating platforms, we show that FairJudge significantly outperforms nine existing algorithms in predicting fair and unfair users. We reported the 100 most unfair users in the Flipkart network to their review fraud investigators, and 80 users were correctly identified (80% accuracy). The FairJudge algorithm is already being deployed at Flipkart.
AINov 19, 2015
BIRDNEST: Bayesian Inference for Ratings-Fraud DetectionBryan Hooi, Neil Shah, Alex Beutel et al.
Review fraud is a pervasive problem in online commerce, in which fraudulent sellers write or purchase fake reviews to manipulate perception of their products and services. Fake reviews are often detected based on several signs, including 1) they occur in short bursts of time; 2) fraudulent user accounts have skewed rating distributions. However, these may both be true in any given dataset. Hence, in this paper, we propose an approach for detecting fraudulent reviews which combines these 2 approaches in a principled manner, allowing successful detection even when one of these signs is not present. To combine these 2 approaches, we formulate our Bayesian Inference for Rating Data (BIRD) model, a flexible Bayesian model of user rating behavior. Based on our model we formulate a likelihood-based suspiciousness metric, Normalized Expected Surprise Total (NEST). We propose a linear-time algorithm for performing Bayesian inference using our model and computing the metric. Experiments on real data show that BIRDNEST successfully spots review fraud in large, real-world graphs: the 50 most suspicious users of the Flipkart platform flagged by our algorithm were investigated and all identified as fraudulent by domain experts at Flipkart.
SIOct 19, 2015
EdgeCentric: Anomaly Detection in Edge-Attributed NetworksNeil Shah, Alex Beutel, Bryan Hooi et al.
Given a network with attributed edges, how can we identify anomalous behavior? Networks with edge attributes are commonplace in the real world. For example, edges in e-commerce networks often indicate how users rated products and services in terms of number of stars, and edges in online social and phonecall networks contain temporal information about when friendships were formed and when users communicated with each other -- in such cases, edge attributes capture information about how the adjacent nodes interact with other entities in the network. In this paper, we aim to utilize exactly this information to discern suspicious from typical node behavior. Our work has a number of notable contributions, including (a) formulation: while most other graph-based anomaly detection works use structural graph connectivity or node information, we focus on the new problem of leveraging edge information, (b) methodology: we introduce EdgeCentric, an intuitive and scalable compression-based approach for detecting edge-attributed graph anomalies, and (c) practicality: we show that EdgeCentric successfully spots numerous such anomalies in several large, edge-attributed real-world graphs, including the Flipkart e-commerce graph with over 3 million product reviews between 1.1 million users and 545 thousand products, where it achieved 0.87 precision over the top 100 results.