CRDec 3, 2014
Hashing Pursuit for Online Identification of Heavy-Hitters in High-Speed Network StreamsMichael Kallitsis, Stilian Stoev, George Michailidis
Distributed Denial of Service (DDoS) attacks have become more prominent recently, both in frequency of occurrence, as well as magnitude. Such attacks render key Internet resources unavailable and disrupt its normal operation. It is therefore of paramount importance to quickly identify malicious Internet activity. The DDoS threat model includes characteristics such as: (i) heavy-hitters that transmit large volumes of traffic towards "victims", (ii) persistent-hitters that send traffic, not necessarily large, to specific destinations to be used as attack facilitators, (iii) host and port scanning for compiling lists of un-secure servers to be used as attack amplifiers, etc. This conglomeration of problems motivates the development of space/time efficient summaries of data traffic streams that can be used to identify heavy-hitters associated with the above attack vectors. This paper presents a hashing-based framework and fast algorithms that take into account the large-dimensionality of the incoming network stream and can be employed to quickly identify the culprits. The algorithms and data structures proposed provide a synopsis of the network stream that is not taxing to fast-memory, and can be efficiently implemented in hardware due to simple bit-wise operations. The methods are evaluated using real-world Internet data from a large academic network.
MLDec 2, 2013
Inferring Regulatory Networks by Combining Perturbation Screens and Steady State Gene Expression ProfilesAli Shojaie, Alexandra Jauhiainen, Michael Kallitsis et al.
Reconstructing transcriptional regulatory networks is an important task in functional genomics. Data obtained from experiments that perturb genes by knockouts or RNA interference contain useful information for addressing this reconstruction problem. However, such data can be limited in size and/or are expensive to acquire. On the other hand, observational data of the organism in steady state (e.g. wild-type) are more readily available, but their informational content is inadequate for the task at hand. We develop a computational approach to appropriately utilize both data sources for estimating a regulatory network. The proposed approach is based on a three-step algorithm to estimate the underlying directed but cyclic network, that uses as input both perturbation screens and steady state gene expression data. In the first step, the algorithm determines causal orderings of the genes that are consistent with the perturbation data, by combining an exhaustive search method with a fast heuristic that in turn couples a Monte Carlo technique with a fast search algorithm. In the second step, for each obtained causal ordering, a regulatory network is estimated using a penalized likelihood based method, while in the third step a consensus network is constructed from the highest scored ones. Extensive computational experiments show that the algorithm performs well in reconstructing the underlying network and clearly outperforms competing approaches that rely only on a single data source. Further, it is established that the algorithm produces a consistent estimate of the regulatory network.
SYJun 24, 2013
A State-Space Approach for Optimal Traffic Monitoring via Network Flow SamplingMichael Kallitsis, Stilian Stoev, George Michailidis
The robustness and integrity of IP networks require efficient tools for traffic monitoring and analysis, which scale well with traffic volume and network size. We address the problem of optimal large-scale flow monitoring of computer networks under resource constraints. We propose a stochastic optimization framework where traffic measurements are done by exploiting the spatial (across network links) and temporal relationship of traffic flows. Specifically, given the network topology, the state-space characterization of network flows and sampling constraints at each monitoring station, we seek an optimal packet sampling strategy that yields the best traffic volume estimation for all flows of the network. The optimal sampling design is the result of a concave minimization problem; then, Kalman filtering is employed to yield a sequence of traffic estimates for each network flow. We evaluate our algorithm using real-world Internet2 data.