Helen Xu

2.6 DB Jun 15

C^2: Cache-Conscious Succinct Tries with Adaptive Unary Path Compression

Kepan Zhang, Tiancheng Zhao, Helen Xu

Succinct tries are powerful string dictionaries because of their low memory footprint and fast query performance. However, existing succinct trie implementations face two key challenges to spatial locality: 1) they incur unnecessary cache misses during queries, especially during trie navigation operations, and 2) they waste significant space when the data contains many unary paths. We propose C^2, a set of two techniques: C_1 introduces a more cache-friendly layout for the bitvectors underlying succinct tries, and C_2 compresses redundant unary paths. We thoroughly redesign three state-of-the-art succinct tries: FST, CoCo-trie, and Marisa, producing C^2-FST, C^2-CoCo, and C^2-Marisa. Experiments on six diverse datasets show that the C_1 optimization improves query performance by 1.58x, 1.12x, and 1.42x, respectively, compared to the original FST, CoCo-trie, and Marisa. Furthermore, the C_2 optimization achieves a 1.3x smaller memory footprint on average. The succinct tries optimized with both aspects of C^2 achieve better space-time tradeoffs than their original versions and other state-of-the-art succinct tries, while using significantly less space than non-succinct tries like ART and C-ART.
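To make the idea of a unary path concrete, the following is a minimal sketch of path compression on an ordinary nested-dict trie. It is not the paper's C_2 technique, which operates on succinct (bitvector-encoded) tries; the function names `build_trie`, `compress`, and `count_nodes`, and the `$` end-of-key marker, are assumptions made for illustration only.

```python
def build_trie(keys):
    """Build a plain character trie as nested dicts; '$' marks the end of a key."""
    root = {}
    for key in keys:
        node = root
        for ch in key:
            node = node.setdefault(ch, {})
        node["$"] = {}
    return root


def compress(node):
    """Merge each chain of single-child nodes into one multi-character edge."""
    out = {}
    for label, child in node.items():
        # Follow the chain while the child has exactly one outgoing edge
        # and that edge is not the end-of-key marker.
        while len(child) == 1 and "$" not in child:
            (next_label, next_child), = child.items()
            label += next_label
            child = next_child
        out[label] = compress(child)
    return out


def count_nodes(node):
    """Count the nodes in a nested-dict trie, including the root."""
    return 1 + sum(count_nodes(child) for child in node.values())


trie = build_trie(["romane", "romanus", "rubens"])
compressed = compress(trie)
print(count_nodes(trie), count_nodes(compressed))
```

In this toy example the chains o-m-a-n and u-b-e-n-s each collapse into a single edge, so the compressed trie stores fewer nodes for the same key set.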

3.2 CR Nov 30, 2015

Tracking Network Events with Write Optimized Data Structures: The Design and Implementation of TWIAD: The Write-Optimized IP Address Database

Nolan Donoghue, Bridger Hahn, Helen Xu et al.

Access to network traffic records is an integral part of recognizing and addressing network security breaches. Even with the increasing sophistication of network attacks, basic network events such as connections between two IP addresses play an important role in any network defense. Given the duration of current attacks, long-term data archival is critical but typically very little of the data is ever accessed. Previous work has provided tools and identified the need to trace connections. However, traditional databases raise performance concerns as they are optimized for querying rather than ingestion. The study of write-optimized data structures (WODS) is a new and growing field that provides a novel approach to traditional storage structures (e.g., B-trees). WODS trade minor degradations in query performance for significant gains in the ability to quickly insert more data elements, typically on the order of 10 to 100 times more inserts per second. These efficient, out-of-memory data structures can play a critical role in enabling robust, long-term tracking of network events. In this paper, we present TWIAD, the Write-optimized IP Address Database. TWIAD uses a write-optimized B-tree known as a B^ε-tree to track all IP address connections in a network traffic stream. Our initial implementation focuses on utilizing lower cost hardware, demonstrating that basic long-term tracking can be done without advanced equipment. We tested TWIAD on a modest desktop system and showed a sustained ingestion rate of about 20,000 inserts per second.
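The write-optimization idea behind buffered trees can be sketched as follows: inserts land in a buffer at the root and are moved to the leaves in batches, so many inserts share the cost of one pass over the leaves. This is a toy two-level, in-memory illustration, not TWIAD's implementation or a full B^ε-tree; the class `BufferedTree`, its parameters, and the integer keys standing in for IP addresses are all assumptions.

```python
import bisect


class BufferedTree:
    """Toy two-level tree: a root with a message buffer over fixed leaves."""

    def __init__(self, pivots, buffer_capacity=4):
        # Leaf i holds keys k with pivots[i-1] <= k < pivots[i].
        self.pivots = sorted(pivots)
        self.leaves = [dict() for _ in range(len(self.pivots) + 1)]
        self.buffer = {}  # pending inserts; a later insert overwrites an earlier one
        self.buffer_capacity = buffer_capacity
        self.flushes = 0

    def insert(self, key, value):
        """Record the insert in the root buffer; flush only when it fills up."""
        self.buffer[key] = value
        if len(self.buffer) >= self.buffer_capacity:
            self._flush()

    def _flush(self):
        """Move every buffered insert to its leaf in one batch."""
        for key, value in self.buffer.items():
            self.leaves[bisect.bisect_right(self.pivots, key)][key] = value
        self.buffer.clear()
        self.flushes += 1

    def get(self, key):
        """Check the buffer first (newest data), then the responsible leaf."""
        if key in self.buffer:
            return self.buffer[key]
        return self.leaves[bisect.bisect_right(self.pivots, key)].get(key)


tree = BufferedTree(pivots=[100, 200], buffer_capacity=3)
tree.insert(5, "a")
tree.insert(150, "b")
tree.insert(250, "c")  # third insert fills the buffer and triggers one flush
print(tree.get(150), tree.flushes)
```

Queries pay a small extra cost because they must look in the buffer as well as the leaf, which mirrors the trade-off the abstract describes: slightly slower queries in exchange for cheaper inserts.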

2 Papers