DC IR · Apr 25, 2018

Giving Text Analytics a Boost

Raphael Polig, Kubilay Atasu, Laura Chiticariu, Christoph Hagleitner, H. Peter Hofstee, Frederick R. Reiss, Eva Sitaridi, Huaiyu Zhu

arXiv:1806.01103v1 · 14 citations

Originality: Incremental advance

AI Analysis

The paper addresses the problem of handling "Big Data" in text analytics for users of IBM's SystemT, though the advance is incremental, as it builds on existing compilation and communication methods.

The paper tackles the inefficiency of traditional server architectures when analyzing large-scale textual data with IBM's SystemT software. Using a streaming hardware accelerator, it achieves an order-of-magnitude improvement in throughput for information extraction queries.

The amount of textual data has reached a new scale and continues to grow at an unprecedented rate. IBM's SystemT software is a powerful text analytics system that offers a query-based interface to reveal the valuable information that lies within these mounds of data. However, traditional server architectures are not capable of analyzing this so-called "Big Data" efficiently, despite the high memory bandwidth that is available. We show that by using a streaming hardware accelerator implemented in reconfigurable logic, the throughput of SystemT's information extraction queries can be improved by an order of magnitude. We present how such a system can be deployed by extending SystemT's existing compilation flow and by using a multi-threaded communication interface that can efficiently use the bandwidth of the accelerator.
