Truth Finding on the Deep Web: Is the Problem Solved?
The paper addresses data quality issues on the Web that affect users who rely on accurate information, but its contribution is incremental because it applies existing methods to new domains.
The paper studies the truthfulness of Deep Web data in the Stock and Flight domains, finds significant inconsistencies and low accuracy across sources, and applies state-of-the-art data fusion methods to analyze their effectiveness and suggest future research directions.
The amount of useful information available on the Web has been growing at a dramatic pace in recent years, and people rely more and more on the Web to fulfill their information needs. In this paper, we study the truthfulness of Deep Web data in two domains where we believed the data to be fairly clean and where data quality is important to people's lives: {\em Stock} and {\em Flight}. To our surprise, we observed a large amount of inconsistency in data from different sources, as well as some sources with quite low accuracy. We further applied state-of-the-art {\em data fusion} methods, which aim at resolving conflicts and finding the truth, to these two data sets, analyzed their strengths and limitations, and suggested promising research directions. We hope our study will increase awareness of the seriousness of conflicting data on the Web and in turn inspire more research in our community to tackle this problem.
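
To make the idea of resolving conflicts among sources concrete, the following is a minimal sketch of two generic fusion strategies: plain majority voting and voting weighted by per-source accuracy. It is not the paper's method or any specific algorithm it evaluates. The function names, source names, values, and accuracy figures are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def majority_vote(claims):
    """Return the value claimed by the most sources for one data item.

    `claims` maps a source name to the value that source reports.
    """
    counts = Counter(claims.values())
    value, _ = counts.most_common(1)[0]
    return value


def weighted_vote(claims, accuracy, default_accuracy=0.5):
    """Return the value with the highest total source accuracy.

    `accuracy` maps a source name to an assumed accuracy in [0, 1].
    Sources missing from `accuracy` get `default_accuracy`.
    """
    scores = defaultdict(float)
    for source, value in claims.items():
        scores[value] += accuracy.get(source, default_accuracy)
    return max(scores, key=scores.get)


# Hypothetical example: five sources report a flight's departure time.
claims = {
    "site_a": "10:05",
    "site_b": "10:15",
    "site_c": "10:15",
    "site_d": "10:05",
    "site_e": "10:05",
}

# Hypothetical accuracy estimates (assumed, not measured).
accuracy = {
    "site_a": 0.4,
    "site_b": 0.95,
    "site_c": 0.9,
    "site_d": 0.4,
    "site_e": 0.4,
}

print(majority_vote(claims))            # value backed by the most sources
print(weighted_vote(claims, accuracy))  # value backed by the most trusted sources
```

In this made-up example the two strategies disagree: three low-accuracy sources outvote two high-accuracy ones under majority voting, while the weighted vote favors the value reported by the more trusted sources. In practice source accuracy is not known in advance and has to be estimated, which is one reason conflict resolution is harder than simple counting.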