Jaewon Son

8.1DBMay 25

Timehash: Hierarchical Time Indexing for Efficient Business Hours Search

Jinoh Kim, Jaewon Son

Temporal range filtering is critical in large-scale search systems, particularly location-based services filtering businesses by operating hours. Traditional approaches suffer from poor query performance (scope filtering), index size explosion (minute-level indexing), or reduced precision (coarse-grained indexing). PostgreSQL TSRANGE with GiST indexing offers exact semantics but imposes P50 latencies of 15-224 ms at 100K-1M scale, prohibitive for interactive search, and cannot embed within inverted index pipelines. We present Timehash, a hierarchical time indexing algorithm achieving over 97% reduction in index size versus minute-level indexing while maintaining 100% precision. Timehash uses a flexible multi-resolution strategy that integrates seamlessly into inverted index infrastructure. Through analysis of 12.6 million records from a production location search service deployed for 18 months, we demonstrate a domain-informed hierarchy-selection methodology via boundary-distribution analysis, with cross-dataset validation on the Yelp Open Dataset (127K US/CA businesses), where the same 5-level hierarchy reduces total terms to 0.77% of the 1-minute baseline (vs. 2.17% on the production dataset). We evaluate Timehash against naive inverted approaches, PostgreSQL GiST, and a within-Elasticsearch BKD baseline. On Yelp within a single Elasticsearch deployment with matched indexing, Timehash achieves 1.14-2.17x lower P50 latency than native BKD on production-typical multi-predicate top-K workloads (K <= 100), with methods converging at large K where document materialization dominates. A five-level hierarchy (4h, 1h, 15m, 5m, 1m) reduces index terms to 9.6 per document, a 97.8% reduction and 46x compaction, with zero false positives and zero false negatives. Per-doc cost stays constant from 100K to 12.6M POIs while supporting break times, irregular schedules, and midnight-spanning ranges

CVMay 20, 2024Code

CSTA: CNN-based Spatiotemporal Attention for Video Summarization

Jaewon Son, Jaehun Park, Kwangsu Kim

Video summarization aims to generate a concise representation of a video, capturing its essential content and key moments while reducing its overall length. Although several methods employ attention mechanisms to handle long-term dependencies, they often fail to capture the visual significance inherent in frames. To address this limitation, we propose a CNN-based SpatioTemporal Attention (CSTA) method that stacks each feature of frames from a single video to form image-like frame representations and applies 2D CNN to these frame features. Our methodology relies on CNN to comprehend the inter and intra-frame relations and to find crucial attributes in videos by exploiting its ability to learn absolute positions within images. In contrast to previous work compromising efficiency by designing additional modules to focus on spatial importance, CSTA requires minimal computational overhead as it uses CNN as a sliding window. Extensive experiments on two benchmark datasets (SumMe and TVSum) demonstrate that our proposed approach achieves state-of-the-art performance with fewer MACs compared to previous methods. Codes are available at https://github.com/thswodnjs3/CSTA.

Jaewon Son

2 Papers