Tao Liu

h-index32

4papers

1,195citations

Novelty55%

AI Score45

Ranked #45,068 of 194,257 authors (top 23%)#9,030 in CL (top 29%)

4 Papers

8.6DSJun 3, 2022Code

Falconn++: A Locality-sensitive Filtering Approach for Approximate Nearest Neighbor Search

Ninh Pham, Tao Liu

We present Falconn++, a novel locality-sensitive filtering approach for approximate nearest neighbor search on angular distance. Falconn++ can filter out potential far away points in any hash bucket \textit{before} querying, which results in higher quality candidates compared to other hashing-based solutions. Theoretically, Falconn++ asymptotically achieves lower query time complexity than Falconn, an optimal locality-sensitive hashing scheme on angular distance. Empirically, Falconn++ achieves higher recall-speed tradeoffs than Falconn on many real-world data sets. Falconn++ is also competitive with HNSW, an efficient representative of graph-based solutions on high search recall regimes.

32.8CVJan 23, 2025Code

One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt

Tao Liu, Kai Wang, Senmao Li et al.

Text-to-image generation models can create high-quality images from input prompts. However, they struggle to support the consistent generation of identity-preserving requirements for storytelling. Existing approaches to this problem typically require extensive training in large datasets or additional modifications to the original model architectures. This limits their applicability across different domains and diverse diffusion model configurations. In this paper, we first observe the inherent capability of language models, coined context consistency, to comprehend identity through context with a single prompt. Drawing inspiration from the inherent context consistency, we propose a novel training-free method for consistent text-to-image (T2I) generation, termed "One-Prompt-One-Story" (1Prompt1Story). Our approach 1Prompt1Story concatenates all prompts into a single input for T2I diffusion models, initially preserving character identities. We then refine the generation process using two novel techniques: Singular-Value Reweighting and Identity-Preserving Cross-Attention, ensuring better alignment with the input description for each frame. In our experiments, we compare our method against various existing consistent T2I generation approaches to demonstrate its effectiveness through quantitative metrics and qualitative assessments. Code is available at https://github.com/byliutao/1Prompt1Story.

3.8CRSep 24, 2021

Finding Taint-Style Vulnerabilities in Linux-based Embedded Firmware with SSE-based Alias Analysis

Kai Cheng, Tao Liu, Le Guan et al.

Although the importance of using static analysis to detect taint-style vulnerabilities in Linux-based embedded firmware is widely recognized, existing approaches are plagued by three major limitations. (a) Approaches based on symbolic execution may miss alias information and therefore suffer from a high false-negative rate. (b) Approaches based on VSA (value set analysis) often provide an over-approximate pointer range. As a result, many false positives could be produced. (c) Existing work for detecting taint-style vulnerability does not consider indirect call resolution, whereas indirect calls are frequently used in Internet-facing embedded devices. As a result, many false negatives could be produced. In this work, we propose a precise demand-driven flow-, context- and field-sensitive alias analysis approach. Based on this new approach, we also design a novel indirect call resolution scheme. Combined with sanitization rule checking, our solution discovers taint-style vulnerabilities by static taint analysis. We implemented our idea with a prototype called EmTaint and evaluated it against 35 real-world embedded firmware samples from six popular vendors. EmTaint discovered at least 192 bugs, including 41 n-day bugs and 151 0-day bugs. At least 115 CVE/PSV numbers have been allocated from a subset of the reported vulnerabilities at the time of writing. Compared to state-of-the-art tools such as KARONTE and SaTC, EmTaint found significantly more bugs on the same dataset in less time.

32.7CLMay 12, 2018Code

Analogical Reasoning on Chinese Morphological and Semantic Relations

Shen Li, Zhe Zhao, Renfen Hu et al.

Analogical reasoning is effective in capturing linguistic regularities. This paper proposes an analogical reasoning task on Chinese. After delving into Chinese lexical knowledge, we sketch 68 implicit morphological relations and 28 explicit semantic relations. A big and balanced dataset CA8 is then built for this task, including 17813 questions. Furthermore, we systematically explore the influences of vector representations, context features, and corpora on analogical reasoning. With the experiments, CA8 is proved to be a reliable benchmark for evaluating Chinese word embeddings.