13.1CVAug 23, 2023
Sign Language Translation with Iterative PrototypeHuijie Yao, Wengang Zhou, Hao Feng et al.
This paper presents IP-SLT, a simple yet effective framework for sign language translation (SLT). Our IP-SLT adopts a recurrent structure and enhances the semantic representation (prototype) of the input sign language video via an iterative refinement manner. Our idea mimics the behavior of human reading, where a sentence can be digested repeatedly, till reaching accurate understanding. Technically, IP-SLT consists of feature extraction, prototype initialization, and iterative prototype refinement. The initialization module generates the initial prototype based on the visual feature extracted by the feature extraction module. Then, the iterative refinement module leverages the cross-attention mechanism to polish the previous prototype by aggregating it with the original video feature. Through repeated refinement, the prototype finally converges to a more stable and accurate state, leading to a fluent and appropriate translation. In addition, to leverage the sequential dependence of prototypes, we further propose an iterative distillation loss to compress the knowledge of the final iteration into previous ones. As the autoregressive decoding process is executed only once in inference, our IP-SLT is ready to improve various SLT systems with acceptable overhead. Extensive experiments are conducted on public benchmarks to demonstrate the effectiveness of the IP-SLT.
TabPedia: Towards Comprehensive Visual Table Understanding with Concept SynergyWeichao Zhao, Hao Feng, Qi Liu et al.
Tables contain factual and quantitative data accompanied by various structures and contents that pose challenges for machine comprehension. Previous methods generally design task-specific architectures and objectives for individual tasks, resulting in modal isolation and intricate workflows. In this paper, we present a novel large vision-language model, TabPedia, equipped with a concept synergy mechanism. In this mechanism, all the involved diverse visual table understanding (VTU) tasks and multi-source visual embeddings are abstracted as concepts. This unified framework allows TabPedia to seamlessly integrate VTU tasks, such as table detection, table structure recognition, table querying, and table question answering, by leveraging the capabilities of large language models (LLMs). Moreover, the concept synergy mechanism enables table perception-related and comprehension-related tasks to work in harmony, as they can effectively leverage the needed clues from the corresponding source perception embeddings. Furthermore, to better evaluate the VTU task in real-world scenarios, we establish a new and comprehensive table VQA benchmark, ComTQA, featuring approximately 9,000 QA pairs. Extensive quantitative and qualitative experiments on both table perception and comprehension tasks, conducted across various public benchmarks, validate the effectiveness of our TabPedia. The superior performance further confirms the feasibility of using LLMs for understanding visual tables when all concepts work in synergy. The benchmark ComTQA has been open-sourced at https://huggingface.co/datasets/ByteDance/ComTQA. The source code and model also have been released athttps://github.com/zhaowc-ustc/TabPedia.
21.1LGOct 20, 2024
EPIC: Efficient Position-Independent Caching for Serving Large Language ModelsJunhao Hu, Wenrui Huang, Weidong Wang et al.
Large Language Models (LLMs) show great capabilities in a wide range of applications, but serving them efficiently becomes increasingly challenging as requests (prompts) become more complex. Context caching improves serving performance by reusing Key-Value (KV) vectors, the intermediate representations of tokens that are repeated across requests. However, existing context caching requires exact prefix matches across requests, limiting reuse cases in settings such as few-shot learning and retrieval-augmented generation, where immutable content (e.g., documents) remains unchanged across requests but is preceded by varying prefixes. Position-Independent Caching (PIC) addresses this issue by enabling modular reuse of the KV vectors regardless of prefixes. We formalize PIC and advance prior work by introducing EPIC, a serving system incorporating our new LegoLink algorithm, which mitigates the inappropriate "attention sink" effect at every document beginning, to maintain accuracy with minimal computation. Experiments show that EPIC achieves up to 8x improvements in Time-To-First-Token (TTFT) and 7x throughput gains over existing systems, with negligible or no accuracy loss.
2.0CVJun 27, 2024
RoFIR: Robust Fisheye Image Rectification Framework Impervious to Optical Center DeviationZhaokang Liao, Hao Feng, Shaokai Liu et al.
Fisheye images are categorized fisheye into central and deviated based on the optical center position. Existing rectification methods are limited to central fisheye images, while this paper proposes a novel method that extends to deviated fisheye image rectification. The challenge lies in the variant global distortion distribution pattern caused by the random optical center position. To address this challenge, we propose a distortion vector map (DVM) that measures the degree and direction of local distortion. By learning the DVM, the model can independently identify local distortions at each pixel without relying on global distortion patterns. The model adopts a pre-training and fine-tuning training paradigm. In the pre-training stage, it predicts the distortion vector map and perceives the local distortion features of each pixel. In the fine-tuning stage, it predicts a pixel-wise flow map for deviated fisheye image rectification. We also propose a data augmentation method mixing central, deviated, and distorted-free images. Such data augmentation promotes the model performance in rectifying both central and deviated fisheye images, compared with models trained on single-type fisheye images. Extensive experiments demonstrate the effectiveness and superiority of the proposed method.
6.6CRMar 15, 2021
Formal Modelling and Security Analysis of Bitcoin's Payment ProtocolPaolo Modesti, Siamak F. Shahandashti, Patrick McCorry et al.
The Payment Protocol standard BIP70, specifying how payments in Bitcoin are performed by merchants and customers, is supported by the largest payment processors and most widely-used wallets. The protocol has been shown to be vulnerable to refund attacks due to lack of authentication of the refund addresses. In this paper, we give the first formal model of the protocol and formalise the refund address security goals for the protocol, namely refund address authentication and secrecy. The formal model utilises communication channels as abstractions conveying security goals on which the protocol modeller and verifier can rely. We analyse the Payment Protocol confirming that it is vulnerable to an attack violating the refund address authentication security goal. Moreover, we present a concrete protocol revision proposal supporting the merchant with publicly verifiable evidence that can mitigate the attack. We verify that the revised protocol meets the security goals defined for the refund address. Hence, we demonstrate that the revised protocol is secure, not only against the existing attacks, but also against any further attacks violating the formalised security goals.
3.8CRMar 10, 2021
Anti-Counterfeiting for Polymer Banknotes Based on Polymer Substrate FingerprintingShen Wang, Ehsan Toreini, Feng Hao
Polymer banknotes are the trend for printed currency and have been adopted by more than fifty countries worldwide. However, over the past years, the quantity of polymer counterfeits has been increasing, so has the quality of counterfeits. This shows that the initial advantage of bringing a new polymer technology to fight against counterfeiting is reducing. To maintain one step ahead of counterfeiters, we propose a novel anti-counterfeiting technique called Polymer Substrate Fingerprinting (PSF). Our technique is built based on the observation that the opacity coating, a critical step during the production of polymer notes, is a stochastic manufacturing process, leaving uneven thickness in the coating layer and the random dispersion of impurities from the ink. The imperfections in the coating layer result in random translucent patterns when a polymer banknote is back-lit by a light source. We show these patterns can be reliably captured by a commodity negative-film scanner and processed into a compact fingerprint to uniquely identify each banknote. Using an extensive dataset of 6,200 sample images collected from 340 UK banknotes, we show that our method can reliably authenticate banknotes, and is robust against rough daily handling of banknotes. Furthermore, we show the extracted fingerprints contain around 900 bits of entropy, which makes it extremely scalable to identify every polymer note circulated globally. As compared with previous or existing anti-counterfeiting mechanisms for banknotes, our method has a distinctive advantage: it ensures that even in the extreme case when counterfeiters have procured the same printing equipment and ink as used by a legitimate government, counterfeiting banknotes remains infeasible because of the difficulty to replicate a stochastic manufacturing process.
8.3CRMay 30, 2019
DOMtegrity: Ensuring Web Page Integrity against Malicious Browser ExtensionsEhsan Toreini, Maryam Mehrnezhad, Siamak F. Shahandashti et al.
In this paper, we address an unsolved problem in the real world: how to ensure the integrity of the web content in a browser in the presence of malicious browser extensions? The problem of exposing confidential user credentials to malicious extensions has been widely understood, which has prompted major banks to deploy two-factor authentication. However, the importance of the `integrity' of the web content has received little attention. We implement two attacks on real-world online banking websites and show that ignoring the `integrity' of the web content can fundamentally defeat two-factor solutions. To address this problem, we propose a cryptographic protocol called DOMtegrity to ensure the end-to-end integrity of the DOM structure of a web page from delivering at a web server to the rendering of the page in the user's browser. DOMtegrity is the first solution that protects DOM integrity without modifying the browser architecture or requiring extra hardware. It works by exploiting subtle yet important differences between browser extensions and in-line JavaScript code. We show how DOMtegrity prevents the earlier attacks and a whole range of man-in-the-browser (MITB) attacks. We conduct extensive experiments on more than 14,000 real-world extensions to evaluate the effectiveness of DOMtegrity.
9.6CRMar 27, 2018
Cryptanalysis of a Chaotic Image Encryption Algorithm Based on Information EntropyChengqing Li, Dongdong Lin, Bingbing Feng et al.
Recently, a chaotic image encryption algorithm based on information entropy (IEAIE) was proposed. This paper scrutinizes the security properties of the algorithm and evaluates the validity of the used quantifiable security metrics. When the round number is only one, the equivalent secret key of every basic operation of IEAIE can be recovered with a differential attack separately. Some common insecurity problems in the field of chaotic image encryption are found in IEAIE, e.g. the short orbits of the digital chaotic system and the invalid sensitivity mechanism built on information entropy of the plain image. Even worse, each security metric is questionable, which undermines the security credibility of IEAIE. Hence, IEAIE can only serve as a counterexample for illustrating common pitfalls in designing secure communication method for image data.
12.3CRNov 6, 2017
Cryptanalyzing an image encryption algorithm based on autoblocking and electrocardiographyChengqing Li, Dongdong Lin, Jinhu Lü et al.
This paper analyzes the security of an image encryption algorithm proposed by Ye and Huang [\textit{IEEE MultiMedia}, vol. 23, pp. 64-71, 2016]. The Ye-Huang algorithm uses electrocardiography (ECG) signals to generate the initial key for a chaotic system and applies an autoblocking method to divide a plain image into blocks of certain sizes suitable for subsequent encryption. The designers claimed that the proposed algorithm is "strong and flexible enough for practical applications". In this paper, we perform a thorough analysis of their algorithm from the view point of modern cryptography. We find it is vulnerable to the known plaintext attack: based on one pair of a known plain-image and its corresponding cipher-image, an adversary is able to derive a mask image, which can be used as an equivalent secret key to successfully decrypt other cipher-images encrypted under the same key with a non-negligible probability of 1/256. Using this as a typical counterexample, we summarize security defects in the design of the Ye-Huang algorithm. The lessons are generally applicable to many other image encryption schemes.
9.1CRSep 27, 2017
Botnet in the Browser: Understanding Threats Caused by Malicious Browser ExtensionsRaffaello Perrotta, Feng Hao
Browser extensions have been established as a common feature present in modern browsers. However, some extension systems risk exposing APIs which are too permissive and cohesive with the browser's internal structure, thus leaving a hole for malicious developers to exploit security critical functionality within the browser itself. In this paper, we raise the awareness of the threats caused by browser extensions by presenting a botnet framework based on malicious extensions installed in the user's browser, and an exhaustive range of attacks that can be launched in this framework. We systematically categorize, describe and implement these attacks against Chrome, Firefox and Firefox-for-Android, and verify experiments on Windows, Linux and Android systems. To the best of our knowledge, this paper presents to date the most comprehensive analysis about the threats of botnet in modern browsers due to the over-privileged capabilities possessed by browser extensions. We also discuss countermeasures to the identified problems.
4.5CRMay 6, 2017
Texture to the Rescue: Practical Paper Fingerprinting based on Texture PatternsEhsan Toreini, Siamak F. Shahandashti, Feng Hao
In this paper, we propose a novel paper fingerprinting technique based on analyzing the translucent patterns revealed when a light source shines through the paper. These patterns represent the inherent texture of paper, formed by the random interleaving of wooden particles during the manufacturing process. We show these patterns can be easily captured by a commodity camera and condensed into to a compact 2048-bit fingerprint code. Prominent works in this area (Nature 2005, IEEE S&P 2009, CCS 2011) have all focused on fingerprinting paper based on the paper "surface". We are motivated by the observation that capturing the surface alone misses important distinctive features such as the non-even thickness, the random distribution of impurities, and different materials in the paper with varying opacities. Through experiments, we demonstrate that the embedded paper texture provides a more reliable source for fingerprinting than features on the surface. Based on the collected datasets, we achieve 0% false rejection and 0% false acceptance rates. We further report that our extracted fingerprints contain 807 degrees-of-freedom (DoF), which is much higher than the 249 DoF with iris codes (that have the same size of 2048 bits). The high amount of DoF for texture-based fingerprints makes our method extremely scalable for recognition among very large databases; it also allows secure usage of the extracted fingerprint in privacy-preserving authentication schemes based on error correction techniques.
15.0CRMay 18, 2016
Stealing PINs via Mobile Sensors: Actual Risk versus User PerceptionMaryam Mehrnezhad, Ehsan Toreini, Siamak F. Shahandashti et al.
In this paper, we present the actual risks of stealing user PINs by using mobile sensors versus the perceived risks by users. First, we propose PINlogger.js which is a JavaScript-based side channel attack revealing user PINs on an Android mobile phone. In this attack, once the user visits a website controlled by an attacker, the JavaScript code embedded in the web page starts listening to the motion and orientation sensor streams without needing any permission from the user. By analysing these streams, it infers the user's PIN using an artificial neural network. Based on a test set of fifty 4-digit PINs, PINlogger.js is able to correctly identify PINs in the first attempt with a success rate of 74% which increases to 86 and 94% in the second and third attempts, respectively. The high success rates of stealing user PINs on mobile devices via JavaScript indicate a serious threat to user security. With the technical understanding of the information leakage caused by mobile phone sensors, we then study users' perception of the risks associated with these sensors. We design user studies to measure the general familiarity with different sensors and their functionality, and to investigate how concerned users are about their PIN being discovered by an app that has access to all these sensors. Our studies show that there is significant disparity between the actual and perceived levels of threat with regard to the compromise of the user PIN. We confirm our results by interviewing our participants using two different approaches, within-subject and between-subject, and compare the results. We discuss how this observation, along with other factors, renders many academic and industry solutions ineffective in preventing such side channel attacks.
9.1CRFeb 12, 2016
TouchSignatures: Identification of User Touch Actions and PINs Based on Mobile Sensor Data via JavaScriptMaryam Mehrnezhad, Ehsan Toreini, Siamak F. Shahandashti et al.
Conforming to W3C specifications, mobile web browsers allow JavaScript code in a web page to access motion and orientation sensor data without the user's permission. The associated risks to user security and privacy are however not considered in W3C specifications. In this work, for the first time, we show how user security can be compromised using these sensor data via browser, despite that the data rate is 3 to 5 times slower than what is available in app. We examine multiple popular browsers on Android and iOS platforms and study their policies in granting permissions to JavaScript code with respect to access to motion and orientation sensor data. Based on our observations, we identify multiple vulnerabilities, and propose TouchSignatures which implements an attack where malicious JavaScript code on an attack tab listens to such sensor data measurements. Based on these streams, TouchSignatures is able to distinguish the user's touch actions (i.e., tap, scroll, hold, and zoom) and her PINs, allowing a remote website to learn the client-side user activities. We demonstrate the practicality of this attack by collecting data from real users and reporting high success rates using our proof-of-concept implementations. We also present a set of potential solutions to address the vulnerabilities. The W3C community and major mobile browser vendors including Mozilla, Google, Apple and Opera have acknowledge our work and are implementing some of our proposed countermeasures.