h-index66
34papers
584citations
Novelty47%
AI Score55

34 Papers

SEFeb 12Code
An Empirical Study of the Imbalance Issue in Software Vulnerability Detection

Yuejun Guo, Qiang Hu, Qiang Tang et al.

Vulnerability detection is crucial to protect software security. Nowadays, deep learning (DL) is the most promising technique to automate this detection task, leveraging its superior ability to extract patterns and representations within extensive code volumes. Despite its promise, DL-based vulnerability detection remains in its early stages, with model performance exhibiting variability across datasets. Drawing insights from other well-explored application areas like computer vision, we conjecture that the imbalance issue (the number of vulnerable code is extremely small) is at the core of the phenomenon. To validate this, we conduct a comprehensive empirical study involving nine open-source datasets and two state-of-the-art DL models. The results confirm our conjecture. We also obtain insightful findings on how existing imbalance solutions perform in vulnerability detection. It turns out that these solutions perform differently as well across datasets and evaluation metrics. Specifically: 1) Focal loss is more suitable to improve the precision, 2) mean false error and class-balanced loss encourages the recall, and 3) random over-sampling facilitates the F1-measure. However, none of them excels across all metrics. To delve deeper, we explore external influences on these solutions and offer insights for developing new solutions.

SEJul 27, 2023
CodeLens: An Interactive Tool for Visualizing Code Representations

Yuejun Guo, Seifeddine Bettaieb, Qiang Hu et al.

Representing source code in a generic input format is crucial to automate software engineering tasks, e.g., applying machine learning algorithms to extract information. Visualizing code representations can further enable human experts to gain an intuitive insight into the code. Unfortunately, as of today, there is no universal tool that can simultaneously visualise different types of code representations. In this paper, we introduce a tool, CodeLens, which provides a visual interaction environment that supports various representation methods and helps developers understand and explore them. CodeLens is designed to support multiple programming languages, such as Java, Python, and JavaScript, and four types of code representations, including sequence of tokens, abstract syntax tree (AST), data flow graph (DFG), and control flow graph (CFG). By using CodeLens, developers can quickly visualize the specific code representation and also obtain the represented inputs for models of code. The Web-based interface of CodeLens is available at http://www.codelens.org. The demonstration video can be found at http://www.codelens.org/demo.

63.7CVMay 22
SimInsert: Seamless Video Object Insertion via Regional Sparse Attention Fusion

Xinyu Chen, Yuyi Qian, Jiang Lin et al.

Video object insertion requires ensuring spatio-temporal coherence and interactive realism, extending far beyond simple content placement. However, current approaches are often hindered by a reliance on explicit motion engineering or resource-intensive retraining, restricting their flexibility and generalization. To bridge this gap, we present \textit{SimInsert}, a training-free paradigm that efficiently decouples the task into intuitive single-frame editing and semantic motion description. By harnessing the robust generative priors of image-to-video diffusion models, SimInsert propagates edits temporally, strictly preserving background invariance while enabling plausible, text-driven interactions between the inserted object and the dynamic environment. Our approach hinges on non-invasive guidance mechanisms that enforce structural consistency, facilitate seamless boundary fusion, and counteract the fidelity drift that typically accumulates during the denoising trajectory. Extensive quantitative experiments validate our efficacy: SimInsert surpasses state-of-the-art methods with an 18.8\% gain in PSNR, 20.1\% in SSIM, and a 44.1\% decrease in LPIPS, offering a streamlined solution for high-fidelity video editing.

DCJan 5
RelayGR: Scaling Long-Sequence Generative Recommendation via Cross-Stage Relay-Race Inference

Jiarui Wang, Huichao Chai, Yuanhang Zhang et al.

Real-time recommender systems execute multi-stage cascades (retrieval, pre-processing, fine-grained ranking) under strict tail-latency SLOs, leaving only tens of milliseconds for ranking. Generative recommendation (GR) models can improve quality by consuming long user-behavior sequences, but in production their online sequence length is tightly capped by the ranking-stage P99 budget. We observe that the majority of GR tokens encode user behaviors that are independent of the item candidates, suggesting an opportunity to pre-infer a user-behavior prefix once and reuse it during ranking rather than recomputing it on the critical path. Realizing this idea at industrial scale is non-trivial: the prefix cache must survive across multiple pipeline stages before the final ranking instance is determined, the user population implies cache footprints far beyond a single device, and indiscriminate pre-inference would overload shared resources under high QPS. We present RelayGR, a production system that enables in-HBM relay-race inference for GR. RelayGR selectively pre-infers long-term user prefixes, keeps their KV caches resident in HBM over the request lifecycle, and ensures the subsequent ranking can consume them without remote fetches. RelayGR combines three techniques: 1) a sequence-aware trigger that admits only at-risk requests under a bounded cache footprint and pre-inference load, 2) an affinity-aware router that co-locates cache production and consumption by routing both the auxiliary pre-infer signal and the ranking request to the same instance, and 3) a memory-aware expander that uses server-local DRAM to capture short-term cross-request reuse while avoiding redundant reloads. We implement RelayGR on Huawei Ascend NPUs and evaluate it with real queries. Under a fixed P99 SLO, RelayGR supports up to 1.5$\times$ longer sequences and improves SLO-compliant throughput by up to 3.6$\times$.

CLDec 2, 2025
AutoNeural: Co-Designing Vision-Language Models for NPU Inference

Wei Chen, Liangmin Wu, Yunhai Hu et al.

While Neural Processing Units (NPUs) offer high theoretical efficiency for edge AI, state-of-the-art Vision--Language Models (VLMs) tailored for GPUs often falter on these substrates. We attribute this hardware-model mismatch to two primary factors: the quantization brittleness of Vision Transformers (ViTs) and the I/O-bound nature of autoregressive attention mechanisms, which fail to utilize the high arithmetic throughput of NPUs. To bridge this gap, we propose AutoNeural, an NPU-native VLM architecture co-designed for integer-only inference. We replace the standard ViT encoder with a MobileNetV5-style backbone utilizing depthwise separable convolutions, which ensures bounded activation distributions for stable INT4/8/16 quantization. Complementing this, our language backbone integrates State-Space Model (SSM) principles with Transformer layers, employing efficient gated convolutions to achieve linear-time complexity. This hybrid design eliminates the heavy memory I/O overhead of Key-Value caching during generation. Our approach delivers substantial efficiency gains, reducing quantization error of vision encoder by up to 7x and end-to-end latency by 14x compared to conventional baselines. The AutoNeural also delivers 3x decoding speed and 4x longer context window than the baseline. We validate these improvements via a real-world automotive case study on the Qualcomm SA8295P SoC, demonstrating real-time performance for cockpit applications. Our results highlight that rethinking model topology specifically for NPU constraints is a prerequisite for robust multi-modal edge intelligence.

CVAug 31, 2024
Towards Secure and Usable 3D Assets: A Novel Framework for Automatic Visible Watermarking

Gursimran Singh, Tianxi Hu, Mohammad Akbari et al.

3D models, particularly AI-generated ones, have witnessed a recent surge across various industries such as entertainment. Hence, there is an alarming need to protect the intellectual property and avoid the misuse of these valuable assets. As a viable solution to address these concerns, we rigorously define the novel task of automated 3D visible watermarking in terms of two competing aspects: watermark quality and asset utility. Moreover, we propose a method of embedding visible watermarks that automatically determines the right location, orientation, and number of watermarks to be placed on arbitrary 3D assets for high watermark quality and asset utility. Our method is based on a novel rigid-body optimization that uses back-propagation to automatically learn transforms for ideal watermark placement. In addition, we propose a novel curvature-matching method for fusing the watermark into the 3D model that further improves readability and security. Finally, we provide a detailed experimental analysis on two benchmark 3D datasets validating the superior performance of our approach in comparison to baselines. Code and demo are available.

CVSep 15, 2024
One-Shot Learning for Pose-Guided Person Image Synthesis in the Wild

Dongqi Fan, Tao Chen, Mingjie Wang et al.

Current Pose-Guided Person Image Synthesis (PGPIS) methods depend heavily on large amounts of labeled triplet data to train the generator in a supervised manner. However, they often falter when applied to in-the-wild samples, primarily due to the distribution gap between the training datasets and real-world test samples. While some researchers aim to enhance model generalizability through sophisticated training procedures, advanced architectures, or by creating more diverse datasets, we adopt the test-time fine-tuning paradigm to customize a pre-trained Text2Image (T2I) model. However, naively applying test-time tuning results in inconsistencies in facial identities and appearance attributes. To address this, we introduce a Visual Consistency Module (VCM), which enhances appearance consistency by combining the face, text, and image embedding. Our approach, named OnePoseTrans, requires only a single source image to generate high-quality pose transfer results, offering greater stability than state-of-the-art data-driven methods. For each test case, OnePoseTrans customizes a model in around 48 seconds with an NVIDIA V100 GPU.

CVSep 23, 2024
Gaussian Deja-vu: Creating Controllable 3D Gaussian Head-Avatars with Enhanced Generalization and Personalization Abilities

Peizhi Yan, Rabab Ward, Qiang Tang et al.

Recent advancements in 3D Gaussian Splatting (3DGS) have unlocked significant potential for modeling 3D head avatars, providing greater flexibility than mesh-based methods and more efficient rendering compared to NeRF-based approaches. Despite these advancements, the creation of controllable 3DGS-based head avatars remains time-intensive, often requiring tens of minutes to hours. To expedite this process, we here introduce the "Gaussian Deja-vu" framework, which first obtains a generalized model of the head avatar and then personalizes the result. The generalized model is trained on large 2D (synthetic and real) image datasets. This model provides a well-initialized 3D Gaussian head that is further refined using a monocular video to achieve the personalized head avatar. For personalizing, we propose learnable expression-aware rectification blendmaps to correct the initial 3D Gaussians, ensuring rapid convergence without the reliance on neural networks. Experiments demonstrate that the proposed method meets its objectives. It outperforms state-of-the-art 3D Gaussian head avatars in terms of photorealistic quality as well as reduces training time consumption to at least a quarter of the existing methods, producing the avatar in minutes.

CVAug 13, 2025Code
Region-to-Region: Enhancing Generative Image Harmonization with Adaptive Regional Injection

Zhiqiu Zhang, Dongqi Fan, Mingjie Wang et al.

The goal of image harmonization is to adjust the foreground in a composite image to achieve visual consistency with the background. Recently, latent diffusion model (LDM) are applied for harmonization, achieving remarkable results. However, LDM-based harmonization faces challenges in detail preservation and limited harmonization ability. Additionally, current synthetic datasets rely on color transfer, which lacks local variations and fails to capture complex real-world lighting conditions. To enhance harmonization capabilities, we propose the Region-to-Region transformation. By injecting information from appropriate regions into the foreground, this approach preserves original details while achieving image harmonization or, conversely, generating new composite data. From this perspective, We propose a novel model R2R. Specifically, we design Clear-VAE to preserve high-frequency details in the foreground using Adaptive Filter while eliminating disharmonious elements. To further enhance harmonization, we introduce the Harmony Controller with Mask-aware Adaptive Channel Attention (MACA), which dynamically adjusts the foreground based on the channel importance of both foreground and background regions. To address the limitation of existing datasets, we propose Random Poisson Blending, which transfers color and lighting information from a suitable region to the foreground, thereby generating more diverse and challenging synthetic images. Using this method, we construct a new synthetic dataset, RPHarmony. Experiments demonstrate the superiority of our method over other methods in both quantitative metrics and visual harmony. Moreover, our dataset helps the model generate more realistic images in real examples. Our code, dataset, and model weights have all been released for open access.

CVMar 14, 2025Code
StyleMorpheus: A Style-Based 3D-Aware Morphable Face Model

Peizhi Yan, Rabab K. Ward, Dan Wang et al.

For 3D face modeling, the recently developed 3D-aware neural rendering methods are able to render photorealistic face images with arbitrary viewing directions. The training of the parametric controllable 3D-aware face models, however, still relies on a large-scale dataset that is lab-collected. To address this issue, this paper introduces "StyleMorpheus", the first style-based neural 3D Morphable Face Model (3DMM) that is trained on in-the-wild images. It inherits 3DMM's disentangled controllability (over face identity, expression, and appearance) but without the need for accurately reconstructed explicit 3D shapes. StyleMorpheus employs an auto-encoder structure. The encoder aims at learning a representative disentangled parametric code space and the decoder improves the disentanglement using shape and appearance-related style codes in the different sub-modules of the network. Furthermore, we fine-tune the decoder through style-based generative adversarial learning to achieve photorealistic 3D rendering quality. The proposed style-based design enables StyleMorpheus to achieve state-of-the-art 3D-aware face reconstruction results, while also allowing disentangled control of the reconstructed face. Our model achieves real-time rendering speed, allowing its use in virtual reality applications. We also demonstrate the capability of the proposed style-based design in face editing applications such as style mixing and color editing. Project homepage: https://github.com/ubc-3d-vision-lab/StyleMorpheus.

CVNov 7, 2025
FreeControl: Efficient, Training-Free Structural Control via One-Step Attention Extraction

Jiang Lin, Xinyu Chen, Song Wu et al.

Controlling the spatial and semantic structure of diffusion-generated images remains a challenge. Existing methods like ControlNet rely on handcrafted condition maps and retraining, limiting flexibility and generalization. Inversion-based approaches offer stronger alignment but incur high inference cost due to dual-path denoising. We present FreeControl, a training-free framework for semantic structural control in diffusion models. Unlike prior methods that extract attention across multiple timesteps, FreeControl performs one-step attention extraction from a single, optimally chosen key timestep and reuses it throughout denoising. This enables efficient structural guidance without inversion or retraining. To further improve quality and stability, we introduce Latent-Condition Decoupling (LCD): a principled separation of the key timestep and the noised latent used in attention extraction. LCD provides finer control over attention quality and eliminates structural artifacts. FreeControl also supports compositional control via reference images assembled from multiple sources - enabling intuitive scene layout design and stronger prompt alignment. FreeControl introduces a new paradigm for test-time control, enabling structurally and semantically aligned, visually coherent generation directly from raw images, with the flexibility for intuitive compositional design and compatibility with modern diffusion models at approximately 5 percent additional cost.

CVFeb 20, 2025
Deep learning based infrared small object segmentation: Challenges and future directions

Zhengeng Yang, Hongshan Yu, Jianjun Zhang et al.

Infrared sensing is a core method for supporting unmanned systems, such as autonomous vehicles and drones. Recently, infrared sensors have been widely deployed on mobile and stationary platforms for detection and classification of objects from long distances and in wide field of views. Given its success in the vision image analysis domain, deep learning has also been applied for object recognition in infrared images. However, techniques that have proven successful in visible light perception face new challenges in the infrared domain. These challenges include extremely low signal-to-noise ratios in infrared images, very small and blurred objects of interest, and limited availability of labeled/unlabeled training data due to the specialized nature of infrared sensors. Numerous methods have been proposed in the literature for the detection and classification of small objects in infrared images achieving varied levels of success. There is a need for a survey paper that critically analyzes existing techniques in this domain, identifies unsolved challenges and provides future research directions. This paper fills the gap and offers a concise and insightful review of deep learning-based methods. It also identifies the challenges faced by existing infrared object segmentation methods and provides a structured review of existing infrared perception methods from the perspective of these challenges and highlights the motivations behind the various approaches. Finally, this review suggests promising future directions based on recent advancements within this domain.

CRNov 28, 2025
Evaluating LLMs for One-Shot Patching of Real and Artificial Vulnerabilities

Aayush Garg, Zanis Ali Khan, Renzo Degiovanni et al.

Automated vulnerability patching is crucial for software security, and recent advancements in Large Language Models (LLMs) present promising capabilities for automating this task. However, existing research has primarily assessed LLMs using publicly disclosed vulnerabilities, leaving their effectiveness on related artificial vulnerabilities largely unexplored. In this study, we empirically evaluate the patching effectiveness and complementarity of several prominent LLMs, such as OpenAI's GPT variants, LLaMA, DeepSeek, and Mistral models, using both real and artificial vulnerabilities. Our evaluation employs Proof-of-Vulnerability (PoV) test execution to concretely assess whether LLM-generated source code successfully patches vulnerabilities. Our results reveal that LLMs patch real vulnerabilities more effectively compared to artificial ones. Additionally, our analysis reveals significant variability across LLMs in terms of overlapping (multiple LLMs patching the same vulnerabilities) and complementarity (vulnerabilities patched exclusively by a single LLM), emphasizing the importance of selecting appropriate LLMs for effective vulnerability patching.

CVOct 7, 2025
ArchitectHead: Continuous Level of Detail Control for 3D Gaussian Head Avatars

Peizhi Yan, Rabab Ward, Qiang Tang et al.

3D Gaussian Splatting (3DGS) has enabled photorealistic and real-time rendering of 3D head avatars. Existing 3DGS-based avatars typically rely on tens of thousands of 3D Gaussian points (Gaussians), with the number of Gaussians fixed after training. However, many practical applications require adjustable levels of detail (LOD) to balance rendering efficiency and visual quality. In this work, we propose "ArchitectHead", the first framework for creating 3D Gaussian head avatars that support continuous control over LOD. Our key idea is to parameterize the Gaussians in a 2D UV feature space and propose a UV feature field composed of multi-level learnable feature maps to encode their latent features. A lightweight neural network-based decoder then transforms these latent features into 3D Gaussian attributes for rendering. ArchitectHead controls the number of Gaussians by dynamically resampling feature maps from the UV feature field at the desired resolutions. This method enables efficient and continuous control of LOD without retraining. Experimental results show that ArchitectHead achieves state-of-the-art (SOTA) quality in self and cross-identity reenactment tasks at the highest LOD, while maintaining near SOTA performance at lower LODs. At the lowest LOD, our method uses only 6.2\% of the Gaussians while the quality degrades moderately (L1 Loss +7.9\%, PSNR --0.97\%, SSIM --0.6\%, LPIPS Loss +24.1\%), and the rendering speed nearly doubles.

SEJun 5, 2025
A Multi-Dataset Evaluation of Models for Automated Vulnerability Repair

Zanis Ali Khan, Aayush Garg, Qiang Tang

Software vulnerabilities pose significant security threats, requiring effective mitigation. While Automated Program Repair (APR) has advanced in fixing general bugs, vulnerability patching, a security-critical aspect of APR remains underexplored. This study investigates pre-trained language models, CodeBERT and CodeT5, for automated vulnerability patching across six datasets and four languages. We evaluate their accuracy and generalization to unknown vulnerabilities. Results show that while both models face challenges with fragmented or sparse context, CodeBERT performs comparatively better in such scenarios, whereas CodeT5 excels in capturing complex vulnerability patterns. CodeT5 also demonstrates superior scalability. Furthermore, we test fine-tuned models on both in-distribution (trained) and out-of-distribution (unseen) datasets. While fine-tuning improves in-distribution performance, models struggle to generalize to unseen data, highlighting challenges in robust vulnerability detection. This study benchmarks model performance, identifies limitations in generalization, and provides actionable insights to advance automated vulnerability patching for real-world security applications.

CVNov 5, 2021
Spatial-Temporal Residual Aggregation for High Resolution Video Inpainting

Vishnu Sanjay Ramiya Srinivasan, Rui Ma, Qiang Tang et al.

Recent learning-based inpainting algorithms have achieved compelling results for completing missing regions after removing undesired objects in videos. To maintain the temporal consistency among the frames, 3D spatial and temporal operations are often heavily used in the deep networks. However, these methods usually suffer from memory constraints and can only handle low resolution videos. We propose STRA-Net, a novel spatial-temporal residual aggregation framework for high resolution video inpainting. The key idea is to first learn and apply a spatial and temporal inpainting network on the downsampled low resolution videos. Then, we refine the low resolution results by aggregating the learned spatial and temporal image residuals (details) to the upsampled inpainted frames. Both the quantitative and qualitative evaluations show that we can produce more temporal-coherent and visually appealing results than the state-of-the-art methods on inpainting high resolution videos.

DCJun 15, 2021
Leopard: Towards High Throughput-Preserving BFT for Large-scale Systems

Kexin Hu, Kaiwen Guo, Qiang Tang et al.

With the emergence of large-scale decentralized applications, a scalable and efficient Byzantine Fault Tolerant (BFT) protocol of hundreds of replicas is desirable. Although the throughput of existing leader-based BFT protocols has reached a high level of $10^5$ requests per second for a small scale of replicas, it drops significantly when the number of replicas increases, which leads to a lack of practicality. This paper focuses on the scalability of BFT protocols and identifies a major bottleneck to leader-based BFT protocols due to the excessive workload of the leader at large scales. A new metric of scaling factor is defined to capture whether a BFT protocol will get stuck when it scales out, which can be used to measure the performance of efficiency and scalability of BFT protocols. We propose "Leopard", the first leader-based BFT protocol that scales to multiple hundreds of replicas, and more importantly, preserves a high efficiency. We remove the bottleneck by introducing a technique of achieving a constant scaling factor, which takes full advantage of the idle resource and adaptively balances the workload of the leader among all replicas. We implement Leopard and evaluate its performance compared to HotStuff, the state-of-the-art BFT protocol. We run extensive experiments on the two systems with up to 600 replicas. The results show that Leopard achieves significant performance improvements both on throughput and scalability. In particular, the throughput of Leopard remains at a high level of $10^5$ when the system scales out to 600 replicas. It achieves a $5\times$ throughput over HotStuff when the scale is 300 (which is already the largest scale we can see the progress of the latter in our experiments), and the gap becomes wider as the number of replicas further increases.

CRJun 15, 2021
Efficient Asynchronous Byzantine Agreement without Private Setups

Yingzi Gao, Yuan Lu, Zhenliang Lu et al.

Efficient asynchronous Byzantine agreement (BA) protocols were mostly studied with private setups, e.g., pre-setup threshold cryptosystem. Challenges remain to reduce the large communication in the absence of such setups. Recently, Abraham et al. (PODC'21) presented the first asynchronous validated BA (VBA) with expected $O(n^3)$ messages and $O(1)$ rounds, relying on only public key infrastructure (PKI) setup, but the design still costs $O(λn^3 \log n)$ bits. Here $n$ is the number of parties, and $λ$ is a cryptographic security parameter. In this paper, we reduce the communication of private-setup free asynchronous BA to expected $O(λn^3)$ bits. At the core of our design, we give a systematic treatment of common randomness protocols in the asynchronous network, and proceed as: - We give an efficient reasonably fair common coin protocol in the asynchronous setting with only PKI setup. It costs only $O(λn^3)$ bits and $O(1)$ rounds, and ensures that with at least 1/3 probability, all honest parties can output a common bit that is as if randomly flipped. This directly renders more efficient private-setup free asynchronous binary agreement (ABA) with expected $O(λn^3)$ bits and $O(1)$ rounds. - Then, we lift our common coin to attain perfect agreement by using a single ABA. This gives us a reasonably fair random leader election protocol with expected $O(λn^3)$ communication and expected constant rounds. It is pluggable in all existing VBA protocols (e.g., Cachin et al., CRYPTO'01; Abraham et al., PODC'19; Lu et al., PODC'20) to remove the needed private setup or distributed key generation (DKG). As such, the communication of private-setup free VBA is reduced to expected $O(λn^3)$ bits while preserving fast termination in expected $O(1)$ rounds.

CRMar 17, 2021
Bolt-Dumbo Transformer: Asynchronous Consensus As Fast As the Pipelined BFT

Yuan Lu, Zhenliang Lu, Qiang Tang

An urgent demand of deploying BFT consensus over the Internet is raised for implementing blockchain services. The deterministic (partial) synchronous protocols can be simple and fast in good network conditions, but are subject to denial-of-service when synchrony assumption fails. Asynchronous protocols, on the contrary, are robust against the adversarial network, but are substantially more complicated and slower for the inherent use of randomness. Facing the issues, optimistic asynchronous atomic broadcast ( Kursawe-Shoup, 2002; Ramasamy-Cachin, 2005) was proposed to improve the normal-case performance of the slow asynchronous consensus. They run a deterministic fastlane if the network condition remains good, and can fall back to a fully asynchronous protocol via a pace-synchronization mechanism if the fastlane fails. Unfortunately, existing pace-synchronization directly uses a heavy tool of asynchronous multi-valued validated Byzantine agreement (MVBA). We present Bolt-Dumbo Transformer (BDT), a generic framework for practical optimistic asynchronous atomic broadcast. At the core of BDT, we set forth a new fastlane abstraction that is simple and fast, while preparing honest parties to gracefully face potential fastlane failures caused by malicious leader or bad network. This enables a highly efficient pace-synchronization to handle fallback. The resulting design reduces a cumbersome MVBA to a variant of the conceptually simplest binary agreement only. Besides detailed security analyses, we also give concrete instantiations of our framework and implement them. Extensive experiments demonstrate that BDT can enjoy both the low latency of deterministic protocols (e.g. 2-chain version of HotStuff) and the robustness of state-of-the-art asynchronous protocols in practice.

CRFeb 9, 2021
Fair Peer-to-Peer Content Delivery via Blockchain

Songlin He, Yuan Lu, Qiang Tang et al.

Peer-to-peer (p2p) content delivery is promising to provide benefits like cost-saving and scalable peak-demand handling in comparison with conventional content delivery networks (CDNs) and complement the decentralized storage networks such as Filecoin. However, reliable p2p delivery requires proper enforcement of delivery fairness, i.e., the deliverers should be rewarded according to their in-time delivery. Unfortunately, most existing studies on delivery fairness are based on non-cooperative game-theoretic assumptions that are arguably unrealistic in the ad-hoc p2p setting. We for the first time put forth the expressive yet still minimalist securities for p2p content delivery, and give two efficient solutions FairDownload and FairStream via the blockchain for p2p downloading and p2p streaming scenarios, respectively. Our designs not only guarantee delivery fairness to ensure deliverers be paid (nearly) proportional to his in-time delivery, but also ensure the content consumers and content providers to be fairly treated. The fairness of each party can be guaranteed when the other two parties collude to arbitrarily misbehave. Moreover, the systems are efficient in the sense of attaining asymptotically optimal on-chain costs and optimal deliverer communication. We implement the protocols to build the prototype systems atop the Ethereum Ropsten network. Extensive experiments done in LAN and WAN settings showcase their high practicality.

CROct 26, 2020
Another Look at Privacy-Preserving Automated Contact Tracing

Qiang Tang

In the current COVID-19 pandemic, manual contact tracing has been proven very helpful to reach close contacts of infected users and slow down virus spreading. To improve its scalability, a number of automated contact tracing (ACT) solutions have proposed and some of them have been deployed. Despite the dedicated efforts, security and privacy issues of these solutions are still open and under intensive debate. In this paper, we examine the ACT concept from a broader perspective, by focusing on not only security and privacy issues but also functional issues such as interface, usability and coverage. We first elaborate on these issues and particularly point out the inevitable privacy leakages in existing BLE-based ACT solutions. Then, we propose a venue-based ACT concept, which only monitors users' contacting history in virus-spreading-prone venues and is able to incorporate different location tracking technologies such as BLE and WIFI. Finally, we instantiate the venue-based ACT concept and show that our instantiation can mitigate most of the issues we have identified in our analysis.

CVAug 1, 2020
Animating Through Warping: an Efficient Method for High-Quality Facial Expression Animation

Zili Yi, Qiang Tang, Vishnu Sanjay Ramiya Srinivasan et al.

Advances in deep neural networks have considerably improved the art of animating a still image without operating in 3D domain. Whereas, prior arts can only animate small images (typically no larger than 512x512) due to memory limitations, difficulty of training and lack of high-resolution (HD) training datasets, which significantly reduce their potential for applications in movie production and interactive systems. Motivated by the idea that HD images can be generated by adding high-frequency residuals to low-resolution results produced by a neural network, we propose a novel framework known as Animating Through Warping (ATW) to enable efficient animation of HD images. Specifically, the proposed framework consists of two modules, a novel two-stage neural-network generator and a novel post-processing module known as Animating Through Warping (ATW). It only requires the generator to be trained on small images and can do inference on an image of any size. During inference, an HD input image is decomposed into a low-resolution component(128x128) and its corresponding high-frequency residuals. The generator predicts the low-resolution result as well as the motion field that warps the input face to the desired status (e.g., expressions categories or action units). Finally, the ResWarp module warps the residuals based on the motion field and adding the warped residuals to generates the final HD results from the naively up-sampled low-resolution results. Experiments show the effectiveness and efficiency of our method in generating high-resolution animations. Our proposed framework successfully animates a 4K facial image, which has never been achieved by prior neural models. In addition, our method generally guarantee the temporal coherency of the generated animations. Source codes will be made publicly available.

CVMay 19, 2020
Contextual Residual Aggregation for Ultra High-Resolution Image Inpainting

Zili Yi, Qiang Tang, Shekoofeh Azizi et al.

Recently data-driven image inpainting methods have made inspiring progress, impacting fundamental image editing tasks such as object removal and damaged image repairing. These methods are more effective than classic approaches, however, due to memory limitations they can only handle low-resolution inputs, typically smaller than 1K. Meanwhile, the resolution of photos captured with mobile devices increases up to 8K. Naive up-sampling of the low-resolution inpainted result can merely yield a large yet blurry result. Whereas, adding a high-frequency residual image onto the large blurry image can generate a sharp result, rich in details and textures. Motivated by this, we propose a Contextual Residual Aggregation (CRA) mechanism that can produce high-frequency residuals for missing contents by weighted aggregating residuals from contextual patches, thus only requiring a low-resolution prediction from the network. Since convolutional layers of the neural network only need to operate on low-resolution inputs and outputs, the cost of memory and computing power is thus well suppressed. Moreover, the need for high-resolution training datasets is alleviated. In our experiments, we train the proposed model on small images with resolutions 512x512 and perform inference on high-resolution images, achieving compelling inpainting quality. Our model can inpaint images as large as 8K with considerable hole sizes, which is intractable with previous learning-based approaches. We further elaborate on the light-weight design of the network architecture, achieving real-time performance on 2K images on a GTX 1080 Ti GPU. Codes are available at: Atlas200dk/sample-imageinpainting-HiFill.

CRApr 14, 2020
Privacy-Preserving Contact Tracing: current solutions and open questions

Qiang Tang

The COVID-19 pandemic has posed a unique challenge for the world to find solutions, ranging from vaccines to ICT solutions to slow down the virus spreading. Due to the highly contagious nature of the virus, social distancing is one fundamental measure which has already adopted by many countries. At the technical level, this prioritises contact tracing solutions, which can alert the users who have been in close contact with the infected persons and meanwhile allow heath authorities to take proper actions. In this paper, we examine several existing privacy-aware contact tracing solutions and analyse their (dis)advantages. At the end, we describe several major observations and outline an interdisciplinary research agenda towards more comprehensive and effective privacy-aware contact tracing solutions.

CRMar 23, 2020
Dragoon: Private Decentralized HITs Made Practical

Yuan Lu, Qiang Tang, Guiling Wang

With the rapid popularity of blockchain, decentralized human intelligence tasks (HITs) are proposed to crowdsource human knowledge without relying on vulnerable third-party platforms. However, the inherent limits of blockchain cause decentralized HITs to face a few "new" challenges. For example, the confidentiality of solicited data turns out to be the sine qua non, though it was an arguably dispensable property in the centralized setting. To ensure the "new" requirement of data privacy, existing decentralized HITs use generic zero-knowledge proof frameworks (e.g. SNARK), but scarcely perform well in practice, due to the inherently expensive cost of generality. We present a practical decentralized protocol for HITs, which also achieves the fairness between requesters and workers. At the core of our contributions, we avoid the powerful yet highly-costly generic zk-proof tools and propose a special-purpose scheme to prove the quality of encrypted data. By various non-trivial statement reformations, proving the quality of encrypted data is reduced to efficient verifiable decryption, thus making decentralized HITs practical. Along the way, we rigorously define the ideal functionality of decentralized HITs and then prove the security due to the ideal-real paradigm. We further instantiate our protocol to implement a system called Dragoon, an instance of which is deployed atop Ethereum to facilitate an image annotation task used by ImageNet. Our evaluations demonstrate its practicality: the on-chain handling cost of Dragoon is even less than the handling fee of Amazon's Mechanical Turk for the same ImageNet HIT.

GTMar 14, 2020
Generic Superlight Client for Permissionless Blockchains

Yuan Lu, Qiang Tang, Guiling Wang

We conduct a systematic study on the light client of permissionless blockchains, in the setting where the full nodes and the light clients are rational. Under such a game-theoretic model, we design a superlight-client protocol to enable a client to employ some relaying full nodes (e.g. two or one) to read the blockchain. The protocol is "generic", i.e., it can be deployed disregarding the underlying consensuses, and also "superlight", i.e., the computational cost of the light client to predicate the (non)existence of a transaction in the blockchain becomes a small constant. Since our protocol resolves a fundamental challenge of broadening the usage of blockchain technology, it captures a wide variety of important use-cases such as multi-chain wallets, DApp browsers and more.

CROct 9, 2019
Privacy-preserving and yet Robust Collaborative Filtering Recommender as a Service

Qiang Tang

Collaborative filtering recommenders provide effective personalization services at the cost of sacrificing the privacy of their end users. Due to the increasing concerns from the society and stricter privacy regulations, it is an urgent research challenge to design privacy-preserving and yet robust recommenders which offer recommendation services to privacy-aware users. Our analysis shows that existing solutions fall short in several aspects, including lacking attention to the precise output to end users and ignoring the correlated robustness issues. In this paper, we provide a general system structure for latent factor based collaborative filtering recommenders by formulating them into model training and prediction computing stages, and also describe a new security model. Aiming at pragmatic solutions, we first show how to construct privacy-preserving and yet robust model training stage based on existing solutions. Then, we propose two cryptographic protocols to realize a privacy-preserving prediction computing stage, depending on whether or not an extra proxy is involved. Different from standard Top-k recommendations, we alternatively let the end user retrieve the unrated items whose predictions are above a threshold, as a result of our privacy by design strategy. Experimental results show that our new protocols are quite efficient.

CRAug 26, 2019
Towards Blockchain-enabled Searchable Encryption

Qiang Tang

Distributed Leger Technologies (DLTs), most notably Blockchain technologies, bring decentralised platforms that eliminate a single trusted third party and avoid the notorious single point of failure vulnerability. Since Nakamoto's Bitcoin cryptocurrency system, an enormous number of decentralised applications have been proposed on top of these technologies, aiming at more transparency and trustworthiness than their traditional counterparts. These applications spread over a lot of areas, e.g. financial services, healthcare, transportation, supply chain management, and cloud computing. While Blockchain brings transparency and decentralised trust intuitively due to the consensus of a (very large) group of nodes (or, miners), it introduces very subtle implications for other desirable properties such as privacy. In this work, we demonstrate these subtle implications for Blockchain-based searchable encryption solutions, which are one specific use case of cloud computing services. These solutions rely on Blockchain to achieve both the standard privacy property and the new fairness property, which requires that search operations are carried out faithfully and are rewarded accordingly. We show that directly replacing the server in an existing searchable encryption solution with a Blockchain will cause undesirable operational cost, privacy loss, and security vulnerabilities. The analysis results indicate that a dedicated server is still needed to achieve the desired privacy guarantee. To this end, we propose two frameworks which can be instantiated based on most existing searchable encryption schemes. Through analysing these two frameworks, we affirmatively show that a carefully engineered Blockchain-based solution can achieve the desired fairness property while preserving the privacy guarantee of the original searchable encryption scheme simultaneously.

HCMar 3, 2018
ZebraLancer: Decentralized Crowdsourcing of Human Knowledge atop Open Blockchain

Yuan Lu, Qiang Tang, Guiling Wang

We design and implement the first private and anonymous decentralized crowdsourcing system ZebraLancer, and overcome two fundamental challenges of decentralizing crowdsourcing, i.e., data leakage and identity breach. First, our outsource-then-prove methodology resolves the tension between the blockchain transparency and the data confidentiality to guarantee the basic utilities/fairness requirements of data crowdsourcing, thus ensuring: (i) a requester will not pay more than what data deserve, according to a policy announced when her task is published via the blockchain; (ii) each worker indeed gets a payment based on the policy, if he submits data to the blockchain; (iii) the above properties are realized not only without a central arbiter, but also without leaking the data to the open blockchain. Second, the transparency of blockchain allows one to infer private information about workers and requesters through their participation history. Simply enabling anonymity is seemingly attempting but will allow malicious workers to submit multiple times to reap rewards. ZebraLancer also overcomes this problem by allowing anonymous requests/submissions without sacrificing accountability. The idea behind is a subtle linkability: if a worker submits twice to a task, anyone can link the submissions, or else he stays anonymous and unlinkable across tasks. To realize this delicate linkability, we put forward a novel cryptographic concept, i.e., the common-prefix-linkable anonymous authentication. We remark the new anonymous authentication scheme might be of independent interest. Finally, we implement our protocol for a common image annotation task and deploy it in a test net of Ethereum. The experiment results show the applicability of our protocol atop the existing real-world blockchain.

CRFeb 7, 2018
CryptoRec: Privacy-preserving Recommendation as a Service

Jun Wang, Afonso Arriaga, Qiang Tang et al.

Recommender systems rely on large datasets of historical data and entail serious privacy risks. A server offering Recommendation as a Service to a client might leak more information than necessary regarding its recommendation model and dataset. At the same time, the disclosure of the client's preferences to the server is also a matter of concern. Devising privacy-preserving protocols using general cryptographic primitives (e.g., secure multi-party computation or homomorphic encryption), is a typical approach to overcome privacy concerns, but in conjunction with state-of-the-art recommender systems often yields far-from-practical solutions. In this paper, we tackle this problem from the direction of constructing crypto-friendly machine learning algorithms. In particular, we propose CryptoRec, a secure two-party computation protocol for Recommendation as a Service, which encompasses a novel recommender system. This model possesses two interesting properties: (1) It models user-item interactions in an item-only latent feature space in which personalized user representations are automatically captured by an aggregation of pre-learned item features. This means that a server with a pre-trained model can provide recommendations for a client whose data is not in its training set. Nevertheless, re-training the model with the client's data still improves accuracy. (2) It only uses addition and multiplication operations, making the model straightforwardly compatible with homomorphic encryption schemes. We demonstrate the efficiency and accuracy of CryptoRec on three real-world datasets. CryptoRec allows a server with thousands of items to privately answer a prediction query within a few seconds on a single PC, while its prediction accuracy is still competitive with state-of-the-art recommender systems computing over clear data.

GTDec 1, 2017
Together or Alone: The Price of Privacy in Collaborative Learning

Balazs Pejo, Qiang Tang, Gergely Biczok

Machine learning algorithms have reached mainstream status and are widely deployed in many applications. The accuracy of such algorithms depends significantly on the size of the underlying training dataset; in reality a small or medium sized organization often does not have the necessary data to train a reasonably accurate model. For such organizations, a realistic solution is to train their machine learning models based on their joint dataset (which is a union of the individual ones). Unfortunately, privacy concerns prevent them from straightforwardly doing so. While a number of privacy-preserving solutions exist for collaborating organizations to securely aggregate the parameters in the process of training the models, we are not aware of any work that provides a rational framework for the participants to precisely balance the privacy loss and accuracy gain in their collaboration. In this paper, by focusing on a two-player setting, we model the collaborative training process as a two-player game where each player aims to achieve higher accuracy while preserving the privacy of its own dataset. We introduce the notion of Price of Privacy, a novel approach for measuring the impact of privacy protection on the accuracy in the proposed framework. Furthermore, we develop a game-theoretical model for different player types, and then either find or prove the existence of a Nash Equilibrium with regard to the strength of privacy protection for each player. Using recommendation systems as our main use case, we demonstrate how two players can make practical use of the proposed theoretical framework, including setting up the parameters and approximating the non-trivial Nash Equilibrium.

CRJan 9, 2017
Differentially Private Neighborhood-based Recommender Systems

Jun Wang, Qiang Tang

Privacy issues of recommender systems have become a hot topic for the society as such systems are appearing in every corner of our life. In contrast to the fact that many secure multi-party computation protocols have been proposed to prevent information leakage in the process of recommendation computation, very little has been done to restrict the information leakage from the recommendation results. In this paper, we apply the differential privacy concept to neighborhood-based recommendation methods (NBMs) under a probabilistic framework. We first present a solution, by directly calibrating Laplace noise into the training process, to differential-privately find the maximum a posteriori parameters similarity. Then we connect differential privacy to NBMs by exploiting a recent observation that sampling from the scaled posterior distribution of a Bayesian model results in provably differentially private systems. Our experiments show that both solutions allow promising accuracy with a modest privacy budget, and the second solution yields better accuracy if the sampling asymptotically converges. We also compare our solutions to the recent differentially private matrix factorization (MF) recommender systems, and show that our solutions achieve better accuracy when the privacy budget is reasonably small. This is an interesting result because MF systems often offer better accuracy when differential privacy is not applied.

IRJan 5, 2017
A Probabilistic View of Neighborhood-based Recommendation Methods

Jun Wang, Qiang Tang

Probabilistic graphic model is an elegant framework to compactly present complex real-world observations by modeling uncertainty and logical flow (conditionally independent factors). In this paper, we present a probabilistic framework of neighborhood-based recommendation methods (PNBM) in which similarity is regarded as an unobserved factor. Thus, PNBM leads the estimation of user preference to maximizing a posterior over similarity. We further introduce a novel multi-layer similarity descriptor which models and learns the joint influence of various features under PNBM, and name the new framework MPNBM. Empirical results on real-world datasets show that MPNBM allows very accurate estimation of user preferences.

CRNov 18, 2013
On the Security of Key Extraction from Measuring Physical Quantities

Matt Edman, Aggelos Kiayias, Qiang Tang et al.

Key extraction via measuring a physical quantity is a class of information theoretic key exchange protocols that rely on the physical characteristics of the communication channel to enable the computation of a shared key by two (or more) parties that share no prior secret information. The key is supposed to be information theoretically hidden to an eavesdropper. Despite the recent surge of research activity in the area, concrete claims about the security of the protocols typically rely on channel abstractions that are not fully experimentally substantiated. In this work, we propose a novel methodology for the {\em experimental} security analysis of these protocols. The crux of our methodology is a falsifiable channel abstraction that is accompanied by an efficient experimental approximation algorithm of the {\em conditional min-entropy} available to the two parties given the view of the eavesdropper. We focus on the signal strength between two wirelessly communicating transceivers as the measured quantity and we use an experimental setup to compute the conditional min-entropy of the channel given the view of the attacker which we find to be linearly increasing. Armed with this understanding of the channel, we showcase the methodology by providing a general protocol for key extraction in this setting that is shown to be secure for a concrete parameter selection. In this way we provide a first comprehensively analyzed wireless key extraction protocol that is demonstrably secure against passive adversaries. Our methodology uses hidden Markov models as the channel model and a dynamic programming approach to approximate conditional min-entropy but other possible instantiations of the methodology can be motivated by our work.