LGFeb 4
RETENTION: Resource-Efficient Tree-Based Ensemble Model Acceleration with Content-Addressable MemoryYi-Chun Liao, Chieh-Lin Tsai, Yuan-Hao Chang et al.
Although deep learning has demonstrated remarkable capability in learning from unstructured data, modern tree-based ensemble models remain superior in extracting relevant information and learning from structured datasets. While several efforts have been made to accelerate tree-based models, the inherent characteristics of the models pose significant challenges for conventional accelerators. Recent research leveraging content-addressable memory (CAM) offers a promising solution for accelerating tree-based models, yet existing designs suffer from excessive memory consumption and low utilization. This work addresses these challenges by introducing RETENTION, an end-to-end framework that significantly reduces CAM capacity requirement for tree-based model inference. We propose an iterative pruning algorithm with a novel pruning criterion tailored for bagging-based models (e.g., Random Forest), which minimizes model complexity while ensuring controlled accuracy degradation. Additionally, we present a tree mapping scheme that incorporates two innovative data placement strategies to alleviate the memory redundancy caused by the widespread use of don't care states in CAM. Experimental results show that implementing the tree mapping scheme alone reduces CAM capacity requirement by $1.46\times$ to $21.30 \times$, while the full RETENTION framework achieves $4.35\times$ to $207.12\times$ reduction with less than 3\% accuracy loss. These results demonstrate that RETENTION is highly effective in minimizing CAM resource demand, providing a resource-efficient direction for tree-based model acceleration.
34.4LGMar 24
APreQEL: Adaptive Mixed Precision Quantization For Edge LLMsMeriem Bouzouad, Yuan-Hao Chang, Jalil Boukhobza
Today, large language models have demonstrated their strengths in various tasks ranging from reasoning, code generation, and complex problem solving. However, this advancement comes with a high computational cost and memory requirements, making it challenging to deploy these models on edge devices to ensure real-time responses and data privacy. Quantization is one common approach to reducing memory use, but most methods apply it uniformly across all layers. This does not account for the fact that different layers may respond differently to reduced precision. Importantly, memory consumption and computational throughput are not necessarily aligned, further complicating deployment decisions. This paper proposes an adaptive mixed precision quantization mechanism that balances memory, latency, and accuracy in edge deployment under user-defined priorities. This is achieved by analyzing the layer-wise contribution and by inferring how different quantization types behave across the target hardware platform in order to assign the most suitable quantization type to each layer. This integration ensures that layer importance and the overall performance trade-offs are jointly respected in this design. Our work unlocks new configuration designs that uniform quantization cannot achieve, expanding the solution space to efficiently deploy the LLMs on resource-constrained devices.
CVNov 7, 2023
CapST: Leveraging Capsule Networks and Temporal Attention for Accurate Model Attribution in Deep-fake VideosWasim Ahmad, Yan-Tsung Peng, Yuan-Hao Chang et al.
Deep-fake videos, generated through AI face-swapping techniques, have gained significant attention due to their potential for impactful impersonation attacks. While most research focuses on real vs. fake detection, attributing a deep-fake to its specific generation model or encoder is vital for forensic analysis, enabling source tracing and tailored countermeasures. This enhances detection by leveraging model-specific artifacts and supports proactive defenses. We investigate the model attribution problem for deep-fake videos using two datasets: Deepfakes from Different Models (DFDM) and GANGen-Detection, both comprising deep-fake videos and GAN-generated images. We use only fake images from GANGen-Detection to align with DFDM's focus on attribution rather than binary classification. We formulate the task as a multiclass classification problem and introduce a novel Capsule-Spatial-Temporal (CapST) model that integrates a truncated VGG19 network for feature extraction, capsule networks for hierarchical encoding, and a spatio-temporal attention mechanism. Video-level fusion captures temporal dependencies across frames. Experiments on DFDM and GANGen-Detection show CapST outperforms baseline models in attribution accuracy while reducing computational cost.
CVJun 13, 2025
FAME: A Lightweight Spatio-Temporal Network for Model Attribution of Face-Swap DeepfakesWasim Ahmad, Yan-Tsung Peng, Yuan-Hao Chang
The widespread emergence of face-swap Deepfake videos poses growing risks to digital security, privacy, and media integrity, necessitating effective forensic tools for identifying the source of such manipulations. Although most prior research has focused primarily on binary Deepfake detection, the task of model attribution -- determining which generative model produced a given Deepfake -- remains underexplored. In this paper, we introduce FAME (Fake Attribution via Multilevel Embeddings), a lightweight and efficient spatio-temporal framework designed to capture subtle generative artifacts specific to different face-swap models. FAME integrates spatial and temporal attention mechanisms to improve attribution accuracy while remaining computationally efficient. We evaluate our model on three challenging and diverse datasets: Deepfake Detection and Manipulation (DFDM), FaceForensics++, and FakeAVCeleb. Results show that FAME consistently outperforms existing methods in both accuracy and runtime, highlighting its potential for deployment in real-world forensic and information security applications.