33.3 LG Jun 3
Toward Multi-Domain and Long-Tailed Quantization via Feature Alignment and Scaling
Chin-Yuan Yeh, Ting-An Chen, De-Nian Yang et al.
Quantizing deep neural networks is essential for efficient inference on resource-constrained devices. However, most existing methods are designed for single-domain and class-balanced data, leaving practical settings with domain shifts or severe class imbalance underexplored. We address these challenges with Efficient Multi-Domain Alignment Quantization (EmaQ), which aligns domain distributions through a CDF-based projection and uses sensitivity-aware weight aggregation to stabilize multi-domain quantization. We further extend EmaQ to EmaQ-LT for long-tailed quantization by introducing class-conditioned variance scaling and confidence-based logit adjustment to mitigate majority-class overconfidence. Theoretical analyses establish convergence guarantees and motivate the proposed sensitivity and scaling mechanisms. Experiments on standard, multi-domain (Office-31, Digits), and long-tailed (SynDigits-LT, CIFAR-10-LT, CIFAR-100-LT) benchmarks show that EmaQ and EmaQ-LT achieve strong low-bit performance under domain shift and class imbalance.
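To make the idea of aligning distributions through their CDFs concrete, the following is a minimal sketch of generic empirical quantile matching. It is not EmaQ's actual projection, which the abstract does not specify; the function name `project_to_reference` and the nearest-rank rule are assumptions chosen for illustration.

```python
from bisect import bisect_right
from typing import List


def project_to_reference(values: List[float], reference: List[float]) -> List[float]:
    """Map each value to the reference sample's value at the same empirical CDF level.

    Hypothetical helper: generic quantile matching, not the EmaQ projection.
    """
    src = sorted(values)
    ref = sorted(reference)
    n, m = len(src), len(ref)
    projected = []
    for v in values:
        # rank / n is the empirical CDF of the source sample at v (rank is in 1..n).
        rank = bisect_right(src, v)
        # Nearest-rank quantile in the reference sample, using integer
        # arithmetic for ceil(rank * m / n) - 1 to avoid float rounding.
        k = (rank * m + n - 1) // n - 1
        projected.append(ref[k])
    return projected


# Two "domains" whose features differ only in scale end up on the same grid.
domain_a = [10.0, 20.0, 30.0, 40.0]
domain_b = [0.1, 0.2, 0.3, 0.4]
aligned = project_to_reference(domain_a, domain_b)
print(aligned)  # [0.1, 0.2, 0.3, 0.4]
```

Because the mapping depends only on ranks, it is monotone: the ordering of the source values is preserved after projection.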
SI Nov 3, 2023
CDGraph: Dual Conditional Social Graph Synthesizing via Diffusion Model
Jui-Yi Tsai, Ya-Wen Teng, Ho Chiok Yew et al.
Social graphs synthesized by generative models are increasingly in demand due to data scarcity and concerns over user privacy. One of the key performance criteria for generating social networks is fidelity to specified conditions, such as users with certain membership and financial status. While recent diffusion models have shown remarkable performance in generating images, their effectiveness in synthesizing graphs has not yet been explored in the context of conditional social graphs. In this paper, we propose CDGraph, the first conditional diffusion model of its kind for social networks, which trains and synthesizes graphs based on two specified conditions. We propose the co-evolution dependency in the denoising process of CDGraph to capture the mutual dependencies between the dual conditions and further incorporate social homophily and social contagion to preserve the connectivity between nodes while satisfying the specified conditions. Moreover, we introduce a novel classifier loss, which guides the training of the diffusion process through the mutual dependency of dual conditions. We evaluate CDGraph against four existing graph generative methods, i.e., SPECTRE, GSM, EDGE, and DiGress, on four datasets. Our results show that the generated graphs from CDGraph achieve much higher dual-conditional validity and lower discrepancy in various social network metrics than the baselines, thus demonstrating its proficiency in generating dual-conditional social graphs.
SI Feb 15, 2025
Human-Centric Community Detection in Hybrid Metaverse Networks with Integrated AI Entities
Shih-Hsuan Chiu, Ya-Wen Teng, De-Nian Yang et al.
Community detection is a cornerstone problem in social network analysis (SNA), aimed at identifying cohesive communities with minimal external links. However, the rise of generative AI and the Metaverse introduces complexities by creating hybrid human-AI social networks (denoted by HASNs), where traditional methods fall short, especially in human-centric settings. This paper introduces a novel community detection problem in HASNs (denoted by MetaCD), which seeks to enhance human connectivity within communities while reducing the presence of AI nodes. Effective processing of MetaCD poses challenges due to the delicate trade-off between excluding certain AI nodes and maintaining community structure. To address this, we propose CUSA, an innovative framework incorporating AI-aware clustering techniques that navigate this trade-off by selectively retaining AI nodes that contribute to community integrity. Furthermore, given the scarcity of real-world HASNs, we devise four strategies for synthesizing these networks under various hypothetical scenarios. Empirical evaluations on real social networks, reconfigured as HASNs, demonstrate the effectiveness and practicality of our approach compared to traditional non-deep learning and graph neural network (GNN)-based methods.
MA Dec 21, 2024
Self-guided Knowledgeable Network of Thoughts: Amplifying Reasoning with Large Language Models
Chao-Chi Chen, Chin-Yuan Yeh, Hsi-Wen Chen et al.
We introduce Knowledgeable Network of Thoughts (kNoT): a prompt scheme that advances the capabilities of large language models (LLMs) beyond existing paradigms like Chain-of-Thought (CoT), Tree of Thoughts (ToT), and Graph of Thoughts (GoT). The key innovation of kNoT is the LLM Workflow Template (LWT), which allows for an executable plan to be specified by LLMs for LLMs. LWT allows these plans to be arbitrary networks, where single-step LLM operations are nodes, and edges correspond to message passing between these steps. Furthermore, LWT supports selection of individual elements through indexing, enabling kNoT to produce intricate plans where each LLM operation can be limited to elementary operations, greatly enhancing reliability over extended task sequences. We demonstrate that kNoT significantly outperforms the state of the art on six use cases, while reducing the need for extensive prompt engineering. For instance, kNoT achieves 92% accuracy for sorting 32 numbers, versus 12% and 31% for ToT and GoT, while using up to 84.4% and 87.3% less task-specific prompting, respectively.
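The network structure described above (single-step operations as nodes, message passing as edges, and index-based selection of individual elements) can be sketched as a small plan executor. This is an illustrative assumption rather than the actual LWT syntax: `run_workflow`, the `(step, index)` reference convention, and the stand-in `fake_llm` are all hypothetical, and a real system would call a language model where `fake_llm` is used.

```python
from typing import Any, Callable, Dict, List, Tuple, Union

# A reference names a whole earlier output, or one element of it via (name, index).
Ref = Union[str, Tuple[str, int]]
Step = Tuple[str, List[Ref]]  # (instruction, references to earlier outputs)


def run_workflow(
    steps: Dict[str, Step],
    inputs: Dict[str, Any],
    call_llm: Callable[[str, List[Any]], Any],
) -> Dict[str, Any]:
    """Run steps in listed order; each step receives only the outputs it references."""
    results = dict(inputs)
    for name, (instruction, refs) in steps.items():
        args = []
        for ref in refs:
            if isinstance(ref, tuple):
                source, index = ref  # indexing selects a single element
                args.append(results[source][index])
            else:
                args.append(results[ref])
        results[name] = call_llm(instruction, args)
    return results


def fake_llm(instruction: str, args: List[Any]) -> Any:
    """Deterministic stand-in for an LLM call (assumption, for illustration only)."""
    if instruction == "sort":
        return sorted(args[0])
    if instruction == "merge":
        return sorted(args[0] + args[1])
    raise ValueError(f"unknown instruction: {instruction}")


# Sorting split into elementary operations: sort each chunk, then merge.
steps = {
    "left": ("sort", [("chunks", 0)]),
    "right": ("sort", [("chunks", 1)]),
    "merged": ("merge", ["left", "right"]),
}
results = run_workflow(steps, {"chunks": [[3, 1], [4, 2]]}, fake_llm)
print(results["merged"])  # [1, 2, 3, 4]
```

Keeping each node to one elementary operation on a small, explicitly selected input is the design choice the abstract credits for reliability over long task sequences.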
CV May 1, 2024
In Anticipation of Perfect Deepfake: Identity-anchored Artifact-agnostic Detection under Rebalanced Deepfake Detection Protocol
Wei-Han Wang, Chin-Yuan Yeh, Hsi-Wen Chen et al.
As deep generative models advance, we anticipate deepfakes achieving "perfection", that is, generating no discernible artifacts or noise. However, current deepfake detectors, intentionally or inadvertently, rely on such artifacts for detection, as they are exclusive to deepfakes and absent in genuine examples. To bridge this gap, we introduce the Rebalanced Deepfake Detection Protocol (RDDP) to stress-test detectors under balanced scenarios where genuine and forged examples bear similar artifacts. We offer two RDDP variants: RDDP-WHITEHAT uses white-hat deepfake algorithms to create 'self-deepfakes,' genuine portrait videos that retain the underlying identity yet carry artifacts similar to those of deepfake videos; RDDP-SURROGATE employs surrogate functions (e.g., Gaussian noise) to process both genuine and forged examples, introducing equivalent noise and thereby sidestepping the need for deepfake algorithms. Towards detecting perfect deepfake videos that align with genuine ones, we present ID-Miner, a detector that identifies the puppeteer behind the disguise by focusing on motion over artifacts or appearances. As an identity-based detector, it authenticates videos by comparing them with reference footage. Equipped with the artifact-agnostic loss at frame-level and the identity-anchored loss at video-level, ID-Miner effectively singles out identity signals amidst distracting variations. Extensive experiments comparing ID-Miner with 12 baseline detectors under both conventional and RDDP evaluations with two deepfake datasets, along with additional qualitative studies, affirm the superiority of our method and the necessity for detectors designed to counter perfect deepfakes.
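As a rough illustration of the RDDP-SURROGATE idea, the sketch below applies the same Gaussian-noise surrogate to both genuine and forged frames, so that the presence of noise alone cannot separate the two classes. The frame representation, the function names, and the `sigma` and `seed` parameters are assumptions for illustration; the protocol's actual surrogate functions and settings are not given in the abstract.

```python
import random
from typing import List, Tuple

Frame = List[List[float]]  # grayscale frame with pixel values in [0, 255] (assumption)


def add_gaussian_noise(frame: Frame, sigma: float, rng: random.Random) -> Frame:
    """Return a copy of the frame with zero-mean Gaussian noise, clipped to [0, 255]."""
    return [
        [min(255.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
        for row in frame
    ]


def rebalance(
    genuine: List[Frame], forged: List[Frame], sigma: float = 5.0, seed: int = 0
) -> Tuple[List[Frame], List[Frame]]:
    """Process both classes with the same surrogate so they carry equivalent noise."""
    rng = random.Random(seed)
    noisy_genuine = [add_gaussian_noise(f, sigma, rng) for f in genuine]
    noisy_forged = [add_gaussian_noise(f, sigma, rng) for f in forged]
    return noisy_genuine, noisy_forged


genuine = [[[0.0, 128.0], [255.0, 64.0]]]
forged = [[[10.0, 120.0], [250.0, 60.0]]]
noisy_genuine, noisy_forged = rebalance(genuine, forged)
```

A detector evaluated on `noisy_genuine` versus `noisy_forged` can no longer score well simply by flagging noisy inputs, which is the stress test the protocol is after.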
CV Oct 6, 2021
Attack as the Best Defense: Nullifying Image-to-image Translation GANs via Limit-aware Adversarial Attack
Chin-Yuan Yeh, Hsi-Wen Chen, Hong-Han Shuai et al.
With the successful creation of high-quality image-to-image (Img2Img) translation GANs come unethical applications such as DeepFake and DeepNude. Such misuses of Img2Img techniques present a challenging problem for society. In this work, we tackle the problem by introducing the Limit-Aware Self-Guiding Gradient Sliding Attack (LaS-GSA). LaS-GSA follows the Nullifying Attack to cancel the Img2Img translation process under a black-box setting. In other words, by processing input images with the proposed LaS-GSA before publishing, any targeted Img2Img GANs can be nullified, preventing the model from maliciously manipulating the images. To improve efficiency, we introduce the limit-aware random gradient-free estimation and the gradient sliding mechanism to estimate the gradient that adheres to the adversarial limit, i.e., the pixel value limitations of the adversarial example. Theoretical justifications validate how the above techniques prevent inefficiency caused by the adversarial limit in both the direction and the step length. Furthermore, an effective self-guiding prior is extracted solely from the threat model and the target image to efficiently leverage the prior information and guide the gradient estimation process. Extensive experiments demonstrate that LaS-GSA requires fewer queries to nullify the image translation process with higher success rates than 4 state-of-the-art black-box methods.
IR Oct 5, 2021
Live Multi-Streaming and Donation Recommendations via Coupled Donation-Response Tensor Factorization
Hsu-Chao Lai, Jui-Yi Tsai, Hong-Han Shuai et al.
In contrast to traditional online videos, live multi-streaming supports real-time social interactions between multiple streamers and viewers, such as donations. However, donation and multi-streaming channel recommendations are challenging due to complicated streamer and viewer relations, asymmetric communications, and the tradeoff between personal interests and group interactions. In this paper, we introduce Multi-Stream Party (MSP) and formulate a new multi-streaming recommendation problem, called Donation and MSP Recommendation (DAMRec). We propose Multi-stream Party Recommender System (MARS) to extract latent features via socio-temporal coupled donation-response tensor factorization for donation and MSP recommendations. Experimental results on Twitch and Douyu show that MARS significantly outperforms existing recommenders by at least 38.8% in terms of hit ratio and mean average precision.
NI Oct 4, 2021
Cybersickness-aware Tile-based Adaptive 360° Video Streaming
Chiao-Wen Lin, Chih-Hang Wang, De-Nian Yang et al.
In contrast to traditional videos, the imaging in virtual reality (VR) is 360°, and transmitting the video content consumes more bandwidth. To reduce bandwidth consumption, tile-based streaming has been proposed to deliver the focused part of the video, instead of the whole one. On the other hand, the techniques to alleviate cybersickness, which is akin to motion sickness and happens when using digital displays, have not been jointly explored with the tile selection in VR. In this paper, we investigate Tile Selection with Cybersickness Control (TSCC) in an adaptive 360° video streaming system with cybersickness alleviation. We propose an m-competitive online algorithm with Cybersickness Indicator (CI) and Video Loss Indicator (VLI) to evaluate instant cybersickness and the total loss of video quality. Moreover, the algorithm exploits Sickness Migration Indicator (SMI) to evaluate the cybersickness accumulated over time and the increase of optical flow to improve the tile quality assignment. Simulations with a real network dataset show that our algorithm outperforms the baselines regarding video quality and cybersickness accumulation.
MM Mar 30, 2015
Error-Resilient Multicasting for Multi-View 3D Videos in Wireless Networks
Chi-Heng Lin, De-Nian Yang, Ji-Tang Lee et al.
With the emergence of naked-eye 3D mobile devices, mobile 3D video services are becoming increasingly important for video service providers, such as YouTube and Netflix, while multi-view 3D videos have the potential to inspire a variety of innovative applications. However, enabling multi-view 3D video services may overwhelm WiFi networks when every view of a video is multicast. In this paper, therefore, we propose to incorporate depth-image-based rendering (DIBR), which allows each mobile client to synthesize the desired view from nearby left and right views, in order to effectively reduce the bandwidth consumption. Moreover, when each client suffers from packet losses, retransmissions incur additional bandwidth consumption and excess delay, which in turn undermines the quality of experience in video applications. To address the above issue, we first discover the merit of view protection via DIBR for multi-view video multicast using a mathematical analysis and then design a new protocol, named Multi-View Group Management Protocol (MVGMP), to support the dynamic join and leave of users and the change of desired views. The simulation results demonstrate that our protocol effectively reduces bandwidth consumption and increases the probability for each client to successfully play back the desired views in a multi-view 3D video.
MM Oct 15, 2014
Multi-View 3D Video Multicast for Broadband IP Networks
Ting-Yu Ho, Yi-Nung Yeh, De-Nian Yang
With the recent emergence of 3D-supported TVs, video service providers now face an opportunity to provide high-resolution multi-view 3D videos over IP networks. One simple way to support efficient communications between a video server and multiple clients is to deliver each desired view in a multicast stream. Nevertheless, it is expected that significantly increased bandwidth will be required to support the transmission of all views in multi-view 3D videos. However, the recent emergence of a new video synthesis technique called Depth-Image-Based Rendering (DIBR) suggests that multi-view 3D video does not necessarily require the transmission of all views. Therefore, we formulate a new problem, named Multi-view and Multicast Delivery Selection Problem (MMDS), and design an algorithm, called MMDEA, to find the optimal solution. Simulation results show that using DIBR can effectively reduce bandwidth consumption by 35% compared to the original multicast delivery scheme.
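To illustrate why DIBR lets a server transmit fewer views, the sketch below searches by brute force for the smallest set of views that still serves every desired view. It is not MMDEA and not the MMDS formulation itself: the synthesis rule (one transmitted view on each side within a hypothetical distance `max_gap`), the integer view indices, and both function names are assumptions made for illustration.

```python
from itertools import combinations
from typing import Iterable, Set, Tuple


def can_obtain(view: int, sent: Set[int], max_gap: int) -> bool:
    """A view is served if it is sent directly, or if DIBR can synthesize it from
    one sent view on its left and one on its right, each within max_gap (assumption)."""
    if view in sent:
        return True
    has_left = any(view - max_gap <= v < view for v in sent)
    has_right = any(view < v <= view + max_gap for v in sent)
    return has_left and has_right


def smallest_view_set(desired: Iterable[int], num_views: int, max_gap: int) -> Tuple[int, ...]:
    """Exhaustively find a minimum-size set of views (indices 0..num_views-1) to send.

    Exponential in num_views, so only suitable for tiny illustrative instances.
    """
    wanted = set(desired)
    for size in range(len(wanted) + 1):  # sending every desired view always works
        for sent in combinations(range(num_views), size):
            if all(can_obtain(v, set(sent), max_gap) for v in wanted):
                return sent
    raise ValueError("desired views must lie in range(num_views)")


# Clients want views 1..5 out of 7; views up to 3 apart can anchor a synthesis.
chosen = smallest_view_set(desired=[1, 2, 3, 4, 5], num_views=7, max_gap=3)
print(chosen)  # (1, 5)
```

In this toy instance two multicast streams replace five, since views 2, 3, and 4 can each be synthesized from views 1 and 5 under the assumed rule.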