NI May 13, 2024
DoLLM: How Large Language Models Understanding Network Flow Data to Detect Carpet Bombing DDoS
Qingyang Li, Yihang Zhang, Zhidong Jia et al.
Can Large Language Models (LLMs) understand non-language network data, and if so how, and can they help us detect unknown malicious flows? This paper takes Carpet Bombing as a case study and shows how to exploit the capabilities of LLMs in the networking area. Carpet Bombing is a new DDoS attack that has dramatically increased in recent years, significantly threatening network infrastructures. It targets multiple victim IPs within subnets, causing congestion on access links and disrupting network services for a vast number of users. Characterized by low rates and multiple vectors, these attacks challenge traditional DDoS defenses. We propose DoLLM, a DDoS detection model that uses open-source LLMs as its backbone. By reorganizing non-contextual network flows into Flow-Sequences and projecting them into the LLM's semantic space as token embeddings, DoLLM leverages LLMs' contextual understanding to extract flow representations in the overall network context. The representations are used to improve DDoS detection performance. We evaluate DoLLM on the public dataset CIC-DDoS2019 and a real NetFlow trace from a top-3 countrywide ISP. The tests show that DoLLM has strong detection capabilities: its F1 score increases by up to 33.3% in zero-shot scenarios and by at least 20.6% on real ISP traces.
CV Apr 14, 2025
GaussVideoDreamer: 3D Scene Generation with Video Diffusion and Inconsistency-Aware Gaussian Splatting
Junlin Hao, Peiheng Wang, Haoyang Wang et al.
Single-image 3D scene reconstruction presents significant challenges due to its inherently ill-posed nature and limited input constraints. Recent advances have explored two promising directions: multiview generative models that train on 3D consistent datasets but struggle with out-of-distribution generalization, and 3D scene inpainting and completion frameworks that suffer from cross-view inconsistency and suboptimal error handling, as they depend exclusively on depth data or 3D smoothness, which ultimately degrades output quality and computational performance. Building upon these approaches, we present GaussVideoDreamer, which advances generative multimedia approaches by bridging the gap between image, video, and 3D generation, integrating their strengths through two key innovations: (1) A progressive video inpainting strategy that harnesses temporal coherence for improved multiview consistency and faster convergence. (2) A 3D Gaussian Splatting consistency mask to guide the video diffusion with 3D consistent multiview evidence. Our pipeline combines three core components: a geometry-aware initialization protocol, Inconsistency-Aware Gaussian Splatting, and a progressive video inpainting strategy. Experimental results demonstrate that our approach achieves 32% higher LLaVA-IQA scores and at least 2x speedup compared to existing methods while maintaining robust performance across diverse scenes.
NI Mar 20, 2025
PromptMobile: Efficient Promptus for Low Bandwidth Mobile Video Streaming
Liming Liu, Jiangkai Wu, Haoyang Wang et al.
Traditional video compression algorithms exhibit significant quality degradation at extremely low bitrates. Promptus emerges as a new paradigm for video streaming, substantially cutting down the bandwidth essential for video streaming. However, Promptus is computationally intensive and cannot run in real time on mobile devices. This paper presents PromptMobile, an efficient acceleration framework tailored for on-device Promptus. Specifically, we propose (1) a two-stage efficient generation framework to reduce computational cost by 8.1x, (2) a fine-grained inter-frame caching to reduce redundant computations by 16.6%, and (3) system-level optimizations to further enhance efficiency. The evaluations demonstrate that compared with the original Promptus, PromptMobile achieves a 13.6x increase in image generation speed. Compared with other streaming methods, PromptMobile achieves an average LPIPS improvement of 0.016 (compared with H.265) and reduces severely distorted frames by 60% (compared with VQGAN).
CV Mar 23, 2021
Co-Grounding Networks with Semantic Attention for Referring Expression Comprehension in Videos
Sijie Song, Xudong Lin, Jiaying Liu et al.
In this paper, we address the problem of referring expression comprehension in videos, which is challenging due to complex expression and scene dynamics. Unlike previous methods which solve the problem in multiple stages (i.e., tracking, proposal-based matching), we tackle the problem from a novel perspective, co-grounding, with an elegant one-stage framework. We enhance the single-frame grounding accuracy by semantic attention learning and improve the cross-frame grounding consistency with co-grounding feature learning. Semantic attention learning explicitly parses referring cues in different attributes to reduce the ambiguity in the complex expression. Co-grounding feature learning boosts visual feature representations by integrating temporal correlation to reduce the ambiguity caused by scene dynamics. Experiment results demonstrate the superiority of our framework on the video grounding datasets VID and LiOTB in generating accurate and stable results across frames. Our model is also applicable to referring expression comprehension in images, illustrated by the improved performance on the RefCOCO dataset. Our project is available at https://sijiesong.github.io/co-grounding.
CV Jan 31, 2020
Modality Compensation Network: Cross-Modal Adaptation for Action Recognition
Sijie Song, Jiaying Liu, Yanghao Li et al.
With the prevalence of RGB-D cameras, multi-modal video data have become more available for human action recognition. One main challenge for this task lies in how to effectively leverage their complementary information. In this work, we propose a Modality Compensation Network (MCN) to explore the relationships of different modalities, and boost the representations for human action recognition. We regard RGB/optical flow videos as source modalities and skeletons as the auxiliary modality. Our goal is to extract more discriminative features from source modalities, with the help of the auxiliary modality. Built on deep Convolutional Neural Networks (CNN) and Long Short Term Memory (LSTM) networks, our model bridges data from source and auxiliary modalities by a modality adaptation block to achieve adaptive representation learning, so that the network learns to compensate for the loss of skeletons at test time and even at training time. We explore multiple adaptation schemes to narrow the distance between source and auxiliary modal distributions from different levels, according to the alignment of source and auxiliary data in training. In addition, skeletons are only required in the training phase. Our model is able to improve the recognition performance with source data when testing. Experimental results reveal that MCN outperforms state-of-the-art approaches on four widely-used action recognition benchmarks.
CV Jan 9, 2020
Deep Plastic Surgery: Robust and Controllable Image Editing with Human-Drawn Sketches
Shuai Yang, Zhangyang Wang, Jiaying Liu et al.
Sketch-based image editing aims to synthesize and modify photos based on the structural information provided by the human-drawn sketches. Since sketches are difficult to collect, previous methods mainly use edge maps instead of sketches to train models (referred to as edge-based models). However, sketches display great structural discrepancy with edge maps, which causes edge-based models to fail. Moreover, sketches often demonstrate huge variety among different users, demanding even higher generalizability and robustness for the editing model to work. In this paper, we propose Deep Plastic Surgery, a novel, robust and controllable image editing framework that allows users to interactively edit images using hand-drawn sketch inputs. We present a sketch refinement strategy, inspired by the coarse-to-fine drawing process of artists, which we show can help our model adapt well to casual and varied sketches without the need for real sketch training data. Our model further provides a refinement level control parameter that enables users to flexibly define how "reliable" the input sketch should be considered for the final output, balancing between sketch faithfulness and output verisimilitude (as the two goals might contradict if the input sketch is drawn poorly). To achieve the multi-level refinement, we introduce a style-based module for level conditioning, which allows adaptive feature representations for different levels in a single network. Extensive experimental results demonstrate the superiority of our approach in improving the visual quality and user controllability of image editing over the state-of-the-art methods.
MM Nov 11, 2019
Pano: Optimizing 360° Video Streaming with a Better Understanding of Quality Perception
Yu Guan, Chengyuan Zheng, Zongming Guo et al.
Streaming 360° videos requires more bandwidth than non-360° videos. This is because current solutions assume that users perceive the quality of 360° videos in the same way they perceive the quality of non-360° videos. This means the bandwidth demand must be proportional to the size of the user's field of view. However, we found several quality-determining factors unique to 360° videos, which can help reduce the bandwidth demand. They include the moving speed of a user's viewpoint (center of the user's field of view), the recent change of video luminance, and the difference in depth-of-fields of visual objects around the viewpoint. This paper presents Pano, a 360° video streaming system that leverages the 360° video-specific factors. We make three contributions. (1) We build a new quality model for 360° videos that captures the impact of the 360° video-specific factors. (2) Pano proposes a variable-sized tiling scheme in order to strike a balance between the perceived quality and video encoding efficiency. (3) Pano proposes a new quality-adaptation logic that maximizes 360° video user-perceived quality and is readily deployable. Our evaluation (based on user study and trace analysis) shows that compared with state-of-the-art techniques, Pano can save 41-46% bandwidth without any drop in the perceived quality, or it can raise the perceived quality (user rating) by 25%-142% without using more bandwidth.
LG Sep 11, 2019
Joint Learning of Graph Representation and Node Features in Graph Convolutional Neural Networks
Jiaxiang Tang, Wei Hu, Xiang Gao et al.
Graph Convolutional Neural Networks (GCNNs) extend classical CNNs to graph data domain, such as brain networks, social networks and 3D point clouds. It is critical to identify an appropriate graph for the subsequent graph convolution. Existing methods manually construct or learn one fixed graph for all the layers of a GCNN. In order to adapt to the underlying structure of node features in different layers, we propose dynamic learning of graphs and node features jointly in GCNNs. In particular, we cast the graph optimization problem as distance metric learning to capture pairwise similarities of features in each layer. We deploy the Mahalanobis distance metric and further decompose the metric matrix into a low-dimensional matrix, which converts graph learning to the optimization of a low-dimensional matrix for efficient implementation. Extensive experiments on point clouds and citation network datasets demonstrate the superiority of the proposed method in terms of both accuracies and robustness.
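The low-dimensional decomposition of the Mahalanobis metric mentioned above can be written out as a short derivation. This is only a sketch under an assumed factorization: the symbols $\mathbf{R}$, $d$ and $r$ are introduced here for illustration and do not appear in the abstract, which does not state the exact form of the decomposition.

```latex
% Mahalanobis distance between node features f_i, f_j in R^d
d_{\mathbf{M}}(\mathbf{f}_i, \mathbf{f}_j)
  = (\mathbf{f}_i - \mathbf{f}_j)^{\top} \mathbf{M} (\mathbf{f}_i - \mathbf{f}_j),
  \qquad \mathbf{M} \succeq 0 .

% Assumed factorization with R in R^{d x r}, r < d
\mathbf{M} = \mathbf{R}\mathbf{R}^{\top}
\;\Longrightarrow\;
d_{\mathbf{M}}(\mathbf{f}_i, \mathbf{f}_j)
  = \left\lVert \mathbf{R}^{\top} (\mathbf{f}_i - \mathbf{f}_j) \right\rVert_2^{2} .
```

Under this assumption, learning the graph reduces to learning the $d \times r$ matrix $\mathbf{R}$ rather than the full $d \times d$ matrix $\mathbf{M}$, and $\mathbf{M}$ is positive semidefinite by construction.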
MM Aug 6, 2019
Predictive Generalized Graph Fourier Transform for Attribute Compression of Dynamic Point Clouds
Yiqun Xu, Wei Hu, Shanshe Wang et al.
As 3D scanning devices and depth sensors advance, dynamic point clouds have attracted increasing attention as a format for 3D objects in motion, with applications in various fields such as immersive telepresence, navigation for autonomous driving and gaming. Nevertheless, the tremendous amount of data in dynamic point clouds significantly burdens transmission and storage. To this end, we propose a complete compression framework for attributes of 3D dynamic point clouds, focusing on optimal inter-coding. Firstly, we derive the optimal inter-prediction and predictive transform coding assuming the Gaussian Markov Random Field model with respect to a spatio-temporal graph underlying the attributes of dynamic point clouds. The optimal predictive transform proves to be the Generalized Graph Fourier Transform in terms of spatio-temporal decorrelation. Secondly, we propose refined motion estimation via efficient registration prior to inter-prediction, which searches the temporal correspondence between adjacent frames of irregular point clouds. Finally, we present a complete framework based on the optimal inter-coding and our previously proposed intra-coding, where we determine the optimal coding mode from rate-distortion optimization with the proposed offline-trained $λ$-Q model. Experimental results show that we achieve around 17% bit rate reduction on average over competitive dynamic point cloud compression methods.
CV Jul 22, 2019
Feature Graph Learning for 3D Point Cloud Denoising
Wei Hu, Xiang Gao, Gene Cheung et al.
Identifying an appropriate underlying graph kernel that reflects pairwise similarities is critical in many recent graph spectral signal restoration schemes, including image denoising, dequantization, and contrast enhancement. Existing graph learning algorithms compute the most likely entries of a properly defined graph Laplacian matrix $\mathbf{L}$, but require a large number of signal observations $\mathbf{z}$'s for a stable estimate. In this work, we assume instead the availability of a relevant feature vector $\mathbf{f}_i$ per node $i$, from which we compute an optimal feature graph via optimization of a feature metric. Specifically, we alternately optimize the diagonal and off-diagonal entries of a Mahalanobis distance matrix $\mathbf{M}$ by minimizing the graph Laplacian regularizer (GLR) $\mathbf{z}^{\top} \mathbf{L} \mathbf{z}$, where edge weight is $w_{i,j} = \exp\{-(\mathbf{f}_i - \mathbf{f}_j)^{\top} \mathbf{M} (\mathbf{f}_i - \mathbf{f}_j) \}$, given a single observation $\mathbf{z}$. We optimize diagonal entries via proximal gradient (PG), where we constrain $\mathbf{M}$ to be positive definite (PD) via linear inequalities derived from the Gershgorin circle theorem. To optimize off-diagonal entries, we design a block descent algorithm that iteratively optimizes one row and column of $\mathbf{M}$. To keep $\mathbf{M}$ PD, we constrain the Schur complement of sub-matrix $\mathbf{M}_{2,2}$ of $\mathbf{M}$ to be PD when optimizing via PG. Our algorithm mitigates full eigen-decomposition of $\mathbf{M}$, thus ensuring fast computation speed even when feature vector $\mathbf{f}_i$ has high dimension. To validate its usefulness, we apply our feature graph learning algorithm to the problem of 3D point cloud denoising, resulting in state-of-the-art performance compared to competing schemes in extensive experiments.
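The edge-weight formula and the graph Laplacian regularizer (GLR) quoted in the abstract above can be illustrated with a minimal, standard-library Python sketch. It assumes a fully connected graph, a scalar signal per node, and a fixed hand-picked metric matrix; the function names and the toy data are hypothetical, and the sketch does not implement the paper's optimization of $\mathbf{M}$.

```python
import math

def mahalanobis_sq(fi, fj, M):
    """Return (fi - fj)^T M (fi - fj) for plain-list vectors and matrix."""
    d = [a - b for a, b in zip(fi, fj)]
    Md = [sum(M[r][c] * d[c] for c in range(len(d))) for r in range(len(d))]
    return sum(d[r] * Md[r] for r in range(len(d)))

def feature_graph_laplacian(features, M):
    """Build edge weights w_ij = exp(-(fi - fj)^T M (fi - fj)) and L = D - W.

    Assumption: every pair of nodes is connected (no sparsification).
    """
    n = len(features)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w = math.exp(-mahalanobis_sq(features[i], features[j], M))
            W[i][j] = W[j][i] = w
    L = [[-W[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        L[i][i] = sum(W[i])  # degree of node i on the diagonal
    return W, L

def glr(z, L):
    """Graph Laplacian regularizer z^T L z for a scalar signal z."""
    n = len(z)
    return sum(z[i] * L[i][j] * z[j] for i in range(n) for j in range(n))

# Toy data (hypothetical): three nodes with 2-D features and a diagonal metric.
features = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
M = [[1.0, 0.0], [0.0, 0.5]]
z = [1.0, 2.0, 4.0]

W, L = feature_graph_laplacian(features, M)
value = glr(z, L)
print(value)
```

For a symmetric weight matrix the GLR equals the weighted sum of squared signal differences over all edges, which is why minimizing it over $\mathbf{M}$ favors small weights between nodes whose signal values differ strongly.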
MM May 15, 2019
Statistical Learning Based Congestion Control for Real-time Video Communication
Tongyu Dai, Xinggong Zhang, Yihang Zhang et al.
With the increasing demands on interactive video applications, how to adapt video bit rate to avoid network congestion has become critical, since congestion results in self-inflicted delay and packet loss which deteriorate the quality of real-time video service. The existing congestion control is hard to simultaneously achieve low latency, high throughput, good adaptability and fair bandwidth allocation, mainly because of the hardwired control strategy and egocentric convergence objective. To address these issues, we propose an end-to-end statistical learning based congestion control, named Iris. By exploring the underlying principles of self-inflicted delay, we reveal that congestion delay is determined by sending rate, receiving rate and network status, which inspires us to control video bit rate using a statistical-learning congestion control model. The key idea of Iris is to force all flows to converge to the same queue load, and adjust the bit rate by the model. All flows keep a small and fixed number of packets queuing in the network, thus the fair bandwidth allocation and low latency are both achieved. Besides, the adjustment step size of sending rate is updated by online learning, to better adapt to dynamically changing networks. We carried out extensive experiments to evaluate the performance of Iris, with the implementations of transport layer (UDP) and application layer (QUIC) respectively. The testing environment includes emulated network, real-world Internet and commercial LTE networks. Compared against TCP flavors and state-of-the-art protocols, Iris is able to achieve high bandwidth utilization, low latency and good fairness concurrently. Especially over QUIC, Iris is able to increase the video bitrate up to 25%, and PSNR up to 1dB.
CV May 3, 2019
Controllable Artistic Text Style Transfer via Shape-Matching GAN
Shuai Yang, Zhangyang Wang, Zhaowen Wang et al.
Artistic text style transfer is the task of migrating the style from a source image to the target text to create artistic typography. Recent style transfer methods have considered texture control to enhance usability. However, controlling the stylistic degree in terms of shape deformation remains an important open challenge. In this paper, we present the first text style transfer network that allows for real-time control of the crucial stylistic degree of the glyph through an adjustable parameter. Our key contribution is a novel bidirectional shape matching framework to establish an effective glyph-style mapping at various deformation levels without paired ground truth. Based on this idea, we propose a scale-controllable module to empower a single network to continuously characterize the multi-scale shape features of the style image and transfer these features to the target text. The proposed method demonstrates its superiority over previous state-of-the-art methods in generating diverse, controllable and high-quality stylized text.
GR Apr 23, 2019
3D Dynamic Point Cloud Inpainting via Temporal Consistency on Graphs
Zeqing Fu, Wei Hu, Zongming Guo
With the development of 3D laser scanning techniques and depth sensors, 3D dynamic point clouds have attracted increasing attention as a representation of 3D objects in motion, enabling various applications such as 3D immersive tele-presence, gaming and navigation. However, dynamic point clouds usually exhibit holes of missing data, mainly due to the fast motion, the limitation of acquisition and complicated structure. Leveraging on graph signal processing tools, we represent irregular point clouds on graphs and propose a novel inpainting method exploiting both intra-frame self-similarity and inter-frame consistency in 3D dynamic point clouds. Specifically, for each missing region in every frame of the point cloud sequence, we search for its self-similar regions in the current frame and corresponding ones in adjacent frames as references. Then we formulate dynamic point cloud inpainting as an optimization problem based on the two types of references, which is regularized by a graph-signal smoothness prior. Experimental results show the proposed approach outperforms three competing methods significantly, both in objective and subjective quality.
LG Apr 23, 2019
Exploring Structure-Adaptive Graph Learning for Robust Semi-Supervised Classification
Xiang Gao, Wei Hu, Zongming Guo
Graph Convolutional Neural Networks (GCNNs) are generalizations of CNNs to graph-structured data, in which convolution is guided by the graph topology. In many cases where graphs are unavailable, existing methods manually construct graphs or learn task-driven adaptive graphs. In this paper, we propose Graph Learning Neural Networks (GLNNs), which exploit the optimization of graphs (the adjacency matrix in particular) from both data and tasks. Leveraging on spectral graph theory, we propose the objective of graph learning from a sparsity constraint, properties of a valid adjacency matrix as well as a graph Laplacian regularizer via maximum a posteriori estimation. The optimization objective is then integrated into the loss function of the GCNN, which adapts the graph topology to not only labels of a specific task but also the input data. Experimental results show that our proposed GLNN outperforms state-of-the-art approaches over widely adopted social network datasets and citation network datasets for semi-supervised classification.
CV Dec 29, 2018
Feature Preserving and Uniformity-controllable Point Cloud Simplification on Graph
Junkun Qi, Wei Hu, Zongming Guo
With the development of 3D sensing technologies, point clouds have attracted increasing attention in a variety of applications for 3D object representation, such as autonomous driving, 3D immersive tele-presence and heritage reconstruction. However, it is challenging to process large-scale point clouds in terms of both computation time and storage due to the tremendous amounts of data. Hence, we propose a point cloud simplification algorithm, aiming to strike a balance between preserving sharp features and keeping uniform density during resampling. In particular, leveraging on graph spectral processing, we represent irregular point clouds naturally on graphs, and propose concise formulations of feature preservation and density uniformity based on graph filters. The problem of point cloud simplification is finally formulated as a trade-off between the two factors and efficiently solved by our proposed algorithm. Experimental results demonstrate the superiority of our method, as well as its efficient application in point cloud registration.
CV Dec 16, 2018
TET-GAN: Text Effects Transfer via Stylization and Destylization
Shuai Yang, Jiaying Liu, Wenjing Wang et al.
Text effects transfer technology automatically makes the text dramatically more impressive. However, previous style transfer methods either study the model for general style, which cannot handle the highly-structured text effects along the glyph, or require manual design of subtle matching criteria for text effects. In this paper, we focus on the use of the powerful representation abilities of deep neural features for text effects transfer. For this purpose, we propose a novel Texture Effects Transfer GAN (TET-GAN), which consists of a stylization subnetwork and a destylization subnetwork. The key idea is to train our network to accomplish both the objective of style transfer and style removal, so that it can learn to disentangle and recombine the content and style features of text effects images. To support the training of our network, we propose a new text effects dataset with as many as 64 professionally designed styles on 837 characters. We show that the disentangled feature representations enable us to transfer or remove all these styles on arbitrary glyphs using one network. Furthermore, the flexible network design empowers TET-GAN to efficiently extend to a new text style via one-shot learning where only one example is required. We demonstrate the superiority of the proposed method in generating high-quality stylized text over the state-of-the-art methods.
CV Nov 29, 2018
Optimized Skeleton-based Action Recognition via Sparsified Graph Regression
Xiang Gao, Wei Hu, Jiaxiang Tang et al.
With the prevalence of accessible depth sensors, dynamic human body skeletons have attracted much attention as a robust modality for action recognition. Previous methods model skeletons based on RNNs or CNNs, which have limited expressive power for irregular skeleton joints. While graph convolutional networks (GCN) have been proposed to address irregular graph-structured data, the fundamental graph construction remains challenging. In this paper, we represent skeletons naturally on graphs, and propose a graph regression based GCN (GR-GCN) for skeleton-based action recognition, aiming to capture the spatio-temporal variation in the data. As the graph representation is crucial to graph convolution, we first propose graph regression to statistically learn the underlying graph from multiple observations. In particular, we provide spatio-temporal modeling of skeletons and pose an optimization problem on the graph structure over consecutive frames, which enforces the sparsity of the underlying graph for efficient representation. The optimized graph not only connects each joint to its neighboring joints in the same frame strongly or weakly, but also links with relevant joints in the previous and subsequent frames. We then feed the optimized graph into the GCN along with the coordinates of the skeleton sequence for feature learning, where we deploy high-order and fast Chebyshev approximation of spectral graph convolution. Further, we provide analysis of the variation characterization by the Chebyshev approximation. Experimental results validate the effectiveness of the proposed graph regression and show that the proposed GR-GCN achieves state-of-the-art performance on the widely used NTU RGB+D, UT-Kinect and SYSU 3D datasets.
CV Nov 28, 2018
Exploring Hypergraph Representation on Face Anti-spoofing Beyond 2D Attacks
Wei Hu, Gusi Te, Ju He et al.
Face anti-spoofing plays a crucial role in protecting face recognition systems from various attacks. Previous model-based and deep learning approaches achieve satisfactory performance for 2D face spoofs, but remain limited for more advanced 3D attacks such as vivid masks. In this paper, we address 3D face anti-spoofing via the proposed Hypergraph Convolutional Neural Networks (HGCNN). Firstly, we construct a computation-efficient and posture-invariant face representation with only a few key points on hypergraphs. The hypergraph representation is then fed into the designed HGCNN with hypergraph convolution for feature extraction, while the depth auxiliary is also exploited for 3D mask anti-spoofing. Further, we build a 3D face attack database with color, depth and infrared light information to overcome the deficiency of 3D face anti-spoofing data. Experiments show that our method achieves the state-of-the-art performance over widely used 3D and 2D databases as well as the proposed one under various tests.
CV Oct 9, 2018
Context-Aware Text-Based Binary Image Stylization and Synthesis
Shuai Yang, Jiaying Liu, Wenhan Yang et al.
In this work, we present a new framework for the stylization of text-based binary images. First, our method stylizes the stroke-based geometric shape like text, symbols and icons in the target binary image based on an input style image. Second, the composition of the stylized geometric shape and a background image is explored. To accomplish the task, we propose legibility-preserving structure and texture transfer algorithms, which progressively narrow the visual differences between the binary image and the style image. The stylization is then followed by a context-aware layout design algorithm, where cues for both seamlessness and aesthetics are employed to determine the optimal layout of the shape in the background. Given the layout, the binary image is seamlessly embedded into the background by texture synthesis under a context-aware boundary constraint. According to the contents of binary images, our method can be applied to many fields. We show that the proposed method is capable of addressing the unsupervised text stylization problem and is superior to state-of-the-art style transfer methods in automatic artistic typography creation. Besides, extensive experiments on various tasks, such as visual-textual presentation synthesis, icon/symbol rendering and structure-guided image inpainting, demonstrate the effectiveness of the proposed method.
CV Sep 28, 2018
Local Frequency Interpretation and Non-Local Self-Similarity on Graph for Point Cloud Inpainting
Zeqing Fu, Wei Hu, Zongming Guo
As 3D scanning devices and depth sensors mature, point clouds have attracted increasing attention as a format for 3D object representation, with applications in various fields such as tele-presence, navigation and heritage reconstruction. However, point clouds usually exhibit holes of missing data, mainly due to the limitation of acquisition techniques and complicated structure. Further, point clouds are defined on irregular non-Euclidean domains, which is challenging to address especially with conventional signal processing tools. Hence, leveraging on recent advances in graph signal processing, we propose an efficient point cloud inpainting method, exploiting both the local smoothness and the non-local self-similarity in point clouds. Specifically, we first propose a frequency interpretation in graph nodal domain, based on which we introduce the local graph-signal smoothness prior in order to describe the local smoothness of point clouds. Secondly, we explore the characteristics of non-local self-similarity, by globally searching for the most similar area to the missing region. The similarity metric between two areas is defined based on the direct component and the anisotropic graph total variation of normals in each area. Finally, we formulate the hole-filling step as an optimization problem based on the selected most similar area and regularized by the graph-signal smoothness prior. Besides, we propose voxelization and automatic hole detection methods for the point cloud prior to inpainting. Experimental results show that the proposed approach outperforms four competing methods significantly, both in objective and subjective quality.
CV Jun 8, 2018
RGCNN: Regularized Graph CNN for Point Cloud Segmentation
Gusi Te, Wei Hu, Zongming Guo et al.
Point cloud, an efficient 3D object representation, has become popular with the development of depth sensing and 3D laser scanning techniques. It has attracted attention in various applications such as 3D tele-presence, navigation for unmanned vehicles and heritage reconstruction. The understanding of point clouds, such as point cloud segmentation, is crucial in exploiting the informative value of point clouds for such applications. Due to the irregularity of the data format, previous deep learning works often convert point clouds to regular 3D voxel grids or collections of images before feeding them into neural networks, which leads to voluminous data and quantization artifacts. In this paper, we instead propose a regularized graph convolutional neural network (RGCNN) that directly consumes point clouds. Leveraging on spectral graph theory, we treat features of points in a point cloud as signals on graph, and define the convolution over graph by Chebyshev polynomial approximation. In particular, we update the graph Laplacian matrix that describes the connectivity of features in each layer according to the corresponding learned features, which adaptively captures the structure of dynamic graphs. Further, we deploy a graph-signal smoothness prior in the loss function, thus regularizing the learning process. Experimental results on the ShapeNet part dataset show that the proposed approach significantly reduces the computational complexity while achieving competitive performance with the state of the art. Also, experiments show RGCNN is much more robust to both noise and point cloud density in comparison with other methods. We further apply RGCNN to point cloud classification and achieve competitive results on ModelNet40 dataset.
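The Chebyshev polynomial approximation of graph convolution mentioned in the abstract above can be sketched in a few lines of standard-library Python. This is a generic illustration of the recurrence $T_k = 2\tilde{L}T_{k-1} - T_{k-2}$ applied to a scalar graph signal, not the RGCNN layer itself; the function names, the filter coefficients `theta`, and the two-node example graph are all assumptions made for the example, and the Laplacian is assumed to be already rescaled so that its eigenvalues lie in [-1, 1].

```python
def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def chebyshev_graph_filter(L_scaled, x, theta):
    """Compute y = sum_k theta[k] * T_k(L_scaled) x.

    Uses T_0 x = x, T_1 x = L_scaled x and
    T_k x = 2 * L_scaled * T_{k-1} x - T_{k-2} x,
    so no eigen-decomposition of the Laplacian is needed.
    """
    t_prev = list(x)                      # T_0 x
    y = [theta[0] * v for v in t_prev]
    if len(theta) == 1:
        return y
    t_curr = matvec(L_scaled, x)          # T_1 x
    y = [a + theta[1] * b for a, b in zip(y, t_curr)]
    for k in range(2, len(theta)):
        lt = matvec(L_scaled, t_curr)
        t_next = [2.0 * a - b for a, b in zip(lt, t_prev)]
        y = [a + theta[k] * b for a, b in zip(y, t_next)]
        t_prev, t_curr = t_curr, t_next
    return y

# Two nodes joined by one unit-weight edge (hypothetical example).
# Normalized Laplacian L = [[1, -1], [-1, 1]] has lambda_max = 2,
# so the rescaled Laplacian is 2L/lambda_max - I = L - I.
L_scaled = [[0.0, -1.0], [-1.0, 0.0]]
x = [1.0, 3.0]
theta = [0.5, 0.25, 0.1]  # illustrative filter coefficients

y = chebyshev_graph_filter(L_scaled, x, theta)
print(y)
```

A filter with K coefficients only mixes information from nodes at most K-1 hops apart, which is what makes this approximation local and inexpensive compared with filtering in the full spectral domain.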
MM Apr 27, 2017
TFDASH: A Fairness, Stability, and Efficiency Aware Rate Control Approach for Multiple Clients over DASH
Chao Zhou, Chia-Wen Lin, Xinggong Zhang et al.
Dynamic adaptive streaming over HTTP (DASH) has recently been widely deployed in the Internet and adopted in the industry. It, however, does not impose any adaptation logic for selecting the quality of video fragments requested by clients and suffers from lackluster performance with respect to a number of desirable properties: efficiency, stability, and fairness when multiple players compete for a bottleneck link. In this paper, we propose a throughput-friendly DASH (TFDASH) rate control scheme for video streaming with multiple clients over DASH to balance the trade-offs among efficiency, stability, and fairness. The core idea behind guaranteeing fairness and high efficiency (bandwidth utilization) is to avoid OFF periods during the downloading process for all clients, i.e., the bandwidth is in perfect-subscription or over-subscription with bandwidth utilization approaching 100%. We also propose a dual-threshold buffer model to solve the instability problem caused by the above idea. Finally, by integrating these novel components, we propose a probability-driven rate adaptation logic that takes into account several key factors that most influence visual quality, including buffer occupancy, video playback quality, and video bit-rate switching frequency and amplitude, to guarantee high-quality video streaming. Our experiments demonstrate the superior performance of the proposed method.
CVJan 20, 2017
Dual Recovery Network with Online Compensation for Image Super-ResolutionSifeng Xia, Wenhan Yang, Jiaying Liu et al.
Image super-resolution (SR) methods essentially lead to a loss of some high-frequency (HF) information when predicting high-resolution (HR) images from low-resolution (LR) images without using external references. To address this issue, we additionally utilize online retrieved data to facilitate image SR in a unified deep framework. A novel dual high-frequency recovery network (DHN) is proposed to predict an HR image from three parts: an LR image, an internal inferred HF (IHF) map (the missing HF part inferred solely from the LR image) and an external extracted HF (EHF) map. In particular, we infer the HF information based on both the LR image and similar HR references which are retrieved online. For the EHF map, we align the references with an affine transformation, and the proposed DHN then extracts part of the HF signals from the aligned references to compensate for the HF loss. Extensive experimental results demonstrate that our DHN achieves notably better performance than state-of-the-art SR methods.
CVNov 28, 2016
Awesome Typography: Statistics-Based Text Effects TransferShuai Yang, Jiaying Liu, Zhouhui Lian et al.
In this work, we explore the problem of generating special effects for typography. It is quite challenging because different characters and varied text effects call for diverse models. To address this issue, our key idea is to exploit the high regularity of the spatial distribution of text effects to guide the synthesis process. Specifically, we characterize the stylized patches by their normalized positions and the optimal scales to depict their style elements. Our method first estimates these two features and derives their correlation statistically. They are then converted into soft constraints for texture transfer to accomplish adaptive multi-scale texture synthesis and to make style element distribution uniform. This allows our algorithm to produce artistic typography that fits both the local texture patterns and the global spatial distribution in the example. Experimental results demonstrate the superiority of our method for various text effects over conventional style transfer methods. In addition, we validate the effectiveness of our algorithm with extensive artistic typography library generation.
CVSep 25, 2016
Deep Joint Rain Detection and Removal from a Single ImageWenhan Yang, Robby T. Tan, Jiashi Feng et al.
In this paper, we address the problem of rain removal from a single image, even in the presence of heavy rain and rain streak accumulation. Our core ideas lie in the new rain image models and a novel deep learning architecture. We first modify an existing model comprising a rain streak layer and a background layer, by adding a binary map that locates rain streak regions. Second, we create a new model consisting of a component representing rain streak accumulation (where individual streaks cannot be seen, and thus visually similar to mist or fog), and another component representing various shapes and directions of overlapping rain streaks, which usually occur in heavy rain. Based on the first model, we develop a multi-task deep learning architecture that learns the binary rain streak map, the appearance of rain streaks, and the clean background, which is our ultimate output. The additional binary map is critically beneficial, since its loss function can provide additional strong information to the network. To handle rain streak accumulation and the various shapes and directions of overlapping rain streaks, we propose a recurrent rain detection and removal network that removes rain streaks and clears up the rain accumulation iteratively and progressively. In each recurrence of our method, a new contextualized dilated network is developed to exploit regional contextual information and output a better representation for rain detection. The evaluation on real images, particularly on heavy rain, shows the effectiveness of our novel models and architecture, which significantly outperform the state-of-the-art methods. Our code and datasets will be publicly available.
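The two image models described in this abstract can be written compactly. The following is only a sketch under assumed notation: the symbols $O$, $B$, $S$, $R$, $S_t$, $s$, $\alpha$, and $A$ are introduced here for illustration and do not appear in the abstract, and the exact form used in the paper may differ.

```latex
% Requires amsmath. All symbols are illustrative assumptions.
\begin{align}
  % First model: background plus a streak layer gated by a binary region map
  O &= B + S \odot R, \\
  % Second model: overlapping streak layers, blended with an accumulation term
  O &= \alpha \Bigl( B + \sum_{t=1}^{s} S_t \odot R \Bigr) + (1 - \alpha)\, A .
\end{align}
```

Here $O$ would be the observed rainy image, $B$ the clean background, $S$ a rain streak layer, and $R$ a binary map equal to 1 in rain streak regions. In the second line, each $S_t$ stands for one of $s$ overlapping streak layers, $A$ for the mist-like accumulation component, and $\alpha \in [0, 1]$ for how much of the scene remains visible through it.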
CVMay 3, 2016
MARLow: A Joint Multiplanar Autoregressive and Low-Rank Approach for Image CompletionMading Li, Jiaying Liu, Zhiwei Xiong et al.
In this paper, we propose a novel multiplanar autoregressive (AR) model to exploit the correlation in cross-dimensional planes of a similar patch group collected in an image, which has long been neglected by previous AR models. On that basis, we then present a joint multiplanar AR and low-rank based approach (MARLow) for image completion from random sampling, which exploits the nonlocal self-similarity within natural images more effectively. Specifically, the multiplanar AR model constrains the local stationarity in different cross-sections of the patch group, while the low-rank minimization captures the intrinsic coherence of nonlocal patches. The proposed approach can be readily extended to multichannel images (e.g., color images) by simultaneously considering the correlation in different channels. Experimental results demonstrate that the proposed approach significantly outperforms state-of-the-art methods, even if the pixel missing rate is as high as 90%.
CVApr 29, 2016
Deep Edge Guided Recurrent Residual Learning for Image Super-ResolutionWenhan Yang, Jiashi Feng, Jianchao Yang et al.
In this work, we consider the image super-resolution (SR) problem. The main challenge of image SR is to recover high-frequency details of a low-resolution (LR) image that are important for human perception. To address this essentially ill-posed problem, we introduce a Deep Edge Guided REcurrent rEsidual (DEGREE) network to progressively recover the high-frequency details. Unlike most existing methods, which aim at predicting high-resolution (HR) images directly, DEGREE investigates an alternative route: recovering the difference between a pair of LR and HR images by recurrent residual learning. DEGREE further augments the SR process with edge-preserving capability; namely, the LR image and its edge map can jointly infer the sharp edge details of the HR image during the recurrent recovery process. To speed up training convergence, by-pass connections across multiple layers of DEGREE are constructed. In addition, we offer an understanding of DEGREE from the viewpoint of sub-band frequency decomposition of the image signal and experimentally demonstrate how DEGREE can recover different frequency bands separately. Extensive experiments on three benchmark datasets clearly demonstrate the superiority of DEGREE over well-established baselines, and DEGREE also sets a new state of the art on these datasets.