IVFeb 27, 2023
LSR: A Light-Weight Super-Resolution MethodWei Wang, Xuejing Lei, Yueru Chen et al.
A light-weight super-resolution (LSR) method from a single image targeting mobile applications is proposed in this work. LSR predicts the residual image between the interpolated low-resolution (ILR) and high-resolution (HR) images using a self-supervised framework. To lower the computational complexity, LSR does not adopt the end-to-end optimization deep networks. It consists of three modules: 1) generation of a pool of rich and diversified representations in the neighborhood of a target pixel via unsupervised learning, 2) selecting a subset from the representation pool that is most relevant to the underlying super-resolution task automatically via supervised learning, 3) predicting the residual of the target pixel via regression. LSR has low computational complexity and reasonable model size so that it can be implemented on mobile/edge platforms conveniently. Besides, it offers better visual quality than classical exemplar-based methods in terms of PSNR/SSIM measures.
ROOct 2, 2025
PolySim: Bridging the Sim-to-Real Gap for Humanoid Control via Multi-Simulator Dynamics RandomizationZixing Lei, Zibo Zhou, Sheng Yin et al.
Humanoid whole-body control (WBC) policies trained in simulation often suffer from the sim-to-real gap, which fundamentally arises from simulator inductive bias, the inherent assumptions and limitations of any single simulator. These biases lead to nontrivial discrepancies both across simulators and between simulation and the real world. To mitigate the effect of simulator inductive bias, the key idea is to train policies jointly across multiple simulators, encouraging the learned controller to capture dynamics that generalize beyond any single simulator's assumptions. We thus introduce PolySim, a WBC training platform that integrates multiple heterogeneous simulators. PolySim can launch parallel environments from different engines simultaneously within a single training run, thereby realizing dynamics-level domain randomization. Theoretically, we show that PolySim yields a tighter upper bound on simulator inductive bias than single-simulator training. In experiments, PolySim substantially reduces motion-tracking error in sim-to-sim evaluations; for example, on MuJoCo, it improves execution success by 52.8 over an IsaacSim baseline. PolySim further enables zero-shot deployment on a real Unitree G1 without additional fine-tuning, showing effective transfer from simulation to the real world. We will release the PolySim code upon acceptance of this work.
CVApr 1, 2025
Hierarchical Attention Networks for Lossless Point Cloud Attribute CompressionYueru Chen, Wei Zhang, Dingquan Li et al.
In this paper, we propose a deep hierarchical attention context model for lossless attribute compression of point clouds, leveraging a multi-resolution spatial structure and residual learning. A simple and effective Level of Detail (LoD) structure is introduced to yield a coarse-to-fine representation. To enhance efficiency, points within the same refinement level are encoded in parallel, sharing a common context point group. By hierarchically aggregating information from neighboring points, our attention model learns contextual dependencies across varying scales and densities, enabling comprehensive feature extraction. We also adopt normalization for position coordinates and attributes to achieve scale-invariant compression. Additionally, we segment the point cloud into multiple slices to facilitate parallel processing, further optimizing time complexity. Experimental results demonstrate that the proposed method offers better coding performance than the latest G-PCC for color and reflectance attributes while maintaining more efficient encoding and decoding runtimes.
CVOct 29, 2020
Point Cloud Attribute Compression via Successive Subspace Graph TransformYueru Chen, Yiting Shao, Jing Wang et al.
Inspired by the recently proposed successive subspace learning (SSL) principles, we develop a successive subspace graph transform (SSGT) to address point cloud attribute compression in this work. The octree geometry structure is utilized to partition the point cloud, where every node of the octree represents a point cloud subspace with a certain spatial size. We design a weighted graph with self-loop to describe the subspace and define a graph Fourier transform based on the normalized graph Laplacian. The transforms are applied to large point clouds from the leaf nodes to the root node of the octree recursively, while the represented subspace is expanded from the smallest one to the whole point cloud successively. It is shown by experimental results that the proposed SSGT method offers better R-D performances than the previous Region Adaptive Haar Transform (RAHT) method.
IVFeb 8, 2020
PixelHop++: A Small Successive-Subspace-Learning-Based (SSL-based) Model for Image ClassificationYueru Chen, Mozhdeh Rouhsedaghat, Suya You et al.
The successive subspace learning (SSL) principle was developed and used to design an interpretable learning model, known as the PixelHop method,for image classification in our prior work. Here, we propose an improved PixelHop method and call it PixelHop++. First, to make the PixelHop model size smaller, we decouple a joint spatial-spectral input tensor to multiple spatial tensors (one for each spectral component) under the spatial-spectral separability assumption and perform the Saab transform in a channel-wise manner, called the channel-wise (c/w) Saab transform.Second, by performing this operation from one hop to another successively, we construct a channel-decomposed feature tree whose leaf nodes contain features of one dimension (1D). Third, these 1D features are ranked according to their cross-entropy values, which allows us to select a subset of discriminant features for image classification. In PixelHop++, one can control the learning model size of fine-granularity,offering a flexible tradeoff between the model size and the classification performance. We demonstrate the flexibility of PixelHop++ on MNIST, Fashion MNIST, and CIFAR-10 three datasets.
LGSep 17, 2019
PixelHop: A Successive Subspace Learning (SSL) Method for Object ClassificationYueru Chen, C. -C. Jay Kuo
A new machine learning methodology, called successive subspace learning (SSL), is introduced in this work. SSL contains four key ingredients: 1) successive near-to-far neighborhood expansion; 2) unsupervised dimension reduction via subspace approximation; 3) supervised dimension reduction via label-assisted regression (LAG); and 4) feature concatenation and decision making. An image-based object classification method, called PixelHop, is proposed to illustrate the SSL design. It is shown by experimental results that the PixelHop method outperforms the classic CNN model of similar model complexity in three benchmarking datasets (MNIST, Fashion MNIST and CIFAR-10). Although SSL and deep learning (DL) have some high-level concept in common, they are fundamentally different in model formulation, the training process and training complexity. Extensive discussion on the comparison of SSL and DL is made to provide further insights into the potential of SSL.
CVFeb 6, 2019
Semi-supervised learning via Feedforward-Designed Convolutional Neural NetworksYueru Chen, Yijing Yang, Min Zhang et al.
A semi-supervised learning framework using the feedforward-designed convolutional neural networks (FF-CNNs) is proposed for image classification in this work. One unique property of FF-CNNs is that no backpropagation is used in model parameters determination. Since unlabeled data may not always enhance semi-supervised learning, we define an effective quality score and use it to select a subset of unlabeled data in the training process. We conduct experiments on the MNIST, SVHN, and CIFAR-10 datasets, and show that the proposed semi-supervised FF-CNN solution outperforms the CNN trained by backpropagation (BP-CNN) when the amount of labeled data is reduced. Furthermore, we develop an ensemble system that combines the output decision vectors of different semi-supervised FF-CNNs to boost classification accuracy. The ensemble systems can achieve further performance gains on all three benchmarking datasets.
CVJan 8, 2019
Ensembles of feedforward-designed convolutional neural networksYueru Chen, Yijing Yang, Wei Wang et al.
An ensemble method that fuses the output decision vectors of multiple feedforward-designed convolutional neural networks (FF-CNNs) to solve the image classification problem is proposed in this work. To enhance the performance of the ensemble system, it is critical to increasing the diversity of FF-CNN models. To achieve this objective, we introduce diversities by adopting three strategies: 1) different parameter settings in convolutional layers, 2) flexible feature subsets fed into the Fully-connected (FC) layers, and 3) multiple image embeddings of the same input source. Furthermore, we partition input samples into easy and hard ones based on their decision confidence scores. As a result, we can develop a new ensemble system tailored to hard samples to further boost classification accuracy. Experiments are conducted on the MNIST and CIFAR-10 datasets to demonstrate the effectiveness of the ensemble method.
CVDec 19, 2018
Towards Visible and Thermal Drone Monitoring with Convolutional Neural NetworksYe Wang, Yueru Chen, Jongmoo Choi et al.
This paper reports a visible and thermal drone monitoring system that integrates deep-learning-based detection and tracking modules. The biggest challenge in adopting deep learning methods for drone detection is the paucity of training drone images especially thermal drone images. To address this issue, we develop two data augmentation techniques. One is a model-based drone augmentation technique that automatically generates visible drone images with a bounding box label on the drone's location. The other is exploiting an adversarial data augmentation methodology to create thermal drone images. To track a small flying drone, we utilize the residual information between consecutive image frames. Finally, we present an integrated detection and tracking system that outperforms the performance of each individual module containing detection or tracking only. The experiments show that even being trained on synthetic data, the proposed system performs well on real-world drone images with complex background. The USC drone detection and tracking dataset with user labeled bounding boxes is available to the public.
CVDec 19, 2018
Unsupervised Video Object Segmentation with Distractor-Aware Online AdaptationYe Wang, Jongmoo Choi, Yueru Chen et al.
Unsupervised video object segmentation is a crucial application in video analysis without knowing any prior information about the objects. It becomes tremendously challenging when multiple objects occur and interact in a given video clip. In this paper, a novel unsupervised video object segmentation approach via distractor-aware online adaptation (DOA) is proposed. DOA models spatial-temporal consistency in video sequences by capturing background dependencies from adjacent frames. Instance proposals are generated by the instance segmentation network for each frame and then selected by motion information as hard negatives if they exist and positives. To adopt high-quality hard negatives, the block matching algorithm is then applied to preceding frames to track the associated hard negatives. General negatives are also introduced in case that there are no hard negatives in the sequence and experiments demonstrate both kinds of negatives (distractors) are complementary. Finally, we conduct DOA using the positive, negative, and hard negative masks to update the foreground/background segmentation. The proposed approach achieves state-of-the-art results on two benchmark datasets, DAVIS 2016 and FBMS-59 datasets.
CVDec 13, 2018
Design Pseudo Ground Truth with Motion Cue for Unsupervised Video Object SegmentationYe Wang, Jongmoo Choi, Yueru Chen et al.
One major technique debt in video object segmentation is to label the object masks for training instances. As a result, we propose to prepare inexpensive, yet high quality pseudo ground truth corrected with motion cue for video object segmentation training. Our method conducts semantic segmentation using instance segmentation networks and, then, selects the segmented object of interest as the pseudo ground truth based on the motion information. Afterwards, the pseudo ground truth is exploited to finetune the pretrained objectness network to facilitate object segmentation in the remaining frames of the video. We show that the pseudo ground truth could effectively improve the segmentation performance. This straightforward unsupervised video object segmentation method is more efficient than existing methods. Experimental results on DAVIS and FBMS show that the proposed method outperforms state-of-the-art unsupervised segmentation methods on various benchmark datasets. And the category-agnostic pseudo ground truth has great potential to extend to multiple arbitrary object tracking.
LGDec 2, 2018
Analysis on Gradient Propagation in Batch Normalized Residual NetworksAbhishek Panigrahi, Yueru Chen, C. -C. Jay Kuo
We conduct mathematical analysis on the effect of batch normalization (BN) on gradient backpropogation in residual network training, which is believed to play a critical role in addressing the gradient vanishing/explosion problem, in this work. By analyzing the mean and variance behavior of the input and the gradient in the forward and backward passes through the BN and residual branches, respectively, we show that they work together to confine the gradient variance to a certain range across residual blocks in backpropagation. As a result, the gradient vanishing/explosion problem is avoided. We also show the relative importance of batch normalization w.r.t. the residual branches in residual networks.
CVNov 11, 2018
An Interpretable Generative Model for Handwritten Digit Image SynthesisYao Zhu, Saksham Suri, Pranav Kulkarni et al.
An interpretable generative model for handwritten digits synthesis is proposed in this work. Modern image generative models, such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), are trained by backpropagation (BP). The training process is complex and the underlying mechanism is difficult to explain. We propose an interpretable multi-stage PCA method to achieve the same goal and use handwritten digit images synthesis as an illustrative example. First, we derive principal-component-analysis-based (PCA-based) transform kernels at each stage based on the covariance of its inputs. This results in a sequence of transforms that convert input images of correlated pixels to spectral vectors of uncorrelated components. In other words, it is a whitening process. Then, we can synthesize an image based on random vectors and multi-stage transform kernels through a coloring process. The generative model is a feedforward (FF) design since no BP is used in model parameter determination. Its design complexity is significantly lower, and the whole design process is explainable. Finally, we design an FF generative model using the MNIST dataset, compare synthesis results with those obtained by state-of-the-art GAN and VAE methods, and show that the proposed generative model achieves comparable performance.
CVOct 5, 2018
Interpretable Convolutional Neural Networks via Feedforward DesignC. -C. Jay Kuo, Min Zhang, Siyang Li et al.
The model parameters of convolutional neural networks (CNNs) are determined by backpropagation (BP). In this work, we propose an interpretable feedforward (FF) design without any BP as a reference. The FF design adopts a data-centric approach. It derives network parameters of the current layer based on data statistics from the output of the previous layer in a one-pass manner. To construct convolutional layers, we develop a new signal transform, called the Saab (Subspace Approximation with Adjusted Bias) transform. It is a variant of the principal component analysis (PCA) with an added bias vector to annihilate activation's nonlinearity. Multiple Saab transforms in cascade yield multiple convolutional layers. As to fully-connected (FC) layers, we construct them using a cascade of multi-stage linear least squared regressors (LSRs). The classification and robustness (against adversarial attacks) performances of BP- and FF-designed CNNs applied to the MNIST and the CIFAR-10 datasets are compared. Finally, we comment on the relationship between BP and FF designs.
CVAug 6, 2018
Defense Against Adversarial Attacks with Saak TransformSibo Song, Yueru Chen, Ngai-Man Cheung et al.
Deep neural networks (DNNs) are known to be vulnerable to adversarial perturbations, which imposes a serious threat to DNN-based decision systems. In this paper, we propose to apply the lossy Saak transform to adversarially perturbed images as a preprocessing tool to defend against adversarial attacks. Saak transform is a recently-proposed state-of-the-art for computing the spatial-spectral representations of input images. Empirically, we observe that outputs of the Saak transform are very discriminative in differentiating adversarial examples from clean ones. Therefore, we propose a Saak transform based preprocessing method with three steps: 1) transforming an input image to a joint spatial-spectral representation via the forward Saak transform, 2) apply filtering to its high-frequency components, and, 3) reconstructing the image via the inverse Saak transform. The processed image is found to be robust against adversarial perturbations. We conduct extensive experiments to investigate various settings of the Saak transform and filtering functions. Without harming the decision performance on clean images, our method outperforms state-of-the-art adversarial defense methods by a substantial margin on both the CIFAR-10 and ImageNet datasets. Importantly, our results suggest that adversarial perturbations can be effectively and efficiently defended using state-of-the-art frequency analysis.
CVDec 4, 2017
A Deep Learning Approach to Drone MonitoringYueru Chen, Pranav Aggarwal, Jongmoo Choi et al.
A drone monitoring system that integrates deep-learning-based detection and tracking modules is proposed in this work. The biggest challenge in adopting deep learning methods for drone detection is the limited amount of training drone images. To address this issue, we develop a model-based drone augmentation technique that automatically generates drone images with a bounding box label on drone's location. To track a small flying drone, we utilize the residual information between consecutive image frames. Finally, we present an integrated detection and tracking system that outperforms the performance of each individual module containing detection or tracking only. The experiments show that, even being trained on synthetic data, the proposed system performs well on real world drone images with complex background. The USC drone detection and tracking dataset with user labeled bounding boxes is available to the public.
CVOct 29, 2017
A Saak Transform Approach to Efficient, Scalable and Robust Handwritten Digits RecognitionYueru Chen, Zhuwei Xu, Shanshan Cai et al.
An efficient, scalable and robust approach to the handwritten digits recognition problem based on the Saak transform is proposed in this work. First, multi-stage Saak transforms are used to extract a family of joint spatial-spectral representations of input images. Then, the Saak coefficients are used as features and fed into the SVM classifier for the classification task. In order to control the size of Saak coefficients, we adopt a lossy Saak transform that uses the principal component analysis (PCA) to select a smaller set of transform kernels. The handwritten digits recognition problem is well solved by the convolutional neural network (CNN) such as the LeNet-5. We conduct a comparative study on the performance of the LeNet-5 and the Saak-transform-based solutions in terms of scalability and robustness as well as the efficiency of lossless and lossy Saak transforms under a comparable accuracy level.
CVOct 11, 2017
On Data-Driven Saak TransformC. -C. Jay Kuo, Yueru Chen
Being motivated by the multilayer RECOS (REctified-COrrelations on a Sphere) transform, we develop a data-driven Saak (Subspace approximation with augmented kernels) transform in this work. The Saak transform consists of three steps: 1) building the optimal linear subspace approximation with orthonormal bases using the second-order statistics of input vectors, 2) augmenting each transform kernel with its negative, 3) applying the rectified linear unit (ReLU) to the transform output. The Karhunen-Loéve transform (KLT) is used in the first step. The integration of Steps 2 and 3 is powerful since they resolve the sign confusion problem, remove the rectification loss and allow a straightforward implementation of the inverse Saak transform at the same time. Multiple Saak transforms are cascaded to transform images of a larger size. All Saak transform kernels are derived from the second-order statistics of input random vectors in a one-pass feedforward manner. Neither data labels nor backpropagation is used in kernel determination. Multi-stage Saak transforms offer a family of joint spatial-spectral representations between two extremes; namely, the full spatial-domain representation and the full spectral-domain representation. We select Saak coefficients of higher discriminant power to form a feature vector for pattern recognition, and use the MNIST dataset classification problem as an illustrative example.
CVFeb 1, 2017
Design, Analysis and Application of A Volumetric Convolutional Neural NetworkXiaqing Pan, Yueru Chen, C. -C. Jay Kuo
The design, analysis and application of a volumetric convolutional neural network (VCNN) are studied in this work. Although many CNNs have been proposed in the literature, their design is empirical. In the design of the VCNN, we propose a feed-forward K-means clustering algorithm to determine the filter number and size at each convolutional layer systematically. For the analysis of the VCNN, the cause of confusing classes in the output of the VCNN is explained by analyzing the relationship between the filter weights (also known as anchor vectors) from the last fully-connected layer to the output. Furthermore, a hierarchical clustering method followed by a random forest classification method is proposed to boost the classification performance among confusing classes. For the application of the VCNN, we examine the 3D shape classification problem and conduct experiments on a popular ModelNet40 dataset. The proposed VCNN offers the state-of-the-art performance among all volume-based CNN methods.
CVJan 30, 2017
3D Shape Retrieval via Irrelevance Filtering and Similarity Ranking (IF/SR)Xiaqing Pan, Yueru Chen, C. -C. Jay Kuo
A novel solution for the content-based 3D shape retrieval problem using an unsupervised clustering approach, which does not need any label information of 3D shapes, is presented in this work. The proposed shape retrieval system consists of two modules in cascade: the irrelevance filtering (IF) module and the similarity ranking (SR) module. The IF module attempts to cluster gallery shapes that are similar to each other by examining global and local features simultaneously. However, shapes that are close in the local feature space can be distant in the global feature space, and vice versa. To resolve this issue, we propose a joint cost function that strikes a balance between two distances. Irrelevant samples that are close in the local feature space but distant in the global feature space can be removed in this stage. The remaining gallery samples are ranked in the SR module using the local feature. The superior performance of the proposed IF/SR method is demonstrated by extensive experiments conducted on the popular SHREC12 dataset.