Shenglan Liu

CV
18 papers
180 citations
Novelty 47%
AI Score 29

18 Papers

CV · Sep 27, 2023 · Code
End-to-End Streaming Video Temporal Action Segmentation with Reinforce Learning

Jinrong Zhang, Wujun Wen, Shenglan Liu et al.

The streaming temporal action segmentation (STAS) task, a supplementary task of temporal action segmentation (TAS), has not received adequate attention in the field of video understanding. Existing TAS methods are constrained to offline scenarios due to their heavy reliance on multimodal features and complete contextual information. The STAS task requires the model to classify each frame of the entire untrimmed video sequence clip by clip in time, thereby extending the applicability of TAS methods to online scenarios. However, directly applying existing TAS methods to STAS tasks results in markedly poor segmentation outcomes. In this paper, we thoroughly analyze the fundamental differences between STAS tasks and TAS tasks, attributing the severe performance degradation when transferring models to modeling bias and optimization dilemmas. We introduce an end-to-end streaming video temporal action segmentation model with reinforcement learning (SVTAS-RL). The end-to-end modeling method mitigates the modeling bias introduced by the change in task nature and enhances the feasibility of online solutions. Reinforcement learning is utilized to alleviate the optimization dilemma. Through extensive experiments, the SVTAS-RL model significantly outperforms existing STAS models and achieves performance competitive with the state-of-the-art TAS model on multiple datasets under the same evaluation criteria, demonstrating notable advantages on the ultra-long video dataset EGTEA. Code is available at https://github.com/Thinksky5124/SVTAS.
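To make the clip-by-clip setting concrete, here is a minimal sketch of streaming inference: the video is consumed as consecutive clips, and each frame is labeled using only the clip it belongs to. This is not the SVTAS-RL model. The function names, the non-overlapping clip layout, and the toy `threshold_classifier` are all assumptions made for illustration.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def iter_clips(num_frames: int, clip_len: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) indices of consecutive, non-overlapping clips."""
    if clip_len <= 0:
        raise ValueError("clip_len must be positive")
    for start in range(0, num_frames, clip_len):
        yield start, min(start + clip_len, num_frames)


def segment_streaming(
    frames: Sequence[float],
    clip_len: int,
    classify_clip: Callable[[Sequence[float]], List[str]],
) -> List[str]:
    """Label every frame, seeing one clip at a time (no future context)."""
    labels: List[str] = []
    for start, end in iter_clips(len(frames), clip_len):
        clip = frames[start:end]
        clip_labels = classify_clip(clip)
        if len(clip_labels) != len(clip):
            raise ValueError("classifier must return one label per frame")
        labels.extend(clip_labels)
    return labels


def threshold_classifier(clip: Sequence[float]) -> List[str]:
    """Hypothetical stand-in for a per-clip model; frames are plain floats."""
    return ["move" if value > 0.5 else "idle" for value in clip]
```

An offline TAS method would instead receive all frames at once; the sketch only shows the change in how input is presented, not how the paper addresses the resulting performance drop.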

CV · Aug 18, 2022
Spatial Temporal Graph Attention Network for Skeleton-Based Action Recognition

Lianyu Hu, Shenglan Liu, Wei Feng

Current methods in skeleton-based action recognition mainly focus on capturing long-term temporal dependencies, because skeleton sequences are typically long (>128 frames), which poses a challenging problem for previous approaches. Under such conditions, short-term dependencies, which are critical for classifying similar actions, are seldom formally considered. Most current approaches consist of interleaved spatial-only and temporal-only modules, in which direct information flow among joints in adjacent frames is hindered, making them less able to capture short-term motion and distinguish similar action pairs. To address this limitation, we propose a general framework, coined STGAT, to model cross-spacetime information flow. It equips the spatial-only modules with spatial-temporal modeling for regional perception. While STGAT is theoretically effective for spatial-temporal modeling, we propose three simple modules to reduce local spatial-temporal feature redundancy and further release the potential of STGAT, which (1) narrow the scope of the self-attention mechanism, (2) dynamically weight joints along the temporal dimension, and (3) separate subtle motion from static features, respectively. As a robust feature extractor, STGAT generalizes better than previous methods when classifying similar actions, as shown by both qualitative and quantitative results. STGAT achieves state-of-the-art performance on three large-scale datasets: NTU RGB+D 60, NTU RGB+D 120, and Kinetics Skeleton 400. Code is released.

SD · Jan 24, 2019 · Code
Bottom-up Broadcast Neural Network For Music Genre Classification

Caifeng Liu, Lin Feng, Guochao Liu et al.

Music genre recognition based on visual representation has been successfully explored over the last years. Recently, there has been increasing interest in applying convolutional neural networks (CNNs) to the task. However, most existing methods employ the mature CNN structures proposed for image recognition without any modification, which results in learned features that are not adequate for music genre classification. To address this issue, we fully exploit the low-level information from audio spectrograms and develop a novel CNN architecture in this paper. The proposed CNN architecture takes long contextual information into consideration, which transfers more suitable information to the decision-making layer. Various experiments on several benchmark datasets, including GTZAN, Ballroom, and Extended Ballroom, have verified the excellent performance of the proposed neural network. Code and models will be available at https://github.com/CaifengLiu/music-genre-classification.

CV · Aug 26, 2024
More Pictures Say More: Visual Intersection Network for Open Set Object Detection

Bingcheng Dong, Yuning Ding, Jinrong Zhang et al.

Open Set Object Detection has seen rapid development recently, but it continues to pose significant challenges. Language-based methods, grappling with the substantial modal disparity between textual and visual modalities, require extensive computational resources to bridge this gap. Although integrating visual prompts into these frameworks shows promise for enhancing performance, it always comes with constraints related to textual semantics. In contrast, visual-only methods suffer from the low-quality fusion of multiple visual prompts. In response, we introduce a strong DETR-based model, Visual Intersection Network for Open Set Object Detection (VINO), which constructs a multi-image visual bank to preserve the semantic intersections of each category across all time steps. Our innovative multi-image visual updating mechanism learns to identify the semantic intersections from various visual prompts, enabling the flexible incorporation of new information and continuous optimization of feature representations. Our approach guarantees a more precise alignment between target category semantics and region semantics, while significantly reducing pre-training time and resource demands compared to language-based methods. Furthermore, the integration of a segmentation head illustrates the broad applicability of visual intersection in various visual tasks. VINO, which requires only 7 RTX4090 GPU days to complete one epoch on the Objects365v1 dataset, achieves competitive performance on par with vision-language models on benchmarks such as LVIS and ODinW35.

SI · Feb 5, 2021
Self-Supervised Deep Graph Embedding with High-Order Information Fusion for Community Discovery

Shuliang Xu, Shenglan Liu, Lin Feng

Deep graph embedding is an important approach for community discovery. A deep graph neural network with a self-supervised mechanism can obtain low-dimensional embedding vectors of nodes from unlabeled and unstructured graph data. The high-order information of a graph can provide richer structural information for the representation learning of nodes. However, most self-supervised graph neural networks use only the adjacency matrix as the input topology information and cannot capture very high-order information, since the number of layers of a graph neural network is fairly limited: with too many layers, over-smoothing appears. How to obtain and fuse high-order graph information with a shallow graph neural network is therefore an important problem. In this paper, a deep graph embedding algorithm with a self-supervised mechanism for community discovery is proposed. The proposed algorithm uses the self-supervised mechanism and different high-order information of the graph to train multiple deep graph convolutional neural networks. The outputs of these networks are fused to extract node representations that include both the attribute and the structure information of the graph. In addition, data augmentation and negative sampling are introduced into the training process to improve the embedding result. The proposed algorithm and the comparison algorithms are evaluated on five experimental data sets. The experimental results show that the proposed algorithm outperforms the comparison algorithms on most of the data sets and is an effective algorithm for community discovery.

LG · Nov 22, 2020
Angular Embedding: A New Angular Robust Principal Component Analysis

Shenglan Liu, Yang Yu

As a widely used method in machine learning, principal component analysis (PCA) shows excellent properties for dimensionality reduction. A serious problem is that PCA is sensitive to outliers, a weakness that numerous Robust PCA (RPCA) variants have sought to address. However, the existing state-of-the-art RPCA approaches cannot easily remove or tolerate outliers in a non-iterative manner. To tackle this issue, this paper proposes Angular Embedding (AE) to formulate a straightforward RPCA approach based on angular density, which is improved for large-scale or high-dimensional data. Furthermore, a trimmed AE (TAE) is introduced to deal with data containing large-scale outliers. Extensive experiments on both synthetic and real-world datasets with vector-level or pixel-level outliers demonstrate that the proposed AE/TAE outperforms the state-of-the-art RPCA-based methods.

LG · Jun 29, 2020
Local Neighbor Propagation Embedding

Shenglan Liu, Yang Yu

Manifold learning plays a vital role in the field of nonlinear dimensionality reduction, and its ideas also inform other related methods. Graph-based methods such as Graph Convolutional Networks (GCN) share ideas with manifold learning, although they belong to different fields. Inspired by GCN, we introduce neighbor propagation into Locally Linear Embedding (LLE) and propose Local Neighbor Propagation Embedding (LNPE). With only a linear increase in computational complexity compared with LLE, LNPE enhances the local connections and interactions between neighborhoods by extending $1$-hop neighbors into $n$-hop neighbors. The experimental results show that LNPE can obtain more faithful and robust embeddings with better topological and geometrical properties.
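The idea of extending 1-hop neighbors into n-hop neighbors can be sketched on a plain neighbor graph. The code below only illustrates the neighborhood expansion; it does not compute LLE reconstruction weights or the LNPE embedding, and the function name and the dictionary-based graph representation are assumptions.

```python
from typing import Dict, Set


def n_hop_neighbors(one_hop: Dict[int, Set[int]], node: int, n: int) -> Set[int]:
    """Return every node reachable from `node` in at most n hops (itself excluded)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    visited = {node}
    frontier = {node}
    for _ in range(n):
        next_frontier: Set[int] = set()
        for current in frontier:
            # Propagate through the 1-hop neighbors of the current frontier.
            next_frontier |= one_hop.get(current, set()) - visited
        visited |= next_frontier
        frontier = next_frontier
        if not frontier:  # Nothing new to reach; stop early.
            break
    return visited - {node}


# Toy chain graph 0 - 1 - 2 - 3 - 4 (e.g. a 1-nearest-neighbor style graph).
chain = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
```

On the chain graph, node 0 has a single 1-hop neighbor, while increasing `n` pulls in nodes further along the chain, which is the sense in which neighborhoods start to interact.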

LG · Sep 16, 2019
Hierarchic Neighbors Embedding

Shenglan Liu, Yang Yu, Yang Liu et al.

Manifold learning now plays a very important role in machine learning and many relevant applications. Despite its superior performance in dealing with nonlinear data distributions, data sparsity remains a thorny problem, and few studies in manifold learning handle it well. In this paper, we propose Hierarchic Neighbors Embedding (HNE), which enhances local connections through the hierarchic combination of neighbors. After further analyzing topological connection and reconstruction performance, three different versions of HNE are given. The experimental results show that our methods work well on both synthetic data and high-dimensional real-world tasks. HNE shows clear advantages in dealing with general data. Furthermore, compared with other popular manifold learning methods, HNE performs better on sparse samples and weakly connected manifolds.

CV · May 10, 2019
A fast online cascaded regression algorithm for face alignment

Lin Feng, Caifeng Liu, Shenglan Liu et al.

Traditional face alignment based on machine learning usually tracks the locations of facial landmarks with a static model trained offline, where all of the training data is available in advance. When new training samples arrive, the static model must be retrained from scratch, which is excessively time-consuming and memory-consuming. In many real-time applications, the training data is obtained one by one or batch by batch. As a result, the static model's performance is limited on sequential images with extensive variations. Therefore, the most critical and challenging aspect in this field is dynamically updating the tracker's models to continuously enhance predictive and generalization capabilities. To address this problem, we develop a fast and accurate online learning algorithm for face alignment. In particular, we incorporate the online sequential extreme learning machine into a parallel cascaded regression framework, coined incremental cascade regression (ICR). To the best of our knowledge, this is the first incremental cascaded framework with a non-linear regressor. One main advantage of ICR is that the tracker model can be updated quickly in an incremental way, without the entire retraining process, when a new input arrives. Experimental results demonstrate that the proposed ICR is more accurate and efficient on still or sequential images than recent state-of-the-art cascade approaches. Furthermore, the incremental learning proposed in this paper can update the trained model in real time.

CV · Jan 11, 2019
Color Recognition for Rubik's Cube Robot

Shenglan Liu, Dong Jiang, Lin Feng et al.

In this paper, we propose three methods for color recognition of a Rubik's cube: one offline method and two online methods. Scatter balance & extreme learning machine (SB-ELM), an offline method, is proposed to illustrate the efficiency of training-based methods. We also introduce the concept of color drifting, which indicates that offline methods are ineffective and cannot work well under continuously changing conditions. By contrast, dynamic weight label propagation is proposed for labeling block colors from the known center block colors of the Rubik's cube. Furthermore, weak label hierarchic propagation, another online method, is proposed for the case where all color information is unknown and only the weak labels of the center blocks are used in color recognition. Finally, we design a Rubik's cube robot and construct a dataset to illustrate the efficiency and effectiveness of our online methods and to show the ineffectiveness of the offline method under color drifting in our dataset.

CV · Nov 8, 2018
Multi-view Laplacian Eigenmaps Based on Bag-of-Neighbors For RGBD Human Emotion Recognition

Shenglan Liu, Shuai Guo, Hong Qiao et al.

Human emotion recognition is an important direction in the field of biometrics and information forensics. However, most existing human emotion research is based on a single RGB view. In this paper, we introduce an RGBD video-emotion dataset and an RGBD face-emotion dataset for research. To the best of our knowledge, this may be the first RGBD video-emotion dataset. We propose a new supervised nonlinear multi-view Laplacian eigenmaps (MvLE) approach and a multihidden-layer out-of-sample network (MHON) for RGB-D human emotion recognition. To get better representations of the RGB view and the depth view, MvLE is used to map the training set of both views from the original space into a common subspace. As the RGB view and the depth view lie in different spaces, a new distance metric, bag of neighbors (BON), is used in MvLE to obtain similar distributions for the two views. Finally, MHON is used to get the low-dimensional representations of test data and predict their labels. MvLE can deal with cases in which the RGB view and the depth view have different feature sizes, or even different numbers of samples and classes. Our methods can also be easily extended to more than two views. The experimental results indicate the effectiveness of our methods over some state-of-the-art methods.

CV · Oct 25, 2018
Perceptual Visual Interactive Learning

Shenglan Liu, Xiang Liu, Yang Liu et al.

Supervised learning methods are widely used in machine learning. However, the lack of labels in existing data limits the application of these technologies. Compared with purely computer-based labeling, visual interactive learning (VIL) can avoid the semantic gap and solve the labeling problem of small label quantity (SLQ) samples in a groundbreaking way. In order to fully understand the importance of VIL to the interaction process, we re-summarize algorithms related to interactive learning (e.g. clustering, classification, retrieval) from the perspective of VIL. Note that perception and cognition are the two main visual processes of VIL. On this basis, we propose a perceptual visual interactive learning (PVIL) framework, which adopts Gestalt principles to design the interaction strategy and multi-dimensionality reduction (MDR) to optimize the visualization process. The advantage of the PVIL framework is that it combines the computer's sensitivity to detailed features with the human's overall understanding of global tasks. Experimental results validate that the framework is superior to traditional computer labeling methods (such as label propagation) in both accuracy and efficiency, and that it achieves significant classification results on datasets with dense distributions and sparse classes.

CV · Nov 12, 2017
Hand Gesture Recognition with Leap Motion

Youchen Du, Shenglan Liu, Lin Feng et al.

The recent introduction of depth cameras such as the Leap Motion Controller allows researchers to exploit depth information to recognize hand gestures more robustly. This paper proposes a novel hand gesture recognition system based on the Leap Motion Controller. A series of features is extracted from Leap Motion tracking data, and these features are fed, along with HOG features extracted from the sensor images, into a multi-class SVM classifier to recognize the performed gesture. Dimension reduction and feature-weighted fusion are also discussed. Our results show that our model is much more accurate than previous work.

LG · Oct 30, 2017
Rough extreme learning machine: a new classification method based on uncertainty measure

Lin Feng, Shuliang Xu, Feilong Wang et al.

Extreme learning machine (ELM) is a single-hidden-layer feedforward neural network. The weights of the input layer and the biases of the hidden-layer neurons are randomly generated, and the weights of the output layer can be determined analytically. ELM has achieved good results on a large number of classification tasks. In this paper, a new extreme learning machine called rough extreme learning machine (RELM) is proposed. RELM uses rough sets to divide the data into an upper approximation set and a lower approximation set, and the two approximation sets are used to train upper approximation neurons and lower approximation neurons. In addition, attribute reduction is performed in this algorithm to remove redundant attributes. The experimental results show that, compared with the comparison algorithms, RELM achieves better accuracy and repeatability in most cases. RELM not only retains the advantage of fast speed but also copes effectively with classification tasks on high-dimensional data.
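The split into lower and upper approximation sets follows the standard rough-set definitions: samples with identical attribute values form an equivalence class, the lower approximation collects the classes lying entirely inside the target set, and the upper approximation collects the classes that overlap it. The sketch below shows only this split for discrete attributes. How RELM maps the two sets to its neurons is not stated in the abstract, and the function and variable names are assumptions.

```python
from collections import defaultdict
from typing import Dict, Hashable, Set, Tuple


def rough_approximations(
    samples: Dict[str, Tuple[Hashable, ...]],
    target: Set[str],
) -> Tuple[Set[str], Set[str]]:
    """Return (lower, upper) approximations of `target`.

    `samples` maps a sample id to its tuple of discrete attribute values.
    """
    # Group indiscernible samples: identical attribute tuples share a class.
    classes: Dict[Tuple[Hashable, ...], Set[str]] = defaultdict(set)
    for sample_id, attributes in samples.items():
        classes[attributes].add(sample_id)

    lower: Set[str] = set()
    upper: Set[str] = set()
    for block in classes.values():
        if block <= target:   # Class lies entirely inside the target set.
            lower |= block
        if block & target:    # Class overlaps the target set.
            upper |= block
    return lower, upper


# Toy data: "a" and "b" are indiscernible, so "a" cannot be certainly in the target.
toy_samples = {"a": (0, 1), "b": (0, 1), "c": (1, 0), "d": (1, 1)}
toy_lower, toy_upper = rough_approximations(toy_samples, {"a", "c"})
```

Here only "c" is certainly in the target class, while "a", "b", and "c" are possibly in it; the difference between the two sets is the boundary region that carries the uncertainty.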

CV · Mar 24, 2017
Feature Fusion using Extended Jaccard Graph and Stochastic Gradient Descent for Robot

Shenglan Liu, Muxin Sun, Wei Wang et al.

Robot vision is a fundamental component of human-robot interaction and complex robot tasks. In this paper, we use Kinect and propose feature graph fusion (FGF) for robot recognition. Our feature fusion utilizes the RGB and depth information from Kinect to construct a fused feature. FGF uses multi-Jaccard similarity to compute a robust graph and utilizes a word embedding method to enhance the recognition results. We also collect the DUT RGB-D face dataset and a benchmark dataset to evaluate the effectiveness and efficiency of our method. The experimental results illustrate that FGF is robust and effective on face and object datasets in robot applications.

CV · Mar 11, 2017
Neural method for Explicit Mapping of Quasi-curvature Locally Linear Embedding in image retrieval

Shenglan Liu, Jun Wu, Lin Feng et al.

This paper proposes a new explicit nonlinear dimensionality reduction method using neural networks for image retrieval tasks. We first propose Quasi-curvature Locally Linear Embedding (QLLE) for the training set. QLLE guarantees the linear criterion in the neighborhood of each sample. Then, a neural method (NM) is proposed for the out-of-sample problem. Combining QLLE and NM, we provide an explicit nonlinear dimensionality reduction approach for efficient image retrieval. The experimental results on three benchmark datasets illustrate that our method performs better than other state-of-the-art out-of-sample methods.

CV · Sep 24, 2016
Perceptual uniform descriptor and Ranking on manifold: A bridge between image representation and ranking for image retrieval

Shenglan Liu, Jun Wu, Lin Feng et al.

The incompatibility between image descriptors and ranking is always neglected in image retrieval. In this paper, manifold learning and Gestalt psychology theory are used to solve the incompatibility problem. A new holistic descriptor called Perceptual Uniform Descriptor (PUD), based on Gestalt psychology, is proposed, which combines color and gradient direction to imitate human visual uniformity. PUD features of images in the same class lie on one manifold in most cases, because PUD improves the visual uniformity of traditional descriptors. Thus, we use manifold ranking and PUD to realize image retrieval. Experiments were carried out on five benchmark data sets, and the proposed method greatly improves the accuracy of image retrieval. Our experimental results on the Ukbench and Corel-1K datasets show that the N-S score reaches 3.58 (HSV 3.4) and the mAP reaches 81.77% (ODBTC 77.9%), respectively, using PUD, which has only 280 dimensions. The results are higher than those of other holistic image descriptors (even some local ones) and state-of-the-art retrieval methods.
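Manifold ranking is commonly implemented as an iterative score propagation over an affinity graph, f ← αSf + (1 − α)y, where S is the symmetrically normalized affinity matrix and y marks the query. The sketch below assumes that standard formulation; the abstract does not say which variant the paper uses. It does not compute PUD features, and the hand-written affinity matrix, parameter values, and function name are hypothetical.

```python
import math
from typing import List


def manifold_ranking(
    affinity: List[List[float]],
    query: int,
    alpha: float = 0.9,
    iterations: int = 200,
) -> List[float]:
    """Propagate the query's score over a graph with non-negative affinities."""
    n = len(affinity)
    degree = [sum(row) for row in affinity]
    # Symmetric normalization S = D^{-1/2} W D^{-1/2}; zero entries stay zero.
    normalized = [
        [
            affinity[i][j] / math.sqrt(degree[i] * degree[j]) if affinity[i][j] else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]
    initial = [1.0 if i == query else 0.0 for i in range(n)]
    scores = initial[:]
    for _ in range(iterations):
        scores = [
            alpha * sum(normalized[i][j] * scores[j] for j in range(n))
            + (1.0 - alpha) * initial[i]
            for i in range(n)
        ]
    return scores


# Toy graph: images 0 - 1 - 2 are linked in a chain; image 3 is isolated.
toy_affinity = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
toy_scores = manifold_ranking(toy_affinity, query=0)
```

With image 0 as the query, image 1 scores higher than image 2 because it sits closer on the graph, and the isolated image 3 receives a score of zero. Sorting the database images by score gives the retrieval order.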

CV · Sep 24, 2016
Three Tiers Neighborhood Graph and Multi-graph Fusion Ranking for Multi-feature Image Retrieval: A Manifold Aspect

Shenglan Liu, Muxin Sun, Lin Feng et al.

A single feature cannot adequately describe the content of an image, which is a shortcoming of traditional image retrieval. One image can be described by different features, and multi-feature fusion ranking can be used to improve the ranking list of a query. In this paper, we first analyze graph structure and multi-feature fusion re-ranking from the manifold perspective. Then, a Three Tiers Neighborhood Graph (TTNG) is constructed to re-rank the original single-feature ranking list and to enhance the precision of a single feature. Furthermore, we propose Multi-graph Fusion Ranking (MFR) for multi-feature ranking, which considers the correlation of all images in multiple neighborhood graphs. Evaluations are conducted on the UK-bench, Corel-1K, Corel-10K and CIFAR-10 benchmark datasets. The experimental results show that our TTNG and MFR outperform other state-of-the-art methods. For example, we achieve a competitive N-S score of 3.91 and a precision of 65.00% on the UK-bench and Corel-10K datasets, respectively.