CVJul 21, 2022Code
Human-centric Image Cropping with Partition-aware and Content-preserving FeaturesBo Zhang, Li Niu, Xing Zhao et al.
Image cropping aims to find visually appealing crops in an image, which is an important yet challenging task. In this paper, we consider a specific and practical application: human-centric image cropping, which focuses on the depiction of a person. To this end, we propose a human-centric image cropping method with two novel feature designs for the candidate crop: partition-aware feature and content-preserving feature. For partition-aware feature, we divide the whole image into nine partitions based on the human bounding box and treat different partitions in a candidate crop differently conditioned on the human information. For content-preserving feature, we predict a heatmap indicating the important content to be included in a good crop, and extract the geometric relation between the heatmap and a candidate crop. Extensive experiments demonstrate that our method can perform favorably against state-of-the-art image cropping methods on human-centric image cropping task. Code is available at https://github.com/bcmi/Human-Centric-Image-Cropping.
CVAug 1, 2022
Motion-aware Memory Network for Fast Video Salient Object DetectionXing Zhao, Haoran Liang, Peipei Li et al.
Previous methods based on 3DCNN, convLSTM, or optical flow have achieved great success in video salient object detection (VSOD). However, they still suffer from high computational costs or poor quality of the generated saliency maps. To solve these problems, we design a space-time memory (STM)-based network, which extracts useful temporal information of the current frame from adjacent frames as the temporal branch of VSOD. Furthermore, previous methods only considered single-frame prediction without temporal association. As a result, the model may not focus on the temporal information sufficiently. Thus, we initially introduce object motion prediction between inter-frame into VSOD. Our model follows standard encoder--decoder architecture. In the encoding stage, we generate high-level temporal features by using high-level features from the current and its adjacent frames. This approach is more efficient than the optical flow-based methods. In the decoding stage, we propose an effective fusion strategy for spatial and temporal branches. The semantic information of the high-level features is used to fuse the object details in the low-level features, and then the spatiotemporal features are obtained step by step to reconstruct the saliency maps. Moreover, inspired by the boundary supervision commonly used in image salient object detection (ISOD), we design a motion-aware loss for predicting object boundary motion and simultaneously perform multitask learning for VSOD and object motion prediction, which can further facilitate the model to extract spatiotemporal features accurately and maintain the object integrity. Extensive experiments on several datasets demonstrated the effectiveness of our method and can achieve state-of-the-art metrics on some datasets. The proposed model does not require optical flow or other preprocessing, and can reach a speed of nearly 100 FPS during inference.
CVOct 28, 2022
Matching entropy based disparity estimation from light fieldLigen Shi, Chang Liu, Di He et al.
A major challenge for matching-based depth estimation is to prevent mismatches in occlusion and smooth regions. An effective matching window satisfying three characteristics: texture richness, disparity consistency and anti-occlusion should be able to prevent mismatches to some extent. According to these characteristics, we propose matching entropy in the spatial domain of light field to measure the amount of correct information in a matching window, which provides the criterion for matching window selection. Based on matching entropy regularization, we establish an optimization model for depth estimation with a matching cost fidelity term. To find the optimum, we propose a two-step adaptive matching algorithm. First, the region type is adaptively determined to identify occluding, occluded, smooth and textured regions. Then, the matching entropy criterion is used to adaptively select the size and shape of matching windows, as well as the visible viewpoints. The two-step process can reduce mismatches and redundant calculations by selecting effective matching windows. The experimental results on synthetic and real data show that the proposed method can effectively improve the accuracy of depth estimation in occlusion and smooth regions and has strong robustness for different noise levels. Therefore, high-precision depth estimation from 4D light field data is achieved.
IVJul 22, 2024
Iterative approach to reconstructing neural disparity fields from light-field dataLigen Shi, Chang Liu, Xing Zhao et al.
This study proposes a neural disparity field (NDF) that establishes an implicit, continuous representation of scene disparity based on a neural field and an iterative approach to address the inverse problem of NDF reconstruction from light-field data. NDF enables seamless and precise characterization of disparity variations in three-dimensional scenes and can discretize disparity at any arbitrary resolution, overcoming the limitations of traditional disparity maps that are prone to sampling errors and interpolation inaccuracies. The proposed NDF network architecture utilizes hash encoding combined with multilayer perceptrons to capture detailed disparities in texture levels, thereby enhancing its ability to represent the geometric information of complex scenes. By leveraging the spatial-angular consistency inherent in light-field data, a differentiable forward model to generate a central view image from the light-field data is developed. Based on the forward model, an optimization scheme for the inverse problem of NDF reconstruction using differentiable propagation operators is established. Furthermore, an iterative solution method is adopted to reconstruct the NDF in the optimization scheme, which does not require training datasets and applies to light-field data captured by various acquisition methods. Experimental results demonstrate that high-quality NDF can be reconstructed from light-field data using the proposed method. High-resolution disparity can be effectively recovered by NDF, demonstrating its capability for the implicit, continuous representation of scene disparities.
72.6CVApr 2
Beyond Fixed Inference: Quantitative Flow Matching for Adaptive Image DenoisingJigang Duan, Genwei Ma, Xu Jiang et al.
Diffusion and flow-based generative models have shown strong potential for image restoration. However, image denoising under unknown and varying noise conditions remains challenging, because the learned vector fields may become inconsistent across different noise levels, leading to degraded restoration quality under mismatch between training and inference. To address this issue, we propose a quantitative flow matching framework for adaptive image denoising. The method first estimates the input noise level from local pixel statistics, and then uses this quantitative estimate to adapt the inference trajectory, including the starting point, the number of integration steps, and the step-size schedule. In this way, the denoising process is better aligned with the actual corruption level of each input, reducing unnecessary computation for lightly corrupted images while providing sufficient refinement for heavily degraded ones. By coupling quantitative noise estimation with noise-adaptive flow inference, the proposed method improves both restoration accuracy and inference efficiency. Extensive experiments on natural, medical, and microscopy images demonstrate its robustness and strong generalization across diverse noise levels and imaging conditions.
41.7LGApr 2
General Explicit Network (GEN): A novel deep learning architecture for solving partial differential equationsGenwei Ma, Ting Luo, Ping Yang et al.
Machine learning, especially physics-informed neural networks (PINNs) and their neural network variants, has been widely used to solve problems involving partial differential equations (PDEs). The successful deployment of such methods beyond academic research remains limited. For example, PINN methods primarily consider discrete point-to-point fitting and fail to account for the potential properties of real solutions. The adoption of continuous activation functions in these approaches leads to local characteristics that align with the equation solutions while resulting in poor extensibility and robustness. A general explicit network (GEN) that implements point-to-function PDE solving is proposed in this paper. The "function" component can be constructed based on our prior knowledge of the original PDEs through corresponding basis functions for fitting. The experimental results demonstrate that this approach enables solutions with high robustness and strong extensibility to be obtained.
SEFeb 15, 2024
Practitioners' Challenges and Perceptions of CI Build Failure Predictions at AtlassianYang Hong, Chakkrit Tantithamthavorn, Jirat Pasuksmit et al.
Continuous Integration (CI) build failures could significantly impact the software development process and teams, such as delaying the release of new features and reducing developers' productivity. In this work, we report on an empirical study that investigates CI build failures throughout product development at Atlassian. Our quantitative analysis found that the repository dimension is the key factor influencing CI build failures. In addition, our qualitative survey revealed that Atlassian developers perceive CI build failures as challenging issues in practice. Furthermore, we found that the CI build prediction can not only provide proactive insight into CI build failures but also facilitate the team's decision-making. Our study sheds light on the challenges and expectations involved in integrating CI build prediction tools into the Bitbucket environment, providing valuable insights for enhancing CI processes.
IVApr 10, 2024
Ray-driven Spectral CT Reconstruction Based on Neural Base-Material FieldsLigen Shi, Chang Liu, Ping Yang et al.
In spectral CT reconstruction, the basis materials decomposition involves solving a large-scale nonlinear system of integral equations, which is highly ill-posed mathematically. This paper proposes a model that parameterizes the attenuation coefficients of the object using a neural field representation, thereby avoiding the complex calculations of pixel-driven projection coefficient matrices during the discretization process of line integrals. It introduces a lightweight discretization method for line integrals based on a ray-driven neural field, enhancing the accuracy of the integral approximation during the discretization process. The basis materials are represented as continuous vector-valued implicit functions to establish a neural field parameterization model for the basis materials. The auto-differentiation framework of deep learning is then used to solve the implicit continuous function of the neural base-material fields. This method is not limited by the spatial resolution of reconstructed images, and the network has compact and regular properties. Experimental validation shows that our method performs exceptionally well in addressing the spectral CT reconstruction. Additionally, it fulfils the requirements for the generation of high-resolution reconstruction images.
CVOct 28, 2025
Physics-Inspired Gaussian Kolmogorov-Arnold Networks for X-ray Scatter Correction in Cone-Beam CTXu Jiang, Huiying Pan, Ligen Shi et al.
Cone-beam CT (CBCT) employs a flat-panel detector to achieve three-dimensional imaging with high spatial resolution. However, CBCT is susceptible to scatter during data acquisition, which introduces CT value bias and reduced tissue contrast in the reconstructed images, ultimately degrading diagnostic accuracy. To address this issue, we propose a deep learning-based scatter artifact correction method inspired by physical prior knowledge. Leveraging the fact that the observed point scatter probability density distribution exhibits rotational symmetry in the projection domain. The method uses Gaussian Radial Basis Functions (RBF) to model the point scatter function and embeds it into the Kolmogorov-Arnold Networks (KAN) layer, which provides efficient nonlinear mapping capabilities for learning high-dimensional scatter features. By incorporating the physical characteristics of the scattered photon distribution together with the complex function mapping capacity of KAN, the model improves its ability to accurately represent scatter. The effectiveness of the method is validated through both synthetic and real-scan experiments. Experimental results show that the model can effectively correct the scatter artifacts in the reconstructed images and is superior to the current methods in terms of quantitative metrics.
IRApr 5, 2021
A Non-sequential Approach to Deep User Interest Model for CTR PredictionKeke Zhao, Xing Zhao, Qi Cao et al.
Click-Through Rate (CTR) prediction plays an important role in many industrial applications, and recently a lot of attention is paid to the deep interest models which use attention mechanism to capture user interests from historical behaviors. However, most current models are based on sequential models which truncate the behavior sequences by a fixed length, thus have difficulties in handling very long behavior sequences. Another big problem is that sequences with the same length can be quite different in terms of time, carrying completely different meanings. In this paper, we propose a non-sequential approach to tackle the above problems. Specifically, we first represent the behavior data in a sparse key-vector format, where the vector contains rich behavior info such as time, count and category. Next, we enhance the Deep Interest Network to take such rich information into account by a novel attention network. The sparse representation makes it practical to handle large scale long behavior sequences. Finally, we introduce a multidimensional partition framework to mine behavior interactions. The framework can partition data into custom designed time buckets to capture the interactions among information aggregated in different time buckets. Similarly, it can also partition the data into different categories and capture the interactions among them. Experiments are conducted on two public datasets: one is an advertising dataset and the other is a production recommender dataset. Our models outperform other state-of-the-art models on both datasets.
CVMar 13, 2020
Mutual Information Maximization for Effective Lip ReadingXing Zhao, Shuang Yang, Shiguang Shan et al.
Lip reading has received an increasing research interest in recent years due to the rapid development of deep learning and its widespread potential applications. One key point to obtain good performance for the lip reading task depends heavily on how effective the representation can be to capture the lip movement information and meanwhile to resist the noises resulted from the change of pose, lighting conditions, speaker's appearance and so on. Towards this target, we propose to introduce the mutual information constraints on both the local feature's level and the global sequence's level to enhance the relations of the features with the speech content. On the one hand, we constraint the features generated at each time step to enable them carry a strong relation with the speech content by imposing the local mutual information maximization constraint (LMIM), leading to improvements over the model's ability to discover fine-grained lip movements and the fine-grained differences among words with similar pronunciation, such as ``spend'' and ``spending''. On the other hand, we introduce the mutual information maximization constraint on the global sequence's level (GMIM), to make the model be able to pay more attention to discriminate key frames related with the speech content, and less to various noises appeared in the speaking process. By combining these two advantages together, the proposed method is expected to be both discriminative and robust for effective lip reading. To verify this method, we evaluate on two large-scale benchmark. We perform a detailed analysis and comparison on several aspects, including the comparison of the LMIM and GMIM with the baseline, the visualization of the learned representation and so on. The results not only prove the effectiveness of the proposed method but also report new state-of-the-art performance on both the two benchmarks.
IRMar 4, 2020
Learning to Hash with Graph Neural Networks for Recommender SystemsQiaoyu Tan, Ninghao Liu, Xing Zhao et al.
Graph representation learning has attracted much attention in supporting high quality candidate search at scale. Despite its effectiveness in learning embedding vectors for objects in the user-item interaction network, the computational costs to infer users' preferences in continuous embedding space are tremendous. In this work, we investigate the problem of hashing with graph neural networks (GNNs) for high quality retrieval, and propose a simple yet effective discrete representation learning framework to jointly learn continuous and discrete codes. Specifically, a deep hashing with GNNs (HashGNN) is presented, which consists of two components, a GNN encoder for learning node representations, and a hash layer for encoding representations to hash codes. The whole architecture is trained end-to-end by jointly optimizing two losses, i.e., reconstruction loss from reconstructing observed links, and ranking loss from preserving the relative ordering of hash codes. A novel discrete optimization strategy based on straight through estimator (STE) with guidance is proposed. The principal idea is to avoid gradient magnification in back-propagation of STE with continuous embedding guidance, in which we begin from learning an easier network that mimic the continuous embedding and let it evolve during the training until it finally goes back to STE. Comprehensive experiments over three publicly available and one real-world Alibaba company datasets demonstrate that our model not only can achieve comparable performance compared with its continuous counterpart but also runs multiple times faster during inference.
LGJan 6, 2020
Elastic Bulk Synchronous Parallel Model for Distributed Deep LearningXing Zhao, Manos Papagelis, Aijun An et al.
The bulk synchronous parallel (BSP) is a celebrated synchronization model for general-purpose parallel computing that has successfully been employed for distributed training of machine learning models. A prevalent shortcoming of the BSP is that it requires workers to wait for the straggler at every iteration. To ameliorate this shortcoming of classic BSP, we propose ELASTICBSP a model that aims to relax its strict synchronization requirement. The proposed model offers more flexibility and adaptability during the training phase, without sacrificing on the accuracy of the trained model. We also propose an efficient method that materializes the model, named ZIPLINE. The algorithm is tunable and can effectively balance the trade-off between quality of convergence and iteration throughput, in order to accommodate different environments or applications. A thorough experimental evaluation demonstrates that our proposed ELASTICBSP model converges faster and to a higher accuracy than the classic BSP. It also achieves comparable (if not higher) accuracy than the other sensible synchronization models.
LGOct 28, 2019
Attenuating Random Noise in Seismic Data by a Deep Learning ApproachXing Zhao, Ping Lu, Yanyan Zhang et al.
In the geophysical field, seismic noise attenuation has been considered as a critical and long-standing problem, especially for the pre-stack data processing. Here, we propose a model to leverage the deep-learning model for this task. Rather than directly applying an existing de-noising model from ordinary images to the seismic data, we have designed a particular deep-learning model, based on residual neural networks. It is named as N2N-Seismic, which has a strong ability to recover the seismic signals back to intact condition with the preservation of primary signals. The proposed model, achieving with great success in attenuating noise, has been tested on two different seismic datasets. Several metrics show that our method outperforms conventional approaches in terms of Signal-to-Noise-Ratio, Mean-Squared-Error, Phase Spectrum, etc. Moreover, robust tests in terms of effectively removing random noise from any dataset with strong and weak noises have been extensively scrutinized in making sure that the proposed model is able to maintain a good level of adaptation while dealing with large variations of noise characteristics and intensities.
ROAug 18, 2019
Scene Classification in Indoor Environments for Robots using Context Based Word EmbeddingsBao Xin Chen, Raghavender Sahdev, Dekun Wu et al.
Scene Classification has been addressed with numerous techniques in computer vision literature. However, with the increasing number of scene classes in datasets in the field, it has become difficult to achieve high accuracy in the context of robotics. In this paper, we implement an approach which combines traditional deep learning techniques with natural language processing methods to generate a word embedding based Scene Classification algorithm. We use the key idea that context (objects in the scene) of an image should be representative of the scene label meaning a group of objects could assist to predict the scene class. Objects present in the scene are represented by vectors and the images are re-classified based on the objects present in the scene to refine the initial classification by a Convolutional Neural Network (CNN). In our approach we address indoor Scene Classification task using a model trained with a reduced pre-processed version of the Places365 dataset and an empirical analysis is done on a real-world dataset that we built by capturing image sequences using a GoPro camera. We also report results obtained on a subset of the Places365 dataset using our approach and additionally show a deployment of our approach on a robot operating in a real-world environment.
DCAug 16, 2019
Dynamic Stale Synchronous Parallel Distributed Training for Deep LearningXing Zhao, Aijun An, Junfeng Liu et al.
Deep learning is a popular machine learning technique and has been applied to many real-world problems. However, training a deep neural network is very time-consuming, especially on big data. It has become difficult for a single machine to train a large model over large datasets. A popular solution is to distribute and parallelize the training process across multiple machines using the parameter server framework. In this paper, we present a distributed paradigm on the parameter server framework called Dynamic Stale Synchronous Parallel (DSSP) which improves the state-of-the-art Stale Synchronous Parallel (SSP) paradigm by dynamically determining the staleness threshold at the run time. Conventionally to run distributed training in SSP, the user needs to specify a particular staleness threshold as a hyper-parameter. However, a user does not usually know how to set the threshold and thus often finds a threshold value through trial and error, which is time-consuming. Based on workers' recent processing time, our approach DSSP adaptively adjusts the threshold per iteration at running time to reduce the waiting time of faster workers for synchronization of the globally shared parameters, and consequently increases the frequency of parameters updates (increases iteration throughput), which speedups the convergence rate. We compare DSSP with other paradigms such as Bulk Synchronous Parallel (BSP), Asynchronous Parallel (ASP), and SSP by running deep neural networks (DNN) models over GPU clusters in both homogeneous and heterogeneous environments. The results show that in a heterogeneous environment where the cluster consists of mixed models of GPUs, DSSP converges to a higher accuracy much earlier than SSP and BSP and performs similarly to ASP. In a homogeneous distributed cluster, DSSP has more stable and slightly better performance than SSP and ASP, and converges much faster than BSP.
CVNov 9, 2015
Spectral-Spatial Classification of Hyperspectral Image Using AutoencodersZhouhan Lin, Yushi Chen, Xing Zhao et al.
Hyperspectral image (HSI) classification is a hot topic in the remote sensing community. This paper proposes a new framework of spectral-spatial feature extraction for HSI classification, in which for the first time the concept of deep learning is introduced. Specifically, the model of autoencoder is exploited in our framework to extract various kinds of features. First we verify the eligibility of autoencoder by following classical spectral information based classification and use autoencoders with different depth to classify hyperspectral image. Further in the proposed framework, we combine PCA on spectral dimension and autoencoder on the other two spatial dimensions to extract spectral-spatial information for classification. The experimental results show that this framework achieves the highest classification accuracy among all methods, and outperforms classical classifiers such as SVM and PCA-based SVM.