CVApr 19, 2023Code
Learning Temporal Distribution and Spatial Correlation Towards Universal Moving Object SegmentationGuanfang Dong, Chenqiu Zhao, Xichen Pan et al.
The goal of moving object segmentation is to separate moving objects from stationary backgrounds in videos. One major challenge in this problem is how to develop a universal model for videos from various natural scenes, since previous methods are often effective only in specific scenes. In this paper, we propose a method called Learning Temporal Distribution and Spatial Correlation (LTS) that has the potential to be a general solution for universal moving object segmentation. In the proposed approach, the distribution from temporal pixels is first learned by our Defect Iterative Distribution Learning (DIDL) network for a scene-independent segmentation. Notably, the DIDL network incorporates an improved product distribution layer that we have newly derived. Then, the Stochastic Bayesian Refinement (SBR) Network, which learns the spatial correlation, is proposed to improve the binary mask generated by the DIDL network. Benefiting from the scene independence of the temporal distribution and the accuracy improvement resulting from the spatial correlation, the proposed approach performs well for almost all videos from diverse and complex natural scenes with fixed parameters. Comprehensive experiments on standard datasets including LASIESTA, CDNet2014, BMC, SBMI2015 and 128 real world videos demonstrate the superiority of the proposed approach compared to state-of-the-art methods with or without the use of deep learning networks. To the best of our knowledge, this work has high potential to be a general solution for moving object segmentation in real world environments. The code and real-world videos can be found on GitHub: https://github.com/guanfangdong/LTS-UniverisalMOS.
CVJun 4, 2022
SPGNet: Spatial Projection Guided 3D Human Pose Estimation in Low Dimensional SpaceZihan Wang, Ruimin Chen, Mengxuan Liu et al.
We propose a method SPGNet for 3D human pose estimation that mixes multi-dimensional re-projection into supervised learning. In this method, the 2D-to-3D-lifting network predicts the global position and coordinates of the 3D human pose. Then, we re-project the estimated 3D pose back to the 2D key points along with spatial adjustments. The loss functions compare the estimated 3D pose with the 3D pose ground truth, and re-projected 2D pose with the input 2D pose. In addition, we propose a kinematic constraint to restrict the predicted target with constant human bone length. Based on the estimation results for the dataset Human3.6M, our approach outperforms many state-of-the-art methods both qualitatively and quantitatively.
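The loss structure described in this abstract (3D supervision, 2D re-projection consistency, and a bone-length constraint) can be sketched as follows. This is a minimal illustration only: the pinhole `project` function, the unit focal length, the weights `w_reproj` and `w_bone`, and the squared-error form of each term are assumptions, not the formulation used by SPGNet.

```python
import math

def project(point3d, focal=1.0):
    """Pinhole projection of a camera-space point (x, y, z); focal is an assumed value."""
    x, y, z = point3d
    return (focal * x / z, focal * y / z)

def mse(points_a, points_b):
    """Mean squared Euclidean distance between two equally long point lists."""
    return sum(math.dist(p, q) ** 2 for p, q in zip(points_a, points_b)) / len(points_a)

def bone_lengths(pose3d, bones):
    """Length of each bone, where a bone is a pair of joint indices."""
    return [math.dist(pose3d[i], pose3d[j]) for i, j in bones]

def total_loss(pred3d, gt3d, input2d, bones, ref_lengths, w_reproj=1.0, w_bone=1.0):
    """Hypothetical combined loss: 3D error + re-projection error + bone-length penalty."""
    loss_3d = mse(pred3d, gt3d)
    loss_2d = mse([project(p) for p in pred3d], input2d)
    lengths = bone_lengths(pred3d, bones)
    loss_bone = sum((l - r) ** 2 for l, r in zip(lengths, ref_lengths)) / len(bones)
    return loss_3d + w_reproj * loss_2d + w_bone * loss_bone

# Toy three-joint pose: a perfect prediction gives zero loss.
gt = [(0.0, 0.0, 2.0), (0.0, 1.0, 2.0), (1.0, 1.0, 2.0)]
bones = [(0, 1), (1, 2)]
keypoints = [project(p) for p in gt]
perfect = total_loss(gt, gt, keypoints, bones, [1.0, 1.0])
shifted = [(0.0, 0.0, 2.0), (0.0, 1.5, 2.0), (1.0, 1.0, 2.0)]
worse = total_loss(shifted, gt, keypoints, bones, [1.0, 1.0])
```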
LGAug 11, 2023
Learning Distributions via Monte-Carlo MarginalizationChenqiu Zhao, Guanfang Dong, Anup Basu
We propose a novel method to learn intractable distributions from their samples. The main idea is to use a parametric distribution model, such as a Gaussian Mixture Model (GMM), to approximate intractable distributions by minimizing the KL-divergence. Based on this idea, there are two challenges that need to be addressed. First, the computational complexity of the KL-divergence becomes unacceptable as the dimensionality of the distributions increases. Monte-Carlo Marginalization (MCMarg) is proposed to address this issue. The second challenge is the differentiability of the optimization process, since the target distribution is intractable. We handle this problem by using Kernel Density Estimation (KDE). The proposed approach is a powerful tool to learn complex distributions, and the entire process is differentiable. Thus, it can be a better substitute for the variational inference in variational auto-encoders (VAE). One strong piece of evidence for the benefit of our method is that the distributions learned by the proposed approach can generate better images even when based on a pre-trained VAE's decoder. Based on this point, we devise a distribution learning auto-encoder which is better than a VAE under the same network architecture. Experiments on standard datasets and synthetic data demonstrate the efficiency of the proposed approach.
CVApr 17, 2023
Frequency Regularization: Restricting Information Redundancy of Convolutional Neural NetworksChenqiu Zhao, Guanfang Dong, Shupei Zhang et al.
Convolutional neural networks have demonstrated impressive results in many computer vision tasks. However, the increasing size of these networks raises concerns about the information overload resulting from the large number of network parameters. In this paper, we propose Frequency Regularization to restrict the non-zero elements of the network parameters in the frequency domain. The proposed approach operates at the tensor level, and can be applied to almost all network architectures. Specifically, the tensors of parameters are maintained in the frequency domain, where high frequency components can be eliminated by setting tensor elements to zero in zigzag order. Then, the inverse discrete cosine transform (IDCT) is used to reconstruct the spatial tensors for matrix operations during network training. Since high frequency components of images are known to be less critical, a large proportion of these parameters can be set to zero when networks are trained with the proposed frequency regularization. Comprehensive evaluations on various state-of-the-art network architectures, including LeNet, Alexnet, VGG, Resnet, ViT, UNet, GAN, and VAE, demonstrate the effectiveness of the proposed frequency regularization. For a very small accuracy decrease (less than 2%), a LeNet5 with 0.4M parameters can be represented by only 776 float16 numbers (over 1100× reduction), and a UNet with 34M parameters can be represented by only 759 float16 numbers (over 80000× reduction). In particular, the original size of the UNet model is 366 MB; we reduce it to 4.5 KB.
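The idea of keeping a parameter tensor in the frequency domain, zeroing its high-frequency entries, and reconstructing spatial weights with the IDCT can be sketched in a few lines. This is an illustrative sketch under assumptions: it uses an orthonormal 2D inverse DCT written by hand, and it approximates the zigzag ordering by the anti-diagonal index `i + j`; the function names and the `keep_diagonals` parameter are hypothetical and not taken from the paper.

```python
import math

def idct_1d(coeffs):
    """Orthonormal inverse DCT (DCT-III) of a 1D coefficient list."""
    n = len(coeffs)
    out = []
    for t in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            total += scale * c * math.cos(math.pi * (t + 0.5) * k / n)
        out.append(total)
    return out

def idct_2d(freq):
    """Separable 2D inverse DCT: first along rows, then along columns."""
    rows = [idct_1d(r) for r in freq]
    cols = [idct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def zigzag_mask(freq, keep_diagonals):
    """Zero every coefficient whose anti-diagonal index i + j is >= keep_diagonals."""
    return [[v if i + j < keep_diagonals else 0.0 for j, v in enumerate(row)]
            for i, row in enumerate(freq)]

# A 4x4 frequency tensor holding only the DC term reconstructs to a constant kernel.
dc_only = [[4.0, 0.0, 0.0, 0.0]] + [[0.0] * 4 for _ in range(3)]
spatial = idct_2d(dc_only)

# Keeping two anti-diagonals of a 3x3 tensor leaves three non-zero coefficients.
masked = zigzag_mask([[1.0] * 3 for _ in range(3)], keep_diagonals=2)
nonzero = sum(1 for row in masked for v in row if v != 0.0)
```

In a training setup along these lines, only the retained coefficients would be stored and updated, and the spatial tensor would be rebuilt from them whenever it is needed.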
IVOct 31, 2023
Medical Image Denosing via Explainable AI Feature Preserving LossGuanfang Dong, Anup Basu
Denoising algorithms play a crucial role in medical image processing and analysis. However, classical denoising algorithms often fail to preserve explanatory and critical medical features, which may lead to misdiagnosis and legal liabilities. In this work, we propose a new denoising method for medical images that not only efficiently removes various types of noise, but also preserves key medical features throughout the process. To achieve this goal, we utilize a gradient-based eXplainable Artificial Intelligence (XAI) approach to design a feature preserving loss function. Our feature preserving loss function is motivated by the characteristic that gradient-based XAI is sensitive to noise. Through backpropagation, medical image features before and after denoising can be kept consistent. We conducted extensive experiments on three available medical image datasets, including 13 different types of synthesized noise and artifacts. The experimental results demonstrate the superiority of our method in terms of denoising performance, model explainability, and generalization.
LGAug 29, 2023
Bridging Distribution Learning and Image Clustering in High-dimensional SpaceGuanfang Dong, Chenqiu Zhao, Anup Basu
Distribution learning focuses on learning the probability density function from a set of data samples. In contrast, clustering aims to group similar objects together in an unsupervised manner. Usually, these two tasks are considered unrelated. However, the relationship between the two may be indirectly correlated, with Gaussian Mixture Models (GMM) acting as a bridge. In this paper, we focus on exploring the correlation between distribution learning and clustering, with the motivation to fill the gap between these two fields, utilizing an autoencoder (AE) to encode images into a high-dimensional latent space. Then, Monte-Carlo Marginalization (MCMarg) and Kullback-Leibler (KL) divergence loss are used to fit the Gaussian components of the GMM and learn the data distribution. Finally, image clustering is achieved through each Gaussian component of GMM. Yet, the "curse of dimensionality" poses severe challenges for most clustering algorithms. Compared with the classic Expectation-Maximization (EM) Algorithm, experimental results show that MCMarg and KL divergence can greatly alleviate the difficulty. Based on the experimental results, we believe distribution learning can exploit the potential of GMM in image clustering within high-dimensional space.
CVApr 14
SEDTalker: Emotion-Aware 3D Facial Animation Using Frame-Level Speech Emotion DiarizationFarzaneh Jafari, Stefano Berretti, Anup Basu
We introduce SEDTalker, an emotion-aware framework for speech-driven 3D facial animation that leverages frame-level speech emotion diarization to achieve fine-grained expressive control. Unlike prior approaches that rely on utterance-level or manually specified emotion labels, our method predicts temporally dense emotion categories and intensities directly from speech, enabling continuous modulation of facial expressions over time. The diarized emotion signals are encoded as learned embeddings and used to condition a speech-driven 3D animation model based on a hybrid Transformer-Mamba architecture. This design allows effective disentanglement of linguistic content and emotional style while preserving identity and temporal coherence. We evaluate our approach on a large-scale multi-corpus dataset for speech emotion diarization and on the EmoVOCA dataset for emotional 3D facial animation. Quantitative results demonstrate strong frame-level emotion recognition performance and low geometric and temporal reconstruction errors, while qualitative results show smooth emotion transitions and consistent expression control. These findings highlight the effectiveness of frame-level emotion diarization for expressive and controllable 3D talking head generation.
CVAug 3, 2024
JambaTalk: Speech-Driven 3D Talking Head Generation Based on Hybrid Transformer-Mamba ModelFarzaneh Jafari, Stefano Berretti, Anup Basu
In recent years, talking head generation has become a focal point for researchers. Considerable effort is being made to refine lip-sync motion, capture expressive facial expressions, generate natural head poses, and achieve high-quality video. However, no single model has yet achieved equivalence across all quantitative and qualitative metrics. We introduce Jamba, a hybrid Transformer-Mamba model, to animate a 3D face. Mamba, a pioneering Structured State Space Model (SSM) architecture, was developed to overcome the limitations of conventional Transformer architectures, particularly in handling long sequences. This challenge has constrained traditional models. Jamba combines the advantages of both the Transformer and Mamba approaches, offering a comprehensive solution. Based on the foundational Jamba block, we present JambaTalk to enhance motion variety and lip sync through multimodal integration. Extensive experiments reveal that our method achieves performance comparable or superior to state-of-the-art models.
LGAug 6, 2024
Deep Clustering via Distribution LearningGuanfang Dong, Zijie Tan, Chenqiu Zhao et al.
Distribution learning finds probability density functions from a set of data samples, whereas clustering aims to group similar data points to form clusters. Although there are deep clustering methods that employ distribution learning methods, past work still lacks theoretical analysis regarding the relationship between clustering and distribution learning. Thus, in this work, we provide a theoretical analysis to guide the optimization of clustering via distribution learning. To achieve better results, we embed deep clustering guided by a theoretical analysis. Furthermore, the distribution learning method cannot always be directly applied to data. To overcome this issue, we introduce a clustering-oriented distribution learning method called Monte-Carlo Marginalization for Clustering. We integrate Monte-Carlo Marginalization for Clustering into Deep Clustering, resulting in Deep Clustering via Distribution Learning (DCDL). Eventually, the proposed DCDL achieves promising results compared to state-of-the-art methods on popular datasets. Considering a clustering task, the new distribution learning method outperforms previous methods as well.
CVAug 25, 2023
Is Deep Learning Network Necessary for Image Generation?Chenqiu Zhao, Guanfang Dong, Anup Basu
Recently, images have come to be regarded as samples from a high-dimensional distribution, and deep learning has become almost synonymous with image generation. However, is a deep learning network truly necessary for image generation? In this paper, we investigate the possibility of image generation without using a deep learning network, motivated by validating the assumption that images follow a high-dimensional distribution. Since images are assumed to be samples from such a distribution, we utilize the Gaussian Mixture Model (GMM) to describe it. In particular, we employ a recent distribution learning technique called Monte-Carlo Marginalization to capture the parameters of the GMM based on image samples. Moreover, we also use the Singular Value Decomposition (SVD) for dimensionality reduction to decrease computational complexity. During our evaluation experiment, we first attempt to model the distribution of image samples directly to verify the assumption that images truly follow a distribution. We then use the SVD for dimensionality reduction. The principal components, rather than raw image data, are used for distribution learning. Compared to methods relying on deep learning networks, our approach is more explainable, and its performance is promising. Experiments show that our images have a lower FID value compared to those generated by variational auto-encoders, demonstrating the feasibility of image generation without deep learning networks.
MLJun 10, 2018Code
IVUS-Net: An Intravascular Ultrasound Segmentation NetworkJi Yang, Lin Tong, Mehdi Faraji et al.
IntraVascular UltraSound (IVUS) is one of the most effective imaging modalities that assists experts in diagnosing and treating cardiovascular diseases. We address a central problem in IVUS image analysis with a Fully Convolutional Network (FCN): automatically delineating the lumen and media-adventitia borders in IVUS images, which is crucial for shortening the diagnosis process and enables a faster and more accurate 3D reconstruction of the artery. In particular, we propose an FCN architecture, called IVUS-Net, followed by a post-processing contour extraction step, in order to automatically segment the interior (lumen) and exterior (media-adventitia) regions of human arteries. We evaluated our IVUS-Net on the test set of a standard publicly available dataset containing 326 IVUS B-mode images with two measurements, namely Jaccard Measure (JM) and Hausdorff Distance (HD). The evaluation results show that IVUS-Net outperforms the state-of-the-art lumen and media segmentation methods by 4% to 20% in terms of HD. IVUS-Net performs well on images in the test set that contain a significant amount of major artifacts such as bifurcations, shadows, and side branches that are not common in the training set. Furthermore, using a modern GPU, IVUS-Net segments each IVUS frame in only 0.15 seconds. The proposed work, to the best of our knowledge, is the first deep learning based method for segmentation of both the lumen and the media vessel walls in 20 MHz IVUS B-mode images that achieves the best results without any manual intervention. Code is available at https://github.com/Kulbear/ivus-segmentation-icsm2018
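The two evaluation measures named in this abstract have standard definitions, sketched below for small inputs. The function names and the toy data are illustrative; the code follows the textbook definitions of the Jaccard index and the symmetric Hausdorff distance rather than the evaluation scripts of the paper, and distances are in pixel units rather than millimetres.

```python
import math

def jaccard(mask_a, mask_b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two binary masks given as nested lists."""
    a = {(r, c) for r, row in enumerate(mask_a) for c, v in enumerate(row) if v}
    b = {(r, c) for r, row in enumerate(mask_b) for c, v in enumerate(row) if v}
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def hausdorff(points_a, points_b):
    """Symmetric Hausdorff distance between two non-empty contour point sets."""
    def directed(src, dst):
        # Largest distance from a point in src to its nearest point in dst.
        return max(min(math.dist(p, q) for q in dst) for p in src)
    return max(directed(points_a, points_b), directed(points_b, points_a))

jm = jaccard([[1, 1], [0, 0]], [[1, 0], [0, 0]])
hd = hausdorff([(0, 0), (1, 0)], [(0, 0), (4, 0)])
```

The brute-force Hausdorff computation is quadratic in the number of contour points, which is acceptable for a single contour per frame.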
CVJun 26, 2025
Exploring Image Generation via Mutually Exclusive Probability Spaces and Local Correlation HypothesisChenqiu Zhao, Anup Basu
A common assumption in probabilistic generative models for image generation is that learning the global data distribution suffices to generate novel images via sampling. We investigate the limitation of this core assumption, namely that learning global distributions leads to memorization rather than generative behavior. We propose two theoretical frameworks, the Mutually Exclusive Probability Space (MEPS) and the Local Dependence Hypothesis (LDH), for investigation. MEPS arises from the observation that deterministic mappings (e.g. neural networks) involving random variables tend to reduce overlap coefficients among involved random variables, thereby inducing exclusivity. We further propose a lower bound in terms of the overlap coefficient, and introduce a Binary Latent Autoencoder (BL-AE) that encodes images into signed binary latent representations. LDH formalizes dependence within a finite observation radius, which motivates our $γ$-Autoregressive Random Variable Model ($γ$-ARVM). $γ$-ARVM is an autoregressive model, with a variable observation range $γ$, that predicts a histogram for the next token. Using $γ$-ARVM, we observe that as the observation range increases, autoregressive models progressively shift toward memorization. In the limit of global dependence, the model behaves as a pure memorizer when operating on the binary latents produced by our BL-AE. Comprehensive experiments and discussions support our investigation.
CVSep 1, 2023
Affine-Transformation-Invariant Image Classification by Differentiable Arithmetic Distribution ModuleZijie Tan, Guanfang Dong, Chenqiu Zhao et al.
Although Convolutional Neural Networks (CNNs) have achieved promising results in image classification, they are still vulnerable to affine transformations including rotation, translation, flip and shuffle. This drawback motivates us to design a module which can alleviate the impact of different affine transformations. Thus, in this work, we introduce a more robust substitute by incorporating distribution learning techniques, focusing particularly on learning the spatial distribution information of pixels in images. To rectify the issue of non-differentiability of prior distribution learning methods that rely on traditional histograms, we adopt Kernel Density Estimation (KDE) to formulate differentiable histograms. On this foundation, we present a novel Differentiable Arithmetic Distribution Module (DADM), which is designed to extract the intrinsic probability distributions from images. The proposed approach is able to enhance the model's robustness to affine transformations without sacrificing its feature extraction capabilities, thus bridging the gap between traditional CNNs and distribution-based learning. We validate the effectiveness of the proposed approach through an ablation study and comparative experiments with LeNet.
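A KDE-style "soft" histogram replaces hard bin assignment with smooth kernel weights, which is what makes it usable inside gradient-based training. The sketch below is a plain-Python illustration under assumptions: a Gaussian kernel, a hypothetical `bandwidth` of 0.05, per-sample weight normalization, and pixel values that lie within the range covered by the bin centers. It is not the DADM itself. It also shows why a histogram is unaffected by shuffling pixels.

```python
import math

def soft_histogram(values, bin_centers, bandwidth=0.05):
    """Gaussian-kernel soft histogram; every sample spreads unit mass over the bins."""
    hist = [0.0] * len(bin_centers)
    for v in values:
        weights = [math.exp(-0.5 * ((v - c) / bandwidth) ** 2) for c in bin_centers]
        total = sum(weights)  # assumed non-zero: v lies near at least one bin center
        for i, w in enumerate(weights):
            hist[i] += w / total
    return [h / len(values) for h in hist]

centers = [i / 8 + 1 / 16 for i in range(8)]   # 8 bins covering [0, 1]
pixels = [0.1, 0.4, 0.4, 0.9]
h_original = soft_histogram(pixels, centers)
h_shuffled = soft_histogram(list(reversed(pixels)), centers)
```

Because each weight is a smooth function of the pixel value, the histogram varies continuously with the input, unlike a hard-binned count.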
CVDec 21, 2021
Real-time Street Human Motion CaptureYanquan Chen, Fei Yang, Tianyu Lang et al.
In recent years, motion capture technology using computers has developed rapidly. Because of its high efficiency and excellent performance, it has replaced many traditional methods and is being widely used in many fields. Our project is about capturing and analyzing human motion in street scene videos. The primary goal of the project is to capture the human motion in a video and use the motion information for 3D (human) animation in real time. We applied a neural network for motion capture and implemented it in Unity under a street view scene. By analyzing the motion data, we can obtain a better estimate of the street condition, which is useful for other high-tech applications such as self-driving cars.
CVApr 16, 2021
Universal Background Subtraction based on Arithmetic Distribution Neural NetworkChenqiu Zhao, Kangkang Hu, Anup Basu
We propose a universal background subtraction framework based on the Arithmetic Distribution Neural Network (ADNN) for learning the distributions of temporal pixels. In our ADNN model, the arithmetic distribution operations are utilized to introduce the arithmetic distribution layers, including the product distribution layer and the sum distribution layer. Furthermore, in order to improve the accuracy of the proposed approach, an improved Bayesian refinement model based on neighboring information, with a GPU implementation, is incorporated. In the forward pass and backpropagation of the proposed arithmetic distribution layers, histograms are considered as probability density functions rather than matrices. Thus, the proposed approach is able to utilize the probability information of the histogram and achieve promising results with a very simple architecture compared to traditional convolutional neural networks. Evaluations using standard benchmarks demonstrate the superiority of the proposed approach compared to state-of-the-art traditional and deep learning methods. To the best of our knowledge, this is the first method to propose network layers based on arithmetic distribution operations for learning distributions during background subtraction.
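The arithmetic distribution operations that the layers are named after have simple textbook forms for independent discrete random variables: the distribution of a sum is the convolution of the two histograms, and the distribution of a product accumulates probability mass at each product of support values. The sketch below illustrates only that arithmetic, with distributions stored as hypothetical `{value: probability}` dictionaries; it is not the ADNN layer implementation and has no learnable parameters.

```python
from collections import defaultdict

def sum_distribution(px, py):
    """Distribution of X + Y for independent discrete X and Y (a convolution)."""
    out = defaultdict(float)
    for x, p in px.items():
        for y, q in py.items():
            out[x + y] += p * q
    return dict(out)

def product_distribution(px, py):
    """Distribution of X * Y for independent discrete X and Y."""
    out = defaultdict(float)
    for x, p in px.items():
        for y, q in py.items():
            out[x * y] += p * q
    return dict(out)

coin = {0: 0.5, 1: 0.5}
z_sum = sum_distribution(coin, coin)
z_prod = product_distribution(coin, coin)
```

Treating a histogram as a probability function in this way, rather than as a plain matrix, is the viewpoint the abstract describes.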
CVFeb 8, 2021
Subjective and Objective Visual Quality Assessment of Textured 3D MeshesJinjiang Guo, Vincent Vidal, Irene Cheng et al.
Objective visual quality assessment of 3D models is a fundamental issue in computer graphics. Quality assessment metrics may allow a wide range of processes to be guided and evaluated, such as level of detail creation, compression, filtering, and so on. Most computer graphics assets are composed of geometric surfaces on which several texture images can be mapped to make the rendering more realistic. While some quality assessment metrics exist for geometric surfaces, almost no research has been conducted on the evaluation of texture-mapped 3D models. In this context, we present a new subjective study to evaluate the perceptual quality of textured meshes, based on a paired comparison protocol. We introduce both texture and geometry distortions on a set of 5 reference models to produce a database of 136 distorted models, evaluated using two rendering protocols. Based on analysis of the results, we propose two new metrics for visual quality assessment of textured meshes, as optimized linear combinations of accurate geometry and texture quality measurements. These proposed perceptual metrics outperform their counterparts in terms of correlation with human opinion. The database, along with the associated subjective scores, will be made publicly available online.
CVMay 1, 2019
A note on 'A fully parallel 3D thinning algorithm and its applications'Tao Wang, Anup Basu
A 3D thinning algorithm erodes a 3D binary image layer by layer to extract the skeletons. This paper presents a correction to Ma and Sonka's thinning algorithm, A fully parallel 3D thinning algorithm and its applications, which fails to preserve connectivity of 3D objects. We start with Ma and Sonka's algorithm and examine its verification of connectivity preservation. Our analysis leads to a group of different deleting templates, which can preserve connectivity of 3D objects.
IVMay 1, 2019
Fully Automatic Brain Tumor Segmentation using a Normalized Gaussian Bayesian Classifier and 3D Fluid Vector FlowTao Wang, Irene Cheng, Anup Basu
Brain tumor segmentation from Magnetic Resonance Images (MRIs) is an important task to measure tumor responses to treatments. However, automatic segmentation is very challenging. This paper presents an automatic brain tumor segmentation method based on a Normalized Gaussian Bayesian classification and a new 3D Fluid Vector Flow (FVF) algorithm. In our method, a Normalized Gaussian Mixture Model (NGMM) is proposed and used to model the healthy brain tissues. A Gaussian Bayesian Classifier is exploited to acquire a Gaussian Bayesian Brain Map (GBBM) from the test brain MR images. The GBBM is further processed to initialize the 3D FVF algorithm, which segments the brain tumor. This algorithm has two major contributions. First, we present an NGMM to model healthy brains. Second, we extend our 2D FVF algorithm to 3D space and use it for brain tumor segmentation. The proposed method is validated on a publicly available dataset.
CVJul 17, 2018
A Fast Segmentation-free Fully Automated Approach to White Matter Injury Detection in Preterm InfantsSubhayan Mukherjee, Irene Cheng, Steven Miller et al.
White Matter Injury (WMI) is the most prevalent brain injury in the preterm neonate leading to developmental deficits. However, detecting WMI in Magnetic Resonance (MR) images of preterm neonate brains using traditional WM segmentation-based methods is difficult mainly due to lack of reliable preterm neonate brain atlases to guide segmentation. Hence, we propose a segmentation-free, fast, unsupervised, atlas-free WMI detection method. We detect the ventricles as blobs using a fast linear Maximally Stable Extremal Regions algorithm. A reference contour equidistant from the blobs and the brain-background boundary is used to identify tissue adjacent to the blobs. Assuming normal distribution of the gray-value intensity of this tissue, the outlier intensities in the entire brain region are identified as potential WMI candidates. Thereafter, false positives are discriminated using appropriate heuristics. Experiments using an expert-annotated dataset show that the proposed method runs 20 times faster than our earlier work which relied on time-consuming segmentation of the WM region, without compromising WMI detection accuracy.
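The outlier step in this abstract, which assumes that the gray-value intensity of the reference tissue is normally distributed, can be sketched as a z-score test. The two-sided test and the threshold `k = 3.0` are assumptions chosen for illustration; the abstract does not state the cut-off or whether only one tail is used, and the later heuristic filtering of false positives is not shown.

```python
import statistics

def outlier_mask(reference, candidates, k=3.0):
    """Flag candidate intensities more than k standard deviations from the reference mean."""
    mu = statistics.fmean(reference)
    sigma = statistics.pstdev(reference)
    return [abs(v - mu) > k * sigma for v in candidates]

# Toy intensities sampled from tissue adjacent to the ventricles (illustrative values).
reference_tissue = [100, 102, 98, 101, 99]
flags = outlier_mask(reference_tissue, [100, 104, 120])
```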
CVJun 29, 2018
Simplified Active CalibrationMehdi Faraji, Anup Basu
We present a new mathematical formulation to estimate the intrinsic parameters of a camera in active or robotic platforms. We show that the focal lengths can be estimated using only one point correspondence that relates images taken before and after a degenerate rotation of the camera. The estimated focal lengths are then treated as known parameters to obtain a linear set of equations to calculate the principal point. Assuming that the principal point is close to the image center, the accuracy of the linear equations is increased by integrating the image center into the formulation. We extensively evaluate the formulations on a simulated camera, 3D scenes and real-world images. Our error analysis over simulated and real images indicates that the proposed Simplified Active Calibration method estimates the parameters of a camera with low error rates, so the estimates can be used as an initial guess for further non-linear refinement procedures. Simplified Active Calibration can be employed in real-time environments for automatic calibration given the proposed closed-form solutions.
CVJun 19, 2018
Towards the identification of Parkinson's Disease using only T1 MR ImagesSara Soltaninejad, Irene Cheng, Anup Basu
Parkinson's Disease (PD) is one of the most common types of neurological diseases caused by progressive degeneration of dopaminergic neurons in the brain. Even though there is no fixed cure for this neurodegenerative disease, earlier diagnosis followed by earlier treatment can help patients have a better quality of life. Magnetic Resonance Imaging (MRI) has been one of the most popular diagnostic tools in recent years because it avoids harmful radiation. In this paper, we investigate the plausibility of using MRIs for automatically diagnosing PD. Our proposed method has three main steps: 1) Preprocessing, 2) Feature Extraction, and 3) Classification. The FreeSurfer library is used for the first and the second steps. For classification, three main types of classifiers, including Logistic Regression (LR), Random Forest (RF) and Support Vector Machine (SVM), are applied and their classification ability is compared. The Parkinson's Progression Markers Initiative (PPMI) data set is used to evaluate the proposed method. The proposed system proves to be promising in assisting the diagnosis of PD.
CVJun 10, 2018
Segmentation of Arterial Walls in Intravascular Ultrasound Cross-Sectional Images Using Extremal Region SelectionMehdi Faraji, Irene Cheng, Iris Naudin et al.
Intravascular Ultrasound (IVUS) is an intra-operative imaging modality that facilitates observing and appraising the vessel wall structure of the human coronary arteries. Segmentation of arterial wall boundaries from the IVUS images is not only crucial for quantitative analysis of the vessel walls and plaque characteristics, but is also necessary for generating 3D reconstructed models of the artery. The aim of this study is twofold. Firstly, we investigate the feasibility of using a recently proposed region detector, namely Extremal Region of Extremum Level (EREL) to delineate the luminal and media-adventitia borders in IVUS frames acquired by 20 MHz probes. Secondly, we propose a region selection strategy to label two ERELs as lumen and media based on the stability of their textural information. We extensively evaluated our selection strategy on the test set of a standard publicly available dataset containing 326 IVUS B-mode images. We showed that in the best case, the average Hausdorff Distances (HD) between the extracted ERELs and the actual lumen and media were 0.22 mm and 0.45 mm, respectively. The results of our experiments revealed that our selection strategy was able to segment the lumen with ≤ 0.3 mm HD to the gold standard even though the images contained major artifacts such as bifurcations, shadows, and side branches. Moreover, when there was no artifact, our proposed method was able to delineate media-adventitia boundaries with 0.31 mm HD to the gold standard. Furthermore, our proposed segmentation method runs in time that is linear in the number of pixels in each frame. Based on the results of this work, by using a 20 MHz IVUS probe with controlled pullback, not only can we now analyze the internal structure of human arteries more accurately, but also segment each frame during the pullback procedure because of the low run time of our proposed segmentation method.
CVJun 10, 2018
A Simplified Active Calibration algorithm for Focal Length EstimationMehdi Faraji, Anup Basu
We introduce new linear mathematical formulations to calculate the focal length of a camera in an active platform. Through mathematical derivations, we show that the focal lengths in each direction can be estimated using only one point correspondence that relates images taken before and after a degenerate rotation of the camera. The new formulations will be beneficial in robotic and dynamic surveillance environments when the camera needs to be calibrated while it freely moves and zooms. By establishing a correspondence between only two images taken after slightly panning and tilting the camera and a reference image, our proposed Simplified Calibration Method is able to calculate the focal length of the camera. We extensively evaluate the derived formulations on a simulated camera, 3D scenes and real-world images. Our error analysis over simulated and real images indicates that the proposed Simplified Active Calibration formulation estimates the parameters of a camera with low error rates.
HCNov 30, 2017
Investigation of Gaze Patterns in Multi View Laparoscopic SurgeryNavaneeth Kamballur Kottayil, Rositsa Bogdanova, Irene Cheng et al.
Laparoscopic Surgery (LS) is a modern surgical technique whereby the surgery is performed through an incision with tools and a camera, as opposed to conventional open surgery. This promises minimal recovery times and less hemorrhaging. Multi-view LS is the latest development in the field, where the system uses multiple cameras to give the surgeon more information about the surgical site, potentially making the surgery easier. In this publication, we study the gaze patterns of a high performing subject in a multi-view LS environment and compare them with those of a novice to detect the differences in gaze behavior. This was done by conducting a user study with 20 university students with varying levels of expertise in multi-view LS. The subjects performed a laparoscopic task in simulation with three cameras (front/top/side). The subjects were then separated into high and low performers depending on their performance times, and their data was analyzed. Our results show statistically significant differences between the two behaviors. This opens up new areas, from training novices in multi-view LS to making smart displays that show the optimum view depending on the situation.
MMNov 30, 2017
A Color Intensity Invariant Low Level Feature Optimization Framework for Image Quality AssessmentNavaneeth K. Kottayil, Irene Cheng, Frederic Dufaux et al.
Image Quality Assessment (IQA) algorithms evaluate the perceptual quality of an image using evaluation scores that assess the similarity or difference between two images. We propose a new low-level feature based IQA technique, which applies filter-bank decomposition and center-surround methodology. Differing from existing methods, our model incorporates color intensity adaptation and frequency scaling optimization at each filter-bank level and spatial orientation to extract and enhance perceptually significant features. Our computational model exploits the concept of object detection and encapsulates characteristics proposed in other IQA algorithms in a unified architecture. We also propose a systematic approach to review the evolution of IQA algorithms using unbiased test datasets, instead of looking at individual scores in isolation. Experimental results demonstrate the feasibility of our approach.
CVNov 28, 2017
Highlighting objects of interest in an image by integrating saliency and depthSubhayan Mukherjee, Irene Cheng, Anup Basu
Stereo images have been captured primarily for 3D reconstruction in the past. However, the depth information acquired from stereo can also be used along with saliency to highlight certain objects in a scene. This approach can be used to make still images more interesting to look at, and highlight objects of interest in the scene. We introduce this novel direction in this paper, and discuss the theoretical framework behind the approach. Even though we use depth from stereo in this work, our approach is applicable to depth data acquired from any sensor modality. Experimental results on both indoor and outdoor scenes demonstrate the benefits of our algorithm.
CVNov 28, 2017
Entropy-difference based stereo error detectionSubhayan Mukherjee, Irene Cheng, Ram Mohana Reddy Guddeti et al.
Stereo depth estimation is error-prone; hence, effective error detection methods are desirable. Most existing methods depend on characteristics of the stereo matching cost curve, making them unduly dependent on functional details of the matching algorithm. As a remedy, we propose a novel error detection approach based solely on the input image and its depth map. Our assumption is that the entropy of any point on an image will be significantly higher than the entropy of its corresponding point on the image's depth map. In this paper, we propose a confidence measure, Entropy-Difference (ED), for stereo depth estimates and a binary classification method to identify incorrect depths. Experiments on the Middlebury dataset show the effectiveness of our method. Our proposed stereo confidence measure outperforms 17 existing measures in all aspects except occlusion detection. Established metrics such as precision, accuracy, recall, and area-under-curve are used to demonstrate the effectiveness of our method.
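The entropy comparison behind the Entropy-Difference measure can be sketched with Shannon entropy computed over a small window around a point in the image and over the matching window in the depth map. The window-based entropy estimate, the function names, and the toy patches are assumptions for illustration; the abstract does not specify how the entropy of a point is computed or how the binary classifier uses the measure.

```python
import math
from collections import Counter

def patch_entropy(patch):
    """Shannon entropy (in bits) of the value histogram of a 2D patch."""
    values = [v for row in patch for v in row]
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def entropy_difference(image_patch, depth_patch):
    """Image-patch entropy minus depth-patch entropy for corresponding windows."""
    return patch_entropy(image_patch) - patch_entropy(depth_patch)

# A textured image window over a smooth (constant) depth window gives a large difference.
textured = [[1, 2], [3, 4]]
flat_depth = [[5, 5], [5, 5]]
ed = entropy_difference(textured, flat_depth)
```

Under the stated assumption, a small or negative difference at a point would suggest that the depth estimate there deserves less confidence.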