CVAug 24, 2023Code
StreamMapNet: Streaming Mapping Network for Vectorized Online HD Map ConstructionTianyuan Yuan, Yicheng Liu, Yue Wang et al.
High-Definition (HD) maps are essential for the safety of autonomous driving systems. While existing techniques employ camera images and onboard sensors to generate vectorized high-precision maps, they are constrained by their reliance on single-frame input. This approach limits their stability and performance in complex scenarios such as occlusions, largely due to the absence of temporal information. Moreover, their performance diminishes when applied to broader perception ranges. In this paper, we present StreamMapNet, a novel online mapping pipeline adept at long-sequence temporal modeling of videos. StreamMapNet employs multi-point attention and temporal information which empowers the construction of large-range local HD maps with high stability and further addresses the limitations of existing methods. Furthermore, we critically examine widely used online HD Map construction benchmark and datasets, Argoverse2 and nuScenes, revealing significant bias in the existing evaluation protocols. We propose to resplit the benchmarks according to geographical spans, promoting fair and precise evaluations. Experimental results validate that StreamMapNet significantly outperforms existing methods across all settings while maintaining an online inference speed of $14.2$ FPS. Our code is available at https://github.com/yuantianyuan01/StreamMapNet.
CVMay 2, 2022Code
MUTR3D: A Multi-camera Tracking Framework via 3D-to-2D QueriesTianyuan Zhang, Xuanyao Chen, Yue Wang et al.
Accurate and consistent 3D tracking from multiple cameras is a key component in a vision-based autonomous driving system. It involves modeling 3D dynamic objects in complex scenes across multiple cameras. This problem is inherently challenging due to depth estimation, visual occlusions, appearance ambiguity, etc. Moreover, objects are not consistently associated across time and cameras. To address that, we propose an end-to-end \textbf{MU}lti-camera \textbf{TR}acking framework called MUTR3D. In contrast to prior works, MUTR3D does not explicitly rely on the spatial and appearance similarity of objects. Instead, our method introduces \textit{3D track query} to model spatial and appearance coherent track for each object that appears in multiple cameras and multiple frames. We use camera transformations to link 3D trackers with their observations in 2D images. Each tracker is further refined according to the features that are obtained from camera images. MUTR3D uses a set-to-set loss to measure the difference between the predicted tracking results and the ground truths. Therefore, it does not require any post-processing such as non-maximum suppression and/or bounding box association. MUTR3D outperforms state-of-the-art methods by 5.3 AMOTA on the nuScenes dataset. Code is available at: \url{https://github.com/a1600012888/MUTR3D}.
CVApr 27, 2023
Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous DrivingXiaoyu Tian, Tao Jiang, Longfei Yun et al.
Robotic perception requires the modeling of both 3D geometry and semantics. Existing methods typically focus on estimating 3D bounding boxes, neglecting finer geometric details and struggling to handle general, out-of-vocabulary objects. 3D occupancy prediction, which estimates the detailed occupancy states and semantics of a scene, is an emerging task to overcome these limitations. To support 3D occupancy prediction, we develop a label generation pipeline that produces dense, visibility-aware labels for any given scene. This pipeline comprises three stages: voxel densification, occlusion reasoning, and image-guided voxel refinement. We establish two benchmarks, derived from the Waymo Open Dataset and the nuScenes Dataset, namely Occ3D-Waymo and Occ3D-nuScenes benchmarks. Furthermore, we provide an extensive analysis of the proposed dataset with various baseline models. Lastly, we propose a new model, dubbed Coarse-to-Fine Occupancy (CTF-Occ) network, which demonstrates superior performance on the Occ3D benchmarks. The code, data, and benchmarks are released at https://tsinghua-mars-lab.github.io/Occ3D/.
CVMar 28, 2023
Facial recognition technology and human raters can predict political orientation from images of expressionless faces even when controlling for demographics and self-presentationMichal Kosinski, Poruz Khambatta, Yilun Wang · stanford
Carefully standardized facial images of 591 participants were taken in the laboratory, while controlling for self-presentation, facial expression, head orientation, and image properties. They were presented to human raters and a facial recognition algorithm: both humans (r=.21) and the algorithm (r=.22) could predict participants' scores on a political orientation scale (Cronbach's alpha=.94) decorrelated with age, gender, and ethnicity. These effects are on par with how well job interviews predict job success, or alcohol drives aggressiveness. Algorithm's predictive accuracy was even higher (r=.31) when it leveraged information on participants' age, gender, and ethnicity. Moreover, the associations between facial appearance and political orientation seem to generalize beyond our sample: The predictive model derived from standardized images (while controlling for age, gender, and ethnicity) could predict political orientation (r=.13) from naturalistic images of 3,401 politicians from the U.S., UK, and Canada. The analysis of facial features associated with political orientation revealed that conservatives tended to have larger lower faces. The predictability of political orientation from standardized images has critical implications for privacy, the regulation of facial recognition technology, and understanding the origins and consequences of political orientation.
CVJun 17, 2022
VectorMapNet: End-to-end Vectorized HD Map LearningYicheng Liu, Tianyuan Yuan, Yue Wang et al.
Autonomous driving systems require High-Definition (HD) semantic maps to navigate around urban roads. Existing solutions approach the semantic mapping problem by offline manual annotation, which suffers from serious scalability issues. Recent learning-based methods produce dense rasterized segmentation predictions to construct maps. However, these predictions do not include instance information of individual map elements and require heuristic post-processing to obtain vectorized maps. To tackle these challenges, we introduce an end-to-end vectorized HD map learning pipeline, termed VectorMapNet. VectorMapNet takes onboard sensor observations and predicts a sparse set of polylines in the bird's-eye view. This pipeline can explicitly model the spatial relation between map elements and generate vectorized maps that are friendly to downstream autonomous driving tasks. Extensive experiments show that VectorMapNet achieve strong map learning performance on both nuScenes and Argoverse2 dataset, surpassing previous state-of-the-art methods by 14.2 mAP and 14.6mAP. Qualitatively, VectorMapNet is capable of generating comprehensive maps and capturing fine-grained details of road geometry. To the best of our knowledge, VectorMapNet is the first work designed towards end-to-end vectorized map learning from onboard observations. Our project website is available at \url{https://tsinghua-mars-lab.github.io/vectormapnet/}.
CVMar 20, 2022
FUTR3D: A Unified Sensor Fusion Framework for 3D DetectionXuanyao Chen, Tianyuan Zhang, Yue Wang et al.
Sensor fusion is an essential topic in many perception systems, such as autonomous driving and robotics. Existing multi-modal 3D detection models usually involve customized designs depending on the sensor combinations or setups. In this work, we propose the first unified end-to-end sensor fusion framework for 3D detection, named FUTR3D, which can be used in (almost) any sensor configuration. FUTR3D employs a query-based Modality-Agnostic Feature Sampler (MAFS), together with a transformer decoder with a set-to-set loss for 3D detection, thus avoiding using late fusion heuristics and post-processing tricks. We validate the effectiveness of our framework on various combinations of cameras, low-resolution LiDARs, high-resolution LiDARs, and Radars. On NuScenes dataset, FUTR3D achieves better performance over specifically designed methods across different sensor combinations. Moreover, FUTR3D achieves great flexibility with different sensor configurations and enables low-cost autonomous driving. For example, only using a 4-beam LiDAR with cameras, FUTR3D (58.0 mAP) achieves on par performance with state-of-the-art 3D detection model CenterPoint (56.6 mAP) using a 32-beam LiDAR.
CVAug 2, 2022
ViP3D: End-to-end Visual Trajectory Prediction via 3D Agent QueriesJunru Gu, Chenxu Hu, Tianyuan Zhang et al.
Perception and prediction are two separate modules in the existing autonomous driving systems. They interact with each other via hand-picked features such as agent bounding boxes and trajectories. Due to this separation, prediction, as a downstream module, only receives limited information from the perception module. To make matters worse, errors from the perception modules can propagate and accumulate, adversely affecting the prediction results. In this work, we propose ViP3D, a query-based visual trajectory prediction pipeline that exploits rich information from raw videos to directly predict future trajectories of agents in a scene. ViP3D employs sparse agent queries to detect, track, and predict throughout the pipeline, making it the first fully differentiable vision-based trajectory prediction approach. Instead of using historical feature maps and trajectories, useful information from previous timestamps is encoded in agent queries, which makes ViP3D a concise streaming prediction method. Furthermore, extensive experimental results on the nuScenes dataset show the strong vision-based prediction performance of ViP3D over traditional pipelines and previous end-to-end models.
CVApr 17, 2023
Neural Map Prior for Autonomous DrivingXuan Xiong, Yicheng Liu, Tianyuan Yuan et al.
High-definition (HD) semantic maps are crucial in enabling autonomous vehicles to navigate urban environments. The traditional method of creating offline HD maps involves labor-intensive manual annotation processes, which are not only costly but also insufficient for timely updates. Recent studies have proposed an alternative approach that generates local maps using online sensor observations. However, this approach is limited by the sensor's perception range and its susceptibility to occlusions. In this study, we propose Neural Map Prior (NMP), a neural representation of global maps. This representation automatically updates itself and improves the performance of local map inference. Specifically, we utilize two approaches to achieve this. Firstly, to integrate a strong map prior into local map inference, we apply cross-attention, a mechanism that dynamically identifies correlations between current and prior features. Secondly, to update the global neural map prior, we utilize a learning-based fusion module that guides the network in fusing features from previous traversals. Our experimental results, based on the nuScenes dataset, demonstrate that our framework is highly compatible with various map segmentation and detection architectures. It significantly improves map prediction performance, even in challenging weather conditions and situations with a longer perception range. To the best of our knowledge, this is the first learning-based system for creating a global map prior.
SEJan 20Code
Why Does the LLM Stop Computing: An Empirical Study of User-Reported Failures in Open-Source LLMsGuangba Yu, Zirui Wang, Yujie Huang et al.
The democratization of open-source Large Language Models (LLMs) allows users to fine-tune and deploy models on local infrastructure but exposes them to a First Mile deployment landscape. Unlike black-box API consumption, the reliability of user-managed orchestration remains a critical blind spot. To bridge this gap, we conduct the first large-scale empirical study of 705 real-world failures from the open-source DeepSeek, Llama, and Qwen ecosystems. Our analysis reveals a paradigm shift: white-box orchestration relocates the reliability bottleneck from model algorithmic defects to the systemic fragility of the deployment stack. We identify three key phenomena: (1) Diagnostic Divergence: runtime crashes distinctively signal infrastructure friction, whereas incorrect functionality serves as a signature for internal tokenizer defects. (2) Systemic Homogeneity: Root causes converge across divergent series, confirming reliability barriers are inherent to the shared ecosystem rather than specific architectures. (3) Lifecycle Escalation: Barriers escalate from intrinsic configuration struggles during fine-tuning to compounded environmental incompatibilities during inference. Supported by our publicly available dataset, these insights provide actionable guidance for enhancing the reliability of the LLM landscape.
ITJul 5, 2010Code
Sparse Signal Reconstruction via Iterative Support DetectionYilun Wang, Wotao Yin
We present a novel sparse signal reconstruction method "ISD", aiming to achieve fast reconstruction and a reduced requirement on the number of measurements compared to the classical l_1 minimization approach. ISD addresses failed reconstructions of l_1 minimization due to insufficient measurements. It estimates a support set I from a current reconstruction and obtains a new reconstruction by solving the minimization problem \min{\sum_{i\not\in I}|x_i|:Ax=b}, and it iterates these two steps for a small number of times. ISD differs from the orthogonal matching pursuit (OMP) method, as well as its variants, because (i) the index set I in ISD is not necessarily nested or increasing and (ii) the minimization problem above updates all the components of x at the same time. We generalize the Null Space Property to Truncated Null Space Property and present our analysis of ISD based on the latter. We introduce an efficient implementation of ISD, called threshold--ISD, for recovering signals with fast decaying distributions of nonzeros from compressive sensing measurements. Numerical experiments show that threshold--ISD has significant advantages over the classical l_1 minimization approach, as well as two state--of--the--art algorithms: the iterative reweighted l_1 minimization algorithm (IRL1) and the iterative reweighted least--squares algorithm (IRLS). MATLAB code is available for download from http://www.caam.rice.edu/~optimization/L1/ISD/.
CVJun 2, 2025
Towards Scalable Video Anomaly Retrieval: A Synthetic Video-Text BenchmarkShuyu Yang, Yilun Wang, Yaxiong Wang et al.
Video anomaly retrieval aims to localize anomalous events in videos using natural language queries to facilitate public safety. However, existing datasets suffer from severe limitations: (1) data scarcity due to the long-tail nature of real-world anomalies, and (2) privacy constraints that impede large-scale collection. To address the aforementioned issues in one go, we introduce SVTA (Synthetic Video-Text Anomaly benchmark), the first large-scale dataset for cross-modal anomaly retrieval, leveraging generative models to overcome data availability challenges. Specifically, we collect and generate video descriptions via the off-the-shelf LLM (Large Language Model) covering 68 anomaly categories, e.g., throwing, stealing, and shooting. These descriptions encompass common long-tail events. We adopt these texts to guide the video generative model to produce diverse and high-quality videos. Finally, our SVTA involves 41,315 videos (1.36M frames) with paired captions, covering 30 normal activities, e.g., standing, walking, and sports, and 68 anomalous events, e.g., falling, fighting, theft, explosions, and natural disasters. We adopt three widely-used video-text retrieval baselines to comprehensively test our SVTA, revealing SVTA's challenging nature and its effectiveness in evaluating a robust cross-modal retrieval method. SVTA eliminates privacy risks associated with real-world anomaly collection while maintaining realistic scenarios. The dataset demo is available at: [https://svta-mm.github.io/SVTA.github.io/].
PMMar 4, 2024
RVRAE: A Dynamic Factor Model Based on Variational Recurrent Autoencoder for Stock Returns PredictionYilun Wang, Shengjie Guo
In recent years, the dynamic factor model has emerged as a dominant tool in economics and finance, particularly for investment strategies. This model offers improved handling of complex, nonlinear, and noisy market conditions compared to traditional static factor models. The advancement of machine learning, especially in dealing with nonlinear data, has further enhanced asset pricing methodologies. This paper introduces a groundbreaking dynamic factor model named RVRAE. This model is a probabilistic approach that addresses the temporal dependencies and noise in market data. RVRAE ingeniously combines the principles of dynamic factor modeling with the variational recurrent autoencoder (VRAE) from deep learning. A key feature of RVRAE is its use of a prior-posterior learning method. This method fine-tunes the model's learning process by seeking an optimal posterior factor model informed by future data. Notably, RVRAE is adept at risk modeling in volatile stock markets, estimating variances from latent space distributions while also predicting returns. Our empirical tests with real stock market data underscore RVRAE's superior performance compared to various established baseline methods.
CVOct 13, 2021
DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D QueriesYue Wang, Vitor Guizilini, Tianyuan Zhang et al.
We introduce a framework for multi-camera 3D object detection. In contrast to existing works, which estimate 3D bounding boxes directly from monocular images or use depth prediction networks to generate input for 3D object detection from 2D information, our method manipulates predictions directly in 3D space. Our architecture extracts 2D features from multiple camera images and then uses a sparse set of 3D object queries to index into these 2D features, linking 3D positions to multi-view images using camera transformation matrices. Finally, our model makes a bounding box prediction per object query, using a set-to-set loss to measure the discrepancy between the ground-truth and the prediction. This top-down approach outperforms its bottom-up counterpart in which object bounding box prediction follows per-pixel depth estimation, since it does not suffer from the compounding error introduced by a depth prediction model. Moreover, our method does not require post-processing such as non-maximum suppression, dramatically improving inference speed. We achieve state-of-the-art performance on the nuScenes autonomous driving benchmark.
CVJul 13, 2021
HDMapNet: An Online HD Map Construction and Evaluation FrameworkQi Li, Yue Wang, Yilun Wang et al.
Constructing HD semantic maps is a central component of autonomous driving. However, traditional pipelines require a vast amount of human efforts and resources in annotating and maintaining the semantics in the map, which limits its scalability. In this paper, we introduce the problem of HD semantic map learning, which dynamically constructs the local semantics based on onboard sensor observations. Meanwhile, we introduce a semantic map learning method, dubbed HDMapNet. HDMapNet encodes image features from surrounding cameras and/or point clouds from LiDAR, and predicts vectorized map elements in the bird's-eye view. We benchmark HDMapNet on nuScenes dataset and show that in all settings, it performs better than baseline methods. Of note, our camera-LiDAR fusion-based HDMapNet outperforms existing methods by more than 50% in all metrics. In addition, we develop semantic-level and instance-level metrics to evaluate the map learning performance. Finally, we showcase our method is capable of predicting a locally consistent map. By introducing the method and metrics, we invite the community to study this novel map learning problem.
CVSep 7, 2017
Fine-Grained Car Detection for Visual Census EstimationTimnit Gebru, Jonathan Krause, Yilun Wang et al.
Targeted socioeconomic policies require an accurate understanding of a country's demographic makeup. To that end, the United States spends more than 1 billion dollars a year gathering census data such as race, gender, education, occupation and unemployment rates. Compared to the traditional method of collecting surveys across many years which is costly and labor intensive, data-driven, machine learning driven approaches are cheaper and faster--with the potential ability to detect trends in close to real time. In this work, we leverage the ubiquity of Google Street View images and develop a computer vision pipeline to predict income, per capita carbon emission, crime rates and other city attributes from a single source of publicly available visual data. We first detect cars in 50 million images across 200 of the largest US cities and train a model to predict demographic attributes using the detected cars. To facilitate our work, we have collected the largest and most challenging fine-grained dataset reported to date consisting of over 2600 classes of cars comprised of images from Google Street View and other web sources, classified by car experts to account for even the most subtle of visual differences. We use this data to construct the largest scale fine-grained detection system reported to date. Our prediction results correlate well with ground truth income data (r=0.82), Massachusetts department of vehicle registration, and sources investigating crime rates, income segregation, per capita carbon emission, and other market research. Finally, we learn interesting relationships between cars and neighborhoods allowing us to perform the first large scale sociological analysis of cities using computer vision techniques.
NAJun 25, 2017
Enhanced joint sparsity via Iterative Support DetectionYaru Fan, Yilun Wang, Tingzhu Huang
Joint sparsity has attracted considerable attention in recent years in many fields including sparse signal recovery in compressed sensing (CS), statistics, and machine learning. Traditional convex models suffer from the suboptimal performance though enjoying tractable computation. In this paper, we propose a new non-convex joint sparsity model, and develop a corresponding multi-stage adaptive convex relaxation algorithm. This method extends the idea of iterative support detection (ISD) from the single vector estimation to the multi-vector estimation by considering the joint sparsity prior. We provide some preliminary theoretical analysis including convergence analysis and a sufficient recovery condition. Numerical experiments from both compressive sensing and feature learning show the better performance of the proposed method in comparison with several state-of-the-art alternatives. Moreover, we demonstrate that the extension of ISD from the single vector to multi-vector estimation is not trivial. In particular, while ISD does not work well for reconstructing the signal channel sparse Bernoulli signal, it does achieve significantly improved performance when recovering the multi-channel sparse Bernoulli signal thanks to its ability of natural incorporation of the joint sparsity structure.
CVFeb 22, 2017
Using Deep Learning and Google Street View to Estimate the Demographic Makeup of the USTimnit Gebru, Jonathan Krause, Yilun Wang et al.
The United States spends more than $1B each year on initiatives such as the American Community Survey (ACS), a labor-intensive door-to-door study that measures statistics relating to race, gender, education, occupation, unemployment, and other demographic factors. Although a comprehensive source of data, the lag between demographic changes and their appearance in the ACS can exceed half a decade. As digital imagery becomes ubiquitous and machine vision techniques improve, automated data analysis may provide a cheaper and faster alternative. Here, we present a method that determines socioeconomic trends from 50 million images of street scenes, gathered in 200 American cities by Google Street View cars. Using deep learning-based computer vision techniques, we determined the make, model, and year of all motor vehicles encountered in particular neighborhoods. Data from this census of motor vehicles, which enumerated 22M automobiles in total (8% of all automobiles in the US), was used to accurately estimate income, race, education, and voting patterns, with single-precinct resolution. (The average US precinct contains approximately 1000 people.) The resulting associations are surprisingly simple and powerful. For instance, if the number of sedans encountered during a 15-minute drive through a city is higher than the number of pickup trucks, the city is likely to vote for a Democrat during the next Presidential election (88% chance); otherwise, it is likely to vote Republican (82%). Our results suggest that automated systems for monitoring demographic trends may effectively complement labor-intensive approaches, with the potential to detect trends with fine spatial resolution, in close to real time.
SIOct 17, 2016
Psycho-Demographic Analysis of the Facebook Rainbow CampaignYilun Wang, Himabindu Lakkaraju, Michal Kosinski et al.
Over the past decade, online social media has had a tremendous impact on the way people engage in social activism. For instance, about 26M Facebook users expressed their support in upholding the cause of marriage equality by overlaying their profile pictures with rainbow-colored filters. Similarly, hundreds of thousands of users changed their profile pictures to a black dot condemning incidents of sexual violence in India. This act of demonstrating support for social causes by changing online profile pictures is being referred to as pictivism. In this paper, we analyze the psycho-demographic profiles, social networking behavior, and personal interests of users who participated in the Facebook Rainbow campaign. Our study is based on a sample of about 800K detailed profiles of Facebook users combining questionnaire-based psychological scores with Facebook profile data. Our analysis provides detailed insights into psycho-demographic profiles of the campaign participants. We found that personality traits such as openness and neuroticism are both positively associated with the likelihood of supporting the campaign, while conscientiousness exhibited a negative correlation. We also observed that females, religious disbelievers, democrats and adults in the age group of 20 to 30 years are more likely to be a part of the campaign. Our research further confirms the findings of several previous studies which suggest that a user is more likely to participate in an online campaign if a large fraction of his/her friends are already doing so. We also developed machine learning models for predicting campaign participation. Users' personal interests, approximated by Facebook user like activity, turned out to be the best indicator of campaign participation. Our results demonstrated that a predictive model which leverages the aforementioned features accurately identifies campaign participants (AUC=0.76).
CVMar 26, 2016
Support Driven Wavelet Frame-based Image DeblurringLiangtian He, Yilun Wang, Zhaoyin Xiang
The wavelet frame systems have been playing an active role in image restoration and many other image processing fields over the past decades, owing to the good capability of sparsely approximating piece-wise smooth functions such as images. In this paper, we propose a novel wavelet frame based sparse recovery model called \textit{Support Driven Sparse Regularization} (SDSR) for image deblurring, where the partial support information of frame coefficients is attained via a self-learning strategy and exploited via the proposed truncated $\ell_0$ regularization. Moreover, the state-of-the-art image restoration methods can be naturally incorporated into our proposed wavelet frame based sparse recovery framework. In particular, in order to achieve reliable support estimation of the frame coefficients, we make use of the state-of-the-art image restoration result such as that from the IDD-BM3D method as the initial reference image for support estimation. Our extensive experimental results have shown convincing improvements over existing state-of-the-art deblurring methods.
CVOct 10, 2015
Wavelet Frame Based Image Restoration Using Sparsity, Nonlocal and Support Prior of Frame CoefficientsLiangtian He, Yilun Wang
The wavelet frame systems have been widely investigated and applied for image restoration and many other image processing problems over the past decades, attributing to their good capability of sparsely approximating piece-wise smooth functions such as images. Most wavelet frame based models exploit the $l_1$ norm of frame coefficients for a sparsity constraint in the past. The authors in \cite{ZhangY2013, Dong2013} proposed an $l_0$ minimization model, where the $l_0$ norm of wavelet frame coefficients is penalized instead, and have demonstrated that significant improvements can be achieved compared to the commonly used $l_1$ minimization model. Very recently, the authors in \cite{Chen2015} proposed $l_0$-$l_2$ minimization model, where the nonlocal prior of frame coefficients is incorporated. This model proved to outperform the single $l_0$ minimization based model in terms of better recovered image quality. In this paper, we propose a truncated $l_0$-$l_2$ minimization model which combines sparsity, nonlocal and support prior of the frame coefficients. The extensive experiments have shown that the recovery results from the proposed regularization method performs better than existing state-of-the-art wavelet frame based methods, in terms of edge enhancement and texture preserving performance.
CVJun 27, 2015
A Novel Approach for Stable Selection of Informative Redundant Features from High Dimensional fMRI DataYilun Wang, Zhiqiang Li, Yifeng Wang et al.
Feature selection is among the most important components because it not only helps enhance the classification accuracy, but also or even more important provides potential biomarker discovery. However, traditional multivariate methods is likely to obtain unstable and unreliable results in case of an extremely high dimensional feature space and very limited training samples, where the features are often correlated or redundant. In order to improve the stability, generalization and interpretations of the discovered potential biomarker and enhance the robustness of the resultant classifier, the redundant but informative features need to be also selected. Therefore we introduced a novel feature selection method which combines a recent implementation of the stability selection approach and the elastic net approach. The advantage in terms of better control of false discoveries and missed discoveries of our approach, and the resulted better interpretability of the obtained potential biomarker is verified in both synthetic and real fMRI experiments. In addition, we are among the first to demonstrate the robustness of feature selection benefiting from the incorporation of stability selection and also among the first to demonstrate the possible unrobustness of the classical univariate two-sample t-test method. Specifically, we show the robustness of our feature selection results in existence of noisy (wrong) training labels, as well as the robustness of the resulted classifier based on our feature selection results in the existence of data variation, demonstrated by a multi-center attention-deficit/hyperactivity disorder (ADHD) fMRI data.
CVJun 7, 2015
Randomized Structural Sparsity based Support Identification with Applications to Locating Activated or Discriminative Brain Areas: A Multi-center Reproducibility StudyYilun Wang, Sheng Zhang, Junjie Zheng et al.
In this paper, we focus on how to locate the relevant or discriminative brain regions related with external stimulus or certain mental decease, which is also called support identification, based on the neuroimaging data. The main difficulty lies in the extremely high dimensional voxel space and relatively few training samples, easily resulting in an unstable brain region discovery (or called feature selection in context of pattern recognition). When the training samples are from different centers and have betweencenter variations, it will be even harder to obtain a reliable and consistent result. Corresponding, we revisit our recently proposed algorithm based on stability selection and structural sparsity. It is applied to the multi-center MRI data analysis for the first time. A consistent and stable result is achieved across different centers despite the between-center data variation while many other state-of-the-art methods such as two sample t-test fail. Moreover, we have empirically showed that the performance of this algorithm is robust and insensitive to several of its key parameters. In addition, the support identification results on both functional MRI and structural MRI are interpretable and can be the potential biomarkers.
CVApr 27, 2015
Linear Spatial Pyramid Matching Using Non-convex and non-negative Sparse Coding for Image ClassificationChengqiang Bao, Liangtian He, Yilun Wang
Recently sparse coding have been highly successful in image classification mainly due to its capability of incorporating the sparsity of image representation. In this paper, we propose an improved sparse coding model based on linear spatial pyramid matching(SPM) and Scale Invariant Feature Transform (SIFT ) descriptors. The novelty is the simultaneous non-convex and non-negative characters added to the sparse coding model. Our numerical experiments show that the improved approach using non-convex and non-negative sparse coding is superior than the original ScSPM[1] on several typical databases.
MLOct 27, 2014
Sensitivity Analysis for Computationally Expensive Models using Optimization and Objective-oriented Surrogate ApproximationsYilun Wang, Christine A. Shoemaker
In this paper, we focus on developing efficient sensitivity analysis methods for a computationally expensive objective function $f(x)$ in the case that the minimization of it has just been performed. Here "computationally expensive" means that each of its evaluation takes significant amount of time, and therefore our main goal to use a small number of function evaluations of $f(x)$ to further infer the sensitivity information of these different parameters. Correspondingly, we consider the optimization procedure as an adaptive experimental design and re-use its available function evaluations as the initial design points to establish a surrogate model $s(x)$ (or called response surface). The sensitivity analysis is performed on $s(x)$, which is an lieu of $f(x)$. Furthermore, we propose a new local multivariate sensitivity measure, for example, around the optimal solution, for high dimensional problems. Then a corresponding "objective-oriented experimental design" is proposed in order to make the generated surrogate $s(x)$ better suitable for the accurate calculation of the proposed specific local sensitivity quantities. In addition, we demonstrate the better performance of the Gaussian radial basis function interpolator over Kriging in our cases, which are of relatively high dimensionality and few experimental design points. Numerical experiments demonstrate that the optimization procedure and the "objective-oriented experimental design" behavior much better than the classical Latin Hypercube Design. In addition, the performance of Kriging is not as good as Gaussian RBF, especially in the case of high dimensional problems.
MLOct 23, 2014
A General Stochastic Algorithmic Framework for Minimizing Expensive Black Box Objective Functions Based on Surrogate Models and Sensitivity AnalysisYilun Wang, Christine A. Shoemaker
We are focusing on bound constrained global optimization problems, whose objective functions are computationally expensive black-box functions and have multiple local minima. The recently popular Metric Stochastic Response Surface (MSRS) algorithm proposed by \cite{Regis2007SRBF} based on adaptive or sequential learning based on response surfaces is revisited and further extended for better performance in case of higher dimensional problems. Specifically, we propose a new way to generate the candidate points which the next function evaluation point is picked from according to the metric criteria, based on a new definition of distance, and prove the global convergence of the corresponding. Correspondingly, a more adaptive implementation of MSRS, named "SO-SA", is presented. "SO-SA" is is more likely to perturb those most sensitive coordinates when generating the candidate points, instead of perturbing all coordinates simultaneously. Numerical experiments on both synthetic problems and real problems demonstrate the advantages of our new algorithm, compared with many state of the art alternatives.}
CVOct 17, 2014
Randomized Structural Sparsity via Constrained Block Subsampling for Improved Sensitivity of Discriminative Voxel IdentificationYilun Wang, Junjie Zheng, Sheng Zhang et al.
In this paper, we consider voxel selection for functional Magnetic Resonance Imaging (fMRI) brain data with the aim of finding a more complete set of probably correlated discriminative voxels, thus improving interpretation of the discovered potential biomarkers. The main difficulty in doing this is an extremely high dimensional voxel space and few training samples, resulting in unreliable feature selection. In order to deal with the difficulty, stability selection has received a great deal of attention lately, especially due to its finite sample control of false discoveries and transparent principle for choosing a proper amount of regularization. However, it fails to make explicit use of the correlation property or structural information of these discriminative features and leads to large false negative rates. In other words, many relevant but probably correlated discriminative voxels are missed. Thus, we propose a new variant on stability selection "randomized structural sparsity", which incorporates the idea of structural sparsity. Numerical experiments demonstrate that our method can be superior in controlling for false negatives while also keeping the control of false positives inherited from stability selection.
LGJun 16, 2014
Multi-stage Multi-task feature learning via adaptive thresholdYaru Fan, Yilun Wang
Multi-task feature learning aims to identity the shared features among tasks to improve generalization. It has been shown that by minimizing non-convex learning models, a better solution than the convex alternatives can be obtained. Therefore, a non-convex model based on the capped-$\ell_{1},\ell_{1}$ regularization was proposed in \cite{Gong2013}, and a corresponding efficient multi-stage multi-task feature learning algorithm (MSMTFL) was presented. However, this algorithm harnesses a prescribed fixed threshold in the definition of the capped-$\ell_{1},\ell_{1}$ regularization and the lack of adaptivity might result in suboptimal performance. In this paper we propose to employ an adaptive threshold in the capped-$\ell_{1},\ell_{1}$ regularized formulation, where the corresponding variant of MSMTFL will incorporate an additional component to adaptively determine the threshold value. This variant is expected to achieve a better feature selection performance over the original MSMTFL algorithm. In particular, the embedded adaptive threshold component comes from our previously proposed iterative support detection (ISD) method \cite{Wang2010}. Empirical studies on both synthetic and real-world data sets demonstrate the effectiveness of this new variant over the original MSMTFL.
CVJun 11, 2014
Truncated Nuclear Norm Minimization for Image Restoration Based On Iterative Support DetectionYilun Wang, Xinhua Su
Recovering a large matrix from limited measurements is a challenging task arising in many real applications, such as image inpainting, compressive sensing and medical imaging, and this kind of problems are mostly formulated as low-rank matrix approximation problems. Due to the rank operator being non-convex and discontinuous, most of the recent theoretical studies use the nuclear norm as a convex relaxation and the low-rank matrix recovery problem is solved through minimization of the nuclear norm regularized problem. However, a major limitation of nuclear norm minimization is that all the singular values are simultaneously minimized and the rank may not be well approximated \cite{hu2012fast}. Correspondingly, in this paper, we propose a new multi-stage algorithm, which makes use of the concept of Truncated Nuclear Norm Regularization (TNNR) proposed in \citep{hu2012fast} and Iterative Support Detection (ISD) proposed in \citep{wang2010sparse} to overcome the above limitation. Besides matrix completion problems considered in \citep{hu2012fast}, the proposed method can be also extended to the general low-rank matrix recovery problems. Extensive experiments well validate the superiority of our new algorithms over other state-of-the-art methods.
CVFeb 17, 2014
FTVd is beyond Fast Total Variation regularized DeconvolutionYilun Wang
In this paper, we revisit the "FTVd" algorithm for Fast Total Variation Regularized Deconvolution, which has been widely used in the past few years. Both its original version implemented in the MATLAB software FTVd 3.0 and its related variant implemented in the latter version FTVd 4.0 are considered \cite{Wang08FTVdsoftware}. We propose that the intermediate results during the iterations are the solutions of a series of combined Tikhonov and total variation regularized image deconvolution models and therefore some of them often have even better image quality than the final solution, which is corresponding to the pure total variation regularized model.