Xiaoming Wang

CV
h-index: 27
24 papers
1,187 citations
Novelty: 47%
AI Score: 56

24 Papers

CV · May 27, 2022
Fully Convolutional One-Stage 3D Object Detection on LiDAR Range Images

Zhi Tian, Xiangxiang Chu, Xiaoming Wang et al.

We present a simple yet effective fully convolutional one-stage 3D object detector for LiDAR point clouds of autonomous driving scenes, termed FCOS-LiDAR. Unlike the dominant methods that use the bird-eye view (BEV), our proposed detector detects objects from the range view (RV, a.k.a. range image) of the LiDAR points. Due to the range view's compactness and compatibility with the LiDAR sensors' sampling process on self-driving cars, the range view-based object detector can be realized by solely exploiting the vanilla 2D convolutions, departing from the BEV-based methods which often involve complicated voxelization operations and sparse convolutions. For the first time, we show that an RV-based 3D detector with standard 2D convolutions alone can achieve comparable performance to state-of-the-art BEV-based detectors while being significantly faster and simpler. More importantly, almost all previous range view-based detectors only focus on single-frame point clouds, since it is challenging to fuse multi-frame point clouds into a single range view. In this work, we tackle this challenging issue with a novel range view projection mechanism, and for the first time demonstrate the benefits of fusing multi-frame point clouds for a range-view based detector. Extensive experiments on nuScenes show the superiority of our proposed method and we believe that our work can be strong evidence that an RV-based 3D detector can compare favourably with the current mainstream BEV-based detectors.

NA · Nov 2, 2012
Efficient and Long-Time Accurate Second-Order Methods for Stokes-Darcy System

Wenbin Chen, Max Gunzburger, Dong Sun et al.

We propose and study two second-order in time implicit-explicit (IMEX) methods for the coupled Stokes-Darcy system that governs flows in karst aquifers. The first is a combination of a second-order backward differentiation formula and the second-order Gear's extrapolation approach. The second is a combination of the second-order Adams-Moulton and second-order Adams-Bashforth methods. Both algorithms only require the solution of two decoupled problems at each time step, one Stokes and the other Darcy. Hence, these schemes are very efficient and can be easily implemented using legacy codes. We establish the unconditional and uniform in time stability for both schemes. The uniform in time stability leads to uniform in time control of the error which is highly desirable for modeling physical processes, e.g., contaminant sequestration and release, that occur over very long time scales. Error estimates for fully-discretized schemes using finite element spatial discretizations are derived. Numerical examples are provided that illustrate the accuracy, efficiency, and long-time stability of the two schemes.
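The decoupling idea behind the first scheme can be sketched on a toy problem. The example below is only an illustration under stated assumptions: the two-variable linear system, the function names, and the coefficients `a`, `b`, `c` are hypothetical stand-ins for the Stokes and Darcy subproblems and are not taken from the paper. Each variable is advanced with BDF2 on its own implicit term, while the coupling term uses the second-order extrapolation 2x^n - x^{n-1}, so the two updates can be solved independently at every step.

```python
import numpy as np

def bdf2_extrapolated_coupling(x0, x1, a, b, c, dt, n_steps):
    """Advance u' = -a*u + c*v, v' = -b*v + c*u (toy coupled system).

    BDF2 is applied to the time derivative and the diagonal terms;
    the coupling terms are extrapolated from the two previous steps,
    so the u-update and the v-update are decoupled.
    """
    u_prev, v_prev = x0
    u_curr, v_curr = x1
    for _ in range(n_steps):
        # Second-order extrapolation of the coupling data to time n+1.
        u_star = 2.0 * u_curr - u_prev
        v_star = 2.0 * v_curr - v_prev
        # Two independent scalar solves (stand-ins for the two subproblems).
        u_next = ((4.0 * u_curr - u_prev) / (2.0 * dt) + c * v_star) / (3.0 / (2.0 * dt) + a)
        v_next = ((4.0 * v_curr - v_prev) / (2.0 * dt) + c * u_star) / (3.0 / (2.0 * dt) + b)
        u_prev, v_prev, u_curr, v_curr = u_curr, v_curr, u_next, v_next
    return np.array([u_curr, v_curr])

def exact_solution(x0, a, b, c, t):
    """Exact solution of the toy system via the symmetric eigendecomposition."""
    mat = np.array([[-a, c], [c, -b]])
    lam, vecs = np.linalg.eigh(mat)
    return vecs @ (np.exp(lam * t) * (vecs.T @ np.asarray(x0, dtype=float)))
```

Halving the step size should reduce the error by roughly a factor of four, which is the behavior expected of a second-order method; the starting value at the first step is taken from the exact solution here purely for convenience.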

NA · Jun 8, 2016
Convergence Analysis and Error Estimates for a Second Order Accurate Finite Element Method for the Cahn-Hilliard-Navier-Stokes System

Amanda E. Diegel, Cheng Wang, Xiaoming Wang et al.

In this paper, we present a novel second order in time mixed finite element scheme for the Cahn-Hilliard-Navier-Stokes equations with matched densities. The scheme combines a standard second order Crank-Nicolson method for the Navier-Stokes equations and a modification to the Crank-Nicolson method for the Cahn-Hilliard equation. In particular, a second order Adams-Bashforth extrapolation and a trapezoidal rule are included to help preserve the energy stability natural to the Cahn-Hilliard equation. We show that our scheme is unconditionally energy stable with respect to a modification of the continuous free energy of the PDE system. Specifically, the discrete phase variable is shown to be bounded in $\ell^\infty \left(0,T;L^\infty\right)$ and the discrete chemical potential bounded in $\ell^\infty \left(0,T;L^2\right)$, for any time and space step sizes, in two and three dimensions, and for any finite final time $T$. We subsequently prove that these variables along with the fluid velocity converge with optimal rates in the appropriate energy norms in both two and three dimensions.

AI · Jul 18, 2024 · Code
MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains

Guoli Yin, Haoping Bai, Shuang Ma et al.

Recent advances in large language models (LLMs) have increased the demand for comprehensive benchmarks to evaluate their capabilities as human-like agents. Existing benchmarks, while useful, often focus on specific application scenarios, emphasizing task completion but failing to dissect the underlying skills that drive these outcomes. This lack of granularity makes it difficult to deeply discern where failures stem from. Additionally, setting up these environments requires considerable effort, and issues of unreliability and reproducibility sometimes arise, especially in interactive tasks. To address these limitations, we introduce the Massive Multitask Agent Understanding (MMAU) benchmark, featuring comprehensive offline tasks that eliminate the need for complex environment setups. It evaluates models across five domains, including Tool-use, Directed Acyclic Graph (DAG) QA, Data Science and Machine Learning coding, Contest-level programming and Mathematics, and covers five essential capabilities: Understanding, Reasoning, Planning, Problem-solving, and Self-correction. With a total of 20 meticulously designed tasks encompassing over 3K distinct prompts, MMAU provides a comprehensive framework for evaluating the strengths and limitations of LLM agents. By testing 18 representative models on MMAU, we provide deep and insightful analyses. Ultimately, MMAU not only sheds light on the capabilities and limitations of LLM agents but also enhances the interpretability of their performance. Datasets and evaluation scripts of MMAU are released at https://github.com/apple/axlearn/tree/main/docs/research/mmau.

NA · Jul 25, 2014
A second order in time, uniquely solvable, unconditionally stable numerical scheme for Cahn-Hilliard-Navier-Stokes equation

Daozhi Han, Xiaoming Wang

We propose a novel second order in time numerical scheme for the Cahn-Hilliard-Navier-Stokes phase field model with matched density. The scheme is based on second order convex-splitting for the Cahn-Hilliard equation and pressure-projection for the Navier-Stokes equation. We show that the scheme is mass-conservative, satisfies a modified energy law and is therefore unconditionally stable. Moreover, we prove that the scheme is unconditionally uniquely solvable at each time step by exploring the monotonicity associated with the scheme. Thanks to the weak coupling of the scheme, we design an efficient Picard iteration procedure to further decouple the computation of the Cahn-Hilliard equation and the Navier-Stokes equation. We implement the scheme by the mixed finite element method. Ample numerical experiments are performed to validate the accuracy and efficiency of the numerical scheme.

NA · May 22, 2011
Long time stability of a classical efficient scheme for two dimensional Navier-Stokes equations

Sigal Gottlieb, Florentina Tone, Cheng Wang et al.

We prove that a popular classical implicit-explicit scheme for the 2D incompressible Navier-Stokes equations, which treats the viscous term implicitly and the nonlinear advection term explicitly, is long time stable in the case of periodic boundary conditions, provided that the time step is sufficiently small. The long time stability in the $L^2$ and $H^1$ norms further leads to the convergence of the global attractors and invariant measures of the scheme to those of the NSE itself at vanishing time step. Both semi-discrete in time and fully discrete schemes with either Galerkin Fourier spectral or collocation Fourier spectral methods are considered.

NA · Aug 27, 2011
An efficient second order in time scheme for approximating long time statistical properties of the two dimensional Navier-Stokes equations

Xiaoming Wang

We investigate the long time behavior of the following efficient second order in time scheme for the 2D Navier-Stokes equation in a periodic box: $$ \frac{3\omega^{n+1}-4\omega^n+\omega^{n-1}}{2k} + \nabla^\perp(2\psi^n-\psi^{n-1})\cdot\nabla(2\omega^n-\omega^{n-1}) - \nu\Delta\omega^{n+1} = f^{n+1}, \quad -\Delta\psi^n = \omega^n. $$ The scheme is a combination of a 2nd order in time backward-differentiation (BDF) and a special explicit Adams-Bashforth treatment of the advection term. Therefore only a linear constant coefficient Poisson type problem needs to be solved at each time step. We prove uniform in time bounds on this scheme in $\dot{L}^2$, $\dot{H}^1_{per}$ and $\dot{H}^2_{per}$ provided that the time-step is sufficiently small. These time uniform estimates further lead to the convergence of long time statistics (stationary statistical properties) of the scheme to that of the NSE itself at vanishing time-step. Fully discrete schemes with either Galerkin Fourier or collocation Fourier spectral method are also discussed.
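One step of the scheme above can be sketched with a Fourier collocation discretization. This is a minimal illustration rather than the author's implementation, and it rests on several assumptions: the box is taken to be [0, 2π)², the perpendicular gradient is taken as (∂_y, -∂_x) so that it is consistent with -Δψ = ω, no dealiasing is applied, and all function names are hypothetical. Because the implicit part is a constant-coefficient operator, the solve reduces to a pointwise division in Fourier space.

```python
import numpy as np

def wavenumbers(n):
    """Integer wavenumbers on an n x n grid for the 2*pi-periodic box."""
    k1d = np.fft.fftfreq(n, d=1.0 / n)
    kx, ky = np.meshgrid(k1d, k1d, indexing="ij")
    return kx, ky, kx**2 + ky**2

def advection_hat(w_hat, kx, ky, k2):
    """Fourier transform of grad_perp(psi) . grad(w), with -Laplace(psi) = w."""
    k2_safe = np.where(k2 == 0.0, 1.0, k2)      # avoid division by zero at the mean mode
    psi_hat = w_hat / k2_safe
    psi_hat[0, 0] = 0.0                         # zero-mean stream function
    u = np.fft.ifft2(1j * ky * psi_hat).real    # u =  d(psi)/dy (assumed sign convention)
    v = np.fft.ifft2(-1j * kx * psi_hat).real   # v = -d(psi)/dx
    w_x = np.fft.ifft2(1j * kx * w_hat).real
    w_y = np.fft.ifft2(1j * ky * w_hat).real
    return np.fft.fft2(u * w_x + v * w_y)

def bdf2_step(w_hat_n, w_hat_nm1, f_hat_np1, dt, nu, kx, ky, k2):
    """Return the Fourier coefficients of the vorticity at step n+1."""
    # Extrapolated vorticity 2*w^n - w^{n-1}; psi is linear in w, so the
    # extrapolated stream function is recovered inside advection_hat.
    w_star_hat = 2.0 * w_hat_n - w_hat_nm1
    rhs = ((4.0 * w_hat_n - w_hat_nm1) / (2.0 * dt)
           - advection_hat(w_star_hat, kx, ky, k2)
           + f_hat_np1)
    # Implicit BDF2 + viscous operator is diagonal in Fourier space.
    return rhs / (3.0 / (2.0 * dt) + nu * k2)
```

A simple sanity check is the unforced initial condition ω = cos(x), for which the advection term vanishes and the exact solution decays like exp(-νt); the discrete solution should track that decay to second order in the time step.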

NA · Oct 18, 2016
Uniquely solvable and energy stable decoupled schemes for Cahn-Hilliard-Stokes-Darcy system for two-phase flows in karstic geometry

Wenbin Chen, Daozhi Han, Xiaoming Wang

We propose and analyze two novel decoupled numerical schemes for solving the Cahn-Hilliard-Stokes-Darcy (CHSD) model for two-phase flows in karstic geometry. In the first numerical scheme, we explore a fractional step method (operator splitting) to decouple the phase-field (Cahn-Hilliard equation) from the velocity field (Stokes-Darcy fluid equations). To further decouple the Stokes-Darcy system, we introduce a first order pressure stabilization term in the Darcy solver in the second numerical scheme so that the Stokes system is decoupled from the Darcy system and hence the CHSD system can be solved in a fully decoupled manner. We show that both decoupled numerical schemes are uniquely solvable, energy stable, and mass conservative. Ample numerical results are presented to demonstrate the accuracy and efficiency of our schemes.

NA · Feb 11, 2019
Positivity-preserving, energy stable numerical schemes for the Cahn-Hilliard equation with logarithmic potential

Wenbin Chen, Cheng Wang, Xiaoming Wang et al.

We present and analyze finite difference numerical schemes for the Allen-Cahn/Cahn-Hilliard equation with a logarithmic Flory-Huggins energy potential. Both the first order and second order accurate temporal algorithms are considered. In the first order scheme, we treat the nonlinear logarithmic terms and the surface diffusion term implicitly, and update the linear expansive term and the mobility explicitly. We provide a theoretical justification that this numerical algorithm has a unique solution such that the positivity is always preserved for the logarithmic arguments. In particular, our analysis reveals a subtle fact: the singular nature of the logarithmic term around the values of $-1$ and $1$ prevents the numerical solution from reaching these singular values, so that the numerical scheme is always well-defined as long as the numerical solution stays similarly bounded at the previous time step. Furthermore, an unconditional energy stability of the numerical scheme is derived, without any restriction on the time step size. The unique solvability and the positivity-preserving property for the second order scheme are proved using similar ideas, in which the singular nature of the logarithmic term plays an essential role. For both the first and second order accurate schemes, we are able to derive an optimal rate convergence analysis, which gives the full order error estimate. The case with a non-constant mobility is analyzed as well. We also describe a practical and efficient multigrid solver for the proposed numerical schemes, and present some numerical results, which demonstrate the robustness of the numerical schemes.

81.6 · NA · May 7
Long-time stability of implicit-explicit Runge-Kutta methods for two-dimensional incompressible flows

Hong-lin Liao, Xiaoming Wang, Xuping Wang et al.

High-order adaptive time-stepping algorithms are of significant practical value and theoretical interest for accelerating long-time fluid-flow simulations and resolving complex dynamical behaviors. While several high-order implicit-explicit schemes have been proposed in the literature, their long-time stability properties remain largely unexplored. We develop a family of long-time stable implicit-explicit Runge-Kutta (IERK) methods, up to fourth-order temporal accuracy, for the two-dimensional incompressible Navier-Stokes equations in vorticity-stream function formulation. By combining a convolution-type Hölder inequality with a damping-type multistage Grönwall inequality, we establish a unified analytical framework that proves long-time stability in both the $L^2$ and $H^1$ norms. A key component of the analysis is a mathematical-induction argument that ensures stage-wise boundedness of the vorticity in the $H^δ$ norm for some $δ>0$. To the best of our knowledge, this is the first work to establish large-time stability results for high-order IERK algorithms for the two-dimensional incompressible Navier-Stokes equations. Our IERK schemes employ stiffly accurate diagonally implicit Runge-Kutta approximations for the linear diffusive term together with explicit Runge-Kutta approximations for the nonlinear advection term. By exploiting the specific structure of the Navier-Stokes model, we derive a reduced set of order conditions (only 5 and 11 conditions for the third- and fourth-order methods, respectively, in contrast to the classical 6 and 18), allowing the construction of a parameterized family of efficient schemes. These IERK methods are particularly well suited for adaptive time-stepping, as they permit significantly enlarged step sizes in actual computations.

CV · Jul 2, 2022
Golfer: Trajectory Prediction with Masked Goal Conditioning MnM Network

Xiaocheng Tang, Soheil Sadeghi Eshkevari, Haoyu Chen et al.

Transformers have enabled breakthroughs in NLP and computer vision, and have recently begun to show promising performance in trajectory prediction for autonomous vehicles (AV). Efficiently modeling the interactive relationships between the ego agent and other road and dynamic objects remains challenging for the standard attention module. In this work we propose a general Transformer-like architectural module, the MnM network, equipped with novel masked goal conditioning training procedures for AV trajectory prediction. The resulting model, named Golfer, achieves state-of-the-art performance, placing 2nd in the 2022 Waymo Open Dataset Motion Prediction Challenge and ranking 1st according to minADE.

42.8 · CV · May 18
Unleashing Vision Transformer Potential In Image Quality Assessment via Global-Local Adaptive Interaction

Yu Li, Puchao Zhou, Yachun Mi et al.

In the field of Blind Image Quality Assessment (BIQA), accurately predicting the perceptual quality of authentically distorted images remains highly challenging due to the diverse and complex distortions present in natural environments. Although existing methods have achieved notable accuracy, their scalability is often constrained by the high cost of subjective annotation and the limited size of available datasets. Recent advances in large-scale pre-trained vision models have introduced powerful semantic and representational capabilities, yet their application to IQA tasks is hindered by substantial computational demands and suboptimal fine-tuning efficiency. To overcome these limitations, we introduce the Global-Local Interaction Adapter (GLIA), a novel framework that effectively harnesses pre-trained Vision Transformers through a dual-stream feature extraction mechanism coupled with interactive global-local fusion. By jointly retaining global semantic information and fine-grained local details, our approach delivers superior prediction accuracy and robustness while requiring significantly fewer trainable parameters. Extensive experiments on multiple benchmarks validate the effectiveness and superiority of our approach.

54.5 · HC · Mar 24
IntentWeave: A Progressive Entry Ladder for Multi-Surface Browser Agents in Cloud Portals

Wanying Mo, Jijia Lai, Xiaoming Wang

Browser agents built on LLMs can act in web interfaces, yet most remain confined to a single chat surface (e.g., a sidebar). This mismatch with real browsing can increase context-switching and reduce user control. We introduce IntentWeave, a design space of ten spatial paradigms for embedding agentic assistance across a browser, organized as a progressive entry ladder from micro-interventions to dedicated workspaces. We implement IntentWeave as a browser-extension prototype on the Alibaba Cloud website and compare three entry strategies in a within-subjects study (N=16). Workspace-heavy strategies reduced completion time but lowered perceived control; micro-only strategies preserved control but were often insufficient; a mixed sidecar approach achieved the highest satisfaction. We conclude with guidance for escalating and retreating agent surfaces without disrupting user agency.

CV · Jan 29 · Code
Dynamic Topology Awareness: Breaking the Granularity Rigidity in Vision-Language Navigation

Jiankun Peng, Jianyuan Guo, Ying Xu et al.

Vision-Language Navigation in Continuous Environments (VLN-CE) presents a core challenge: grounding high-level linguistic instructions into precise, safe, and long-horizon spatial actions. Explicit topological maps have proven to be a vital solution for providing robust spatial memory in such tasks. However, existing topological planning methods suffer from a "Granularity Rigidity" problem. Specifically, these methods typically rely on fixed geometric thresholds to sample nodes, which fails to adapt to varying environmental complexities. This rigidity leads to a critical mismatch: the model tends to over-sample in simple areas, causing computational redundancy, while under-sampling in high-uncertainty regions, increasing collision risks and compromising precision. To address this, we propose DGNav, a framework for Dynamic Topological Navigation, introducing a context-aware mechanism to modulate map density and connectivity on-the-fly. Our approach comprises two core innovations: (1) A Scene-Aware Adaptive Strategy that dynamically modulates graph construction thresholds based on the dispersion of predicted waypoints, enabling "densification on demand" in challenging environments; (2) A Dynamic Graph Transformer that reconstructs graph connectivity by fusing visual, linguistic, and geometric cues into dynamic edge weights, enabling the agent to filter out topological noise and enhancing instruction adherence. Extensive experiments on the R2R-CE and RxR-CE benchmarks demonstrate DGNav exhibits superior navigation performance and strong generalization capabilities. Furthermore, ablation studies confirm that our framework achieves an optimal trade-off between navigation efficiency and safe exploration. The code is available at https://github.com/shannanshouyin/DGNav.

IV · Feb 14, 2024
Deep Rib Fracture Instance Segmentation and Classification from CT on the RibFrac Challenge

Jiancheng Yang, Rui Shi, Liang Jin et al. · harvard

Rib fractures are a common and potentially severe injury that can be challenging and labor-intensive to detect in CT scans. While there have been efforts to address this field, the lack of large-scale annotated datasets and evaluation benchmarks has hindered the development and validation of deep learning algorithms. To address this issue, the RibFrac Challenge was introduced, providing a benchmark dataset of over 5,000 rib fractures from 660 CT scans, with voxel-level instance mask annotations and diagnosis labels for four clinical categories (buckle, nondisplaced, displaced, or segmental). The challenge includes two tracks: a detection (instance segmentation) track evaluated by an FROC-style metric and a classification track evaluated by an F1-style metric. During the MICCAI 2020 challenge period, 243 results were evaluated, and seven teams were invited to participate in the challenge summary. The analysis revealed that several top rib fracture detection solutions achieved performance comparable to or even better than that of human experts. Nevertheless, the current rib fracture classification solutions are hardly clinically applicable, which makes classification an interesting area for future work. As an active benchmark and research resource, the data and online evaluation of the RibFrac Challenge are available at the challenge website. As an independent contribution, we have also extended our previous internal baseline by incorporating recent advancements in large-scale pretrained networks and point-based rib segmentation techniques. The resulting FracNet+ demonstrates competitive performance in rib fracture detection, which lays a foundation for further research and development in AI-assisted rib fracture detection and diagnosis.

30.8 · LG · Apr 27
Fed-DLoRA: Efficient Wireless Federated Learning with Dynamic Low-Rank Adaptation

Huaicheng Li, Junhui Zhao, Haoyu Quan et al.

Federated learning (FL) offers a promising distributed learning paradigm for internet of vehicles (IoV) applications. However, it faces challenges from communication overhead and dynamic environments. Model compression techniques reduce computing and communication burden yet create trade-offs between compression ratios and vehicle participation strategies. In this paper, we propose a lightweight FL algorithm named federated learning with dynamic low-rank adaptation (Fed-DLoRA), which is combined with low-rank adaptation (LoRA) to effectively reduce parameters and communication costs while enhancing training efficiency. The convergence analysis of Fed-DLoRA is conducted through stochastic gradient descent optimization coupled with singular value decomposition. This analysis establishes the theoretical relationships among LoRA rank, vehicular scheduling strategies and the model's convergence characteristics. Building on these insights, we formulate a joint optimization problem aimed at maximizing system performance. To address this problem, we propose an adaptive rank, bandwidth and vehicle selection (ARBVS) algorithm that integrates enumeration with greedy optimization strategies. The algorithm provides efficient rank selection and resource scheduling strategies for each FL communication round, thereby achieving effective performance improvements for the FL system. Experimental results demonstrate that Fed-DLoRA achieves superior performance compared to conventional federated learning approaches, exhibiting enhanced accuracy, faster convergence, and improved communication efficiency.
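The link between rank and communication cost mentioned above can be illustrated with a plain truncated SVD. This sketch is not the Fed-DLoRA or ARBVS algorithm; it only shows, under the assumption that a client holds a dense update matrix (here called `delta_w`, a hypothetical name), how factoring it into two rank-r matrices shrinks the number of values to transmit from m·n to r·(m+n), and what approximation error that truncation incurs.

```python
import numpy as np

def truncate_to_rank(delta_w, rank):
    """Best rank-`rank` factorization delta_w ~= b @ a via the SVD."""
    u, s, vt = np.linalg.svd(delta_w, full_matrices=False)
    b = u[:, :rank] * s[:rank]   # shape (m, rank), singular values folded in
    a = vt[:rank, :]             # shape (rank, n)
    return b, a, s

def parameter_counts(m, n, rank):
    """Values sent for a dense m x n update versus a rank-`rank` pair."""
    return m * n, rank * (m + n)
```

For a 64 x 32 update at rank 4, the factored form carries 384 values instead of 2,048. By the Eckart-Young theorem, the Frobenius-norm error of the truncation equals the root sum of squares of the discarded singular values, which is the kind of quantity a rank-selection rule would weigh against the bandwidth saved.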

CV · Feb 5
PatchFlow: Leveraging a Flow-Based Model with Patch Features

Boxiang Zhang, Baijian Yang, Xiaoming Wang et al.

Die casting plays a crucial role across various industries due to its ability to craft intricate shapes with high precision and smooth surfaces. However, surface defects remain a major issue that impedes die casting quality control. Recently, computer vision techniques have been explored to automate and improve defect detection. In this work, we combine local neighbor-aware patch features with a normalizing flow model and bridge the gap between the generic pretrained feature extractor and industrial product images by introducing an adapter module to increase the efficiency and accuracy of automated anomaly detection. Compared to state-of-the-art methods, our approach reduces the error rate by 20% on the MVTec AD dataset, achieving an image-level AUROC of 99.28%. Our approach has also enhanced performance on the VisA dataset, achieving an image-level AUROC of 96.48%. Compared to the state-of-the-art models, this represents a 28.2% reduction in error. Additionally, experiments on a proprietary die casting dataset yield an accuracy of 95.77% for anomaly detection, without requiring any anomalous samples for training. Our method illustrates the potential of leveraging computer vision and deep learning techniques to advance inspection capabilities for the die casting industry.
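The relationship between the reported AUROC values and the quoted error reductions can be made concrete with a small calculation. It assumes that "error" means 100% minus the image-level AUROC, which the abstract does not state explicitly, so the implied baseline figures below are back-of-the-envelope estimates rather than numbers reported by the authors. The function name is hypothetical.

```python
def implied_baseline_auroc(auroc_pct, error_reduction_pct):
    """Baseline AUROC implied by a relative error reduction.

    Assumes error = 100 - AUROC (in percent) and that the new error is
    (1 - reduction) times the baseline error.
    """
    new_error = 100.0 - auroc_pct
    baseline_error = new_error / (1.0 - error_reduction_pct / 100.0)
    return 100.0 - baseline_error

# MVTec AD: 99.28% AUROC with a 20% error reduction.
mvtec_baseline = implied_baseline_auroc(99.28, 20.0)
# VisA: 96.48% AUROC with a 28.2% error reduction.
visa_baseline = implied_baseline_auroc(96.48, 28.2)
```

Under that assumption, the MVTec AD figures correspond to an error falling from about 0.90% to 0.72% (a baseline AUROC near 99.10%), and the VisA figures to an error falling from about 4.90% to 3.52% (a baseline AUROC near 95.10%).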

DATA-AN · Jul 16, 2025
Neural Network-Guided Symbolic Regression for Interpretable Descriptor Discovery in Perovskite Catalysts

Yeming Xian, Xiaoming Wang, Yanfa Yan

Understanding and predicting the activity of oxide perovskite catalysts for the oxygen evolution reaction (OER) requires descriptors that are both accurate and physically interpretable. While symbolic regression (SR) offers a path to discover such formulas, its performance degrades with high-dimensional inputs and small datasets. We present a two-phase framework that combines neural networks (NN), feature importance analysis, and symbolic regression (SR) to discover interpretable descriptors for OER activity in oxide perovskites. In Phase I, using a small dataset and seven structural features, we reproduce and improve the known μ/t descriptor by engineering composite features and applying symbolic regression, achieving training and validation MAEs of 22.8 and 20.8 meV, respectively. In Phase II, we expand to 164 features, reduce dimensionality, and identify LUMO energy as a key electronic descriptor. A final formula using μ/t, μ/RA, and LUMO energy achieves improved accuracy (training and validation MAEs of 22.1 and 20.6 meV) with strong physical interpretability. Our results demonstrate that NN-guided symbolic regression enables accurate, interpretable, and physically meaningful descriptor discovery in data-scarce regimes, indicating interpretability need not sacrifice accuracy for materials informatics.

CV · Dec 15, 2023
Style Generation in Robot Calligraphy with Deep Generative Adversarial Networks

Xiaoming Wang, Zhiguo Gong

Robot calligraphy is an emerging exploration of artificial intelligence in the fields of art and education. Traditional calligraphy generation research mainly focuses on methods such as tool-based image processing, generative models, and style transfer. Unlike the English alphabet, Chinese has tens of thousands of characters, which makes it difficult to generate a style-consistent Chinese calligraphic font with over 6000 characters. Due to the lack of high-quality data sets, formal definitions of calligraphy knowledge, and scientific art evaluation methods, the generated results are frequently of low quality and fall short of professional-level requirements. To address these problems, this paper proposes an automatic calligraphy generation model based on deep generative adversarial networks (deepGAN) that can generate styled calligraphy fonts of professional standard. The key highlights of the proposed method include: (1) The datasets are built with a high-precision calligraphy synthesis method to ensure high quality and sufficient quantity; (2) Professional calligraphers are invited to conduct a series of Turing tests to evaluate the gap between model generation results and human artistic level; (3) Experimental results indicate that the proposed model is the state of the art among current calligraphy generation methods. The Turing tests and similarity evaluations validate the effectiveness of the proposed method.

CV · May 26, 2023
Study of Subjective and Objective Quality Assessment of Mobile Cloud Gaming Videos

Avinab Saha, Yu-Chih Chen, Chase Davis et al.

We present the outcomes of a recent large-scale subjective study of Mobile Cloud Gaming Video Quality Assessment (MCG-VQA) on a diverse set of gaming videos. Rapid advancements in cloud services, faster video encoding technologies, and increased access to high-speed, low-latency wireless internet have all contributed to the exponential growth of the Mobile Cloud Gaming industry. Consequently, the development of methods to assess the quality of real-time video feeds to end-users of cloud gaming platforms has become increasingly important. However, due to the lack of a large-scale public Mobile Cloud Gaming Video dataset containing a diverse set of distorted videos with corresponding subjective scores, there has been limited work on the development of MCG-VQA models. To accelerate progress towards these goals, we created a new dataset, named the LIVE-Meta Mobile Cloud Gaming (LIVE-Meta-MCG) video quality database, composed of 600 landscape and portrait gaming videos, on which we collected 14,400 subjective quality ratings from an in-lab subjective study. Additionally, to demonstrate the usefulness of the new resource, we benchmarked multiple state-of-the-art VQA algorithms on the database. The new database will be made publicly available on our website: https://live.ece.utexas.edu/research/LIVE-Meta-Mobile-Cloud-Gaming/index.html

IV · May 3, 2023
GAMIVAL: Video Quality Prediction on Mobile Cloud Gaming Content

Yu-Chih Chen, Avinab Saha, Chase Davis et al.

The mobile cloud gaming industry has been rapidly growing over the last decade. When streaming gaming videos are transmitted to customers' client devices from cloud servers, algorithms that can monitor distorted video quality without having any reference video available are desirable tools. However, creating No-Reference Video Quality Assessment (NR VQA) models that can accurately predict the quality of streaming gaming videos rendered by computer graphics engines is a challenging problem, since gaming content generally differs statistically from naturalistic videos, often lacks detail, and contains many smooth regions. Until recently, the problem has been further complicated by the lack of adequate subjective quality databases of mobile gaming content. We have created a new gaming-specific NR VQA model called the Gaming Video Quality Evaluator (GAMIVAL), which combines and leverages the advantages of spatial and temporal gaming distorted scene statistics models, a neural noise model, and deep semantic features. Using a support vector regression (SVR) as a regressor, GAMIVAL achieves superior performance on the new LIVE-Meta Mobile Cloud Gaming (LIVE-Meta MCG) video quality database.

CR · May 27, 2020
Security Improvements of Several Basic Quantum Private Query Protocols with O(log N) Communication Complexity

Fang Yu, Daowen Qiu, Xiaoming Wang et al.

New quantum private database (with N elements) query protocols are presented and analyzed. The protocols preserve the O(log N) communication complexity of known protocols for the same task, but achieve several significant improvements in security, especially concerning user privacy. For example, the randomized form of our protocol has a cheat-sensitive property: it allows the user to detect a dishonest database with a nonzero probability, while the phase-encoded private query protocols for the same task do not have such a property. Moreover, when the database performs the computational basis measurement, a particular projective measurement which can cause a significant loss of user privacy in the previous private query protocols with O(log N) communication complexity, at most half of the user privacy could leak to such a database in our protocol, while in the QPQ protocol, the entire user privacy could leak out. In addition, it is proved here that for large N, the user could detect cheating via the computational basis measurement with a probability close to 1/2 using $O(\sqrt{N})$ special queries. Finally, it is shown here, for both forms of our protocol, basic and randomized, how a dishonest database has to act in case it could not learn the user's queries.

NI · May 2, 2020
Smart, Adaptive Energy Optimization for Mobile Web Interactions

Jie Ren, Lu Yuan, Petteri Nurmi et al.

Web technology underpins many interactive mobile applications. However, making mobile web interactions energy-efficient remains an outstanding challenge. Given the increasing diversity and complexity of mobile hardware, any practical optimization scheme must work for a wide range of users, mobile platforms and web workloads. This paper presents CAMEL, a novel energy optimization system for mobile web interactions. CAMEL leverages machine learning techniques to develop a smart, adaptive scheme to judiciously trade performance for reduced power consumption. Unlike prior work, CAMEL directly models how a given web content affects the user expectation and uses this to guide energy optimization. It goes further by employing transfer learning and conformal predictions to tune a previously learned model in the end-user environment and improve it over time. We apply CAMEL to Chromium and evaluate it on four distinct mobile systems involving 1,000 testing webpages and 30 users. Compared to four state-of-the-art web-event optimizers, CAMEL delivers 22% more energy savings with 49% fewer violations of the quality of user experience, and exhibits orders of magnitude less overhead when targeting a new computing environment.

CV · Feb 21, 2015
Study on Sparse Representation based Classification for Biometric Verification

Zengxi Huang, Yiguang Liu, Xiaoming Wang et al.

In this paper, we propose a multimodal verification system integrating face and ear based on sparse representation based classification (SRC). The face and ear query samples are first encoded separately to derive sparsity-based match scores, which are then combined with sum-rule fusion for verification. Apart from validating the encouraging performance of SRC-based multimodal verification, this paper also aims to provide a clear understanding of the characteristics of SRC-based biometric verification. To this end, two sparsity-based metrics, i.e. sparse coding error (SCE) and sparse contribution rate (SCR), are used, together with face and ear unimodal SRC-based verification. Because SRC-based biometric verification may suffer from a heavy computational burden and degraded verification accuracy as the number of enrolled subjects increases, we argue that both issues can be properly resolved by using a small random dictionary for sparsity-based score computation, which consists of training samples from a limited number of randomly selected subjects. Experimental results demonstrate the superiority of SRC-based multimodal verification over state-of-the-art multimodal methods such as likelihood ratio (LLR), support vector machine (SVM), and sum-rule fusion using cosine similarity; they also show that the small random dictionary is feasible in terms of both effectiveness and efficiency.
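The sum-rule fusion step can be sketched as follows. This is a generic illustration, not the paper's pipeline: it takes per-modality match scores as given (how SCE or SCR scores are computed is not reproduced here), assumes higher scores mean better matches, and applies min-max normalization before summing, which is a common but assumed choice. The function names and the threshold are hypothetical.

```python
import numpy as np

def min_max_normalize(scores):
    """Rescale scores to [0, 1]; a constant vector maps to all zeros."""
    scores = np.asarray(scores, dtype=float)
    lo, hi = scores.min(), scores.max()
    if hi == lo:
        return np.zeros_like(scores)
    return (scores - lo) / (hi - lo)

def sum_rule_fusion(face_scores, ear_scores):
    """Fuse two modalities by summing their normalized match scores."""
    return min_max_normalize(face_scores) + min_max_normalize(ear_scores)

def verify(fused_scores, threshold):
    """Accept a claim when its fused score reaches the threshold."""
    return fused_scores >= threshold

# Three verification attempts with made-up scores for each modality.
fused = sum_rule_fusion([0.2, 0.8, 0.5], [0.1, 0.9, 0.3])
decisions = verify(fused, threshold=1.0)
```

Normalizing first keeps one modality from dominating the sum simply because its raw scores span a wider range.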