CV · Aug 28, 2023
Total Selfie: Generating Full-Body Selfies
Bowei Chen, Brian Curless, Ira Kemelmacher-Shlizerman et al. · uw
We present a method to generate full-body selfies from photographs originally taken at arm's length. Because self-captured photos are typically taken close up, they have a limited field of view and exaggerated perspective that distorts facial shapes. We instead seek to generate the photo someone else would take of you from a few feet away. Our approach takes as input four selfies of your face and body and a background image, and generates a full-body selfie in a desired target pose. We introduce a novel diffusion-based approach to combine all of this information into high-quality, well-composed photos of you with the desired pose and background.
CV · Aug 25, 2022
Learning Continuous Implicit Representation for Near-Periodic Patterns
Bowei Chen, Tiancheng Zhi, Martial Hebert et al.
Near-Periodic Patterns (NPP) are ubiquitous in man-made scenes and are composed of tiled motifs with appearance differences caused by lighting, defects, or design elements. A good NPP representation is useful for many applications including image completion, segmentation, and geometric remapping. But representing NPP is challenging because it needs to maintain global consistency (tiled motifs layout) while preserving local variations (appearance differences). Methods trained on general scenes using a large dataset or single-image optimization struggle to satisfy these constraints, while methods that explicitly model periodicity are not robust to periodicity detection errors. To address these challenges, we learn a neural implicit representation using a coordinate-based MLP with single image optimization. We design an input feature warping module and a periodicity-guided patch loss to handle both global consistency and local variations. To further improve the robustness, we introduce a periodicity proposal module to search and use multiple candidate periodicities in our pipeline. We demonstrate the effectiveness of our method on more than 500 images of building facades, friezes, wallpapers, ground, and Mondrian patterns on single and multi-planar scenes.
CV · Sep 30, 2024
Inverse Painting: Reconstructing The Painting Process
Bowei Chen, Yifan Wang, Brian Curless et al. · uw
Given an input painting, we reconstruct a time-lapse video of how it may have been painted. We formulate this as an autoregressive image generation problem, in which an initially blank "canvas" is iteratively updated. The model learns from real artists by training on many painting videos. Our approach incorporates text and region understanding to define a set of painting "instructions" and updates the canvas with a novel diffusion-based renderer. The method extrapolates beyond the limited, acrylic style paintings on which it has been trained, showing plausible results for a wide range of artistic styles and genres.
LG · Feb 14, 2024 · Code
EcoVal: An Efficient Data Valuation Framework for Machine Learning
Ayush K Tarun, Vikram S Chundawat, Murari Mandal et al.
Quantifying the value of data within a machine learning workflow can play a pivotal role in making more strategic decisions in machine learning initiatives. Existing Shapley-value-based frameworks for data valuation in machine learning are computationally expensive because they require a considerable amount of repeated model training to obtain the Shapley value. In this paper, we introduce EcoVal, an efficient data valuation framework that estimates the value of data for machine learning models in a fast and practical manner. Instead of working directly with individual data samples, we determine the value of a cluster of similar data points. This value is then propagated among all the member points of the cluster. We show that the overall value of the data can be determined by estimating the intrinsic and extrinsic value of each data point. This is enabled by formulating the performance of a model as a production function, a concept popularly used to estimate the amount of output based on factors like labor and capital in a traditional free economic market. We provide a formal proof of our valuation technique and elucidate the principles and mechanisms that enable its accelerated performance. We demonstrate the real-world applicability of our method by showcasing its effectiveness for both in-distribution and out-of-sample data. This work addresses one of the core challenges of efficient data valuation at scale in machine learning models. The code is available at https://github.com/respai-lab/ecoval.
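The cluster-then-propagate idea in the EcoVal abstract can be illustrated with a minimal sketch. The abstract does not state the propagation rule, so the equal split below is an assumption, and the function name `propagate_cluster_values` and the toy values are hypothetical; the actual method is in the linked repository.

```python
from collections import Counter

def propagate_cluster_values(cluster_ids, cluster_values):
    """Assign each point a share of its cluster's value.

    ASSUMPTION: the cluster value is split equally among members.
    The abstract only says the value is propagated to member points.
    """
    sizes = Counter(cluster_ids)  # number of points per cluster
    return [cluster_values[c] / sizes[c] for c in cluster_ids]

# Hypothetical toy data: six points in three clusters.
cluster_ids = [0, 0, 1, 1, 1, 2]
cluster_values = {0: 0.4, 1: 0.3, 2: 0.3}
point_values = propagate_cluster_values(cluster_ids, cluster_values)
```

Valuing three clusters instead of six points is where the saving would come from: the expensive valuation step runs once per cluster, and the per-point step is a cheap lookup.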
CV · Jul 29, 2024
Learning Feature-Preserving Portrait Editing from Generated Pairs
Bowei Chen, Tiancheng Zhi, Peihao Zhu et al.
Portrait editing is challenging for existing techniques due to difficulties in preserving subject features like identity. In this paper, we propose a training-based method leveraging auto-generated paired data to learn desired editing while ensuring the preservation of unchanged subject features. Specifically, we design a data generation process to create reasonably good training pairs for desired editing at low cost. Based on these pairs, we introduce a Multi-Conditioned Diffusion Model to effectively learn the editing direction and preserve subject features. During inference, our model produces an accurate editing mask that can guide the inference process to further preserve detailed subject features. Experiments on costume editing and cartoon expression editing show that our method achieves state-of-the-art quality, quantitatively and qualitatively.
DB · Nov 10, 2025
Cortex AISQL: A Production SQL Engine for Unstructured Data
Paweł Liskowski, Benjamin Han, Paritosh Aggarwal et al.
Snowflake's Cortex AISQL is a production SQL engine that integrates native semantic operations directly into SQL. This integration allows users to write declarative queries that combine relational operations with semantic reasoning, enabling them to query both structured and unstructured data effortlessly. However, making semantic operations efficient at production scale poses fundamental challenges. Semantic operations are more expensive than traditional SQL operations, possess distinct latency and throughput characteristics, and their cost and selectivity are unknown during query compilation. Furthermore, existing query engines are not designed to optimize semantic operations. The AISQL query execution engine addresses these challenges through three novel techniques informed by production deployment data from Snowflake customers. First, AI-aware query optimization treats AI inference cost as a first-class optimization objective, reasoning about large language model (LLM) cost directly during query planning to achieve 2-8× speedups. Second, adaptive model cascades reduce inference costs by routing most rows through a fast proxy model while escalating uncertain cases to a powerful oracle model, achieving 2-6× speedups while maintaining 90-95% of oracle model quality. Third, semantic join query rewriting lowers the quadratic time complexity of join operations to linear through reformulation as multi-label classification tasks, achieving 15-70× speedups with often improved prediction quality. AISQL is deployed in production at Snowflake, where it powers diverse customer workloads across analytics, search, and content understanding.
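The model-cascade idea described above (cheap proxy first, oracle only for uncertain rows) can be sketched as follows. This is not Snowflake's implementation: the function `cascade_classify`, the fixed thresholds `low` and `high`, and the toy proxy and oracle are all hypothetical stand-ins for real models.

```python
def cascade_classify(rows, proxy, oracle, low=0.2, high=0.8):
    """Label rows with a cheap proxy, escalating uncertain ones to the oracle.

    proxy(row) returns a score in [0, 1]; oracle(row) returns a bool.
    ASSUMPTION: fixed confidence thresholds; a production system would
    presumably tune them adaptively.
    """
    labels = []
    oracle_calls = 0
    for row in rows:
        score = proxy(row)
        if score >= high:        # proxy is confident the predicate holds
            labels.append(True)
        elif score <= low:       # proxy is confident it does not hold
            labels.append(False)
        else:                    # uncertain: pay for the expensive model
            labels.append(oracle(row))
            oracle_calls += 1
    return labels, oracle_calls

# Toy stand-ins for the two models (hypothetical keyword heuristics).
def toy_proxy(text):
    if "refund" in text:
        return 0.95
    if "thanks" in text:
        return 0.05
    return 0.5

def toy_oracle(text):
    return "broken" in text or "refund" in text

rows = [
    "please refund my order",
    "thanks, all good",
    "the device arrived broken",
    "where is my invoice",
]
labels, oracle_calls = cascade_classify(rows, toy_proxy, toy_oracle)
```

In this toy run only two of the four rows reach the oracle. The thresholds control the trade-off between inference cost and agreement with the oracle.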
CV · Sep 29, 2025
Aligning Visual Foundation Encoders to Tokenizers for Diffusion Models
Bowei Chen, Sai Bi, Hao Tan et al.
In this work, we propose aligning pretrained visual encoders to serve as tokenizers for latent diffusion models in image generation. Unlike training a variational autoencoder (VAE) from scratch, which primarily emphasizes low-level details, our approach leverages the rich semantic structure of foundation encoders. We introduce a three-stage alignment strategy: (1) freeze the encoder and train an adapter and a decoder to establish a semantic latent space; (2) jointly optimize all components with an additional semantic preservation loss, enabling the encoder to capture perceptual details while retaining high-level semantics; and (3) refine the decoder for improved reconstruction quality. This alignment yields semantically rich image tokenizers that benefit diffusion models. On ImageNet 256×256, our tokenizer accelerates the convergence of diffusion models, reaching a gFID of 1.90 within just 64 epochs, and improves generation both with and without classifier-free guidance. Scaling to LAION, a 2B-parameter text-to-image model trained with our tokenizer consistently outperforms FLUX VAE under the same training steps. Overall, our method is simple, scalable, and establishes a semantically grounded paradigm for continuous tokenizer design.
GN · Dec 19, 2023
SRNI-CAR: A comprehensive dataset for analyzing the Chinese automotive market
Ruixin Ding, Bowei Chen, James M. Wilson et al.
The automotive industry plays a critical role in the global economy, and particularly important is the expanding Chinese automobile market due to its immense scale and influence. However, existing automotive sector datasets are limited in their coverage, failing to adequately consider the growing demand for more and diverse variables. This paper aims to bridge this data gap by introducing a comprehensive dataset spanning the years from 2016 to 2022, encompassing sales data, online reviews, and a wealth of information related to the Chinese automotive industry. This dataset serves as a valuable resource, significantly expanding the available data. Its impact extends to various dimensions, including improving forecasting accuracy, expanding the scope of business applications, informing policy development and regulation, and advancing academic research within the automotive sector. To illustrate the dataset's potential applications in both business and academic contexts, we present two application examples. Our developed dataset enhances our understanding of the Chinese automotive market and offers a valuable tool for researchers, policymakers, and industry stakeholders worldwide.
CV · May 29, 2025
Generating Fit Check Videos with a Handheld Camera
Bowei Chen, Brian Curless, Ira Kemelmacher-Shlizerman et al. · uw
Self-captured full-body videos are popular, but most deployments require mounted cameras, carefully-framed shots, and repeated practice. We propose a more convenient solution that enables full-body video capture using handheld mobile devices. Our approach takes as input two static photos (front and back) of you in a mirror, along with an IMU motion reference that you perform while holding your mobile phone, and synthesizes a realistic video of you performing a similar target motion. We enable rendering into a new scene, with consistent illumination and shadows. We propose a novel video diffusion-based model to achieve this. Specifically, we propose a parameter-free frame generation strategy, as well as a multi-reference attention mechanism, that effectively integrate appearance information from both the front and back selfies into the video diffusion model. Additionally, we introduce an image-based fine-tuning strategy to enhance frame sharpness and improve the generation of shadows and reflections, achieving a more realistic human-scene composition.
CV · Aug 10, 2021
DVM-CAR: A large-scale automotive dataset for visual marketing research and applications
Jingmin Huang, Bowei Chen, Lan Luo et al.
There is a growing interest in product aesthetics analytics and design. However, the lack of available large-scale data that covers various variables and information is one of the biggest challenges faced by analysts and researchers. In this paper, we present our multidisciplinary initiative of developing a comprehensive automotive dataset from different online sources and formats. Specifically, the created dataset contains 1.4 million images from 899 car models and their corresponding model specifications and sales information over more than ten years in the UK market. Our work makes significant contributions to: (i) research and applications in the automotive industry; (ii) big data creation and sharing; (iii) database design; and (iv) data fusion. Apart from our motivation, technical details and data structure, we further present three simple examples to demonstrate how our data can be used in business research and applications.
LG · Apr 29, 2021
Learning Robust Variational Information Bottleneck with Reference
Weizhu Qian, Bowei Chen, Xiaowei Huang
We propose a new approach to train a variational information bottleneck (VIB) that improves its robustness to adversarial perturbations. Unlike traditional methods, in which hard labels are usually used for the classification task, we refine the categorical class information in the training phase with soft labels, which are obtained from a pre-trained reference neural network and reflect the likelihood of the original class labels. We also relax the Gaussian posterior assumption in the VIB implementation by using mutual information neural estimation. Extensive experiments have been performed with the MNIST and CIFAR-10 datasets, and the results show that our proposed approach significantly outperforms the benchmarked models.
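For orientation, the commonly used VIB training objective that this line of work starts from can be written as below. This is the standard formulation, not the modified loss of the paper above, which replaces hard labels with soft labels and relaxes the Gaussian posterior. The symbols are notation chosen here: $p_\theta(z \mid x)$ is the stochastic encoder, $q_\phi(y \mid z)$ the classifier, $r(z)$ a fixed prior, and $\beta \ge 0$ the compression weight.

```latex
\min_{\theta,\phi}\;
\mathbb{E}_{(x,y)}\Big[
  \mathbb{E}_{z \sim p_\theta(z \mid x)}\big[-\log q_\phi(y \mid z)\big]
  + \beta\,\mathrm{KL}\big(p_\theta(z \mid x)\,\|\,r(z)\big)
\Big]
```

The first term rewards predictive latent codes; the second penalizes information retained about the input, which is the "bottleneck".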
LG · Jul 1, 2020
Multi-Task Variational Information Bottleneck
Weizhu Qian, Bowei Chen, Yichao Zhang et al.
Multi-task learning (MTL) is an important subject in machine learning and artificial intelligence. Its applications to computer vision, signal processing, and speech recognition are ubiquitous. Although this subject has attracted considerable attention recently, the performance and robustness of the existing models to different tasks have not been well balanced. This article proposes an MTL model based on the architecture of the variational information bottleneck (VIB), which can provide a more effective latent representation of the input features for the downstream tasks. Extensive observations on three public data sets under adversarial attacks show that the proposed model is competitive with the state-of-the-art algorithms in terms of prediction accuracy. Experimental results suggest that combining the VIB and the task-dependent uncertainties is a very effective way to abstract valid information from the input features for accomplishing multiple tasks.
CY · May 26, 2019
A hybrid model for predicting human physical activity status from lifelogging data
Ji Ni, Bowei Chen, Nigel M. Allinson et al.
One trend in recent healthcare transformations is that people are encouraged to monitor and manage their health based on their daily diets and physical activity habits. However, most of the attention given to operational research and analytical models in healthcare has been at the system level, such as national or regional policy making or organisational issues. This paper proposes a model concerned with healthcare analytics at the individual level, which can predict human physical activity status from sequential lifelogging data collected from wearable sensors. The model has a two-stage hybrid structure (in short, MOGP-HMM) -- a multi-objective genetic programming (MOGP) algorithm in the first stage to reduce the dimensions of lifelogging data and a hidden Markov model (HMM) in the second stage for activity status prediction over time. It can be used as a decision support tool to provide real-time monitoring, statistical analysis and personalized advice to individuals, encouraging positive attitudes towards healthy lifestyles. We validate the model with real data collected from a group of participants in the UK, and compare it with other popular two-stage hybrid models. Our experimental results show that the MOGP-HMM can achieve comparable performance. To the best of our knowledge, this is the very first study that uses the MOGP in the hybrid two-stage structure for individuals' activity status prediction. It fits seamlessly with the current trend in the UK healthcare transformation of patient empowerment as well as contributing to a strategic development for more efficient and cost-effective provision of healthcare.
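The second stage of such a pipeline, decoding an activity sequence with an HMM, can be illustrated with a small Viterbi sketch. It does not reproduce the paper's model: the MOGP stage is omitted, and the states, the discretized "high"/"low" feature, and all probabilities below are hypothetical values chosen for illustration.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for a discrete-output HMM."""
    # best[t][s] = (probability of the best path ending in s at time t,
    #               the state that path was in at time t - 1)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        layer = {}
        for s in states:
            p, back = max((prev[r][0] * trans_p[r][s], r) for r in states)
            layer[s] = (p * emit_p[s][obs], back)
        best.append(layer)
    # Backtrack from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for layer in reversed(best[1:]):
        state = layer[state][1]
        path.append(state)
    return path[::-1]

# Hypothetical two-state model over one reduced, discretized feature.
states = ["active", "sedentary"]
start_p = {"active": 0.5, "sedentary": 0.5}
trans_p = {
    "active": {"active": 0.8, "sedentary": 0.2},
    "sedentary": {"active": 0.2, "sedentary": 0.8},
}
emit_p = {
    "active": {"high": 0.9, "low": 0.1},
    "sedentary": {"high": 0.2, "low": 0.8},
}
path = viterbi(["high", "high", "low", "low"], states, start_p, trans_p, emit_p)
```

In the hybrid structure described above, the observations would be the low-dimensional features produced by the first stage rather than a hand-made "high"/"low" signal.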
CP · Apr 29, 2019
Incorporating prior financial domain knowledge into neural networks for implied volatility surface prediction
Yu Zheng, Yongxin Yang, Bowei Chen
In this paper we develop a novel neural network model for predicting the implied volatility surface. Prior financial domain knowledge is taken into account. A new activation function that incorporates the volatility smile is proposed, which is used for the hidden nodes that process the underlying asset price. In addition, financial conditions, such as the absence of arbitrage, the boundaries and the asymptotic slope, are embedded into the loss function. This is one of the very first studies to discuss a methodological framework that incorporates prior financial domain knowledge into neural network architecture design and model training. The proposed model outperforms the benchmarked models with the option data on the S&P 500 index over 20 years. More importantly, the domain knowledge is satisfied empirically, showing the model is consistent with the existing financial theories and conditions related to the implied volatility surface.
MM · Aug 1, 2017
MM2RTB: Bringing Multimedia Metrics to Real-Time Bidding
Xiang Chen, Bowei Chen, Mohan Kankanhalli
In display advertising, users' online ad experiences are important for the advertising effectiveness. However, users have not been well accommodated in real-time bidding (RTB). This further influences their site visits and perception of the displayed banner ads. In this paper, we propose a novel computational framework which brings multimedia metrics, like the contextual relevance, the visual saliency and the ad memorability into RTB to improve the users' ad experiences as well as maintain the benefits of the publisher and the advertiser. We aim at developing a vigorous ecosystem by optimizing the trade-offs among all stakeholders. The framework considers the scenario of a webpage with multiple ad slots. Our experimental results show that the benefits of the advertiser and the user can be significantly improved if the publisher would slightly sacrifice his short-term revenue. The improved benefits will increase the advertising requests (demand) and the site visits (supply), which can further boost the publisher's revenue in the long run.
GT · May 30, 2017
Optimizing Trade-offs Among Stakeholders in Real-Time Bidding by Incorporating Multimedia Metrics
Xiang Chen, Bowei Chen, Mohan Kankanhalli
Displaying banner advertisements (in short, ads) on webpages has usually been discussed as an Internet economics topic where a publisher uses auction models to sell an online user's page view to advertisers and the one with the highest bid can have her ad displayed to the user. This is also called real-time bidding (RTB) and the ad displaying process ensures that the publisher's benefit is maximized or there is an equilibrium in ad auctions. However, the benefits of the other two stakeholders -- the advertiser and the user -- have been rarely discussed. In this paper, we propose a two-stage computational framework that selects a banner ad based on the optimized trade-offs among all stakeholders. The first stage is still auction based and the second stage re-ranks ads by considering the benefits of all stakeholders. Our metric variables are: the publisher's revenue, the advertiser's utility, the ad memorability, the ad click-through rate (CTR), the contextual relevance, and the visual saliency. To the best of our knowledge, this is the first work that optimizes trade-offs among all stakeholders in RTB by incorporating multimedia metrics. An algorithm is also proposed to determine the optimal weights of the metric variables. We use both ad auction datasets and multimedia datasets to validate the proposed framework. Our experimental results show that the publisher can significantly improve the other stakeholders' benefits by slightly reducing her revenue in the short term. In the long run, advertisers and users will be more engaged, and the increased demand for advertising and the increased supply of page views can then boost the publisher's revenue.
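The two-stage selection described above can be sketched roughly as follows. The abstract does not specify how the stages connect, so the bid-based shortlist, the weighted-sum score, the function name `select_ad`, and every number below are assumptions for illustration; in particular, the weights are fixed by hand here, whereas the paper proposes an algorithm to determine them.

```python
def select_ad(candidates, weights, shortlist_size=2):
    """Pick an ad by auction shortlist, then re-rank on stakeholder metrics.

    ASSUMPTIONS: stage 1 keeps the top bids; stage 2 scores each shortlisted
    ad as a weighted sum of metrics already normalized to [0, 1].
    """
    # Stage 1: auction-based shortlist by bid.
    shortlist = sorted(candidates, key=lambda ad: ad["bid"], reverse=True)
    shortlist = shortlist[:shortlist_size]

    # Stage 2: re-rank by the weighted stakeholder metrics.
    def score(ad):
        return sum(weights[m] * ad[m] for m in weights)

    return max(shortlist, key=score)

# Hypothetical weights over the six metric variables named in the abstract.
weights = {
    "revenue": 0.3, "utility": 0.2, "memorability": 0.1,
    "ctr": 0.2, "relevance": 0.1, "saliency": 0.1,
}
candidates = [
    {"name": "A", "bid": 1.0, "revenue": 1.0, "utility": 0.2,
     "memorability": 0.3, "ctr": 0.1, "relevance": 0.2, "saliency": 0.3},
    {"name": "B", "bid": 0.9, "revenue": 0.9, "utility": 0.7,
     "memorability": 0.8, "ctr": 0.4, "relevance": 0.9, "saliency": 0.6},
    {"name": "C", "bid": 0.5, "revenue": 0.5, "utility": 0.9,
     "memorability": 0.9, "ctr": 0.9, "relevance": 0.9, "saliency": 0.9},
]
winner = select_ad(candidates, weights)
```

Here ad B wins over the highest bidder A: the publisher gives up a little revenue in exchange for better advertiser and user metrics, which mirrors the trade-off reported in the abstract.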
GT · Jul 18, 2013
Multi-keyword multi-click advertisement option contracts for sponsored search
Bowei Chen, Jun Wang, Ingemar J. Cox et al.
In sponsored search, advertisement (abbreviated ad) slots are usually sold by a search engine to an advertiser through an auction mechanism in which advertisers bid on keywords. In theory, auction mechanisms have many desirable economic properties. However, keyword auctions have a number of limitations including: the uncertainty in payment prices for advertisers; the volatility in the search engine's revenue; and the weak loyalty between advertiser and search engine. In this paper we propose a special ad option that alleviates these problems. In our proposal, an advertiser can purchase an option from a search engine in advance by paying an upfront fee, known as the option price. He then has the right, but no obligation, to purchase among the pre-specified set of keywords at the fixed cost-per-clicks (CPCs) for a specified number of clicks in a specified period of time. The proposed option is closely related to a special exotic option in finance that contains multiple underlying assets (multi-keyword) and is also multi-exercisable (multi-click). This novel structure has many benefits: advertisers can have reduced uncertainty in advertising; the search engine can improve the advertisers' loyalty as well as obtain a stable and increased expected revenue over time. Since the proposed ad option can be implemented in conjunction with the existing keyword auctions, the option price and corresponding fixed CPCs must be set such that there is no arbitrage between the two markets. Option pricing methods are discussed and our experimental results validate the development. Compared to keyword auctions, a search engine can have an increased expected revenue by selling an ad option.
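The "right, but no obligation" mechanics of such an ad option can be illustrated with a toy cost comparison. This is a deliberately simplified sketch, not the paper's pricing model: it uses a single keyword, known spot prices, and a greedy exercise rule (which is not optimal in general when exercises are limited). The function name `cost_with_option` and all numbers are hypothetical.

```python
def cost_with_option(spot_cpcs, fixed_cpc, option_price, max_clicks):
    """Total advertiser cost when holding a multi-click ad option.

    ASSUMPTION: the advertiser exercises greedily, paying the fixed CPC
    whenever the auction (spot) CPC is higher and exercises remain,
    and paying the spot CPC otherwise.
    """
    total = option_price          # upfront fee paid to the search engine
    exercised = 0
    for spot in spot_cpcs:
        if exercised < max_clicks and spot > fixed_cpc:
            total += fixed_cpc    # exercise the option for this click
            exercised += 1
        else:
            total += spot         # fall back to the keyword auction
    return total, exercised

# Hypothetical per-click auction prices for one keyword.
spot_cpcs = [1.0, 3.0, 2.5, 0.5]
option_cost, exercised = cost_with_option(
    spot_cpcs, fixed_cpc=2.0, option_price=0.5, max_clicks=2
)
auction_cost = sum(spot_cpcs)
```

With these numbers the option holder pays 6.0 in total against 7.0 in the pure auction, because the two expensive clicks are capped at the fixed CPC. If the upfront fee exceeded that saving, the advertiser would be better off in the auction, which is why the abstract requires the option price and fixed CPCs to be set so that no arbitrage exists between the two markets.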