CVJul 24, 2022Code
PatchRD: Detail-Preserving Shape Completion by Learning Patch Retrieval and DeformationBo Sun, Vladimir G. Kim, Noam Aigerman et al.
This paper introduces a data-driven shape completion approach that focuses on completing geometric details of missing regions of 3D shapes. We observe that existing generative methods lack the training data and representation capacity to synthesize plausible, fine-grained details with complex geometry and topology. Our key insight is to copy and deform patches from the partial input to complete missing regions. This enables us to preserve the style of local geometric features, even if it drastically differs from the training data. Our fully automatic approach proceeds in two stages. First, we learn to retrieve candidate patches from the input shape. Second, we select and deform some of the retrieved candidates to seamlessly blend them into the complete shape. This method combines the advantages of the two most common completion methods: similarity-based single-instance completion, and completion by learning a shape space. We leverage repeating patterns by retrieving patches from the partial input, and learn global structural priors by using a neural network to guide the retrieval and deformation steps. Experimental results show our approach considerably outperforms baselines across multiple datasets and shape categories. Code and data are available at https://github.com/GitBoSun/PatchRD.
CVSep 20, 2024
Imagine yourself: Tuning-Free Personalized Image GenerationZecheng He, Bo Sun, Felix Juefei-Xu et al. · meta-ai
Diffusion models have demonstrated remarkable efficacy across various image-to-image tasks. In this research, we introduce Imagine yourself, a state-of-the-art model designed for personalized image generation. Unlike conventional tuning-based personalization techniques, Imagine yourself operates as a tuning-free model, enabling all users to leverage a shared framework without individualized adjustments. Moreover, previous work met challenges balancing identity preservation, following complex prompts and preserving good visual quality, resulting in models having strong copy-paste effect of the reference images. Thus, they can hardly generate images following prompts that require significant changes to the reference image, \eg, changing facial expression, head and body poses, and the diversity of the generated images is low. To address these limitations, our proposed method introduces 1) a new synthetic paired data generation mechanism to encourage image diversity, 2) a fully parallel attention architecture with three text encoders and a fully trainable vision encoder to improve the text faithfulness, and 3) a novel coarse-to-fine multi-stage finetuning methodology that gradually pushes the boundary of visual quality. Our study demonstrates that Imagine yourself surpasses the state-of-the-art personalization model, exhibiting superior capabilities in identity preservation, visual quality, and text alignment. This model establishes a robust foundation for various personalization applications. Human evaluation results validate the model's SOTA superiority across all aspects (identity preservation, text faithfulness, and visual appeal) compared to the previous personalization models.
CLJun 4
CollabSim: A CSCW-Grounded Methodology for Investigating Collaborative Competence of LLM Agents through Controlled Multi-Agent ExperimentsJiaju Chen, Bo Sun, Yuxuan Lu et al.
Multi-agent systems (MAS) built on large language models have shown growing promise, with their effectiveness resting on agents' ability to coordinate through text-based channels much as human teams do. Yet recent study suggests that MAS often falter not because agents lack individual task-solving ability, but because they lack collaborative competence: the capacity to establish common ground, maintain shared task understanding, balance individual and collective incentives, and repair misalignment as interaction unfolds. Decades of research in Computer-Supported Cooperative Work have characterized these requirements for human teams coordinating under constrained communication, yet existing MAS evaluations focus mainly on task outcomes or single-agent proficiency in reasoning, planning, and tool use. To enable a systematic analysis of agents' collaborative competence in MAS, we introduce CollabSim, a configurable simulation framework that combines a theory-grounded definition of collaborative capabilities, controlled manipulation of interaction conditions, and action-level probing of agents' internal states. Experiments across four LLMs show that CollabSim can capture condition effects, separate model performance patterns, and reveal task-dependent effects of agent design.
GTMay 12
Risk-Sensitive Online Selection with Bounded AdaptivityHossein Nekouyan, Bo Sun, Raouf Boutaba et al.
Designing randomized online algorithms that perform reliably not only in expectation but also under unfavorable realizations of randomness is a fundamental challenge in online decision-making. In this paper, we study this challenge in online adversarial selection, where a decision maker allocates $k$ units of a resource to sequentially arriving buyers through posted prices. We focus on two intertwined considerations that are often overlooked simultaneously: tail-risk sensitivity and bounded adaptivity, where tail risk is measured using conditional value-at-risk (CVaR) and bounded adaptivity limits the number of allowable policy updates over time. Our main contribution is a correlated posted-price mechanism that uses a single random seed to coordinate pricing decisions across time. This correlation induces a monotonic ordering of pricing profiles across sample paths, improving lower-tail performance while respecting the adaptivity constraint. More broadly, our results highlight correlation as a mechanism for controlling tail risk in randomized online algorithms. Using this framework, we derive competitive guarantees for several regimes of the problem under both static and dynamic pricing. Our analysis develops a risk-sensitive randomized online primal-dual framework tailored to CVaR objectives and reveals a systematic trade-off between allowable adaptivity, risk sensitivity, and competitive performance. Experiments on real airline pricing data further illustrate the empirical impact of correlated pricing on welfare concentration and tail behavior.
LGNov 12, 2022
Online Search with Predictions: Pareto-optimal Algorithm and its Applications in Energy MarketsRussell Lee, Bo Sun, Mohammad Hajiesmaili et al.
This paper develops learning-augmented algorithms for energy trading in volatile electricity markets. The basic problem is to sell (or buy) $k$ units of energy for the highest revenue (lowest cost) over uncertain time-varying prices, which can framed as a classic online search problem in the literature of competitive analysis. State-of-the-art algorithms assume no knowledge about future market prices when they make trading decisions in each time slot, and aim for guaranteeing the performance for the worst-case price sequence. In practice, however, predictions about future prices become commonly available by leveraging machine learning. This paper aims to incorporate machine-learned predictions to design competitive algorithms for online search problems. An important property of our algorithms is that they achieve performances competitive with the offline algorithm in hindsight when the predictions are accurate (i.e., consistency) and also provide worst-case guarantees when the predictions are arbitrarily wrong (i.e., robustness). The proposed algorithms achieve the Pareto-optimal trade-off between consistency and robustness, where no other algorithms for online search can improve on the consistency for a given robustness. Further, we extend the basic online search problem to a more general inventory management setting that can capture storage-assisted energy trading in electricity markets. In empirical evaluations using traces from real-world applications, our learning-augmented algorithms improve the average empirical performance compared to benchmark algorithms, while also providing improved worst-case performance.
CVApr 20, 2023
GenCorres: Consistent Shape Matching via Coupled Implicit-Explicit Shape Generative ModelsHaitao Yang, Xiangru Huang, Bo Sun et al.
This paper introduces GenCorres, a novel unsupervised joint shape matching (JSM) approach. Our key idea is to learn a mesh generator to fit an unorganized deformable shape collection while constraining deformations between adjacent synthetic shapes to preserve geometric structures such as local rigidity and local conformality. GenCorres presents three appealing advantages over existing JSM techniques. First, GenCorres performs JSM among a synthetic shape collection whose size is much bigger than the input shapes and fully leverages the datadriven power of JSM. Second, GenCorres unifies consistent shape matching and pairwise matching (i.e., by enforcing deformation priors between adjacent synthetic shapes). Third, the generator provides a concise encoding of consistent shape correspondences. However, learning a mesh generator from an unorganized shape collection is challenging, requiring a good initialization. GenCorres addresses this issue by learning an implicit generator from the input shapes, which provides intermediate shapes between two arbitrary shapes. We introduce a novel approach for computing correspondences between adjacent implicit surfaces, which we use to regularize the implicit generator. Synthetic shapes of the implicit generator then guide initial fittings (i.e., via template-based deformation) for learning the mesh generator. Experimental results show that GenCorres considerably outperforms state-of-the-art JSM techniques. The synthetic shapes of GenCorres also achieve salient performance gains against state-of-the-art deformable shape generators.
CVSep 8, 2023
Weakly Supervised Point Clouds Transformer for 3D Object DetectionZuojin Tang, Bo Sun, Tongwei Ma et al. · pku
The annotation of 3D datasets is required for semantic-segmentation and object detection in scene understanding. In this paper we present a framework for the weakly supervision of a point clouds transformer that is used for 3D object detection. The aim is to decrease the required amount of supervision needed for training, as a result of the high cost of annotating a 3D datasets. We propose an Unsupervised Voting Proposal Module, which learns randomly preset anchor points and uses voting network to select prepared anchor points of high quality. Then it distills information into student and teacher network. In terms of student network, we apply ResNet network to efficiently extract local characteristics. However, it also can lose much global information. To provide the input which incorporates the global and local information as the input of student networks, we adopt the self-attention mechanism of transformer to extract global features, and the ResNet layers to extract region proposals. The teacher network supervises the classification and regression of the student network using the pre-trained model on ImageNet. On the challenging KITTI datasets, the experimental results have achieved the highest level of average precision compared with the most recent weakly supervised 3D object detectors.
CVApr 4, 2023
LiDAR-Based 3D Object Detection via Hybrid 2D Semantic Scene GenerationHaitao Yang, Zaiwei Zhang, Xiangru Huang et al.
Bird's-Eye View (BEV) features are popular intermediate scene representations shared by the 3D backbone and the detector head in LiDAR-based object detectors. However, little research has been done to investigate how to incorporate additional supervision on the BEV features to improve proposal generation in the detector head, while still balancing the number of powerful 3D layers and efficient 2D network operations. This paper proposes a novel scene representation that encodes both the semantics and geometry of the 3D environment in 2D, which serves as a dense supervision signal for better BEV feature learning. The key idea is to use auxiliary networks to predict a combination of explicit and implicit semantic probabilities by exploiting their complementary properties. Extensive experiments show that our simple yet effective design can be easily integrated into most state-of-the-art 3D object detectors and consistently improves upon baseline models.
CLApr 15, 2022
Improving Pre-trained Language Models with Syntactic Dependency Prediction Task for Chinese Semantic Error RecognitionBo Sun, Baoxin Wang, Wanxiang Che et al.
Existing Chinese text error detection mainly focuses on spelling and simple grammatical errors. These errors have been studied extensively and are relatively simple for humans. On the contrary, Chinese semantic errors are understudied and more complex that humans cannot easily recognize. The task of this paper is Chinese Semantic Error Recognition (CSER), a binary classification task to determine whether a sentence contains semantic errors. The current research has no effective method to solve this task. In this paper, we inherit the model structure of BERT and design several syntax-related pre-training tasks so that the model can learn syntactic knowledge. Our pre-training tasks consider both the directionality of the dependency structure and the diversity of the dependency relationship. Due to the lack of a published dataset for CSER, we build a high-quality dataset for CSER for the first time named Corpus of Chinese Linguistic Semantic Acceptability (CoCLSA). The experimental results on the CoCLSA show that our methods outperform universal pre-trained models and syntax-infused models.
CVSep 29, 2024Code
PPLNs: Parametric Piecewise Linear Networks for Event-Based Temporal Modeling and BeyondChen Song, Zhenxiao Liang, Bo Sun et al.
We present Parametric Piecewise Linear Networks (PPLNs) for temporal vision inference. Motivated by the neuromorphic principles that regulate biological neural behaviors, PPLNs are ideal for processing data captured by event cameras, which are built to simulate neural activities in the human retina. We discuss how to represent the membrane potential of an artificial neuron by a parametric piecewise linear function with learnable coefficients. This design echoes the idea of building deep models from learnable parametric functions recently popularized by Kolmogorov-Arnold Networks (KANs). Experiments demonstrate the state-of-the-art performance of PPLNs in event-based and image-based vision applications, including steering prediction, human pose estimation, and motion deblurring. The source code of our implementation is available at https://github.com/chensong1995/PPLN.
CVNov 17, 2023
Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human ExpressionAnimesh Sinha, Bo Sun, Anmol Kalia et al.
We introduce Style Tailoring, a recipe to finetune Latent Diffusion Models (LDMs) in a distinct domain with high visual quality, prompt alignment and scene diversity. We choose sticker image generation as the target domain, as the images significantly differ from photorealistic samples typically generated by large-scale LDMs. We start with a competent text-to-image model, like Emu, and show that relying on prompt engineering with a photorealistic model to generate stickers leads to poor prompt alignment and scene diversity. To overcome these drawbacks, we first finetune Emu on millions of sticker-like images collected using weak supervision to elicit diversity. Next, we curate human-in-the-loop (HITL) Alignment and Style datasets from model generations, and finetune to improve prompt alignment and style alignment respectively. Sequential finetuning on these datasets poses a tradeoff between better style alignment and prompt alignment gains. To address this tradeoff, we propose a novel fine-tuning method called Style Tailoring, which jointly fits the content and style distribution and achieves best tradeoff. Evaluation results show our method improves visual quality by 14%, prompt alignment by 16.2% and scene diversity by 15.3%, compared to prompt engineering the base Emu model for stickers generation.
DSOct 31, 2023
Online Conversion with Switching Costs: Robust and Learning-Augmented AlgorithmsAdam Lechowicz, Nicolas Christianson, Bo Sun et al.
We introduce and study online conversion with switching costs, a family of online problems that capture emerging problems at the intersection of energy and sustainability. In this problem, an online player attempts to purchase (alternatively, sell) fractional shares of an asset during a fixed time horizon with length $T$. At each time step, a cost function (alternatively, price function) is revealed, and the player must irrevocably decide an amount of asset to convert. The player also incurs a switching cost whenever their decision changes in consecutive time steps, i.e., when they increase or decrease their purchasing amount. We introduce competitive (robust) threshold-based algorithms for both the minimization and maximization variants of this problem, and show they are optimal among deterministic online algorithms. We then propose learning-augmented algorithms that take advantage of untrusted black-box advice (such as predictions from a machine learning model) to achieve significantly better average-case performance without sacrificing worst-case competitive guarantees. Finally, we empirically evaluate our proposed algorithms using a carbon-aware EV charging case study, showing that our algorithms substantially improve on baseline methods for this problem.
LGOct 17, 2023
Online Algorithms with Uncertainty-Quantified PredictionsBo Sun, Jerry Huang, Nicolas Christianson et al.
The burgeoning field of algorithms with predictions studies the problem of using possibly imperfect machine learning predictions to improve online algorithm performance. While nearly all existing algorithms in this framework make no assumptions on prediction quality, a number of methods providing uncertainty quantification (UQ) on machine learning models have been developed in recent years, which could enable additional information about prediction quality at decision time. In this work, we investigate the problem of optimally utilizing uncertainty-quantified predictions in the design of online algorithms. In particular, we study two classic online problems, ski rental and online search, where the decision-maker is provided predictions augmented with UQ describing the likelihood of the ground truth falling within a particular range of values. We demonstrate that non-trivial modifications to algorithm design are needed to fully leverage the UQ predictions. Moreover, we consider how to utilize more general forms of UQ, proposing an online learning framework that learns to exploit UQ to make decisions in multi-instance settings.
CVJan 26, 2023
The Projection-Enhancement Network (PEN)Christopher Z. Eddy, Austin Naylor, Bo Sun
Contemporary approaches to instance segmentation in cell science use 2D or 3D convolutional networks depending on the experiment and data structures. However, limitations in microscopy systems or efforts to prevent phototoxicity commonly require recording sub-optimally sampled data regimes that greatly reduces the utility of such 3D data, especially in crowded environments with significant axial overlap between objects. In such regimes, 2D segmentations are both more reliable for cell morphology and easier to annotate. In this work, we propose the Projection Enhancement Network (PEN), a novel convolutional module which processes the sub-sampled 3D data and produces a 2D RGB semantic compression, and is trained in conjunction with an instance segmentation network of choice to produce 2D segmentations. Our approach combines augmentation to increase cell density using a low-density cell image dataset to train PEN, and curated datasets to evaluate PEN. We show that with PEN, the learned semantic representation in CellPose encodes depth and greatly improves segmentation performance in comparison to maximum intensity projection images as input, but does not similarly aid segmentation in region-based networks like Mask-RCNN. Finally, we dissect the segmentation strength against cell density of PEN with CellPose on disseminated cells from side-by-side spheroids. We present PEN as a data-driven solution to form compressed representations of 3D data that improve 2D segmentations from instance segmentation networks.
DSAug 14, 2024
Learning-Augmented Competitive Algorithms for Spatiotemporal Online Allocation with Deadline ConstraintsAdam Lechowicz, Nicolas Christianson, Bo Sun et al.
We introduce and study spatiotemporal online allocation with deadline constraints ($\mathsf{SOAD}$), a new online problem motivated by emerging challenges in sustainability and energy. In $\mathsf{SOAD}$, an online player completes a workload by allocating and scheduling it on the points of a metric space $(X, d)$ while subject to a deadline $T$. At each time step, a service cost function is revealed that represents the cost of servicing the workload at each point, and the player must irrevocably decide the current allocation of work to points. Whenever the player moves this allocation, they incur a movement cost defined by the distance metric $d(\cdot, \ \cdot)$ that captures, e.g., an overhead cost. $\mathsf{SOAD}$ formalizes the open problem of combining general metrics and deadline constraints in the online algorithms literature, unifying problems such as metrical task systems and online search. We propose a competitive algorithm for $\mathsf{SOAD}$ along with a matching lower bound establishing its optimality. Our main algorithm, \textsc{ST-CLIP}, is a learning-augmented algorithm that takes advantage of predictions (e.g., forecasts of relevant costs) and achieves an optimal consistency-robustness trade-off. We evaluate our proposed algorithms in a simulated case study of carbon-aware spatiotemporal workload management, an application in sustainable computing that schedules a delay-tolerant batch compute job on a distributed network of data centers. In these experiments, we show that \textsc{ST-CLIP} substantially improves on heuristic baseline methods.
CVMay 21
Sensor2Sensor: Cross-Embodiment Sensor Conversion for Autonomous DrivingJiahao Wang, Bo Sun, Yijing Bai et al.
Robust training and validation of Autonomous Driving Systems (ADS) require massive, diverse datasets. Proprietary data collected by Autonomous Vehicle (AV) fleets, while high-fidelity, are limited in scale, diversity of sensor configurations, as well as geographic and long-tail-behavioral coverage. In contrast, in-the-wild data from sources like dashcams offers immense scale and diversity, capturing critical long-tail scenarios and novel environments. However, this unstructured, in-the-wild video data is incompatible with ADS expecting structured, multi-modal sensor inputs for validation and training. To bridge this data gap, we propose Sensor2Sensor, a novel generative modeling paradigm that translates in-the-wild monocular dashcam videos into a high-fidelity, multi-modal sensor suite (AV logs) comprising multi-view camera images and LiDAR point clouds. A core challenge is the lack of paired training data. We address this by converting real AV logs into dashcam-style videos via 4D Gaussian Splatting (4DGS) reconstruction and novel-view rendering. Sensor2Sensor then utilizes a diffusion architecture to perform the generative conversion. We perform comprehensive quantitative evaluations on the fidelity and realism of the generated sensor data. We demonstrate Sensor2Sensor's practical utility by converting challenging in-the-wild internet and dashcam footage into realistic, multi-modal data formats, further unlocking vast external data sources for AV development.
CVMar 10, 2021Code
FSCE: Few-Shot Object Detection via Contrastive Proposal EncodingBo Sun, Banghuai Li, Shengcai Cai et al.
Emerging interests have been brought to recognize previously unseen objects given very few training examples, known as few-shot object detection (FSOD). Recent researches demonstrate that good feature embedding is the key to reach favorable few-shot learning performance. We observe object proposals with different Intersection-of-Union (IoU) scores are analogous to the intra-image augmentation used in contrastive approaches. And we exploit this analogy and incorporate supervised contrastive learning to achieve more robust objects representations in FSOD. We present Few-Shot object detection via Contrastive proposals Encoding (FSCE), a simple yet effective approach to learning contrastive-aware object proposal encodings that facilitate the classification of detected objects. We notice the degradation of average precision (AP) for rare objects mainly comes from misclassifying novel instances as confusable classes. And we ease the misclassification issues by promoting instance level intra-class compactness and inter-class variance via our contrastive proposal encoding loss (CPE loss). Our design outperforms current state-of-the-art works in any shot and all data splits, with up to +8.8% on standard benchmark PASCAL VOC and +2.7% on challenging COCO benchmark. Code is available at: https: //github.com/MegviiDetection/FSCE
HCJan 3, 2014Code
Flight trajectory recreation and playback system of aerial mission based on ossimplanetWu Wu, Jiulin Hu, Xiaofang Huang et al.
Recreation of flight trajectory is important among research areas. The design of a flight trajectory recreation and playback system is presented in this paper. Rather than transferring the flight data to diagram, graph and table, flight data is visualized on the 3D global of ossimPlanet. ossimPlanet is an open-source 3D global geo-spatial viewer and the system realization is based on analysis it. Users are allowed to choose their interested flight of aerial mission. The aerial photographs and corresponding configuration files in which flight data is included would be read in. And the flight statuses would be stored. The flight trajectory is then recreated. Users can view the photographs and flight trajectory marks on the correct positions of 3D global. The scene along flight trajectory is also simulated at the plane's eye point. This paper provides a more intuitive way for recreation of flight trajectory. The cost is decreased remarkably and security is ensured by secondary development on open-source platform.
HCMay 9
Fatigue-Related Reaction Time Forecasting via EEG Functional Connectivity in Sustained Attention TaskBo Sun, Liang Ma
Mental fatigue related behavioral performance decline precipitates catastrophic accidents in sustained attention tasks. While existing neurophysiological systems effectively detect current behavioral performance, they often lack the capability to forecast behavioral lapses with sufficient temporal lead time for intervention. This study proposes a novel model for the reaction time (RT) forecasting using EEG functional connectivity features. Thirty participants engaged in a sustained Psychomotor Vigilance Test (PVT) with concurrent 30-channel EEG recording. Mutual information (MI) between electrodes was calculated as functional connectivity features. Random Forest regression model (RF) was trained to predict single-trial RTs across forecasting horizons ranging from 0 to 20 seconds. The model demonstrated robust predictive validity, achieving a Root Mean Square Error (RMSE) of 23.75 ms for immediate detection and maintaining high accuracy (RMSE = 24.07 ms) across different forecasting horizons. Interpretability analysis via SHAP and Linear Mixed Effects model further support the validity of the proposed model and revealed distinct temporal biomarkers. This study validates the feasibility of forecasting behavioral performance 20 seconds in advance, offering a promising methodology for proactive fatigue management in safety-critical systems.
CVNov 21, 2023
Instance-aware 3D Semantic Segmentation powered by Shape Generators and ClassifiersBo Sun, Qixing Huang, Xiangru Huang
Existing 3D semantic segmentation methods rely on point-wise or voxel-wise feature descriptors to output segmentation predictions. However, these descriptors are often supervised at point or voxel level, leading to segmentation models that can behave poorly at instance-level. In this paper, we proposed a novel instance-aware approach for 3D semantic segmentation. Our method combines several geometry processing tasks supervised at instance-level to promote the consistency of the learned feature representation. Specifically, our methods use shape generators and shape classifiers to perform shape reconstruction and classification tasks for each shape instance. This enforces the feature representation to faithfully encode both structural and local shape information, with an awareness of shape instances. In the experiments, our method significantly outperform existing approaches in 3D semantic segmentation on several public benchmarks, such as Waymo Open Dataset, SemanticKITTI and ScanNetV2.
CVMar 30, 2025
MoCha: Towards Movie-Grade Talking Character SynthesisCong Wei, Bo Sun, Haoyu Ma et al.
Recent advancements in video generation have achieved impressive motion realism, yet they often overlook character-driven storytelling, a crucial task for automated film, animation generation. We introduce Talking Characters, a more realistic task to generate talking character animations directly from speech and text. Unlike talking head, Talking Characters aims at generating the full portrait of one or more characters beyond the facial region. In this paper, we propose MoCha, the first of its kind to generate talking characters. To ensure precise synchronization between video and speech, we propose a speech-video window attention mechanism that effectively aligns speech and video tokens. To address the scarcity of large-scale speech-labeled video datasets, we introduce a joint training strategy that leverages both speech-labeled and text-labeled video data, significantly improving generalization across diverse character actions. We also design structured prompt templates with character tags, enabling, for the first time, multi-character conversation with turn-based dialogue-allowing AI-generated characters to engage in context-aware conversations with cinematic coherence. Extensive qualitative and quantitative evaluations, including human preference studies and benchmark comparisons, demonstrate that MoCha sets a new standard for AI-generated cinematic storytelling, achieving superior realism, expressiveness, controllability and generalization.
CLJun 5, 2025
OPeRA: A Dataset of Observation, Persona, Rationale, and Action for Evaluating LLMs on Human Online Shopping Behavior SimulationZiyi Wang, Yuxuan Lu, Wenbo Li et al. · gatech, microsoft-research
Can large language models (LLMs) accurately simulate the next web action of a specific user? While LLMs have shown promising capabilities in generating ``believable'' human behaviors, evaluating their ability to mimic real user behaviors remains an open challenge, largely due to the lack of high-quality, publicly available datasets that capture both the observable actions and the internal reasoning of an actual human user. To address this gap, we introduce OPERA, a novel dataset of Observation, Persona, Rationale, and Action collected from real human participants during online shopping sessions. OPERA is the first public dataset that comprehensively captures: user personas, browser observations, fine-grained web actions, and self-reported just-in-time rationales. We developed both an online questionnaire and a custom browser plugin to gather this dataset with high fidelity. Using OPERA, we establish the first benchmark to evaluate how well current LLMs can predict a specific user's next action and rationale with a given persona and <observation, action, rationale> history. This dataset lays the groundwork for future research into LLM agents that aim to act as personalized digital twins for human.
LGNov 8, 2025
Make It Long, Keep It Fast: End-to-End 10k-Sequence Modeling at Billion Scale on DouyinLin Guan, Jia-Qi Yang, Zhishan Zhao et al.
Short-video recommenders such as Douyin must exploit extremely long user histories without breaking latency or cost budgets. We present an end-to-end system that scales long-sequence modeling to 10k-length histories in production. First, we introduce Stacked Target-to-History Cross Attention (STCA), which replaces history self-attention with stacked cross-attention from the target to the history, reducing complexity from quadratic to linear in sequence length and enabling efficient end-to-end training. Second, we propose Request Level Batching (RLB), a user-centric batching scheme that aggregates multiple targets for the same user/request to share the user-side encoding, substantially lowering sequence-related storage, communication, and compute without changing the learning objective. Third, we design a length-extrapolative training strategy -- train on shorter windows, infer on much longer ones -- so the model generalizes to 10k histories without additional training cost. Across offline and online experiments, we observe predictable, monotonic gains as we scale history length and model capacity, mirroring the scaling law behavior observed in large language models. Deployed at full traffic on Douyin, our system delivers significant improvements on key engagement metrics while meeting production latency, demonstrating a practical path to scaling end-to-end long-sequence recommendation to the 10k regime.
DSFeb 21, 2024
Chasing Convex Functions with Long-term ConstraintsAdam Lechowicz, Nicolas Christianson, Bo Sun et al.
We introduce and study a family of online metric problems with long-term constraints. In these problems, an online player makes decisions $\mathbf{x}_t$ in a metric space $(X,d)$ to simultaneously minimize their hitting cost $f_t(\mathbf{x}_t)$ and switching cost as determined by the metric. Over the time horizon $T$, the player must satisfy a long-term demand constraint $\sum_{t} c(\mathbf{x}_t) \geq 1$, where $c(\mathbf{x}_t)$ denotes the fraction of demand satisfied at time $t$. Such problems can find a wide array of applications to online resource allocation in sustainable energy/computing systems. We devise optimal competitive and learning-augmented algorithms for the case of bounded hitting cost gradients and weighted $\ell_1$ metrics, and further show that our proposed algorithms perform well in numerical experiments.
CLMay 20, 2025
Unifying Inference-Time Planning Language GenerationPrabhu Prakash Kagitha, Bo Sun, Ishan Desai et al.
A line of work in planning uses LLM not to generate a plan, but to generate a formal representation in some planning language, which can be input into a symbolic solver to deterministically find a plan. While showing improved trust and promising performance, dozens of recent publications have proposed scattered methods on a variety of benchmarks under different experimental settings. We attempt to unify the inference-time LLM-as-formalizer methodology for classical planning by proposing a unifying framework based on intermediate representations. We thus systematically evaluate more than a dozen pipelines that subsume most existing work, while proposing novel ones that involve syntactically similar but high resource intermediate languages (such as a Python wrapper of PDDL). We provide recipes for planning language generation pipelines, draw a series of conclusions showing the efficacy of their various components, and evidence their robustness against problem complexity.
CVJan 24, 2025
Glissando-Net: Deep sinGLe vIew category level poSe eStimation ANd 3D recOnstructionBo Sun, Hao Kang, Li Guan et al.
We present a deep learning model, dubbed Glissando-Net, to simultaneously estimate the pose and reconstruct the 3D shape of objects at the category level from a single RGB image. Previous works predominantly focused on either estimating poses(often at the instance level), or reconstructing shapes, but not both. Glissando-Net is composed of two auto-encoders that are jointly trained, one for RGB images and the other for point clouds. We embrace two key design choices in Glissando-Net to achieve a more accurate prediction of the 3D shape and pose of the object given a single RGB image as input. First, we augment the feature maps of the point cloud encoder and decoder with transformed feature maps from the image decoder, enabling effective 2D-3D interaction in both training and prediction. Second, we predict both the 3D shape and pose of the object in the decoder stage. This way, we better utilize the information in the 3D point clouds presented only in the training stage to train the network for more accurate prediction. We jointly train the two encoder-decoders for RGB and point cloud data to learn how to pass latent features to the point cloud decoder during inference. In testing, the encoder of the 3D point cloud is discarded. The design of Glissando-Net is inspired by codeSLAM. Unlike codeSLAM, which targets 3D reconstruction of scenes, we focus on pose estimation and shape reconstruction of objects, and directly predict the object pose and a pose invariant 3D reconstruction without the need of the code optimization step. Extensive experiments, involving both ablation studies and comparison with competing methods, demonstrate the efficacy of our proposed method, and compare favorably with the state-of-the-art.
CLOct 16, 2025
DPRF: A Generalizable Dynamic Persona Refinement Framework for Optimizing Behavior Alignment Between Personalized LLM Role-Playing Agents and HumansBingsheng Yao, Bo Sun, Yuanzhe Dong et al.
The emerging large language model role-playing agents (LLM RPAs) aim to simulate individual human behaviors, but the persona fidelity is often undermined by manually-created profiles (e.g., cherry-picked information and personality characteristics) without validating the alignment with the target individuals. To address this limitation, our work introduces the Dynamic Persona Refinement Framework (DPRF). DPRF aims to optimize the alignment of LLM RPAs' behaviors with those of target individuals by iteratively identifying the cognitive divergence, either through free-form or theory-grounded, structured analysis, between generated behaviors and human ground truth, and refining the persona profile to mitigate these divergences. We evaluate DPRF with five LLMs on four diverse behavior-prediction scenarios: formal debates, social media posts with mental health issues, public interviews, and movie reviews. DPRF can consistently improve behavioral alignment considerably over baseline personas and generalizes across models and scenarios. Our work provides a robust methodology for creating high-fidelity persona profiles and enhancing the validity of downstream applications, such as user simulation, social studies, and personalized AI.
CVOct 7, 2025
Drive&Gen: Co-Evaluating End-to-End Driving and Video Generation ModelsJiahao Wang, Zhenpei Yang, Yijing Bai et al.
Recent advances in generative models have sparked exciting new possibilities in the field of autonomous vehicles. Specifically, video generation models are now being explored as controllable virtual testing environments. Simultaneously, end-to-end (E2E) driving models have emerged as a streamlined alternative to conventional modular autonomous driving systems, gaining popularity for their simplicity and scalability. However, the application of these techniques to simulation and planning raises important questions. First, while video generation models can generate increasingly realistic videos, can these videos faithfully adhere to the specified conditions and be realistic enough for E2E autonomous planner evaluation? Second, given that data is crucial for understanding and controlling E2E planners, how can we gain deeper insights into their biases and improve their ability to generalize to out-of-distribution scenarios? In this work, we bridge the gap between the driving models and generative world models (Drive&Gen) to address these questions. We propose novel statistical measures leveraging E2E drivers to evaluate the realism of generated videos. By exploiting the controllability of the video generation model, we conduct targeted experiments to investigate distribution gaps affecting E2E planner performance. Finally, we show that synthetic data produced by the video generation model offers a cost-effective alternative to real-world data collection. This synthetic data effectively improves E2E model generalization beyond existing Operational Design Domains, facilitating the expansion of autonomous vehicle services into new operational contexts.
LGSep 24, 2025
Consistent Estimation of Numerical Distributions under Local Differential Privacy by Wavelet ExpansionPuning Zhao, Zhikun Zhang, Bo Sun et al.
Distribution estimation under local differential privacy (LDP) is a fundamental and challenging task. Significant progresses have been made on categorical data. However, due to different evaluation metrics, these methods do not work well when transferred to numerical data. In particular, we need to prevent the probability mass from being misplaced far away. In this paper, we propose a new approach that express the sample distribution using wavelet expansions. The coefficients of wavelet series are estimated under LDP. Our method prioritizes the estimation of low-order coefficients, in order to ensure accurate estimation at macroscopic level. Therefore, the probability mass is prevented from being misplaced too far away from its ground truth. We establish theoretical guarantees for our methods. Experiments show that our wavelet expansion method significantly outperforms existing solutions under Wasserstein and KS distances.
LGJul 21, 2025
Competitive Algorithms for Multi-Agent Ski-Rental ProblemsXuchuang Wang, Bo Sun, Hedyeh Beyhaghi et al.
This paper introduces a novel multi-agent ski-rental problem that generalizes the classical ski-rental dilemma to a group setting where agents incur individual and shared costs. In our model, each agent can either rent at a fixed daily cost, or purchase a pass at an individual cost, with an additional third option of a discounted group pass available to all. We consider scenarios in which agents' active days differ, leading to dynamic states as agents drop out of the decision process. To address this problem from different perspectives, we define three distinct competitive ratios: overall, state-dependent, and individual rational. For each objective, we design and analyze optimal deterministic and randomized policies. Our deterministic policies employ state-aware threshold functions that adapt to the dynamic states, while our randomized policies sample and resample thresholds from tailored state-aware distributions. The analysis reveals that symmetric policies, in which all agents use the same threshold, outperform asymmetric ones. Our results provide competitive ratio upper and lower bounds and extend classical ski-rental insights to multi-agent settings, highlighting both theoretical and practical implications for group decision-making under uncertainty.
CVJun 24, 2025
SceneCrafter: Controllable Multi-View Driving Scene EditingZehao Zhu, Yuliang Zou, Chiyu Max Jiang et al.
Simulation is crucial for developing and evaluating autonomous vehicle (AV) systems. Recent literature builds on a new generation of generative models to synthesize highly realistic images for full-stack simulation. However, purely synthetically generated scenes are not grounded in reality and have difficulty in inspiring confidence in the relevance of its outcomes. Editing models, on the other hand, leverage source scenes from real driving logs, and enable the simulation of different traffic layouts, behaviors, and operating conditions such as weather and time of day. While image editing is an established topic in computer vision, it presents fresh sets of challenges in driving simulation: (1) the need for cross-camera 3D consistency, (2) learning ``empty street" priors from driving data with foreground occlusions, and (3) obtaining paired image tuples of varied editing conditions while preserving consistent layout and geometry. To address these challenges, we propose SceneCrafter, a versatile editor for realistic 3D-consistent manipulation of driving scenes captured from multiple cameras. We build on recent advancements in multi-view diffusion models, using a fully controllable framework that scales seamlessly to multi-modality conditions like weather, time of day, agent boxes and high-definition maps. To generate paired data for supervising the editing model, we propose a novel framework on top of Prompt-to-Prompt to generate geometrically consistent synthetic paired data with global edits. We also introduce an alpha-blending framework to synthesize data with local edits, leveraging a model trained on empty street priors through novel masked training and multi-view repaint paradigm. SceneCrafter demonstrates powerful editing capabilities and achieves state-of-the-art realism, controllability, 3D consistency, and scene editing quality compared to existing baselines.
DSFeb 6, 2025
Knowing When to Stop Matters: A Unified Algorithm for Online Conversion under Horizon UncertaintyYanzhao Wang, Hasti Nourmohammadi Sigaroudi, Bo Sun et al.
This paper investigates the online conversion problem, which involves sequentially trading a divisible resource (e.g., energy) under dynamically changing prices to maximize profit. A key challenge in online conversion is managing decisions under horizon uncertainty, where the duration of trading is either known, revealed partway, or entirely unknown. We propose a unified algorithm that achieves optimal competitive guarantees across these horizon models, accounting for practical constraints such as box constraints, which limit the maximum allowable trade per step. Additionally, we extend the algorithm to a learning-augmented version, leveraging horizon predictions to adaptively balance performance: achieving near-optimal results when predictions are accurate while maintaining strong guarantees when predictions are unreliable. These results advance the understanding of online conversion under various degrees of horizon uncertainty and provide more practical strategies to address real world constraints.
LGDec 20, 2024
Multi-Source Unsupervised Domain Adaptation with Prototype AggregationMin Huang, Zifeng Xie, Bo Sun et al.
Multi-source domain adaptation (MSDA) plays an important role in industrial model generalization. Recent efforts on MSDA focus on enhancing multi-domain distributional alignment while omitting three issues, e.g., the class-level discrepancy quantification, the unavailability of noisy pseudo-label, and source transferability discrimination, potentially resulting in suboptimal adaption performance. Therefore, we address these issues by proposing a prototype aggregation method that models the discrepancy between source and target domains at the class and domain levels. Our method achieves domain adaptation based on a group of prototypes (i.e., representative feature embeddings). A similarity score-based strategy is designed to quantify the transferability of each domain. At the class level, our method quantifies class-specific cross-domain discrepancy according to reliable target pseudo-labels. At the domain level, our method establishes distributional alignment between noisy pseudo-labeled target samples and the source domain prototypes. Therefore, adaptation at the class and domain levels establishes a complementary mechanism to obtain accurate predictions. The results on three standard benchmarks demonstrate that our method outperforms most state-of-the-art methods. In addition, we provide further elaboration of the proposed method in light of the interpretable results obtained from the analysis experiments.
LGJun 26, 2024
Near-Optimal Consistency-Robustness Trade-Offs for Learning-Augmented Online Knapsack ProblemsMohammadreza Daneshvaramoli, Helia Karisani, Adam Lechowicz et al.
This paper introduces a family of learning-augmented algorithms for online knapsack problems that achieve near Pareto-optimal consistency-robustness trade-offs through a simple combination of trusted learning-augmented and worst-case algorithms. Our approach relies on succinct, practical predictions -- single values or intervals estimating the minimum value of any item in an offline solution. Additionally, we propose a novel fractional-to-integral conversion procedure, offering new insights for online algorithm design.
CVJun 17, 2024
TutteNet: Injective 3D Deformations by Composition of 2D Mesh DeformationsBo Sun, Thibault Groueix, Chen Song et al.
This work proposes a novel representation of injective deformations of 3D space, which overcomes existing limitations of injective methods: inaccuracy, lack of robustness, and incompatibility with general learning and optimization frameworks. The core idea is to reduce the problem to a deep composition of multiple 2D mesh-based piecewise-linear maps. Namely, we build differentiable layers that produce mesh deformations through Tutte's embedding (guaranteed to be injective in 2D), and compose these layers over different planes to create complex 3D injective deformations of the 3D volume. We show our method provides the ability to efficiently and accurately optimize and learn complex deformations, outperforming other injective approaches. As a main application, we produce complex and artifact-free NeRF and SDF deformations.
CVJan 19, 2024
360ORB-SLAM: A Visual SLAM System for Panoramic Images with Depth Completion NetworkYichen Chen, Yiqi Pan, Ruyu Liu et al.
To enhance the performance and effect of AR/VR applications and visual assistance and inspection systems, visual simultaneous localization and mapping (vSLAM) is a fundamental task in computer vision and robotics. However, traditional vSLAM systems are limited by the camera's narrow field-of-view, resulting in challenges such as sparse feature distribution and lack of dense depth information. To overcome these limitations, this paper proposes a 360ORB-SLAM system for panoramic images that combines with a depth completion network. The system extracts feature points from the panoramic image, utilizes a panoramic triangulation module to generate sparse depth information, and employs a depth completion network to obtain a dense panoramic depth map. Experimental results on our novel panoramic dataset constructed based on Carla demonstrate that the proposed method achieves superior scale accuracy compared to existing monocular SLAM methods and effectively addresses the challenges of feature association and scale ambiguity. The integration of the depth completion network enhances system stability and mitigates the impact of dynamic elements on SLAM performance.
LGMay 22, 2023
Time Fairness in Online Knapsack ProblemsAdam Lechowicz, Rik Sengupta, Bo Sun et al.
The online knapsack problem is a classic problem in the field of online algorithms. Its canonical version asks how to pack items of different values and weights arriving online into a capacity-limited knapsack so as to maximize the total value of the admitted items. Although optimal competitive algorithms are known for this problem, they may be fundamentally unfair, i.e., individual items may be treated inequitably in different ways. We formalize a practically-relevant notion of time fairness which effectively models a trade off between static and dynamic pricing in a motivating application such as cloud resource allocation, and show that existing algorithms perform poorly under this metric. We propose a parameterized deterministic algorithm where the parameter precisely captures the Pareto-optimal trade-off between fairness (static pricing) and competitiveness (dynamic pricing). We show that randomization is theoretically powerful enough to be simultaneously competitive and fair; however, it does not work well in experiments. To further improve the trade-off between fairness and competitiveness, we develop a nearly-optimal learning-augmented algorithm which is fair, consistent, and robust (competitive), showing substantial performance improvements in numerical experiments.
CLMay 9, 2023
CSED: A Chinese Semantic Error Diagnosis CorpusBo Sun, Baoxin Wang, Yixuan Wang et al.
Recently, much Chinese text error correction work has focused on Chinese Spelling Check (CSC) and Chinese Grammatical Error Diagnosis (CGED). In contrast, little attention has been paid to the complicated problem of Chinese Semantic Error Diagnosis (CSED), which lacks relevant datasets. The study of semantic errors is important because they are very common and may lead to syntactic irregularities or even problems of comprehension. To investigate this, we build the CSED corpus, which includes two datasets. The one is for the CSED-Recognition (CSED-R) task. The other is for the CSED-Correction (CSED-C) task. Our annotation guarantees high-quality data through quality assurance mechanisms. Our experiments show that powerful pre-trained models perform poorly on this corpus. We also find that the CSED task is challenging, as evidenced by the fact that even humans receive a low score. This paper proposes syntax-aware models to specifically adapt to the CSED task. The experimental results show that the introduction of the syntax-aware approach is meaningful.
LGSep 3, 2021
Pareto-Optimal Learning-Augmented Algorithms for Online Conversion ProblemsBo Sun, Russell Lee, Mohammad Hajiesmaili et al.
This paper leverages machine-learned predictions to design competitive algorithms for online conversion problems with the goal of improving the competitive ratio when predictions are accurate (i.e., consistency), while also guaranteeing a worst-case competitive ratio regardless of the prediction quality (i.e., robustness). We unify the algorithmic design of both integral and fractional conversion problems, which are also known as the 1-max-search and one-way trading problems, into a class of online threshold-based algorithms (OTA). By incorporating predictions into design of OTA, we achieve the Pareto-optimal trade-off of consistency and robustness, i.e., no online algorithm can achieve a better consistency guarantee given for a robustness guarantee. We demonstrate the performance of OTA using numerical experiments on Bitcoin conversion.
CVAug 21, 2021
ARAPReg: An As-Rigid-As Possible Regularization Loss for Learning Deformable Shape GeneratorsQixing Huang, Xiangru Huang, Bo Sun et al.
This paper introduces an unsupervised loss for training parametric deformation shape generators. The key idea is to enforce the preservation of local rigidity among the generated shapes. Our approach builds on an approximation of the as-rigid-as possible (or ARAP) deformation energy. We show how to develop the unsupervised loss via a spectral decomposition of the Hessian of the ARAP energy. Our loss nicely decouples pose and shape variations through a robust norm. The loss admits simple closed-form expressions. It is easy to train and can be plugged into any standard generation models, e.g., variational auto-encoder (VAE) and auto-decoder (AD). Experimental results show that our approach outperforms existing shape generation approaches considerably on public benchmark datasets of various shape categories such as human, animal and bone.
IVMar 26, 2021
Tuning IR-cut Filter for Illumination-aware Spectral Reconstruction from RGBBo Sun, Junchi Yan, Xiao Zhou et al.
To reconstruct spectral signals from multi-channel observations, in particular trichromatic RGBs, has recently emerged as a promising alternative to traditional scanning-based spectral imager. It has been proven that the reconstruction accuracy relies heavily on the spectral response of the RGB camera in use. To improve accuracy, data-driven algorithms have been proposed to retrieve the best response curves of existing RGB cameras, or even to design brand new three-channel response curves. Instead, this paper explores the filter-array based color imaging mechanism of existing RGB cameras, and proposes to design the IR-cut filter properly for improved spectral recovery, which stands out as an in-between solution with better trade-off between reconstruction accuracy and implementation complexity. We further propose a deep learning based spectral reconstruction method, which allows to recover the illumination spectrum as well. Experiment results with both synthetic and real images under daylight illumination have shown the benefits of our IR-cut filter tuning method and our illumination-aware spectral reconstruction method.
LGDec 2, 2020
Learning Order Parameters from Videos of Dynamical Phases for Skyrmions with Neural NetworksWeidi Wang, Zeyuan Wang, Yinghui Zhang et al.
The ability to recognize dynamical phenomena (e.g., dynamical phases) and dynamical processes in physical events from videos, then to abstract physical concepts and reveal physical laws, lies at the core of human intelligence. The main purposes of this paper are to use neural networks for classifying the dynamical phases of some videos and to demonstrate that neural networks can learn physical concepts from them. To this end, we employ multiple neural networks to recognize the static phases (image format) and dynamical phases (video format) of a particle-based skyrmion model. Our results show that neural networks, without any prior knowledge, can not only correctly classify these phases, but also predict the phase boundaries which agree with those obtained by simulation. We further propose a parameter visualization scheme to interpret what neural networks have learned. We show that neural networks can learn two order parameters from videos of dynamical phases and predict the critical values of two order parameters. Finally, we demonstrate that only two order parameters are needed to identify videos of skyrmion dynamical phases. It shows that this parameter visualization scheme can be used to determine how many order parameters are needed to fully recognize the input phases. Our work sheds light on the future use of neural networks in discovering new physical concepts and revealing unknown yet physical laws from videos.
CVJun 10, 2020
H3DNet: 3D Object Detection Using Hybrid Geometric PrimitivesZaiwei Zhang, Bo Sun, Haitao Yang et al.
We introduce H3DNet, which takes a colorless 3D point cloud as input and outputs a collection of oriented object bounding boxes (or BB) and their semantic labels. The critical idea of H3DNet is to predict a hybrid set of geometric primitives, i.e., BB centers, BB face centers, and BB edge centers. We show how to convert the predicted geometric primitives into object proposals by defining a distance function between an object and the geometric primitives. This distance function enables continuous optimization of object proposals, and its local minimums provide high-fidelity object proposals. H3DNet then utilizes a matching and refinement module to classify object proposals into detected objects and fine-tune the geometric parameters of the detected objects. The hybrid set of geometric primitives not only provides more accurate signals for object detection than using a single type of geometric primitives, but it also provides an overcomplete set of constraints on the resulting 3D layout. Therefore, H3DNet can tolerate outliers in predicted geometric primitives. Our model achieves state-of-the-art 3D detection results on two large datasets with real 3D scans, ScanNet and SUN RGB-D.
CVMay 4, 2019
Oriented Point Sampling for Plane Detection in Unorganized Point CloudsBo Sun, Philippos Mordohai
Plane detection in 3D point clouds is a crucial pre-processing step for applications such as point cloud segmentation, semantic mapping and SLAM. In contrast to many recent plane detection methods that are only applicable on organized point clouds, our work is targeted to unorganized point clouds that do not permit a 2D parametrization. We compare three methods for detecting planes in point clouds efficiently. One is a novel method proposed in this paper that generates plane hypotheses by sampling from a set of points with estimated normals. We named this method Oriented Point Sampling (OPS) to contrast with more conventional techniques that require the sampling of three unoriented points to generate plane hypotheses. We also implemented an efficient plane detection method based on local sampling of three unoriented points and compared it with OPS and the 3D-KHT algorithm, which is based on octrees, on the detection of planes on 10,000 point clouds from the SUN RGB-D dataset.
CVNov 30, 2018
Adversarial Defense by Stratified Convolutional Sparse CodingBo Sun, Nian-hsuan Tsai, Fangchen Liu et al.
We propose an adversarial defense method that achieves state-of-the-art performance among attack-agnostic adversarial defense methods while also maintaining robustness to input resolution, scale of adversarial perturbation, and scale of dataset size. Based on convolutional sparse coding, we construct a stratified low-dimensional quasi-natural image space that faithfully approximates the natural image space while also removing adversarial perturbations. We introduce a novel Sparse Transformation Layer (STL) in between the input image and the first layer of the neural network to efficiently project images into our quasi-natural image space. Our experiments show state-of-the-art performance of our method compared to other attack-agnostic adversarial defense methods in various adversarial settings.
CVNov 23, 2018
MVPNet: Multi-View Point Regression Networks for 3D Object Reconstruction from A Single ImageJinglu Wang, Bo Sun, Yan Lu
In this paper, we address the problem of reconstructing an object's surface from a single image using generative networks. First, we represent a 3D surface with an aggregation of dense point clouds from multiple views. Each point cloud is embedded in a regular 2D grid aligned on an image plane of a viewpoint, making the point cloud convolution-favored and ordered so as to fit into deep network architectures. The point clouds can be easily triangulated by exploiting connectivities of the 2D grids to form mesh-based surfaces. Second, we propose an encoder-decoder network that generates such kind of multiple view-dependent point clouds from a single image by regressing their 3D coordinates and visibilities. We also introduce a novel geometric loss that is able to interpret discrepancy over 3D surfaces as opposed to 2D projective planes, resorting to the surface discretization on the constructed meshes. We demonstrate that the multi-view point regression network outperforms state-of-the-art methods with a significant improvement on challenging datasets.
CRMay 7, 2018
Stay On-Topic: Generating Context-specific Fake Restaurant ReviewsMika Juuti, Bo Sun, Tatsuya Mori et al.
Automatically generated fake restaurant reviews are a threat to online review systems. Recent research has shown that users have difficulties in detecting machine-generated fake reviews hiding among real restaurant reviews. The method used in this work (char-LSTM ) has one drawback: it has difficulties staying in context, i.e. when it generates a review for specific target entity, the resulting review may contain phrases that are unrelated to the target, thus increasing its detectability. In this work, we present and evaluate a more sophisticated technique based on neural machine translation (NMT) with which we can generate reviews that stay on-topic. We test multiple variants of our technique using native English speakers on Amazon Mechanical Turk. We demonstrate that reviews generated by the best variant have almost optimal undetectability (class-averaged F-score 47%). We conduct a user study with skeptical users and show that our method evades detection more frequently compared to the state-of-the-art (average evasion 3.2/4 vs 1.5/4) with statistical significance, at level α = 1% (Section 4.3). We develop very effective detection tools and reach average F-score of 97% in classifying these. Although fake reviews are very effective in fooling people, effective automatic detection is still feasible.
CRSep 22, 2017
On Equivalence of Known Families of APN Functions in Small DimensionsBo Sun
In this extended abstract, we computationally check and list the CCZ-inequivalent APN functions from infinite families on $\mathbb{F}_2^n$ for n from 6 to 11. These functions are selected with simplest coefficients from CCZ-inequivalent classes. This work can simplify checking CCZ-equivalence between any APN function and infinite APN families.
MLAug 18, 2017
The Stochastic Replica Approach to Machine Learning: Stability and Parameter OptimizationPatrick Chao, Tahereh Mazaheri, Bo Sun et al.
We introduce a statistical physics inspired supervised machine learning algorithm for classification and regression problems. The method is based on the invariances or stability of predicted results when known data is represented as expansions in terms of various stochastic functions. The algorithm predicts the classification/regression values of new data by combining (via voting) the outputs of these numerous linear expansions in randomly chosen functions. The few parameters (typically only one parameter is used in all studied examples) that this model has may be automatically optimized. The algorithm has been tested on 10 diverse training data sets of various types and feature space dimensions. It has been shown to consistently exhibit high accuracy and readily allow for optimization of parameters, while simultaneously avoiding pitfalls of existing algorithms such as those associated with class imbalance. We very briefly speculate on whether spatial coordinates in physical theories may be viewed as emergent "features" that enable a robust machine learning type description of data with generic low order smooth functions.
CRFeb 10, 2017
A Study on the Vulnerabilities of Mobile Apps associated with Software ModulesTakuya Watanabe, Mitsuaki Akiyama, Fumihiro Kanei et al.
This paper reports a large-scale study that aims to understand how mobile application (app) vulnerabilities are associated with software libraries. We analyze both free and paid apps. Studying paid apps was quite meaningful because it helped us understand how differences in app development/maintenance affect the vulnerabilities associated with libraries. We analyzed 30k free and paid apps collected from the official Android marketplace. Our extensive analyses revealed that approximately 70%/50% of vulnerabilities of free/paid apps stem from software libraries, particularly from third-party libraries. Somewhat paradoxically, we found that more expensive/popular paid apps tend to have more vulnerabilities. This comes from the fact that more expensive/popular paid apps tend to have more functionality, i.e., more code and libraries, which increases the probability of vulnerabilities. Based on our findings, we provide suggestions to stakeholders of mobile app distribution ecosystems.