CLAug 22, 2022
Type-enriched Hierarchical Contrastive Strategy for Fine-Grained Entity TypingXinyu Zuo, Haijin Liang, Ning Jing et al. · pku
Fine-grained entity typing (FET) aims to deduce specific semantic types of the entity mentions in text. Modern methods for FET mainly focus on learning what a certain type looks like. And few works directly model the type differences, that is, let models know the extent that one type is different from others. To alleviate this problem, we propose a type-enriched hierarchical contrastive strategy for FET. Our method can directly model the differences between hierarchical types and improve the ability to distinguish multi-grained similar types. On the one hand, we embed type into entity contexts to make type information directly perceptible. On the other hand, we design a constrained contrastive strategy on the hierarchical structure to directly model the type differences, which can simultaneously perceive the distinguishability between types at different granularity. Experimental results on three benchmarks, BBN, OntoNotes, and FIGER show that our method achieves significant performance on FET by effectively modeling type differences.
LGDec 29, 2025
KernelEvolve: Scaling Agentic Kernel Coding for Heterogeneous AI Accelerators at MetaGang Liao, Hongsen Qin, Ying Wang et al.
Making deep learning recommendation model (DLRM) training and inference fast and efficient is important. However, this presents three key system challenges - model architecture diversity, kernel primitive diversity, and hardware generation and architecture heterogeneity. This paper presents KernelEvolve-an agentic kernel coding framework-to tackle heterogeneity at-scale for DLRM. KernelEvolve is designed to take kernel specifications as input and automate the process of kernel generation and optimization for recommendation model across heterogeneous hardware architectures. KernelEvolve does so by operating at multiple programming abstractions, from Triton and CuTe DSL to low-level hardware agnostic languages, spanning the full hardware-software optimization stack. The kernel optimization process is described as graph-based search with selection policy, universal operator, fitness function, and termination rule, dynamically adapts to runtime execution context through retrieval-augmented prompt synthesis. We designed, implemented, and deployed KernelEvolve to optimize a wide variety of production recommendation models across generations of NVIDIA and AMD GPUs, as well as Meta's AI accelerators. We validate KernelEvolve on the publicly-available KernelBench suite, achieving 100% pass rate on all 250 problems across three difficulty levels, and 160 PyTorch ATen operators across three heterogeneous hardware platforms, demonstrating 100% correctness. KernelEvolve reduces development time from weeks to hours and achieves substantial performance improvements over PyTorch baselines across diverse production use cases and for heterogeneous AI systems at-scale. Beyond performance efficiency improvements, KernelEvolve significantly mitigates the programmability barrier for new AI hardware by enabling automated kernel generation for in-house developed AI hardware.
CVDec 17, 2023Code
Primitive-based 3D Human-Object Interaction Modelling and ProgrammingSiqi Liu, Yong-Lu Li, Zhou Fang et al.
Embedding Human and Articulated Object Interaction (HAOI) in 3D is an important direction for a deeper human activity understanding. Different from previous works that use parametric and CAD models to represent humans and objects, in this work, we propose a novel 3D geometric primitive-based language to encode both humans and objects. Given our new paradigm, humans and objects are all compositions of primitives instead of heterogeneous entities. Thus, mutual information learning may be achieved between the limited 3D data of humans and different object categories. Moreover, considering the simplicity of the expression and the richness of the information it contains, we choose the superquadric as the primitive representation. To explore an effective embedding of HAOI for the machine, we build a new benchmark on 3D HAOI consisting of primitives together with their images and propose a task requiring machines to recover 3D HAOI using primitives from images. Moreover, we propose a baseline of single-view 3D reconstruction on HAOI. We believe this primitive-based 3D HAOI representation would pave the way for 3D HAOI studies. Our code and data are available at https://mvig-rhos.com/p3haoi.
CVSep 29, 2025Code
UI-UG: A Unified MLLM for UI Understanding and GenerationHao Yang, Weijie Qiu, Ru Zhang et al.
Although Multimodal Large Language Models (MLLMs) have been widely applied across domains, they are still facing challenges in domain-specific tasks, such as User Interface (UI) understanding accuracy and UI generation quality. In this paper, we introduce UI-UG (a unified MLLM for UI Understanding and Generation), integrating both capabilities. For understanding tasks, we employ Supervised Fine-tuning (SFT) combined with Group Relative Policy Optimization (GRPO) to enhance fine-grained understanding on the modern complex UI data. For generation tasks, we further use Direct Preference Optimization (DPO) to make our model generate human-preferred UIs. In addition, we propose an industrially effective workflow, including the design of an LLM-friendly domain-specific language (DSL), training strategies, rendering processes, and evaluation metrics. In experiments, our model achieves state-of-the-art (SOTA) performance on understanding tasks, outperforming both larger general-purpose MLLMs and similarly-sized UI-specialized models. Our model is also on par with these larger MLLMs in UI generation performance at a fraction of the computational cost. We also demonstrate that integrating understanding and generation tasks can improve accuracy and quality for both tasks. Code and Model: https://github.com/neovateai/UI-UG
CVDec 23, 2023Code
PACE: A Large-Scale Dataset with Pose Annotations in Cluttered EnvironmentsYang You, Kai Xiong, Zhening Yang et al.
We introduce PACE (Pose Annotations in Cluttered Environments), a large-scale benchmark designed to advance the development and evaluation of pose estimation methods in cluttered scenarios. PACE provides a large-scale real-world benchmark for both instance-level and category-level settings. The benchmark consists of 55K frames with 258K annotations across 300 videos, covering 238 objects from 43 categories and featuring a mix of rigid and articulated items in cluttered scenes. To annotate the real-world data efficiently, we develop an innovative annotation system with a calibrated 3-camera setup. Additionally, we offer PACE-Sim, which contains 100K photo-realistic simulated frames with 2.4M annotations across 931 objects. We test state-of-the-art algorithms in PACE along two tracks: pose estimation, and object pose tracking, revealing the benchmark's challenges and research opportunities. Our benchmark code and data is available on https://github.com/qq456cvb/PACE.
HCMar 13
Memory Printer: Exploring Everyday Reminiscing by Combining Slow Design with Generative AI-based Image CreationZhou Fang, Janet Yi-Ching Huang
Generative Artificial Intelligence (GAI) offers new opportunities for reconstructing these unrecorded memory scenes, yet existing web-based tools undermine users' sense of agency through disengaging and unpredictable interactions. In this work, we advance three design arguments about how slow, tangible interaction can reshape human-AI relationships by making temporality, embodied agency, and generative processes experientially legible. We instantiate these arguments by presenting Memory Printer, a tangible design that combines silk-screen printing metaphors with text-to-image generation. The design features layered reconstruction that decomposes image generation into incremental steps, a physical wooden scraper enabling embodied control over image revelation, and built-in printing that produces tangible photos. We examine these arguments through a comparative study with 24 participants, exploring how participants engage with, interpret, and respond to this interaction stance. The study surfaces both opportunities -- such as vivid memory evocation, heightened sense of control, and creative exploration -- and critical tensions, including risks of false memory formation, algorithmic bias, and data privacy. Together, these findings articulate important boundaries for deploying generative AI in emotionally sensitive contexts.
ROMar 18
ProbeFlow: Training-Free Adaptive Flow Matching for Vision-Language-Action ModelsZhou Fang, Jiaqi Wang, Yi Zhou et al.
Recent Vision-Language-Action (VLA) models equipped with Flow Matching (FM) action heads achieve state-of-the-art performance in complex robot manipulation. However, the multi-step iterative ODE solving required by FM introduces inference latency that precludes responsive physical control. While current acceleration efforts optimize the Vision-Language Model (VLM) backbone, the action head bottleneck remains overlooked. To address this, we propose ProbeFlow, a training-free adaptive inference framework tai- lored for continuous robotic control. By evaluating geometric trajectory complexity via the cosine similarity between initial and lookahead velocity vectors, ProbeFlow dynamically sched- ules integration steps to prune redundant network evaluations. On the MetaWorld benchmark, it accelerates action decoding by 14.8x (reducing average steps from N = 50 to 2.6) and cuts end-to-end system latency by 2.8x without compromising the manipulation success rate. On the long-horizon LIBERO benchmark, the probe automatically allocates a denser schedule to navigate semantic bottlenecks, effectively resolving the flow solver delay. Real-world physical deployments confirm that ProbeFlow successfully mitigates action decoding latency while ensuring execution stability, offering a highly practical solution for low-latency continuous generative policies.
CLMar 14, 2025
Semantic and Contextual Modeling for Malicious Comment Detection with BERT-BiLSTMZhou Fang, Hanlu Zhang, Jacky He et al.
This study aims to develop an efficient and accurate model for detecting malicious comments, addressing the increasingly severe issue of false and harmful content on social media platforms. We propose a deep learning model that combines BERT and BiLSTM. The BERT model, through pre-training, captures deep semantic features of text, while the BiLSTM network excels at processing sequential data and can further model the contextual dependencies of text. Experimental results on the Jigsaw Unintended Bias in Toxicity Classification dataset demonstrate that the BERT+BiLSTM model achieves superior performance in malicious comment detection tasks, with a precision of 0.94, recall of 0.93, and accuracy of 0.94. This surpasses other models, including standalone BERT, TextCNN, TextRNN, and traditional machine learning algorithms using TF-IDF features. These results confirm the superiority of the BERT+BiLSTM model in handling imbalanced data and capturing deep semantic features of malicious comments, providing an effective technical means for social media content moderation and online environment purification.
DCMay 1, 2025
Intelligent Task Scheduling for Microservices via A3C-Based Reinforcement LearningYang Wang, Tengda Tang, Zhou Fang et al.
To address the challenges of high resource dynamism and intensive task concurrency in microservice systems, this paper proposes an adaptive resource scheduling method based on the A3C reinforcement learning algorithm. The scheduling problem is modeled as a Markov Decision Process, where policy and value networks are jointly optimized to enable fine-grained resource allocation under varying load conditions. The method incorporates an asynchronous multi-threaded learning mechanism, allowing multiple agents to perform parallel sampling and synchronize updates to the global network parameters. This design improves both policy convergence efficiency and model stability. In the experimental section, a real-world dataset is used to construct a scheduling scenario. The proposed method is compared with several typical approaches across multiple evaluation metrics, including task delay, scheduling success rate, resource utilization, and convergence speed. The results show that the proposed method delivers high scheduling performance and system stability in multi-task concurrent environments. It effectively alleviates the resource allocation bottlenecks faced by traditional methods under heavy load, demonstrating its practical value for intelligent scheduling in microservice systems.
LGMay 14, 2024
vMFER: Von Mises-Fisher Experience Resampling Based on Uncertainty of Gradient Directions for Policy ImprovementYiwen Zhu, Jinyi Liu, Wenya Wei et al.
Reinforcement Learning (RL) is a widely employed technique in decision-making problems, encompassing two fundamental operations -- policy evaluation and policy improvement. Enhancing learning efficiency remains a key challenge in RL, with many efforts focused on using ensemble critics to boost policy evaluation efficiency. However, when using multiple critics, the actor in the policy improvement process can obtain different gradients. Previous studies have combined these gradients without considering their disagreements. Therefore, optimizing the policy improvement process is crucial to enhance learning efficiency. This study focuses on investigating the impact of gradient disagreements caused by ensemble critics on policy improvement. We introduce the concept of uncertainty of gradient directions as a means to measure the disagreement among gradients utilized in the policy improvement process. Through measuring the disagreement among gradients, we find that transitions with lower uncertainty of gradient directions are more reliable in the policy improvement process. Building on this analysis, we propose a method called von Mises-Fisher Experience Resampling (vMFER), which optimizes the policy improvement process by resampling transitions and assigning higher confidence to transitions with lower uncertainty of gradient directions. Our experiments demonstrate that vMFER significantly outperforms the benchmark and is particularly well-suited for ensemble structures in RL.
LGApr 15, 2025
Dynamical errors in machine learning forecastsZhou Fang, Gianmarco Mengaldo
In machine learning forecasting, standard error metrics such as mean absolute error (MAE) and mean squared error (MSE) quantify discrepancies between predictions and target values. However, these metrics do not directly evaluate the physical and/or dynamical consistency of forecasts, an increasingly critical concern in scientific and engineering applications. Indeed, a fundamental yet often overlooked question is whether machine learning forecasts preserve the dynamical behavior of the underlying system. Addressing this issue is essential for assessing the fidelity of machine learning models and identifying potential failure modes, particularly in applications where maintaining correct dynamical behavior is crucial. In this work, we investigate the relationship between standard forecasting error metrics, such as MAE and MSE, and the dynamical properties of the underlying system. To achieve this goal, we use two recently developed dynamical indices: the instantaneous dimension ($d$), and the inverse persistence ($θ$). Our results indicate that larger forecast errors -- e.g., higher MSE -- tend to occur in states with higher $d$ (higher complexity) and higher $θ$ (lower persistence). To further assess dynamical consistency, we propose error metrics based on the dynamical indices that measure the discrepancy of the forecasted $d$ and $θ$ versus their correct values. Leveraging these dynamical indices-based metrics, we analyze direct and recursive forecasting strategies for three canonical datasets -- Lorenz, Kuramoto-Sivashinsky equation, and Kolmogorov flow -- as well as a real-world weather forecasting task. Our findings reveal substantial distortions in dynamical properties in ML forecasts, especially for long forecast lead times or long recursive simulations, providing complementary information on ML forecast fidelity that can be used to improve ML models.
LGMar 24, 2025
Unsupervised Detection of Fraudulent Transactions in E-commerce Using Contrastive LearningXuan Li, Yuting Peng, Xiaoxuan Sun et al.
With the rapid development of e-commerce, e-commerce platforms are facing an increasing number of fraud threats. Effectively identifying and preventing these fraudulent activities has become a critical research problem. Traditional fraud detection methods typically rely on supervised learning, which requires large amounts of labeled data. However, such data is often difficult to obtain, and the continuous evolution of fraudulent activities further reduces the adaptability and effectiveness of traditional methods. To address this issue, this study proposes an unsupervised e-commerce fraud detection algorithm based on SimCLR. The algorithm leverages the contrastive learning framework to effectively detect fraud by learning the underlying representations of transaction data in an unlabeled setting. Experimental results on the eBay platform dataset show that the proposed algorithm outperforms traditional unsupervised methods such as K-means, Isolation Forest, and Autoencoders in terms of accuracy, precision, recall, and F1 score, demonstrating strong fraud detection capabilities. The results confirm that the SimCLR-based unsupervised fraud detection method has broad application prospects in e-commerce platform security, improving both detection accuracy and robustness. In the future, with the increasing scale and diversity of datasets, the model's performance will continue to improve, and it could be integrated with real-time monitoring systems to provide more efficient security for e-commerce platforms.
LGDec 21, 2024
Breaking the Context Bottleneck on Long Time Series ForecastingChao Ma, Yikai Hou, Xiang Li et al.
Long-term time-series forecasting is essential for planning and decision-making in economics, energy, and transportation, where long foresight is required. To obtain such long foresight, models must be both efficient and effective in processing long sequence. Recent advancements have enhanced the efficiency of these models; however, the challenge of effectively leveraging longer sequences persists. This is primarily due to the tendency of these models to overfit when presented with extended inputs, necessitating the use of shorter input lengths to maintain tolerable error margins. In this work, we investigate the multiscale modeling method and propose the Logsparse Decomposable Multiscaling (LDM) framework for the efficient and effective processing of long sequences. We demonstrate that by decoupling patterns at different scales in time series, we can enhance predictability by reducing non-stationarity, improve efficiency through a compact long input representation, and simplify the architecture by providing clear task assignments. Experimental results demonstrate that LDM not only outperforms all baselines in long-term forecasting benchmarks, but also reducing both training time and memory costs.
CVDec 6, 2024
Verb Mirage: Unveiling and Assessing Verb Concept Hallucinations in Multimodal Large Language ModelsZehao Wang, Xinpeng Liu, Xiaoqian Wu et al.
Multimodal Large Language Models (MLLMs) have garnered significant attention recently and demonstrate outstanding capabilities in various tasks such as OCR, VQA, captioning, $\textit{etc}$. However, hallucination remains a persistent issue. While numerous methods have been proposed to mitigate hallucinations, achieving notable improvements, these methods primarily focus on mitigating hallucinations about $\textbf{object/noun-related}$ concepts. Verb concepts, crucial for understanding human actions, have been largely overlooked. In this paper, to the best of our knowledge, we are the $\textbf{first}$ to investigate the $\textbf{verb hallucination}$ phenomenon of MLLMs from various perspectives. Our findings reveal that most state-of-the-art MLLMs suffer from severe verb hallucination. To assess the effectiveness of existing mitigation methods for object concept hallucination on verb hallucination, we evaluated these methods and found that they do not effectively address verb hallucination. To address this issue, we propose a novel rich verb knowledge-based tuning method to mitigate verb hallucination. The experiment results demonstrate that our method significantly reduces hallucinations related to verbs.
LGJun 16, 2024
Incorporating uncertainty quantification into travel mode choice modeling: a Bayesian neural network (BNN) approach and an uncertainty-guided active survey frameworkShuwen Zheng, Zhou Fang, Liang Zhao
Existing deep learning approaches for travel mode choice modeling fail to inform modelers about their prediction uncertainty. Even when facing scenarios that are out of the distribution of training data, which implies high prediction uncertainty, these approaches still provide deterministic answers, potentially leading to misguidance. To address this limitation, this study introduces the concept of uncertainty from the field of explainable artificial intelligence into travel mode choice modeling. We propose a Bayesian neural network-based travel mode prediction model (BTMP) that quantifies the uncertainty of travel mode predictions, enabling the model itself to "know" and "tell" what it doesn't know. With BTMP, we further propose an uncertainty-guided active survey framework, which dynamically formulates survey questions representing travel mode choice scenarios with high prediction uncertainty. Through iterative collection of responses to these dynamically tailored survey questions, BTMP is iteratively trained to achieve the desired accuracy faster with fewer questions, thereby reducing survey costs. Experimental validation using synthetic datasets confirms the effectiveness of BTMP in quantifying prediction uncertainty. Furthermore, experiments, utilizing both synthetic and real-world data, demonstrate that the BTMP model, trained with the uncertainty-guided active survey framework, requires 20% to 50% fewer survey responses to match the performance of the model trained on randomly collected survey data. Overall, the proposed BTMP model and active survey framework innovatively incorporate uncertainty quantification into travel mode choice modeling, providing model users with essential insights into prediction reliability while optimizing data collection for deep learning model training in a cost-efficient manner.
LGDec 1, 2021
Homotopy Based Reinforcement Learning with Maximum Entropy for Autonomous Air CombatYiwen Zhu, Zhou Fang, Yuan Zheng et al.
The Intelligent decision of the unmanned combat aerial vehicle (UCAV) has long been a challenging problem. The conventional search method can hardly satisfy the real-time demand during high dynamics air combat scenarios. The reinforcement learning (RL) method can significantly shorten the decision time via using neural networks. However, the sparse reward problem limits its convergence speed and the artificial prior experience reward can easily deviate its optimal convergent direction of the original task, which raises great difficulties for the RL air combat application. In this paper, we propose a homotopy-based soft actor-critic method (HSAC) which focuses on addressing these problems via following the homotopy path between the original task with sparse reward and the auxiliary task with artificial prior experience reward. The convergence and the feasibility of this method are also proved in this paper. To confirm our method feasibly, we construct a detailed 3D air combat simulation environment for the RL-based methods training firstly, and we implement our method in both the attack horizontal flight UCAV task and the self-play confrontation task. Experimental results show that our method performs better than the methods only utilizing the sparse reward or the artificial prior experience reward. The agent trained by our method can reach more than 98.3% win rate in the attack horizontal flight UCAV task and average 67.4% win rate when confronted with the agents trained by the other two methods.
MMAug 18, 2021
Fighting Game Commentator with Pitch and Loudness Adjustment Utilizing Highlight CuesJunjie H. Xu, Zhou Fang, Qihang Chen et al.
This paper presents a commentator for providing real-time game commentary in a fighting game. The commentary takes into account highlight cues, obtained by analyzing scenes during gameplay, as input to adjust the pitch and loudness of commentary to be spoken by using a Text-to-Speech (TTS) technology. We investigate different designs for pitch and loudness adjustment. The proposed AI consists of two parts: a dynamic adjuster for controlling pitch and loudness of the TTS and a real-time game commentary generator. We conduct a pilot study on a fighting game, and our result shows that by adjusting the loudness significantly according to the level of game highlight, the entertainment of the gameplay can be enhanced.
MMAug 18, 2021
Promoting Mental Well-Being for Audiences in a Live-Streaming Game by Highlight-Based Bullet CommentsJunjie H. Xu, Yulin Cai, Zhou Fang et al.
This paper proposes a method for generating bullet comments for live-streaming games based on highlights (i.e., the exciting parts of video clips) extracted from the game content and evaluate the effect of mental health promotion. Game live streaming is becoming a popular theme for academic research. Compared to traditional online video sharing platforms, such as Youtube and Vimeo, video live streaming platform has the benefits of communicating with other viewers in real-time. In sports broadcasting, the commentator plays an essential role as mood maker by making matches more exciting. The enjoyment emerged while watching game live streaming also benefits the audience's mental health. However, many e-sports live streaming channels do not have a commentator for entertaining viewers. Therefore, this paper presents a design of an AI commentator that can be embedded in live streaming games. To generate bullet comments for real-time game live streaming, the system employs highlight evaluation to detect the highlights, and generate the bullet comments. An experiment is conducted and the effectiveness of generated bullet comments in a live-streaming fighting game channel is evaluated.
CVOct 9, 2020
Incorporating planning intelligence into deep learning: A planning support tool for street network designZhou Fang, Ying Jin, Tianren Yang
Deep learning applications in shaping ad hoc planning proposals are limited by the difficulty in integrating professional knowledge about cities with artificial intelligence. We propose a novel, complementary use of deep neural networks and planning guidance to automate street network generation that can be context-aware, example-based and user-guided. The model tests suggest that the incorporation of planning knowledge (e.g., road junctions and neighborhood types) in the model training leads to a more realistic prediction of street configurations. Furthermore, the new tool provides both professional and lay users an opportunity to systematically and intuitively explore benchmark proposals for comparisons and further evaluations.
CVOct 9, 2020
DeepStreet: A deep learning powered urban street network generation moduleZhou Fang, Tianren Yang, Ying Jin
In countries experiencing unprecedented waves of urbanization, there is a need for rapid and high quality urban street design. Our study presents a novel deep learning powered approach, DeepStreet (DS), for automatic street network generation that can be applied to the urban street design with local characteristics. DS is driven by a Convolutional Neural Network (CNN) that enables the interpolation of streets based on the areas of immediate vicinity. Specifically, the CNN is firstly trained to detect, recognize and capture the local features as well as the patterns of the existing street network sourced from the OpenStreetMap. With the trained CNN, DS is able to predict street networks' future expansion patterns within the predefined region conditioned on its surrounding street networks. To test the performance of DS, we apply it to an area in and around the Eixample area in the City of Barcelona, a well known example in the fields of urban and transport planning with iconic grid like street networks in the centre and irregular road alignments farther afield. The results show that DS can (1) detect and self cluster different types of complex street patterns in Barcelona; (2) predict both gridiron and irregular street and road networks. DS proves to have a great potential as a novel tool for designers to efficiently design the urban street network that well maintains the consistency across the existing and newly generated urban street network. Furthermore, the generated networks can serve as a benchmark to guide the local plan-making especially in rapidly developing cities.
ROAug 3, 2020
GP-SLAM+: real-time 3D lidar SLAM based on improved regionalized Gaussian process map reconstructionJianyuan Ruan, Bo Li, Yinqiang Wang et al.
This paper presents a 3D lidar SLAM system based on improved regionalized Gaussian process (GP) map reconstruction to provide both low-drift state estimation and mapping in real-time for robotics applications. We utilize spatial GP regression to model the environment. This tool enables us to recover surfaces including those in sparsely scanned areas and obtain uniform samples with uncertainty. Those properties facilitate robust data association and map updating in our scan-to-map registration scheme, especially when working with sparse range data. Compared with previous GP-SLAM, this work overcomes the prohibitive computational complexity of GP and redesigns the registration strategy to meet the accuracy requirements in 3D scenarios. For large-scale tasks, a two-thread framework is employed to suppress the drift further. Aerial and ground-based experiments demonstrate that our method allows robust odometry and precise mapping in real-time. It also outperforms the state-of-the-art lidar SLAM systems in our tests with light-weight sensors.
HCNov 7, 2019
Towards An Angry-Birds-like Game System for Promoting Mental Well-being of Players Using Art-Therapy-embedded PCGZhou Fang, Pujana Paliyawan, Ruck Thawonmas et al.
This paper presents an integration of a game system and the art therapy concept for promoting the mental well-being of video game players. In the proposed game system, the player plays an Angry-Birds-like game in which levels in the game are generated based on images they draw. Upon finishing a game level, the player also receives positive feedback (praising words) toward their drawing and the generated level from an Art Therapy AI. The proposed system is composed of three major parts: (1) a drawing recognizer that identifies what object is drawn by the player (Sketcher), (2) a level generator that converts the drawing image into a pixel image, then a set of blocks representing a game level (PCG AI), and (3) the Art Therapy AI that encourages the player and improves their emotion. This paper describes an overview of the system and explains how its major components function.
MLMay 31, 2018
Efficacy of regularized multi-task learning based on SVM modelsShaohan Chen, Zhou Fang, Sijie Lu et al.
This paper investigates the efficacy of a regularized multi-task learning (MTL) framework based on SVM (M-SVM) to answer whether MTL always provides reliable results and how MTL outperforms independent learning. We first find that M-SVM is Bayes risk consistent in the limit of large sample size. This implies that despite the task dissimilarities, M-SVM always produces a reliable decision rule for each task in terms of misclassification error when the data size is large enough. Furthermore, we find that the task-interaction vanishes as the data size goes to infinity, and the convergence rates of M-SVM and its single-task counterpart have the same upper bound. The former suggests that M-SVM cannot improve the limit classifier's performance; based on the latter, we conjecture that the optimal convergence rate is not improved when the task number is fixed. As a novel insight of MTL, our theoretical and experimental results achieved an excellent agreement that the benefit of the MTL methods lies in the improvement of the pre-convergence-rate factor (PCR, to be denoted in Section III) rather than the convergence rate. Moreover, this improvement of PCR factors is more significant when the data size is small.