LGMar 17, 2023Code
Data-centric Artificial Intelligence: A SurveyDaochen Zha, Zaid Pervaiz Bhat, Kwei-Herng Lai et al.
Artificial Intelligence (AI) is making a profound impact in almost every domain. A vital enabler of its great success is the availability of abundant and high-quality data for building machine learning models. Recently, the role of data in AI has been significantly magnified, giving rise to the emerging concept of data-centric AI. The attention of researchers and practitioners has gradually shifted from advancing model design to enhancing the quality and quantity of the data. In this survey, we discuss the necessity of data-centric AI, followed by a holistic view of three general data-centric goals (training data development, inference data development, and data maintenance) and the representative methods. We also organize the existing literature from automation and collaboration perspectives, discuss the challenges, and tabulate the benchmarks for various tasks. We believe this is the first comprehensive survey that provides a global view of a spectrum of tasks across various stages of the data lifecycle. We hope it can help the readers efficiently grasp a broad picture of this field, and equip them with the techniques and further research ideas to systematically engineer data for building AI systems. A companion list of data-centric AI resources will be regularly updated on https://github.com/daochenzha/data-centric-AI
AIJan 12, 2023Code
Data-centric AI: Perspectives and ChallengesDaochen Zha, Zaid Pervaiz Bhat, Kwei-Herng Lai et al.
The role of data in building AI systems has recently been significantly magnified by the emerging concept of data-centric AI (DCAI), which advocates a fundamental shift from model advancements to ensuring data quality and reliability. Although our community has continuously invested efforts into enhancing data in different aspects, they are often isolated initiatives on specific tasks. To facilitate the collective initiative in our community and push forward DCAI, we draw a big picture and bring together three general missions: training data development, inference data development, and data maintenance. We provide a top-level discussion on representative DCAI tasks and share perspectives. Finally, we list open challenges. More resources are summarized at https://github.com/daochenzha/data-centric-AI
IRSep 4, 2023Code
DiscoverPath: A Knowledge Refinement and Retrieval System for Interdisciplinarity on Biomedical ResearchYu-Neng Chuang, Guanchu Wang, Chia-Yuan Chang et al.
The exponential growth in scholarly publications necessitates advanced tools for efficient article retrieval, especially in interdisciplinary fields where diverse terminologies are used to describe similar research. Traditional keyword-based search engines often fall short in assisting users who may not be familiar with specific terminologies. To address this, we present a knowledge graph-based paper search engine for biomedical research to enhance the user experience in discovering relevant queries and articles. The system, dubbed DiscoverPath, employs Named Entity Recognition (NER) and part-of-speech (POS) tagging to extract terminologies and relationships from article abstracts to create a KG. To reduce information overload, DiscoverPath presents users with a focused subgraph containing the queried entity and its neighboring nodes and incorporates a query recommendation system, enabling users to iteratively refine their queries. The system is equipped with an accessible Graphical User Interface that provides an intuitive visualization of the KG, query recommendations, and detailed article information, enabling efficient article retrieval, thus fostering interdisciplinary knowledge exploration. DiscoverPath is open-sourced at https://github.com/ynchuang/DiscoverPath.
LGOct 5, 2022Code
DreamShard: Generalizable Embedding Table Placement for Recommender SystemsDaochen Zha, Louis Feng, Qiaoyu Tan et al.
We study embedding table placement for distributed recommender systems, which aims to partition and place the tables on multiple hardware devices (e.g., GPUs) to balance the computation and communication costs. Although prior work has explored learning-based approaches for the device placement of computational graphs, embedding table placement remains to be a challenging problem because of 1) the operation fusion of embedding tables, and 2) the generalizability requirement on unseen placement tasks with different numbers of tables and/or devices. To this end, we present DreamShard, a reinforcement learning (RL) approach for embedding table placement. DreamShard achieves the reasoning of operation fusion and generalizability with 1) a cost network to directly predict the costs of the fused operation, and 2) a policy network that is efficiently trained on an estimated Markov decision process (MDP) without real GPU execution, where the states and the rewards are estimated with the cost network. Equipped with sum and max representation reductions, the two networks can directly generalize to any unseen tasks with different numbers of tables and/or devices without fine-tuning. Extensive experiments show that DreamShard substantially outperforms the existing human expert and RNN-based strategies with up to 19% speedup over the strongest baseline on large-scale synthetic tables and our production tables. The code is available at https://github.com/daochenzha/dreamshard
LGAug 26, 2022Code
Towards Automated Imbalanced Learning with Deep Hierarchical Reinforcement LearningDaochen Zha, Kwei-Herng Lai, Qiaoyu Tan et al.
Imbalanced learning is a fundamental challenge in data mining, where there is a disproportionate ratio of training samples in each class. Over-sampling is an effective technique to tackle imbalanced learning through generating synthetic samples for the minority class. While numerous over-sampling algorithms have been proposed, they heavily rely on heuristics, which could be sub-optimal since we may need different sampling strategies for different datasets and base classifiers, and they cannot directly optimize the performance metric. Motivated by this, we investigate developing a learning-based over-sampling algorithm to optimize the classification performance, which is a challenging task because of the huge and hierarchical decision space. At the high level, we need to decide how many synthetic samples to generate. At the low level, we need to determine where the synthetic samples should be located, which depends on the high-level decision since the optimal locations of the samples may differ for different numbers of samples. To address the challenges, we propose AutoSMOTE, an automated over-sampling algorithm that can jointly optimize different levels of decisions. Motivated by the success of SMOTE~\cite{chawla2002smote} and its extensions, we formulate the generation process as a Markov decision process (MDP) consisting of three levels of policies to generate synthetic samples within the SMOTE search space. Then we leverage deep hierarchical reinforcement learning to optimize the performance metric on the validation data. Extensive experiments on six real-world datasets demonstrate that AutoSMOTE significantly outperforms the state-of-the-art resampling algorithms. The code is at https://github.com/daochenzha/autosmote
LGApr 15, 2023
Context-aware Domain Adaptation for Time Series Anomaly DetectionKwei-Herng Lai, Lan Wang, Huiyuan Chen et al.
Time series anomaly detection is a challenging task with a wide range of real-world applications. Due to label sparsity, training a deep anomaly detector often relies on unsupervised approaches. Recent efforts have been devoted to time series domain adaptation to leverage knowledge from similar domains. However, existing solutions may suffer from negative knowledge transfer on anomalies due to their diversity and sparsity. Motivated by the empirical study of context alignment between two domains, we aim to transfer knowledge between two domains via adaptively sampling context information for two domains. This is challenging because it requires simultaneously modeling the complex in-domain temporal dependencies and cross-domain correlations while exploiting label information from the source domain. To this end, we propose a framework that combines context sampling and anomaly detection into a joint learning procedure. We formulate context sampling into the Markov decision process and exploit deep reinforcement learning to optimize the time series domain adaptation process via context sampling and design a tailored reward function to generate domain-invariant features that better align two domains for anomaly detection. Experiments on three public datasets show promise for knowledge transfer between two similar domains and two entirely different domains.
AINov 26, 2022
Mitigating Relational Bias on Knowledge GraphsYu-Neng Chuang, Kwei-Herng Lai, Ruixiang Tang et al.
Knowledge graph data are prevalent in real-world applications, and knowledge graph neural networks (KGNNs) are essential techniques for knowledge graph representation learning. Although KGNN effectively models the structural information from knowledge graphs, these frameworks amplify the underlying data bias that leads to discrimination towards certain groups or individuals in resulting applications. Additionally, as existing debiasing approaches mainly focus on the entity-wise bias, eliminating the multi-hop relational bias that pervasively exists in knowledge graphs remains an open question. However, it is very challenging to eliminate relational bias due to the sparsity of the paths that generate the bias and the non-linear proximity structure of knowledge graphs. To tackle the challenges, we propose Fair-KGNN, a KGNN framework that simultaneously alleviates multi-hop bias and preserves the proximity information of entity-to-relation in knowledge graphs. The proposed framework is generalizable to mitigate the relational bias for all types of KGNN. We develop two instances of Fair-KGNN incorporating with two state-of-the-art KGNN models, RGCN and CompGCN, to mitigate gender-occupation and nationality-salary bias. The experiments carried out on three benchmark knowledge graph datasets demonstrate that the Fair-KGNN can effectively mitigate unfair situations during representation learning while preserving the predictive performance of KGNN models.
LGOct 2, 2023
CODA: Temporal Domain Generalization via Concept Drift SimulatorChia-Yuan Chang, Yu-Neng Chuang, Zhimeng Jiang et al.
In real-world applications, machine learning models often become obsolete due to shifts in the joint distribution arising from underlying temporal trends, a phenomenon known as the "concept drift". Existing works propose model-specific strategies to achieve temporal generalization in the near-future domain. However, the diverse characteristics of real-world datasets necessitate customized prediction model architectures. To this end, there is an urgent demand for a model-agnostic temporal domain generalization approach that maintains generality across diverse data modalities and architectures. In this work, we aim to address the concept drift problem from a data-centric perspective to bypass considering the interaction between data and model. Developing such a framework presents non-trivial challenges: (i) existing generative models struggle to generate out-of-distribution future data, and (ii) precisely capturing the temporal trends of joint distribution along chronological source domains is computationally infeasible. To tackle the challenges, we propose the COncept Drift simulAtor (CODA) framework incorporating a predicted feature correlation matrix to simulate future data for model training. Specifically, CODA leverages feature correlations to represent data characteristics at specific time points, thereby circumventing the daunting computational costs. Experimental results demonstrate that using CODA-generated data as training input effectively achieves temporal domain generalization across different model architectures.
LGJul 9, 2023
Towards Assumption-free Bias MitigationChia-Yuan Chang, Yu-Neng Chuang, Kwei-Herng Lai et al.
Despite the impressive prediction ability, machine learning models show discrimination towards certain demographics and suffer from unfair prediction behaviors. To alleviate the discrimination, extensive studies focus on eliminating the unequal distribution of sensitive attributes via multiple approaches. However, due to privacy concerns, sensitive attributes are often either unavailable or missing in real-world scenarios. Therefore, several existing works alleviate the bias without sensitive attributes. Those studies face challenges, either in inaccurate predictions of sensitive attributes or the need to mitigate unequal distribution of manually defined non-sensitive attributes related to bias. The latter requires strong assumptions about the correlation between sensitive and non-sensitive attributes. As data distribution and task goals vary, the strong assumption on non-sensitive attributes may not be valid and require domain expertise. In this work, we propose an assumption-free framework to detect the related attributes automatically by modeling feature interaction for bias mitigation. The proposed framework aims to mitigate the unfair impact of identified biased feature interactions. Experimental results on four real-world datasets demonstrate that our proposed framework can significantly alleviate unfair prediction behaviors by considering biased feature interactions.
LGAug 28, 2023
Tackling Diverse Minorities in Imbalanced ClassificationKwei-Herng Lai, Daochen Zha, Huiyuan Chen et al.
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers. When working with large datasets, the imbalanced issue can be further exacerbated, making it exceptionally difficult to train classifiers effectively. To address the problem, over-sampling techniques have been developed to linearly interpolating data instances between minorities and their neighbors. However, in many real-world scenarios such as anomaly detection, minority instances are often dispersed diversely in the feature space rather than clustered together. Inspired by domain-agnostic data mix-up, we propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes. It is non-trivial to develop such a framework, the challenges include source sample selection, mix-up strategy selection, and the coordination between the underlying model and mix-up strategies. To tackle these challenges, we formulate the problem of iterative data mix-up as a Markov decision process (MDP) that maps data attributes onto an augmentation strategy. To solve the MDP, we employ an actor-critic framework to adapt the discrete-continuous decision space. This framework is utilized to train a data augmentation policy and design a reward signal that explores classifier uncertainty and encourages performance improvement, irrespective of the classifier's convergence. We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets using three different types of classifiers. The results of these experiments showcase the potential and promise of our framework in addressing imbalanced datasets with diverse minorities.
92.9CLMay 14
GroupMemBench: Benchmarking LLM Agent Memory in Multi-Party ConversationsJingbo Yang, Kwei-Herng Lai, Xiaowen Wang et al.
Large Language Model (LLM) agents increasingly serve as personal assistants and workplace collaborators, where their utility depends on memory systems that extract, retrieve, and apply information across long-running conversations. However, both existing memory systems and benchmarks are built around the dyadic, single-user setup, even though real deployments routinely span groups and channels with multiple users interacting with the agent and with each other. This mismatch leaves three properties of group memory unmeasured: (i) group dynamics that go beyond concatenated one-on-one chats, (ii) speaker-grounded belief tracking, where the per-user memory modeling is needed, and (iii) audience-adapted language, where Theory-of-Mind shifts produce role-specific vocabulary. We introduce GroupMemBench, a benchmark that exposes all three. A graph-grounded synthesis pipeline produces multi-party conversations with controllable reply structure and conditions each message on per-user personas and target audiences. An adversarial query pipeline then binds every question to a specific asker across six categories, spanning multi-hop reasoning, knowledge update, term ambiguity, user-implicit reasoning, temporal reasoning, and abstention, and iteratively searches challenging, realistic queries that reflect comprehensive memory capability. Benchmarking leading memory systems exposes a sharp collapse: the strongest one reaches only 46.0% average accuracy, with knowledge update at 27.1% and term ambiguity at 37.7%, while a simple BM25 baseline matches or exceeds most agent memory systems. This indicates current memory ingestion erases the structural and lexical features group memory depends on, leaving multi-user memory far from solved.
AIOct 7, 2025Code
Training-Free Time Series Classification via In-Context Reasoning with LLM AgentsSongyuan Sui, Zihang Xu, Yu-Neng Chuang et al.
Time series classification (TSC) spans diverse application scenarios, yet labeled data are often scarce, making task-specific training costly and inflexible. Recent reasoning-oriented large language models (LLMs) show promise in understanding temporal patterns, but purely zero-shot usage remains suboptimal. We propose FETA, a multi-agent framework for training-free TSC via exemplar-based in-context reasoning. FETA decomposes a multivariate series into channel-wise subproblems, retrieves a few structurally similar labeled examples for each channel, and leverages a reasoning LLM to compare the query against these exemplars, producing channel-level labels with self-assessed confidences; a confidence-weighted aggregator then fuses all channel decisions. This design eliminates the need for pretraining or fine-tuning, improves efficiency by pruning irrelevant channels and controlling input length, and enhances interpretability through exemplar grounding and confidence estimation. On nine challenging UEA datasets, FETA achieves strong accuracy under a fully training-free setting, surpassing multiple trained baselines. These results demonstrate that a multi-agent in-context reasoning framework can transform LLMs into competitive, plug-and-play TSC solvers without any parameter training. The code is available at https://github.com/SongyuanSui/FETATSC.
LGJan 5, 2022Code
Towards Similarity-Aware Time-Series ClassificationDaochen Zha, Kwei-Herng Lai, Kaixiong Zhou et al.
We study time-series classification (TSC), a fundamental task of time-series data mining. Prior work has approached TSC from two major directions: (1) similarity-based methods that classify time-series based on the nearest neighbors, and (2) deep learning models that directly learn the representations for classification in a data-driven manner. Motivated by the different working mechanisms within these two research lines, we aim to connect them in such a way as to jointly model time-series similarities and learn the representations. This is a challenging task because it is unclear how we should efficiently leverage similarity information. To tackle the challenge, we propose Similarity-Aware Time-Series Classification (SimTSC), a conceptually simple and general framework that models similarity information with graph neural networks (GNNs). Specifically, we formulate TSC as a node classification problem in graphs, where the nodes correspond to time-series, and the links correspond to pair-wise similarities. We further design a graph construction strategy and a batch training algorithm with negative sampling to improve training efficiency. We instantiate SimTSC with ResNet as the backbone and Dynamic Time Warping (DTW) as the similarity measure. Extensive experiments on the full UCR datasets and several multivariate datasets demonstrate the effectiveness of incorporating similarity information into deep learning models in both supervised and semi-supervised settings. Our code is available at https://github.com/daochenzha/SimTSC
CVAug 9, 2021Code
AutoVideo: An Automated Video Action Recognition SystemDaochen Zha, Zaid Pervaiz Bhat, Yi-Wei Chen et al.
Action recognition is an important task for video understanding with broad applications. However, developing an effective action recognition solution often requires extensive engineering efforts in building and testing different combinations of the modules and their hyperparameters. In this demo, we present AutoVideo, a Python system for automated video action recognition. AutoVideo is featured for 1) highly modular and extendable infrastructure following the standard pipeline language, 2) an exhaustive list of primitives for pipeline construction, 3) data-driven tuners to save the efforts of pipeline tuning, and 4) easy-to-use Graphical User Interface (GUI). AutoVideo is released under MIT license at https://github.com/datamllab/autovideo
LGJun 10, 2021Code
Simplifying Deep Reinforcement Learning via Self-SupervisionDaochen Zha, Kwei-Herng Lai, Kaixiong Zhou et al.
Supervised regression to demonstrations has been demonstrated to be a stable way to train deep policy networks. We are motivated to study how we can take full advantage of supervised loss functions for stably training deep reinforcement learning agents. This is a challenging task because it is unclear how the training data could be collected to enable policy improvement. In this work, we propose Self-Supervised Reinforcement Learning (SSRL), a simple algorithm that optimizes policies with purely supervised losses. We demonstrate that, without policy gradient or value estimation, an iterative procedure of ``labeling" data and supervised regression is sufficient to drive stable policy improvement. By selecting and imitating trajectories with high episodic rewards, SSRL is surprisingly competitive to contemporary algorithms with more stable performance and less running time, showing the potential of solving reinforcement learning with supervised learning techniques. The code is available at https://github.com/daochenzha/SSRL
DBSep 18, 2020Code
TODS: An Automated Time Series Outlier Detection SystemKwei-Herng Lai, Daochen Zha, Guanchu Wang et al.
We present TODS, an automated Time Series Outlier Detection System for research and industrial applications. TODS is a highly modular system that supports easy pipeline construction. The basic building block of TODS is primitive, which is an implementation of a function with hyperparameters. TODS currently supports 70 primitives, including data processing, time series processing, feature analysis, detection algorithms, and a reinforcement module. Users can freely construct a pipeline using these primitives and perform end- to-end outlier detection with the constructed pipeline. TODS provides a Graphical User Interface (GUI), where users can flexibly design a pipeline with drag-and-drop. Moreover, a data-driven searcher is provided to automatically discover the most suitable pipelines given a dataset. TODS is released under Apache 2.0 license at https://github.com/datamllab/tods.
AIOct 10, 2019Code
RLCard: A Toolkit for Reinforcement Learning in Card GamesDaochen Zha, Kwei-Herng Lai, Yuanpu Cao et al.
RLCard is an open-source toolkit for reinforcement learning research in card games. It supports various card environments with easy-to-use interfaces, including Blackjack, Leduc Hold'em, Texas Hold'em, UNO, Dou Dizhu and Mahjong. The goal of RLCard is to bridge reinforcement learning and imperfect information games, and push forward the research of reinforcement learning in domains with multiple agents, large state and action space, and sparse reward. In this paper, we provide an overview of the key components in RLCard, a discussion of the design principles, a brief introduction of the interfaces, and comprehensive evaluations of the environments. The codes and documents are available at https://github.com/datamllab/rlcard
CVMar 17, 2025
LangDA: Building Context-Awareness via Language for Domain Adaptive Semantic SegmentationChang Liu, Bavesh Balaji, Saad Hossain et al.
Unsupervised domain adaptation for semantic segmentation (DASS) aims to transfer knowledge from a label-rich source domain to a target domain with no labels. Two key approaches in DASS are (1) vision-only approaches using masking or multi-resolution crops, and (2) language-based approaches that use generic class-wise prompts informed by target domain (e.g. "a {snowy} photo of a {class}"). However, the former is susceptible to noisy pseudo-labels that are biased to the source domain. The latter does not fully capture the intricate spatial relationships of objects -- key for dense prediction tasks. To this end, we propose LangDA. LangDA addresses these challenges by, first, learning contextual relationships between objects via VLM-generated scene descriptions (e.g. "a pedestrian is on the sidewalk, and the street is lined with buildings."). Second, LangDA aligns the entire image features with text representation of this context-aware scene caption and learns generalized representations via text. With this, LangDA sets the new state-of-the-art across three DASS benchmarks, outperforming existing methods by 2.6%, 1.4% and 3.9%.
LGJun 20, 2024
LTSM-Bundle: A Toolbox and Benchmark on Large Language Models for Time Series ForecastingYu-Neng Chuang, Songchen Li, Jiayi Yuan et al.
Time Series Forecasting (TSF) has long been a challenge in time series analysis. Inspired by the success of Large Language Models (LLMs), researchers are now developing Large Time Series Models (LTSMs)-universal transformer-based models that use autoregressive prediction-to improve TSF. However, training LTSMs on heterogeneous time series data poses unique challenges, including diverse frequencies, dimensions, and patterns across datasets. Recent endeavors have studied and evaluated various design choices aimed at enhancing LTSM training and generalization capabilities. However, these design choices are typically studied and evaluated in isolation and are not benchmarked collectively. In this work, we introduce LTSM-Bundle, a comprehensive toolbox, and benchmark for training LTSMs, spanning pre-processing techniques, model configurations, and dataset configuration. It modularized and benchmarked LTSMs from multiple dimensions, encompassing prompting strategies, tokenization approaches, training paradigms, base model selection, data quantity, and dataset diversity. Furthermore, we combine the most effective design choices identified in our study. Empirical results demonstrate that this combination achieves superior zero-shot and few-shot performances compared to state-of-the-art LTSMs and traditional TSF methods on benchmark datasets.
LGSep 16, 2020
Meta-AAD: Active Anomaly Detection with Deep Reinforcement LearningDaochen Zha, Kwei-Herng Lai, Mingyang Wan et al.
High false-positive rate is a long-standing challenge for anomaly detection algorithms, especially in high-stake applications. To identify the true anomalies, in practice, analysts or domain experts will be employed to investigate the top instances one by one in a ranked list of anomalies identified by an anomaly detection system. This verification procedure generates informative labels that can be leveraged to re-rank the anomalies so as to help the analyst to discover more true anomalies given a time budget. Some re-ranking strategies have been proposed to approximate the above sequential decision process. Specifically, existing strategies have been focused on making the top instances more likely to be anomalous based on the feedback. Then they greedily select the top-1 instance for query. However, these greedy strategies could be sub-optimal since some low-ranked instances could be more helpful in the long-term. In this work, we propose Active Anomaly Detection with Meta-Policy (Meta-AAD), a novel framework that learns a meta-policy for query selection. Specifically, Meta-AAD leverages deep reinforcement learning to train the meta-policy to select the most proper instance to explicitly optimize the number of discovered anomalies throughout the querying process. Meta-AAD is easy to deploy since a trained meta-policy can be directly applied to any new datasets without further tuning. Extensive experiments on 24 benchmark datasets demonstrate that Meta-AAD significantly outperforms the state-of-the-art re-ranking strategies and the unsupervised baseline. The empirical analysis shows that the trained meta-policy is transferable and inherently achieves a balance between long-term and short-term rewards.
LGJun 26, 2020
Policy-GNN: Aggregation Optimization for Graph Neural NetworksKwei-Herng Lai, Daochen Zha, Kaixiong Zhou et al.
Graph data are pervasive in many real-world applications. Recently, increasing attention has been paid on graph neural networks (GNNs), which aim to model the local graph structures and capture the hierarchical patterns by aggregating the information from neighbors with stackable network modules. Motivated by the observation that different nodes often require different iterations of aggregation to fully capture the structural information, in this paper, we propose to explicitly sample diverse iterations of aggregation for different nodes to boost the performance of GNNs. It is a challenging task to develop an effective aggregation strategy for each node, given complex graphs and sparse features. Moreover, it is not straightforward to derive an efficient algorithm since we need to feed the sampled nodes into different number of network layers. To address the above challenges, we propose Policy-GNN, a meta-policy framework that models the sampling procedure and message passing of GNNs into a combined learning process. Specifically, Policy-GNN uses a meta-policy to adaptively determine the number of aggregations for each node. The meta-policy is trained with deep reinforcement learning (RL) by exploiting the feedback from the model. We further introduce parameter sharing and a buffer mechanism to boost the training efficiency. Experimental results on three real-world benchmark datasets suggest that Policy-GNN significantly outperforms the state-of-the-art alternatives, showing the promise in aggregation optimization for GNNs.
LGJun 7, 2020
Dual Policy DistillationKwei-Herng Lai, Daochen Zha, Yuening Li et al.
Policy distillation, which transfers a teacher policy to a student policy has achieved great success in challenging tasks of deep reinforcement learning. This teacher-student framework requires a well-trained teacher model which is computationally expensive. Moreover, the performance of the student model could be limited by the teacher model if the teacher model is not optimal. In the light of collaborative learning, we study the feasibility of involving joint intellectual efforts from diverse perspectives of student models. In this work, we introduce dual policy distillation(DPD), a student-student framework in which two learners operate on the same environment to explore different perspectives of the environment and extract knowledge from each other to enhance their learning. The key challenge in developing this dual learning framework is to identify the beneficial knowledge from the peer learner for contemporary learning-based reinforcement learning algorithms, since it is unclear whether the knowledge distilled from an imperfect and noisy peer learner would be helpful. To address the challenge, we theoretically justify that distilling knowledge from a peer learner will lead to policy improvement and propose a disadvantageous distillation strategy based on the theoretical results. The conducted experiments on several continuous control tasks show that the proposed framework achieves superior performance with a learning-based agent and function approximation without the use of expensive teacher models.
LGJun 19, 2019
Experience Replay OptimizationDaochen Zha, Kwei-Herng Lai, Kaixiong Zhou et al.
Experience replay enables reinforcement learning agents to memorize and reuse past experiences, just as humans replay memories for the situation at hand. Contemporary off-policy algorithms either replay past experiences uniformly or utilize a rule-based replay strategy, which may be sub-optimal. In this work, we consider learning a replay policy to optimize the cumulative reward. Replay learning is challenging because the replay memory is noisy and large, and the cumulative reward is unstable. To address these issues, we propose a novel experience replay optimization (ERO) framework which alternately updates two policies: the agent policy, and the replay policy. The agent is updated to maximize the cumulative reward based on the replayed data, while the replay policy is updated to provide the agent with the most useful experiences. The conducted experiments on various continuous control tasks demonstrate the effectiveness of ERO, empirically showing promise in experience replay learning to improve the performance of off-policy reinforcement learning algorithms.
IRAug 28, 2018
Superhighway: Bypass Data Sparsity in Cross-Domain CFKwei-Herng Lai, Ting-Hsiang Wang, Heng-Yu Chi et al.
Cross-domain collaborative filtering (CF) aims to alleviate data sparsity in single-domain CF by leveraging knowledge transferred from related domains. Many traditional methods focus on enriching compared neighborhood relations in CF directly to address the sparsity problem. In this paper, we propose superhighway construction, an alternative explicit relation-enrichment procedure, to improve recommendations by enhancing cross-domain connectivity. Specifically, assuming partially overlapped items (users), superhighway bypasses multi-hop inter-domain paths between cross-domain users (items, respectively) with direct paths to enrich the cross-domain connectivity. The experiments conducted on a real-world cross-region music dataset and a cross-platform movie dataset show that the proposed superhighway construction significantly improves recommendation performance in both target and source domains.
MMAug 28, 2018
Representation Learning for Image-based Music RecommendationChih-Chun Hsia, Kwei-Herng Lai, Yian Chen et al.
Image perception is one of the most direct ways to provide contextual information about a user concerning his/her surrounding environment; hence images are a suitable proxy for contextual recommendation. We propose a novel representation learning framework for image-based music recommendation that bridges the heterogeneity gap between music and image data; the proposed method is a key component for various contextual recommendation tasks. Preliminary experiments show that for an image-to-song retrieval task, the proposed method retrieves relevant or conceptually similar songs for input images.