Kai Yang

h-index5

3papers

41citations

Novelty43%

AI Score32

Ranked #127,558 of 194,257 authors (top 66%)#28,079 in LG (top 70%)

3 Papers

4.6LGJun 29, 2024Code

A Two-stage Reinforcement Learning-based Approach for Multi-entity Task Allocation

Aicheng Gong, Kai Yang, Jiafei Lyu et al.

Task allocation is a key combinatorial optimization problem, crucial for modern applications such as multi-robot cooperation and resource scheduling. Decision makers must allocate entities to tasks reasonably across different scenarios. However, traditional methods assume static attributes and numbers of tasks and entities, often relying on dynamic programming and heuristic algorithms for solutions. In reality, task allocation resembles Markov decision processes, with dynamically changing task and entity attributes. Thus, algorithms must dynamically allocate tasks based on their states. To address this issue, we propose a two-stage task allocation algorithm based on similarity, utilizing reinforcement learning to learn allocation strategies. The proposed pre-assign strategy allows entities to preselect appropriate tasks, effectively avoiding local optima and thereby better finding the optimal allocation. We also introduce an attention mechanism and a hyperparameter network structure to adapt to the changing number and attributes of entities and tasks, enabling our network structure to generalize to new tasks. Experimental results across multiple environments demonstrate that our algorithm effectively addresses the challenges of dynamic task allocation in practical applications. Compared to heuristic algorithms like genetic algorithms, our reinforcement learning approach better solves dynamic allocation problems and achieves zero-shot generalization to new tasks with good performance. The code is available at https://github.com/yk7333/TaskAllocation.

2.6LGJun 11, 2024

CDSA: Conservative Denoising Score-based Algorithm for Offline Reinforcement Learning

Zeyuan Liu, Kai Yang, Xiu Li

Distribution shift is a major obstacle in offline reinforcement learning, which necessitates minimizing the discrepancy between the learned policy and the behavior policy to avoid overestimating rare or unseen actions. Previous conservative offline RL algorithms struggle to generalize to unseen actions, despite their success in learning good in-distribution policy. In contrast, we propose to use the gradient fields of the dataset density generated from a pre-trained offline RL algorithm to adjust the original actions. We decouple the conservatism constraints from the policy, thus can benefit wide offline RL algorithms. As a consequence, we propose the Conservative Denoising Score-based Algorithm (CDSA) which utilizes the denoising score-based model to model the gradient of the dataset density, rather than the dataset density itself, and facilitates a more accurate and efficient method to adjust the action generated by the pre-trained policy in a deterministic and continuous MDP environment. In experiments, we show that our approach significantly improves the performance of baseline algorithms in D4RL datasets, and demonstrate the generalizability and plug-and-play capability of our model across different pre-trained offline RL policy in different tasks. We also validate that the agent exhibits greater risk aversion after employing our method while showcasing its ability to generalize effectively across diverse tasks.

1.5CLJul 17, 2019Code

Fake News Detection as Natural Language Inference

Kai-Chou Yang, Timothy Niven, Hung-Yu Kao

This report describes the entry by the Intelligent Knowledge Management (IKM) Lab in the WSDM 2019 Fake News Classification challenge. We treat the task as natural language inference (NLI). We individually train a number of the strongest NLI models as well as BERT. We ensemble these results and retrain with noisy labels in two stages. We analyze transitivity relations in the train and test sets and determine a set of test cases that can be reliably classified on this basis. The remainder of test cases are classified by our ensemble. Our entry achieves test set accuracy of 88.063% for 3rd place in the competition.