Tao Jiang

h-index 24

9 papers

1,278 citations

Novelty 27%

AI Score 27

Ranked #156,914 of 194,257 authors (top 81%) · #50,918 in CV (top 86%)

9 Papers

64.6 · AI · Jan 22, 2025

Kimi k1.5: Scaling Reinforcement Learning with LLMs

Kimi Team, Angang Du, Bofei Gao et al. · pku, tsinghua

Language model pretraining with next token prediction has proved effective for scaling compute but is limited to the amount of available training data. Scaling reinforcement learning (RL) unlocks a new axis for the continued improvement of artificial intelligence, with the promise that large language models (LLMs) can scale their training data by learning to explore with rewards. However, prior published work has not produced competitive results. In light of this, we report on the training practice of Kimi k1.5, our latest multi-modal LLM trained with RL, including its RL training techniques, multi-modal data recipes, and infrastructure optimization. Long context scaling and improved policy optimization methods are key ingredients of our approach, which establishes a simplistic, effective RL framework without relying on more complex techniques such as Monte Carlo tree search, value functions, and process reward models. Notably, our system achieves state-of-the-art reasoning performance across multiple benchmarks and modalities -- e.g., 77.5 on AIME, 96.2 on MATH 500, 94-th percentile on Codeforces, 74.9 on MathVista -- matching OpenAI's o1. Moreover, we present effective long2short methods that use long-CoT techniques to improve short-CoT models, yielding state-of-the-art short-CoT reasoning results -- e.g., 60.8 on AIME, 94.6 on MATH500, 47.3 on LiveCodeBench -- outperforming existing short-CoT models such as GPT-4o and Claude Sonnet 3.5 by a large margin (up to +550%).

4.3 · MA · May 7, 2023

Robust Multi-agent Communication via Multi-view Message Certification

Lei Yuan, Tao Jiang, Lihe Li et al.

Many multi-agent scenarios require message sharing among agents to promote coordination, hastening the robustness of multi-agent communication when policies are deployed in a message perturbation environment. Major relevant works tackle this issue under specific assumptions, like a limited number of message channels would sustain perturbations, limiting the efficiency in complex scenarios. In this paper, we take a further step addressing this issue by learning a robust multi-agent communication policy via multi-view message certification, dubbed CroMAC. Agents trained under CroMAC can obtain guaranteed lower bounds on state-action values to identify and choose the optimal action under a worst-case deviation when the received messages are perturbed. Concretely, we first model multi-agent communication as a multi-view problem, where every message stands for a view of the state. Then we extract a certificated joint message representation by a multi-view variational autoencoder (MVAE) that uses a product-of-experts inference network. For the optimization phase, we do perturbations in the latent space of the state for a certificate guarantee. Then the learned joint message representation is used to approximate the certificated state representation during training. Extensive experiments in several cooperative multi-agent benchmarks validate the effectiveness of the proposed CroMAC.

3.7 · CV · Oct 11, 2021

EMDS-7: Environmental Microorganism Image Dataset Seventh Version for Multiple Object Detection Evaluation

Hechen Yang, Chen Li, Xin Zhao et al.

The Environmental Microorganism Image Dataset Seventh Version (EMDS-7) is a microscopic image data set, including the original Environmental Microorganism images (EMs) and the corresponding object labeling files in ".XML" format. The EMDS-7 data set consists of 41 types of EMs, with a total of 2365 images and 13216 labeled objects. The EMDS-7 database mainly focuses on object detection. In order to prove the effectiveness of EMDS-7, we select the most commonly used deep learning methods (Faster-RCNN, YOLOv3, YOLOv4, SSD and RetinaNet) and evaluation indices for testing and evaluation. EMDS-7 is freely published for non-commercial purposes at: https://figshare.com/articles/dataset/EMDS-7_DataSet/16869571

1.4 · CV · Jun 22, 2021

A Comparison for Patch-level Classification of Deep Learning Methods on Transparent Environmental Microorganism Images: from Convolutional Neural Networks to Visual Transformers

Hechen Yang, Chen Li, Jinghua Zhang et al.

Nowadays, analysis of Transparent Environmental Microorganism Images (T-EM images) in the field of computer vision has gradually become a new and interesting topic. This paper compares the classification performance of different deep learning methods on T-EM images, which are challenging to analyze. We crop the T-EM images into 8 * 8 and 224 * 224 pixel patches in the same proportion and then divide the two different pixel patches into foreground and background according to ground truth. We also use four convolutional neural networks and a novel ViT network model to compare the foreground and background classification experiments. We conclude that ViT performs the worst in classifying 8 * 8 pixel patches, but it outperforms most convolutional neural networks in classifying 224 * 224 pixel patches.

6.5 · CV · Feb 20, 2021

EMDS-5: Environmental Microorganism Image Dataset Fifth Version for Multiple Image Analysis Tasks

Zihan Li, Chen Li, Yudong Yao et al.

Environmental Microorganism Data Set Fifth Version (EMDS-5) is a microscopic image dataset including original Environmental Microorganism (EM) images and two sets of Ground Truth (GT) images. The GT image sets include a single-object GT image set and a multi-object GT image set. The EMDS-5 dataset has 21 types of EMs, each of which contains 20 original EM images, 20 single-object GT images and 20 multi-object GT images. EMDS-5 can be used to evaluate image preprocessing, image segmentation, feature extraction, image classification and image retrieval functions. In order to prove the effectiveness of EMDS-5, for each function, we select the most representative algorithms and evaluation indicators for testing and evaluation. The image preprocessing functions contain two parts: image denoising and image edge detection. Image denoising uses nine kinds of filters to denoise 13 kinds of noises, respectively. In the aspect of edge detection, six edge detection operators are used to detect the edges of the images, and two evaluation indicators, peak signal-to-noise ratio and mean structural similarity, are used for evaluation. Image segmentation includes single-object image segmentation and multi-object image segmentation. Six methods are used for single-object image segmentation, while k-means and U-net are used for multi-object segmentation. We extract nine features from the images in EMDS-5 and use the Support Vector Machine classifier for testing. In terms of image classification, we select the VGG16 feature to test different classifiers. We test two types of retrieval approaches: texture feature retrieval and deep learning feature retrieval. We select the last layer of features of these two deep learning networks as feature vectors. We use mean average precision as the evaluation index for retrieval.

17.8 · IV · Mar 27, 2020

A Comprehensive Review for Breast Histopathology Image Analysis Using Classical and Deep Neural Networks

Xiaomin Zhou, Chen Li, Md Mamunur Rahaman et al.

Breast cancer is one of the most common and deadliest cancers among women. Since histopathological images contain sufficient phenotypic information, they play an indispensable role in the diagnosis and treatment of breast cancers. To improve the accuracy and objectivity of Breast Histopathological Image Analysis (BHIA), Artificial Neural Network (ANN) approaches are widely used in the segmentation and classification tasks of breast histopathological images. In this review, we present a comprehensive overview of the BHIA techniques based on ANNs. First of all, we categorize the BHIA systems into classical and deep neural networks for in-depth investigation. Then, the relevant studies based on BHIA systems are presented. After that, we analyze the existing models to discover the most suitable algorithms. Finally, publicly accessible datasets, along with their download links, are provided for the convenience of future researchers.

4.2 · CV · Mar 8, 2020

A Multi-scale CNN-CRF Framework for Environmental Microorganism Image Segmentation

Jinghua Zhang, Chen Li, Frank Kulwa et al.

To assist researchers to identify Environmental Microorganisms (EMs) effectively, a Multiscale CNN-CRF (MSCC) framework for the EM image segmentation is proposed in this paper. There are two parts in this framework: The first is a novel pixel-level segmentation approach, using a newly introduced Convolutional Neural Network (CNN), namely, "mU-Net-B3", with a dense Conditional Random Field (CRF) postprocessing. The second is a VGG-16 based patch-level segmentation method with a novel "buffer" strategy, which further improves the segmentation quality of the details of the EMs. In the experiment, compared with the state-of-the-art methods on 420 EM images, the proposed MSCC method reduces the memory requirement from 355 MB to 103 MB, improves the overall evaluation indexes (Dice, Jaccard, Recall, Accuracy) from 85.24%, 77.42%, 82.27%, and 96.76% to 87.13%, 79.74%, 87.12%, and 96.91%, respectively, and reduces the volume overlap error from 22.58% to 20.26%. Therefore, the MSCC method shows great potential in the EM segmentation field.

9.1 · CV · Mar 3, 2020

Gastric histopathology image segmentation using a hierarchical conditional random field

Changhao Sun, Chen Li, Jinghua Zhang et al.

For the Convolutional Neural Networks (CNNs) applied in the intelligent diagnosis of gastric cancer, existing methods mostly focus on individual characteristics or network frameworks without a policy to depict the integral information. Mainly, Conditional Random Field (CRF), an efficient and stable algorithm for analyzing images containing complicated contents, can characterize spatial relation in images. In this paper, a novel Hierarchical Conditional Random Field (HCRF) based Gastric Histopathology Image Segmentation (GHIS) method is proposed, which can automatically localize abnormal (cancer) regions in gastric histopathology images obtained by an optical microscope to assist histopathologists in medical work. This HCRF model is built up with higher order potentials, including pixel-level and patch-level potentials, and graph-based post-processing is applied to further improve its segmentation performance. Specifically, a CNN is trained to build up the pixel-level potentials and another three CNNs are fine-tuned to build up the patch-level potentials for sufficient spatial segmentation information. In the experiment, a hematoxylin and eosin (H&E) stained gastric histopathological dataset with 560 abnormal images is divided into training, validation and test sets with a ratio of 1 : 1 : 2. Finally, segmentation accuracy, recall and specificity of 78.91%, 65.59%, and 81.33% are achieved on the test set. Our HCRF model demonstrates high segmentation performance and shows its effectiveness and future potential in the GHIS field.

17.3 · SD · Nov 23, 2018

Training Multi-Task Adversarial Network for Extracting Noise-Robust Speaker Embedding

Jianfeng Zhou, Tao Jiang, Lin Li et al.

Achieving robust speaker recognition performance in noisy environments is still a challenging task. Motivated by the promising performance of multi-task training in a variety of image processing tasks, we explore the potential of multi-task adversarial training for learning a noise-robust speaker embedding. In this paper we present a novel framework which consists of three components: an encoder that extracts noise-robust speaker embedding; a classifier that classifies the speakers; a discriminator that discriminates the noise type of the speaker embedding. Besides, we propose a training strategy using the training accuracy as an indicator to stabilize the multi-class adversarial optimization process. We conduct our experiments on the English and Mandarin corpus and the experimental results demonstrate that our proposed multi-task adversarial training method could greatly outperform the other methods without adversarial training in noisy environments. Furthermore, experiments indicate that our method is also able to improve the speaker verification performance in the clean condition.