Jiahui Dai

AI
h-index: 25
7 papers
334 citations
Novelty: 35%
AI Score: 28

7 Papers

CL · Apr 10, 2025
Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning

ByteDance Seed, Jiaze Chen, Tiantian Fan et al. · bytedance

We introduce Seed1.5-Thinking, capable of reasoning through thinking before responding, resulting in improved performance on a wide range of benchmarks. Seed1.5-Thinking achieves 86.7 on AIME 2024, 55.0 on Codeforces and 77.3 on GPQA, demonstrating excellent reasoning abilities in STEM and coding. Beyond reasoning tasks, the method demonstrates notable generalization across diverse domains. For instance, it surpasses DeepSeek R1 by 8% in win rate on non-reasoning tasks, indicating its broader applicability. Compared to other state-of-the-art reasoning models, Seed1.5-Thinking is a Mixture-of-Experts (MoE) model with a relatively small size, featuring 20B activated and 200B total parameters. As part of our effort to assess generalized reasoning, we develop two internal benchmarks, BeyondAIME and Codeforces, both of which will be publicly released to support future research. Model trial link: https://www.volcengine.com/experience/ark.

LG · May 7, 2020
Comparison and Benchmarking of AI Models and Frameworks on Mobile Devices

Chunjie Luo, Xiwen He, Jianfeng Zhan et al.

Due to increasing amounts of data and compute resources, deep learning has achieved many successes in various domains. As the application of deep learning on mobile and embedded devices attracts more and more attention, benchmarking and ranking the AI abilities of these devices has become an urgent problem. Considering the model diversity and framework diversity, we propose a benchmark suite, AIoTBench, which focuses on evaluating the inference abilities of mobile and embedded devices. AIoTBench covers three typical heavy-weight networks (ResNet50, InceptionV3, DenseNet121) as well as three light-weight networks (SqueezeNet, MobileNetV2, MnasNet). Each network is implemented in three frameworks designed for mobile and embedded devices: TensorFlow Lite, Caffe2, and PyTorch Mobile. To compare and rank the AI capabilities of the devices, we propose two unified metrics as the AI scores: Valid Images Per Second (VIPS) and Valid FLOPs Per Second (VOPS). Currently, we have compared and ranked 5 mobile devices using our benchmark. This list will be extended and updated soon.
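The abstract names the two scores but does not give their formulas. The sketch below shows one plausible reading, assuming that "valid" means throughput weighted by the model's measured accuracy and summed over the benchmarked networks. That weighting, the `ModelRun` type, the function names, and all numbers are hypothetical illustrations, not the paper's definitions or results.

```python
from dataclasses import dataclass


@dataclass
class ModelRun:
    """One network measured on one device (hypothetical record layout)."""
    name: str
    accuracy: float           # top-1 accuracy in [0, 1], as measured on the device
    seconds_per_image: float  # average inference latency
    gflops_per_image: float   # compute cost of one forward pass


def vips(runs):
    """Assumed definition: accuracy-weighted images per second, summed over models."""
    return sum(r.accuracy / r.seconds_per_image for r in runs)


def vops(runs):
    """Assumed definition: accuracy-weighted GFLOPs per second, summed over models."""
    return sum(r.accuracy * r.gflops_per_image / r.seconds_per_image for r in runs)


# Made-up measurements, for illustration only.
runs = [
    ModelRun("ResNet50", accuracy=0.75, seconds_per_image=0.5, gflops_per_image=4.0),
    ModelRun("MobileNetV2", accuracy=0.70, seconds_per_image=0.1, gflops_per_image=0.3),
]
print(f"VIPS = {vips(runs):.2f}, VOPS = {vops(runs):.2f} GFLOP/s")
```

Under this reading, a fast but inaccurate model contributes less to either score than its raw throughput would suggest, which is one way a single number could rank devices across frameworks.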

AI · Apr 30, 2020
AIBench Training: Balanced Industry-Standard AI Training Benchmarking

Fei Tang, Wanling Gao, Jianfeng Zhan et al.

Earlier-stage evaluations of a new AI architecture/system need affordable benchmarks. Using only a few AI component benchmarks like MLPerf alone in the other stages may lead to misleading conclusions. Moreover, the learning dynamics are not well understood, and the benchmarks' shelf-life is short. This paper proposes a balanced benchmarking methodology. We use real-world benchmarks to cover the factor space that impacts the learning dynamics to the most considerable extent. After performing an exhaustive survey of Internet service AI domains, we identify and implement nineteen representative AI tasks with state-of-the-art models. For repeatable performance ranking (RPR subset) and workload characterization (WC subset), we keep two subsets to a minimum for affordability. We contribute by far the most comprehensive AI training benchmark suite. The evaluations show: (1) AIBench Training (v1.1) outperforms MLPerf Training (v0.7) in terms of diversity and representativeness of model complexity, computational cost, convergence rate, computation and memory access patterns, and hotspot functions; (2) against the full AIBench benchmarks, its RPR subset shortens the benchmarking cost by 64% while maintaining the primary workload characteristics; (3) the performance ranking shows that a single-purpose AI accelerator like the TPU with the optimized TensorFlow framework performs better than GPUs while losing the latter's general support for various AI models. The specification, source code, and performance numbers are available from the AIBench homepage https://www.benchcouncil.org/aibench-training/index.html.

PF · Feb 17, 2020
AIBench: An Agile Domain-specific Benchmarking Methodology and an AI Benchmark Suite

Wanling Gao, Fei Tang, Jianfeng Zhan et al.

Domain-specific software and hardware co-design is encouraging as it is much easier to achieve efficiency for fewer tasks. Agile domain-specific benchmarking speeds up the process as it provides not only relevant design inputs but also relevant metrics and tools. Unfortunately, modern workloads like big data, AI, and Internet services dwarf traditional ones in terms of code size, deployment scale, and execution path, and hence raise serious benchmarking challenges. This paper proposes an agile domain-specific benchmarking methodology. Together with seventeen industry partners, we identify ten important end-to-end application scenarios, among which sixteen representative AI tasks are distilled as the AI component benchmarks. We propose the permutations of essential AI and non-AI component benchmarks as end-to-end benchmarks. An end-to-end benchmark is a distillation of the essential attributes of an industry-scale application. We design and implement a highly extensible, configurable, and flexible benchmark framework, on the basis of which we propose the guideline for building end-to-end benchmarks and present the first end-to-end Internet service AI benchmark. The preliminary evaluation shows the value of our benchmark suite, AIBench, against MLPerf and TailBench for hardware and software designers, micro-architectural researchers, and code developers. The specifications, source code, testbed, and results are publicly available from the web site http://www.benchcouncil.org/AIBench/index.html.

CV · Aug 13, 2019
AIBench: An Industry Standard Internet Service AI Benchmark Suite

Wanling Gao, Fei Tang, Lei Wang et al.

Today's Internet services are undergoing fundamental changes and shifting to an intelligent computing era where AI is widely employed to augment services. In this context, many innovative AI algorithms, systems, and architectures are proposed, and thus the importance of benchmarking and evaluating them rises. However, modern Internet services adopt a microservice-based architecture and consist of various modules. The diversity of these modules and complexity of execution paths, the massive scale and complex hierarchy of datacenter infrastructure, and the confidentiality of data sets and workloads pose great challenges to benchmarking. In this paper, we present the first industry-standard Internet service AI benchmark suite, AIBench, with seventeen industry partners, including several top Internet service providers. AIBench provides a highly extensible, configurable, and flexible benchmark framework that contains loosely coupled modules. We identify sixteen prominent AI problem domains like learning to rank, each of which forms an AI component benchmark, from the three most important Internet service domains: search engine, social network, and e-commerce, which is by far the most comprehensive AI benchmarking effort. On the basis of the AIBench framework, abstracting the real-world data sets and workloads from one of the top e-commerce providers, we design and implement the first end-to-end Internet service AI benchmark, which contains the primary modules in the critical paths of an industry-scale application and can be deployed at different cluster scales. The specifications, source code, and performance numbers are publicly available from the benchmark council web site http://www.benchcouncil.org/AIBench/index.html.

SP · Mar 13, 2019
Signal Demodulation with Machine Learning Methods for Physical Layer Visible Light Communications: Prototype Platform, Open Dataset and Algorithms

Shuai Ma, Jiahui Dai, Songtao Lu et al.

In this paper, we investigate the design and implementation of machine learning (ML) based demodulation methods in the physical layer of visible light communication (VLC) systems. We build a flexible hardware prototype of an end-to-end VLC system, from which the received signals are collected as the real data. The dataset, which contains eight types of modulated signals, is available online. Then, we propose three ML demodulators based on a convolutional neural network (CNN), a deep belief network (DBN), and adaptive boosting (AdaBoost), respectively. Specifically, the CNN-based demodulator converts the modulated signals to images and recognizes the signals by image classification. The proposed DBN-based demodulator contains three restricted Boltzmann machines (RBMs) to extract the modulation features. The AdaBoost method builds a strong classifier from weak classifiers that use the k-nearest neighbor (KNN) algorithm. These three demodulators are trained and tested on our online open dataset. Experimental results show that the demodulation accuracy of the three data-driven demodulators drops as the transmission distance increases. A higher modulation order negatively influences the accuracy for a given transmission distance. Among the three ML methods, the AdaBoost demodulator achieves the best performance.
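The idea of a strong classifier assembled from KNN weak classifiers can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes SAMME-style multi-class boosting with weighted resampling (KNN cannot consume sample weights directly), and the function names, parameters, and toy 2-D data are all hypothetical stand-ins for real received-signal features.

```python
import math
import random
from collections import Counter


def knn_predict(train, query, k=3):
    """Majority label among the k training points closest to `query`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def fit_adaboost_knn(data, n_classes, rounds=5, subset=0.5, k=3, seed=0):
    """data: list of (feature_tuple, label). Returns [(alpha, resample), ...]."""
    rng = random.Random(seed)
    n = len(data)
    weights = [1.0 / n] * n
    size = max(k, int(subset * n))
    learners = []
    for _ in range(rounds):
        # Draw a weighted resample so hard points appear more often.
        sample = rng.choices(data, weights=weights, k=size)
        miss = [knn_predict(sample, x, k) != y for x, y in data]
        err = sum(w for w, m in zip(weights, miss) if m)
        if err >= 1.0 - 1.0 / n_classes:
            continue  # no better than chance: discard this weak learner
        err = max(err, 1e-10)  # avoid log(0) when the learner is perfect
        alpha = math.log((1.0 - err) / err) + math.log(n_classes - 1)
        learners.append((alpha, sample))
        # Up-weight the misclassified points, then renormalise.
        weights = [w * math.exp(alpha) if m else w for w, m in zip(weights, miss)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return learners


def predict_adaboost_knn(learners, query, k=3):
    """Weighted vote of the weak KNN classifiers."""
    if not learners:
        raise ValueError("no weak learner was accepted during training")
    scores = Counter()
    for alpha, sample in learners:
        scores[knn_predict(sample, query, k)] += alpha
    return scores.most_common(1)[0][0]


# Toy data: two well-separated 2-D clusters standing in for two signal classes.
_rng = random.Random(1)
data = [((_rng.gauss(0, 0.5), _rng.gauss(0, 0.5)), 0) for _ in range(20)]
data += [((_rng.gauss(5, 0.5), _rng.gauss(5, 0.5)), 1) for _ in range(20)]
learners = fit_adaboost_knn(data, n_classes=2)
accuracy = sum(predict_adaboost_knn(learners, x) == y for x, y in data) / len(data)
print(f"training accuracy: {accuracy:.2f}")
```

Resampling is only one way to combine boosting with KNN; a weighted-vote KNN that uses the sample weights directly would be an equally reasonable choice.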

DC · Feb 23, 2018
BigDataBench: A Scalable and Unified Big Data and AI Benchmark Suite

Wanling Gao, Jianfeng Zhan, Lei Wang et al.

Several fundamental changes in technology indicate domain-specific hardware and software co-design is the only path left. In this context, architecture, system, data management, and machine learning communities pay greater attention to innovative big data and AI algorithms, architecture, and systems. Unfortunately, the complexity, diversity, frequently changing workloads, and rapid evolution of big data and AI systems raise great challenges. First, the traditional benchmarking methodology that creates a new benchmark or proxy for every possible workload is not scalable, or even impossible, for big data and AI benchmarking. Second, it is prohibitively expensive to tailor the architecture to the characteristics of one or more applications or even a domain of applications. We consider each big data and AI workload as a pipeline of one or more classes of units of computation performed on different initial or intermediate data inputs, each class of which we call a data motif. On the basis of our previous work that identifies eight data motifs taking up most of the run time of a wide variety of big data and AI workloads, we propose a scalable benchmarking methodology that uses combinations of one or more data motifs to represent the diversity of big data and AI workloads. Following this methodology, we present a unified big data and AI benchmark suite, BigDataBench 4.0, publicly available from http://prof.ict.ac.cn/BigDataBench. This unified benchmark suite sheds new light on domain-specific hardware and software co-design: tailoring the system and architecture to the characteristics of the unified eight data motifs rather than to one or more applications case by case. Also, for the first time, we comprehensively characterize the CPU pipeline efficiency using the benchmarks of seven workload types in BigDataBench 4.0.