LGJul 5, 2023
Improving Automatic Parallel Training via Balanced Memory Workload OptimizationYujie Wang, Youhe Jiang, Xupeng Miao et al.
Transformer models have emerged as the leading approach for achieving state-of-the-art performance across various application domains, serving as the foundation for advanced large-scale deep learning (DL) models. However, efficiently training these models across multiple GPUs remains a complex challenge due to the abundance of parallelism options. Existing DL systems either require manual efforts to design distributed training plans or limit parallelism combinations to a constrained search space. In this paper, we present Galvatron-BMW, a novel system framework that integrates multiple prevalent parallelism dimensions and automatically identifies the most efficient hybrid parallelism strategy. To effectively navigate this vast search space, we employ a decision tree approach for decomposition and pruning based on intuitive insights. We further utilize a dynamic programming search algorithm to derive the optimal plan. Moreover, to improve resource utilization and enhance system efficiency, we propose a bi-objective optimization workflow that focuses on workload balance. Our evaluations on different Transformer models demonstrate the capabilities of Galvatron-BMW in automating distributed training under varying GPU memory constraints. Across all tested scenarios, Galvatron-BMW consistently achieves superior system throughput, surpassing previous approaches that rely on limited parallelism strategies.
LGFeb 7, 2023
DivBO: Diversity-aware CASH for Ensemble LearningYu Shen, Yupeng Lu, Yang Li et al.
The Combined Algorithm Selection and Hyperparameters optimization (CASH) problem is one of the fundamental problems in Automated Machine Learning (AutoML). Motivated by the success of ensemble learning, recent AutoML systems build post-hoc ensembles to output the final predictions instead of using the best single learner. However, while most CASH methods focus on searching for a single learner with the best performance, they neglect the diversity among base learners (i.e., they may suggest similar configurations to previously evaluated ones), which is also a crucial consideration when building an ensemble. To tackle this issue and further enhance the ensemble performance, we propose DivBO, a diversity-aware framework to inject explicit search of diversity into the CASH problems. In the framework, we propose to use a diversity surrogate to predict the pair-wise diversity of two unseen configurations. Furthermore, we introduce a temporary pool and a weighted acquisition function to guide the search of both performance and diversity based on Bayesian optimization. Empirical results on 15 public datasets show that DivBO achieves the best average ranks (1.82 and 1.73) on both validation and test errors among 10 compared methods, including post-hoc designs in recent AutoML systems and state-of-the-art baselines for ensemble learning on CASH problems.
IVApr 17, 2025Code
NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement: Methods and ResultsXin Li, Kun Yuan, Bingchen Li et al.
This paper presents a review for the NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement. The challenge comprises two tracks: (i) Efficient Video Quality Assessment (KVQ), and (ii) Diffusion-based Image Super-Resolution (KwaiSR). Track 1 aims to advance the development of lightweight and efficient video quality assessment (VQA) models, with an emphasis on eliminating reliance on model ensembles, redundant weights, and other computationally expensive components in the previous IQA/VQA competitions. Track 2 introduces a new short-form UGC dataset tailored for single image super-resolution, i.e., the KwaiSR dataset. It consists of 1,800 synthetically generated S-UGC image pairs and 1,900 real-world S-UGC images, which are split into training, validation, and test sets using a ratio of 8:1:1. The primary objective of the challenge is to drive research that benefits the user experience of short-form UGC platforms such as Kwai and TikTok. This challenge attracted 266 participants and received 18 valid final submissions with corresponding fact sheets, significantly contributing to the progress of short-form UGC VQA and image superresolution. The project is publicly available at https://github.com/lixinustc/KVQE- ChallengeCVPR-NTIRE2025.
AISep 22, 2025Code
MontePrep: Monte-Carlo-Driven Automatic Data Preparation without Target Data InstancesCongcong Ge, Yachuan Liu, Yixuan Tang et al.
In commercial systems, a pervasive requirement for automatic data preparation (ADP) is to transfer relational data from disparate sources to targets with standardized schema specifications. Previous methods rely on labor-intensive supervision signals or target table data access permissions, limiting their usage in real-world scenarios. To tackle these challenges, we propose an effective end-to-end ADP framework MontePrep, which enables training-free pipeline synthesis with zero target-instance requirements. MontePrep is formulated as an open-source large language model (LLM) powered tree-structured search problem. It consists of three pivot components, i.e., a data preparation action sandbox (DPAS), a fundamental pipeline generator (FPG), and an execution-aware pipeline optimizer (EPO). We first introduce DPAS, a lightweight action sandbox, to navigate the search-based pipeline generation. The design of DPAS circumvents exploration of infeasible pipelines. Then, we present FPG to build executable DP pipelines incrementally, which explores the predefined action sandbox by the LLM-powered Monte Carlo Tree Search. Furthermore, we propose EPO, which invokes pipeline execution results from sources to targets to evaluate the reliability of the generated pipelines in FPG. In this way, unreasonable pipelines are eliminated, thus facilitating the search process from both efficiency and effectiveness perspectives. Extensive experimental results demonstrate the superiority of MontePrep with significant improvement against five state-of-the-art competitors.
DBApr 27
BoomHQ: Learning to Boost Multiple Hybrid Queries on Vector DBMSsErmu Qiu, Tianyi Chen, Jun Gao et al.
Hybrid queries, which combine vector nearest neighbor searches with scalar predicates, represent a fundamental challenge in managing vector databases. Existing methods often restrict the number of vector columns involved or the complexity of scalar predicates, thereby limiting their flexibility in handling diverse query patterns. Moreover, these approaches typically do not fully leverage the correlations between scalar and vector attributes, or the distributional patterns observed from query vector neighborhoods. To address these limitations, we introduce BoomHQ, a learning-based framework to boost multiple hybrid queries on vector DBMSs. First, BoomHQ models the correlation between vector and scalar attributes using an autoencoder-based architecture, which is also friendly to data updates. Second, BoomHQ captures prevailing query patterns, particularly using estimated selectivity of scalar predicates within the neighborhood of a query vector. Guided by these two key features, BoomHQ predicts the execution hints and rewrites the original query into an optimized version. Furthermore, we extend well-known benchmarks by introducing vector and scalar data with inherent correlations to better evaluate query execution. Experimental results demonstrate that for multiple hybrid queries at specified recall thresholds, our method achieves a 2x average and over 25x peak speedup compared to the state-of-the-art. Additionally, BoomHQ shows strong robustness against data updates and consistent optimization effectiveness across three representative vector database systems.