Bo Bai

h-index: 34

Papers: 11

Citations: 347

Novelty: 46%

AI Score: 48

Ranked #26,696 of 194,257 authors (top 14%) · #281 in IR (top 13%)

11 Papers

20.9 · IV · Sep 27, 2023 · Code

High Perceptual Quality Wireless Image Delivery with Denoising Diffusion Models

Selim F. Yilmaz, Xueyan Niu, Bo Bai et al.

We consider the image transmission problem over a noisy wireless channel via deep learning-based joint source-channel coding (DeepJSCC) along with a denoising diffusion probabilistic model (DDPM) at the receiver. Specifically, we are interested in the perception-distortion trade-off in the practical finite block length regime, in which separate source and channel coding can be highly suboptimal. We introduce a novel scheme, where the conventional DeepJSCC encoder targets transmitting a lower resolution version of the image, which later can be refined thanks to the generative model available at the receiver. In particular, we utilize the range-null space decomposition of the target image; DeepJSCC transmits the range-space of the image, while DDPM progressively refines its null space contents. Through extensive experiments, we demonstrate significant improvements in distortion and perceptual quality of reconstructed images compared to standard DeepJSCC and the state-of-the-art generative learning-based method.
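The range-null space decomposition mentioned in the abstract can be illustrated with a small sketch. It assumes, purely for illustration, that the linear operator A is pairwise averaging of a 1-D signal (a stand-in for image downsampling); the abstract does not state which operator the paper uses. The function names `downsample`, `pinv_upsample`, and `range_null_split` are hypothetical. For this choice of A, the pseudo-inverse simply repeats each value twice, so the range-space part is A⁺Ax and the null-space part is x − A⁺Ax.

```python
def downsample(x):
    """Assumed operator A: average each adjacent pair of samples."""
    if len(x) % 2 != 0:
        raise ValueError("signal length must be even")
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]


def pinv_upsample(y):
    """Pseudo-inverse A+ of pairwise averaging: repeat each value twice."""
    return [v for v in y for _ in range(2)]


def range_null_split(x):
    """Split x into a range-space part (A+ A x) and a null-space part (x - A+ A x)."""
    range_part = pinv_upsample(downsample(x))
    null_part = [a - b for a, b in zip(x, range_part)]
    return range_part, null_part


x = [4.0, 2.0, 1.0, 5.0]
range_part, null_part = range_null_split(x)

# The null-space part is invisible to A, so it cannot be recovered from the
# low-resolution signal alone; in the scheme described above, this is the
# content a generative model at the receiver would have to fill in.
print(range_part)             # low-resolution content, upsampled
print(null_part)              # detail removed by the averaging
print(downsample(null_part))  # all zeros: A annihilates the null-space part
```

The two parts always sum back to the original signal, which is what lets the transmitted component and the generated component be combined without interfering with each other.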

6.4 · LG · Sep 1, 2024 · Code

Towards Faster Graph Partitioning via Pre-training and Inductive Inference

Meng Qin, Chaorui Zhang, Yu Gao et al.

Graph partitioning (GP) is a classic problem that divides the node set of a graph into densely-connected blocks. Following the IEEE HPEC Graph Challenge and recent advances in pre-training techniques (e.g., large-language models), we propose PR-GPT (Pre-trained & Refined Graph ParTitioning) based on a novel pre-training & refinement paradigm. We first conduct the offline pre-training of a deep graph learning (DGL) model on small synthetic graphs with various topology properties. By using the inductive inference of DGL, one can directly generalize the pre-trained model (with frozen model parameters) to large graphs and derive feasible GP results. We also use the derived partition as a good initialization of an efficient GP method (e.g., InfoMap) to further refine the quality of partitioning. In this setting, the online generalization and refinement of PR-GPT can not only benefit from the transfer ability regarding quality but also ensure high inference efficiency without re-training. Based on a mechanism of reducing the scale of a graph to be processed by the refinement method, PR-GPT also has the potential to support streaming GP. Experiments on the Graph Challenge benchmark demonstrate that PR-GPT can ensure faster GP on large-scale graphs without significant quality degradation, compared with running a refinement method from scratch. We will make our code public at https://github.com/KuroginQin/PRGPT.

5.7 · CV · Aug 3, 2022

GPPF: A General Perception Pre-training Framework via Sparsely Activated Multi-Task Learning

Benyuan Sun, Jin Dai, Zihao Liang et al.

Pre-training over mixed multi-task, multi-domain, and multi-modal data remains an open challenge in vision perception pre-training. In this paper, we propose GPPF, a General Perception Pre-training Framework, that pre-trains a task-level dynamic network, composed of knowledge "legos" in each layer, on labeled multi-task and multi-domain datasets. By inspecting humans' innate ability to learn in complex environments, we identify three critical elements and transfer them to deep networks: (1) simultaneous exposure to diverse cross-task and cross-domain information in each batch; (2) partitioned knowledge storage in separate lego units driven by knowledge sharing; and (3) sparse activation of a subset of lego units for both pre-training and downstream tasks. Notably, the joint training of disparate vision tasks is non-trivial due to their differences in input shapes, loss functions, output formats, data distributions, etc. We therefore develop a plug-and-play multi-task training algorithm that supports Single Iteration Multiple Tasks (SIMT) concurrent training. SIMT lays the foundation for pre-training with large-scale multi-task, multi-domain datasets and proves essential for stable training in our GPPF experiments. Exhaustive experiments show that our GPPF-R50 model achieves significant improvements of 2.5-5.8 over a strong baseline on the 8 pre-training tasks in GPPF-15M and achieves a range of SOTA results on the 22 downstream tasks with similar computation budgets. We also validate the generalization ability of GPPF to SOTA vision transformers, with consistent improvements. These experimental results demonstrate the effective knowledge learning, storing, sharing, and transfer provided by our GPPF framework.

7.4 · IT · May 9

Tight Lower Bounds on The Single-Error Detection Threshold for Analog Error-Correcting Codes

Zhengyi Jiang, Wenhao Liu, Zhongyi Huang et al.

Analog error-correcting codes (Analog ECCs) for approximate vector-matrix multiplication have been extensively studied as means to achieve fault-tolerant in-memory computation. The theoretical foundations for such coding schemes, particularly the characterization of their correction capabilities via the height profile, have been well established in recent literature. In this paper, we focus on the case of single-error detection Analog ECCs. Among several open problems related to this case proposed by Ron M. Roth in [1], Problem 1 asks: "Identify the values of $k$ and $n$ for which every linear $[n, k]$ code $\mathcal{C}$ over $\mathbb{R}$ satisfies: $$\mathsf{h}_1(\mathcal{C}):=\max_{\boldsymbol{c}\in \mathcal{C}\setminus{\{\boldsymbol{0}\}}}\mathsf{h}_1(\boldsymbol{c})\geq \Big\lceil \frac{k}{n-k} \Big\rceil.\text{"}$$ Here, for any $\boldsymbol{x}\in\mathbb{R}^n$, $\mathsf{h}_1(\boldsymbol{x})$ represents the ratio between the largest and second largest absolute values of $\boldsymbol{x}$'s entries. As the simplest special case of Problem 1 (with $n-k=2$), the following problem was posed as Problem 2 in [1]: "Must every $(n-2)$-dimensional subspace of $\mathbb{R}^n$, $n$ even, contain a nonzero vector in which the ratio between the largest and second largest absolute values of its entries is at least $(n/2)-1$?" These problems directly pertain to the lower bounds on the single-error detection threshold for Analog ECCs: Problem 1 corresponds to arbitrary $n-k$ and Problem 2 corresponds to $n-k=2$. In this paper, we provide an affirmative answer to Problem 2 and a rigorous proof using theories related to convex optimization. Furthermore, we extend our analytical method to show that the lower bound in Problem 1 is tight for the case where $n-k$ divides $k$. Our results fill the gap in the lower bound theory of thresholds for single-error detection in Analog ECCs.
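To make the quantities in this abstract concrete, here is a minimal sketch that computes the ratio $\mathsf{h}_1(\boldsymbol{x})$ for a single vector and the threshold $\lceil k/(n-k) \rceil$ from Problem 1. The function names `height_1` and `problem1_threshold` are hypothetical, and returning infinity when the second-largest magnitude is zero is an assumed convention, not something the abstract specifies. The sketch evaluates the definitions only; it does not search a code for its maximizing codeword.

```python
def height_1(x):
    """Ratio between the largest and second-largest absolute values of x's entries."""
    mags = sorted((abs(v) for v in x), reverse=True)
    if len(mags) < 2:
        raise ValueError("need at least two entries")
    if mags[0] == 0:
        raise ValueError("h_1 is only defined here for nonzero vectors")
    if mags[1] == 0:
        # Assumed convention: a single nonzero entry gives an unbounded ratio.
        return float("inf")
    return mags[0] / mags[1]


def problem1_threshold(n, k):
    """Integer ceiling of k / (n - k), the bound appearing in Problem 1."""
    if not 0 < k < n:
        raise ValueError("require 0 < k < n")
    return -(-k // (n - k))


x = [3.0, -1.0, 0.5, 1.0]
print(height_1(x))               # largest magnitude 3.0 over second largest 1.0

# With n - k = 2 and n even, the Problem 1 bound reduces to (n / 2) - 1,
# which is the quantity asked about in Problem 2.
n = 6
print(problem1_threshold(n, n - 2), n // 2 - 1)
```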

1.2 · SI · Sep 29, 2022

Trading off Quality for Efficiency of Community Detection: An Inductive Method across Graphs

Meng Qin, Chaorui Zhang, Bo Bai et al.

Many network applications can be formulated as NP-hard combinatorial optimization problems of community detection (CD). Because of this NP-hardness, balancing CD quality and efficiency remains a challenge. Most existing CD methods are transductive: they are optimized independently for CD on a single graph. Some of these methods use advanced machine learning techniques to obtain high-quality CD results but usually have high complexity. Other approaches use fast heuristic approximation to ensure low runtime but may suffer from quality degradation. In contrast to these transductive methods, we propose an alternative inductive community detection (ICD) method across graphs of a system or scenario to alleviate the NP-hard challenge. ICD first conducts the offline training of an adversarial dual GNN on historical graphs to capture key properties of the system. The trained model is then directly generalized to new unseen graphs for online CD without additional optimization, where a better trade-off between quality and efficiency can be achieved. ICD can also capture the permutation-invariant community labels in the offline training and tackle the online CD on new graphs with a non-fixed number of nodes and communities. Experiments on a set of benchmarks demonstrate that ICD can achieve a favorable trade-off between quality and efficiency compared with various baselines.

14.5 · IV · Feb 14, 2024 · Code

Extreme Video Compression with Pre-trained Diffusion Models

Bohan Li, Yiming Liu, Xueyan Niu et al.

Diffusion models have achieved remarkable success in generating high quality image and video data. More recently, they have also been used for image compression with high perceptual quality. In this paper, we present a novel approach to extreme video compression leveraging the predictive power of diffusion-based generative models at the decoder. The conditional diffusion model takes several neural compressed frames and generates subsequent frames. When the reconstruction quality drops below the desired level, new frames are encoded to restart prediction. The entire video is sequentially encoded to achieve a visually pleasing reconstruction, considering perceptual quality metrics such as the learned perceptual image patch similarity (LPIPS) and the Frechet video distance (FVD), at bit rates as low as 0.02 bits per pixel (bpp). Experimental results demonstrate the effectiveness of the proposed scheme compared to standard codecs such as H.264 and H.265 in the low bpp regime. The results showcase the potential of exploiting the temporal relations in video data using generative models. Code is available at: https://github.com/ElesionKyrie/Extreme-Video-Compression-With-Prediction-Using-Pre-trainded-Diffusion-Models-

14.6 · CL · Dec 15, 2023

Extending Context Window of Large Language Models via Semantic Compression

Weizhi Fei, Xueyan Niu, Pingyi Zhou et al. · Tsinghua

Transformer-based Large Language Models (LLMs) often impose limitations on the length of the text input to ensure the generation of fluent and relevant responses. This constraint restricts their applicability in scenarios involving long texts. We propose a novel semantic compression method that enables generalization to texts that are 6-8 times longer, without incurring significant computational costs or requiring fine-tuning. Our proposed framework draws inspiration from source coding in information theory and employs a pre-trained model to reduce the semantic redundancy of long inputs before passing them to the LLMs for downstream tasks. Experimental results demonstrate that our method effectively extends the context window of LLMs across a range of tasks including question answering, summarization, few-shot learning, and information retrieval. Furthermore, the proposed semantic compression method exhibits consistent fluency in text generation while reducing the associated computational overhead.

6.3 · IR · Aug 12, 2025

DB3 Team's Solution For Meta KDD Cup' 25

Yikuan Xia, Jiazun Chen, Yirui Zhan et al.

This paper presents the db3 team's winning solution for the Meta CRAG-MM Challenge 2025 at KDD Cup'25. Addressing the challenge's unique multi-modal, multi-turn question answering benchmark (CRAG-MM), we developed a comprehensive framework that integrates tailored retrieval pipelines for different tasks with a unified LLM-tuning approach for hallucination control. Our solution features (1) domain-specific retrieval pipelines handling image-indexed knowledge graphs, web sources, and multi-turn conversations; and (2) advanced refusal training using SFT, DPO, and RL. The system achieved 2nd place in Task 1, 2nd place in Task 2, and 1st place in Task 3, securing the grand prize for excellence in ego-centric queries through superior handling of first-person perspective challenges.

3.6 · IR · Mar 2, 2025

ER-RAG: Enhance RAG with ER-Based Unified Modeling of Heterogeneous Data Sources

Yikuan Xia, Jiazun Chen, Yirui Zhan et al.

Large language models (LLMs) excel in question-answering (QA) tasks, and retrieval-augmented generation (RAG) enhances their precision by incorporating external evidence from diverse sources like web pages, databases, and knowledge graphs. However, current RAG methods rely on agent-specific strategies for individual data sources, posing challenges in low-resource or black-box environments and complicating operations when evidence is fragmented across sources. To address these limitations, we propose ER-RAG, a framework that unifies evidence integration across heterogeneous data sources using the Entity-Relationship (ER) model. ER-RAG standardizes entity retrieval and relationship querying through ER-based APIs with GET and JOIN operations. It employs a two-stage generation process: first, a preference optimization module selects optimal sources; second, another module constructs API chains based on source schemas. This unified approach allows efficient fine-tuning and seamless integration across diverse data sources. ER-RAG demonstrated its effectiveness by winning all three tracks of the 2024 KDDCup CRAG Challenge, achieving performance on par with commercial RAG pipelines using an 8B LLM backbone. It outperformed hybrid competitors by 3.1% in LLM score and accelerated retrieval by 5.5X.

9.9 · LG · Nov 5, 2021

Branch and Bound in Mixed Integer Linear Programming Problems: A Survey of Techniques and Trends

Lingying Huang, Xiaomeng Chen, Wei Huo et al.

In this paper, we surveyed the existing literature studying different approaches and algorithms for the four critical components in the general branch and bound (B&B) algorithm, namely, branching variable selection, node selection, node pruning, and cutting-plane selection. However, the complexity of the B&B algorithm always grows exponentially with respect to the increase of the decision variable dimensions. In order to improve the speed of B&B algorithms, learning techniques have been introduced in this algorithm recently. We further surveyed how machine learning can be used to improve the four critical components in B&B algorithms. In general, a supervised learning method helps to generate a policy that mimics an expert but significantly improves the speed. An unsupervised learning method helps choose different methods based on the features. In addition, models trained with reinforcement learning can beat the expert policy, given enough training and a supervised initialization. Detailed comparisons between different algorithms have been summarized in our survey. Finally, we discussed some future research directions to accelerate and improve the algorithms further in the literature.

18.8 · SI · Jan 26, 2019

GCN-GAN: A Non-linear Temporal Link Prediction Model for Weighted Dynamic Networks

Kai Lei, Meng Qin, Bo Bai et al.

In this paper, we generally formulate the dynamics prediction problem of various network systems (e.g., the prediction of mobility, traffic and topology) as the temporal link prediction task. Different from conventional techniques of temporal link prediction that ignore the potential non-linear characteristics and the informative link weights in the dynamic network, we introduce a novel non-linear model GCN-GAN to tackle the challenging temporal link prediction task of weighted dynamic networks. The proposed model leverages the benefits of the graph convolutional network (GCN), long short-term memory (LSTM) as well as the generative adversarial network (GAN). Thus, the dynamics, topology structure and evolutionary patterns of weighted dynamic networks can be fully exploited to improve the temporal link prediction performance. Concretely, we first utilize GCN to explore the local topological characteristics of each single snapshot and then employ LSTM to characterize the evolving features of the dynamic networks. Moreover, GAN is used to enhance the ability of the model to generate the next weighted network snapshot, which can effectively tackle the sparsity and the wide-value-range problem of edge weights in real-life dynamic networks. To verify the model's effectiveness, we conduct extensive experiments on four datasets of different network systems and application scenarios. The experimental results demonstrate that our model achieves impressive results compared to the state-of-the-art competitors.