Liang Chen

h-index14

6papers

69citations

Novelty51%

AI Score37

Ranked #92,895 of 194,257 authors (top 48%)#283 in NE (top 26%)

6 Papers

8.3CLFeb 12, 2025Code

Measuring Diversity in Synthetic Datasets

Yuchang Zhu, Huizhe Zhang, Bingzhe Wu et al.

Large language models (LLMs) are widely adopted to generate synthetic datasets for various natural language processing (NLP) tasks, such as text classification and summarization. However, accurately measuring the diversity of these synthetic datasets-an aspect crucial for robust model performance-remains a significant challenge. In this paper, we introduce DCScore, a novel method for measuring synthetic dataset diversity from a classification perspective. Specifically, DCScore formulates diversity evaluation as a sample classification task, leveraging mutual relationships among samples. We further provide theoretical verification of the diversity-related axioms satisfied by DCScore, highlighting its role as a principled diversity evaluation method. Experimental results on synthetic datasets reveal that DCScore enjoys a stronger correlation with multiple diversity pseudo-truths of evaluated datasets, underscoring its effectiveness. Moreover, both empirical and theoretical evidence demonstrate that DCScore substantially reduces computational costs compared to existing methods. Code is available at: https://github.com/bluewhalelab/dcscore.

21.3LGFeb 19, 2025

Are Large Language Models In-Context Graph Learners?

Jintang Li, Ruofan Wu, Yuchang Zhu et al.

Large language models (LLMs) have demonstrated remarkable in-context reasoning capabilities across a wide range of tasks, particularly with unstructured inputs such as language or images. However, LLMs struggle to handle structured data, such as graphs, due to their lack of understanding of non-Euclidean structures. As a result, without additional fine-tuning, their performance significantly lags behind that of graph neural networks (GNNs) in graph learning tasks. In this paper, we show that learning on graph data can be conceptualized as a retrieval-augmented generation (RAG) process, where specific instances (e.g., nodes or edges) act as queries, and the graph itself serves as the retrieved context. Building on this insight, we propose a series of RAG frameworks to enhance the in-context learning capabilities of LLMs for graph learning tasks. Comprehensive evaluations demonstrate that our proposed RAG frameworks significantly improve LLM performance on graph-based tasks, particularly in scenarios where a pretrained LLM must be used without modification or accessed via an API.

3.3NEMar 26, 2024Code

SGHormer: An Energy-Saving Graph Transformer Driven by Spikes

Huizhe Zhang, Jintang Li, Liang Chen et al.

Graph Transformers (GTs) with powerful representation learning ability make a huge success in wide range of graph tasks. However, the costs behind outstanding performances of GTs are higher energy consumption and computational overhead. The complex structure and quadratic complexity during attention calculation in vanilla transformer seriously hinder its scalability on the large-scale graph data. Though existing methods have made strides in simplifying combinations among blocks or attention-learning paradigm to improve GTs' efficiency, a series of energy-saving solutions originated from biologically plausible structures are rarely taken into consideration when constructing GT framework. To this end, we propose a new spiking-based graph transformer (SGHormer). It turns full-precision embeddings into sparse and binarized spikes to reduce memory and computational costs. The spiking graph self-attention and spiking rectify blocks in SGHormer explicitly capture global structure information and recover the expressive power of spiking embeddings, respectively. In experiments, SGHormer achieves comparable performances to other full-precision GTs with extremely low computational energy consumption. The results show that SGHomer makes a remarkable progress in the field of low-energy GTs.

14.7NEMay 30, 2023Code

A Graph is Worth 1-bit Spikes: When Graph Contrastive Learning Meets Spiking Neural Networks

Jintang Li, Huizhe Zhang, Ruofan Wu et al.

While contrastive self-supervised learning has become the de-facto learning paradigm for graph neural networks, the pursuit of higher task accuracy requires a larger hidden dimensionality to learn informative and discriminative full-precision representations, raising concerns about computation, memory footprint, and energy consumption burden (largely overlooked) for real-world applications. This work explores a promising direction for graph contrastive learning (GCL) with spiking neural networks (SNNs), which leverage sparse and binary characteristics to learn more biologically plausible and compact representations. We propose SpikeGCL, a novel GCL framework to learn binarized 1-bit representations for graphs, making balanced trade-offs between efficiency and performance. We provide theoretical guarantees to demonstrate that SpikeGCL has comparable expressiveness with its full-precision counterparts. Experimental results demonstrate that, with nearly 32x representation storage compression, SpikeGCL is either comparable to or outperforms many fancy state-of-the-art supervised and self-supervised methods across several graph benchmarks.

1.1CVMar 25, 2016

Object Recognition Based on Amounts of Unlabeled Data

Fuqiang Liu, Fukun Bi, Liang Chen

This paper proposes a novel semi-supervised method on object recognition. First, based on Boost Picking, a universal algorithm, Boost Picking Teaching (BPT), is proposed to train an effective binary-classifier just using a few labeled data and amounts of unlabeled data. Then, an ensemble strategy is detailed to synthesize multiple BPT-trained binary-classifiers to be a high-performance multi-classifier. The rationality of the strategy is also analyzed in theory. Finally, the proposed method is tested on two databases, CIFAR-10 and CIFAR-100. Using 2% labeled data and 98% unlabeled data, the accuracies of the proposed method on the two data sets are 78.39% and 50.77% respectively.

3.3MMJul 17, 2013

Smart Streaming for Online Video Services

Liang Chen, Yipeng Zhou, Dah Ming Chiu

Bandwidth consumption is a significant concern for online video service providers. Practical video streaming systems usually use some form of HTTP streaming (progressive download) to let users download the video at a faster rate than the video bitrate. Since users may quit before viewing the complete video, however, much of the downloaded video will be "wasted". To the extent that users' departure behavior can be predicted, we develop smart streaming that can be used to improve user QoE with limited server bandwidth or save bandwidth cost with unlimited server bandwidth. Through measurement, we extract certain user behavior properties for implementing such smart streaming, and demonstrate its advantage using prototype implementation as well as simulations.