Peng Fang

CL
h-index: 23
5 papers
280 citations
Novelty: 49%
AI Score: 48

5 Papers

DC · Mar 28, 2023
Distributed Graph Embedding with Information-Oriented Random Walks

Peng Fang, Arijit Khan, Siqiang Luo et al.

Graph embedding maps graph nodes to low-dimensional vectors, and is widely adopted in machine learning tasks. The increasing availability of billion-edge graphs underscores the importance of learning efficient and effective embeddings on large graphs, such as link prediction on Twitter with over one billion edges. Most existing graph embedding methods fall short of reaching high data scalability. In this paper, we present a general-purpose, distributed, information-centric random walk-based graph embedding framework, DistGER, which can scale to embed billion-edge graphs. DistGER incrementally computes information-centric random walks. It further leverages a multi-proximity-aware, streaming, parallel graph partitioning strategy, simultaneously achieving high local partition quality and excellent workload balancing across machines. DistGER also improves the distributed Skip-Gram learning model to generate node embeddings by optimizing the access locality, CPU throughput, and synchronization efficiency. Experiments on real-world graphs demonstrate that, compared to state-of-the-art distributed graph embedding frameworks, including KnightKing, DistDGL, and PyTorch-BigGraph, DistGER exhibits 2.33x-129x acceleration, a 45% reduction in cross-machine communication, and a >10% effectiveness improvement in downstream tasks.
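
For readers unfamiliar with random walk-based embedding, the general pipeline the abstract builds on can be sketched as follows. This is a minimal single-machine illustration that uses plain uniform, fixed-length walks; it does not implement DistGER's information-centric walks, its partitioning strategy, or its distributed Skip-Gram training. The function names, the toy graph, and the parameters `walk_length`, `walks_per_node`, and `window` are all assumptions made for this example.

```python
import random


def random_walks(adj, walk_length, walks_per_node, seed=0):
    """Generate uniform random walks starting from every node.

    `adj` maps each node to a list of its neighbors (hypothetical toy format).
    """
    rng = random.Random(seed)
    walks = []
    for start in sorted(adj):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adj[walk[-1]]
                if not neighbors:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


def skipgram_pairs(walk, window):
    """Turn one walk into (center, context) training pairs for Skip-Gram."""
    pairs = []
    for i, center in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs


# Toy undirected graph with four nodes (illustrative only).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
walks = random_walks(adj, walk_length=5, walks_per_node=2)
pairs = skipgram_pairs(walks[0], window=2)
```

The walks play the role of sentences and the nodes the role of words; a Skip-Gram model would then be trained on the resulting pairs to produce node vectors.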

SI · Apr 8
IntervenSim: Intervention-Aware Social Network Simulation for Opinion Dynamics

Yunyao Zhang, Zuocheng Ying, Xinglang Zhang et al.

LLM-based social network simulation introduces a new computational approach for modeling event evolution in complex online environments. However, existing methods typically simulate social processes under a fixed event trajectory, treating the event as static once initialized and overlooking intervention dynamics, and thus fail to capture the intrinsic evolution of real social network events, where source-side interventions and collective interactions continuously reshape event trajectories, sometimes leading to secondary popularity explosions and collective attitude shifts. To address this limitation, we introduce an intervention-aware simulation framework, IntervenSim, that models event evolution and intervention in a closed loop. We model event developments and source-side interventions using source agents, and collective crowd reactions using crowd agents, capturing their continuous co-evolution through an intervention-aware mechanism that couples source-side intervention, group interaction, and feedback-driven adjustment of subsequent interventions. Experiments on diverse real-world events show that IntervenSim improves MAPE by 41.6% and DTW by 66.9% over prior frameworks, while reducing computational cost with fewer yet more capable agents. These improvements indicate that IntervenSim not only simulates regular event trajectories more faithfully, but also better captures opinion dynamics under intervention in complex cases.
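
The two reported metrics, MAPE (mean absolute percentage error) and DTW (dynamic time warping), can be sketched as below for comparing a simulated popularity curve against an observed one. The abstract does not say which variants the authors use, so this assumes the standard textbook definitions with an absolute-difference local cost; the function names and sample values are hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def dtw_distance(x, y):
    """Classic dynamic time warping distance with |a - b| as the local cost."""
    inf = float("inf")
    n, m = len(x), len(y)
    # cost[i][j] = minimal cumulative cost of aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # step in x only
                                 cost[i][j - 1],      # step in y only
                                 cost[i - 1][j - 1])  # step in both
    return cost[n][m]


# Hypothetical observed vs. simulated event-popularity series.
observed = [100.0, 200.0]
simulated = [110.0, 180.0]
error_percent = mape(observed, simulated)
warp_cost = dtw_distance([1, 2, 3], [1, 2, 2, 3])
```

Lower is better for both: MAPE penalizes point-wise deviation, while DTW tolerates time shifts between the two trajectories.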

CL · Apr 18, 2025 · Code
CoT-RAG: Integrating Chain of Thought and Retrieval-Augmented Generation to Enhance Reasoning in Large Language Models

Feiyang Li, Peng Fang, Zhan Shi et al.

Chain-of-thought (CoT) reasoning boosts large language models' (LLMs) performance on complex tasks but faces two key limitations: a lack of reliability when solely relying on LLM-generated reasoning chains and lower reasoning performance from natural language prompts compared with code prompts. To address these issues, we propose CoT-RAG, a novel reasoning framework with three key designs: (i) Knowledge Graph-driven CoT Generation, featuring knowledge graphs to modulate reasoning chain generation of LLMs, thereby enhancing reasoning credibility; (ii) Learnable Knowledge Case-aware RAG, which incorporates retrieval-augmented generation (RAG) into knowledge graphs to retrieve relevant sub-cases and sub-descriptions, providing LLMs with learnable information; (iii) Pseudo Program Prompting Execution, which promotes greater logical rigor by guiding LLMs to execute reasoning tasks as pseudo-programs. Evaluations on nine public datasets spanning three reasoning tasks reveal significant accuracy gains, ranging from 4.0% to 44.3%, over state-of-the-art methods. Furthermore, tests on four domain-specific datasets demonstrate exceptional accuracy and efficient execution, underscoring its practical applicability and scalability. Our code and data are available at https://github.com/hustlfy123/CoT-RAG.

LG · Oct 2, 2025
Pilot selection in the era of Virtual reality: algorithms for accurate and interpretable machine learning models

Luoma Ke, Guangpeng Zhang, Jibo He et al.

With the rapid growth of the aviation industry, there is a need for a large number of flight crew. How to select the right pilots in a cost-efficient manner has become an important research question. In the current study, 23 pilots were recruited from China Eastern Airlines, and 23 novices were recruited from the community of Tsinghua University. A novel approach incorporating machine learning and virtual reality technology was applied to distinguish features between these participants with different flight skills. Results indicate that SVM with the MIC feature selection method consistently achieved the highest prediction performance on all metrics, with an Accuracy of 0.93, an AUC of 0.96, and an F1 of 0.93, outperforming four other classifier algorithms and two other feature selection methods. From the perspective of feature selection methods, the MIC method can select features with a nonlinear relationship to sampling labels, rather than simply filtering features out. Our new implementation of the SVM + MIC algorithm outperforms all existing pilot selection algorithms and perhaps provides the first implementation based on eye tracking and flight dynamics data. This study's VR simulation platforms and algorithms can be used for pilot selection and training.

SD · May 11, 2020
Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech

Geng Yang, Shan Yang, Kai Liu et al.

In this paper, we propose multi-band MelGAN, a much faster waveform generation model targeting high-quality text-to-speech. Specifically, we improve the original MelGAN in the following aspects. First, we increase the receptive field of the generator, which is proven to be beneficial to speech generation. Second, we substitute the feature matching loss with the multi-resolution STFT loss to better measure the difference between fake and real speech. Together with pre-training, this improvement leads to both better quality and better training stability. More importantly, we extend MelGAN with multi-band processing: the generator takes mel-spectrograms as input and produces sub-band signals which are subsequently summed back to full-band signals as discriminator input. The proposed multi-band MelGAN has achieved high MOS of 4.34 and 4.22 in waveform generation and TTS, respectively. With only 1.91M parameters, our model effectively reduces the total computational complexity of the original MelGAN from 5.85 to 0.95 GFLOPS. Our PyTorch implementation, which will be open-sourced shortly, can achieve a real-time factor of 0.03 on CPU without hardware-specific optimization.
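
The efficiency figures in the abstract can be put in perspective with a little arithmetic. The sketch below assumes the usual definition of real-time factor (synthesis time divided by the duration of the audio produced); the variable names and the 10-second utterance are illustrative choices, while the numeric constants come from the abstract.

```python
def real_time_factor(synthesis_seconds, audio_seconds):
    """Real-time factor: time spent synthesizing per second of audio produced."""
    return synthesis_seconds / audio_seconds


# Figures quoted in the abstract.
original_gflops = 5.85   # original MelGAN
multiband_gflops = 0.95  # multi-band MelGAN
rtf = 0.03               # reported real-time factor on CPU

# Ratio of computational complexity (roughly a 6.2x reduction).
reduction = original_gflops / multiband_gflops

# At RTF 0.03, a hypothetical 10-second utterance takes about 0.3 s to generate,
# i.e. synthesis runs roughly 33x faster than real time.
seconds_for_10s_audio = rtf * 10.0
speedup_over_real_time = 1.0 / rtf
```

An RTF below 1 means the model generates audio faster than it plays back, which is the threshold for streaming use.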