Yu-Zhong Chen

CL
3 papers
95 citations
Novelty 55%
AI Score 29

3 Papers

CL · Jun 17, 2024
Retrieval-Augmented Feature Generation for Domain-Specific Classification

Xinhao Zhang, Jinghan Zhang, Fengran Mo et al.

Feature generation can significantly enhance learning outcomes, particularly for tasks with limited data. An effective way to improve feature generation is to expand the current feature space by using existing features and enriching their informational content. However, generating new, interpretable features usually requires domain-specific knowledge in addition to the existing features. In this paper, we introduce a Retrieval-Augmented Feature Generation method, RAFG, to generate useful and explainable features for domain-specific classification tasks. To increase the interpretability of the generated features, we conduct knowledge retrieval among the existing features in the domain to identify potential feature associations, which are expected to help generate useful features. Moreover, we develop a framework based on large language models (LLMs) that generates features with reasoning, verifying their quality during the generation process. Experiments on several datasets in the medical, economic, and geographic domains show that RAFG produces high-quality, meaningful features and significantly improves classification performance compared with baseline methods.

CR · Mar 24, 2016
Spatiotemporal patterns and predictability of cyberattacks

Yu-Zhong Chen, Zi-Gang Huang, Shouhuai Xu et al.

A relatively unexplored question in cybersecurity science and engineering is whether cyberattacks exhibit intrinsic patterns. Conventional wisdom favors the absence of such patterns because of the overwhelming complexity of modern cyberspace. Surprisingly, through a detailed analysis of an extensive data set that records the time-dependent frequencies of attacks over a relatively wide range of consecutive IP addresses, we uncover intrinsic spatiotemporal patterns underlying cyberattacks, where "spatio" refers to the IP address space. In particular, we focus on the macroscopic properties of the attack traffic flows and identify two main patterns with distinct spatiotemporal characteristics: deterministic and stochastic. Strikingly, very few sets of major attackers commit almost all the attacks: their attack "fingerprints" and target-selection schemes can be unequivocally identified from a very limited number of unique spatiotemporal characteristics, each of which exists only on a consecutive IP region and differs significantly from the others. We use a number of quantitative measures, including the flux-fluctuation law, the Markov state transition probability matrix, and predictability measures, to characterize the attack patterns comprehensively. A general finding is that the attack patterns are highly predictable, potentially paving the way to anticipating and, consequently, mitigating or even preventing large-scale cyberattacks using macroscopic approaches.
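Two of the measures named in the abstract above can be made concrete with a short sketch. The Python below is a minimal illustration under stated assumptions, not the authors' code: per-time-step attack counts are discretized into states using hypothetical thresholds, the Markov transition matrix is then estimated by counting consecutive state pairs, and the flux-fluctuation law is assumed to take the common form σ ∝ ⟨f⟩^α, with α fitted by least squares on a log-log scale across IP addresses. The helper names (`discretize`, `transition_matrix`, `flux_fluctuation_exponent`) are hypothetical.

```python
import math
import statistics


def discretize(series, thresholds):
    """Map each count to a state: the number of (hypothetical) thresholds it meets or exceeds."""
    return [sum(x >= t for t in thresholds) for x in series]


def transition_matrix(states, n_states):
    """Estimate P[i][j] = Pr(next state is j | current state is i) by counting."""
    counts = [[0] * n_states for _ in range(n_states)]
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # A state that is never left keeps an all-zero row instead of dividing by zero.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def flux_fluctuation_exponent(series_per_ip):
    """Fit alpha in sigma ~ <f>^alpha (assumed form) by least squares on log(mean) vs log(std)."""
    xs, ys = [], []
    for series in series_per_ip:
        mean = statistics.fmean(series)
        std = statistics.pstdev(series)
        if mean > 0 and std > 0:  # logs are undefined for silent or constant series
            xs.append(math.log(mean))
            ys.append(math.log(std))
    if len(set(xs)) < 2:
        raise ValueError("need at least two IP addresses with distinct mean flux")
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den
```

For example, the hourly counts `[0, 5, 12, 3, 0, 7, 15, 2]` with thresholds `[1, 10]` map to states `[0, 1, 2, 1, 0, 1, 2, 1]`, and the estimated matrix sends state 1 to state 2 with probability 2/3. Per-address series whose spread grows in proportion to their mean, such as `[[1, 3], [2, 6], [4, 12]]`, give α ≈ 1.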

SY · Sep 10, 2015
The paradox of controlling complex networks: control inputs versus energy requirement

Yu-Zhong Chen, Lezhi Wang, Wenxu Wang et al.

In this paper, we investigate the linear controllability framework for complex networks from a physical point of view. There are three main results. (1) If one applies control signals as determined from structural controllability theory, there is a high probability that the control energy will diverge. In particular, if a network is deemed controllable using a single driving signal, the energy will most likely diverge. (2) The energy required for control exhibits power-law scaling behavior. (3) Applying additional control signals at proper nodes in the network can reduce and optimize the energy cost. We identify the fundamental structures embedded in the network, the longest control chains, which determine the control energy and give rise to the power-law scaling behavior. (To our knowledge, this has not been reported in any previous work on control of complex networks.) In addition, we address the issue of control precision. These results are supported by extensive simulations on model and real networks, physical reasoning, and mathematical analyses.

Notes on the submission history of this work: This work started in late 2012. The phenomena of power-law energy scaling and energy divergence with a single controller were discovered in 2013. Strategies to reduce and optimize control energy were articulated and tested in 2013. The senior co-author (YCL) gave talks about these results at several conferences, including the NETSCI 2014 Satellite entitled "Controlling Complex Networks" on June 2, 2014. The paper was submitted to PNAS in September 2014 and was turned down. It was revised and submitted to PRX in early 2015 and was rejected. After that, it was revised and submitted to Nature Communications in May 2015 and again was turned down.
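The link between control-chain length, the number of driver nodes, and control energy in the controllability abstract above can be illustrated with a toy calculation. This is a hedged sketch, not the paper's analysis: it assumes a discrete-time system x(k+1) = A x(k) + B u(k) on a directed chain whose edges all carry a hypothetical weight below 1, and uses the standard minimum-energy formula E = x_fᵀ W⁻¹ x_f, where W = Σ_k A^k B Bᵀ (Aᵀ)^k is the finite-horizon controllability Gramian. In this toy the energy grows exponentially with chain length, so it shows why long chains and extra driver nodes matter but does not reproduce the power-law scaling described for continuous-time networks. All function names are hypothetical.

```python
def matmul(X, Y):
    """Plain-Python matrix product of X (p x q) and Y (q x r)."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def chain_network(n, weight):
    """Hypothetical directed chain 0 -> 1 -> ... -> n-1 with A[i][i-1] = weight."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(1, n):
        A[i][i - 1] = weight
    return A


def input_matrix(n, drivers):
    """B has one column per driver node, with a 1 in that node's row."""
    return [[1.0 if i == d else 0.0 for d in drivers] for i in range(n)]


def gramian(A, B, steps):
    """Finite-horizon Gramian W = sum_{k=0}^{steps-1} A^k B B^T (A^T)^k."""
    n = len(A)
    W = [[0.0] * n for _ in range(n)]
    AkB = B
    for _ in range(steps):
        term = matmul(AkB, transpose(AkB))
        W = [[W[i][j] + term[i][j] for j in range(n)] for i in range(n)]
        AkB = matmul(A, AkB)
    return W


def solve(M, b):
    """Gaussian elimination with partial pivoting; raises if M is singular."""
    n = len(M)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("Gramian is singular: target not reachable in this horizon")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def min_control_energy(A, B, target, steps):
    """E = x_f^T W^{-1} x_f: minimum sum of squared inputs to reach target from the origin."""
    y = solve(gramian(A, B, steps), target)
    return sum(t * v for t, v in zip(target, y))
```

With weight 0.5 and a single driver at the head of the chain, reaching the all-ones state costs 21 for three nodes and 85 for four; adding a second driver at node 2 of the four-node chain lowers the cost to about 9.71. If the horizon is shorter than the chain, the Gramian is singular and `solve` raises an error, mirroring the loss of controllability in that horizon.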