AI · Jan 13, 2023
Evolve Path Tracer: Early Detection of Malicious Addresses in Cryptocurrency
Ling Cheng, Feida Zhu, Yong Wang et al.
With the ever-increasing boom of Cryptocurrency, detecting fraudulent behaviors and associated malicious addresses draws significant research effort. However, most existing studies still rely on the full history features or full-fledged address transaction networks, thus cannot meet the requirements of early malicious address detection, which is urgent but seldom discussed by existing studies. To detect fraud behaviors of malicious addresses in the early stage, we present Evolve Path Tracer, which consists of Evolve Path Encoder LSTM, Evolve Path Graph GCN, and Hierarchical Survival Predictor. Specifically, in addition to the general address features, we propose asset transfer paths and corresponding path graphs to characterize early transaction patterns. Further, since the transaction patterns are changing rapidly during the early stage, we propose Evolve Path Encoder LSTM and Evolve Path Graph GCN to encode asset transfer path and path graph under an evolving structure setting. Hierarchical Survival Predictor then predicts addresses' labels with nice scalability and faster prediction speed. We investigate the effectiveness and versatility of Evolve Path Tracer on three real-world illicit bitcoin datasets. Our experimental results demonstrate that Evolve Path Tracer outperforms the state-of-the-art methods. Extensive scalability experiments demonstrate the model's adaptivity under a dynamic prediction setting.
LG · Sep 26, 2023
From Asset Flow to Status, Action and Intention Discovery: Early Malice Detection in Cryptocurrency
Ling Cheng, Feida Zhu, Yong Wang et al.
Cryptocurrency has been subject to illicit activities probably more often than traditional financial assets due to the pseudo-anonymous nature of its transacting entities. An ideal detection model is expected to achieve all three critical properties of (I) early detection, (II) good interpretability, and (III) versatility for various illicit activities. However, existing solutions cannot meet all these requirements, as most of them heavily rely on deep learning without interpretability and are only available for retrospective analysis of a specific illicit type. To tackle all these challenges, we propose Intention-Monitor for early malice detection in Bitcoin (BTC), where the on-chain record data for a certain address are much scarcer than other cryptocurrency platforms. We first define asset transfer paths with the Decision-Tree based feature Selection and Complement (DT-SC) to build different feature sets for different malice types. Then, the Status/Action Proposal Module (S/A-PM) and the Intention-VAE module generate the status, action, intent-snippet, and hidden intent-snippet embedding. With all these modules, our model is highly interpretable and can detect various illegal activities. Moreover, well-designed loss functions further enhance the prediction speed and model's interpretability. Extensive experiments on three real-world datasets demonstrate that our proposed algorithm outperforms the state-of-the-art methods. Furthermore, additional case studies justify our model can not only explain existing illicit patterns but can also find new suspicious characters.
LG · Sep 11, 2023
Examining the Effect of Pre-training on Time Series Classification
Jiashu Pu, Shiwei Zhao, Ling Cheng et al.
Although the pre-training followed by fine-tuning paradigm is used extensively in many fields, there is still some controversy surrounding the impact of pre-training on the fine-tuning process. Currently, experimental findings based on text and image data lack consensus. To delve deeper into the unsupervised pre-training followed by fine-tuning paradigm, we have extended previous research to a new modality: time series. In this study, we conducted a thorough examination of 150 classification datasets derived from the Univariate Time Series (UTS) and Multivariate Time Series (MTS) benchmarks. Our analysis reveals several key conclusions. (i) Pre-training can only help improve the optimization process for models that fit the data poorly, rather than those that fit the data well. (ii) Pre-training does not exhibit the effect of regularization when given sufficient training time. (iii) Pre-training can only speed up convergence if the model has sufficient ability to fit the data. (iv) Adding more pre-training data does not improve generalization, but it can strengthen the advantage of pre-training on the original data volume, such as faster convergence. (v) While both the pre-training task and the model structure determine the effectiveness of the paradigm on a given dataset, the model structure plays a more significant role.
LG · Sep 24, 2022
Toward Intention Discovery for Early Malice Detection in Bitcoin
Ling Cheng, Feida Zhu, Yong Wang et al.
Bitcoin has been subject to illicit activities more often than probably any other financial assets, due to the pseudo-anonymous nature of its transacting entities. An ideal detection model is expected to achieve all the three properties of (I) early detection, (II) good interpretability, and (III) versatility for various illicit activities. However, existing solutions cannot meet all these requirements, as most of them heavily rely on deep learning without satisfying interpretability and are only available for retrospective analysis of a specific illicit type. First, we present asset transfer paths, which aim to describe addresses' early characteristics. Next, with a decision tree based strategy for feature selection and segmentation, we split the entire observation period into different segments and encode each as a segment vector. After clustering all these segment vectors, we get the global status vectors, essentially the basic unit to describe the whole intention. Finally, a hierarchical self-attention predictor predicts the label for the given address in real time. A survival module tells the predictor when to stop and proposes the status sequence, namely intention. With the type-dependent selection strategy and global status vectors, our model can be applied to detect various illicit activities with strong interpretability. The well-designed predictor and particular loss functions strengthen the model's prediction speed and interpretability one step further. Extensive experiments on three real-world datasets show that our proposed algorithm outperforms state-of-the-art methods. Besides, additional case studies justify our model can not only explain existing illicit patterns but can also find new suspicious characters.
CL · Feb 15, 2024
Crafting a Good Prompt or Providing Exemplary Dialogues? A Study of In-Context Learning for Persona-based Dialogue Generation
Jiashu Pu, Yajing Wan, Yuru Zhang et al.
Previous in-context learning (ICL) research has focused on tasks such as classification, machine translation, text2table, etc., while studies on whether ICL can improve human-like dialogue generation are scarce. Our work fills this gap by systematically investigating the ICL capabilities of large language models (LLMs) in persona-based dialogue generation, conducting extensive experiments on high-quality real human Chinese dialogue datasets. From experimental results, we draw three conclusions: 1) adjusting prompt instructions is the most direct, effective, and economical way to improve generation quality; 2) randomly retrieving demonstrations (demos) achieves the best results, possibly due to the greater diversity and the amount of effective information; counter-intuitively, retrieving demos with a context identical to the query performs the worst; 3) even when we destroy the multi-turn associations and single-turn semantics in the demos, increasing the number of demos still improves dialogue performance, proving that LLMs can learn from corrupted dialogue demos. Previous explanations of the ICL mechanism, such as $n$-gram induction head, cannot fully account for this phenomenon.
SI · Apr 22, 2025
New Recipe for Semi-supervised Community Detection: Clique Annealing under Crystallization Kinetics
Ling Cheng, Jiashu Pu, Ruicheng Liang et al.
Semi-supervised community detection methods are widely used for identifying specific communities due to label scarcity. Existing semi-supervised community detection methods typically involve two learning stages, initial identification and subsequent adjustment, and often start from an unreasonable community core candidate. Moreover, these methods encounter scalability issues because they depend on reinforcement learning and generative adversarial networks, leading to higher computational costs and restricting the selection of candidates. To address these limitations, we draw a parallel between crystallization kinetics and community detection to integrate the spontaneity of the annealing process into community detection. Specifically, we liken community detection to identifying a crystal subgrain (core) that expands into a complete grain (community) through a process similar to annealing. Based on this finding, we propose CLique ANNealing (CLANN), which applies kinetics concepts to community detection by integrating these principles into the optimization process to strengthen the consistency of the community core. Subsequently, a learning-free Transitive Annealer is employed to refine the first-stage candidates by merging neighboring cliques and repositioning the community core, enabling a spontaneous growth process that enhances scalability. Extensive experiments on 43 different network settings demonstrate that CLANN outperforms state-of-the-art methods across multiple real-world datasets, showcasing its exceptional efficacy and efficiency in community detection.
CR · Jan 6, 2025
Proof-of-Data: A Consensus Protocol for Collaborative Intelligence
Huiwen Liu, Feida Zhu, Ling Cheng
Existing research on federated learning has been focused on the setting where learning is coordinated by a centralized entity. Yet the greatest potential of future collaborative intelligence would be unleashed in a more open and democratized setting with no central entity in a dominant role, referred to as "decentralized federated learning". New challenges arise accordingly in achieving both correct model training and fair reward allocation with collective effort among all participating nodes, especially with the threat of the Byzantine node jeopardising both tasks. In this paper, we propose a blockchain-based decentralized Byzantine fault-tolerant federated learning framework based on a novel Proof-of-Data (PoD) consensus protocol to resolve both the "trust" and "incentive" components. By decoupling model training and contribution accounting, PoD is able to enjoy not only the benefit of learning efficiency and system liveliness from asynchronous societal-scale PoW-style learning but also the finality of consensus and reward allocation from epoch-based BFT-style voting. To mitigate false reward claims by data forgery from Byzantine attacks, a privacy-aware data verification and contribution-based reward allocation mechanism is designed to complete the framework. Our evaluation results show that PoD demonstrates performance in model training close to that of the centralized counterpart while achieving trust in consensus and fairness for reward allocation with a fault tolerance ratio of 1/3.
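The abstract reports a fault tolerance ratio of 1/3 without spelling out the bound behind it. The sketch below assumes this refers to the classical Byzantine fault tolerance condition n >= 3f + 1 (n nodes, f Byzantine); the function names are hypothetical and are not taken from the PoD protocol.

```python
def max_byzantine_nodes(n: int) -> int:
    """Largest f satisfying the classical BFT bound n >= 3f + 1.

    Assumption: the 1/3 ratio in the abstract is this standard bound.
    """
    if n < 1:
        raise ValueError("the network needs at least one node")
    return (n - 1) // 3


def tolerates(n: int, f: int) -> bool:
    """True if n nodes can tolerate f Byzantine nodes under n >= 3f + 1."""
    return 0 <= f <= max_byzantine_nodes(n)
```

Under this assumption, a network of 10 nodes tolerates up to 3 Byzantine nodes, while a network of 3 nodes tolerates none.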
CV · Sep 29, 2021
Geometry-Entangled Visual Semantic Transformer for Image Captioning
Ling Cheng, Wei Wei, Feida Zhu et al.
Recent advancements in image captioning have featured Visual-Semantic Fusion or Geometry-Aid attention refinement. However, fusion-based models are still criticized for the lack of geometry information for inter and intra attention refinement. On the other hand, models based on Geometry-Aid attention still suffer from the modality gap between visual and semantic information. In this paper, we introduce a novel Geometry-Entangled Visual Semantic Transformer (GEVST) network to realize the complementary advantages of Visual-Semantic Fusion and Geometry-Aid attention refinement. Concretely, a Dense-Cap model first proposes some dense captions with corresponding geometry information. Then, to empower GEVST with the ability to bridge the modality gap between visual and semantic information, we build four parallel transformer encoders, VV (Pure Visual), VS (Semantic fused to Visual), SV (Visual fused to Semantic), and SS (Pure Semantic), for final caption generation. Both visual and semantic geometry features are used in the Fusion module and also the Self-Attention module for better attention measurement. To validate our model, we conduct extensive experiments on the MS-COCO dataset. The experimental results show that our GEVST model can obtain promising performance gains.
CR · May 17, 2021
On Decentralization of Bitcoin: An Asset Perspective
Ling Cheng, Feida Zhu, Huiwen Liu et al.
Since its advent in 2009, Bitcoin, a cryptography-enabled peer-to-peer digital payment system, has been gaining increasing attention from both academia and industry. An effort designed to overcome a cluster of bottlenecks inherent in existing centralized financial systems, Bitcoin has always been championed by the crypto community as an example of the spirit of decentralization. While the decentralized nature of Bitcoin's Proof-of-Work consensus algorithm has often been discussed in great detail, no systematic study has so far been conducted to quantitatively measure the degree of decentralization of Bitcoin from an asset perspective -- How decentralized is Bitcoin as a financial asset? We present in this paper the first systematic investigation of the degree of decentralization for Bitcoin based on its entire transaction history. We proposed both static and dynamic analysis of Bitcoin transaction network with quantifiable decentralization measures developed based on network analysis and market efficiency study. Case studies are also conducted to demonstrate the effectiveness of our proposed metrics.
NE · Jun 29, 2020
Solving MKP Applied to IoT in Smart Grid Using Meta-heuristics Algorithms: A Parallel Processing Perspective
Jandre Albertyn, Ling Cheng, Adnan M. Abu-Mahfouz
Increasing electricity prices in South Africa and the imminent threat of load shedding due to the overloaded power grid have led to a need for Demand Side Management (DSM) devices like smart grids. For smart grids to perform to their peak, their energy management controller (EMC) systems need to be optimized. Current solutions for DSM and optimization of the Multiple Knapsack Problem (MKP) are investigated in this paper to discover the current state of common DSM models. Solutions from other NP-Hard problems in the form of the iterative Discrete Flower Pollination Algorithm (iDFPA), as well as possible future scalability options in the form of optimization through parallelization, are also suggested.
LG · Mar 25, 2020
A multivariate water quality parameter prediction model using recurrent neural network
Dhruti Dheda, Ling Cheng
The global degradation of water resources is a matter of great concern, especially for the survival of humanity. The effective monitoring and management of existing water resources is necessary to achieve and maintain optimal water quality. The prediction of the quality of water resources will aid in the timely identification of possible problem areas and thus increase the efficiency of water management. The purpose of this research is to develop a water quality prediction model based on water quality parameters through the application of a specialised recurrent neural network (RNN), Long Short-Term Memory (LSTM), and the use of historical water quality data over several years. Both multivariate single-step and multiple-step LSTM models were developed, using a Rectified Linear Unit (ReLU) activation function and a Root Mean Square Propagation (RMSprop) optimiser. The single-step model attained an error of 0.01 mg/L, whilst the multiple-step model achieved a Root Mean Squared Error (RMSE) of 0.227 mg/L.
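For reference, the RMSE metric quoted above can be computed as in the following minimal sketch. The function name and the sample inputs are illustrative only and are not data from the paper.

```python
import math


def rmse(actual, predicted):
    """Root Mean Squared Error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Because the errors are squared before averaging, the result carries the same unit as the predicted quantity (mg/L in the abstract) and penalises large deviations more heavily than a mean absolute error would.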
CV · Mar 24, 2020
Surface Damage Detection Scheme using Convolutional Neural Network and Artificial Neural Network
Alice Yi Yang, Ling Cheng
Surface damage on concrete is important as the damage can affect the structural integrity of the structure. This paper proposes a two-step surface damage detection scheme using a Convolutional Neural Network (CNN) and an Artificial Neural Network (ANN). The CNN classifies given input images into two categories: positive and negative. The positive category is where surface damage is present within the image; otherwise the image is classified as negative. This is an image-based classification. The ANN accepts image inputs that have been classified as positive by the CNN. This reduces the number of images that are further processed by the ANN. The ANN performs feature-based classification, in which the features are extracted from the detected edges within the image. The edges are detected using Canny edge detection. A total of 19 features are extracted from the detected edges. These features are inputs into the ANN. The purpose of the ANN is to highlight only the positive damaged edges within the image. The CNN achieves an accuracy of 80.7% for image classification and the ANN achieves an accuracy of 98.1% for surface detection. The decreased accuracy in the CNN is due to false positive detection; however, false positives are tolerated whereas false negatives are not. The false negative detection rate for both the CNN and the ANN in the two-step scheme is 0%.
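The trade-off described above, tolerating false positives but not false negatives, can be made concrete with the two rates involved. This is a minimal sketch with hypothetical function names and toy labels (1 = damage present, 0 = no damage); it does not reproduce the paper's evaluation.

```python
def false_negative_rate(y_true, y_pred):
    """Fraction of truly positive samples that were predicted negative."""
    positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not positives:
        raise ValueError("no positive samples in y_true")
    return sum(1 for p in positives if p == 0) / len(positives)


def false_positive_rate(y_true, y_pred):
    """Fraction of truly negative samples that were predicted positive."""
    negatives = [p for t, p in zip(y_true, y_pred) if t == 0]
    if not negatives:
        raise ValueError("no negative samples in y_true")
    return sum(1 for p in negatives if p == 1) / len(negatives)
```

A first-stage classifier with a false negative rate of 0 passes every damaged image on to the second stage, so any accuracy it loses comes only from undamaged images being forwarded unnecessarily.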
NE · Feb 19, 2020
Optimal DG allocation and sizing in power system networks using swarm-based algorithms
Kayode Adetunji, Ivan Hofsajer, Ling Cheng
Distributed generation (DG) units are power generating plants that are very important to the architecture of present power system networks. The benefit of adding these DG units is to increase the power supply to a network. However, the installation of these DG units can have an adverse effect if they are not properly allocated and/or sized. Therefore, there is a need to optimally allocate and size them to avoid cases such as voltage instability and expensive investment costs. In this paper, two swarm-based meta-heuristic algorithms, particle swarm optimization (PSO) and whale optimization algorithm (WOA), were developed to solve optimal placement and sizing of DG units in the quest for transmission network planning. A supportive technique, loss sensitivity factors (LSF), was used to identify potential buses for optimal location of DG units. The feasibility of the algorithms was confirmed on two IEEE bus test systems (14- and 30-bus). Comparison results showed that both algorithms produce good solutions and they outperform each other in different metrics. The WOA's real power loss reductions considering techno-economic factors in the IEEE 14-bus and 30-bus test systems are 6.14 MW and 10.77 MW, compared to the PSO's 6.47 MW and 11.73 MW, respectively. The PSO has a more reduced total DG unit size in both bus systems with 133.45 MW and 82.44 MW, compared to the WOA's 152.21 MW and 82.44 MW, respectively. The paper unveils the strengths and weaknesses of the PSO and the WOA in the application of optimal sizing of DG units in transmission networks.
LG · Feb 7, 2020
Short sighted deep learning
Ellen de Mello Koch, Anita de Mello Koch, Nicholas Kastanos et al.
A theory explaining how deep learning works is yet to be developed. Previous work suggests that deep learning performs a coarse graining, similar in spirit to the renormalization group (RG). This idea has been explored in the setting of a local (nearest neighbor interactions) Ising spin lattice. We extend the discussion to the setting of a long range spin lattice. Markov Chain Monte Carlo (MCMC) simulations determine both the critical temperature and scaling dimensions of the system. The model is used to train both a single RBM (restricted Boltzmann machine) network, as well as a stacked RBM network. Following earlier Ising model studies, the trained weights of a single layer RBM network define a flow of lattice models. In contrast to results for nearest neighbor Ising, the RBM flow for the long ranged model does not converge to the correct values for the spin and energy scaling dimension. Further, correlation functions between visible and hidden nodes exhibit key differences between the stacked RBM and RG flows. The stacked RBM flow appears to move towards low temperatures whereas the RG flow moves towards high temperature. This again differs from results obtained for nearest neighbor Ising.
SY · Oct 16, 2019
Trends in the optimal location and sizing of electrical units in smart grids using meta-heuristic algorithms
Kayode Adetunji, Ivan Hofsajer, Ling Cheng
The development of smart grids has effectively transformed the traditional grid system. This promises numerous advantages for economic values and autonomous control of energy sources. In smart grids development, there are various objectives such as voltage stability, minimized power loss, minimized economic cost and voltage profile improvement. Thus, researchers have investigated several approaches based on meta-heuristic optimization algorithms for the optimal location and sizing of electrical units in a distribution system. Meta-heuristic algorithms have been applied to solve different problems in power systems and they have been successfully used in distribution systems. This paper presents a comprehensive review on existing methods for the optimal location and sizing of electrical units in distribution networks while considering the improvement of major objective functions. Techniques such as voltage stability index, power loss index, and loss sensitivity factors have been implemented alongside the meta-heuristic optimization algorithms to reduce the search space of solutions for objective functions. However, these techniques can cause a loss of optimality. Another perceived problem is the inappropriate handling of multiple objectives, which can also affect the optimality of results. Hence, a recent method such as Pareto fronts generation has been developed to produce non-dominating solutions. This review shows a need for more research on (i) the effective handling of multiple objective functions, (ii) more efficient meta-heuristic optimization algorithms and/or (iii) better supporting techniques.
CV · Sep 5, 2019
Stack-VS: Stacked Visual-Semantic Attention for Image Caption Generation
Wei Wei, Ling Cheng, Xianling Mao et al.
Recently, automatic image caption generation has been an important focus of work on the multimodal translation task. Existing approaches can be roughly categorized into two classes, top-down and bottom-up: the former transfers the image information (called the visual-level feature) directly into a caption, and the latter uses the extracted words (called semantic-level attributes) to generate a description. However, previous methods are typically either based on a one-stage decoder or utilize only part of the visual-level or semantic-level information for image caption generation. In this paper, we address the problem and propose an innovative multi-stage architecture (called Stack-VS) for rich fine-grained image caption generation, combining bottom-up and top-down attention models to effectively handle both visual-level and semantic-level information of an input image. Specifically, we also propose a novel well-designed stack decoder model, which is constituted by a sequence of decoder cells, each of which contains two LSTM layers that work interactively to re-optimize attention weights on both visual-level feature vectors and semantic-level attribute embeddings for generating a fine-grained image caption. Extensive experiments on the popular benchmark dataset MSCOCO show significant improvements on different evaluation metrics, i.e., the improvements on BLEU-4/CIDEr/SPICE scores are 0.372, 1.226 and 0.216, respectively, as compared to the state of the art.
LG · Jun 12, 2019
Is Deep Learning a Renormalization Group Flow?
Ellen de Mello Koch, Robert de Mello Koch, Ling Cheng
Although there has been a rapid development of practical applications, theoretical explanations of deep learning are in their infancy. Deep learning performs a sophisticated coarse graining. Since coarse graining is a key ingredient of the renormalization group (RG), RG may provide a useful theoretical framework directly relevant to deep learning. In this study we pursue this possibility. A statistical mechanics model for a magnet, the Ising model, is used to train an unsupervised restricted Boltzmann machine (RBM). The patterns generated by the trained RBM are compared to the configurations generated through an RG treatment of the Ising model. Although we are motivated by the connection between deep learning and RG flow, in this study we focus mainly on comparing a single layer of a deep network to a single step in the RG flow. We argue that correlation functions between hidden and visible neurons are capable of diagnosing RG-like coarse graining. Numerical experiments show the presence of RG-like patterns in correlators computed using the trained RBMs. The observables we consider are also able to exhibit important differences between RG and deep learning.
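The abstract does not say how the Ising configurations used for training are sampled. A common choice is Metropolis Monte Carlo, so the sketch below is an assumption rather than the authors' procedure: it uses a nearest-neighbour 2D lattice with periodic boundaries, coupling J = 1, and hypothetical function names.

```python
import math
import random


def metropolis_sweep(spins, beta, rng):
    """One Metropolis sweep (n*n single-spin updates) on an n x n lattice.

    Energy convention (assumed): E = -sum over neighbour pairs of s_i * s_j.
    """
    n = len(spins)
    for _ in range(n * n):
        i, j = rng.randrange(n), rng.randrange(n)
        s = spins[i][j]
        # Sum of the four nearest neighbours, with periodic boundaries.
        neighbours = (spins[(i + 1) % n][j] + spins[(i - 1) % n][j]
                      + spins[i][(j + 1) % n] + spins[i][(j - 1) % n])
        delta_e = 2 * s * neighbours  # energy change if this spin is flipped
        if delta_e <= 0 or rng.random() < math.exp(-beta * delta_e):
            spins[i][j] = -s


def magnetization(spins):
    """Mean spin value of the lattice, in [-1, 1]."""
    n = len(spins)
    return sum(sum(row) for row in spins) / (n * n)


rng = random.Random(0)
lattice = [[rng.choice((-1, 1)) for _ in range(8)] for _ in range(8)]
for _ in range(100):
    metropolis_sweep(lattice, beta=0.4, rng=rng)
```

Configurations collected after such sweeps, at a chosen inverse temperature `beta`, could then serve as binary training samples for an RBM once the spins are mapped from {-1, +1} to {0, 1}.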
CV · Feb 21, 2019
Long-Bone Fracture Detection using Artificial Neural Networks based on Contour Features of X-ray Images
Alice Yi Yang, Ling Cheng
This paper proposes two contour-based fracture detection schemes. The contour-based schemes build on the line-based fracture detection schemes proposed in arXiv:1902.07458. Existing Computer Aided Diagnosis (CAD) systems commonly employ Convolutional Neural Networks (CNN), although the cost of obtaining a high accuracy is the amount of training data required. The purpose of the proposed schemes is to obtain a high classification accuracy with a reduced amount of training data through the use of detected contours in X-ray images. There are two contour-based fracture detection schemes. The first is the Standard Contour Histogram Feature-Based (CHFB) scheme and the second is the improved CHFB scheme. The difference between the two schemes is the removal of the surrounding detected flesh contours from the leg region in the improved CHFB scheme. The flesh contours are automatically classified as non-fractures. The contours are further refined to give a precise representation of the image edge objects. A total of 19 features are extracted from each refined contour. 8 out of the 19 features are based on the number of occurrences of particular detected gradients in the contour. Moreover, the occurrence of the 0-degree gradient in the contours is employed for the separation of the knee, leg and foot regions. The features are a summary representation of the contour and are used as inputs into the Artificial Neural Network (ANN). Both the Standard CHFB and improved CHFB schemes are evaluated with the same experimental set-ups. The average system accuracy for the Standard CHFB scheme is 80.7%, whilst the improved CHFB scheme has an average accuracy of 82.98%. Additionally, the hierarchical clustering technique is adopted to highlight the fractured region within the X-ray image, using extracted 0-degree gradients from fractured contours.
CV · Feb 20, 2019
Long-Bone Fracture Detection using Artificial Neural Networks based on Line Features of X-ray Images
Alice Yi Yang, Ling Cheng
Two line-based fracture detection schemes are developed and discussed, namely Standard line-based fracture detection and Adaptive Differential Parameter Optimized (ADPO) line-based fracture detection. The purpose of the two line-based fracture detection schemes is to detect fractured lines from X-ray images, using features extracted from recognised patterns to differentiate fractured lines from non-fractured lines. The difference between the two schemes is the detection of detailed lines. The ADPO scheme optimizes the parameters of the Probabilistic Hough Transform, such that granule lines within the fractured regions are detected, whereas the Standard scheme is unable to detect them. The lines are detected using the Probabilistic Hough Function, in which the detected lines are a representation of the image edge objects. The lines are given in the form of points, (x,y), which include the starting and ending points. Based on the given line points, 13 features are extracted from each line, as a summary of line information. These features are used for fracture and non-fracture classification of the detected lines. The classification is carried out by the Artificial Neural Network (ANN). Two evaluations are employed to assess both the entirety of the system and the ANN. The Standard scheme is capable of achieving an average accuracy of 74.25%, whilst the ADPO scheme achieved an average accuracy of 74.4%. The ADPO scheme is preferred over the Standard scheme; however, it can be further improved with detected contours and their extracted features.
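To illustrate how a line given by its start and end points can be summarised as features, here is a minimal sketch. The three features shown (length, orientation, midpoint) are illustrative assumptions; the abstract does not list the paper's 13 features, and the function name is hypothetical.

```python
import math


def line_features(x1, y1, x2, y2):
    """Summarise a line segment given by its two endpoints.

    Returns illustrative features only, not the 13 features of the paper.
    """
    dx, dy = x2 - x1, y2 - y1
    return {
        "length": math.hypot(dx, dy),
        # Orientation folded into [0, 180) so direction of travel is ignored.
        "angle_deg": math.degrees(math.atan2(dy, dx)) % 180.0,
        "midpoint": ((x1 + x2) / 2, (y1 + y2) / 2),
    }
```

A feature vector of this kind, computed for each detected line, is the sort of input a small feed-forward classifier could use to separate fractured from non-fractured lines.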