SEAug 5, 2023Code
An Empirical Study of AI-based Smart Contract CreationRabimba Karanjai, Edward Li, Lei Xu et al.
The introduction of large language models (LLMs) like ChatGPT and Google Palm2 for smart contract generation seems to be the first well-established instance of an AI pair programmer. LLMs have access to a large number of open-source smart contracts, enabling them to utilize more extensive code in Solidity than other code generation tools. Although the initial and informal assessments of LLMs for smart contract generation are promising, a systematic evaluation is needed to explore the limits and benefits of these models. The main objective of this study is to assess the quality of generated code provided by LLMs for smart contracts. We also aim to evaluate the impact of the quality and variety of input parameters fed to LLMs. To achieve this aim, we created an experimental setup for evaluating the generated code in terms of validity, correctness, and efficiency. Our study finds crucial evidence of security bugs getting introduced in the generated smart contracts as well as the overall quality and correctness of the code getting impacted. However, we also identified the areas where it can be improved. The paper also proposes several potential research directions to improve the process, quality and safety of generated smart contract codes.
LGSep 20, 2022
Audit and Improve Robustness of Private Neural Networks on Encrypted DataJiaqi Xue, Lei Xu, Lin Chen et al.
Performing neural network inference on encrypted data without decryption is one popular method to enable privacy-preserving neural networks (PNet) as a service. Compared with regular neural networks deployed for machine-learning-as-a-service, PNet requires additional encoding, e.g., quantized-precision numbers, and polynomial activation. Encrypted input also introduces novel challenges such as adversarial robustness and security. To the best of our knowledge, we are the first to study questions including (i) Whether PNet is more robust against adversarial inputs than regular neural networks? (ii) How to design a robust PNet given the encrypted input without decryption? We propose PNet-Attack to generate black-box adversarial examples that can successfully attack PNet in both target and untarget manners. The attack results show that PNet robustness against adversarial inputs needs to be improved. This is not a trivial task because the PNet model owner does not have access to the plaintext of the input values, which prevents the application of existing detection and defense methods such as input tuning, model normalization, and adversarial training. To tackle this challenge, we propose a new fast and accurate noise insertion method, called RPNet, to design Robust and Private Neural Networks. Our comprehensive experiments show that PNet-Attack reduces at least $2.5\times$ queries than prior works. We theoretically analyze our RPNet methods and demonstrate that RPNet can decrease $\sim 91.88\%$ attack success rate.
SEJul 6, 2024
Harnessing the Power of LLMs: Automating Unit Test Generation for High-Performance ComputingRabimba Karanjai, Aftab Hussain, Md Rafiqul Islam Rabin et al.
Unit testing is crucial in software engineering for ensuring quality. However, it's not widely used in parallel and high-performance computing software, particularly scientific applications, due to their smaller, diverse user base and complex logic. These factors make unit testing challenging and expensive, as it requires specialized knowledge and existing automated tools are often ineffective. To address this, we propose an automated method for generating unit tests for such software, considering their unique features like complex logic and parallel processing. Recently, large language models (LLMs) have shown promise in coding and testing. We explored the capabilities of Davinci (text-davinci-002) and ChatGPT (gpt-3.5-turbo) in creating unit tests for C++ parallel programs. Our results show that LLMs can generate mostly correct and comprehensive unit tests, although they have some limitations, such as repetitive assertions and blank test cases.
CYNov 20, 2022
Detecting Conspiracy Theory Against COVID-19 VaccinesMd Hasibul Amin, Harika Madanu, Sahithi Lavu et al.
Since the beginning of the vaccination trial, social media has been flooded with anti-vaccination comments and conspiracy beliefs. As the day passes, the number of COVID- 19 cases increases, and online platforms and a few news portals entertain sharing different conspiracy theories. The most popular conspiracy belief was the link between the 5G network spreading COVID-19 and the Chinese government spreading the virus as a bioweapon, which initially created racial hatred. Although some disbelief has less impact on society, others create massive destruction. For example, the 5G conspiracy led to the burn of the 5G Tower, and belief in the Chinese bioweapon story promoted an attack on the Asian-Americans. Another popular conspiracy belief was that Bill Gates spread this Coronavirus disease (COVID-19) by launching a mass vaccination program to track everyone. This Conspiracy belief creates distrust issues among laypeople and creates vaccine hesitancy. This study aims to discover the conspiracy theory against the vaccine on social platforms. We performed a sentiment analysis on the 598 unique sample comments related to COVID-19 vaccines. We used two different models, BERT and Perspective API, to find out the sentiment and toxicity of the sentence toward the COVID-19 vaccine.
CVJul 4, 2022
RAF: Recursive Adversarial Attacks on Face Recognition Using Extremely Limited QueriesKeshav Kasichainula, Hadi Mansourifar, Weidong Shi
Recent successful adversarial attacks on face recognition show that, despite the remarkable progress of face recognition models, they are still far behind the human intelligence for perception and recognition. It reveals the vulnerability of deep convolutional neural networks (CNNs) as state-of-the-art building block for face recognition models against adversarial examples, which can cause certain consequences for secure systems. Gradient-based adversarial attacks are widely studied before and proved to be successful against face recognition models. However, finding the optimized perturbation per each face needs to submitting the significant number of queries to the target model. In this paper, we propose recursive adversarial attack on face recognition using automatic face warping which needs extremely limited number of queries to fool the target model. Instead of a random face warping procedure, the warping functions are applied on specific detected regions of face like eyebrows, nose, lips, etc. We evaluate the robustness of proposed method in the decision-based black-box attack setting, where the attackers have no access to the model parameters and gradients, but hard-label predictions and confidence scores are provided by the target model.
48.4AIApr 23
Rethinking Publication: A Certification Framework for AI-Enabled ResearchYang Lu, Rabimba Karanjai, Lei Xu et al.
AI research pipelines now produce a growing share of publishable academic output, including work that meets existing peer-review standards for quality and novelty. Yet the publication system was built on the assumption of universal human authorship and lacks a principled way to evaluate knowledge produced through automated pipelines. This paper proposes a two-layer certification framework that separates knowledge quality assessment from grading of human contribution, allowing publication systems to handle pipeline-generated work consistently and transparently without creating new institutions. The paper uses normative-conceptual analysis, framework design under four explicit constraints, and dry-run validation on two representative submission cases spanning key attribution scenarios. The framework grades contributions as Category A (pipeline-reachable), Category B (requiring human direction at identifiable stages), and Category C (beyond current pipeline reach at the formulation stage). It also introduces benchmark slots for fully disclosed automated research as both a transparent publication track and a calibration instrument for reviewer judgment. Contribution grading is contemporaneous, based on pipeline capability at the time of submission. Dry-run validation shows that the framework can certify knowledge appropriately while tolerating irreducible attribution uncertainty. The paper argues that publication has always certified both that knowledge is valid and that a human made it. AI pipelines separate these functions for the first time. The framework is implementable within existing editorial infrastructure and grounds recognition of frontier human contribution in epistemic achievement rather than unverifiable claims of human origin.
DCNov 13, 2025
HPCAgentTester: A Multi-Agent LLM Approach for Enhanced HPC Unit Test GenerationRabimba Karanjai, Lei Xu, Weidong Shi
Unit testing in High-Performance Computing (HPC) is critical but challenged by parallelism, complex algorithms, and diverse hardware. Traditional methods often fail to address non-deterministic behavior and synchronization issues in HPC applications. This paper introduces HPCAgentTester, a novel multi-agent Large Language Model (LLM) framework designed to automate and enhance unit test generation for HPC software utilizing OpenMP and MPI. HPCAgentTester employs a unique collaborative workflow where specialized LLM agents (Recipe Agent and Test Agent) iteratively generate and refine test cases through a critique loop. This architecture enables the generation of context-aware unit tests that specifically target parallel execution constructs, complex communication patterns, and hierarchical parallelism. We demonstrate HPCAgentTester's ability to produce compilable and functionally correct tests for OpenMP and MPI primitives, effectively identifying subtle bugs that are often missed by conventional techniques. Our evaluation shows that HPCAgentTester significantly improves test compilation rates and correctness compared to standalone LLMs, offering a more robust and scalable solution for ensuring the reliability of parallel software systems.
SEMar 13, 2024
Teaching Machines to Code: Smart Contract Translation with LLMsRabimba Karanjai, Lei Xu, Weidong Shi
The advent of large language models (LLMs) has marked a significant milestone in the realm of artificial intelligence, with their capabilities often matching or surpassing human expertise in various domains. Among these achievements, their adeptness in translation tasks stands out, closely mimicking the intricate and preliminary processes undertaken by human translators to ensure the fidelity and quality of the translated content. Despite the advancements in utilizing LLMs for translating programming code across different languages, the domain of smart contract translation, particularly into languages not previously encountered by the LLM, remains largely unexplored. In our research, we present a pioneering approach, SolMover, which harnesses the synergy of two distinct LLMs within a unified framework. This framework is designed to grasp coding principles and apply this understanding to the translation of code into an unfamiliar language. Our study delves into the capacity of LLMs to mimic human learning processes, offering an in-depth evaluation of our methodology for converting smart contracts written in Solidity to Move, a language with limited resources. The framework employs one LLM to decipher coding conventions for the new language, creating a blueprint for the second LLM, which, lacking planning abilities, possesses coding expertise. The empirical evidence from our experiments suggests that SolMover substantially enhances performance compared to gpt-3.5-turbo-1106, and achieves superior results over competitors such as Palm2 and Mixtral-8x7B-Instruct. Additionally, our analysis highlights the efficacy of our bug mitigation strategy in elevating code quality across all models, even outside the SolMover framework.
CLMar 31, 2025
Synthesizing Public Opinions with LLMs: Role Creation, Impacts, and the Future to eDemorcacyRabimba Karanjai, Boris Shor, Amanda Austin et al.
This paper investigates the use of Large Language Models (LLMs) to synthesize public opinion data, addressing challenges in traditional survey methods like declining response rates and non-response bias. We introduce a novel technique: role creation based on knowledge injection, a form of in-context learning that leverages RAG and specified personality profiles from the HEXACO model and demographic information, and uses that for dynamically generated prompts. This method allows LLMs to simulate diverse opinions more accurately than existing prompt engineering approaches. We compare our results with pre-trained models with standard few-shot prompts. Experiments using questions from the Cooperative Election Study (CES) demonstrate that our role-creation approach significantly improves the alignment of LLM-generated opinions with real-world human survey responses, increasing answer adherence. In addition, we discuss challenges, limitations and future research directions.
CRFeb 22, 2025
Securing Smart Contract Languages with a Unified Agentic Framework for Vulnerability Repair in Solidity and MoveRabimba Karanjai, Lei Xu, Weidong Shi
The rapid growth of the blockchain ecosystem and the increasing value locked in smart contracts necessitate robust security measures. While languages like Solidity and Move aim to improve smart contract security, vulnerabilities persist. This paper presents Smartify, a novel multi-agent framework leveraging Large Language Models (LLMs) to automatically detect and repair vulnerabilities in Solidity and Move smart contracts. Unlike traditional methods that rely solely on vast pre-training datasets, Smartify employs a team of specialized agents working on different specially fine-tuned LLMs to analyze code based on underlying programming concepts and language-specific security principles. We evaluated Smartify on a dataset for Solidity and a curated dataset for Move, demonstrating its effectiveness in fixing a wide range of vulnerabilities. Our results show that Smartify (Gemma2+codegemma) achieves state-of-the-art performance, surpassing existing LLMs and enhancing general-purpose models' capabilities, such as Llama 3.1. Notably, Smartify can incorporate language-specific knowledge, such as the nuances of Move, without requiring massive language-specific pre-training datasets. This work offers a detailed analysis of various LLMs' performance on smart contract repair, highlighting the strengths of our multi-agent approach and providing a blueprint for developing more secure and reliable decentralized applications in the growing blockchain landscape. We also provide a detailed recipe for extending this to other similar use cases.
SEDec 16, 2024
LogBabylon: A Unified Framework for Cross-Log File Integration and AnalysisRabimba Karanjai, Yang Lu, Dana Alsagheer et al.
Logs are critical resources that record events, activities, or messages produced by software applications, operating systems, servers, and network devices. However, consolidating the heterogeneous logs and cross-referencing them is challenging and complicated. Manually analyzing the log data is time-consuming and prone to errors. LogBabylon is a centralized log data consolidating solution that leverages Large Language Models (LLMs) integrated with Retrieval-Augmented Generation (RAG) technology. LogBabylon interprets the log data in a human-readable way and adds insight analysis of the system performance and anomaly alerts. It provides a paramount view of the system landscape, enabling proactive management and rapid incident response. LogBabylon consolidates diverse log sources and enhances the extracted information's accuracy and relevancy. This facilitates a deeper understanding of log data, supporting more effective decision-making and operational efficiency. Furthermore, LogBabylon streamlines the log analysis process, significantly reducing the time and effort required to interpret complex datasets. Its capabilities extend to generating context-aware insights, offering an invaluable tool for continuous monitoring, performance optimization, and security assurance in dynamic computing environments.
CYApr 17, 2025
Governance Challenges in Reinforcement Learning from Human Feedback: Evaluator Rationality and Reinforcement StabilityDana Alsagheer, Abdulrahman Kamal, Mohammad Kamal et al.
Reinforcement Learning from Human Feedback (RLHF) is central in aligning large language models (LLMs) with human values and expectations. However, the process remains susceptible to governance challenges, including evaluator bias, inconsistency, and the unreliability of feedback. This study examines how the cognitive capacity of evaluators, specifically their level of rationality, affects the stability of reinforcement signals. A controlled experiment comparing high-rationality and low-rationality participants reveals that evaluators with higher rationality scores produce significantly more consistent and expert-aligned feedback. In contrast, lower-rationality participants demonstrate considerable variability in their reinforcement decisions ($p < 0.01$). To address these challenges and improve RLHF governance, we recommend implementing evaluator pre-screening, systematic auditing of feedback consistency, and reliability-weighted reinforcement aggregation. These measures enhance the fairness, transparency, and robustness of AI alignment pipelines.
AIMar 14, 2025
Collaboration is all you need: LLM Assisted Safe Code TranslationRabimba Karanjai, Sam Blackshear, Lei Xu et al.
This paper introduces UniTranslator, a visionary framework that re-imagines code translation as a collaborative endeavor among multiple, compact LLMs. By orchestrating the interaction of specialized agents, each focused on different aspects of the translation process and grounded in a deep understanding of programming concepts, UniTranslator achieves a level of accuracy and efficiency that rivals larger, monolithic models. Our preliminary evaluation demonstrates the potential of UniTranslator to overcome the limitations of existing approaches and unlock the power of smaller LLMs for complex code translation tasks. We explore the effectiveness of this dynamic multi-agent paradigm in handling diverse language pairs, including low-resource languages, and in mitigating common issues such as code artifacts and hallucinations through the use of Natural Language Inference (NLI) grounding and iterative feedback mechanisms
AIOct 14, 2025
Evaluating the Quality of Randomness and Entropy in Tasks Supported by Large Language ModelsRabimba Karanjai, Yang Lu, Ranjith Chodavarapu et al.
The rapid advancement of large language model (LLM) technology has led to diverse applications, many of which inherently require randomness, such as stochastic decision-making, gaming, scheduling, AI agents, and cryptography-related tasks. However, the capabilities of LLMs in handling randomness, particularly in generating and utilizing random numbers effectively, remain unclear. This paper investigates the capacity of LLMs for handling tasks that involve randomness through a series of experiments. We designed a set of experiments that consider various factors that can influence an LLM's performance in tasks involving randomness, such as accessibility to external tools, types of tasks, model states (fresh vs. non-fresh), and prompting strategies. The experiments cover a range of tasks, including generating random numbers, generating random strings such as passwords, shuffling items, and evaluating the quality of randomness using entropy and the NIST randomness test-suite. Our findings reveal that while LLMs can generate outputs that exhibit some degree of randomness, their performance is inconsistent and often deviates significantly from the expected behavior. The analysis of the experimental results highlights key limitations and areas where improvement is needed for the LLMs to effectively handle tasks involving randomness
CLJun 16, 2025
From General Reasoning to Domain Expertise: Uncovering the Limits of Generalization in Large Language ModelsDana Alsagheer, Yang Lu, Abdulrahman Kamal et al.
Recent advancements in Large Language Models (LLMs) have demonstrated remarkable capabilities in various domains. However, effective decision-making relies heavily on strong reasoning abilities. Reasoning is the foundation for decision-making, providing the analytical and logical framework to make sound choices. Reasoning involves analyzing information, drawing inferences, and reaching conclusions based on logic or evidence. Decision-making builds on this foundation by applying the insights from reasoning to select the best course of action among alternatives. Together, these processes create a continuous cycle of thought and action aimed at achieving goals effectively. As AI technology evolves, there is a growing trend to train LLMs to excel in general reasoning. This study explores how the general reasoning capabilities of LLMs connect to their performance in domain-specific reasoning tasks.
LGMar 16, 2024
LookALike: Human Mimicry based collaborative decision makingRabimba Karanjai, Weidong Shi
Artificial General Intelligence falls short when communicating role specific nuances to other systems. This is more pronounced when building autonomous LLM agents capable and designed to communicate with each other for real world problem solving. Humans can communicate context and domain specific nuances along with knowledge, and that has led to refinement of skills. In this work we propose and evaluate a novel method that leads to knowledge distillation among LLM agents leading to realtime human role play preserving unique contexts without relying on any stored data or pretraining. We also evaluate how our system performs better in simulated real world tasks compared to state of the art.
CLFeb 21, 2022
Counter Hate Speech in Social Media: A SurveyDana Alsagheer, Hadi Mansourifar, Weidong Shi
With the high prevalence of offensive language against minorities in social media, counter-hate speeches (CHS) generation is considered an automatic way of tackling this challenge. The CHS is supposed to appear as a third voice to educate people and keep the social [red lines bold] without limiting the principles of freedom of speech. In this paper, we review the most important research in the past and present with a main focus on methodologies, collected datasets and statistical analysis CHS's impact on social media. The CHS generation is based on the optimistic assumption that any attempt to intervene the hate speech in social media can play a positive role in this context. Beyond that, previous works ignored the investigation of the sequence of comments before and after the CHS. However, the positive impact is not guaranteed, as shown in some previous works. To the best of our knowledge, no attempt has been made to survey the related work to compare the past research in terms of CHS's impact on social media. We take the first step in this direction by providing a comprehensive review on related works and categorizing them based on different factors including impact, methodology, data source, etc.
LGJun 24, 2021
Hate Speech Detection in ClubhouseHadi Mansourifar, Dana Alsagheer, Reza Fathi et al.
With the rise of voice chat rooms, a gigantic resource of data can be exposed to the research community for natural language processing tasks. Moderators in voice chat rooms actively monitor the discussions and remove the participants with offensive language. However, it makes the hate speech detection even more difficult since some participants try to find creative ways to articulate hate speech. This makes the hate speech detection challenging in new social media like Clubhouse. To the best of our knowledge all the hate speech datasets have been collected from text resources like Twitter. In this paper, we take the first step to collect a significant dataset from Clubhouse as the rising star in social media industry. We analyze the collected instances from statistical point of view using the Google Perspective Scores. Our experiments show that, the Perspective Scores can outperform Bag of Words and Word2Vec as high level text features.
CLJun 22, 2021
Statistical Analysis of Perspective Scores on Hate Speech DetectionHadi Mansourifar, Dana Alsagheer, Weidong Shi et al.
Hate speech detection has become a hot topic in recent years due to the exponential growth of offensive language in social media. It has proven that, state-of-the-art hate speech classifiers are efficient only when tested on the data with the same feature distribution as training data. As a consequence, model architecture plays the second role to improve the current results. In such a diverse data distribution relying on low level features is the main cause of deficiency due to natural bias in data. That's why we need to use high level features to avoid a biased judgement. In this paper, we statistically analyze the Perspective Scores and their impact on hate speech detection. We show that, different hate speech datasets are very similar when it comes to extract their Perspective Scores. Eventually, we prove that, over-sampling the Perspective Scores of a hate speech dataset can significantly improve the generalization performance when it comes to be tested on other hate speech datasets.
CVOct 31, 2020
ProxylessKD: Direct Knowledge Distillation with Inherited Classifier for Face RecognitionWeidong Shi, Guanghui Ren, Yunpeng Chen et al.
Knowledge Distillation (KD) refers to transferring knowledge from a large model to a smaller one, which is widely used to enhance model performance in machine learning. It tries to align embedding spaces generated from the teacher and the student model (i.e. to make images corresponding to the same semantics share the same embedding across different models). In this work, we focus on its application in face recognition. We observe that existing knowledge distillation models optimize the proxy tasks that force the student to mimic the teacher's behavior, instead of directly optimizing the face recognition accuracy. Consequently, the obtained student models are not guaranteed to be optimal on the target task or able to benefit from advanced constraints, such as large margin constraints (e.g. margin-based softmax). We then propose a novel method named ProxylessKD that directly optimizes face recognition accuracy by inheriting the teacher's classifier as the student's classifier to guide the student to learn discriminative embeddings in the teacher's embedding space. The proposed ProxylessKD is very easy to implement and sufficiently generic to be extended to other tasks beyond face recognition. We conduct extensive experiments on standard face recognition benchmarks, and the results demonstrate that ProxylessKD achieves superior performance over existing knowledge distillation methods.
LGAug 23, 2020
Towards Stable Imbalanced Data Classification via Virtual Big Data ProjectionHadi Mansourifar, Weidong Shi
Virtual Big Data (VBD) proved to be effective to alleviate mode collapse and vanishing generator gradient as two major problems of Generative Adversarial Neural Networks (GANs) very recently. In this paper, we investigate the capability of VBD to address two other major challenges in Machine Learning including deep autoencoder training and imbalanced data classification. First, we prove that, VBD can significantly decrease the validation loss of autoencoders via providing them a huge diversified training data which is the key to reach better generalization to minimize the over-fitting problem. Second, we use the VBD to propose the first projection-based method called cross-concatenation to balance the skewed class distributions without over-sampling. We prove that, cross-concatenation can solve uncertainty problem of data driven methods for imbalanced classification.
CVAug 23, 2020
Vulnerability of Face Recognition Systems Against Composite Face Reconstruction AttackHadi Mansourifar, Weidong Shi
Rounding confidence score is considered trivial but a simple and effective countermeasure to stop gradient descent based image reconstruction attacks. However, its capability in the face of more sophisticated reconstruction attacks is an uninvestigated research area. In this paper, we prove that, the face reconstruction attacks based on composite faces can reveal the inefficiency of rounding policy as countermeasure. We assume that, the attacker takes advantage of face composite parts which helps the attacker to get access to the most important features of the face or decompose it to the independent segments. Afterwards, decomposed segments are exploited as search parameters to create a search path to reconstruct optimal face. Face composition parts enable the attacker to violate the privacy of face recognition models even with a blind search. However, we assume that, the attacker may take advantage of random search to reconstruct the target face faster. The algorithm is started with random composition of face parts as initial face and confidence score is considered as fitness value. Our experiments show that, since the rounding policy as countermeasure can't stop the random search process, current face recognition systems are extremely vulnerable against such sophisticated attacks. To address this problem, we successfully test Face Detection Score Filtering (FDSF) as a countermeasure to protect the privacy of training data against proposed attack.
CVMar 27, 2020
One-Shot GAN Generated Fake Face DetectionHadi Mansourifar, Weidong Shi
Fake face detection is a significant challenge for intelligent systems as generative models become more powerful every single day. As the quality of fake faces increases, the trained models become more and more inefficient to detect the novel fake faces, since the corresponding training data is considered outdated. In this case, robust One-Shot learning methods is more compatible with the requirements of changeable training data. In this paper, we propose a universal One-Shot GAN generated fake face detection method which can be used in significantly different areas of anomaly detection. The proposed method is based on extracting out-of-context objects from faces via scene understanding models. To do so, we use state of the art scene understanding and object detection methods as a pre-processing tool to detect the weird objects in the face. Second, we create a bag of words given all the detected out-of-context objects per all training data. This way, we transform each image into a sparse vector where each feature represents the confidence score related to each detected object in the image. Our experiments show that, we can discriminate fake faces from real ones in terms of out-of-context features. It means that, different sets of objects are detected in fake faces comparing to real ones when we analyze them with scene understanding and object detection models. We prove that, the proposed method can outperform previous methods based on our experiments on Style-GAN generated fake faces.
LGMar 22, 2020
Deep Synthetic Minority Over-Sampling TechniqueHadi Mansourifar, Weidong Shi
Synthetic Minority Over-sampling Technique (SMOTE) is the most popular over-sampling method. However, its random nature makes the synthesized data and even imbalanced classification results unstable. It means that in case of running SMOTE n different times, n different synthesized in-stances are obtained with n different classification results. To address this problem, we adapt the SMOTE idea in deep learning architecture. In this method, a deep neural network regression model is used to train the inputs and outputs of traditional SMOTE. Inputs of the proposed deep regression model are two randomly chosen data points which are concatenated to form a double size vector. The outputs of this model are corresponding randomly interpolated data points between two randomly chosen vectors with original dimension. The experimental results show that, Deep SMOTE can outperform traditional SMOTE in terms of precision, F1 score and Area Under Curve (AUC) in majority of test cases.
AIMar 14, 2020
Hybrid Cryptocurrency Pump and Dump DetectionHadi Mansourifar, Lin Chen, Weidong Shi
Increasingly growing Cryptocurrency markets have become a hive for scammers to run pump and dump schemes which is considered as an anomalous activity in exchange markets. Anomaly detection in time series is challenging since existing methods are not sufficient to detect the anomalies in all contexts. In this paper, we propose a novel hybrid pump and dump detection method based on distance and density metrics. First, we propose a novel automatic thresh-old setting method for distance-based anomaly detection. Second, we propose a novel metric called density score for density-based anomaly detection. Finally, we exploit the combination of density and distance metrics successfully as a hybrid approach. Our experiments show that, the proposed hybrid approach is reliable to detect the majority of alleged P & D activities in top ranked exchange pairs by outperforming both density-based and distance-based methods.
CRJan 22, 2020
Nonlinear Blockchain Scalability: a Game-Theoretic PerspectiveLin Chen, Lei Xu, Zhimin Gao et al.
Recent advances in the blockchain research have been made in two important directions. One is refined resilience analysis utilizing game theory to study the consequences of selfish behaviors of users (miners), and the other is the extension from a linear (chain) structure to a non-linear (graphical) structure for performance improvements, such as IOTA and Graphcoin. The first question that comes to people's minds is what improvements that a blockchain system would see by leveraging these new advances. In this paper, we consider three major metrics for a blockchain system: full verification, scalability, and finality-duration. We { establish a formal framework and} prove that no blockchain system can achieve full verification, high scalability, and low finality-duration simultaneously. We observe that classical blockchain systems like Bitcoin achieves full verification and low finality-duration, Harmony and Ethereum 2.0 achieve low finality-duration and high scalability. As a complementary, we design a non-linear blockchain system that achieves full verification and scalability. We also establish, for the first time, the trade-off between scalability and finality-duration.
CRSep 14, 2019
Private and Atomic Exchange of Assets over Zero Knowledge Based Payment LedgerZhimin Gao, Lei Xu, Keshav Kasichainula et al.
Bitcoin brings a new type of digital currency that does not rely on a central system to maintain transactions. By benefiting from the concept of decentralized ledger, users who do not know or trust each other can still conduct transactions in a peer-to-peer manner. Inspired by Bitcoin, other cryptocurrencies were invented in recent years such as Ethereum, Dash, Zcash, Monero, Grin, etc. Some of these focus on enhancing privacy for instance crypto note or systems that apply the similar concept of encrypted notes used for transactions to enhance privacy (e.g., Zcash, Monero). However, there are few mechanisms to support the exchange of privacy-enhanced notes or assets on the chain, and at the same time preserving the privacy of the exchange operations. Existing approaches for fair exchanges of assets with privacy mostly rely on off-chain/side-chain, escrow or centralized services. Thus, we propose a solution that supports oblivious and privacy-protected fair exchange of crypto notes or privacy enhanced crypto assets. The technology is demonstrated by extending zero-knowledge based crypto notes. To address "privacy" and "multi-currency", we build a new zero-knowledge proving system and extend note format with new property to represent various types of tokenized assets or cryptocurrencies. By extending the payment protocol, exchange operations are realized through privacy enhanced transactions (e.g., shielded transactions). Based on the possible scenarios during the exchange operation, we add new constraints and conditions to the zero-knowledge proving system used for validating transactions publicly.
CRNov 12, 2018
Efficient Public Blockchain Client for Lightweight UsersLei Xu, Lin Chen, Zhimin Gao et al.
Public blockchains provide a decentralized method for storing transaction data and have many applications in different sectors. In order for users to track transactions, a simple method is to let them keep a local copy of the entire public ledger. Since the size of the ledger keeps growing, this method becomes increasingly less practical, especially for lightweight users such as IoT devices and smartphones. In order to cope with the problem, several solutions have been proposed to reduce the storage burden. However, existing solutions either achieve a limited storage reduction (e.g., simple payment verification), or rely on some strong security assumption (e.g., the use of trusted server). In this paper, we propose a new approach to solving the problem. Specifically, we propose an \underline{e}fficient verification protocol for \underline{p}ublic \underline{b}lock\underline{c}hains, or EPBC for short. EPBC is particularly suitable for lightweight users, who only need to store a small amount of data that is {\it independent of} the size of the blockchain. We analyze EPBC's performance and security, and discuss its integration with existing public ledger systems. Experimental results confirm that EPBC is practical for lightweight users.
DSNov 7, 2018
Election with Bribed Voter Uncertainty: Hardness and Approximation AlgorithmLin Chen, Lei Xu, Shouhuai Xu et al.
Bribery in election (or computational social choice in general) is an important problem that has received a considerable amount of attention. In the classic bribery problem, the briber (or attacker) bribes some voters in attempting to make the briber's designated candidate win an election. In this paper, we introduce a novel variant of the bribery problem, "Election with Bribed Voter Uncertainty" or BVU for short, accommodating the uncertainty that the vote of a bribed voter may or may not be counted. This uncertainty occurs either because a bribed voter may not cast its vote in fear of being caught, or because a bribed voter is indeed caught and therefore its vote is discarded. As a first step towards ultimately understanding and addressing this important problem, we show that it does not admit any multiplicative $O(1)$-approximation algorithm modulo standard complexity assumptions. We further show that there is an approximation algorithm that returns a solution with an additive-$ε$ error in FPT time for any fixed $ε$.
LGNov 5, 2018
Toward Efficient Breast Cancer Diagnosis and Survival Prediction Using L-PerceptronHadi Mansourifar, Weidong Shi
Breast cancer is the most frequently reported cancer type among the women around the globe and beyond that it has the second highest female fatality rate among all cancer types. Despite all the progresses made in prevention and early intervention, early prognosis and survival prediction rates are still unsatisfactory. In this paper, we propose a novel type of perceptron called L-Perceptron which outperforms all the previous supervised learning methods by reaching 97.42 \% and 98.73 \% in terms of accuracy and sensitivity, respectively in Wisconsin Breast Cancer dataset. Experimental results on Haberman's Breast Cancer Survival dataset, show the superiority of proposed method by reaching 75.18 \% and 83.86 \% in terms of accuracy and F1 score, respectively. The results are the best reported ones obtained in 10-fold cross validation in absence of any preprocessing or feature selection.