Meng Chang Chen

10papers

36citations

Novelty39%

AI Score21

Ranked #192,129 of 205,806 authors (top 93%)#31,099 in CL (top 96%)

10 Papers

CLApr 25, 2022

Headline Diagnosis: Manipulation of Content Farm Headlines

Yu-Chieh Chen, Pei-Yu Huang, Chun Lin et al.

As technology grows faster, the news spreads through social media. In order to attract more readers and acquire additional profit, some news agencies reproduce massive news in a more appealing manner. Therefore, it is essential to accurately predict whether a news article is from official news agencies. This work develops a headline classification based on Convoluted Neural Network to determine credibility of a news article. The model primarily focuses on investigating key factors from headlines. These factors include word segmentation, part-of-speech tags, and sentiment features. With integrating these features into the proposed classification model, the demonstrated evaluation achieves 93.99% for accuracy.

CRDec 24, 2019

Integration of Static and Dynamic Analysis for Malware Family Classification with Composite Neural Network

Yao Saint Yen, Zhe Wei Chen, Ying Ren Guo et al.

Deep learning has been used in the research of malware analysis. Most classification methods use either static analysis features or dynamic analysis features for malware family classification, and rarely combine them as classification features and also no extra effort is spent integrating the two types of features. In this paper, we combine static and dynamic analysis features with deep neural networks for Windows malware classification. We develop several methods to generate static and dynamic analysis features to classify malware in different ways. Given these features, we conduct experiments with composite neural network, showing that the proposed approach performs best with an accuracy of 83.17% on a total of 80 malware families with 4519 malware samples. Additionally, we show that using integrated features for malware family classification outperforms using static features or dynamic features alone. We show how static and dynamic features complement each other for malware classification.

CRDec 16, 2019

Learning Malware Representation based on Execution Sequences

Yi-Ting Huang, Ting-Yi Chen, Yeali S. Sun et al.

Malware analysis has been extensively investigated as the number and types of malware has increased dramatically. However, most previous studies use end-to-end systems to detect whether a sample is malicious, or to identify its malware family. In this paper, we propose a neural network framework composed of an embedder, an encoder, and a filter to learn malware representations from characteristic execution sequences for malware family classification. The embedder uses BERT and Sent2Vec, state-of-the-art embedding modules, to capture relations within a single API call and among consecutive API calls in an execution trace. The encoder comprises gated recurrent units (GRU) to preserve the ordinal position of API calls and a self-attention mechanism for comparing intra-relations among different positions of API calls. The filter identifies representative API calls to build the malware representation. We conduct broad experiments to determine the influence of individual framework components. The results show that the proposed framework outperforms the baselines, and also demonstrates that considering Sent2Vec to learn complete API call embeddings and GRU to explicitly preserve ordinal information yields more information and thus significant improvements. Also, the proposed approach effectively classifies new malicious execution traces on the basis of similarities with previously collected families.

LGOct 22, 2019

Composite Neural Network: Theory and Application to PM2.5 Prediction

Ming-Chuan Yang, Meng Chang Chen

This work investigates the framework and performance issues of the composite neural network, which is composed of a collection of pre-trained and non-instantiated neural network models connected as a rooted directed acyclic graph for solving complicated applications. A pre-trained neural network model is generally well trained, targeted to approximate a specific function. Despite a general belief that a composite neural network may perform better than a single component, the overall performance characteristics are not clear. In this work, we construct the framework of a composite network, and prove that a composite neural network performs better than any of its pre-trained components with a high probability bound. In addition, if an extra pre-trained component is added to a composite network, with high probability, the overall performance will not be degraded. In the study, we explore a complicated application -- PM2.5 prediction -- to illustrate the correctness of the proposed composite network theory. In the empirical evaluations of PM2.5 prediction, the constructed composite neural network models support the proposed theory and perform better than other machine learning models, demonstrate the advantages of the proposed framework.

LGOct 18, 2019

Theoretical Investigation of Composite Neural Network

Ming-Chuan Yang, Meng Chang Chen

This work theoretically investigates the performance of a composite neural network. A composite neural network is a rooted directed acyclic graph combining a set of pre-trained and non-instantiated neural network models, where a pre-trained neural network model is well-crafted for a specific task and targeted to approximate a specific function with instantiated weights. The advantages of adopting such a pre-trained model in a composite neural network are two folds. One is to benefit from other's intelligence and diligence, and the other is saving the efforts in data preparation and resources and time in training. However, the overall performance of composite neural network is still not clear. In this work, we prove that a composite neural network, with high probability, performs better than any of its pre-trained components under certain assumptions. In addition, if an extra pre-trained component is added to a composite network, with high probability the overall performance will be improved. In the empirical evaluations, distinctively different applications support the above findings.

HCAug 29, 2018

Bringing personalized learning into computer-aided question generation

Yi-Ting Huang, Meng Chang Chen, Yeali S. Sun

This paper proposes a novel and statistical method of ability estimation based on acquisition distribution for a personalized computer aided question generation. This method captures the learning outcomes over time and provides a flexible measurement based on the acquisition distributions instead of precalibration. Compared to the previous studies, the proposed method is robust, especially when an ability of a student is unknown. The results from the empirical data show that the estimated abilities match the actual abilities of learners, and the pretest and post-test of the experimental group show significant improvement. These results suggest that this method can serves as the ability estimation for a personalized computer-aided testing environment.

CLAug 29, 2018

Development and Evaluation of a Personalized Computer-aided Question Generation for English Learners to Improve Proficiency and Correct Mistakes

Yi-Ting Huang, Meng Chang Chen, Yeali S. Sun

In the last several years, the field of computer assisted language learning has increasingly focused on computer aided question generation. However, this approach often provides test takers with an exhaustive amount of questions that are not designed for any specific testing purpose. In this work, we present a personalized computer aided question generation that generates multiple choice questions at various difficulty levels and types, including vocabulary, grammar and reading comprehension. In order to improve the weaknesses of test takers, it selects questions depending on an estimated proficiency level and unclear concepts behind incorrect responses. This results show that the students with the personalized automatic quiz generation corrected their mistakes more frequently than ones only with computer aided question generation. Moreover, students demonstrated the most progress between the pretest and post test and correctly answered more difficult questions. Finally, we investigated the personalizing strategy and found that a student could make a significant progress if the proposed system offered the vocabulary questions at the same level of his or her proficiency level, and if the grammar and reading comprehension questions were at a level lower than his or her proficiency level.

CLAug 29, 2018

Characterizing the Influence of Features on Reading Difficulty Estimation for Non-native Readers

Yi-Ting Huang, Meng Chang Chen, Yeali S. Sun

In recent years, the number of people studying English as a second language (ESL) has surpassed the number of native speakers. Recent work have demonstrated the success of providing personalized content based on reading difficulty, such as information retrieval and summarization. However, almost all prior studies of reading difficulty are designed for native speakers, rather than non-native readers. In this study, we investigate various features for ESL readers, by conducting a linear regression to estimate the reading level of English language sources. This estimation is based not only on the complexity of lexical and syntactic features, but also several novel concepts, including the age of word and grammar acquisition from several sources, word sense from WordNet, and the implicit relation between sentences. By employing Bayesian Information Criterion (BIC) to select the optimal model, we find that the combination of the number of words, the age of word acquisition and the height of the parsing tree generate better results than alternative competing models. Thus, our results show that proposed second language reading difficulty estimation outperforms other first language reading difficulty estimations.

LGAug 21, 2018

Wrapped Loss Function for Regularizing Nonconforming Residual Distributions

Chun Ting Liu, Ming Chuan Yang, Meng Chang Chen

Multi-output is essential in machine learning that it might suffer from nonconforming residual distributions, i.e., the multi-output residual distributions are not conforming to the expected distribution. In this paper, we propose "Wrapped Loss Function" to wrap the original loss function to alleviate the problem. This wrapped loss function acts just like the original loss function that its gradient can be used for backpropagation optimization. Empirical evaluations show wrapped loss function has advanced properties of faster convergence, better accuracy, and improving imbalanced data.

CRMay 4, 2017

Virtual Machine Introspection Based Malware Behavior Profiling and Family Grouping

Shun-Wen Hsiao, Yeali S. Sun, Meng Chang Chen

The proliferation of malwares have been attributed to the alternations of a handful of original malware source codes. The malwares alternated from the same origin share some intrinsic behaviors and form a malware family. Expediently, identifying its malware family when a malware is first seen on the Internet can provide useful clues to mitigate the threat. In this paper, a malware profiler (VMP) is proposed to profile the execution behaviors of a malware by leveraging virtual machine introspection (VMI) technique. The VMP inserts plug-ins inside the virtual machine monitor (VMM) to record the invoked API calls with their input parameters and return values as the profile of malware. In this paper, a popular similarity measurement Jaccard distance and a phylogenetic tree construction method are adopted to discover malware families. The studies of malware profiles show the malwares from a malware family are very similar to each others and distinct from other malware families as well as benign software. This paper also examines VMP against existing anti-malware detection engines and some well-known malware grouping methods to compare the goodness in their malware family constructions. A peer voting approach is proposed and the results show VMP is better than almost all of the compared anti-malware engines, and compatible with the fine tuned text-mining approach and high order N-gram approaches. We also establish a malware profiling website based on VMP for malware research.