Tran Phuong Thao

CL
7 papers
1,014 citations
Novelty 42%
AI Score 25

7 Papers

HC Sep 22, 2020
Influences of Temporal Factors on GPS-based Human Mobility Lifestyle

Tran Phuong Thao

Analysis of human mobility from GPS trajectories has become crucial in many areas, such as policy planning for urban citizens, location-based service recommendation and prediction, and especially mitigating the spread of biological and mobile viruses. In this paper, we propose a method to find temporal factors affecting the human mobility lifestyle. We collected GPS data from 100 smartphone users in Japan and designed a model that consists of 13 temporal patterns. We then applied multiple linear regression and found that people tend to keep their mobility habits on Thursdays and on days in the second week of a month, but tend to lose their habits on Fridays. We also explain some reasons behind these findings.
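As a rough illustration of regressing a mobility measure on temporal indicators, the sketch below fits ordinary least squares in plain Python. The abstract does not list the 13 temporal patterns, so the three indicators used here (Thursday, Friday, and "second week" taken as days 8 to 14) and the synthetic target are assumptions made only for the example.

```python
from datetime import date

def temporal_features(d):
    """Hypothetical indicators: intercept, Thursday, Friday, second week (days 8-14)."""
    return [
        1.0,
        1.0 if d.weekday() == 3 else 0.0,   # Monday is 0, so Thursday is 3
        1.0 if d.weekday() == 4 else 0.0,
        1.0 if 8 <= d.day <= 14 else 0.0,   # assumed definition of "second week"
    ]

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_ols(xs, ys):
    """Multiple linear regression via the normal equations (X^T X) w = X^T y."""
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(k)]
    return solve(xtx, xty)

# Synthetic, noise-free data: not taken from the paper.
true_coef = [0.5, 0.2, -0.3, 0.1]
days = [date(2020, 9, d) for d in range(1, 31)]
xs = [temporal_features(d) for d in days]
ys = [sum(c * v for c, v in zip(true_coef, x)) for x in xs]
coef = fit_ols(xs, ys)
```

With real data, a positive coefficient on an indicator would suggest that habits are kept more on those days, and a negative one that they are lost.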

CR Sep 17, 2020
Location-based Behavioral Authentication Using GPS Distance Coherence

Tran Phuong Thao

Most current user authentication systems are based on PIN codes, passwords, or biometric traits, which can have limitations in usage and security. Lifestyle authentication has become a new research approach. A promising idea is to use location history, since it is relatively unique: even when people live in the same area or travel occasionally, an individual's location history varies little from day to day. For Global Positioning System (GPS) data, previous work used the longitude, latitude, and timestamp as classification features. In this paper, we investigate a new approach utilizing distance coherence, which can be extracted from the GPS data itself without requiring other information. We applied three ensemble classification algorithms (RandomForest, ExtraTrees, and Bagging), and the experimental results showed that the approach achieves accuracies of 99.42%, 99.12%, and 99.25%, respectively.
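The abstract does not define distance coherence precisely, so the following is only a minimal sketch of the first step such a feature would plausibly need: deriving distances between consecutive GPS fixes from latitude and longitude alone. The haversine formula and the mean Earth radius of 6371 km are standard; the function names and the sample track are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def step_distances(track):
    """Distances between consecutive (lat, lon) fixes of one trajectory."""
    return [haversine_km(*p, *q) for p, q in zip(track, track[1:])]

# Hypothetical three-point track, coordinates in degrees.
track = [(35.68, 139.76), (35.69, 139.70), (35.66, 139.70)]
steps = step_distances(track)
```

Per-user statistics over such step distances could then be fed to an ensemble classifier; which statistics the paper actually uses is not stated here.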

CR Sep 17, 2020
Improving Homograph Attack Classification

Tran Phuong Thao

A visual homograph attack is a way for an attacker to deceive web users about which domain they are visiting by exploiting forged domains that look similar to genuine ones. T. Thao et al. (IFIP SEC'19) proposed a homograph classification method that applies conventional supervised learning algorithms to features extracted from a single-character-based Structural Similarity Index (SSIM). This paper aims to improve the classification accuracy by combining their SSIM features with 199 features extracted from an N-gram model and applying advanced ensemble learning algorithms. The experimental results showed that our proposed method improves accuracy by up to 1.81% and reduces the false-positive rate by 2.15%. Furthermore, existing work applied machine learning to some features without being able to explain why doing so improves accuracy. Even though accuracy can be improved, understanding the ground truth behind the improvement is also crucial. Therefore, in this paper, we conducted an empirical error analysis and obtained several findings that explain our proposed approach.
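To make the N-gram side of the feature set concrete, here is a small sketch that extracts character n-gram counts from a domain name. The 199 features used in the paper are not specified in the abstract, so the choice of character bigrams and the helper names are assumptions for illustration only.

```python
from collections import Counter

def char_ngrams(text, n):
    """All contiguous character n-grams of text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def ngram_counts(domain, n=2):
    """Count character n-grams of a lowercased domain label."""
    return Counter(char_ngrams(domain.lower(), n))

# A genuine label and a look-alike that swaps 'l' for the digit '1'.
genuine = ngram_counts("apple")
forged = ngram_counts("app1e")
shared = sum((genuine & forged).values())  # bigrams common to both labels
```

Counts like these could be concatenated with image-similarity features such as SSIM before training a classifier.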

CL Dec 19, 2019
Identifying Adversarial Sentences by Analyzing Text Complexity

Hoang-Quoc Nguyen-Son, Tran Phuong Thao, Seira Hidano et al.

Attackers create adversarial text to deceive both human perception and current AI systems for malicious purposes such as spam product reviews and fake political posts. To mitigate this risk, we investigate the differences between adversarial and original text. We demonstrate that text written by a human is more coherent and fluent. Moreover, a human can express ideas through flexible text with modern words, whereas a machine focuses on optimizing the generated text with simple and common words. We also propose a method to identify adversarial text by extracting features related to these findings. The proposed method achieves 82.0% accuracy and an 18.4% equal error rate, which is better than the existing methods, whose best accuracy is 77.0% with a corresponding error rate of 22.8%.
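The equal error rate reported above is the operating point at which the false-positive rate and the false-negative rate coincide. The sketch below approximates it by scanning the observed scores as thresholds; the scores and labels are made up for the example, and the convention that a higher score means "more likely adversarial" is an assumption.

```python
def equal_error_rate(scores, labels):
    """Approximate EER; labels use 1 for adversarial and 0 for original text."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    best = None
    for t in sorted(set(scores)):
        fpr = sum(s >= t for s in neg) / len(neg)  # originals flagged as adversarial
        fnr = sum(s < t for s in pos) / len(pos)   # adversarial texts missed
        gap = abs(fpr - fnr)
        if best is None or gap < best[0]:
            best = (gap, (fpr + fnr) / 2)
    return best[1]

# Hypothetical detector scores, not results from the paper.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
eer = equal_error_rate(scores, labels)
```

On a finite sample the two rates rarely match exactly, so this version returns their average at the threshold where they are closest.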

CL Oct 15, 2019
Detecting Machine-Translated Text using Back Translation

Hoang-Quoc Nguyen-Son, Tran Phuong Thao, Seira Hidano et al.

Machine-translated text plays a crucial role in communication between people who use different languages. However, adversaries can use such text for malicious purposes such as plagiarism and fake reviews. Existing methods detect machine-translated text using only the text's intrinsic content, but they are unsuitable for distinguishing machine-translated from human-written texts with the same meaning. We propose a method that extracts features for distinguishing machine from human text based on the similarity between the original text and its back-translation. An evaluation of translated-sentence detection with French shows that our method achieves 75.0% for both accuracy and F-score, outperforming existing methods, whose best accuracy is 62.8% and best F-score is 62.7%. The proposed method detects back-translated text even more effectively, with 83.4% accuracy, compared with the best previous accuracy of 66.7%. We obtain similar results for the F-score and in analogous experiments with Japanese. Moreover, we demonstrate that our detector can recognize both machine-translated and machine-back-translated texts without knowing the language used to generate them. This demonstrates the robustness of our method across applications in both low- and rich-resource languages.
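The core idea, comparing a text with its own back-translation, can be sketched as follows. The translation step itself is out of scope here: the back-translated string is assumed to be supplied by some external translator, and the two similarity measures (token Jaccard overlap and `difflib.SequenceMatcher` ratio) are stand-ins chosen for illustration, not the features used in the paper.

```python
from difflib import SequenceMatcher

def tokens(text):
    """Lowercased whitespace tokens."""
    return text.lower().split()

def similarity_features(text, back_translation):
    """Similarity between a text and its back-translation (both plain strings)."""
    a, b = tokens(text), tokens(back_translation)
    sa, sb = set(a), set(b)
    union = sa | sb
    jaccard = len(sa & sb) / len(union) if union else 1.0
    seq_ratio = SequenceMatcher(None, a, b).ratio()  # order-aware token similarity
    return {"jaccard": jaccard, "seq_ratio": seq_ratio}

# Hypothetical pair: an input sentence and what a round trip might return.
feats = similarity_features("the cat sat", "the cat sits")
```

The intuition is that text which was already machine-translated tends to change less under another round trip than human-written text does, so higher similarity hints at machine origin.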

CL Apr 24, 2019
Detecting Machine-Translated Paragraphs by Matching Similar Words

Hoang-Quoc Nguyen-Son, Tran Phuong Thao, Seira Hidano et al.

Machine-translated text plays an important role in modern life by smoothing communication among communities that use different languages. However, unnatural translations may lead to misunderstanding, so a detector is needed to avoid such mistakes. One previous method measured the naturalness of continuous words using an N-gram language model; another matched noncontinuous words across sentences but ignored such words within an individual sentence. We have developed a method that matches similar words throughout the paragraph and estimates paragraph-level coherence to identify machine-translated text. Experiments on 2000 human-generated English paragraphs and 2000 English paragraphs machine-translated from German show that the coherence-based method achieves high performance (accuracy = 87.0%; equal error rate = 13.0%), considerably better than previous methods (best accuracy = 72.4%; equal error rate = 29.7%). Similar experiments on Dutch and Japanese obtain 89.2% and 97.9% accuracy, respectively. The results demonstrate the robustness of the proposed method across languages with different resource levels.

CR Apr 24, 2019
Influences of Human Demographics, Brand Familiarity and Security Backgrounds on Homograph Recognition

Tran Phuong Thao, Yukiko Sawaya, Hoang-Quoc Nguyen-Son et al.

A homograph attack is a way for attackers to deceive victims about which website domain name they are communicating with by exploiting the fact that many characters look alike. The attack has become serious and has drawn broad attention since many brand domains were recently attacked, including those of Apple Inc., Adobe Inc., and Lloyds Bank. We first design a survey of human demographics, brand familiarity, and security backgrounds and administer it to 2,067 participants. We build a regression model to study which factors affect participants' ability to recognize homograph domains. We find that participants exhibit different abilities at different levels of visual similarity: 13.95% of participants can recognize non-homographs, while 16.60% can recognize homographs whose visual similarity to the target brand domains is under 99.9%; when the similarity increases to 99.9%, the share of participants who can recognize homographs drops sharply to only 0.19%; and for homographs with 100% visual similarity, participants have no way to recognize them. We also find that female participants tend to recognize homographs better than males, whereas male participants tend to recognize non-homographs better than females. Security knowledge is a significant factor for both homographs and non-homographs; surprisingly, people with strong security knowledge tend to be able to recognize homographs but not non-homographs. Furthermore, working or being educated in computer science or computer engineering does not appear as a factor affecting the ability to recognize homographs; interestingly, however, right after the homograph attack is explained to them, people with such backgrounds are the ones who grasp the situation most quickly.