CLDec 19, 2022
NusaCrowd: Open Source Initiative for Indonesian NLP ResourcesSamuel Cahyawijaya, Holy Lovenia, Alham Fikri Aji et al. · nvidia
We present NusaCrowd, a collaborative initiative to collect and unify existing resources for Indonesian languages, including opening access to previously non-public resources. Through this initiative, we have brought together 137 datasets and 118 standardized data loaders. The quality of the datasets has been assessed manually and automatically, and their value is demonstrated through multiple experiments. NusaCrowd's data collection enables the creation of the first zero-shot benchmarks for natural language understanding and generation in Indonesian and the local languages of Indonesia. Furthermore, NusaCrowd brings the creation of the first multilingual automatic speech recognition benchmark in Indonesian and the local languages of Indonesia. Our work strives to advance natural language processing (NLP) research for languages that are under-represented despite being widely spoken.
HCJun 4, 2024
WEIRD ICWSM: How Western, Educated, Industrialized, Rich, and Democratic is Social Computing Research?Ali Akbar Septiandri, Marios Constantinides, Daniele Quercia
Much of the research in social computing analyzes data from social media platforms, which may inherently carry biases. An overlooked source of such bias is the over-representation of WEIRD (Western, Educated, Industrialized, Rich, and Democratic) populations, which might not accurately mirror the global demographic diversity. We evaluated the dependence on WEIRD populations in research presented at the AAAI ICWSM conference; the only venue whose proceedings are fully dedicated to social computing research. We did so by analyzing 494 papers published from 2018 to 2022, which included full research papers, dataset papers and posters. After filtering out papers that analyze synthetic datasets or those lacking clear country of origin, we were left with 420 papers from which 188 participants in a crowdsourcing study with full manual validation extracted data for the WEIRD scores computation. This data was then used to adapt existing WEIRD metrics to be applicable for social media data. We found that 37% of these papers focused solely on data from Western countries. This percentage is significantly less than the percentages observed in research from CHI (76%) and FAccT (84%) conferences, suggesting a greater diversity of dataset origins within ICWSM. However, the studies at ICWSM still predominantly examine populations from countries that are more Educated, Industrialized, and Rich in comparison to those in FAccT, with a special note on the 'Democratic' variable reflecting political freedoms and rights. This points out the utility of social media data in shedding light on findings from countries with restricted political freedoms. Based on these insights, we recommend extensions of current "paper checklists" to include considerations about the WEIRD bias and call for the community to broaden research inclusivity by encouraging the use of diverse datasets from underrepresented regions.
LGNov 14, 2020
Cost-Sensitive Machine Learning Classification for Mass Tuberculosis Verbal ScreeningAli Akbar Septiandri, Aditiawarman, Roy Tjiong et al.
Score-based algorithms for tuberculosis (TB) verbal screening perform poorly, causing misclassification that leads to missed cases and unnecessary costly laboratory tests for false positives. We compared score-based classification defined by clinicians to machine learning classification such as SVM-RBF, logistic regression, and XGBoost. We restricted our analyses to data from adults, the population most affected by TB, and investigated the difference between untuned and unweighted classifiers to the cost-sensitive ones. Predictions were compared with the corresponding GeneXpert MTB/Rif results. After adjusting the weight of the positive class to 40 for XGBoost, we achieved 96.64% sensitivity and 35.06% specificity. As such, the sensitivity of our identifier increased by 1.26% while specificity increased by 13.19% in absolute value compared to the traditional score-based method defined by our clinicians. Our approach further demonstrated that only 2000 data points were sufficient to enable the model to converge. The results indicate that even with limited data we can actually devise a better method to identify TB suspects from verbal screening.
IVAug 28, 2020
Human Blastocyst Classification after In Vitro Fertilization Using Deep LearningAli Akbar Septiandri, Ade Jamal, Pritta Ameilia Iffanolida et al.
Embryo quality assessment after in vitro fertilization (IVF) is primarily done visually by embryologists. Variability among assessors, however, remains one of the main causes of the low success rate of IVF. This study aims to develop an automated embryo assessment based on a deep learning model. This study includes a total of 1084 images from 1226 embryos. The images were captured by an inverted microscope at day 3 after fertilization. The images were labelled based on Veeck criteria that differentiate embryos to grade 1 to 5 based on the size of the blastomere and the grade of fragmentation. Our deep learning grading results were compared to the grading results from trained embryologists to evaluate the model performance. Our best model from fine-tuning a pre-trained ResNet50 on the dataset results in 91.79% accuracy. The model presented could be developed into an automated embryo assessment method in point-of-care settings.
CLFeb 28, 2020
UKARA 1.0 Challenge Track 1: Automatic Short-Answer Scoring in Bahasa IndonesiaAli Akbar Septiandri, Yosef Ardhito Winatmoko
We describe our third-place solution to the UKARA 1.0 challenge on automated essay scoring. The task consists of a binary classification problem on two datasets | answers from two different questions. We ended up using two different models for the two datasets. For task A, we applied a random forest algorithm on features extracted using unigram with latent semantic analysis (LSA). On the other hand, for task B, we only used logistic regression on TF-IDF features. Our model results in F1 score of 0.812.
CLSep 26, 2019
Aspect and Opinion Term Extraction for Hotel Reviews using Transfer Learning and Auxiliary LabelsYosef Ardhito Winatmoko, Ali Akbar Septiandri, Arie Pratama Sutiono
Aspect and opinion term extraction is a critical step in Aspect-Based Sentiment Analysis (ABSA). Our study focuses on evaluating transfer learning using pre-trained BERT (Devlin et al., 2018) to classify tokens from hotel reviews in bahasa Indonesia. The primary challenge is the language informality of the review texts. By utilizing transfer learning from a multilingual model, we achieved up to 2% difference on token level F1-score compared to the state-of-the-art Bi-LSTM model with fewer training epochs (3 vs. 200 epochs). The fine-tuned model clearly outperforms the Bi-LSTM model on the entity level. Furthermore, we propose a method to include CRF with auxiliary labels as an output layer for the BERT-based models. The CRF addition further improves the F1-score for both token and entity level.
CLAug 14, 2019
Aspect and Opinion Terms Extraction Using Double Embeddings and Attention Mechanism for Indonesian Hotel ReviewsJordhy Fernando, Masayu Leylia Khodra, Ali Akbar Septiandri
Aspect and opinion terms extraction from review texts is one of the key tasks in aspect-based sentiment analysis. In order to extract aspect and opinion terms for Indonesian hotel reviews, we adapt double embeddings feature and attention mechanism that outperform the best system at SemEval 2015 and 2016. We conduct experiments using 4000 reviews to find the best configuration and show the influences of double embeddings and attention mechanism toward model performance. Using 1000 reviews for evaluation, we achieved F1-measure of 0.914 and 0.90 for aspect and opinion terms extraction in token and entity (term) level respectively.
CLJul 22, 2017
Predicting the Gender of Indonesian NamesAli Akbar Septiandri
We investigated a way to predict the gender of a name using character-level Long-Short Term Memory (char-LSTM). We compared our method with some conventional machine learning methods, namely Naive Bayes, logistic regression, and XGBoost with n-grams as the features. We evaluated the models on a dataset consisting of the names of Indonesian people. It is not common to use a family name as the surname in Indonesian culture, except in some ethnicities. Therefore, we inferred the gender from both full names and first names. The results show that we can achieve 92.25% accuracy from full names, while using first names only yields 90.65% accuracy. These results are better than the ones from applying the classical machine learning algorithms to n-grams.