Comparison of Machine Learning Classification Algorithms and Feature Selection for Sentiment Analysis of Movie Reviews

Vinita Chandani, Romi Satria Wahono, Purwanto

Abstract


Sentiment analysis is the process of determining whether the content of a text dataset is positive, negative, or neutral. Public opinion has become an important source of information when a person decides on a product. Many researchers have proposed classification algorithms such as Naïve Bayes (NB), Support Vector Machine (SVM), and Artificial Neural Network (ANN) for sentiment analysis of movie reviews. However, text sentiment classification suffers from the large number of attributes in a dataset. Feature selection can be used to reduce the less relevant attributes. The feature selection algorithms used here are information gain, chi square, forward selection, and backward elimination. In the comparison of classification algorithms, SVM achieved the best result, with an accuracy of 81.10% and an AUC of 0.904. In the comparison of feature selection algorithms, information gain achieved the best result, with an average accuracy of 84.57% and an average AUC of 0.899. Integrating the best classification algorithm with the best feature selection algorithm yielded an accuracy of 81.50% and an AUC of 0.929, an improvement over the experiment that used SVM without feature selection. When the feature selection algorithms were tested on each classification algorithm, information gain gave the best result for NB, SVM, and ANN.
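To make the idea of information gain as a feature selection filter concrete, the following is a minimal sketch in Python. It is not the authors' implementation, and the abstract does not describe their tooling: the toy reviews, the helper names (`entropy`, `information_gain`, `top_k_terms`), and the choice of binary term presence as the feature are all assumptions made for illustration. Each term is scored by how much knowing its presence reduces the entropy of the class labels, and only the highest-scoring terms would be passed on to a classifier such as NB, SVM, or ANN.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """Entropy reduction obtained by splitting documents on presence of `term`.

    `docs` is a list of token sets; `labels` holds the class of each document.
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    total = len(labels)
    conditional = ((len(with_term) / total) * entropy(with_term)
                   + (len(without_term) / total) * entropy(without_term))
    return entropy(labels) - conditional


def top_k_terms(docs, labels, k):
    """Return the k terms with the highest information gain (ties by name)."""
    vocabulary = sorted(set().union(*docs))
    scored = [(information_gain(docs, labels, t), t) for t in vocabulary]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:k]]


# Hypothetical toy reviews, invented for this sketch (not the paper's dataset).
reviews = [
    ("great acting and a great story", "pos"),
    ("a great film", "pos"),
    ("boring story and bad acting", "neg"),
    ("a bad film", "neg"),
]
docs = [set(text.split()) for text, _ in reviews]
labels = [label for _, label in reviews]

selected = top_k_terms(docs, labels, k=2)
print(selected)  # ['bad', 'great']
```

On this toy data, "great" and "bad" each separate the two classes perfectly and so score the maximum gain of 1 bit, while a term such as "film" appears equally in both classes and scores 0. A filter of this kind ranks terms independently of the classifier, which is what distinguishes it from wrapper methods such as forward selection and backward elimination.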







Journal of Intelligent Systems (JIS, ISSN 2356-3982)
Copyright © 2020 IlmuKomputer.Com. All rights reserved.