SPOTIFY REVIEWS SENTIMENT CLASSIFICATION USING MACHINE LEARNING TECHNIQUES

Files

gs651160202.pdf (3.73 MB)

Date

19/7/2024

Publisher

Srinakharinwirot University

Abstract

Social media plays an increasingly important role as a channel for consumers to express their opinions about products and services, which makes sentiment analysis a crucial tool for understanding consumer sentiment. The objective of this research is to build and compare models for classifying the sentiment of English-language reviews written by users of the Spotify app, using 54,708 reviews sourced from Kaggle. The reviews are labeled as positive or negative according to the scores given by the users. The data is divided into a training set (75%) and a test set (25%), and features are extracted using TF-IDF and Word2Vec. Models are then built with several machine learning techniques: Random Forest (RF), Naive Bayes (NB), Logistic Regression (LR), Support Vector Machine (SVM), XGBoost (XGB), and the Transformer-based DistilBERT (DB). The study finds that DistilBERT performs best at sentiment classification, with a precision of 92.53%, recall of 89.62%, F1-score of 91.05%, ROC of 90.46%, and accuracy of 90.39%. In addition, feature importance is examined through model coefficients and SHAP values to identify the features that most influence positive and negative classifications. This explanation of model predictions clarifies which factors drive classification and can guide further improvement of model performance. The resulting models can be used as tools for analyzing user sentiment, supporting data-driven development and improvement of products and services to better meet user needs.
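As a reading aid for the metrics quoted in the abstract, the sketch below shows how precision, recall, F1-score, and accuracy follow from confusion-matrix counts. It is a minimal illustration, not the thesis's evaluation code: the function names `classification_metrics` and `f1_from` and the toy counts are assumptions introduced here. The only figures taken from the abstract are the reported precision (92.53%) and recall (89.62%), which are used to check that the reported F1-score is their harmonic mean.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return precision, recall, F1 and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)            # share of predicted positives that are correct
    recall = tp / (tp + fn)               # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


def f1_from(precision, recall):
    """F1-score computed directly from precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only to illustrate the formulas (not thesis data).
toy = classification_metrics(tp=90, fp=10, fn=15, tn=85)

# Consistency check using the precision and recall reported in the abstract.
reported_f1 = f1_from(0.9253, 0.8962)

if __name__ == "__main__":
    print({name: round(value, 4) for name, value in toy.items()})
    print(round(reported_f1 * 100, 2))  # 91.05, matching the reported F1-score
```

The check on the last line agrees with the F1-score stated in the abstract, so the three reported figures are mutually consistent.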

Keywords

Machine Learning, Sentiment Classification, Feature Extraction, Natural Language Processing

URI

https://ir-ithesis.swu.ac.th/handle/123456789/2769

Collections

Faculty of Science
