MACHINE LEARNING APPROACH TO PREDICT E-COMMERCE CUSTOMER SATISFACTION SCORE

Files

gs641130055.pdf (3.2 MB)

Date

19/5/2023

Publisher

Srinakharinwirot University

Abstract

This paper investigates the performance of machine learning (ML) models in predicting customer satisfaction scores from a sales dataset collected by Olist, a Brazilian e-commerce company. The customer satisfaction score is categorized into four classes: Low, Average, Good, and Excellent. The majority of sales orders received an Excellent score. The study design was motivated by the observation that delivery duration and the product rating scores given by other customers are among the main factors influencing customer satisfaction. A feature engineering method is proposed that derives the delivery duration and the average product rating score, which serve as the main features of the ML models. Random Forest (RF), Logistic Regression (LR), and K-Nearest Neighbor (K-NN) models were used to predict the customer satisfaction score. Their performance was compared with that of a baseline model that predicts the customer satisfaction score from the average product rating score. The results show that the RF model performs best, with macro-averaged precision, recall, and F1 score of 0.34, 0.36, and 0.32, respectively. RF also achieves the highest recall for the Low, Average, and Good classes, at 0.43, 0.33, and 0.33, respectively. The mean and standard deviation (SD) of the product rating score are the two most important features, with feature importances of 0.313 and 0.087, respectively.
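The feature engineering, baseline, and evaluation described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis's code. The record field names (`order_id`, `product_id`, `review_score`) and the timestamp format are modeled loosely on the public Olist data and should be treated as assumptions. The following choices are also hypothetical, since the abstract does not specify them:

- the mapping of 1 to 5 review scores onto the four classes;
- the assumption of one product per order;
- the leave-one-out reading of "ratings from other customers";
- the half-up rounding in the baseline;
- the majority-class fallback.

```python
import math
from collections import defaultdict
from datetime import datetime
from statistics import mean, pstdev

CLASSES = ["Low", "Average", "Good", "Excellent"]


def score_to_class(score):
    """Map a 1-5 review score to one of the four satisfaction classes.

    Hypothetical mapping: the abstract does not state the cut-offs.
    """
    if score <= 2:
        return "Low"
    if score == 3:
        return "Average"
    if score == 4:
        return "Good"
    return "Excellent"


def delivery_days(purchased, delivered, fmt="%Y-%m-%d %H:%M:%S"):
    """Delivery duration in (fractional) days from purchase to delivery."""
    delta = datetime.strptime(delivered, fmt) - datetime.strptime(purchased, fmt)
    return delta.total_seconds() / 86400


def product_rating_features(orders):
    """Per-order mean and population SD of the product's rating.

    Only reviews from *other* orders of the same product are used
    (leave-one-out), so an order's own score never leaks into its features.
    Assumes one product per order.
    """
    by_product = defaultdict(list)
    for o in orders:
        by_product[o["product_id"]].append((o["order_id"], o["review_score"]))

    features = {}
    for o in orders:
        others = [s for oid, s in by_product[o["product_id"]] if oid != o["order_id"]]
        # No other customer has rated this product: leave the features missing.
        features[o["order_id"]] = (mean(others), pstdev(others)) if others else (None, None)
    return features


def baseline_predict(avg_rating):
    """Baseline: round the average product rating half-up and map it to a class."""
    if avg_rating is None:
        return "Excellent"  # assumption: fall back to the majority class
    return score_to_class(math.floor(avg_rating + 0.5))


def macro_scores(y_true, y_pred, labels=CLASSES):
    """Macro-averaged precision, recall and F1: unweighted mean over classes."""
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0  # undefined ratio counted as 0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n
```

Consider three orders of one product with scores 5, 4, and 1. The first order's features come only from the other two reviews: a mean of 2.5 and an SD of 1.5. The baseline would predict Average from that mean, against a true Excellent.

Macro averaging weights each class equally. This is why it is the informative summary when most orders are Excellent.

The RF, LR, and K-NN models themselves would typically come from a library such as scikit-learn. For example, `RandomForestClassifier` exposes impurity-based `feature_importances_` of the kind reported above. The abstract does not state which library was used.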

Keywords

E-commerce, Classification, Customer satisfaction, Machine learning, Rating prediction

URI

https://ir-ithesis.swu.ac.th/handle/123456789/2230

Collections

Faculty of Science
