CREDIT CARD FRAUD DETECTION WITH IMBALANCED DATA USING MACHINE LEARNING MODELS

Please use this identifier to cite or link to this item: http://ir-ithesis.swu.ac.th/dspace/handle/123456789/3406

Title:	CREDIT CARD FRAUD DETECTION WITH IMBALANCED DATA USING MACHINE LEARNING MODELS
Authors:	NUTHANAN SAREANRAM; Nuwee Wiwatwattana (nuwee@swu.ac.th); Srinakharinwirot University
Keywords:	Fraud Detection; Credit Card; Machine Learning; Imbalanced Data; Classification
Issue Date:	18
Publisher:	Srinakharinwirot University
Abstract:	Credit card fraud poses a serious financial threat to both consumers and financial institutions, so models that detect anomalous transactions accurately are essential. Detection is harder when transaction data are imbalanced (legitimate transactions greatly outnumber fraudulent ones), because the imbalance limits what a model can learn about the fraud class. This study develops a fraud detection model in two stages. In the first stage, five machine learning algorithms (Decision Tree, Random Forest, XGBoost, K-Nearest Neighbors, and CatBoost) were compared, with hyperparameters tuned for each, to identify the most suitable model for classifying fraudulent transactions. In the second stage, the best-performing model was tested together with six data imbalance handling techniques: Random Oversampling, SMOTE, Tomek Links, Edited Nearest Neighbors (ENN), SMOTEENN, and SMOTETomek. The dataset consists of credit card transactions described by transaction attributes, geographic (location) data, and cardholder data. In the first stage, XGBoost performed best at classifying fraudulent transactions, with a precision of 0.96, recall of 0.83, and F1-score of 0.89. In the second stage, combining XGBoost with SMOTE or with SMOTETomek further improved performance, reaching a precision of 0.97, recall of 0.86, and F1-score of 0.91. The model also classified non-fraudulent transactions (Class 0) accurately, with values of 1.00 on every reported metric. These results indicate that pairing a strong classifier with an appropriate data imbalance handling technique improves fraud detection and supports practical deployment in financial systems.
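The record does not describe how the resampling was implemented. As an illustration only, the following minimal pure-Python sketch shows the idea behind SMOTE, Tomek links, and their combination SMOTETomek, and checks that the reported F1-scores follow from the reported precision and recall (F1 = 2PR / (P + R)). The function names (f1_score, smote, tomek_links, smote_tomek) are hypothetical and are not the author's code. The sketch assumes a binary label (1 = fraud), numeric feature tuples, and brute-force Euclidean nearest neighbours, which only suit toy data; libraries such as imbalanced-learn provide production versions (SMOTE, TomekLinks, SMOTETomek).

```python
import math
import random
from typing import List, Sequence, Set, Tuple

# Hypothetical helpers for illustration only; not the thesis implementation.
Point = Tuple[float, ...]


def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def smote(minority: Sequence[Point], n_new: int, k: int = 5, seed: int = 0) -> List[Point]:
    """Create n_new synthetic minority points (needs at least 2 minority samples).

    Each new point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Point] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base`, excluding base itself
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def tomek_links(X: Sequence[Point], y: Sequence[int]) -> Set[int]:
    """Indices of points in Tomek links: opposite-class pairs that are
    each other's nearest neighbour."""
    nearest = [
        min((j for j in range(len(X)) if j != i), key=lambda j: math.dist(X[i], X[j]))
        for i in range(len(X))
    ]
    linked: Set[int] = set()
    for i, j in enumerate(nearest):
        if nearest[j] == i and y[i] != y[j]:
            linked.update((i, j))
    return linked


def smote_tomek(X, y, minority_label=1, k=5, seed=0):
    """Oversample the minority class to parity with SMOTE, then drop both
    members of every Tomek link in the resampled set (binary labels assumed)."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    n_new = max(0, n_majority - len(minority))
    X_res = list(X) + smote(minority, n_new, k, seed)
    y_res = list(y) + [minority_label] * n_new
    drop = tomek_links(X_res, y_res)
    keep = [i for i in range(len(X_res)) if i not in drop]
    return [X_res[i] for i in keep], [y_res[i] for i in keep]


print(round(f1_score(0.96, 0.83), 2))  # 0.89
print(round(f1_score(0.97, 0.86), 2))  # 0.91
```

The two printed values reproduce the stage-one and stage-two F1-scores from the reported precision and recall. Resampling of this kind is normally applied to the training split only, so that test metrics reflect the natural class ratio; the resampled training set would then be passed to a classifier such as XGBoost.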
Appears in Collections:	Faculty of Science

Files in This Item:

File	Description	Size	Format
gs661160150.pdf		3.16 MB	Adobe PDF
