HEALTH INSURANCE PREMIUM PREDICTION: AN ANALYTICAL STUDY USING MACHINE LEARNING ALGORITHMS

Title:	HEALTH INSURANCE PREMIUM PREDICTION: AN ANALYTICAL STUDY USING MACHINE LEARNING ALGORITHMS
Authors:	JIDAPA ADAM; Waraporn Viyanon (waraporn@swu.ac.th); Srinakharinwirot University
Keywords:	Health Insurance Premium Prediction; Machine Learning; Hyperparameter Tuning
Issue Date:	18
Publisher:	Srinakharinwirot University
Abstract:	This research aims to develop and compare machine learning models for predicting health insurance premiums from physical and health characteristics: age, diabetes, blood pressure problems, organ transplants, chronic diseases, height, weight, known allergies, history of cancer in the family, and number of major surgeries. Six models were investigated: Support Vector Regression (SVR), Lasso Regression, Ridge Regression, Decision Tree, Random Forest, and XGBoost. The hyperparameters of each model were tuned with GridSearchCV to optimize its performance. The experimental results indicate that the Random Forest model performed best, achieving the lowest MAPE and RMSE and the highest R-squared (R²), which shows that it captured and explained the largest share of the variance in the data. Age was also identified as the most influential feature for predicting premiums across all models, consistent with the initial hypothesis. Tree-based models outperformed linear models because the premium data contain nonlinear relationships and interactions among features. This study provides a foundation for further research with alternative datasets or more advanced models, and it has potential practical applications in the health insurance industry for improving the accuracy and efficiency of premium estimation.
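The abstract ranks the six models by MAPE, RMSE and R². As a minimal sketch, assuming the standard textbook definitions (the thesis may instead compute them with a library such as scikit-learn, whose conventions can differ slightly), the three metrics can be written in plain Python. The helper names `mape`, `rmse` and `r2` and the sample premiums are hypothetical and are not taken from the thesis.

```python
# Minimal sketch of the three metrics named in the abstract, using standard
# textbook definitions. The helper names mape, rmse and r2 are hypothetical;
# they are not taken from the thesis.
from math import sqrt
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> int:
    """Validate inputs and return the number of observations."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    return len(y_true)


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent (lower is better).

    Assumes no true premium is zero, otherwise the ratio is undefined.
    """
    n = _check(y_true, y_pred)
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error, in the same units as the premium (lower is better)."""
    n = _check(y_true, y_pred)
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot (higher is better).

    Assumes y_true is not constant, otherwise SS_tot is zero.
    """
    n = _check(y_true, y_pred)
    mean = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Made-up premiums for illustration only; these are not data from the thesis.
    actual = [20000.0, 25000.0, 30000.0]
    predicted = [21000.0, 24000.0, 30000.0]
    print(f"MAPE = {mape(actual, predicted):.2f}%")
    print(f"RMSE = {rmse(actual, predicted):.1f}")
    print(f"R2   = {r2(actual, predicted):.2f}")
```

With these made-up numbers the script prints MAPE = 3.00%, RMSE = 816.5 and R2 = 0.96. Note that scikit-learn's `mean_absolute_percentage_error` returns a fraction rather than a percentage, so it would report 0.03 for the same data.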
URI:	http://ir-ithesis.swu.ac.th/dspace/handle/123456789/3401
Appears in Collections:	Faculty of Science

Files in This Item:

File	Size	Format
gs661160141.pdf	1.98 MB	Adobe PDF