PREDICTION OF POVERTY LEVEL ON CENSUS DATA USING MACHINE LEARNING

Files

gs601130298.pdf (3.17 MB)

Date

16/8/2021

Publisher

Srinakharinwirot University

Abstract

This research presents the use of machine learning to analyze census data. It proposes a feature engineering process that creates household characteristics and combines it with the Synthetic Minority Over-sampling Technique (SMOTE) to predict population poverty. Poverty is divided into four levels: Extreme Poverty, Moderate Poverty, Vulnerable Households, and Non-Vulnerable Households. The machine learning models used to predict poverty from the census data are Multilayer Perceptron, Linear Discriminant Analysis, K-Nearest Neighbors, Random Forest, and Extra Trees. After the hyperparameters of each model were tuned to improve predictive performance, the experiments showed that the Random Forest model performed best at predicting poverty from census data, with an accuracy of 0.63, a precision of 0.43, a recall of 0.42, and a macro-averaged F1 score of 0.43. The experiments also revealed that SMOTE played a significant role in improving the model's ability to identify poverty. The three features with the greatest influence on the proposed model's performance are age, years in school, and the average years of education for adults, with feature importances of 0.066, 0.065, and 0.059, respectively.
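The abstract describes engineering household characteristics from census records, and one of the most important features it reports is the average years of education for adults. The sketch below shows one plausible way to derive such household-level features with pandas and attach them back to each person. It is not the thesis's code: the column names (`household_id`, `person_id`, `age`, `years_in_school`), the helper `add_household_features`, and the adult cutoff of 18 are hypothetical assumptions, since the census schema is not described here.

```python
import pandas as pd


def add_household_features(df: pd.DataFrame, adult_age: int = 18) -> pd.DataFrame:
    """Attach household-level aggregates to every person row.

    Assumed (hypothetical) columns: household_id, person_id, age, years_in_school.
    The adult cutoff `adult_age` is an illustrative assumption.
    """
    out = df.copy()
    by_household = out.groupby("household_id")

    # Number of people in each household, broadcast back to every member.
    out["household_size"] = by_household["person_id"].transform("count")

    # Years in school for adults only; non-adults become NaN and are skipped by mean().
    adult_school = out["years_in_school"].where(out["age"] >= adult_age)
    out["avg_adult_education"] = adult_school.groupby(out["household_id"]).transform("mean")

    # Count of members below the adult cutoff in each household.
    is_child = out["age"] < adult_age
    out["n_children"] = is_child.groupby(out["household_id"]).transform("sum")
    return out


# Tiny made-up example: two households, five people.
people = pd.DataFrame({
    "household_id": [1, 1, 1, 2, 2],
    "person_id": [1, 2, 3, 4, 5],
    "age": [40, 38, 10, 65, 30],
    "years_in_school": [12, 9, 4, 6, 16],
})
features = add_household_features(people)
print(features)
```

Using `transform` rather than `agg` keeps one row per person, so household features can sit alongside individual features such as age and years in school in the same model input. A household with no adults would receive `NaN` for `avg_adult_education` and would need an imputation rule of its own.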
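Combining SMOTE oversampling with a Random Forest classifier and macro-averaged metrics can be sketched as follows. This is an illustration under stated assumptions, not the author's pipeline. The data are random synthetic features with made-up frequencies for the four poverty levels. The helper `smote` is a minimal, hypothetical re-implementation of the SMOTE interpolation idea (the imbalanced-learn library provides a full version as `imblearn.over_sampling.SMOTE`). The setting `n_estimators=200` is arbitrary and is not one of the tuned hyperparameters from the thesis.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split


def smote(X, y, k=5, seed=0):
    """Minimal SMOTE-style oversampler (hypothetical helper, for illustration).

    Every class is grown to the size of the largest class. Each synthetic point
    lies on the segment between a real sample and one of its k nearest
    neighbours from the same class.
    """
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, n in zip(classes, counts):
        need = target - n
        if need == 0:
            continue
        Xc = X[y == cls]
        kk = min(k, len(Xc) - 1)
        if kk < 1:
            raise ValueError(f"class {cls} needs at least 2 samples")
        # Pairwise Euclidean distances within the class; a point is not its own neighbour.
        d = np.linalg.norm(Xc[:, None, :] - Xc[None, :, :], axis=2)
        np.fill_diagonal(d, np.inf)
        neighbours = np.argsort(d, axis=1)[:, :kk]
        base = rng.integers(0, len(Xc), size=need)               # random real samples
        nn = neighbours[base, rng.integers(0, kk, size=need)]    # one neighbour each
        gap = rng.random((need, 1))                              # interpolation weight in [0, 1)
        X_parts.append(Xc[base] + gap * (Xc[nn] - Xc[base]))
        y_parts.append(np.full(need, cls))
    return np.vstack(X_parts), np.concatenate(y_parts)


# Synthetic stand-in for census features; labels 0..3 play the role of the four
# poverty levels, with made-up, imbalanced frequencies.
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 6))
y = rng.choice(4, size=400, p=[0.05, 0.15, 0.30, 0.50])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
X_bal, y_bal = smote(X_train, y_train)  # oversample the training split only

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_bal, y_bal)
pred = model.predict(X_test)

acc = accuracy_score(y_test, pred)
prec, rec, f1, _ = precision_recall_fscore_support(
    y_test, pred, average="macro", zero_division=0
)
top = np.argsort(model.feature_importances_)[::-1][:3]  # three most important features
print(f"accuracy={acc:.2f} macro precision={prec:.2f} recall={rec:.2f} F1={f1:.2f}")
print("top feature indices:", top, model.feature_importances_[top])
```

Because the labels here are random, the printed scores carry no meaning; the sketch only shows the mechanics. Only the training split is oversampled, so the test set keeps its original class balance. Macro averaging weights each of the four poverty levels equally, so it can fall well below accuracy when the minority levels are hard to predict.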

Description

MASTER OF SCIENCE (M.Sc.)

Keywords

Machine learning, Predicting poverty, Feature engineering, Synthetic Minority Over-sampling Technique

URI

https://ir-ithesis.swu.ac.th/handle/123456789/1239

Collections

Faculty of Science
