STROKE PREDICTION MODEL USING MACHINE LEARNING TECHNIQUES

Files

gs651160192.pdf (4.3 MB)

Date

24/5/2024

Publisher

Srinakharinwirot University

Abstract

Stroke is one of the leading causes of death and disability worldwide, so early diagnosis is crucial for reducing mortality and subsequent disability. However, diagnosing a stroke requires the expertise of medical professionals, which is in limited supply. The researchers therefore saw the potential of Machine Learning techniques for building models that help classify stroke patients from patient characteristic data, reducing the burden on doctors and shortening patient screening time. This research studied model creation with Machine Learning techniques, using a dataset obtained from the Kaggle website. The dataset contains clinical data for 5,110 samples in two classes, normal individuals and stroke patients. It is imbalanced, which can affect model performance, so several techniques for handling imbalanced data were applied. The study compared models built with Logistic Regression, Decision Tree, Random Forest, XGBoost, LightGBM, AdaBoost, and CatBoost. The comparison used performance metrics derived from the Confusion Matrix, including Accuracy, Sensitivity, F1-score, Specificity, ROC Curve, and Balanced Accuracy. Because the two classes differ greatly in size, the research prioritized Balanced Accuracy as the main performance metric, since it accounts for the weight of each class. The results showed that the model built with the AdaBoost algorithm performed best, with a Balanced Accuracy of 0.72. Performance could be improved further by increasing the sample size and by tuning parameters with GridSearchCV.
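
As a minimal illustration of why Balanced Accuracy is preferred over Accuracy on imbalanced data, the sketch below computes both from confusion-matrix counts. The function names and the toy labels (95 normal cases, 5 stroke cases) are hypothetical and chosen only for illustration; they are not the thesis code or data.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def summary_metrics(tp, fp, tn, fn):
    """Derive accuracy, sensitivity, specificity and balanced accuracy."""
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on the positive class
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on the negative class
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": sensitivity,
        "specificity": specificity,
        # Balanced accuracy is the mean of the two per-class recalls.
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Hypothetical toy labels: 95 normal (0) and 5 stroke (1) cases.
y_true = [0] * 95 + [1] * 5
# A degenerate classifier that always predicts "normal".
y_pred = [0] * 100

result = summary_metrics(*confusion_counts(y_true, y_pred))
print(result)
```

On this toy data the always-normal classifier reaches an Accuracy of 0.95 while missing every stroke case, and its Balanced Accuracy is only 0.5, which is the behaviour that motivates using Balanced Accuracy as the main metric.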

Keywords

Machine learning, Imbalanced data, Balanced accuracy

URI

http://ir-ithesis.swu.ac.th/dspace/handle/123456789/2764

Collections

Faculty of Science
