A STUDY OF INTEGRATING APACHE AIRFLOW AND FEATURE STORES TO ENHANCE MACHINE LEARNING DATA MANAGEMENT

Please use this identifier to cite or link to this item: http://ir-ithesis.swu.ac.th/dspace/handle/123456789/3417

Title:	A STUDY OF INTEGRATING APACHE AIRFLOW AND FEATURE STORES TO ENHANCE MACHINE LEARNING DATA MANAGEMENT
Authors:	VONGSAKORN WONGDECHNGAM; Ruangsak Trakunphutthirak (ruangsak@swu.ac.th), Srinakharinwirot University
Keywords:	Apache Airflow, FEAST, Feature Store, Machine Learning Data Management, Feature Engineering
Issue Date:	18
Publisher:	Srinakharinwirot University
Abstract:	This research applies Apache Airflow and a feature store (FEAST) to improve data management for Machine Learning systems. It focuses on two stages: feature engineering and feature reuse, both within a data pipeline deployed on Google Cloud Platform. The research compares two data pipeline implementations, one that uses FEAST and one that does not (Non-FEAST). The findings indicate that FEAST significantly reduces the complexity of the feature engineering process, makes feature storage and management more efficient, and makes existing features easier and more effective to reuse. The experimental results also show that the FEAST-based system has substantially lower error rates and higher overall operational efficiency than the Non-FEAST approach. This suggests that FEAST can support large-scale data management in organizations that need agile and efficient Machine Learning and Data Engineering operations.
(Thai abstract, translated) This study focuses on applying Apache Airflow and Feature Store (FEAST) technologies to make data management for Machine Learning systems more efficient. It concentrates on the data preparation stage (Feature Engineering) and on reusing features in a Data Pipeline running in a cloud environment on Google Cloud Platform. The research compared two forms of Data Pipeline development: one that uses a Feature Store (FEAST) and one that does not (Non-FEAST). The results show that using FEAST clearly reduces the complexity of the feature creation process. It also improves the efficiency of storing and managing data features, which makes feature reuse more convenient and efficient. In addition, the experimental results show that the FEAST-based system reduces error rates and improves overall operational efficiency significantly more than the system without FEAST. This demonstrates FEAST's potential to support large-scale data management for organizations that want agility and efficiency in operating Machine Learning and Data Engineering systems.
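The abstract contrasts two kinds of pipeline. In the first (Non-FEAST), each stage computes its own features. In the second, features are defined once in a feature store and then reused.

The thesis code is not reproduced in this record. What follows is a toy, standard-library-only Python sketch of that general pattern, assuming that features are keyed by an entity ID and a timestamp. All of these names are hypothetical: `ToyFeatureStore`, `register`, `materialize`, `get`, and the `order_total` feature. None of them is the Feast API. In Feast, roughly corresponding roles are played by feature views, materialization into the online store, and the `get_historical_features` / `get_online_features` calls.

```python
# Toy, in-memory sketch of the feature-store pattern: define a feature once,
# materialize it, and let several pipelines reuse it by name.
# All names are hypothetical; this is NOT the Feast API.
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class ToyFeatureStore:
    # feature name -> function that computes the feature from one raw record
    definitions: Dict[str, Callable[[dict], float]] = field(default_factory=dict)
    # (feature name, entity id) -> list of (timestamp, value), kept sorted by timestamp
    values: Dict[Tuple[str, str], List[Tuple[int, float]]] = field(default_factory=dict)

    def register(self, name: str, fn: Callable[[dict], float]) -> None:
        """Define a feature once; refuse silent redefinition."""
        if name in self.definitions:
            raise ValueError(f"feature {name!r} is already registered")
        self.definitions[name] = fn

    def materialize(self, records: List[dict]) -> None:
        """Compute every registered feature for each raw record and store it."""
        for rec in records:
            for name, fn in self.definitions.items():
                series = self.values.setdefault((name, rec["entity_id"]), [])
                series.append((rec["ts"], fn(rec)))
                series.sort(key=lambda pair: pair[0])

    def get(self, name: str, entity_id: str, as_of: int) -> Optional[float]:
        """Return the latest stored value with timestamp <= as_of, or None."""
        series = self.values.get((name, entity_id), [])
        idx = bisect_right([ts for ts, _ in series], as_of)
        return series[idx - 1][1] if idx else None


# Feature engineering happens once, in one place.
store = ToyFeatureStore()
store.register("order_total", lambda r: r["qty"] * r["unit_price"])
store.materialize([
    {"entity_id": "c1", "ts": 1, "qty": 2, "unit_price": 50.0},
    {"entity_id": "c1", "ts": 5, "qty": 1, "unit_price": 80.0},
])

# A training pipeline reads the value as it was at ts=3 (ignores the later ts=5 row).
train_value = store.get("order_total", "c1", as_of=3)
# A serving pipeline reuses the same definition and reads the latest value.
serve_value = store.get("order_total", "c1", as_of=10)
print(train_value, serve_value)  # 100.0 80.0
```

The timestamp-aware `get` is a simplified stand-in for the point-in-time lookups that feature stores provide, so training data never sees values recorded after the requested time. In the pipelines the abstract describes, Apache Airflow would schedule steps such as `materialize`; that orchestration is not modeled in this sketch.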
URI:	http://ir-ithesis.swu.ac.th/dspace/handle/123456789/3417
Appears in Collections:	Faculty of Science

Files in This Item:

File	Description	Size	Format
gs661160177.pdf		3.1 MB	Adobe PDF
