COMPARATIVE STUDY ON TRANSFORMER-BASED MODEL FOR FAKE IMAGE DETECTION

Please use this identifier to cite or link to this item: http://ir-ithesis.swu.ac.th/dspace/handle/123456789/3410

Title:	COMPARATIVE STUDY ON TRANSFORMER-BASED MODEL FOR FAKE IMAGE DETECTION
Authors:	BOOTSAGORN DECHAPHONG (บุษกร เดชาพงษ์); Ratchainant Thammasudjarit (รัตน์ชัยนันท์ ธรรมสุจริต); Srinakharinwirot University; eakapan.boonserm@g.swu.ac.th
Keywords:	Fake Image; Deepfake; Fake Image Detection; Transformer Neural Network; Deep Learning
Issue Date:	18
Publisher:	Srinakharinwirot University
Abstract:	Deepfake technology is advancing quickly, increasing the risk of digital misinformation and fake news. This study has two goals: (1) to develop a model that detects Deepfake-generated images and distinguishes them from real ones, and (2) to compare the performance of three Transformer-based neural networks, Vision Transformer (ViT), Data-efficient Image Transformer (DeiT), and Swin Transformer, on this classification task. We randomly sampled 10% of the “140k Real and Fake Faces” dataset from Kaggle, giving 14,000 images (7,000 real and 7,000 fake), and divided them into training, validation, and test sets. Before training, all images were resized to 224×224 pixels, and data augmentation was applied to reduce overfitting. Training was initialized from pre-trained models. Performance was measured using accuracy, precision, recall, and F1-score. Swin Transformer achieved the highest accuracy at 0.9915, followed by ViT (0.9720) and DeiT (0.9525), indicating that Swin Transformer was the most effective of the three at learning image features and distinguishing real from fake images. Future research should consider improving image quality before testing, expanding the size and diversity of the dataset, and tuning model parameters to further improve Deepfake detection.
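The four evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function name `binary_metrics`, the encoding of "fake" as label 1 (the positive class), and the toy label lists are all assumptions made for illustration, and the numbers below are not results from the thesis.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = 1) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 for a binary classifier.

    Assumption (not stated in the source): `positive` marks the "fake" class.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four confusion-matrix cells.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)

    total = tp + fp + fn + tn
    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy example (hypothetical labels): 1 = fake, 0 = real.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

On a balanced test set such as the one described (equal numbers of real and fake images), accuracy is a reasonable headline figure; precision and recall additionally show whether errors fall mostly on real images flagged as fake or on fake images that were missed.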
URI:	http://ir-ithesis.swu.ac.th/dspace/handle/123456789/3410
Appears in Collections:	Faculty of Science

Files in This Item:

File	Description	Size	Format
gs661160163.pdf		2.87 MB	Adobe PDF
