Object Detection Using Faster R-CNN and YOLO
A comparison of Faster R-CNN and YOLOv8 for object detection on a custom dataset with four classes: laptop, mouse, keyboard, and utensils.
This project implements and compares two widely used object detection models, Faster R-CNN and YOLOv8, for the Duke AIPI 590 Applied Computer Vision course. Both models are trained and evaluated on the same custom dataset and compared on accuracy (mAP), inference speed, and model size.
Dataset
- Total Images: 362 manually annotated images
- Classes: Laptop, Mouse, Keyboard, Utensils
- Annotation Tool: CVAT for precise manual labeling
- Data Quality: Consistent annotations across both YOLOv8 and Faster R-CNN formats
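Keeping annotations consistent across the two formats mostly comes down to converting box coordinates. YOLO label files store each box as a normalized center point plus width and height, while Faster R-CNN pipelines commonly expect absolute pixel corners. The sketch below shows that conversion in both directions. The function names are hypothetical, and the exact target format used in this project is not stated, so treat the pixel-corner layout as an assumption.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def yolo_to_xyxy(cx: float, cy: float, w: float, h: float,
                 img_w: int, img_h: int) -> Box:
    """Convert a normalized YOLO box (center x, center y, width, height)
    to absolute pixel corners (x_min, y_min, x_max, y_max)."""
    x_min = (cx - w / 2) * img_w
    y_min = (cy - h / 2) * img_h
    x_max = (cx + w / 2) * img_w
    y_max = (cy + h / 2) * img_h
    return (x_min, y_min, x_max, y_max)


def xyxy_to_yolo(x_min: float, y_min: float, x_max: float, y_max: float,
                 img_w: int, img_h: int) -> Box:
    """Convert absolute pixel corners back to a normalized YOLO box."""
    cx = (x_min + x_max) / 2 / img_w
    cy = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return (cx, cy, w, h)
```

A round trip through both functions should return the original box, which makes a cheap sanity check when exporting the same CVAT annotations to two formats.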
Models Implemented
YOLOv8
- Architecture: Real-time, single-stage object detector
- Training: 30 epochs with 5-epoch update intervals
- Performance: mAP@0.5: 0.858, mAP@0.5:0.95: 0.641
- Inference Speed: ~3.5 ms per image
- Model Size: 5.26 MB
- Deployment: Exported to ONNX format for cross-platform compatibility
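A training, validation, and ONNX export run along these lines might look as follows with the Ultralytics Python package. This is a minimal sketch, not the project's actual script: the `data.yaml` path, the `yolov8n.pt` starting checkpoint, and the 640-pixel image size are assumptions, and only the 30-epoch setting comes from the description above.

```python
def train_and_export(data_yaml: str = "data.yaml",
                     weights: str = "yolov8n.pt",
                     epochs: int = 30) -> str:
    """Fine-tune a YOLOv8 model, report mAP, and export it to ONNX.

    data_yaml and weights are placeholder values (assumptions), not the
    paths used in the original project.
    """
    # Imported inside the function so this file can be loaded even when
    # the ultralytics package is not installed.
    from ultralytics import YOLO

    model = YOLO(weights)
    model.train(data=data_yaml, epochs=epochs, imgsz=640)

    # Validation returns a metrics object; box.map50 is mAP@0.5 and
    # box.map is mAP@0.5:0.95.
    metrics = model.val()
    print(f"mAP@0.5: {metrics.box.map50:.3f}, "
          f"mAP@0.5:0.95: {metrics.box.map:.3f}")

    # export() writes the ONNX file and returns its path.
    return model.export(format="onnx")
```

Calling `train_and_export()` requires the `ultralytics` package and a dataset YAML that lists the four classes and the image directories.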
Faster R-CNN
- Architecture: Two-stage detector with ResNet-50 backbone and Feature Pyramid Network (FPN)
- Training: 3000 iterations with gradient clipping
- Performance: mAP@0.5: 0.709, mAP@0.5:0.95: 0.521
- Inference Speed: ~60 ms per image
- Model Size: 314.85 MB
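For illustration, a single Faster R-CNN training iteration with gradient clipping could be written as below. The sketch assumes a torchvision-based setup, which the description does not confirm; the project may have used a different framework. The `max_norm` value and the helper names are hypothetical. The class count of 5 reflects the four object classes plus the background class that torchvision's detection models reserve.

```python
def build_model(num_classes: int = 5):
    """Build a ResNet-50 FPN Faster R-CNN with a new box predictor head.

    num_classes = 4 object classes + 1 background class (torchvision
    convention). Starting from pretrained weights is an assumption.
    """
    # Imported lazily so the file loads without torchvision installed.
    import torchvision
    from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    return model


def train_step(model, optimizer, images, targets, max_norm: float = 1.0) -> float:
    """Run one optimization step with gradient-norm clipping.

    images: list of image tensors; targets: list of dicts with "boxes"
    (pixel x_min, y_min, x_max, y_max) and "labels".
    max_norm is an illustrative value, not the project's setting.
    """
    import torch

    model.train()
    # In training mode the model returns a dict of individual losses.
    loss_dict = model(images, targets)
    loss = sum(loss_dict.values())

    optimizer.zero_grad()
    loss.backward()
    # Rescale gradients so their global L2 norm does not exceed max_norm.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=max_norm)
    optimizer.step()
    return loss.item()
```

Clipping is applied after `backward()` and before `optimizer.step()`, so an occasional very large gradient cannot destabilize the update.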
Key Results
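On this dataset, the reported numbers favor YOLOv8 on every metric:
- YOLOv8: Higher accuracy (mAP@0.5 of 0.858 vs. 0.709; mAP@0.5:0.95 of 0.641 vs. 0.521), roughly 17x faster inference (~3.5 ms vs. ~60 ms per image), and a roughly 60x smaller model (5.26 MB vs. 314.85 MB)
- Faster R-CNN: Lower accuracy in this setup, with a much larger model and slower inference
- Speed vs. Accuracy: The trade-off often expected between single-stage and two-stage detectors did not appear here; YOLOv8 led on both. The result applies to this 362-image dataset and the training budgets listed above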
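The figures reported in the model sections can be turned into direct ratios with a few lines of arithmetic. The sketch below uses only the numbers listed above; the dictionary layout and the `compare` helper are hypothetical, and the frames-per-second values assume images are processed one at a time at the reported latency.

```python
# Reported figures from the model sections above.
yolo = {"latency_ms": 3.5, "size_mb": 5.26, "map50": 0.858, "map50_95": 0.641}
frcnn = {"latency_ms": 60.0, "size_mb": 314.85, "map50": 0.709, "map50_95": 0.521}


def compare(a: dict, b: dict) -> dict:
    """Summarize how model a compares to model b."""
    return {
        "speedup": b["latency_ms"] / a["latency_ms"],      # how many times faster a is
        "size_ratio": b["size_mb"] / a["size_mb"],         # how many times smaller a is
        "fps_a": 1000.0 / a["latency_ms"],                 # images per second, one at a time
        "fps_b": 1000.0 / b["latency_ms"],
        "map50_gap": a["map50"] - b["map50"],              # positive means a is more accurate
        "map50_95_gap": a["map50_95"] - b["map50_95"],
    }


summary = compare(yolo, frcnn)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

With these inputs, YOLOv8 comes out about 17 times faster and about 60 times smaller, with mAP gaps of about 0.149 and 0.120 in its favor.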
Technical Contributions
- Custom dataset creation and annotation
- Comprehensive model training and evaluation pipeline
- Performance benchmarking across multiple metrics
- Cross-platform deployment via ONNX export (YOLOv8)
- Detailed comparative analysis of modern object detection architectures
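The mAP@0.5 and mAP@0.5:0.95 metrics used in the benchmarking both rest on intersection over union (IoU): a predicted box counts as a match when its IoU with a ground-truth box of the same class reaches a threshold. mAP@0.5 uses a single threshold of 0.5, while mAP@0.5:0.95 averages over thresholds from 0.5 to 0.95 in steps of 0.05. The sketch below shows only the IoU and thresholding step, not a full mAP computation; the function names are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels

# The ten IoU thresholds averaged by mAP@0.5:0.95.
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred: Box, truth: Box, threshold: float = 0.5) -> bool:
    """True when the prediction overlaps the ground truth enough to count."""
    return iou(pred, truth) >= threshold
```

A prediction that passes at 0.5 can still fail at the stricter thresholds, which is why mAP@0.5:0.95 is lower than mAP@0.5 for both models.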
Together, these pieces form an end-to-end workflow, from data annotation through training, evaluation, and export, and give a concrete comparison of single-stage and two-stage detection on a small custom dataset.