DocType - Document Image Classification
A lightweight MobileNetV3-Large document classifier that sorts document images into 7 distinct categories. It is optimized for production deployment in ONNX format.
Model Overview
This model classifies document images into the following categories:
| Category | Description |
|---|---|
| chart | Charts, graphs, and data visualizations |
| diagram | Flowcharts, diagrams, and technical drawings |
| document_handwritten | Handwritten documents and notes |
| document_printed | Printed text documents |
| map | Maps and geographic visualizations |
| photo | Photographs and natural images |
| screenshot | Screenshots and screen captures |
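
A minimal sketch of how a raw model output could be mapped to one of the categories above. It assumes the model emits one score (logit) per class and that class indices follow the alphabetical order of the table. Neither is confirmed by this card, so check the label order shipped with the model before relying on it. `CLASS_NAMES`, `softmax`, and `decode_prediction` are illustrative names, not part of the released model.

```python
import math

# ASSUMPTION: label order is alphabetical, matching the table above.
CLASS_NAMES = [
    "chart",
    "diagram",
    "document_handwritten",
    "document_printed",
    "map",
    "photo",
    "screenshot",
]


def softmax(scores):
    """Convert raw scores to probabilities (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decode_prediction(scores):
    """Return (label, probability) for the highest-scoring class."""
    if len(scores) != len(CLASS_NAMES):
        raise ValueError(
            f"expected {len(CLASS_NAMES)} scores, got {len(scores)}"
        )
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASS_NAMES[best], probs[best]
```

For example, `decode_prediction([0.1, 0.2, 0.0, 3.5, 0.3, 0.1, 0.4])` picks `document_printed`, because index 3 holds the largest score. If the exported model already outputs probabilities, the `softmax` step can be dropped.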
Performance
Model Metrics
- Architecture: MobileNetV3-Large (transfer learning + fine-tuning)
- Input Size: 320×320 pixels
- Parameters: ~5.4M (lightweight and efficient)
- Inference Time: ~10-30ms on CPU (depending on hardware)
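
Because the latency figure above depends on hardware, it is worth measuring on the target machine. The helper below is a generic timing sketch, not the benchmark used to produce the number in this card. `measure_latency_ms` is a hypothetical name, and `run_once` stands in for whatever zero-argument callable performs a single inference (for example, a wrapper around an ONNX Runtime session call).

```python
import statistics
import time


def measure_latency_ms(run_once, warmup=5, repeats=50):
    """Time a zero-argument callable and report latency statistics in ms."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")

    # Warm-up runs let caches and lazy initialisation settle before timing.
    for _ in range(warmup):
        run_once()

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)

    samples.sort()
    p95_index = min(len(samples) - 1, int(0.95 * len(samples)))
    return {
        "median_ms": statistics.median(samples),
        "p95_ms": samples[p95_index],
        "min_ms": samples[0],
        "max_ms": samples[-1],
    }
```

Reporting the median alongside a high percentile tends to be more informative than a single average, since occasional slow runs can skew the mean.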
Training Details
- Dataset Size: 21,000 images (17,500 train / 2,100 val / 1,400 test)
- Training Strategy:
  - Phase 1: Transfer learning with frozen base (40 epochs)
  - Phase 2: Fine-tuning entire model (20 epochs)
- Data Augmentation: Rotation, shifts, zoom, brightness variation
- Optimizer: Adam (lr=0.001 → 1e-5 for fine-tuning)
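
The split sizes and the two-phase schedule above can be written down as plain data, which makes the arithmetic easy to check. This is a framework-independent sketch rather than the original training script. It assumes the learning rate is constant within each phase (0.001 while the base is frozen, 1e-5 during fine-tuning), which is one reading of the optimizer line. `Phase`, `PHASES`, `split_fractions`, and `phase_for_epoch` are illustrative names.

```python
from dataclasses import dataclass

# Split sizes as stated in the training details.
SPLITS = {"train": 17_500, "val": 2_100, "test": 1_400}


@dataclass(frozen=True)
class Phase:
    name: str
    epochs: int
    learning_rate: float  # ASSUMPTION: held constant within the phase
    base_trainable: bool  # False = backbone frozen, only the head trains


PHASES = [
    Phase("transfer_learning", epochs=40, learning_rate=1e-3, base_trainable=False),
    Phase("fine_tuning", epochs=20, learning_rate=1e-5, base_trainable=True),
]


def split_fractions(splits):
    """Return each split's share of the full dataset."""
    total = sum(splits.values())
    return {name: count / total for name, count in splits.items()}


def phase_for_epoch(epoch, phases=PHASES):
    """Return the phase active at a 0-based global epoch index."""
    start = 0
    for phase in phases:
        if start <= epoch < start + phase.epochs:
            return phase
        start += phase.epochs
    raise ValueError(f"epoch {epoch} is outside the {start}-epoch schedule")
```

`split_fractions(SPLITS)` works out to roughly 83.3% train, 10% validation, and 6.7% test, and `phase_for_epoch(40)` is the first fine-tuning epoch, where the whole network becomes trainable at the lower learning rate.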
Citation
If you use this model in your research or project, please cite it.