ash12321 committed (verified)
Commit cab54a3 · Parent: cc156cd

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md +74 -232

README.md CHANGED
@@ -1,142 +1,43 @@
- ---
- license: mit
- tags:
- - pytorch
- - autoencoder
- - deepfake-detection
- - cifar10
- - computer-vision
- - image-reconstruction
- - anomaly-detection
- datasets:
- - cifar10
- metrics:
- - mse
- library_name: pytorch
- pipeline_tag: image-feature-extraction
- ---
-
  # Residual Convolutional Autoencoder for Deepfake Detection

- ## Model Description
-
- This is a **5-stage Residual Convolutional Autoencoder** trained on CIFAR-10 for high-quality image reconstruction and deepfake detection. The model achieves exceptional reconstruction quality (Test MSE: 0.004290) with **100% detection rate** on out-of-distribution images at calibrated thresholds.
-
- ### Key Features
-
- ✨ **Exceptional Performance**: 98.4% loss reduction during training
- 🎯 **Perfect Detection**: 100% TPR with calibrated thresholds
- 🚀 **Fast Inference**: ~3,600 samples/sec on H100
- 📊 **Calibrated Thresholds**: Real thresholds from distribution analysis
- 📦 **Complete Package**: Model + thresholds + examples + docs
-
- ### Architecture
-
- - **Encoder**: 5 downsampling stages (128→64→32→16→8→4) with residual blocks
- - **Latent Dimension**: 512
- - **Decoder**: 5 upsampling stages with residual blocks
- - **Total Parameters**: 34,849,667
- - **Input Size**: 128x128x3 (RGB images)
- - **Output Range**: [-1, 1] (Tanh activation)
-
- ## Training Details
-
- ### Training Data
- - **Dataset**: CIFAR-10 (50,000 training images, 10,000 test images)
- - **Image Size**: Resized to 128x128
- - **Normalization**: Mean=0.5, Std=0.5 (range [-1, 1])
-
- ### Training Configuration
- - **GPU**: NVIDIA H100 80GB HBM3
- - **Batch Size**: 1024
- - **Optimizer**: AdamW (lr=1e-3, weight_decay=1e-5)
- - **Loss Function**: MSE (Mean Squared Error)
- - **Scheduler**: ReduceLROnPlateau (factor=0.5, patience=5)
- - **Epochs**: 100
- - **Training Time**: ~26 minutes
-
- ### Training Results
- - **Initial Validation Loss**: 0.266256 (Epoch 1)
- - **Final Validation Loss**: 0.004294 (Epoch 100)
- - **Final Test Loss**: 0.004290
- - **Improvement**: 98.4% reduction in loss
-
- ## Performance
-
- ### Reconstruction Quality
-
- | Metric | Value |
- |--------|-------|
- | Test MSE Loss | 0.004290 |
- | Validation MSE Loss | 0.004294 |
- | Training Time | 26.24 minutes |
- | Parameters | 34,849,667 |
- | GPU Memory | ~40GB peak |
- | Throughput | ~3,600 samples/sec |

- ### Detection Performance (Calibrated on Random Noise vs CIFAR-10)

- | Distribution | Mean Error | Median Error | Error Ratio |
- |-------------|-----------|--------------|-------------|
- | **Real Images (CIFAR-10)** | 0.004293 | 0.003766 | 1.00x |
- | **Fake Images (Random Noise)** | 0.401686 | 0.401680 | **93.56x** |
-
- **Separation Quality**: 93.56x ratio demonstrates excellent discrimination capability!
-
- ## Calibrated Detection Thresholds
-
- These thresholds are **scientifically calibrated** based on actual error distributions:
-
- | Threshold | MSE Value | True Positive Rate | False Positive Rate | Use Case |
- |-----------|-----------|-------------------|---------------------|----------|
- | **Strict** | 0.012768 | 100.0% | 1.0% | High-stakes verification |
- | **Balanced** | 0.009066 | 100.0% | 5.0% | General detection |
- | **Sensitive** | 0.009319 | 100.0% | 4.5% | Screening applications |
- | **Optimal** | 0.204039 | 100.0% | 0.0% | Maximum separation |
-
- 💡 **All thresholds achieve 100% detection** on out-of-distribution images while maintaining low false positive rates on real images.
-
- See `thresholds_calibrated.json` for complete calibration data and statistics.

  ## Quick Start
-
- ### Installation
-
- ```bash
- pip install torch torchvision huggingface_hub pillow
- ```
-
- ### Basic Usage
-
  ```python
  from huggingface_hub import hf_hub_download
- from model import load_model
  import torch
  from torchvision import transforms
  from PIL import Image
  import json

  # Download model and thresholds
  checkpoint_path = hf_hub_download(
-     repo_id="ash12321/deepfake-autoencoder-cifar10-v2",
-     filename="model_best_checkpoint.ckpt"
  )
-
- thresholds_path = hf_hub_download(
-     repo_id="ash12321/deepfake-autoencoder-cifar10-v2",
      filename="thresholds_calibrated.json"
  )

  # Load model
  device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
- model = load_model(checkpoint_path, device)

- # Load calibrated thresholds
- with open(thresholds_path, 'r') as f:
-     config = json.load(f)
- threshold = config['reconstruction_thresholds']['thresholds']['balanced']['value']
-
- print(f"Using threshold: {threshold:.6f}")

  # Prepare image
  transform = transforms.Compose([
@@ -146,142 +47,83 @@ transform = transforms.Compose([
  ])

  image = Image.open("your_image.jpg").convert('RGB')
- input_tensor = transform(image).unsqueeze(0).to(device)

- # Detect deepfake
  with torch.no_grad():
-     error = model.reconstruction_error(input_tensor, reduction='none')
-
- is_fake = error.item() > threshold
- print(f"Image is {'FAKE' if is_fake else 'REAL'}")
- print(f"Reconstruction error: {error.item():.6f}")
- print(f"Threshold: {threshold:.6f}")
  ```

- ## Reconstruction Examples
-
- ![Reconstruction Comparison](reconstruction_comparison.png)

- Original CIFAR-10 images (top) vs reconstructions (bottom) showing excellent quality.

- ![Threshold Calibration](threshold_calibration.png)

- Error distribution analysis showing clear separation between real and fake images.

- ## Files in This Repository

- - `model_best_checkpoint.ckpt` - Trained model weights (621 MB)
- - `model.py` - Model architecture and utilities
- - `thresholds_calibrated.json` - **Real calibrated thresholds** with statistics
- - `inference_example.py` - Complete working examples
- - `reconstruction_comparison.png` - CIFAR-10 reconstruction quality
- - `threshold_calibration.png` - Distribution analysis visualization
- - `config.json` - Model metadata
-
- ## Advanced Usage
-
- ### Using Calibrated Thresholds
-
- ```python
- import json
-
- # Load all threshold options
- with open('thresholds_calibrated.json', 'r') as f:
-     config = json.load(f)
-
- thresholds = config['reconstruction_thresholds']['thresholds']
-
- # Choose based on your use case
- strict_threshold = thresholds['strict']['value']      # 1% FPR
- balanced_threshold = thresholds['balanced']['value']  # 5% FPR
- optimal_threshold = thresholds['optimal']['value']    # 0% FPR
-
- print(f"Strict (99th percentile): {strict_threshold:.6f}")
- print(f"Balanced (95th percentile): {balanced_threshold:.6f}")
- print(f"Optimal (max separation): {optimal_threshold:.6f}")
- ```
-
- ### Batch Processing
-
- ```python
- # Process multiple images efficiently
- images = torch.stack([transform(Image.open(f)) for f in image_paths])
- images = images.to(device)
-
- with torch.no_grad():
-     errors = model.reconstruction_error(images, reduction='none')
-     fake_mask = errors > threshold
-
- num_fakes = fake_mask.sum().item()
- print(f"Detected {num_fakes}/{len(image_paths)} potential fakes")
-
- # Print individual results
- for path, error, is_fake in zip(image_paths, errors, fake_mask):
-     status = "FAKE" if is_fake else "REAL"
-     print(f"{path}: {status} (error: {error:.6f})")
- ```

- ### Calibration Statistics

- The model was calibrated using:
- - **Real Images**: CIFAR-10 test set (10,000 images)
- - **Fake Images**: Random noise (10,000 synthetic samples)
- - **Mean Separation**: 93.56x ratio
- - **Perfect Discrimination**: 100% TPR at all thresholds

- ## Applications

- - ✅ **Deepfake Detection**: 100% detection on out-of-distribution images
- - ✅ **Anomaly Detection**: Identify unusual or manipulated images
- - ✅ **Quality Assessment**: Measure image quality through reconstruction
- - ✅ **Feature Extraction**: 512-D latent representations
- - ✅ **Image Compression**: Compress to latent space
- - ✅ **Domain Shift Detection**: Identify distribution changes

- ## Limitations & Recommendations

- ### Limitations
- - Trained on CIFAR-10 (32x32 upscaled to 128x128)
- - Thresholds calibrated on random noise (not real deepfakes)
- - Performance may vary on high-resolution images
- - Requires fine-tuning for specific deepfake detection tasks

- ### Recommendations
- - **For Production**: Recalibrate thresholds on your target distribution
- - **For High-Res Images**: Consider fine-tuning on larger images
- - **For Real Deepfakes**: Calibrate with actual deepfake datasets
- - **For Best Results**: Use ensemble with other detection methods

  ## Citation
-
- If you use this model in your research, please cite:
-
  ```bibtex
- @misc{deepfake-autoencoder-cifar10-v2,
-   author = {ash12321},
-   title = {Residual Convolutional Autoencoder for Deepfake Detection},
-   year = {2024},
-   publisher = {HuggingFace},
-   howpublished = {\url{https://huggingface.co/ash12321/deepfake-autoencoder-cifar10-v2}}
  }
  ```

  ## License

- MIT License - See LICENSE file for details
-
- ## Model Card Authors
-
- - **ash12321**
-
- ## Acknowledgments
-
- - Trained on NVIDIA H100 80GB HBM3
- - Built with PyTorch 2.5.1
- - Thresholds calibrated using distribution analysis
-
- ---
-
- *Model trained and calibrated on December 08, 2025*
-
- **Status**: ✅ Production Ready with Calibrated Thresholds

  # Residual Convolutional Autoencoder for Deepfake Detection

+ Multi-dataset trained model with **19.18x separation** between real and fake images.

+ ## Model Performance

+ - **Training Time**: 21.4 minutes on an H200 GPU
+ - **Best Validation Loss**: 0.007970 (Epoch 29)
+ - **Anomaly Separation**: 19.18x (fake images have ~19x higher reconstruction error than real images)
+ - **Datasets**: CIFAR-10, CIFAR-100, STL-10 (205,000 training images)

  ## Quick Start
  ```python
  from huggingface_hub import hf_hub_download
  import torch
+ from model import ResidualConvAutoencoder
  from torchvision import transforms
  from PIL import Image
  import json

  # Download model and thresholds
  checkpoint_path = hf_hub_download(
+     repo_id="ash12321/deepfake-autoencoder-cifar10-v2",
+     filename="model_universal_best.ckpt"
  )
+ threshold_path = hf_hub_download(
+     repo_id="ash12321/deepfake-autoencoder-cifar10-v2",
      filename="thresholds_calibrated.json"
  )

  # Load model
  device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+ model = ResidualConvAutoencoder(latent_dim=512, dropout=0.1).to(device)
+ checkpoint = torch.load(checkpoint_path, map_location=device)
+ model.load_state_dict(checkpoint['model_state_dict'])
+ model.eval()

+ # Load calibrated thresholds
+ with open(threshold_path) as f:
+     thresholds = json.load(f)

  # Prepare image
  transform = transforms.Compose([

  ])

  image = Image.open("your_image.jpg").convert('RGB')
+ image_tensor = transform(image).unsqueeze(0).to(device)

+ # Get reconstruction error
  with torch.no_grad():
+     error = model.reconstruction_error(image_tensor)
+ error_value = error.item()
+ print(f"Reconstruction error: {error_value:.6f}")
+
+ # Check against threshold (balanced mode)
+ balanced_threshold = thresholds['reconstruction_thresholds']['thresholds']['balanced']['value']
+ if error_value > balanced_threshold:
+     print("⚠️ Potential deepfake detected!")
+ else:
+     print("✅ Image appears authentic")
  ```
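
+ For scoring many files at once, the sketch below adapts the batch-processing example from the previous revision of this README to the new checkpoint. It is a minimal sketch, not a file in this repository: it reuses `model`, `transform`, `device`, and `balanced_threshold` from the snippet above and assumes `reconstruction_error` accepts `reduction='none'` to return one error per image, as the earlier examples did.
+
+ ```python
+ import torch
+ from PIL import Image
+
+ image_paths = ["img1.jpg", "img2.jpg", "img3.jpg"]  # hypothetical file names
+
+ # Preprocess each image and stack into one batch tensor
+ batch = torch.stack([transform(Image.open(p).convert('RGB')) for p in image_paths]).to(device)
+
+ with torch.no_grad():
+     # Per-image reconstruction errors (assumes reduction='none' is supported)
+     errors = model.reconstruction_error(batch, reduction='none')
+
+ for path, err in zip(image_paths, errors):
+     verdict = "potential fake" if err.item() > balanced_threshold else "appears authentic"
+     print(f"{path}: {verdict} (error: {err.item():.6f})")
+ ```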
 
+ ## Detection Thresholds

+ Three calibrated threshold levels (a selection sketch follows the table):

+ | Mode | Threshold (reconstruction error) | False Positive Rate | Description |
+ |------|----------------------------------|---------------------|-------------|
+ | **Strict** | 0.055737 | ~1% | Very low false positives |
+ | **Balanced** | 0.039442 | ~5% | Recommended for general use |
+ | **Sensitive** | ~0.039 | ~2.5% | More sensitive detection |
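
+ To switch modes, read the matching entry from `thresholds_calibrated.json`. A minimal selection sketch, assuming each mode in the table is stored under a lowercase key with the same layout the Quick Start uses for `balanced`:
+
+ ```python
+ import json
+
+ with open("thresholds_calibrated.json") as f:
+     config = json.load(f)
+
+ modes = config['reconstruction_thresholds']['thresholds']
+
+ # Pick the mode that matches your tolerance for false positives
+ for name in ("strict", "balanced", "sensitive"):
+     if name in modes:  # key names assumed to mirror the table
+         print(f"{name}: {modes[name]['value']:.6f}")
+
+ threshold = modes['balanced']['value']  # recommended default
+ ```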
 
+ ## Model Architecture

+ - **Encoder**: 5 downsampling blocks (128→64→32→16→8→4)
+ - **Latent Space**: 512 dimensions
+ - **Decoder**: 5 upsampling blocks (4→8→16→32→64→128)
+ - **Residual Blocks**: Skip connections with dropout (0.1); an illustrative block is sketched below
+ - **Total Parameters**: ~40M
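
+ The exact implementation lives in `model.py`; the sketch below only illustrates the kind of residual downsampling block the bullets describe (a stride-2 convolution that halves the spatial size, a projection skip connection, and dropout 0.1). Layer choices and channel widths here are illustrative assumptions, not the repository's code.
+
+ ```python
+ import torch.nn as nn
+
+ class ResidualDownBlock(nn.Module):
+     """Illustrative residual block: halves H and W while keeping a skip path."""
+     def __init__(self, in_ch, out_ch, dropout=0.1):
+         super().__init__()
+         self.body = nn.Sequential(
+             nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
+             nn.BatchNorm2d(out_ch),
+             nn.ReLU(inplace=True),
+             nn.Dropout2d(dropout),
+             nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
+             nn.BatchNorm2d(out_ch),
+         )
+         # 1x1 projection so the skip path matches the downsampled shape
+         self.skip = nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=2)
+         self.act = nn.ReLU(inplace=True)
+
+     def forward(self, x):
+         return self.act(self.body(x) + self.skip(x))
+ ```
+
+ Five such blocks take a 128x128 input down to a 4x4 feature map, which the encoder maps to the 512-dimensional latent space; the decoder mirrors this with five upsampling blocks back to 128x128.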
 
+ ## Training Details

+ - **Epochs**: 30 (best at epoch 29)
+ - **Batch Size**: 1024
+ - **Optimizer**: AdamW (lr=1e-4, weight_decay=1e-5)
+ - **Scheduler**: Cosine annealing with warm restarts
+ - **Data Augmentation**: Horizontal flip, color jitter
+ - **Mixed Precision**: AMP enabled (training-loop sketch below)
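
+ A minimal training-loop sketch wiring these settings together. `model` and `train_loader` are placeholders for the autoencoder and the combined CIFAR-10/CIFAR-100/STL-10 loader, the MSE objective follows the previous revision of this README, and the scheduler's restart period (`T_0`) is an assumption since it is not documented here:
+
+ ```python
+ import torch
+
+ device = torch.device('cuda')  # training used a CUDA GPU
+ optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-5)
+ scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=10)  # T_0 assumed
+ scaler = torch.amp.GradScaler()
+ criterion = torch.nn.MSELoss()
+
+ for epoch in range(30):
+     for images, _ in train_loader:  # labels are unused by the autoencoder
+         images = images.to(device)
+         optimizer.zero_grad(set_to_none=True)
+         with torch.amp.autocast(device_type='cuda'):  # mixed precision (AMP)
+             reconstruction = model(images)
+             loss = criterion(reconstruction, images)
+         scaler.scale(loss).backward()
+         scaler.step(optimizer)
+         scaler.update()
+     scheduler.step()
+ ```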
 
+ ## Statistics

+ ### Real Images
+ - Mean error: 0.018391
+ - Median error: 0.015647
+ - Std: 0.010279
+ - 95th percentile: 0.039442
+ - 99th percentile: 0.055737

+ ### Fake Images (Synthetic)
+ - Mean error: 0.352695
+ - Median error: 0.347151

+ **Separation Ratio**: 19.18x 🎯 (mean fake error / mean real error; calibration sketch below)
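
+ The strict and balanced thresholds in the table above are the 99th and 95th percentiles of the real-image error distribution (0.055737 and 0.039442, matching the percentiles listed under Real Images), and the separation ratio is the mean fake error divided by the mean real error. A minimal calibration sketch; the error arrays here are synthetic placeholders for errors actually measured with the model:
+
+ ```python
+ import numpy as np
+
+ # Placeholders standing in for per-image reconstruction errors measured
+ # on real and fake image sets with the trained model.
+ rng = np.random.default_rng(0)
+ real_errors = rng.normal(0.0184, 0.0103, 10_000).clip(min=0.0)
+ fake_errors = rng.normal(0.3527, 0.02, 10_000)
+
+ strict_threshold = np.percentile(real_errors, 99)     # ~1% false positive rate
+ balanced_threshold = np.percentile(real_errors, 95)   # ~5% false positive rate
+ separation = fake_errors.mean() / real_errors.mean()  # reported as 19.18x above
+
+ print(f"strict:     {strict_threshold:.6f}")
+ print(f"balanced:   {balanced_threshold:.6f}")
+ print(f"separation: {separation:.2f}x")
+ ```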
 
+ ## Files

+ - `model_universal_best.ckpt` - Full checkpoint (418 MB)
+ - `thresholds_calibrated.json` - Calibrated thresholds
+ - `model.py` - Model architecture
+ - `config.json` - Training configuration
+ - `README.md` - This file
 
  ## Citation

  ```bibtex
+ @misc{deepfake_autoencoder_2024,
+   title={Residual Convolutional Autoencoder for Deepfake Detection},
+   author={ash12321},
+   year={2024},
+   publisher={HuggingFace},
+   url={https://huggingface.co/ash12321/deepfake-autoencoder-cifar10-v2}
  }
  ```

  ## License

+ MIT License