Residual Convolutional Autoencoder for Deepfake Detection
A residual convolutional autoencoder trained on multiple datasets. Fake images show 19.18x the mean reconstruction error of real images.
Model Performance
- Training Time: 21.4 minutes on H200 GPU
- Best Validation Loss: 0.007970 (Epoch 29)
- Anomaly Separation: 19.18x (mean reconstruction error on fake images is about 19x that on real images)
- Datasets: CIFAR-10, CIFAR-100, STL-10 (205,000 training images)
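The 205,000 figure is consistent with combining the standard training splits of the three datasets, provided the STL-10 unlabeled split is included. Which splits were actually used is an assumption here, not something this README states; the split sizes below are the published ones.

```python
# Published split sizes; the choice of splits is an assumption.
SPLIT_SIZES = {
    "CIFAR-10 train": 50_000,
    "CIFAR-100 train": 50_000,
    "STL-10 train": 5_000,
    "STL-10 unlabeled": 100_000,
}

total_images = sum(SPLIT_SIZES.values())
print(f"Total training images: {total_images:,}")
```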
Quick Start
from huggingface_hub import hf_hub_download
import torch
# model.py is one of the repository files (see Files); it must be on the import path
from model import ResidualConvAutoencoder
from torchvision import transforms
from PIL import Image
import json
# Download model and thresholds
checkpoint_path = hf_hub_download(
    repo_id="ash12321/deepfake-autoencoder-cifar10-v2",
    filename="model_universal_best.ckpt"
)
threshold_path = hf_hub_download(
    repo_id="ash12321/deepfake-autoencoder-cifar10-v2",
    filename="thresholds_calibrated.json"
)
# Load model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = ResidualConvAutoencoder(latent_dim=512, dropout=0.1).to(device)
checkpoint = torch.load(checkpoint_path, map_location=device)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()
# Load thresholds
with open(threshold_path) as f:
    thresholds = json.load(f)
# Prepare image
transform = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
])
image = Image.open("your_image.jpg").convert('RGB')
image_tensor = transform(image).unsqueeze(0).to(device)
# Get reconstruction error
with torch.no_grad():
    error = model.reconstruction_error(image_tensor)
error_value = error.item()
print(f"Reconstruction error: {error_value:.6f}")
# Check against threshold (balanced mode)
balanced_threshold = thresholds['reconstruction_thresholds']['thresholds']['balanced']['value']
if error_value > balanced_threshold:
    print("⚠️ Potential deepfake detected!")
else:
    print("✅ Image appears authentic")
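The threshold check can be wrapped in a small helper so the mode is selectable. This is a minimal sketch: the helper name `is_flagged` is hypothetical, and only the "balanced" key is confirmed by the Quick Start. The "strict" key name is an assumption about the JSON layout, and the dictionary below is a stand-in for the parsed file, using values from the Detection Thresholds table.

```python
def is_flagged(error_value, thresholds, mode="balanced"):
    """Return True if the reconstruction error exceeds the threshold for `mode`."""
    table = thresholds["reconstruction_thresholds"]["thresholds"]
    if mode not in table:
        raise KeyError(f"Unknown mode {mode!r}; available: {sorted(table)}")
    return error_value > table[mode]["value"]


# Stand-in for the parsed thresholds_calibrated.json.
# The "strict" key is an assumption; only "balanced" appears in the Quick Start.
example_thresholds = {
    "reconstruction_thresholds": {
        "thresholds": {
            "strict": {"value": 0.055737},
            "balanced": {"value": 0.039442},
        }
    }
}

print(is_flagged(0.045, example_thresholds, "balanced"))  # 0.045 > 0.039442
print(is_flagged(0.045, example_thresholds, "strict"))    # 0.045 < 0.055737
```

An error of 0.045 is flagged in balanced mode but not in strict mode, which is the trade-off the modes are meant to expose.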
Detection Thresholds
Three calibrated threshold levels:
| Mode | Threshold | False Positive Rate | Description |
|---|---|---|---|
| Strict | 0.055737 | ~1% | Very low false positives |
| Balanced | 0.039442 | ~5% | Recommended for general use |
| Sensitive | ~0.039 | ~2.5% | More sensitive detection |
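The Strict and Balanced values match the 99th and 95th percentiles of real-image error listed under Statistics, which suggests the thresholds are percentiles of errors measured on real images. The following is a minimal sketch of that kind of calibration. The `percentile` helper and the synthetic errors are illustrative assumptions, not the author's calibration script.

```python
import random


def percentile(values, q):
    """Linearly interpolated q-th percentile (0 <= q <= 100) of a non-empty list."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


# Synthetic stand-in for reconstruction errors measured on real images.
rng = random.Random(0)
real_errors = [abs(rng.gauss(0.018, 0.010)) for _ in range(10_000)]

calibrated = {
    "balanced": percentile(real_errors, 95),  # about 5% of real images flagged
    "strict": percentile(real_errors, 99),    # about 1% of real images flagged
}

for mode, value in calibrated.items():
    flagged = sum(e > value for e in real_errors) / len(real_errors)
    print(f"{mode}: threshold={value:.6f}, false positive rate={flagged:.3f}")
```

A higher percentile gives a higher threshold and fewer false positives, at the cost of missing fakes whose error falls below it.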
Model Architecture
- Encoder: 5 downsampling blocks (128→64→32→16→8→4)
- Latent Space: 512 dimensions
- Decoder: 5 upsampling blocks (4→8→16→32→64→128)
- Residual Blocks: Skip connections with dropout (0.1)
- Total Parameters: ~40M
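The encoder's resolution chain follows from halving the feature map five times. Here is a minimal sketch using the standard convolution output-size formula, assuming each downsampling block uses a 3x3 convolution with stride 2 and padding 1. Those kernel settings are an assumption; the actual layers are defined in model.py.

```python
def conv_output_size(size, kernel=3, stride=2, padding=1):
    """Spatial size after one convolution: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def encoder_sizes(input_size=128, num_blocks=5):
    """Feature-map side length at the input and after each downsampling block."""
    sizes = [input_size]
    for _ in range(num_blocks):
        sizes.append(conv_output_size(sizes[-1]))
    return sizes


sizes = encoder_sizes()
print(" -> ".join(str(s) for s in sizes))
```

The decoder reverses the chain, doubling the side length at each of its five blocks.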
Training Details
- Epochs: 30 (best at epoch 29)
- Batch Size: 1024
- Optimizer: AdamW (lr=1e-4, weight_decay=1e-5)
- Scheduler: Cosine Annealing with Warm Restarts
- Data Augmentation: Horizontal flip, color jitter
- Mixed Precision: AMP enabled
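To illustrate the scheduler, the sketch below evaluates the cosine annealing formula with fixed-length restarts, the same shape PyTorch's `CosineAnnealingWarmRestarts` produces with `T_mult=1`. The base learning rate of 1e-4 comes from the list above. The cycle length of 10 epochs and the minimum rate of 0 are hypothetical, since the restart period is not stated here.

```python
import math


def warm_restart_lr(epoch, base_lr=1e-4, eta_min=0.0, cycle_length=10):
    """Learning rate at `epoch` for cosine annealing with fixed-length restarts."""
    epoch_in_cycle = epoch % cycle_length
    cosine = math.cos(math.pi * epoch_in_cycle / cycle_length)
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + cosine)


schedule = [warm_restart_lr(epoch) for epoch in range(30)]
for epoch in (0, 5, 9, 10):
    print(f"epoch {epoch:2d}: lr = {schedule[epoch]:.2e}")
```

Within each cycle the rate decays from the base value toward the minimum, then jumps back to the base value at the restart.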
Statistics
Real Images
- Mean error: 0.018391
- Median error: 0.015647
- Std: 0.010279
- 95th percentile: 0.039442
- 99th percentile: 0.055737
Fake Images (Synthetic)
- Mean error: 0.352695
- Median error: 0.347151
Separation Ratio: 19.18x 🎯
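The 19.18x figure is the ratio of the mean errors. The arithmetic can be reproduced from the statistics above; the median ratio and the gap in standard deviations are derived here for comparison and are not reported elsewhere in this README.

```python
real_mean, real_median, real_std = 0.018391, 0.015647, 0.010279
fake_mean, fake_median = 0.352695, 0.347151

mean_ratio = fake_mean / real_mean
median_ratio = fake_median / real_median
# Distance of the fake mean from the real mean, in real-image standard deviations.
gap_in_stds = (fake_mean - real_mean) / real_std

print(f"Mean ratio:   {mean_ratio:.2f}x")
print(f"Median ratio: {median_ratio:.2f}x")
print(f"Gap:          {gap_in_stds:.1f} standard deviations")
```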
Files
- model_universal_best.ckpt - Full checkpoint (418MB)
- thresholds_calibrated.json - Calibrated thresholds
- model.py - Model architecture
- config.json - Training configuration
- README.md - This file
Citation
@misc{deepfake_autoencoder_2024,
  title={Residual Convolutional Autoencoder for Deepfake Detection},
  author={Your Name},
  year={2024},
  publisher={HuggingFace},
  url={https://huggingface.co/ash12321/deepfake-autoencoder-cifar10-v2}
}
License
MIT License