Residual Convolutional Autoencoder for Deepfake Detection

Autoencoder trained on multiple datasets. The mean reconstruction error on fake images is 19.18x the mean error on real images.

Model Performance

Training Time: 21.4 minutes on H200 GPU
Best Validation Loss: 0.007970 (Epoch 29)
Anomaly Separation: 19.18x (mean reconstruction error on fake images divided by mean error on real images)
Datasets: CIFAR-10, CIFAR-100, STL-10 (205,000 training images)

Quick Start

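import json
import sys
from pathlib import Path

import torch
from huggingface_hub import hf_hub_download
from PIL import Image
from torchvision import transforms

# model.py ships in this repo (see Files); fetch it and make it importable
model_py_path = hf_hub_download(
    repo_id="ash12321/deepfake-autoencoder-cifar10-v2",
    filename="model.py"
)
sys.path.insert(0, str(Path(model_py_path).parent))
from model import ResidualConvAutoencoder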

# Download model and thresholds
checkpoint_path = hf_hub_download(
    repo_id="ash12321/deepfake-autoencoder-cifar10-v2", 
    filename="model_universal_best.ckpt"
)
threshold_path = hf_hub_download(
    repo_id="ash12321/deepfake-autoencoder-cifar10-v2", 
    filename="thresholds_calibrated.json"
)

# Load model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = ResidualConvAutoencoder(latent_dim=512, dropout=0.1).to(device)
checkpoint = torch.load(checkpoint_path, map_location=device)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Load thresholds
with open(threshold_path) as f:
    thresholds = json.load(f)

# Prepare image
transform = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
])

image = Image.open("your_image.jpg").convert('RGB')
image_tensor = transform(image).unsqueeze(0).to(device)

# Get reconstruction error
with torch.no_grad():
    error = model.reconstruction_error(image_tensor)
    error_value = error.item()
    print(f"Reconstruction error: {error_value:.6f}")

# Check against threshold (balanced mode)
balanced_threshold = thresholds['reconstruction_thresholds']['thresholds']['balanced']['value']
if error_value > balanced_threshold:
    print("⚠️  Potential deepfake detected!")
else:
    print("✅ Image appears authentic")
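
The threshold check above can be wrapped in a small helper so the mode is selectable. This is a minimal sketch, not part of the released code: the `classify` function is hypothetical, and the "strict" key in the stand-in dictionary is an assumption (only the "balanced" key path is confirmed by the Quick Start code).

```python
def classify(error_value, thresholds, mode="balanced"):
    """Compare a reconstruction error against the threshold for `mode`."""
    levels = thresholds["reconstruction_thresholds"]["thresholds"]
    if mode not in levels:
        raise KeyError(f"unknown mode {mode!r}; available: {sorted(levels)}")
    threshold = levels[mode]["value"]
    return {
        "mode": mode,
        "threshold": threshold,
        "is_fake": error_value > threshold,  # same rule as the Quick Start check
    }


# Stand-in for the parsed thresholds_calibrated.json.
# The nesting mirrors the Quick Start; the "strict" key name is assumed.
demo_thresholds = {
    "reconstruction_thresholds": {
        "thresholds": {
            "balanced": {"value": 0.039442},
            "strict": {"value": 0.055737},
        }
    }
}

result = classify(0.045, demo_thresholds, mode="balanced")
print(result)
```

An error of 0.045 sits between the two thresholds, so it is flagged in balanced mode but not in strict mode.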

Detection Thresholds

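Three calibrated threshold levels. An image is flagged as a potential deepfake when its reconstruction error exceeds the chosen threshold: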

Mode	Threshold (reconstruction error)	False Positive Rate	Description
Strict	0.055737	~1%	99th percentile of real-image errors; very low false positives
Balanced	0.039442	~5%	95th percentile of real-image errors; recommended for general use
Sensitive	~0.039	~2.5%	More sensitive detection
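
The Strict and Balanced thresholds match the 99th and 95th percentiles of real-image error listed under Statistics, which suggests each threshold is the quantile that leaves the target false positive rate above it. The sketch below illustrates that relationship. The function names are hypothetical, and the error values are synthetic stand-ins drawn from a log-normal distribution, not the model's actual validation errors.

```python
import random


def threshold_for_fpr(real_errors, target_fpr):
    """Return the (1 - target_fpr) quantile of real-image errors (linear interpolation)."""
    ordered = sorted(real_errors)
    pos = (1.0 - target_fpr) * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] + frac * (ordered[hi] - ordered[lo])


def false_positive_rate(real_errors, threshold):
    """Fraction of real images whose error exceeds the threshold."""
    return sum(e > threshold for e in real_errors) / len(real_errors)


random.seed(0)
# Synthetic stand-in errors for illustration only (assumed distribution)
real_errors = [random.lognormvariate(-4.1, 0.5) for _ in range(10_000)]

balanced = threshold_for_fpr(real_errors, 0.05)  # 95th percentile
strict = threshold_for_fpr(real_errors, 0.01)    # 99th percentile

print(f"balanced: {balanced:.6f}, FPR {false_positive_rate(real_errors, balanced):.3f}")
print(f"strict:   {strict:.6f}, FPR {false_positive_rate(real_errors, strict):.3f}")
```

A lower target false positive rate always yields a higher threshold, which is why Strict sits above Balanced.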

Model Architecture

Encoder: 5 downsampling blocks (128→64→32→16→8→4)
Latent Space: 512 dimensions
Decoder: 5 upsampling blocks (4→8→16→32→64→128)
Residual Blocks: Skip connections with dropout (0.1)
Total Parameters: ~40M
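
The spatial sizes and the size of the bottleneck can be checked with a few lines of arithmetic. This sketch assumes each block changes resolution by exactly a factor of 2 and that the input has 3 channels (RGB, as in the Quick Start preprocessing). It does not reproduce the layers in model.py.

```python
def spatial_sizes(input_size=128, num_blocks=5):
    """Resolution after each downsampling block, assuming each block halves it."""
    sizes = [input_size]
    for _ in range(num_blocks):
        sizes.append(sizes[-1] // 2)
    return sizes


encoder_sizes = spatial_sizes()      # encoder path
decoder_sizes = encoder_sizes[::-1]  # decoder mirrors the encoder

input_values = 3 * 128 * 128         # values in one RGB input image
latent_dim = 512
compression = input_values / latent_dim

print("encoder:", " -> ".join(str(s) for s in encoder_sizes))
print("decoder:", " -> ".join(str(s) for s in decoder_sizes))
print(f"{input_values} input values -> {latent_dim} latent values ({compression:.0f}x)")
```

Under these assumptions the bottleneck holds 96 times fewer values than the input, which is what forces the model to learn a compact description of real images.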

Training Details

Epochs: 30 (best at epoch 29)
Batch Size: 1024
Optimizer: AdamW (lr=1e-4, weight_decay=1e-5)
Scheduler: Cosine Annealing with Warm Restarts
Data Augmentation: Horizontal flip, color jitter
Mixed Precision: AMP enabled
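
The shape of the learning-rate schedule can be illustrated with the standard cosine-annealing-with-warm-restarts formula. This is a plain-Python sketch, not the training code. The base learning rate (1e-4) and epoch count (30) come from the list above, while the restart period `t_0=10`, `t_mult=1`, and `eta_min=0.0` are assumptions, since the restart settings are not stated here.

```python
import math


def cosine_warm_restart_lr(epoch, base_lr=1e-4, eta_min=0.0, t_0=10, t_mult=1):
    """Learning rate at `epoch` under cosine annealing with warm restarts."""
    t_i, t_cur = t_0, epoch
    # Walk forward through completed cycles to find the position in the current one
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= t_mult
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * t_cur / t_i))


# t_0=10 is an assumed restart period, chosen only for illustration
schedule = [cosine_warm_restart_lr(epoch) for epoch in range(30)]

for epoch in (0, 5, 9, 10):
    print(f"epoch {epoch:2d}: lr = {schedule[epoch]:.2e}")
```

With these assumed settings the rate decays along a cosine curve for 10 epochs, then jumps back to the base rate at the start of each new cycle.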

Statistics

Real Images

Mean error: 0.018391
Median error: 0.015647
Std: 0.010279
95th percentile: 0.039442
99th percentile: 0.055737

Fake Images (Synthetic)

Mean error: 0.352695
Median error: 0.347151

Separation Ratio (fake mean error / real mean error): 19.18x 🎯
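
The separation ratio follows directly from the means listed above. The short calculation below reproduces it and adds two derived figures that are not in the original report: the ratio of the medians, and the gap between the two means measured in standard deviations of the real-image error.

```python
# Values copied from the Statistics section
real_mean, real_median, real_std = 0.018391, 0.015647, 0.010279
fake_mean, fake_median = 0.352695, 0.347151

separation = fake_mean / real_mean            # reported separation ratio
median_ratio = fake_median / real_median      # same ratio computed on medians
std_gap = (fake_mean - real_mean) / real_std  # gap in real-image standard deviations

print(f"separation (means):   {separation:.2f}x")
print(f"separation (medians): {median_ratio:.2f}x")
print(f"gap in real std devs: {std_gap:.1f}")
```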

Files

model_universal_best.ckpt - Full checkpoint (418 MB)
thresholds_calibrated.json - Calibrated thresholds
model.py - Model architecture
config.json - Training configuration
README.md - This file

Citation

@misc{deepfake_autoencoder_2024,
  title={Residual Convolutional Autoencoder for Deepfake Detection},
  author={Your Name},
  year={2024},
  publisher={HuggingFace},
  url={https://huggingface.co/ash12321/deepfake-autoencoder-cifar10-v2}
}

License

MIT License