
Residual Convolutional Autoencoder for Deepfake Detection

A residual convolutional autoencoder trained on multiple datasets. Mean reconstruction error on fake images is 19.18x the mean error on real images.

Model Performance

  • Training Time: 21.4 minutes on H200 GPU
  • Best Validation Loss: 0.007970 (Epoch 29)
  • Anomaly Separation: 19.18x (ratio of mean reconstruction error on fake images to mean error on real images)
  • Datasets: CIFAR-10, CIFAR-100, STL-10 (205,000 training images)

Quick Start

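import json
import os
import sys

import torch
from huggingface_hub import hf_hub_download
from PIL import Image
from torchvision import transforms

REPO_ID = "ash12321/deepfake-autoencoder-cifar10-v2"

# Download the checkpoint, the calibrated thresholds, and the architecture file
checkpoint_path = hf_hub_download(repo_id=REPO_ID, filename="model_universal_best.ckpt")
threshold_path = hf_hub_download(repo_id=REPO_ID, filename="thresholds_calibrated.json")
model_py_path = hf_hub_download(repo_id=REPO_ID, filename="model.py")

# model.py ships with the repo, so make its folder importable before importing the class
sys.path.insert(0, os.path.dirname(model_py_path))
from model import ResidualConvAutoencoder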

# Load model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = ResidualConvAutoencoder(latent_dim=512, dropout=0.1).to(device)
checkpoint = torch.load(checkpoint_path, map_location=device)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Load thresholds
with open(threshold_path) as f:
    thresholds = json.load(f)

# Prepare image
transform = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
])

image = Image.open("your_image.jpg").convert('RGB')
image_tensor = transform(image).unsqueeze(0).to(device)

# Get reconstruction error
with torch.no_grad():
    error = model.reconstruction_error(image_tensor)
    error_value = error.item()
    print(f"Reconstruction error: {error_value:.6f}")

# Check against threshold (balanced mode)
balanced_threshold = thresholds['reconstruction_thresholds']['thresholds']['balanced']['value']
if error_value > balanced_threshold:
    print("⚠️  Potential deepfake detected!")
else:
    print("✅ Image appears authentic")
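The threshold lookup above can be wrapped in a small helper that takes the mode as a parameter. This is a minimal sketch: the key path for `balanced` comes from the snippet above, while the `strict` key name and the `classify` helper are assumptions for illustration. Check `thresholds_calibrated.json` for the keys it actually contains.

```python
def classify(error_value, thresholds, mode="balanced"):
    """Return True if the reconstruction error exceeds the threshold for `mode`."""
    modes = thresholds["reconstruction_thresholds"]["thresholds"]
    if mode not in modes:
        raise KeyError(f"Unknown mode {mode!r}; available: {sorted(modes)}")
    return error_value > modes[mode]["value"]


# Stand-in for the loaded JSON, using threshold values from this README.
# The "strict" key name is an assumption.
example_thresholds = {
    "reconstruction_thresholds": {
        "thresholds": {
            "strict": {"value": 0.055737},
            "balanced": {"value": 0.039442},
        }
    }
}

print(classify(0.045, example_thresholds, mode="balanced"))  # True
print(classify(0.045, example_thresholds, mode="strict"))    # False
```

An error of 0.045 is flagged in balanced mode but passes in strict mode, which shows the trade-off between the two settings.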

Detection Thresholds

Three calibrated threshold levels:

Mode       Threshold  False Positive Rate  Description
Strict     0.055737   ~1%                  Very low false positives
Balanced   0.039442   ~5%                  Recommended for general use
Sensitive  ~0.039     ~2.5%                More sensitive detection
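The Strict and Balanced values match the 99th and 95th percentiles of real-image reconstruction error reported under Statistics. That is consistent with calibrating each threshold at a percentile of errors measured on real images. A minimal sketch of that kind of calibration, assuming a nearest-rank percentile and made-up error values (the `percentile` and `calibrate` helpers are hypothetical and not part of the repository):

```python
import math


def percentile(values, q):
    """Nearest-rank percentile: smallest value with at least q% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def calibrate(real_errors):
    """Map each mode to a percentile of reconstruction errors on real images."""
    return {
        "balanced": percentile(real_errors, 95),  # roughly 5% of real images flagged
        "strict": percentile(real_errors, 99),    # roughly 1% of real images flagged
    }


# Illustrative errors only: 100 evenly spaced values, not data from the model.
real_errors = [i / 1000 for i in range(1, 101)]
print(calibrate(real_errors))
```

A higher percentile gives a higher threshold, so fewer real images are flagged at the cost of missing more fakes.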

Model Architecture

  • Encoder: 5 downsampling blocks (128→64→32→16→8→4)
  • Latent Space: 512 dimensions
  • Decoder: 5 upsampling blocks (4→8→16→32→64→128)
  • Residual Blocks: Skip connections with dropout (0.1)
  • Total Parameters: ~40M
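The spatial sizes listed for the encoder and decoder follow from halving the resolution at each block. A quick check, assuming each downsampling block uses stride 2 (the stride and the `spatial_sizes` helper are assumptions; `model.py` is the authoritative definition):

```python
def spatial_sizes(input_size=128, num_blocks=5, stride=2):
    """Return the feature-map side length at the input and after each downsampling block."""
    sizes = [input_size]
    for _ in range(num_blocks):
        sizes.append(sizes[-1] // stride)
    return sizes


encoder_sizes = spatial_sizes()
decoder_sizes = encoder_sizes[::-1]  # the decoder mirrors the encoder

print(encoder_sizes)  # [128, 64, 32, 16, 8, 4]
print(decoder_sizes)  # [4, 8, 16, 32, 64, 128]
```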

Training Details

  • Epochs: 30 (best at epoch 29)
  • Batch Size: 1024
  • Optimizer: AdamW (lr=1e-4, weight_decay=1e-5)
  • Scheduler: Cosine Annealing with Warm Restarts
  • Data Augmentation: Horizontal flip, color jitter
  • Mixed Precision: AMP enabled

Statistics

Real Images

  • Mean error: 0.018391
  • Median error: 0.015647
  • Std: 0.010279
  • 95th percentile: 0.039442
  • 99th percentile: 0.055737

Fake Images (Synthetic)

  • Mean error: 0.352695
  • Median error: 0.347151

Separation Ratio: 19.18x 🎯
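The separation ratio is the mean error on fake images divided by the mean error on real images. The arithmetic can be reproduced from the figures above. The second quantity, the gap measured in real-image standard deviations, is an extra illustration and not a metric reported by the model card.

```python
real_mean, real_std = 0.018391, 0.010279
fake_mean = 0.352695

separation_ratio = fake_mean / real_mean
# How many real-image standard deviations the fake mean sits above the real mean
z_score = (fake_mean - real_mean) / real_std

print(f"Separation ratio: {separation_ratio:.2f}x")  # 19.18x
print(f"Fake mean is {z_score:.1f} real-image std devs above the real mean")
```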

Files

  • model_universal_best.ckpt - Full checkpoint (418MB)
  • thresholds_calibrated.json - Calibrated thresholds
  • model.py - Model architecture
  • config.json - Training configuration
  • README.md - This file

Citation

@misc{deepfake_autoencoder_2024,
  title={Residual Convolutional Autoencoder for Deepfake Detection},
  author={Your Name},
  year={2024},
  publisher={HuggingFace},
  url={https://huggingface.co/ash12321/deepfake-autoencoder-cifar10-v2}
}

License

MIT License