---
title: HAT Super-Resolution for Satellite Images
emoji: 🛰️
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.46.1
app_file: app.py
pinned: false
---

# HATSAT - Super-Resolution for Satellite Images

This Hugging Face Space demonstrates a fine-tuned Hybrid Attention Transformer (HAT) model for satellite image super-resolution. The model performs 4x upscaling of satellite imagery, enhancing the resolution while preserving important geographical and structural details.

## Model Details

- Architecture: HAT (Hybrid Attention Transformer)
- Upscaling Factor: 4x
- Input Channels: 3 (RGB)
- Training: Fine-tuned on the SEN2NAIPv2 satellite imagery dataset
- Base Model: HAT pre-trained on ImageNet

## Model Configuration

The fine-tuned checkpoint uses the following HAT hyperparameters (a construction sketch follows the list):

- Window Size: 16
- Embed Dimension: 180
- Depths: [6, 6, 6, 6, 6, 6]
- Number of Heads: [6, 6, 6, 6, 6, 6]
- Compress Ratio: 3
- Squeeze Factor: 30
- Overlap Ratio: 0.5
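
A minimal sketch of building the model with these hyperparameters, assuming the official XPixelGroup/HAT implementation is installed (the import path and constructor arguments follow that repository). The checkpoint filename, the `params_ema` key, and the `upsampler` choice are assumptions based on BasicSR conventions and HAT's published 4x configs, not details confirmed by this Space.

```python
# Hedged sketch: instantiate HAT with the configuration listed above.
import torch
from hat.archs.hat_arch import HAT  # official HAT implementation

model = HAT(
    upscale=4,                      # 4x super-resolution
    in_chans=3,                     # RGB input
    window_size=16,
    embed_dim=180,
    depths=[6, 6, 6, 6, 6, 6],
    num_heads=[6, 6, 6, 6, 6, 6],
    compress_ratio=3,
    squeeze_factor=30,
    overlap_ratio=0.5,
    upsampler='pixelshuffle',       # assumed, as in HAT's published SR configs
)

# Hypothetical checkpoint name; BasicSR-style checkpoints usually keep the
# EMA weights under the 'params_ema' key.
state = torch.load('hatsat_4x.pth', map_location='cpu')
model.load_state_dict(state['params_ema'], strict=True)
model.eval()
```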

## Usage

  1. Upload a satellite image (RGB format)
  2. The model will automatically upscale it by 4x
  3. Download the enhanced high-resolution result
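
The steps above use the web UI; the Space can also be queried programmatically with `gradio_client`. The Space id `BorisEm/HATSAT` and the `/predict` endpoint below are assumptions inferred from this repository; check the Space's "Use via API" panel for the exact names.

```python
# Hedged sketch: call the hosted Gradio app from Python.
from gradio_client import Client, handle_file

client = Client("BorisEm/HATSAT")            # assumed Space id
result = client.predict(
    handle_file("low_res_satellite.png"),    # local path or URL of the input image
    api_name="/predict",                     # assumed endpoint name
)
print(result)  # path to the 4x-upscaled image returned by the Space
```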

## Training Details

The model was fine-tuned using the following setup (a plain-PyTorch sketch follows the list):

- Loss Function: L1Loss
- Optimizer: Adam (lr=2e-5)
- Training Iterations: 20,000
- Scheduler: MultiStepLR with milestones at [10000, 50000, 100000, 130000, 140000]
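
The sketch below restates that setup in plain PyTorch. The decay factor `gamma`, the loop structure, and `train_loader` are assumptions; the actual run most likely used a BasicSR-style YAML config rather than hand-written code.

```python
import torch

def finetune(model, train_loader, total_iters=20_000):
    """Fine-tune with L1 loss, Adam (lr=2e-5), and MultiStepLR as listed above."""
    criterion = torch.nn.L1Loss()
    optimizer = torch.optim.Adam(model.parameters(), lr=2e-5)
    # Milestones above 20,000 never fire in a 20k run; they look inherited
    # from the base model's longer schedule. gamma=0.5 is an assumed value.
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[10000, 50000, 100000, 130000, 140000], gamma=0.5)

    step = 0
    while step < total_iters:
        for lr_img, hr_img in train_loader:  # (low-res, high-res) tensor batches
            optimizer.zero_grad()
            loss = criterion(model(lr_img), hr_img)
            loss.backward()
            optimizer.step()
            scheduler.step()
            step += 1
            if step >= total_iters:
                break
    return model
```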

## Applications

This model is particularly useful for:

- Enhancing low-resolution satellite imagery
- Geographic analysis and mapping
- Environmental monitoring
- Urban planning and development
- Agricultural monitoring

## Technical Implementation

The model implements several key architectural components (the CAB is sketched in code after the list):

- Hybrid Attention Blocks (HAB): combine window-based self-attention with channel attention
- Overlapping Cross-Attention Blocks (OCAB): enable feature interaction across neighboring windows
- Residual Hybrid Attention Groups (RHAG): stacks of HABs followed by an OCAB, wrapped in residual connections
- Channel Attention Blocks (CAB): convolutional channel attention for feature refinement
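
A hedged sketch of the CAB, following the structure in the official HAT code, to show how Compress Ratio 3 and Squeeze Factor 30 shape the layers; this is a simplified illustration, not this Space's exact source.

```python
import torch
from torch import nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention."""
    def __init__(self, num_feat: int, squeeze_factor: int = 30):
        super().__init__()
        self.attention = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                             # global spatial pooling
            nn.Conv2d(num_feat, num_feat // squeeze_factor, 1),  # squeeze: 180 -> 6
            nn.ReLU(inplace=True),
            nn.Conv2d(num_feat // squeeze_factor, num_feat, 1),  # excite: 6 -> 180
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.attention(x)  # reweight channels

class CAB(nn.Module):
    """Conv block whose hidden width is shrunk by compress_ratio, then channel attention."""
    def __init__(self, num_feat: int, compress_ratio: int = 3, squeeze_factor: int = 30):
        super().__init__()
        self.cab = nn.Sequential(
            nn.Conv2d(num_feat, num_feat // compress_ratio, 3, 1, 1),  # 180 -> 60
            nn.GELU(),
            nn.Conv2d(num_feat // compress_ratio, num_feat, 3, 1, 1),  # 60 -> 180
            ChannelAttention(num_feat, squeeze_factor),
        )

    def forward(self, x):
        return self.cab(x)

cab = CAB(num_feat=180)                  # embed_dim from the configuration above
out = cab(torch.randn(1, 180, 16, 16))   # preserves shape: (1, 180, 16, 16)
```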

## Performance

The model was trained for 20,000 iterations, with PSNR and SSIM monitored on satellite imagery validation data.
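
For reference, a minimal sketch of computing those validation metrics with scikit-image; the filenames are hypothetical.

```python
from skimage.io import imread
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

sr = imread("sr_output.png")      # model output, (H, W, 3) uint8
hr = imread("hr_reference.png")   # ground-truth high-resolution image, same shape

psnr = peak_signal_noise_ratio(hr, sr, data_range=255)
ssim = structural_similarity(hr, sr, channel_axis=-1, data_range=255)
print(f"PSNR: {psnr:.2f} dB  SSIM: {ssim:.4f}")
```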

## Acknowledgments

This model is a fine-tuned version of HAT (Hybrid Attention Transformer), trained on the SEN2NAIPv2 dataset.

- Base Model: [HAT](https://github.com/XPixelGroup/HAT)
- Training Dataset: [SEN2NAIPv2](https://huggingface.co/datasets/tacofoundation/SEN2NAIPv2)

## Citation

If you use this model in your research, please cite both the original HAT paper and the SEN2NAIPv2 dataset:

```bibtex
@article{chen2023hat,
  title={Activating More Pixels in Image Super-Resolution Transformer},
  author={Chen, Xiangyu and Wang, Xintao and Zhou, Jiantao and Qiao, Yu and Dong, Chao},
  journal={arXiv preprint arXiv:2205.04437},
  year={2022}
}

@misc{sen2naipv2,
  title={SEN2NAIPv2: A Large-Scale Dataset for Satellite Image Super-Resolution},
  author={TACO Foundation},
  year={2024},
  url={https://huggingface.co/datasets/tacofoundation/SEN2NAIPv2}
}
```