CroPond-7B
CroPond-7B is a vision-language model specialized for cross-view point correspondence. It is built on Qwen2.5-VL-7B-Instruct, trained primarily on the CrossPoint-378K dataset, and achieves state-of-the-art performance on cross-view correspondence tasks.
Evaluation
For detailed evaluation instructions, please visit the GitHub repository.
Citation
@article{wang2025crosspoint,
  title={Towards Cross-View Point Correspondence in Vision-Language Models},
  author={Wang, Yipu and Ji, Yuheng and Liu, Yuyang and Zhou, Enshen and Yang, Ziqiang and Tian, Yuxuan and Qin, Ziheng and Liu, Yue and Tan, Huajie and Chi, Cheng and Ma, Zhiyuan and Zeng, Daniel Dajun and Zheng, Xiaolong},
  journal={arXiv preprint arXiv:2512.04686},
  year={2025}
}