$\left|\,\circlearrowright\,\text{BUS}\,\right|$: A Large and Diverse Multimodal Benchmark for evaluating the ability of Vision-Language Models to understand Rebus Puzzles Paper • 2511.01340 • Published Nov 3
When Big Models Train Small Ones: Label-Free Model Parity Alignment for Efficient Visual Question Answering using Small VLMs Paper • 2509.16633 • Published Sep 20
COFAR: Commonsense and Factual Reasoning in Image Search Paper • 2210.08554 • Published Oct 16, 2022
Answer Mining from a Pool of Images: Towards Retrieval-Based Visual Question Answering Paper • 2306.16713 • Published Jun 29, 2023
Visual Text Matters: Improving Text-KVQA with Visual Text Entity Knowledge-aware Large Multimodal Assistant Paper • 2410.19144 • Published Oct 24, 2024
Mind the (Language) Gap: Towards Probing Numerical and Cross-Lingual Limits of LVLMs Paper • 2508.17334 • Published Aug 24