VisIT-Bench is a new vision-language instruction-following benchmark inspired by real-world use cases. It tests 70 diverse “wish-list” skills with an automated ranking system, advancing the ongoing assessment of multimodal chatbot performance.
Why VisIT-Bench 🤔?
Though recent vision-language models (VLMs) have shown promise in following instructions, their evaluation on real-world human-chatbot instructions is often limited. Typically, VLMs are evaluated through qualitative comparison of outputs, which makes it hard to quantify progress and identify shortcomings. VisIT-Bench helps address this problem by offering a comprehensive testbed for measuring model performance across a diverse set of instruction-following tasks inspired by real-world scenarios. 🌍
@misc{bitton2023visitbench,
title={VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use},
author={Yonatan Bitton and Hritik Bansal and Jack Hessel and Rulin Shao and Wanrong Zhu and Anas Awadalla and Josh Gardner and Rohan Taori and Ludwig Schmidt},
year={2023},
eprint={2308.06595},
archivePrefix={arXiv},
primaryClass={cs.CL}
}