EEOB Publication - Berger-Wolf

VLM4Bio: A Benchmark Dataset to Evaluate Pretrained Vision-Language Models for Trait Discovery from Biological Images
M. Maruf, Arka Daw, Kazi Sajeed Mehrab, Harish Babu Manogaran, Abhilash Neog, Medha Sawhney, Mridul Khurana, James P. Balhoff, Yasin Bakıs, Bahadir Altintas, Matthew J. Thompson, Elizabeth G. Campolongo, Josef C. Uyeda, Hilmar Lapp, Henry L. Bart Jr., Paula M. Mabee, Yu Su, Wei-Lun Chao, Charles Stewart, Tanya Berger-Wolf, Wasila Dahdul, Anuj Karpatne. arXiv:2408.16176v1
Abstract
Images are increasingly becoming the currency for documenting biodiversity on the planet, providing novel opportunities for accelerating scientific discoveries in the field of organismal biology, especially with the advent of large vision-language models (VLMs). We ask if pre-trained VLMs can aid scientists in answering a range of biologically relevant questions without any additional fine-tuning. In this paper, we evaluate the effectiveness of 12 state-of-the-art (SOTA) VLMs in the field of organismal biology using a novel dataset, VLM4Bio, consisting of 469K question-answer pairs involving 30K images from three groups of organisms: fishes, birds, and butterflies, covering five biologically relevant tasks. We also explore the effects of applying prompting techniques and tests for reasoning hallucination on the performance of VLMs, shedding new light on the capabilities of current SOTA VLMs in answering biologically relevant questions using images.