EEOB publication - Berger-Wolf

BIOCLIP: A Vision Foundation Model for the Tree of Life
Samuel Stevens, Jiaman Wu, Matthew J Thompson, Elizabeth G Campolongo, Chan Hee Song, David Edward Carlyn, Li Dong, Wasila M Dahdul, Charles Stewart, Tanya Berger-Wolf, Wei-Lun Chao, and Yu Su. arXiv:2311.18803v2 [cs.CV] 4 Dec 2023
Abstract
Images of the natural world, collected by a variety of cameras, from drones to individual phones, are increasingly abundant sources of biological information. There is an explosion of computational methods and tools, particularly computer vision, for extracting biologically relevant information from images for science and conservation. Yet most of these are bespoke approaches designed for a specific task and are not easily adaptable or extendable to new questions, contexts, and datasets. A vision model for general organismal biology questions on images is of timely need. To approach this, we curate and release TREEOFLIFE-10M, the largest and most diverse ML-ready dataset of biology images. We then develop BIOCLIP, a foundation model for the tree of life, leveraging the unique properties of biology captured by TREEOFLIFE-10M, namely the abundance and variety of images of plants, animals, and fungi, together with the availability of rich structured biological knowledge. We rigorously benchmark our approach on diverse fine-grained biology classification tasks, and find that BIOCLIP consistently and substantially outperforms existing baselines (by 17% to 20% absolute). Intrinsic evaluation reveals that BIOCLIP has learned a hierarchical representation conforming to the tree of life, shedding light on its strong generalizability.