
BaboonLand Dataset: Tracking Primates in the Wild and Automating Behaviour Recognition from Drone Videos
Isla Duporge, Dan Rubenstein, Julie Barreau, Maksim Kholiavchenko, Meg Crofoot, Roi Harel, Tanya Berger-Wolf, Scott Wolf, Stephen Lee, Jenna Kline, Michelle Ramirez, Charles Stewart. 2024. https://arxiv.org/pdf/2405.17698
Abstract
Using unmanned aerial vehicles (UAVs) to track multiple individuals simultaneously in their natural environment is a powerful approach for better understanding group primate behavior. Previous studies have demonstrated that it is possible to automate the classification of primate behavior from video data, but these studies have been carried out in captivity or from ground-based cameras. However, to understand group behavior and the self-organization of a collective, the whole troop must be observed at a scale where individual behavior can be related to the natural environment in which ecological decisions are made. This study presents a novel dataset for baboon detection, tracking, and behavior recognition from drone videos. The foundation of our dataset is video recorded by drones flying over the Mpala Research Centre in Kenya. The baboon detection dataset was created by manually annotating all baboons in the drone videos with bounding boxes. A tiling method was then applied to the original 5.3K-resolution images to create a pyramid of images at various scales, resulting in approximately 30K images for baboon detection. The baboon tracking dataset is derived from the detection dataset, with the bounding boxes of each individual assigned a consistent ID throughout the video; this process yielded half an hour of very dense tracking data. The baboon behavior recognition dataset was generated by converting tracks into mini-scenes, video subregions centered on each animal. Each mini-scene was manually annotated with 12 distinct behavior types plus one additional category for occlusion, resulting in over 20 hours of data. Benchmark results show a mean average precision (mAP) of 92.62% for the YOLOv8-X detection model, a multiple object tracking accuracy (MOTA) of 63.81% for the BoT-SORT tracking algorithm, and a micro top-1 accuracy of 63.97% for the X3D behavior recognition model. Using deep learning to rapidly and accurately classify wildlife behavior from drone footage enables non-invasive behavioral data collection, allowing the behavior of an entire group to be recorded systematically. The dataset can be accessed at https://baboonland.xyz.
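
To make the tiling step concrete, the sketch below cuts one high-resolution frame into overlapping tiles at several scales. The tile size, overlap, and scale factors are illustrative assumptions, not the parameters the authors used to build the published 30K-image detection set.

```python
import numpy as np

def tile_pyramid(frame, tile=1280, overlap=0.2, scales=(1.0, 0.5, 0.25)):
    """Cut one high-resolution frame into overlapping tiles at several scales.

    `tile`, `overlap`, and `scales` are illustrative values only.
    Tiles at the right/bottom edges may come out smaller than `tile`.
    """
    tiles = []
    step = int(tile * (1 - overlap))
    for s in scales:
        h, w = int(frame.shape[0] * s), int(frame.shape[1] * s)
        # Nearest-neighbour downscale keeps the sketch dependency-free;
        # a real pipeline would use a proper resize (e.g. cv2.resize).
        ys = (np.arange(h) / s).astype(int)
        xs = (np.arange(w) / s).astype(int)
        scaled = frame[ys][:, xs]
        for y in range(0, max(h - tile, 0) + 1, step):
            for x in range(0, max(w - tile, 0) + 1, step):
                tiles.append(((s, x, y), scaled[y:y + tile, x:x + tile]))
    return tiles

# Example: a synthetic frame at one common 5.3K resolution (5312 x 2988).
frame = np.zeros((2988, 5312, 3), dtype=np.uint8)
print(len(tile_pyramid(frame)), "tiles")
```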
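
The mini-scene construction can likewise be sketched as a fixed-size crop that follows a single track through the video. The snippet below assumes tracks are available as per-frame box centers and uses OpenCV for decoding; both the 256 px window size and the data layout are assumptions for illustration.

```python
import cv2
import numpy as np

def extract_mini_scene(video_path, centers, size=256):
    """Crop a fixed-size window centered on one tracked animal, frame by frame.

    `centers` maps frame index -> (cx, cy) box center for a single track ID;
    the 256 px window is an illustrative choice, not the paper's value.
    """
    cap = cv2.VideoCapture(video_path)
    half = size // 2
    clips = []
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx in centers:
            cx, cy = map(int, centers[idx])
            # Pad the frame so crops near the border stay `size` x `size`.
            padded = cv2.copyMakeBorder(frame, half, half, half, half,
                                        cv2.BORDER_CONSTANT, value=0)
            # After padding, pixel (cx, cy) sits at (cx + half, cy + half),
            # so this slice is centered on the original box center.
            clips.append(padded[cy:cy + size, cx:cx + size])
        idx += 1
    cap.release()
    return np.stack(clips) if clips else None
```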
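
For the detection and tracking benchmarks, both YOLOv8 and BoT-SORT are available through the Ultralytics package; the snippet below is a minimal inference sketch under that assumption, not the authors' training or evaluation code, and the file paths are placeholders.

```python
from ultralytics import YOLO

# Load the extra-large YOLOv8 variant; fine-tuning on the BaboonLand
# detection tiles would be needed to approach the reported 92.62% mAP.
model = YOLO("yolov8x.pt")

# Detection on a single tiled image (path is a placeholder).
detections = model("tile_0001.jpg")
print(detections[0].boxes.xyxy)  # bounding boxes for the first image

# Tracking over a drone video with the built-in BoT-SORT tracker;
# each returned box carries a persistent track ID.
tracks = model.track("drone_clip.mp4", tracker="botsort.yaml")
print(tracks[0].boxes.id)  # track IDs for the first frame
```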