Instituto de Astrofísica e Ciências do Espaço

Deep tracks: Using deep learning and procedurally simulated data for automated vertebrate footprints classification

C. S. Marques, A. Mota, M. Belvedere, D. Castanera, I. Díaz-Martínez, E. Malafaia, S. Pereira, L. M. Rosalino, V. F. Santos, L. Sciscio, E. Dufourq

Abstract
The study of vertebrate footprints provides useful information on animal behavior, locomotion, and ecology. However, automatically classifying footprints from photographs is difficult because of their significant morphological variation and the lack of readily available labeled datasets. To address this issue, this study developed Deep Tracks, a novel Unity application that procedurally generates a dataset of simulated footprint images. Two datasets were used to evaluate the impact of the simulated data on real footprint classification: (1) a dataset of 40,000 simulated footprints and (2) approximately 1,500 real vertebrate footprints from 10 different vertebrate groups. Both simulated and real footprints belong to the following clades: Mammalia (coyotes, foxes, bears, otters, squirrels, raccoons, deer), avian Dinosauria (turkeys), and non-avian Dinosauria (theropods, sauropods). Convolutional Neural Networks (CNNs) were used to classify both the simulated and the real footprints. An initial comparison of five architectures (DenseNet-121, ResNet-18, ResNet-50, EfficientNet-b0, and InceptionNet-v3) was carried out on the simulated dataset, with EfficientNet-b0 yielding the best metrics. Seven experimental configurations were designed to evaluate different strategies for incorporating real data into model development. The first configuration involved training and testing exclusively on real footprints, without any simulated data. The second configuration trained the model on real data but tested it on simulated footprints. The third configuration used transfer learning to fine-tune a CNN, initially trained on simulated data, for classifying real footprint images. The remaining four configurations incorporated simulated data into the training process alongside a fixed percentage of real data: 20%, 50%, 80%, or 100%.
The application of fine-tuning led to an accuracy improvement of over 30% in classifying real footprints, compared to a CNN trained solely on real data. These results highlight the significance of advanced data augmentation techniques in improving both accuracy and reliability in vertebrate footprint classification, particularly in scenarios with limited real data availability.
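
As an illustration of the four mixed configurations described above, the following minimal Python sketch assembles a training set from all simulated samples plus a fixed fraction of the real ones. It is not the authors' code: the function name `mix_training_set`, the uniform random sampling without replacement, the seed, and the toy sample counts are all assumptions made for this example, since the abstract does not state how the real subsets were drawn.

```python
import random


def mix_training_set(simulated, real, real_fraction, seed=0):
    """Return all simulated samples plus a random share of the real ones.

    Hypothetical helper: the sampling strategy (uniform, without
    replacement) is an assumption, not a detail taken from the study.
    """
    if not 0.0 <= real_fraction <= 1.0:
        raise ValueError("real_fraction must be in [0, 1]")
    rng = random.Random(seed)
    n_real = round(len(real) * real_fraction)
    real_subset = rng.sample(real, n_real)
    return list(simulated) + real_subset


# Toy stand-ins for the two datasets (far smaller than the real ones).
simulated = [("sim", i) for i in range(40)]
real = [("real", i) for i in range(10)]

# One training set per mixed configuration: 20%, 50%, 80%, 100% real data.
sizes = {f: len(mix_training_set(simulated, real, f)) for f in (0.2, 0.5, 0.8, 1.0)}
print(sizes)
```

In this sketch the simulated portion stays constant while only the share of real data changes, which is one plausible reading of "a fixed percentage of real data" in each configuration.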

Keywords
Simulated data / Vertebrate footprints / Transfer learning / Deep learning / Automatic classification

Ecological Informatics
Volume 93, Number 103523
2025 December
