Deep Learning Analytics Places 2nd in International Machine Learning Challenge
In 2018, the iNaturalist organization, with Microsoft and Google as sponsors at CVPR, hosted an international machine learning challenge to benchmark the state of the art in species identification from a photo of a living thing. Deep Learning Analytics finished 2nd globally, behind only a larger team from China’s Dalian University of Technology, whose Top 3 error (12.9%) was 1.1 percentage points lower than that of the Deep Learning Analytics entry (14.0%). Data scientists from Baidu (an international powerhouse in machine learning, competitive with Google on many AI tasks) ranked 3rd. In total, 59 teams from around the world entered.
The iNaturalist 2018 organizers aimed to benchmark state-of-the-art performance on the difficult problem of identifying a species from a photo. The challenge ran for three months. Entrants were ranked by their Top 3 error, i.e. the percentage of photos for which an entrant’s three highest-ranked guesses, out of 8,142 species, did not contain the correct species (the Top 3 error for random guessing is 99.96%).
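To make the ranking metric concrete, the following is a minimal Python sketch of how a Top 3 error could be computed, together with the random-guessing baseline quoted above. The function names, the integer species IDs, and the toy data are illustrative assumptions, not the organizers' evaluation code.

```python
from typing import Sequence


def top_k_error(ranked_guesses: Sequence[Sequence[int]],
                true_labels: Sequence[int],
                k: int = 3) -> float:
    """Fraction of photos whose true species is absent from the first k guesses."""
    if len(ranked_guesses) != len(true_labels):
        raise ValueError("need one ranked guess list per true label")
    if not true_labels:
        raise ValueError("need at least one photo")
    misses = sum(
        1 for guesses, truth in zip(ranked_guesses, true_labels)
        if truth not in guesses[:k]
    )
    return misses / len(true_labels)


def random_guess_top_k_error(num_species: int, k: int = 3) -> float:
    """Expected top-k error when k distinct species are picked uniformly at random."""
    return 1.0 - k / num_species


# Toy example with made-up species IDs (hypothetical, not competition data).
guesses = [[5, 2, 9], [1, 7, 3], [4, 4000, 12], [8, 6, 0]]
truth = [2, 3, 99, 8]

toy_error = top_k_error(guesses, truth)      # one miss out of four photos
baseline = random_guess_top_k_error(8142)    # 8,142 species, as in the 2018 challenge

print(f"toy Top 3 error: {toy_error:.2%}")
print(f"random-guess Top 3 error: {baseline:.2%}")
```

With 8,142 species, three random guesses hit the correct species with probability 3/8142, which is where the 99.96% baseline comes from.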
A similar challenge was hosted in 2017, which Google (GMV, for Google Mountain View) won with a 4.9% Top 5 error. The 2018 iNaturalist Challenge was more difficult than the 2017 challenge in four ways:
Fewer total training images (~437,000 in 2018 versus ~675,000 in 2017)
More species (~8,000 in 2018 versus ~5,000 in 2017)
A greater imbalance in training data across species (75% of the species labels in 2018 had 30 or fewer training examples)
Stricter scoring (only the Top 3 predictions counted in 2018, whereas the Top 5 predictions were allowed in 2017)
To be competitive in such a challenge, a machine learning method must perform well on many of the least represented species without sacrificing outstanding performance on the species best represented in the training data.
Deep Learning Analytics’ entry in the iNaturalist 2018 challenge establishes three things: (1) as of June 2018, the state of the art for species identification from a photo, trained on iNaturalist 2018 training data, is a Top 3 error of approximately 13%; (2) Deep Learning Analytics is very competitive with that state of the art; and (3) no other organization in the United States demonstrated similar performance on this data.