2026
Petti, Daniel; Li, Changying; Liu, Ninghao
Contrastive multi-view representation learning for multi-camera plant phenotyping: A cotton field study (Journal Article)
In: Plant Phenomics, vol. 8, no. 2, pp. 100193, 2026, ISSN: 2643-6515.
@article{Petti2026,
title = {Contrastive multi-view representation learning for multi-camera plant phenotyping: A cotton field study},
author = {Daniel Petti and Changying Li and Ninghao Liu},
url = {https://www.sciencedirect.com/science/article/pii/S2643651526000300},
doi = {10.1016/j.plaphe.2026.100193},
issn = {2643-6515},
year = {2026},
date = {2026-01-01},
journal = {Plant Phenomics},
volume = {8},
number = {2},
pages = {100193},
abstract = {Attempts to deploy computer vision in agricultural tasks often suffer from a shortage of annotated data. One strategy to alleviate the impact of limited data is Self-Supervised Learning (SSL), which involves pre-training a model on a pretext task that utilizes automatically generated annotations. The primary objective of this study is to leverage a multi-camera view dataset of cotton boll images for contrastive learning in order to enable phenotyping tasks with minimal data annotation. This dataset was collected in the field using six camera views. The efficacy of two contrastive learning frameworks (SimCLR and MoCo) in producing representations when positive examples originate from different cameras was investigated, and a comprehensive study of how the camera positions affect performance was conducted. After self-supervised pre-training, linear evaluation and semi-supervised learning experiments were performed on boll detection and plot status downstream tasks. In general, using multiple camera views with SimCLR and MoCo improves cotton boll detection mean average precision by 14% compared to vanilla SimCLR and MoCo. Through careful investigation using synthetic data, it was determined that relative camera poses with an intermediate amount of overlap seem more likely to perform well. Neither MoCo nor SimCLR was consistently superior to the other in this context. The representations embed meaningful features about the cotton plants, such as overall boll density, but also less meaningful ones, such as lighting variations. This technique could potentially accelerate the development of phenotyping algorithms based on data collected from field robots.},
keywords = {Contrastive learning, Cotton yield estimation, High-throughput phenotyping, Machine vision, Self-Supervised Learning, UGV},
pubstate = {published},
tppubtype = {article}
}