Picture-to-amount (PITA): Predicting relative ingredient amounts from food images

Jiatong Li, Fangda Han, Ricardo Guerrero, Vladimir Pavlovic

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Increased awareness of the impact of food consumption on health and lifestyle has given rise to novel data-driven food analysis systems. Although these systems may recognize the ingredients, a detailed analysis of their amounts in the meal, which is paramount for estimating the correct nutrition, is usually ignored. In this paper, we study the novel and challenging problem of predicting the relative amount of each ingredient from a food image. We propose PITA, the Picture-to-Amount deep learning architecture, to solve the problem. More specifically, we predict the ingredient amounts using a domain-driven Wasserstein loss from image-to-recipe cross-modal embeddings learned to align the two views of food data. Experiments on a dataset of recipes collected from the Internet show that the model generates promising results and improves on the baselines on this challenging task. A demo of our system and our data are available at: foodai.cs.rutgers.edu.
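To illustrate why a Wasserstein-style loss suits relative ingredient amounts, the sketch below compares two predicted amount distributions against a target. It is not the paper's implementation: the abstract does not specify the ground metric or the solver, so this uses a generic entropy-regularized (Sinkhorn) approximation, and the ingredient names, the cost matrix, and the function name `sinkhorn_distance` are all hypothetical.

```python
import math

def sinkhorn_distance(p, q, cost, reg=0.05, n_iters=200):
    """Entropy-regularized approximation of the Wasserstein distance
    between two discrete distributions p and q over the same ingredients.
    cost[i][j] is an assumed ground distance between ingredients i and j."""
    n, m = len(p), len(q)
    # Gibbs kernel derived from the ground cost.
    K = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals p and q.
        u = [p[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [q[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport cost of the resulting plan P[i][j] = u[i] * K[i][j] * v[j].
    return sum(u[i] * K[i][j] * v[j] * cost[i][j]
               for i in range(n) for j in range(m))

# Hypothetical ingredients and ground distances (assumptions, not paper data):
# butter and margarine are treated as close, flour as far from both.
ingredients = ["butter", "margarine", "flour"]
cost = [
    [0.0, 0.1, 1.0],
    [0.1, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]

target = [0.5, 0.0, 0.5]          # relative amounts sum to 1
pred_similar = [0.0, 0.5, 0.5]    # confuses butter with margarine
pred_far = [0.0, 0.0, 1.0]        # puts all mass on flour

d_similar = sinkhorn_distance(pred_similar, target, cost)
d_far = sinkhorn_distance(pred_far, target, cost)
print(f"similar-ingredient error: {d_similar:.4f}")
print(f"dissimilar-ingredient error: {d_far:.4f}")
```

Under these assumed costs, confusing two similar ingredients is penalized far less than shifting mass to an unrelated one, which an element-wise loss on the amount vectors would not distinguish.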

Original language: English (US)
Title of host publication: Proceedings of ICPR 2020 - 25th International Conference on Pattern Recognition
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 10343-10350
Number of pages: 8
ISBN (Electronic): 9781728188089
DOIs
State: Published - 2020
Externally published: Yes
Event: 25th International Conference on Pattern Recognition, ICPR 2020 - Virtual, Milan, Italy
Duration: Jan 10 2021 to Jan 15 2021

Publication series

Name: Proceedings - International Conference on Pattern Recognition

Conference

Conference: 25th International Conference on Pattern Recognition, ICPR 2020
Country/Territory: Italy
City: Virtual, Milan
Period: 1/10/21 to 1/15/21

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
