- This work uses two deep networks to jointly perform object detection and action recognition; the detected objects and actions can then be matched to the object and motion nodes of functional units in a FOON.
- Follow-up work by Ahmad explored more fine-grained object-state recognition for cooking tasks.
- That follow-up work builds on a state taxonomy for cooking-related images.
Jelodar, A. B., Paulius, D., and Sun, Y. (2018) “Long Activity Video Understanding using Functional Object-Oriented Network”. In: IEEE Transactions on Multimedia, vol. 21(7), pp. 1813–1824.
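
The matching step described in the first item can be illustrated with a minimal sketch. It assumes a functional unit is represented simply as a set of input object labels, one motion label, and a set of output object labels. The `FunctionalUnit` class, the `match_units` function, and the overlap score are hypothetical names and choices made for illustration; they are not taken from the cited paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionalUnit:
    """Hypothetical, simplified stand-in for a FOON functional unit."""
    input_objects: frozenset
    motion: str
    output_objects: frozenset


def match_units(detected_objects, recognized_motion, units):
    """Rank functional units against one video segment's detections.

    A unit is a candidate only if its motion label equals the recognized
    action. Candidates are scored by the fraction of their input objects
    that were detected (an assumed heuristic, not the authors' method).
    """
    detected = set(detected_objects)
    scored = []
    for unit in units:
        if unit.motion != recognized_motion or not unit.input_objects:
            continue
        overlap = len(unit.input_objects & detected) / len(unit.input_objects)
        if overlap > 0:
            scored.append((overlap, unit))
    # Best-matching units first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


# Toy example with made-up labels.
pour = FunctionalUnit(frozenset({"bottle", "cup"}), "pour", frozenset({"cup"}))
cut = FunctionalUnit(frozenset({"knife", "tomato"}), "cut", frozenset({"tomato"}))
stir = FunctionalUnit(frozenset({"spoon", "bowl"}), "stir", frozenset({"bowl"}))

matches = match_units(["bottle", "cup", "spoon"], "pour", [pour, cut, stir])
for score, unit in matches:
    print(f"{unit.motion}: {score:.2f}")
```

In this toy setup only the `pour` unit survives, because the motion label acts as a hard filter and the object overlap only ranks the remaining candidates. A real system would more likely combine detector and classifier confidences than rely on exact label equality.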