Therbligs in Action: Video Understanding through Motion Primitives
Eadom Dessalene, Michael Maynord, Cornelia Fermüller, Yiannis Aloimonos
University of Maryland, College Park
College Park, MD 20742, USA
{edessale,maynord,fermulcm,jyaloimo}@umd.edu
In this paper we introduce a rule-based, compositional, and
hierarchical model of action that uses Therbligs as its atoms.
These atoms give us a consistent, expressive, contact-centered
representation of action. Over the atoms we introduce a
differentiable method of rule-based reasoning that regularizes
for logical consistency. Our approach is complementary to other
approaches: the Therblig-based representations produced by our
architecture augment, rather than replace, the representations
of existing architectures. We release the first Therblig-centered
annotations over two popular video datasets: EPIC Kitchens 100
and 50 Salads. We also demonstrate the benefits of adopting
Therblig representations by evaluating on three tasks: action
segmentation, action anticipation, and action recognition. We
observe average relative improvements of 10.5%/7.53%/6.5%,
respectively, on EPIC Kitchens and 8.9%/6.63%/4.8%,
respectively, on 50 Salads. Code and data will be made publicly
available.
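
The figures above are relative, not absolute, improvements. As a minimal sketch of how such a figure is computed, assuming the usual definition (gain divided by the baseline score), the snippet below uses a hypothetical helper `relative_improvement` and made-up scores that are not taken from the paper:

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Relative improvement of `improved` over `baseline`, in percent.

    Assumes the common definition: 100 * (improved - baseline) / baseline.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (improved - baseline) / baseline


# Hypothetical scores for illustration only (not results from the paper).
baseline_score = 40.0   # score of an existing architecture
augmented_score = 42.6  # score after adding Therblig representations

gain = relative_improvement(baseline_score, augmented_score)
print(f"relative improvement: {gain:.2f}%")
```

Under this definition, an absolute gain of 2.6 points over a baseline of 40.0 corresponds to a 6.5% relative improvement, so relative figures should not be read as percentage-point gains.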