Egocentric Object Manipulation Graphs
We introduce Egocentric Object Manipulation Graphs (Ego-OMG), a novel representation
for activity modeling and anticipation of near-future actions that integrates
three components: 1) semantic temporal structure of activities, 2) short-term
dynamics, and 3) representations for appearance. Semantic temporal structure is
modeled as a graph, embedded with a Graph Convolutional Network, whose states
encode the characteristics of, and relations between, hands and objects.
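The abstract does not spell out the graph convolution itself. As a point of reference, the following is a minimal sketch of the standard GCN propagation rule of Kipf and Welling, H' = ReLU(D^{-1/2}(A + I)D^{-1/2} H W), applied to a toy three-node graph whose nodes stand in for hand-object states. The graph, node features, weights, and helper names (`normalized_adjacency`, `gcn_layer`) are hypothetical illustrations, not Ego-OMG's actual architecture.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain nested-loop matrix product: rows of a times columns of b.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def normalized_adjacency(adj: Matrix) -> Matrix:
    # Add self-loops, then normalize symmetrically: D^-1/2 (A + I) D^-1/2.
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj: Matrix, features: Matrix, weights: Matrix) -> Matrix:
    # One graph-convolution step: aggregate neighbor features, project with W, apply ReLU.
    propagated = matmul(matmul(normalized_adjacency(adj), features), weights)
    return [[max(0.0, x) for x in row] for row in propagated]


if __name__ == "__main__":
    # Hypothetical chain of three states: 0 -- 1 -- 2 (e.g. successive contact configurations).
    adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy 2-d state descriptors
    weights = [[1.0, 0.0], [0.0, 1.0]]              # identity projection for readability
    for row in gcn_layer(adj, features, weights):
        print([round(x, 3) for x in row])
```

Each output row mixes a node's own features with those of its neighbors, weighted by the symmetric degree normalization; stacking such layers lets information about one state propagate along the graph to states several steps away.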
These state representations derive from all three levels of abstraction and span
segments delimited by the making and breaking of hand-object contact. Short-term
dynamics are modeled in two ways: A) through 3D convolutions, and B) through
anticipating the spatiotemporal end points of hand trajectories, where hands come
into contact with objects. Appearance is modeled through deep spatiotemporal
features produced by existing methods. Because these appearance features are easy
to swap out, Ego-OMG is complementary to most
existing action anticipation methods. We evaluate Ego-OMG on the EPIC Kitchens
Action Anticipation Challenge. The consistent egocentric perspective of EPIC Kitchens
allows Ego-OMG to exploit the hand-centric cues on which it relies. We
demonstrate state-of-the-art performance, outperforming all previously published
methods by large margins and ranking first on the unseen
test set and second on the seen test set of the EPIC Kitchens Action Anticipation
Challenge. We attribute the success of Ego-OMG to the modeling of semantic
structure captured over long timespans. We evaluate our design choices through
several ablation studies. Code will be released upon acceptance.
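The graph states described above span segments delimited by the making and breaking of hand-object contact. A minimal sketch of that segmentation step follows, assuming per-frame contact annotations for each hand are already available; in practice these would come from a contact detector, which this sketch does not model. The `ContactState` layout and the `contact_segments` helper are hypothetical.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical per-frame annotation: the set of objects each hand touches (left, right).
ContactState = Tuple[FrozenSet[str], FrozenSet[str]]


def contact_segments(states: List[ContactState]) -> List[Tuple[int, int, ContactState]]:
    """Split a frame sequence into maximal runs of identical hand-object contact.

    A new segment starts whenever either hand makes or breaks contact with an
    object, i.e. whenever the contact state differs from the previous frame.
    Returns (start_frame, end_frame_inclusive, state) triples.
    """
    segments: List[Tuple[int, int, ContactState]] = []
    if not states:
        return segments
    start = 0
    for t in range(1, len(states)):
        if states[t] != states[t - 1]:
            segments.append((start, t - 1, states[start]))
            start = t
    segments.append((start, len(states) - 1, states[start]))
    return segments


if __name__ == "__main__":
    none, knife, onion = frozenset(), frozenset({"knife"}), frozenset({"onion"})
    frames = [(none, none), (none, onion), (none, onion), (knife, onion), (knife, none)]
    print([(s, e) for s, e, _ in contact_segments(frames)])
    # prints [(0, 0), (1, 2), (3, 3), (4, 4)]
```

Here the right hand grasping the onion, the left hand grasping the knife, and the right hand releasing the onion each open a new segment, so the five frames collapse into four contact states.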
Read more here: https://arxiv.org/pdf/2006.03201
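The abstract notes that appearance features are easy to swap, which is what makes Ego-OMG complementary to other anticipation methods. The following is a minimal sketch of that idea, assuming, purely for illustration, that the branch outputs are concatenated and the appearance branch is passed in as a callable. The fusion-by-concatenation scheme, the function names, and the toy "clip" (a flat list of numbers standing in for video) are all hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Sequence

Vector = List[float]
# Hypothetical interface: any appearance backbone that maps a clip to a feature vector.
AppearanceExtractor = Callable[[Sequence[float]], Vector]


def fuse_features(graph_embedding: Vector,
                  dynamics_embedding: Vector,
                  clip: Sequence[float],
                  appearance: AppearanceExtractor) -> Vector:
    """Concatenate the graph, short-term-dynamics, and appearance features.

    The appearance branch is injected as a callable, so replacing the backbone
    changes only which function is passed in, not the rest of the pipeline.
    """
    return list(graph_embedding) + list(dynamics_embedding) + list(appearance(clip))


# Two toy stand-ins for real appearance backbones (hypothetical).
def mean_pool(clip: Sequence[float]) -> Vector:
    return [sum(clip) / len(clip)]


def max_min_pool(clip: Sequence[float]) -> Vector:
    return [max(clip), min(clip)]


if __name__ == "__main__":
    graph, dynamics, clip = [1.0, 2.0], [3.0], [1.0, 2.0, 3.0]
    print(fuse_features(graph, dynamics, clip, mean_pool))     # prints [1.0, 2.0, 3.0, 2.0]
    print(fuse_features(graph, dynamics, clip, max_min_pool))  # prints [1.0, 2.0, 3.0, 3.0, 1.0]
```

Note that swapping backbones can change the fused vector's length, so any downstream classifier would need its input size adjusted to match the chosen appearance features.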