

In the end, they hope to increase the
recall so the precision recall curve will
increase. This allows them to approach
more difficult cases. They could bring
in very powerful systems and classifiers
based on deep learning and
convolutional networks.
On the practical side, their method has
the advantage of working very quickly.
Even in math labs, it is about three or
four frames per second, and this could
easily be done in real time in C++
implementation. It also does not use
any pre-learned features while still
obtaining state of the art results when
compared to other approaches which
may look more complex.
In essence, they try to focus on the
problem, not on a specific system or
specific technique. They want to find
out what they can learn by simply
watching videos. The next step would
be to achieve better performance and
more complex datasets. Because the
datasets are constantly changing in this
field, they want to employ all sorts of
neural nets and test it on the most
complex datasets available. That
means bringing the state of the art to
the next level. Marius feels extremely
optimistic about this idea. Together
with Emanuela, he plans to dedicate at
least three to five years to this idea.
When asked why she loves this work
so much, Emanuela responds, “I think
a solution to this problem would
describe how we are understanding
the world. It’s how children learn their
world basically. They see the object
move in front of them, and they learn
how to recognize it the next time.”
She explains how this method teaches
ways to find an object in the next
frame without knowing what the
object is. Whether a car or something
else, it doesn’t matter. You just need to
know that it is an object, and you learn
how it acts in real life.
Marius compares their approach to the
analogy of a fisherman. A fisherman
must use certain cues on how to catch
a fish without knowing everything
about the fish or where to find it.
Imagine a fisherman in a river for the
first time looking into the water
observing for movements. Perhaps he
throws something into the river to help
find the fish. After he catches the fish,
he looks at the fish and learns even
more about it.
Through this learning process, the
fisherman gains a better understanding
of the fish, its behaviors, and cues to
look for to find it. In the end, it
becomes an easy task knowing much
more about how to catch the fish.
In a similar way, their method gives
them high precision and good quality
features that they can harvest. With
every iteration, it improves. At the end,
they obtain really hard positive and
negative cases and reach the human
level performance and beyond.
To find out more about Marius
and Emanuela’s work, visit their
poster today (Friday) at ICCV
2017.
16
Friday
“
They want to find out
what they can learn by
simply watching videos
”
Marius and Ema
“
Their method gives them
high precision and good
quality features that they
can harvest
”