Computer Vision News - April 2019

Long Short-Term Memory Kalman Filters

Introduction

Pose estimation is a challenging task in computer vision. It is needed for tasks such as camera localization, body pose estimation (human and animal), object tracking, and more. A number of approaches tackle these problems with one-shot pose estimation: each image is classified independently of the others, so this type of modeling completely ignores the information available from the sequential nature of video. Discarding temporal data can produce very noisy estimates and confusion between image features that are visually similar but spatially far apart, such as mixing up the right and left legs in body part localization.

Temporal filters are a widely used way to improve classification in such cases. Kalman filters are the most popular, thanks to their simplicity and applicability to a wide variety of problems. Moreover, the extended Kalman filter can handle nonlinear system models, both in its measurement model and in the transition between states, that is, from one point in time to the next. In many tasks, however, such measurement models cannot be predefined, which severely limits the applicability of Kalman filters. In common computer vision tasks, objects and body parts do not conform to simple motion models; Kalman filters that rely on constant-velocity or constant-acceleration motion models can only produce a rough estimate of real-world motion.

To overcome these limitations, researchers try to have networks learn motion models directly from the training data, using methods such as SVMs or LSTMs. These machine learning methods can indeed help the model and at the same time enrich the basic motion model. However, a learned motion model must capture the consistent movement constraints observed in the training data over time, which means massive training data is needed to cover all possible movement scenarios of an object type.

The authors' innovation is the LSTM Kalman filter (LSTM-KF), a new architecture (shown in the illustration below) capable of learning a motion model and all the parameters of a Kalman filter, making it possible to achieve the advantages of machine learning with a much smaller amount of data.
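To make the classical baseline concrete, here is a minimal sketch of the kind of constant-velocity Kalman filter the article contrasts with, tracking a single 2D keypoint in plain NumPy. The matrices, noise levels, and time step are illustrative choices of ours, not values from the paper.

```python
import numpy as np

# Constant-velocity Kalman filter for one 2D keypoint (illustrative sketch).
# State x = [px, py, vx, vy]; measurement z = [px, py] (noisy per-frame detection).
dt = 1.0  # assumed time step between frames

F = np.array([[1, 0, dt, 0],   # state transition: position advances by velocity
              [0, 1, 0, dt],
              [0, 0, 1,  0],
              [0, 0, 0,  1]], dtype=float)
H = np.array([[1, 0, 0, 0],    # measurement model: only position is observed
              [0, 1, 0, 0]], dtype=float)
Q = np.eye(4) * 1e-2           # process noise covariance (hand-tuned guess)
R = np.eye(2) * 1e-1           # measurement noise covariance (hand-tuned guess)

x = np.zeros(4)                # initial state estimate
P = np.eye(4)                  # initial state covariance

def kf_step(x, P, z):
    """One predict/update cycle for a new noisy keypoint measurement z."""
    # Predict
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update
    S = H @ P_pred @ H.T + R               # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)    # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(4) - K @ H) @ P_pred
    return x_new, P_new

# Example: smooth a sequence of noisy per-frame detections.
for z in np.random.randn(10, 2):
    x, P = kf_step(x, P, z)
```

The filter is only as good as the fixed motion model encoded in F, Q, and R, which is exactly the limitation the article points out.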
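The article does not spell out implementation details, but the idea of letting LSTMs supply the Kalman filter's motion model and noise parameters can be sketched roughly as below. This PyTorch sketch is our own illustration, not the authors' exact architecture: it assumes the state and the measurement share the same dimension (e.g., raw keypoint coordinates from the one-shot estimator) and that diagonal covariances are predicted, and all module and layer names are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LSTMKFSketch(nn.Module):
    """Illustrative sketch: an LSTM replaces the hand-crafted parts of a
    Kalman filter by predicting the next state and the noise covariances."""

    def __init__(self, state_dim: int, hidden: int = 64):
        super().__init__()
        self.motion = nn.LSTM(state_dim, hidden, batch_first=True)  # learned motion model
        self.to_state = nn.Linear(hidden, state_dim)  # predicted next state
        self.to_q = nn.Linear(hidden, state_dim)      # diagonal process noise
        self.to_r = nn.Linear(hidden, state_dim)      # diagonal measurement noise

    def forward(self, x_prev, hc=None):
        # x_prev: (batch, 1, state_dim), the previous filtered state estimate.
        h, hc = self.motion(x_prev, hc)
        x_pred = self.to_state(h)
        # softplus keeps the predicted variances positive.
        Q = torch.diag_embed(F.softplus(self.to_q(h)))
        R = torch.diag_embed(F.softplus(self.to_r(h)))
        # x_pred, Q and R would then be plugged into the standard
        # Kalman predict/update equations in place of fixed matrices.
        return x_pred, Q, R, hc
```

Because the filter structure is kept and only its parameters are learned, the network has far fewer behaviors to learn from scratch, which is how the authors obtain the benefits of machine learning from much less data.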
