Personalized online video recommendations draw on a variety of methods for user information retrieval and content discovery. Recommender systems help users cope with the huge amount of information presented on a given website (such as an online store or a video hosting site) and navigate its variety of products in an efficient and satisfying way. They thus help users (primarily signed-in ones) receive high-quality video matches relevant to their interests.
YouTube, founded in 2005, has rapidly grown to be arguably the world’s most popular video hosting site today. In this article we use YouTube as an example of a platform with great opportunities for video recommender systems. Users go to YouTube for multiple reasons: to find a specific video (direct navigation), to look for videos related to some topic (goal-oriented browsing), or simply for general entertainment, with no specific goal in mind (unarticulated want). In this last case, recommender systems show their full potential.
From the business point of view, users need to be kept entertained: they should interact with the diverse content on the site and find the videos that best match their preferences. It is therefore imperative that the recommendations be updated frequently, on the one hand displaying the broad spectrum of videos available on the site and on the other hand reflecting the user’s activity and history. The success of the recommender system on YouTube is shown by the fact that over 60% of all video clicks are made from the home page, where video recommendation thumbnails are first offered.
On YouTube, the system used in the past was a top-N recommender rather than a predictor system: it returned a short ranked list of videos instead of predicting a rating for each one. Video prediction involves many challenges. One is that users give little or no explicit feedback while watching content, so advanced predictive algorithms are needed to correlate the user profile with the suggested videos. This makes video recommendation harder than comparable situations, such as renting a movie or purchasing an item online, where the user’s action is a clear declaration of intent. Another difficulty is the lack of sufficient metadata attached to uploaded videos. Without metadata such as category or genre, the system has difficulty ranking a video against user preferences and activity.
Videos with little metadata pose an obvious challenge to the recommender system. Advanced computer vision, image and signal processing, and machine learning techniques are maturing towards scene and video analysis. Such algorithms analyze a video after upload and automatically suggest a limited set of suitable tags. The system can then either encourage the uploader to include these tags or assign them automatically with no user intervention. The video can thus be offered for public viewing with much more confidence in the recommender system.
Currently, software’s ability to categorize objects is confined to specific sub-domains, in which dedicated classifiers are used to detect a given object or activity. Without prior knowledge of an uploaded video’s category, video categorization by scene analysis remains an extremely challenging task. However, some tags can still be assigned to a video automatically, such as: amateur video, noisy, loud, well-lit, human voices or figures included, estimated number of humans, musical, and so on. These tags already make the video reachable by the search mechanism. Later, as human ratings, comments and interactions accumulate, both the tags and the user activity can contribute to the ranking.
Automatic tag assignment can be done by extracting a set of relevant features and then processing them with machine learning techniques to assign a category or a probabilistic map of tags. To reduce the computational load, only key frames need to be processed, as most YouTube videos are shorter than ten minutes. In each key frame, the features used for recognition (for instance of human faces, animals, color and texture) can be extracted and classified, and the results integrated over time. With several classifiers working in parallel, we can reach a set of approximate tags to initialize a new entry that has little or no metadata and pass it on to the recommender system for later use.
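The temporal integration step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-frame classifier outputs are taken as given (the dictionaries of tag probabilities are made up), and the helper names `select_key_frames` and `aggregate_tags`, the fixed sampling step and the 0.5 threshold are all hypothetical choices, not part of any YouTube system.

```python
def select_key_frames(num_frames, step):
    """Pick every `step`-th frame index as a key frame (assumed sampling rule)."""
    return list(range(0, num_frames, step))


def aggregate_tags(frame_scores, threshold=0.5):
    """Average per-frame tag probabilities over time and keep confident tags.

    frame_scores: list of dicts, one per key frame, mapping tag -> probability.
    A tag missing from a frame is treated as probability 0 for that frame.
    Returns a dict tag -> averaged probability for tags at or above `threshold`.
    """
    if not frame_scores:
        return {}
    totals = {}
    for scores in frame_scores:
        for tag, probability in scores.items():
            totals[tag] = totals.get(tag, 0.0) + probability
    count = len(frame_scores)
    averaged = {tag: total / count for tag, total in totals.items()}
    return {tag: p for tag, p in averaged.items() if p >= threshold}


if __name__ == "__main__":
    # Made-up classifier outputs for three key frames.
    scores = [
        {"human_face": 0.9, "animal": 0.1},
        {"human_face": 0.7},
        {"human_face": 0.8, "animal": 0.3},
    ]
    print(select_key_frames(10, 4))
    print(aggregate_tags(scores))
```

Averaging is only one way to integrate over time; a maximum or a vote over frames would be equally plausible, depending on whether a tag should describe the whole video or any part of it.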
Tags and categories in a recommender system are used to estimate the “distance” of a user from a set of videos. From the users’ perspective, it is important that they understand why a video was recommended to them. Both the user profile and the currently viewed item can serve as seeds for expanding the recommended set of videos. One way to associate similar or related videos is a measure of video co-visitation, which counts how often two videos were co-watched within a user session. A measure of video relatedness is then derived from the co-visitation count between the seed video and any candidate video.
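As a minimal sketch of the co-visitation idea, the Python below counts how many sessions contain each pair of videos and turns the count into a relatedness score. The text does not say how relatedness is derived from the count, so the normalization used here (dividing by the product of the two videos' view counts, to dampen very popular videos) is an assumption, and the function names and session data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def video_views(sessions):
    """Count the number of sessions in which each video was watched."""
    views = Counter()
    for session in sessions:
        views.update(set(session))
    return views


def covisitation_counts(sessions):
    """Count, for each unordered video pair, the sessions containing both."""
    counts = Counter()
    for session in sessions:
        # Sort so that each pair is always stored as (smaller_id, larger_id).
        for a, b in combinations(sorted(set(session)), 2):
            counts[(a, b)] += 1
    return counts


def relatedness(seed, candidate, counts, views):
    """Assumed normalization: co-visits divided by the product of view counts."""
    pair = tuple(sorted((seed, candidate)))
    denominator = views[seed] * views[candidate]
    return counts[pair] / denominator if denominator else 0.0


if __name__ == "__main__":
    # Made-up watch sessions; each inner list is one user session.
    sessions = [["a", "b", "c"], ["a", "b"], ["b", "c"], ["a"]]
    counts = covisitation_counts(sessions)
    views = video_views(sessions)
    print(counts[("a", "b")], relatedness("a", "b", counts, views))
```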
All videos associated with the current one can be brought forward as candidates in the recommender system and scored by their various distance measures. For top-N recommendation, the scores of each video are combined (e.g. by a linear combination) to give its final rank in the top-N recommendation list.
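The final ranking step can be illustrated with a short Python sketch. It assumes each candidate carries a few named scores and that a fixed weight is given per score; the signal names (`relatedness`, `tag_match`), the weights and the function `rank_top_n` are hypothetical and chosen only for the example.

```python
def rank_top_n(candidates, weights, n):
    """Rank candidates by a weighted sum of their scores and return the top n.

    candidates: dict mapping video id -> dict of signal name -> score.
    weights: dict mapping signal name -> weight in the linear combination.
    A signal missing for a candidate contributes 0 to its combined score.
    """
    def combined(signals):
        return sum(w * signals.get(name, 0.0) for name, w in weights.items())

    scored = [(combined(signals), video) for video, signals in candidates.items()]
    # Highest combined score first; ties are broken by video id for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [video for _, video in scored[:n]]


if __name__ == "__main__":
    # Made-up candidate scores.
    candidates = {
        "v1": {"relatedness": 0.9, "tag_match": 0.1},
        "v2": {"relatedness": 0.4, "tag_match": 0.8},
        "v3": {"relatedness": 0.2, "tag_match": 0.2},
    }
    weights = {"relatedness": 0.5, "tag_match": 0.5}
    print(rank_top_n(candidates, weights, 2))
```

In practice the weights would have to be tuned, and a production system would likely also enforce diversity across the list, which this sketch does not attempt.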