Video "tag" generation system for a Recommender System

Hi All,

Need some quick advice. I’m making a proof of concept recommender system for my day job. Let’s just confine the content I’m working with to videos.

This thread is about a side quest I have to take on: ingesting a video and producing tags that describe its vibe or its content in general.

For example, tags for the following video could be [‘racing’, ‘sports’, ‘cars’]


How would I go about creating a system that does basic video content analysis so that I have tags to work with for content based filtering?

Or, since I don’t have millions of dollars to burn, should I just focus on collaborative filtering?

Either derive the tags from the metadata or find an open source model that does it for you. Training an image-to-text (or multi-image-to-text) model from scratch would probably take too much time and compute for a “side quest”.
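
To make the metadata route a bit more concrete, here is a rough sketch of how it could look. Everything in it is an assumption for illustration: the function names, the tiny stopword list, and the idea that each video comes with a title and a description. It pulls the most frequent non-trivial words out of the metadata as tags, then ranks other videos by Jaccard overlap of their tag sets, which is one simple form of content based filtering.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "and", "of", "in", "on", "at", "to", "for", "with", "is"}


def tags_from_metadata(title, description, top_k=5):
    """Return up to top_k of the most frequent non-stopword terms as a tag set."""
    words = re.findall(r"[a-z]+", f"{title} {description}".lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return {word for word, _ in counts.most_common(top_k)}


def jaccard(a, b):
    """Jaccard similarity of two tag sets: |intersection| / |union|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_similar(target_tags, catalog):
    """Rank video ids in catalog (id -> tag set) by tag overlap with target_tags.

    Videos with no overlap are dropped; ties are broken by video id.
    """
    scored = [(jaccard(target_tags, tags), vid) for vid, tags in catalog.items()]
    scored = [pair for pair in scored if pair[0] > 0]
    return [vid for _, vid in sorted(scored, key=lambda p: (-p[0], p[1]))]


if __name__ == "__main__":
    tags = tags_from_metadata("Racing cars at the track", "Sports cars racing in the rain")
    catalog = {"v1": {"racing", "cars"}, "v2": {"cooking"}, "v3": {"sports"}}
    print(tags)
    print(rank_similar(tags, catalog))
```

If you go the open source model route instead, the tag sets would come from a pretrained model run on sampled frames, and the ranking part could stay the same.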
